Crawl

Start a Crawl Job

Coming soon

This endpoint is documented but not yet available in production. The contract below reflects the planned behavior. Reach out via support for early access or launch notification.

Crawl a website starting from a seed URL up to a given depth and page limit

POST /v1/crawl
X-Api-Key <token>

In: header

url string

Seed URL to start crawling from

Format: uri
Length: 1 <= length
depth? integer

Maximum BFS depth from the seed URL (1 = seed only)

Default: 2
Range: 1 <= value <= 10
limit? integer

Maximum number of pages to crawl

Default: 50
Range: 1 <= value <= 10000
formats? Formats

Output formats to extract per page

Default: ["markdown"]
render_js? boolean

Use a headless browser to render each page (slower, handles JS-heavy sites)

Default: false
same_domain_only? boolean

Only follow links that stay on the same domain as the seed URL

Default: true
include_subdomains? boolean

When same_domain_only is true, also include subdomains of the seed domain

Default: false
proxy?|null
wait_config?|null
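
One way to read how depth, limit, same_domain_only, and include_subdomains interact is a breadth-first walk over a link graph. The sketch below is an interpretation of the documented parameters, not the service's implementation. The function names, the in-memory LINKS graph, the rule that a subdomain is any host ending in ".<seed host>", and the choice to count the seed toward limit are all assumptions made for illustration.

```python
from collections import deque
from urllib.parse import urlsplit


def in_scope(seed, link, same_domain_only=True, include_subdomains=False):
    """Decide whether a discovered link may be followed."""
    if not same_domain_only:
        return True
    seed_host = (urlsplit(seed).hostname or "").lower()
    link_host = (urlsplit(link).hostname or "").lower()
    if link_host == seed_host:
        return True
    # Assumption: a subdomain is any host ending in ".<seed host>".
    return include_subdomains and link_host.endswith("." + seed_host)


def plan_crawl(seed, links, depth=2, limit=50,
               same_domain_only=True, include_subdomains=False):
    """Return the pages visited in BFS order over a toy link graph."""
    visited = [seed]
    seen = {seed}
    queue = deque([(seed, 1)])  # depth 1 is the seed itself
    while queue:
        page, level = queue.popleft()
        if level >= depth:
            continue  # following links from here would exceed the depth
        for link in links.get(page, []):
            if len(visited) >= limit:
                return visited  # page budget used up (seed counted, by assumption)
            if link in seen:
                continue
            if not in_scope(seed, link, same_domain_only, include_subdomains):
                continue
            seen.add(link)
            visited.append(link)
            queue.append((link, level + 1))
    return visited


# Hypothetical link graph used only for illustration.
LINKS = {
    "https://example.com/": [
        "https://example.com/a",
        "https://blog.example.com/",
        "https://other.org/",
    ],
    "https://example.com/a": ["https://example.com/b"],
}
```

Under these assumptions, depth=1 yields only the seed, the default depth=2 adds pages linked directly from the seed, and include_subdomains=true additionally admits https://blog.example.com/ while https://other.org/ stays out of scope.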

Response Body

application/json

curl -X POST "https://scraper.geonode.io/v1/crawl" \
  -H "X-Api-Key: <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "http://example.com"
  }'
{
  "job_id": "453bd7d7-5355-4d6d-a38e-d9e7eb218c3f",
  "url": "string",
  "status": "queued",
  "status_url": "string",
  "estimated_pages": 0
}
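
The request and response above could be handled from Python roughly as follows. This is a minimal sketch using only the standard library, and it assumes the planned contract ships as documented. The helper names (build_crawl_request, parse_crawl_response, start_crawl) are hypothetical, and the client-side range checks simply mirror the documented limits for depth and limit. Because the endpoint is not yet available in production, the sketch separates building the request from sending it.

```python
import json
import urllib.request

CRAWL_URL = "https://scraper.geonode.io/v1/crawl"


def build_crawl_request(api_key, url, depth=2, limit=50, **options):
    """Check the documented ranges and return an unsent POST request."""
    if not url:
        raise ValueError("url must be a non-empty string")
    if not 1 <= depth <= 10:
        raise ValueError("depth must be between 1 and 10")
    if not 1 <= limit <= 10000:
        raise ValueError("limit must be between 1 and 10000")
    # Extra keyword arguments (e.g. render_js) are passed through unchanged.
    body = {"url": url, "depth": depth, "limit": limit, **options}
    return urllib.request.Request(
        CRAWL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_crawl_response(raw):
    """Pick the job fields out of a body shaped like the sample response."""
    payload = json.loads(raw)
    return payload["job_id"], payload["status"], payload["status_url"]


def start_crawl(api_key, url, **kwargs):
    """Send the request; only meaningful once the endpoint is live."""
    request = build_crawl_request(api_key, url, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_crawl_response(response.read())
```

The returned status_url is presumably where job progress can be checked, but its response format is not described here, so the sketch stops at returning it.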