Crawl
Start a Crawl Job
Coming soon
This endpoint is documented but not yet available in production. The contract below reflects the planned behavior. Reach out via support for early access or launch notification.
Crawl a website starting from a seed URL up to a given depth and page limit
X-Api-Key <token>
In: header
url string
Seed URL to start crawling from
Format: uri
Length: 1 <= length

depth? integer
Maximum BFS depth from the seed URL (1 = seed only)
Default: 2
Range: 1 <= value <= 10

limit? integer
Maximum number of pages to crawl
Default: 50
Range: 1 <= value <= 10000

formats? Formats
Output formats to extract per page
Default: ["markdown"]

render_js? boolean
Use a headless browser to render each page (slower, but handles JS-heavy sites)
Default: false

same_domain_only? boolean
Only follow links that stay on the same domain as the seed URL
Default: true

include_subdomains? boolean
When same_domain_only is true, also include subdomains of the seed domain
Default: false

proxy? | null

wait_config? | null
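
To make the depth, limit, and domain flags concrete, here is a minimal Python sketch of a depth-limited breadth-first walk over an in-memory link graph. It is not the service's implementation. The function names (in_scope, plan_crawl), the links dictionary, and the rule that a "subdomain" is any host ending in ".<seed host>" are assumptions for illustration; the page does not specify how the service compares domains or in what order it visits links.

```python
from collections import deque
from urllib.parse import urlparse


def in_scope(seed, link, same_domain_only=True, include_subdomains=False):
    """Return True if `link` may be followed under the documented domain flags."""
    if not same_domain_only:
        return True
    seed_host = (urlparse(seed).hostname or "").lower()
    link_host = (urlparse(link).hostname or "").lower()
    if link_host == seed_host:
        return True
    # Assumption: a subdomain is any host that ends in ".<seed host>".
    return include_subdomains and link_host.endswith("." + seed_host)


def plan_crawl(seed, links, depth=2, limit=50, **scope):
    """Breadth-first walk over `links` (a dict mapping URL -> outgoing URLs).

    Depth is counted so that 1 means the seed page only.
    """
    visited = [seed]
    seen = {seed}
    queue = deque([(seed, 1)])
    while queue:
        page, level = queue.popleft()
        if level >= depth:
            continue  # pages at the maximum depth are kept but not expanded
        for link in links.get(page, []):
            if len(visited) >= limit:
                return visited  # page limit reached
            if link not in seen and in_scope(seed, link, **scope):
                seen.add(link)
                visited.append(link)
                queue.append((link, level + 1))
    return visited
```

With depth=1 the walk returns only the seed, matching the "1 = seed only" note above. Passing include_subdomains=True lets hosts such as blog.example.com through when the seed host is example.com.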
curl -X POST "https://scraper.geonode.io/v1/crawl" \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: <token>" \
  -d '{ "url": "http://example.com" }'

Response Body
application/json
{
"job_id": "453bd7d7-5355-4d6d-a38e-d9e7eb218c3f",
"url": "string",
"status": "queued",
"status_url": "string",
"estimated_pages": 0
}
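
The request and response shapes on this page could be handled from Python roughly as follows. This is a sketch under stated assumptions, not an official client: build_crawl_request, parse_crawl_response, and CrawlJob are hypothetical names, and the endpoint URL is copied from the curl example. The request is only constructed, never sent, since the endpoint is not yet available in production.

```python
import json
import urllib.request
from dataclasses import dataclass, fields

# Endpoint taken from the curl example on this page.
CRAWL_URL = "https://scraper.geonode.io/v1/crawl"


def build_crawl_request(api_key, url, depth=2, limit=50):
    """Return an unsent POST request; raise ValueError on out-of-range values."""
    if not url:
        raise ValueError("url must be a non-empty string")
    if not 1 <= depth <= 10:
        raise ValueError("depth must be between 1 and 10")
    if not 1 <= limit <= 10000:
        raise ValueError("limit must be between 1 and 10000")
    body = json.dumps({"url": url, "depth": depth, "limit": limit}).encode("utf-8")
    headers = {"Content-Type": "application/json", "X-Api-Key": api_key}
    return urllib.request.Request(CRAWL_URL, data=body, headers=headers, method="POST")


@dataclass
class CrawlJob:
    """Hypothetical container mirroring the fields of the sample response."""
    job_id: str
    url: str
    status: str
    status_url: str
    estimated_pages: int


def parse_crawl_response(raw):
    """Decode a JSON response body into a CrawlJob, failing on missing fields."""
    data = json.loads(raw)
    names = [f.name for f in fields(CrawlJob)]
    missing = [n for n in names if n not in data]
    if missing:
        raise ValueError("response is missing fields: " + ", ".join(missing))
    return CrawlJob(**{n: data[n] for n in names})
```

Once the endpoint is live, the request object could be passed to urllib.request.urlopen and the response body handed to parse_crawl_response. How status_url should be polled is not described on this page, so that step is left out.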