Start a crawl job

Crawl a website starting from a seed URL up to a given depth and page limit

POST /v1/crawl

X-Api-Key: <token>

In: header

url string

Seed URL to start crawling from

Format: uri
Length: 1 <= length

depth? integer

Maximum BFS depth from the seed URL (1 = seed only)

Default: 2
Range: 1 <= value <= 10

limit? integer

Maximum number of pages to crawl

Default: 50
Range: 1 <= value <= 10000
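
To make the interaction of depth and limit concrete, here is a minimal sketch of a breadth-first traversal over an in-memory link graph. It assumes that depth 1 visits only the seed (as stated above) and that limit caps the total number of pages visited. The `bfs_crawl` function and the `links` mapping are hypothetical illustrations and say nothing about how the service is implemented internally.

```python
from collections import deque


def bfs_crawl(links, seed, depth=2, limit=50):
    """Return pages in BFS order, honoring depth (1 = seed only) and limit.

    `links` is a hypothetical mapping of URL -> list of outgoing URLs,
    standing in for real page fetches.
    """
    visited = [seed]
    seen = {seed}
    queue = deque([(seed, 1)])  # the seed sits at depth 1
    while queue and len(visited) < limit:
        page, level = queue.popleft()
        if level >= depth:
            continue  # do not follow links beyond the maximum depth
        for target in links.get(page, []):
            if target in seen:
                continue
            if len(visited) >= limit:
                break  # page budget exhausted
            seen.add(target)
            visited.append(target)
            queue.append((target, level + 1))
    return visited
```

With the defaults (depth 2, limit 50), this sketch would visit the seed plus the pages it links to directly, stopping early once 50 pages have been collected.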

formats? Formats

Output formats to extract per page

Default: ["markdown"]

render_js? boolean

Use a headless browser to render each page (slower, handles JS-heavy sites)

Default: false

same_domain_only? boolean

Only follow links that stay on the same domain as the seed URL

Default: true

include_subdomains? boolean

When same_domain_only is true, also include subdomains of the seed domain

Default: false
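
The way same_domain_only and include_subdomains combine can be sketched as a small link filter. This is an illustration only: `should_follow` is a hypothetical helper, and it assumes that "subdomain" means a host ending in a dot followed by the seed host. The service may compare domains differently, for example by registrable domain.

```python
from urllib.parse import urlsplit


def should_follow(seed_url, link_url, same_domain_only=True,
                  include_subdomains=False):
    """Decide whether a discovered link is in scope for the crawl."""
    if not same_domain_only:
        return True  # no domain restriction at all
    seed_host = (urlsplit(seed_url).hostname or "").lower()
    link_host = (urlsplit(link_url).hostname or "").lower()
    if link_host == seed_host:
        return True
    # Assumption: a subdomain is any host ending with ".<seed host>".
    return include_subdomains and link_host.endswith("." + seed_host)
```

Under these assumptions, a seed of `http://example.com` with the defaults would accept `http://example.com/about` but skip `http://blog.example.com/`, which is only followed once include_subdomains is set to true.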

proxy? | null
wait_config? | null
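
As a rough illustration of the request body, the sketch below assembles the documented fields with their defaults and rejects values outside the documented ranges before anything is sent. `build_crawl_body` is a hypothetical helper, not part of any official client. proxy and wait_config are left out because their schemas are not shown in this section, and the URL check is only an approximation of the `uri` format.

```python
from urllib.parse import urlsplit


def build_crawl_body(url, depth=2, limit=50, formats=None,
                     render_js=False, same_domain_only=True,
                     include_subdomains=False):
    """Build the JSON body for POST /v1/crawl, checking documented ranges."""
    # Approximate "Format: uri" and "Length: 1 <= length" by requiring a scheme.
    if not url or not urlsplit(url).scheme:
        raise ValueError("url must be a non-empty URI")
    if not 1 <= depth <= 10:
        raise ValueError("depth must be between 1 and 10")
    if not 1 <= limit <= 10000:
        raise ValueError("limit must be between 1 and 10000")
    return {
        "url": url,
        "depth": depth,
        "limit": limit,
        "formats": list(formats) if formats is not None else ["markdown"],
        "render_js": render_js,
        "same_domain_only": same_domain_only,
        "include_subdomains": include_subdomains,
    }
```

Since every field except url is optional, sending only `{"url": ...}` should be equivalent if the server applies the same defaults. Spelling them out mainly makes the request self-describing.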

Response Body

application/json

curl -X POST "https://scraper.geonode.io/v1/crawl" \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: <token>" \
  -d '{
    "url": "http://example.com"
  }'
{
  "job_id": "453bd7d7-5355-4d6d-a38e-d9e7eb218c3f",
  "url": "string",
  "status": "queued",
  "status_url": "string",
  "estimated_pages": 0
}
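
The same call can be sketched in Python using only the standard library. The endpoint URL is taken from the curl example and the header name from the parameter list above. The helper names `make_crawl_request` and `start_crawl` and the 30-second timeout are assumptions made for illustration, not part of an official SDK.

```python
import json
import urllib.request

CRAWL_ENDPOINT = "https://scraper.geonode.io/v1/crawl"  # from the curl example


def make_crawl_request(body, api_key):
    """Prepare (but do not send) the POST request that starts a crawl job."""
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


def start_crawl(body, api_key, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = make_crawl_request(body, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like `job = start_crawl({"url": "http://example.com"}, api_key="<token>")`, after which `job["job_id"]`, `job["status"]`, and `job["status_url"]` should be available as in the sample response. How to poll `status_url` is not covered in this section.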