Get Crawl Job Status
Poll a crawl job and retrieve paginated page results.
GET /v1/crawl/{job_id} returns progress counters and a paginated slice of page results. Use it after creating a crawl with Start a Crawl Job.
Poll this endpoint until status becomes completed, failed, or cancelled.
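The polling loop described above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client. The helper names `fetch_status` and `wait_for_crawl`, the 5-second interval, and the poll cap are assumptions chosen for illustration. The path, the `X-Api-Key` header, and the three terminal statuses come from this page.

```python
import json
import os
import time
import urllib.request

# Terminal statuses as listed on this page.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def fetch_status(job_id, page=1, page_size=10):
    """GET /v1/crawl/{job_id} and return the decoded JSON body."""
    base_url = os.environ.get("SCRAPER_API_BASE_URL", "https://scraper.geonode.io")
    api_key = os.environ["GEONODE_SCRAPER_API_KEY"]
    url = f"{base_url}/v1/crawl/{job_id}?page={page}&page_size={page_size}"
    request = urllib.request.Request(url, headers={"X-Api-Key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_crawl(job_id, fetch=fetch_status, interval_s=5.0,
                   max_polls=120, sleep=time.sleep):
    """Poll until the job reaches a terminal status and return the last body.

    interval_s and max_polls are illustrative defaults, not documented limits.
    """
    for attempt in range(max_polls):
        job = fetch(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"crawl {job_id} still running after {max_polls} polls")
```

Calling `wait_for_crawl("9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019")` would block until the job finishes or the poll cap is reached. The `fetch` and `sleep` parameters are injectable so that the loop can be exercised without network access.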
Request
Replace the example job ID with the job_id returned by POST /v1/crawl.
export SCRAPER_API_BASE_URL="https://scraper.geonode.io"
export GEONODE_SCRAPER_API_KEY="YOUR_API_KEY"
curl "$SCRAPER_API_BASE_URL/v1/crawl/9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019?page=1&page_size=10" \
  -H "X-Api-Key: $GEONODE_SCRAPER_API_KEY"
Query Parameters
Crawl results are paginated because one crawl can discover many pages.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| page | integer | No | 1 | Page of crawl results to return. |
| page_size | integer | No | 10 | Results per page. Must be between 1 and 50. |
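One way to walk every result page with these two parameters is sketched below. It rests on an assumption this page does not confirm: that a page with fewer than `page_size` items (or none) marks the end of the results. `iter_crawl_results` and its `fetch_page` argument are hypothetical names. `fetch_page(job_id, page, page_size)` stands for any callable that performs the GET request and returns the decoded JSON body.

```python
def iter_crawl_results(job_id, fetch_page, page_size=50):
    """Yield result items across all pages of a crawl job.

    Assumption: a short or empty page signals the last page. The response
    shown on this page carries no explicit "last page" field.
    """
    if not 1 <= page_size <= 50:
        # Documented constraint: page_size must be between 1 and 50.
        raise ValueError("page_size must be between 1 and 50")
    page = 1  # pages are numbered from 1 (the documented default)
    while True:
        body = fetch_page(job_id, page, page_size)
        results = body.get("results", [])
        yield from results
        if len(results) < page_size:
            break
        page += 1
```

Using the maximum `page_size` of 50 keeps the number of requests low. While a crawl is still running, new pages may be discovered between requests, so a full walk is probably most reliable once the job has reached a terminal status.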
Response
The response includes crawl progress, usage accounting fields, and the current result slice selected by page and page_size.
{
  "job_id": "9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
  "url": "https://quotes.toscrape.com/",
  "status": "processing",
  "total_pages": 25,
  "completed_pages": 8,
  "failed_pages": 0,
  "cancelled_pages": 0,
  "created_at": "2026-05-26T10:30:00Z",
  "completed_at": null,
  "token_summary": {
    "tokens_charged_total": 8,
    "tokens_reserved": 17
  },
  "results": [
    {
      "url": "https://quotes.toscrape.com/",
      "parent_url": null,
      "depth": 1,
      "status": "completed",
      "error_code": null,
      "error_message": null,
      "links": [
        "https://quotes.toscrape.com/page/2/"
      ],
      "data": {
        "markdown": "...",
        "html": null
      },
      "metadata": {
        "http_status": 200,
        "duration_ms": 631,
        "retry_count": 0,
        "tokens_charged": 1
      }
    }
  ]
}

total_pages is the number of crawl pages discovered so far, not the number of result pages in API pagination.

Each result item includes the crawled URL, parent URL when available, crawl depth, status, discovered links, and extracted content when the page completed. Result items can also include metadata with HTTP status, duration, retry count, and tokens charged for that page. Failed pages include error_code and error_message instead of data.
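The fields described above can be consumed along the following lines. This is a sketch under stated assumptions: `crawl_progress` and `summarize_results` are hypothetical helper names, a non-null `error_code` is treated as the marker of a failed page, and the progress fraction is measured against pages discovered so far, so it can go down when the crawl finds more pages.

```python
def crawl_progress(job):
    """Fraction of discovered pages that have finished in any state."""
    finished = (job.get("completed_pages", 0)
                + job.get("failed_pages", 0)
                + job.get("cancelled_pages", 0))
    total = job.get("total_pages", 0)  # pages discovered so far
    return finished / total if total else 0.0


def summarize_results(results):
    """Split result items into extracted pages and failures, and sum tokens."""
    pages, failures, tokens = [], [], 0
    for item in results:
        metadata = item.get("metadata") or {}  # metadata may be absent
        tokens += metadata.get("tokens_charged") or 0
        if item.get("error_code") is not None:
            # Assumption: failed pages are identified by a non-null error_code.
            failures.append((item["url"], item["error_code"],
                             item.get("error_message")))
        elif item.get("status") == "completed":
            data = item.get("data") or {}
            pages.append((item["url"], data.get("markdown")))
    return {"pages": pages, "failures": failures, "tokens_charged": tokens}
```

For the sample response above, `crawl_progress` gives 8 of 25 discovered pages, and `summarize_results` returns one extracted page, no failures, and one token charged.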