Crawl

Get Crawl Job Status

Coming soon

This endpoint is documented but not yet available in production. The contract below reflects the planned behavior. Contact support for early access or to be notified at launch.

Poll for the current status and results of a crawl job.

GET /v1/crawl/{job_id}

X-Api-Key: <token>

In: header
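
As a minimal sketch of how the request might be assembled in Python, assuming the base URL from the curl example on this page and the X-Api-Key header described above. The helper name `build_status_request` is hypothetical and not part of any SDK. The request is only constructed here, not sent.

```python
import urllib.request

# Base URL taken from the curl example on this page.
BASE_URL = "https://scraper.geonode.io"


def build_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a crawl job's status.

    Hypothetical helper for illustration only.
    """
    url = f"{BASE_URL}/v1/crawl/{job_id}"
    # The API key travels in the X-Api-Key request header.
    return urllib.request.Request(url, headers={"X-Api-Key": api_key}, method="GET")
```

Once the endpoint is live, the returned object could be passed to `urllib.request.urlopen` to perform the call.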

Path Parameters

job_id string
Format: uuid
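
Because job_id must be a UUID, a client can reject malformed IDs before making a request. The sketch below uses Python's standard `uuid` module. The helper name `is_valid_job_id` is hypothetical, and insisting on the canonical hyphenated form is an assumption, since the page does not say whether the server accepts other UUID spellings.

```python
import uuid


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id is a UUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(job_id)
    except ValueError:
        return False
    # uuid.UUID also accepts un-hyphenated hex; this check accepts only the
    # canonical 8-4-4-4-12 form (an assumption about what the API expects).
    return str(parsed) == job_id.lower()
```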

Query Parameters

page? integer

Page of results

Default: 1
Range: 1 <= value

page_size? integer

Results per page (max 50)

Default: 10
Range: 1 <= value <= 50
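
The documented defaults and ranges for page and page_size can be checked on the client before the query string is built. This is a sketch only: `build_query` is a hypothetical helper, and raising `ValueError` is a design choice made for this example, not behavior of the API.

```python
from urllib.parse import urlencode

MAX_PAGE_SIZE = 50  # documented upper bound for page_size


def build_query(page: int = 1, page_size: int = 10) -> str:
    """Validate pagination parameters against the documented ranges
    and return an encoded query string."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    return urlencode({"page": page, "page_size": page_size})
```

The resulting string can be appended to the endpoint path after a `?`.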

Response Body

application/json

curl -X GET "https://scraper.geonode.io/v1/crawl/497f6eca-6276-4993-bfeb-53cbbbba6f08" \
  -H "X-Api-Key: <token>"

{
  "job_id": "453bd7d7-5355-4d6d-a38e-d9e7eb218c3f",
  "url": "string",
  "status": "queued",
  "crawl_config": {
    "render_js": true,
    "formats": [
      "markdown"
    ],
    "same_domain_only": true,
    "include_subdomains": true,
    "proxy": {
      "country": "string",
      "type": "datacenter"
    },
    "wait_config": {
      "wait_until": "commit",
      "wait_for": "string",
      "wait_timeout": 30000
    }
  },
  "token_summary": {
    "tokens_charged_total": 0,
    "tokens_reserved": 0
  },
  "total_pages": 0,
  "completed_pages": 0,
  "failed_pages": 0,
  "cancelled_pages": 0,
  "created_at": "2019-08-24T14:15:22Z",
  "completed_at": "2019-08-24T14:15:22Z",
  "results": [
    {
      "url": "string",
      "parent_url": "string",
      "depth": 0,
      "status": "string",
      "error_code": "string",
      "error_message": "string",
      "links": [
        "string"
      ],
      "data": {
        "markdown": "string",
        "html": "string"
      },
      "metadata": {
        "http_status": 0,
        "duration_ms": 0,
        "tokens_charged": 0
      }
    }
  ]
}
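
The page counters and status field in the response above lend themselves to a simple polling loop. The sketch below rests on several assumptions. Only the "queued" status appears in the sample, so the terminal statuses listed in the code are guesses. total_pages is treated as the current known total, which may grow while the crawl is still discovering links. `fetch` is a hypothetical callable that returns the decoded JSON body, so the loop stays independent of any HTTP library.

```python
import time
from typing import Callable

# Assumption: only "queued" appears in the sample response. The statuses
# below are guesses at terminal states, not documented values.
ASSUMED_TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def progress(job: dict) -> float:
    """Fraction of known pages that are completed, failed, or cancelled."""
    total = job.get("total_pages", 0)
    if total == 0:
        return 0.0
    done = (
        job.get("completed_pages", 0)
        + job.get("failed_pages", 0)
        + job.get("cancelled_pages", 0)
    )
    return done / total


def poll_until_done(
    fetch: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job reports an assumed terminal status."""
    for attempt in range(max_attempts):
        job = fetch()
        if job["status"] in ASSUMED_TERMINAL_STATUSES:
            return job
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job still not finished after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays. The interval and attempt limit are placeholder values to tune once real rate limits are known.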
Empty
Empty
{
  "detail": [
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
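
The error body above carries a list of detail entries, each with a location path, a message, and a type. A client might flatten these into readable lines, as in the sketch below. `format_validation_errors` is a hypothetical helper, and joining the loc segments with dots is a presentation choice made here, not something the API specifies.

```python
def format_validation_errors(body: dict) -> list[str]:
    """Turn the error body's "detail" list into one-line messages."""
    messages = []
    for item in body.get("detail", []):
        # "loc" is a list of path segments pointing at the offending field.
        location = ".".join(str(part) for part in item.get("loc", []))
        messages.append(f"{location}: {item.get('msg', '')} ({item.get('type', '')})")
    return messages
```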