Get Extraction Job

Poll one async extraction job and retrieve the result when it is complete.

GET /v1/extract/{job_id} returns the current status of an async extraction job. Use it after starting an extraction with processing_mode: "async".

Request

Replace the example job ID with the job_id returned by POST /v1/extract.

export SCRAPER_API_BASE_URL="https://scraper.geonode.io"
export GEONODE_SCRAPER_API_KEY="YOUR_API_KEY"

curl -X GET "$SCRAPER_API_BASE_URL/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48" \
  -H "X-Api-Key: $GEONODE_SCRAPER_API_KEY"

Response While Running

While a job is still running, its status is queued or processing. At this point, data, metadata, and tokens_charged are usually null because extraction has not finished yet.

{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "processing",
  "created_at": "2026-05-26T10:30:00Z",
  "completed_at": null,
  "data": null,
  "metadata": null,
  "error": null,
  "tokens_charged": null
}

Response After Completion

A completed job includes data, metadata, and tokens_charged.

{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "completed",
  "created_at": "2026-05-26T10:30:00Z",
  "completed_at": "2026-05-26T10:30:04Z",
  "data": {
    "markdown": "...",
    "html": null,
    "links": null
  },
  "metadata": {
    "url": "https://docs.python.org/3/library/json.html",
    "render_js": false,
    "http_status": 200,
    "duration_ms": 631,
    "retry_count": 0,
    "formats": ["markdown"],
    "processing_mode": "async"
  },
  "error": null,
  "tokens_charged": 1
}
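
In the completed example above, only the requested format (markdown) is populated and the other entries in data are null. As a minimal Python sketch, assuming unrequested formats always come back as null, a small helper can pick out the populated entries. The function name populated_formats is illustrative and not part of the API.

```python
def populated_formats(job):
    """Return the entries of job["data"] that are not null (None in Python)."""
    # data is null while the job is still running, so fall back to an empty dict.
    data = job.get("data") or {}
    return {name: value for name, value in data.items() if value is not None}


# Trimmed copy of the completed response shown above.
completed_job = {
    "status": "completed",
    "data": {"markdown": "...", "html": None, "links": None},
    "tokens_charged": 1,
}
print(populated_formats(completed_job))  # {'markdown': '...'}
```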

Job statuses are queued, processing, completed, failed, and cancelled.
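
These statuses lend themselves to a simple polling loop. The Python sketch below shows one way to write it with only the standard library. It assumes, although this page does not say so explicitly, that completed, failed, and cancelled are final states. The function names, the 2-second interval, and the 120-second timeout are illustrative choices, not API recommendations.

```python
import json
import os
import time
import urllib.request

BASE_URL = os.environ.get("SCRAPER_API_BASE_URL", "https://scraper.geonode.io")

# Assumption: these three statuses are final; queued and processing are not.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def fetch_job(job_id, api_key):
    """GET /v1/extract/{job_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/extract/{job_id}",
        headers={"X-Api-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(job_id, api_key, interval=2.0, timeout=120.0, fetch=fetch_job):
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id, api_key)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        time.sleep(interval)
```

Calling wait_for_job(job_id, api_key) returns the final job object. A caller would then check status before reading data, since a failed or cancelled job is also returned rather than raised. The fetch parameter exists only so the loop can be exercised without network access.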

Starting an Async Job

To create an async extraction job, send processing_mode: "async" to POST /v1/extract.

curl -X POST "$SCRAPER_API_BASE_URL/v1/extract" \
  -H "X-Api-Key: $GEONODE_SCRAPER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.python.org/3/library/json.html",
    "formats": ["markdown"],
    "render_js": false,
    "processing_mode": "async"
  }'

The API responds with HTTP 202 (Accepted) and a job ID.

{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "queued",
  "status_url": "/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "estimated_tokens": 1
}

The status_url can be relative. Prefix it with https://scraper.geonode.io when you call it directly.
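
The same request and the URL prefixing can be sketched in Python with the standard library. urllib.parse.urljoin resolves a relative status_url against the base URL and leaves an already absolute URL unchanged. The helper names build_extract_request and absolute_status_url are illustrative, and the payload simply mirrors the curl example above.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://scraper.geonode.io"


def build_extract_request(url, api_key, base_url=BASE_URL):
    """Build (but do not send) the POST /v1/extract request for an async job."""
    payload = {
        "url": url,
        "formats": ["markdown"],
        "render_js": False,
        "processing_mode": "async",
    }
    return urllib.request.Request(
        urljoin(base_url, "/v1/extract"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def absolute_status_url(status_url, base_url=BASE_URL):
    """Resolve status_url against base_url; absolute URLs pass through as-is."""
    return urljoin(base_url, status_url)


print(absolute_status_url("/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48"))
# https://scraper.geonode.io/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48
```

To send the request, pass it to urllib.request.urlopen and decode the body with json.load. The status_url field of that response can then be handed to absolute_status_url.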
