
Checking Job Status

When you run an extraction in asynchronous mode, the API immediately returns a job ID instead of waiting for the extraction to finish.

You can use that job ID to check the current status of the extraction and retrieve the result once processing is complete.

How It Works

The asynchronous extraction workflow follows these steps:

  1. Start an extraction with "processing_mode": "async" in the request body.
  2. Receive a job_id in the response.
  3. Poll the job status endpoint, GET /v1/extract/{job_id}.
  4. Retrieve the extracted content once the job status is completed.

Get an Extraction Job

Use the following endpoint to retrieve the current status of an extraction job.

GET /v1/extract/{job_id}

Replace {job_id} with the value returned by your extraction request.

Example Request

curl -X GET "https://scraper.geonode.io/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48" \
  -H "X-Api-Key: YOUR_API_KEY"
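
The same request can be made from Python. The following is a minimal sketch using only the standard library; the host and header name are taken from the curl example above, while the function names build_status_request and get_job and the 10-second timeout are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

# Host copied from the curl example above.
BASE_URL = "https://scraper.geonode.io"


def build_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for one extraction job (no network I/O here)."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract/{job_id}",
        headers={"X-Api-Key": api_key},
        method="GET",
    )


def get_job(job_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body into a dict."""
    request = build_status_request(job_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling get_job("4844831a-a222-4cac-b5e6-7e3f2dd07b48", "YOUR_API_KEY") would return the job object as a dictionary. Keeping request construction separate from sending makes the URL and headers easy to inspect without touching the network.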

Response While Processing

A job may still be running when you check its status. Until it finishes, the extracted content is not available, and completed_at, data, metadata, and tokens_charged are all null.

{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "processing",
  "created_at": "2026-05-26T10:30:00Z",
  "completed_at": null,
  "data": null,
  "metadata": null,
  "error": null,
  "tokens_charged": null
}

What This Means

| Field | Description |
| --- | --- |
| status | Current state of the extraction job |
| created_at | Time the job was created |
| completed_at | null until processing finishes |
| data | Extracted content, available after completion |
| metadata | Extraction details, available after completion |
| error | Error information if the job fails |
| tokens_charged | Token usage after processing completes |
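
As an illustration of how these fields might be handled in client code, the sketch below parses a job response into a typed record. The ExtractionJob class and parse_job helper are hypothetical names; the field list mirrors the sample response, and the type of error is left open because its shape is not documented here.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ExtractionJob:
    job_id: str
    status: str
    created_at: str
    completed_at: Optional[str]    # null until processing finishes
    data: Optional[dict]           # extracted content, set after completion
    metadata: Optional[dict]       # extraction details, set after completion
    error: Any                     # assumption: payload shape is not documented
    tokens_charged: Optional[int]  # set after processing completes

    @property
    def has_result(self) -> bool:
        """True once the response carries extracted content."""
        return self.data is not None


def parse_job(body: str) -> ExtractionJob:
    """Parse a job-status JSON body; nullable fields default to None."""
    raw = json.loads(body)
    return ExtractionJob(
        job_id=raw["job_id"],
        status=raw["status"],
        created_at=raw["created_at"],
        completed_at=raw.get("completed_at"),
        data=raw.get("data"),
        metadata=raw.get("metadata"),
        error=raw.get("error"),
        tokens_charged=raw.get("tokens_charged"),
    )
```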

Response After Completion

Once the extraction finishes successfully, the response includes the extracted content and metadata.

{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "completed",
  "created_at": "2026-05-26T10:30:00Z",
  "completed_at": "2026-05-26T10:30:04Z",
  "data": {
    "markdown": "# Example Page Content"
  },
  "metadata": {
    "url": "https://docs.python.org/3/library/json.html",
    "render_js": false,
    "http_status": 200,
    "duration_ms": 631,
    "formats": ["markdown"],
    "processing_mode": "async"
  },
  "error": null,
  "tokens_charged": 1
}
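
Reading the result out of a completed response could look like the sketch below. The helper name read_markdown is hypothetical, and it assumes the job was requested with the markdown format, as in the sample above; keys for other formats are not covered here.

```python
def read_markdown(job: dict) -> str:
    """Return data.markdown from a completed job, or raise if it is unavailable."""
    status = job.get("status")
    if status != "completed":
        raise ValueError(f"job status is {status!r}, expected 'completed'")
    data = job.get("data") or {}
    if "markdown" not in data:
        # Assumption: the key is present only when "markdown" was requested.
        raise KeyError("markdown")
    return data["markdown"]
```

Checking the status before touching data avoids a confusing error when the job is still processing and data is null.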

Job Status Values

The API can return the following job statuses.

| Status | Description |
| --- | --- |
| queued | The job has been accepted and is waiting to start |
| processing | The extraction is currently running |
| completed | The extraction finished successfully |
| failed | The extraction could not be completed |
| cancelled | The job was cancelled before completion |
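
When writing client code, it can help to separate statuses that may still change from those that will not. The grouping below is an inference from the descriptions above rather than something the API states explicitly, and the names PENDING_STATUSES, TERMINAL_STATUSES, and is_terminal are illustrative.

```python
# Assumption: inferred from the status descriptions, not stated by the API.
PENDING_STATUSES = frozenset({"queued", "processing"})
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def is_terminal(status: str) -> bool:
    """Return True if the job will not change state again."""
    if status in TERMINAL_STATUSES:
        return True
    if status in PENDING_STATUSES:
        return False
    # Fail loudly on unknown values instead of polling forever.
    raise ValueError(f"unrecognized job status: {status!r}")
```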

Starting an Async Extraction

To use this endpoint, first create an asynchronous extraction job.

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.python.org/3/library/json.html",
    "formats": ["markdown"],
    "processing_mode": "async"
  }'
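
An equivalent request in Python might look like the following sketch, which uses only the standard library. The URL, headers, and body fields come from the curl example; the function names and the timeout value are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
EXTRACT_URL = "https://scraper.geonode.io/v1/extract"


def build_async_request(target_url: str, api_key: str,
                        formats=("markdown",)) -> urllib.request.Request:
    """Build the POST request that creates an asynchronous extraction job."""
    payload = {
        "url": target_url,
        "formats": list(formats),
        "processing_mode": "async",
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def start_async_extraction(target_url: str, api_key: str,
                           timeout: float = 10.0) -> str:
    """Send the request and return the job_id from the JSON response."""
    request = build_async_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["job_id"]
```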

Async Job Response

The extraction endpoint returns a job_id to use for polling, along with the initial status, a status_url path for the job, and an estimated_tokens value.

{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "queued",
  "status_url": "/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "estimated_tokens": 1
}
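
Because status_url in the sample is a path rather than a full URL, a client needs to combine it with the API host before polling. The sketch below assumes the path is relative to the same host used in the curl examples; resolve_status_url is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Assumption: status_url is relative to the host used in the curl examples.
API_ORIGIN = "https://scraper.geonode.io"


def resolve_status_url(job_response: dict, origin: str = API_ORIGIN) -> str:
    """Turn the status_url path from the async response into an absolute URL."""
    return urljoin(origin, job_response["status_url"])
```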

Polling for Results

A common pattern is to check the job status periodically until the job reaches a final state: completed, failed, or cancelled.

  1. POST /v1/extract
  2. Receive job_id
  3. GET /v1/extract/{job_id} returns status = processing
  4. GET /v1/extract/{job_id} returns status = completed
  5. Read extracted content
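
The sequence above can be sketched as a small polling loop. This is one possible approach, not a prescribed client: the function name poll_until_done, the doubling delay, and the default intervals and timeout are assumptions, as is treating completed, failed, and cancelled as final states. The fetch_job argument is any callable that returns the job as a dictionary, for example a wrapper around GET /v1/extract/{job_id}.

```python
import time
from typing import Callable

# Assumption: these three statuses are final and will not change again.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def poll_until_done(
    fetch_job: Callable[[], dict],
    interval_s: float = 1.0,
    max_interval_s: float = 10.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_job repeatedly until the job reaches a final status.

    The wait doubles after each attempt, capped at max_interval_s.
    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = clock() + timeout_s
    delay = interval_s
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_interval_s)
```

The caller still needs to inspect the returned status, since a job that ends as failed or cancelled carries no extracted content. Passing sleep and clock as parameters keeps the loop testable without real waiting.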

Next Step

Now that you can monitor individual extraction jobs, the next guide explains how to view and filter multiple extraction jobs using the Jobs endpoint.
