Start a Batch Job

POST /v1/batch starts an asynchronous extraction job for a list of URLs. Use it when you already know the pages you want to extract and want one job ID to track the whole run.

A batch job accepts up to 1,000 URLs. The API applies the same output format, rendering, proxy, and header settings to every accepted URL in the batch.
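Because one job accepts at most 1,000 URLs, a longer list has to be spread across several jobs. The Python sketch below shows one way to split a list on the client side. `chunk_urls` and `MAX_BATCH_SIZE` are hypothetical names chosen for illustration; they are not part of the API.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 1000  # per-job URL limit stated on this page


def chunk_urls(urls: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


if __name__ == "__main__":
    # Placeholder URLs; substitute the pages you actually want to extract.
    urls = [f"https://example.com/page/{n}" for n in range(1, 2501)]
    for index, batch in enumerate(chunk_urls(urls), start=1):
        print(f"batch {index}: {len(batch)} URLs")
```

Each yielded list can be sent as its own POST /v1/batch request, which gives one job ID per chunk.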

Request

The example below is a shell command. It stores the production base URL and API key in environment variables, then sends two URLs as one batch.

export SCRAPER_API_BASE_URL="https://scraper.geonode.io"
export GEONODE_SCRAPER_API_KEY="YOUR_API_KEY"

curl -X POST "$SCRAPER_API_BASE_URL/v1/batch" \
  -H "X-Api-Key: $GEONODE_SCRAPER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://quotes.toscrape.com/",
      "https://quotes.toscrape.com/page/2/"
    ],
    "formats": ["markdown"],
    "render_js": false,
    "ignore_invalid_urls": true,
    "proxy": {
      "country": "US",
      "type": "residential"
    }
  }'

If the batch is accepted, the API returns HTTP 202 with a job_id. Pass that ID to Get Batch Job Status to track the job.
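For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The function names `build_batch_request` and `start_batch` are illustrative assumptions, as is the 30-second timeout; the endpoint, headers, and body mirror the shell example above.

```python
import json
import os
import urllib.request

# Same environment variables as the shell example, with the same fallbacks.
BASE_URL = os.environ.get("SCRAPER_API_BASE_URL", "https://scraper.geonode.io")
API_KEY = os.environ.get("GEONODE_SCRAPER_API_KEY", "YOUR_API_KEY")


def build_batch_request(urls, base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) the POST /v1/batch request from the curl example."""
    payload = {
        "urls": list(urls),
        "formats": ["markdown"],
        "render_js": False,
        "ignore_invalid_urls": True,
        "proxy": {"country": "US", "type": "residential"},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def start_batch(urls):
    """Send the request and return the decoded JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses,
    so a normal return here means the request succeeded.
    """
    with urllib.request.urlopen(build_batch_request(urls), timeout=30) as resp:
        return json.load(resp)
```

Calling `start_batch(["https://quotes.toscrape.com/"])` and reading `job_id` from the returned dictionary gives the ID to use with Get Batch Job Status.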

Request Body

The request body controls which URLs enter the batch and how each URL is extracted. Start with urls and formats, then add proxy, header, or rendering options only when the whole batch needs them.

Field	Type	Required	Default	Description
`urls`	array	Yes	None	URLs to extract. Must contain 1 to 1,000 items.
`ignore_invalid_urls`	boolean	No	`true`	Skips invalid URLs and returns them in `invalid_urls` instead of failing the whole request.
`formats`	array	No	`["markdown"]`	Output formats to return for each URL. Supported values are `markdown` and `html`.
`render_js`	boolean	No	`false`	Uses a headless browser for every batch item. Useful for JavaScript-heavy pages, but slower.
`proxy`	object	No	Default proxy routing	Proxy country and type to use for each batch item.
`headers`	object or null	No	`null`	Custom HTTP headers to include in each extraction request.
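The constraints in the table can be checked before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `build_batch_payload`; it enforces only the limits documented above (1 to 1,000 URLs, `markdown` or `html` formats) and leaves `proxy` and `headers` out of the body unless they are supplied, so the API applies its own defaults.

```python
SUPPORTED_FORMATS = {"markdown", "html"}  # values listed in the table
MAX_URLS = 1000                           # upper bound for `urls`


def build_batch_payload(urls, formats=None, render_js=False,
                        ignore_invalid_urls=True, proxy=None, headers=None):
    """Return a request body dict after checking the documented limits."""
    urls = list(urls)
    if not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"urls must contain 1 to {MAX_URLS} items, got {len(urls)}")

    formats = list(formats) if formats is not None else ["markdown"]
    unknown = set(formats) - SUPPORTED_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")

    payload = {
        "urls": urls,
        "formats": formats,
        "render_js": bool(render_js),
        "ignore_invalid_urls": bool(ignore_invalid_urls),
    }
    # Optional objects are omitted unless the caller provides them.
    if proxy is not None:
        payload["proxy"] = dict(proxy)
    if headers is not None:
        payload["headers"] = dict(headers)
    return payload
```

The returned dictionary can be serialized with `json.dumps` and used as the body of POST /v1/batch.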

Leave ignore_invalid_urls set to true (the default) when a partial batch should start even if a few URLs are malformed. Set it to false when the request should fail validation unless every URL is valid.
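When sending with ignore_invalid_urls set to false, it can help to screen the list locally first so one malformed entry does not reject the whole request. The sketch below is only an approximation: this page does not state how the API decides that a URL is invalid, so the rule used here (an http or https scheme plus a non-empty host) is an assumption, and `looks_valid` and `partition_urls` are hypothetical helper names.

```python
from urllib.parse import urlsplit


def looks_valid(url):
    """Rough local check: http(s) scheme and a non-empty host.

    This is an assumed rule, not the API's actual validation logic.
    """
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # raised for some malformed inputs, e.g. a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def partition_urls(urls):
    """Split `urls` into (valid, invalid) lists, preserving input order."""
    valid, invalid = [], []
    for url in urls:
        (valid if looks_valid(url) else invalid).append(url)
    return valid, invalid
```

Sending only the first list in strict mode keeps the all-or-nothing behaviour while making a rejection less likely; the API's own validation remains authoritative.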

Response

A successful request returns a queued job with a status URL and the number of accepted URLs.

{
  "job_id": "9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
  "status": "queued",
  "status_url": "/v1/batch/9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
  "accepted_urls": 2,
  "invalid_urls": []
}

If invalid_urls is not empty, the URLs it lists were skipped and not queued.
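A client typically needs three things from this response: the job ID, an absolute status URL, and the list of skipped URLs. The sketch below pulls them out of an already decoded response body. It assumes `status_url` is relative to the API base URL, as the sample suggests, and `summarize_batch_response` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def summarize_batch_response(base_url, body):
    """Collect the fields from the sample response that a client acts on."""
    return {
        "job_id": body["job_id"],
        "status": body["status"],
        # Assumption: status_url is a path relative to the API base URL.
        "status_url": urljoin(base_url, body["status_url"]),
        "accepted": body["accepted_urls"],
        "skipped": list(body.get("invalid_urls", [])),
    }


if __name__ == "__main__":
    sample = {
        "job_id": "9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
        "status": "queued",
        "status_url": "/v1/batch/9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
        "accepted_urls": 2,
        "invalid_urls": [],
    }
    summary = summarize_batch_response("https://scraper.geonode.io", sample)
    print(summary["status_url"])
    if summary["skipped"]:
        print("Not queued:", ", ".join(summary["skipped"]))
```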
