Start a Batch Job
Queue multiple URLs for asynchronous extraction.
POST /v1/batch starts an asynchronous extraction job for a list of URLs. Use it when you already know the pages you want to extract and want one job ID to track the whole run.
A batch job accepts up to 1,000 URLs. The API applies the same output format, rendering, proxy, and header settings to every accepted URL in the batch.
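Because one job holds at most 1,000 URLs, a longer list has to be divided across several jobs. The sketch below is one way to do that with the standard `split` utility. It assumes a hypothetical `urls.txt` with one URL per line and uses placeholder `example.com` URLs; neither comes from the API itself.

```shell
# Hypothetical input: one URL per line. Generated here so the sketch runs on its own.
workdir="$(mktemp -d)"
seq 1 2500 | sed 's|^|https://example.com/page/|' > "$workdir/urls.txt"

# split writes files of at most 1,000 lines each: chunk_aa, chunk_ab, chunk_ac, ...
split -l 1000 "$workdir/urls.txt" "$workdir/chunk_"

# Count the chunk files; 2,500 URLs yield three chunks (1,000 + 1,000 + 500).
chunk_count="$(ls "$workdir"/chunk_* | wc -l | tr -d ' ')"
echo "chunks: $chunk_count"
```

Each chunk file would then supply the urls array for one POST /v1/batch request, giving one job ID per chunk.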
Request
The shell example below stores the production base URL and API key in environment variables, then sends two URLs as one batch.
export SCRAPER_API_BASE_URL="https://scraper.geonode.io"
export GEONODE_SCRAPER_API_KEY="YOUR_API_KEY"
curl -X POST "$SCRAPER_API_BASE_URL/v1/batch" \
  -H "X-Api-Key: $GEONODE_SCRAPER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://quotes.toscrape.com/",
      "https://quotes.toscrape.com/page/2/"
    ],
    "formats": ["markdown"],
    "render_js": false,
    "ignore_invalid_urls": true,
    "proxy": {
      "country": "US",
      "type": "residential"
    }
  }'

If the batch is accepted, the API returns 202 with a job_id. Use that ID with Get Batch Job Status.
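In a script, it can help to check for the 202 status explicitly instead of assuming the batch was accepted. The following is a minimal sketch, not an official client: `submit_batch` is a hypothetical helper name, and the payload file passed to it is assumed to contain a request body like the one above. It reuses the two environment variables from the example.

```shell
# Hypothetical helper: POST a JSON payload file to /v1/batch and
# fail unless the API answers with HTTP 202.
submit_batch() {
  payload_file="$1"
  body_file="$(mktemp)"

  # -o writes the response body to a file; -w prints only the HTTP status code to stdout.
  http_status="$(curl -sS -o "$body_file" -w '%{http_code}' \
    -X POST "$SCRAPER_API_BASE_URL/v1/batch" \
    -H "X-Api-Key: $GEONODE_SCRAPER_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @"$payload_file")"

  if [ "$http_status" != "202" ]; then
    echo "batch not accepted (HTTP $http_status)" >&2
    cat "$body_file" >&2
    rm -f "$body_file"
    return 1
  fi

  # Accepted: print the response body (which carries job_id) for the caller.
  cat "$body_file"
  rm -f "$body_file"
}
```

A call such as `submit_batch payload.json` (with `payload.json` as an assumed file name) prints the response body on success and returns a non-zero exit code otherwise.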
Request Body
The request body controls which URLs enter the batch and how each URL should be extracted. Start with urls and formats, then add proxy, header, or rendering options only when the whole batch needs them.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | array | Yes | None | URLs to extract. Must contain 1 to 1,000 items. |
| ignore_invalid_urls | boolean | No | true | Skips invalid URLs and returns them in invalid_urls instead of failing the whole request. |
| formats | array | No | ["markdown"] | Output formats to return for each URL. Supported values are markdown and html. |
| render_js | boolean | No | false | Uses a headless browser for every batch item. Useful for JavaScript-heavy pages, but slower. |
| proxy | object | No | Default proxy routing | Proxy country and type to use for each batch item. |
| headers | object or null | No | null | Custom HTTP headers to include in each extraction request. |
Because ignore_invalid_urls defaults to true, a batch starts even if a few URLs are malformed, and the skipped URLs are returned in invalid_urls. Set it to false if the whole request should fail validation unless every URL is valid.
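When sending a strict batch with ignore_invalid_urls set to false, a rough local pre-check can catch obvious problems before the request is made. The sketch below only tests for an http or https scheme and no whitespace. That rule is an assumption for illustration; the API's own validation rules are not described here and may differ.

```shell
# Sample input; the second and fourth entries are deliberately bad.
urls='https://quotes.toscrape.com/
not a url
https://quotes.toscrape.com/page/2/
ftp://example.com/file'

# Assumed rule (not the API's): http(s) scheme followed by non-whitespace characters.
pattern='^https?://[^[:space:]]+$'

valid="$(printf '%s\n' "$urls" | grep -E "$pattern" || true)"
rejected="$(printf '%s\n' "$urls" | grep -Ev "$pattern" || true)"

# Count non-empty lines in each group.
valid_count="$(printf '%s\n' "$valid" | grep -c . || true)"
rejected_count="$(printf '%s\n' "$rejected" | grep -c . || true)"

echo "valid: $valid_count, rejected: $rejected_count"
```

URLs that pass this check can still be rejected by the API, so invalid_urls in the response remains the authoritative list.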
Response
A successful request returns a queued job with a status URL and the number of accepted URLs.
{
  "job_id": "9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
  "status": "queued",
  "status_url": "/v1/batch/9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
  "accepted_urls": 2,
  "invalid_urls": []
}

If invalid_urls is not empty, those URLs were not queued.
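A script that submits a batch usually needs the job_id and the full status URL from this response. The sketch below extracts them with `sed` and `grep`, assuming the flat response shape shown above. `json_string_field` is a hypothetical helper, not part of any tool, and a JSON-aware parser such as `jq` would be more robust if one is available.

```shell
# Sample response, copied from the example above.
response='{
  "job_id": "9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
  "status": "queued",
  "status_url": "/v1/batch/9d7b2c8e-8a4b-4c10-9af3-65f4f8f6c019",
  "accepted_urls": 2,
  "invalid_urls": []
}'

# Hypothetical helper: print the value of a top-level string field.
# $1 is the field name, $2 is the JSON text. Only suits flat JSON like the sample.
json_string_field() {
  printf '%s\n' "$2" \
    | sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1
}

job_id="$(json_string_field job_id "$response")"
status_path="$(json_string_field status_url "$response")"

# status_url is a path, so prefix it with the base URL (falls back to the production URL).
status_url="${SCRAPER_API_BASE_URL:-https://scraper.geonode.io}$status_path"

# Treat anything other than an empty array on one line as "some URLs were skipped".
if printf '%s\n' "$response" | grep -Eq '"invalid_urls"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]'; then
  invalid_state="none"
else
  invalid_state="some"
fi

echo "job $job_id, poll $status_url, invalid URLs: $invalid_state"
```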