Create a Webhook
Register a webhook subscription for Scraper API completion events.
POST /v1/webhooks registers a new webhook endpoint. Use webhooks when your application needs a callback after asynchronous extract, batch, or crawl work completes.
The callback URL must be an absolute URI that your application can receive HTTP callbacks on. You can register up to 10 webhooks per account.
Event Types
Choose one event type for each webhook subscription.
| Event type | When it is used |
|---|---|
| extract_completed | An asynchronous extraction job completes. |
| batch_completed | A batch extraction job completes. |
| crawl_completed | A crawl job completes. |
Request
The example below creates a webhook for batch completion events.
export SCRAPER_API_BASE_URL="https://scraper.geonode.io"
export GEONODE_SCRAPER_API_KEY="YOUR_API_KEY"
curl -X POST "$SCRAPER_API_BASE_URL/v1/webhooks" \
  -H "X-Api-Key: $GEONODE_SCRAPER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/webhooks/geonode-scraper",
    "event_type": "batch_completed",
    "description": "Production batch completion webhook"
  }'
Webhook Delivery
When a registered event occurs, the Scraper API sends an HTTP POST to your callback URL with a JSON envelope. Every delivery includes these HTTP headers:
| Header | Description |
|---|---|
| Content-Type | application/json |
| User-Agent | scraper-api-webhook/1.0 |
| X-Webhook-Signature | HMAC-SHA256 signature for verification (see below). |
| X-Webhook-Delivery-Id | Unique ID for this delivery attempt. |
| X-Webhook-Event | The event type that triggered the delivery. |
Delivery Envelope
Every delivery body is a JSON object with this envelope:
{
  "event": "batch_completed",
  "webhook_id": "2a936d3b-5a5d-47a0-b68d-5df0d8b8327c",
  "delivery_id": "395cf7e2-41e0-4e44-85f1-5f00f537a44f",
  "timestamp": "2026-05-27T08:16:09Z",
  "data": { ... }
}
The data field is event-specific:
| Event type | data shape |
|---|---|
| extract_completed | Full job result matching the GET /v1/extract/{job_id} response (job_id, status, created_at, completed_at, data with markdown/html/links, metadata, error, tokens_charged). |
| batch_completed | { "job_id": "...", "status": "completed", "counts": { "total_urls": N, "completed_urls": N, "failed_urls": N, "cancelled_urls": N, "pending_urls": N }, "status_url": "/v1/batch/{job_id}" } |
| crawl_completed | { "job_id": "...", "status": "completed", "counts": { "total_pages": N, "completed_pages": N, "failed_pages": N, "cancelled_pages": N }, "status_url": "/v1/crawl/{job_id}" } |
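As a rough illustration of how a receiver might branch on the envelope, the sketch below parses a delivery body and builds a one-line summary from the event-specific data field. The function name summarize_delivery and the sample job ID are hypothetical; only the field names come from the table above.

```python
import json


def summarize_delivery(raw_body: bytes) -> str:
    """Return a one-line summary of a webhook delivery envelope."""
    envelope = json.loads(raw_body)
    event = envelope["event"]
    data = envelope["data"]

    if event == "extract_completed":
        # data mirrors the GET /v1/extract/{job_id} response.
        return f"extract {data['job_id']}: {data['status']}"

    if event in ("batch_completed", "crawl_completed"):
        # Batch counts are keyed by *_urls, crawl counts by *_pages.
        kind, unit = ("batch", "urls") if event == "batch_completed" else ("crawl", "pages")
        counts = data["counts"]
        completed = counts[f"completed_{unit}"]
        total = counts[f"total_{unit}"]
        failed = counts[f"failed_{unit}"]
        return f"{kind} {data['job_id']}: {completed}/{total} {unit} completed, {failed} failed"

    raise ValueError(f"Unexpected event type: {event!r}")


# Hypothetical sample delivery for a batch job.
sample = json.dumps({
    "event": "batch_completed",
    "webhook_id": "2a936d3b-5a5d-47a0-b68d-5df0d8b8327c",
    "delivery_id": "395cf7e2-41e0-4e44-85f1-5f00f537a44f",
    "timestamp": "2026-05-27T08:16:09Z",
    "data": {
        "job_id": "job-123",
        "status": "completed",
        "counts": {"total_urls": 3, "completed_urls": 2, "failed_urls": 1,
                   "cancelled_urls": 0, "pending_urls": 0},
        "status_url": "/v1/batch/job-123",
    },
}).encode()

summary = summarize_delivery(sample)
print(summary)
```

Because batch and crawl payloads carry only counts and a status_url, a receiver that needs the extracted content would presumably follow status_url to fetch the full results.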
Signature Verification
Every delivery includes an X-Webhook-Signature header with an HMAC-SHA256 signature computed over the raw request body using your webhook secret. The signature value is prefixed with sha256=.
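To make the header format concrete, the sketch below computes a value in the described shape for a known body, which can be handy for unit-testing a receiver without waiting for a real delivery. The helper name sign_body and the sample body are hypothetical, and the sketch assumes the secret is used as UTF-8 bytes and the digest is hex-encoded.

```python
import hashlib
import hmac


def sign_body(secret: str, raw_body: bytes) -> str:
    """Build a header value in the described format: "sha256=" + hex HMAC-SHA256."""
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


test_secret = "whsec_example"                 # placeholder secret
test_body = b'{"event": "batch_completed"}'   # hypothetical raw body

signature = sign_body(test_secret, test_body)
# Any change to the body, even one trailing space, yields a different signature.
tampered = sign_body(test_secret, test_body + b" ")
print(signature != tampered)
```

This is also why the signature has to be checked against the raw bytes: re-serializing parsed JSON can change whitespace or key order and so change the digest.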
Verify this signature in your receiver to confirm the delivery originated from the Scraper API and was not tampered with in transit:
import hashlib
import hmac
# `request` is the incoming request object provided by your web framework.
secret = "whsec_example"  # The secret returned when you created the webhook
body = request.body  # Raw bytes, not the parsed dict
header = request.headers.get("X-Webhook-Signature", "")
# Recompute the signature over the raw body and compare in constant time.
expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
if not hmac.compare_digest(expected, header):
    raise ValueError("Invalid signature")
Retry Behavior
The Scraper API retries failed deliveries with exponential backoff. A delivery is retried when:
- Your receiver returns HTTP 429 or a 5xx status code.
- The connection times out or a network error occurs.
The API does not retry on client errors (4xx other than 429). Those deliveries are immediately abandoned.
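The rules above can be summarized as a small classifier, which may help when deciding what status a receiver should return. This is a sketch of the documented behavior from the receiver's point of view, not the API's actual implementation; the function name and outcome labels are hypothetical, and status codes the text does not mention (such as 3xx) are reported as unspecified rather than guessed.

```python
def classify_response(status_code: int) -> str:
    """Map a receiver's HTTP status to the documented delivery outcome."""
    if 200 <= status_code <= 299:
        return "delivered"      # acknowledged, no retry needed
    if status_code == 429 or 500 <= status_code <= 599:
        return "retry"          # retried with backoff
    if 400 <= status_code <= 499:
        return "abandoned"      # client errors other than 429 are not retried
    return "unspecified"        # not covered by the documented rules


# Timeouts and network errors have no status code; they are also retried.
for code in (200, 429, 503, 404):
    print(code, classify_response(code))
```

In practice this means a receiver that hits a temporary problem should answer with a 5xx or 429 so the delivery is attempted again, and reserve other 4xx codes for deliveries it never wants to see again.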
Up to 5 retry attempts are made, with the following default delays between attempts (configurable per deployment):
| Retry | Delay before this retry |
|---|---|
| 1st retry | immediate |
| 2nd retry | 1 minute |
| 3rd retry | 5 minutes |
| 4th retry | 30 minutes |
| 5th retry | 2 hours |
After the final retry fails, the delivery is abandoned. Delivery records in GET /v1/webhooks/{webhook_id}/deliveries show the attempt_number and current status for each delivery.
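To get a feel for how long a failing receiver keeps receiving attempts, the sketch below adds up the default delays from the table. It assumes each delay is counted from the previous failed attempt and ignores the time the requests themselves take; the names DEFAULT_RETRY_DELAYS and retry_times are hypothetical, and a deployment may configure different values.

```python
from datetime import datetime, timedelta, timezone

# Default delays from the table above, one entry per retry.
DEFAULT_RETRY_DELAYS = [
    timedelta(0),            # 1st retry: immediate
    timedelta(minutes=1),    # 2nd retry
    timedelta(minutes=5),    # 3rd retry
    timedelta(minutes=30),   # 4th retry
    timedelta(hours=2),      # 5th retry
]


def retry_times(first_failure: datetime) -> list[datetime]:
    """Return the approximate time of each retry after the first failed attempt."""
    times = []
    current = first_failure
    for delay in DEFAULT_RETRY_DELAYS:
        current += delay
        times.append(current)
    return times


first_failure = datetime(2026, 5, 27, 8, 16, 9, tzinfo=timezone.utc)
schedule = retry_times(first_failure)
for number, when in enumerate(schedule, start=1):
    print(f"retry {number}: {when.isoformat()}")
print("total window:", schedule[-1] - first_failure)
```

Under these assumptions the last retry lands roughly 2 hours 36 minutes after the first failure, so a receiver outage longer than that would likely lose the delivery.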
Timeout
The upstream HTTP timeout for outbound webhook POSTs is 10 seconds (configurable per deployment). If your receiver needs more time, acknowledge the webhook quickly with a 2xx status and process the job asynchronously.
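One way to acknowledge quickly and process later is to hand the envelope to a background worker. The sketch below uses an in-process queue purely for illustration: handle_delivery, worker, and the processed list are hypothetical names, the 202 status is just one possible 2xx choice, and a production receiver would more likely use a durable queue so that accepted work survives a restart.

```python
import json
import queue
import threading

jobs = queue.Queue()
processed = []  # stand-in for the results of slow work


def handle_delivery(raw_body: bytes) -> int:
    """Enqueue the envelope and return a 2xx status right away."""
    # Verify X-Webhook-Signature against raw_body before this point.
    jobs.put(json.loads(raw_body))
    return 202


def worker() -> None:
    """Drain the queue; a None item tells the worker to stop."""
    while True:
        envelope = jobs.get()
        try:
            if envelope is None:
                return
            # Slow processing (fetching results, writing to a database) goes here.
            processed.append(envelope["delivery_id"])
        finally:
            jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = handle_delivery(b'{"event": "crawl_completed", "delivery_id": "d-1", "data": {}}')
jobs.join()      # demo only: wait until the worker has handled the item
jobs.put(None)   # stop the worker
thread.join()
print(status, processed)
```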
Request Body
The create body describes where the API should send the callback and which event should trigger it.
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Absolute webhook callback URL. |
| event_type | string | Yes | One of extract_completed, batch_completed, or crawl_completed. |
| description | string or null | No | Optional label to help you identify the webhook later. |
Response
A successful create request returns 201 and includes the generated secret. Save the secret when you create the webhook, because it is not shown in normal webhook lookup responses.
{
  "id": "2a936d3b-5a5d-47a0-b68d-5df0d8b8327c",
  "url": "https://example.com/webhooks/geonode-scraper",
  "description": "Production batch completion webhook",
  "is_active": true,
  "event_type": "batch_completed",
  "created_at": "2026-05-27T08:15:30Z",
  "updated_at": "2026-05-27T08:15:30Z",
  "secret": "whsec_example"
}