Extraction

Extract Content

Extract clean Markdown or HTML from any URL, in either sync or async mode.

POST /v1/extract

X-Api-Key <token>

In: header

url string

URL to extract content from

Format: uri
Length: 1 <= length
formats? Formats

Output formats to return

Default: ["html"]
render_js? boolean

If true, uses a headless browser to render the page (slower, more expensive).

Default: false
processing_mode? string

Processing mode: sync (blocks until done) or async (starts a job and returns a job ID)

Default: "sync"
Value in: "sync" | "async"
proxy? | null
headers? | null
wait_config? | null
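
The parameters above can be assembled into a request body as in the following minimal sketch. The helper name `build_extract_payload` is hypothetical, and the checks mirror only the constraints listed here (a non-empty `url`, and `processing_mode` restricted to "sync" or "async"). The `proxy`, `headers`, and `wait_config` fields are left out because their request schema is not shown in this section.

```python
import json

# Allowed values taken from the "Value in" line of the parameter list.
ALLOWED_MODES = {"sync", "async"}


def build_extract_payload(url, formats=None, render_js=False, processing_mode="sync"):
    """Build a JSON-serializable body for POST /v1/extract (hypothetical helper)."""
    if not isinstance(url, str) or len(url) < 1:
        raise ValueError("url must be a non-empty string")
    if processing_mode not in ALLOWED_MODES:
        raise ValueError("processing_mode must be 'sync' or 'async'")

    payload = {"url": url}
    # Optional fields are only included when they differ from the documented defaults.
    if formats is not None:
        payload["formats"] = list(formats)
    if render_js:
        payload["render_js"] = True
    if processing_mode != "sync":
        payload["processing_mode"] = processing_mode
    return payload


if __name__ == "__main__":
    print(json.dumps(build_extract_payload("http://example.com", formats=["markdown"])))
```

Omitting fields that match the defaults keeps the body small; sending them explicitly should be equivalent if the defaults listed above hold.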

Response Body

application/json

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "http://example.com"
  }'
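
For readers calling the endpoint from a script, a rough Python equivalent of the curl request might look like the sketch below. It uses only the standard library. The function names and the 60-second timeout are assumptions, and the `X-Api-Key` header is added because the reference lists it as a header parameter, even though the curl sample does not send it.

```python
import json
import urllib.request

API_URL = "https://scraper.geonode.io/v1/extract"


def make_extract_request(api_key, payload):
    """Prepare the POST request; nothing is sent until urlopen is called."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # header name taken from the reference above
        },
    )


def extract(api_key, payload, timeout=60):
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for non-2xx responses, so callers
    that want the error body should catch it and read it from the exception.
    """
    req = make_extract_request(api_key, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as `extract("YOUR_API_KEY", {"url": "http://example.com"})` would then return the parsed response as a dictionary.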
{
  "data": {
    "html": "string",
    "markdown": "string"
  },
  "metadata": {
    "url": "string",
    "render_js": true,
    "http_status": 0,
    "duration_ms": 0,
    "formats": [
      "markdown"
    ],
    "proxy": {
      "country": "string",
      "type": "datacenter"
    },
    "processing_mode": "sync",
    "headers": {
      "property1": "string",
      "property2": "string"
    },
    "wait_config": {
      "wait_until": "commit",
      "wait_for": "string",
      "wait_timeout": 30000
    }
  },
  "tokens_charged": 0
}
{
  "job_id": "string",
  "status": "queued",
  "status_url": "string",
  "estimated_tokens": 0
}
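
The two bodies above differ in shape: the first carries the extracted content under `data`, while the second carries a `job_id` and a `status_url`. A client can branch on that difference. The sketch below is one way to do it; the helper name `handle_success` is hypothetical, and it assumes the first body is the sync result and the second is the async acknowledgement, which the samples suggest but this section does not label.

```python
def handle_success(body):
    """Classify a parsed success body as a sync result or an async job.

    Returns ("async", job_id) when a job was queued, or ("sync", data)
    when the extracted content is already present.
    """
    if "job_id" in body:
        return "async", body["job_id"]
    if "data" in body:
        return "sync", body["data"]
    raise ValueError("unrecognized response shape")


if __name__ == "__main__":
    kind, value = handle_success({"job_id": "abc", "status": "queued"})
    print(kind, value)
```

For the async case, the caller would presumably poll `status_url` until the job finishes; the shape of that status response is not described here, so it is left out of the sketch.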
{
  "code": "string",
  "message": "string",
  "correlation_id": "string",
  "retryable": true
}
{
  "code": "string",
  "message": "string",
  "correlation_id": "string",
  "retryable": true
}
{
  "code": "string",
  "message": "string",
  "correlation_id": "string",
  "retryable": true
}
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "string",
    "retryable": true,
    "details": {
      "property1": "string",
      "property2": "string"
    }
  },
  "tokens_charged": 0
}
{
  "code": "string",
  "message": "string",
  "correlation_id": "string",
  "retryable": true
}
{
  "code": "string",
  "message": "string",
  "correlation_id": "string",
  "retryable": true
}
{
  "code": "string",
  "message": "string",
  "correlation_id": "string",
  "retryable": true
}
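
The error samples above come in two shapes: a flat object with `code`, `message`, `correlation_id`, and `retryable`, and a nested one where those fields sit under an `error` key. A small normalizer lets retry logic treat both the same way. This is a sketch under that assumption; the helper name `error_info` is hypothetical, and which HTTP status each shape belongs to is not stated in this section.

```python
def error_info(body):
    """Normalize either error shape into a dict with code, message, retryable."""
    nested = body.get("error")
    # Use the nested object when present, otherwise treat the body itself as the error.
    err = nested if isinstance(nested, dict) else body
    return {
        "code": err.get("code"),
        "message": err.get("message"),
        # A missing flag is treated as not retryable, the conservative choice.
        "retryable": bool(err.get("retryable", False)),
    }


if __name__ == "__main__":
    sample = {"error": {"code": "RATE_LIMITED", "message": "slow down", "retryable": True}}
    print(error_info(sample))
```

A caller could then retry only when `error_info(body)["retryable"]` is true, ideally with a delay between attempts.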