Start a Crawl Job

Coming soon

This endpoint is documented but not yet available in production. The contract below reflects the planned behavior. Reach out via support for early access or launch notification.

Crawl a website starting from a seed URL up to a given depth and page limit

Authorization

APIKeyHeader

X-Api-Key<token>

In: header

Request Body

application/json

TypeScript Definitions

Use the request body type in TypeScript.

url*Url

Seed URL to start crawling from

depth?Depth

Maximum BFS depth from the seed URL (1 = seed only)

limit?Limit

Maximum number of pages to crawl

formats?array<>

Output formats to extract per page

render_js?Render Js

Use a headless browser to render each page (slower, handles JS-heavy sites)

same_domain_only?Same Domain Only

Only follow links that stay on the same domain as the seed URL

include_subdomains?Include Subdomains

When same_domain_only is true, also include subdomains of the seed domain

proxy?|

Proxy configuration for page requests. Optional — when omitted or explicitly null, defaults are applied per page: residential type, country auto-resolved from each page URL (domain/path overrides first, then the TLD; if nothing matches, the upstream proxy provider picks the exit region). Send an object to override either field; individual fields inside the object may also be null (see ProxySettings).

wait_config?|

Grouped browser wait policy applied to every crawled page extraction. Optional — when omitted or explicitly null, each page extraction uses the adaptive default settle policy whenever a browser runs. Send an object to opt every page extraction into caller-owned explicit wait mode, which skips the adaptive default settle phase. Only non-null wait_config values count as provided; null fields inside wait_config behave the same as absence. If render_js is not set explicitly, any non-null wait_config value auto-enables browser rendering for the workflow and the effective render_js value is returned in crawl_config. If you explicitly send render_js=false together with non-null wait_config values, the request is rejected as ambiguous.

Response Body

`application/json`

curl -X POST "https://example.com/v1/crawl" \  -H "Content-Type: application/json" \  -d '{    "url": "http://example.com"  }'

{
  "job_id": "453bd7d7-5355-4d6d-a38e-d9e7eb218c3f",
  "url": "string",
  "status": "queued",
  "status_url": "string",
  "estimated_pages": 0
}

Empty

Start a Crawl Job

Authorization

Request Body

Response Body

202application/json

401

402

422

429

503

`application/json`