Extraction

Best Practices

The following recommendations can help improve extraction results and reduce unnecessary processing.

Choose the Right Output Format

Use the output format that matches your use case.

Format      Best For
markdown    AI workflows, RAG pipelines, indexing, and text processing
html        Preserving page structure and rendering content
Both        Applications that require both formats

Requesting only the formats you need can reduce response size.

Enable JavaScript Rendering Only When Needed

JavaScript rendering increases extraction time because the page must be rendered before content can be extracted.

Use:

request.json
{
  "render_js": true
}

only for websites that depend on client-side rendering.

Common examples include:

  • React
  • Next.js
  • Vue
  • Single-page applications (SPAs)
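When it is unclear whether a site needs rendering, one option is to try a plain extraction first and fall back to JavaScript rendering only if the result looks empty. The sketch below assumes a hypothetical `extract(payload)` callable that sends the request and returns the extracted text. The `url` field and the 200-character threshold are also assumptions, not documented behavior.

```python
from typing import Callable

MIN_CONTENT_CHARS = 200  # assumed threshold for "the page came back mostly empty"


def extract_with_js_fallback(url: str, extract: Callable[[dict], str]) -> str:
    """Try a plain extraction first; retry with render_js only if needed.

    `extract` is a hypothetical callable that sends the payload to the
    extraction API and returns the extracted text.
    """
    # First attempt: no JavaScript rendering, so the faster path is used.
    content = extract({"url": url})
    if len(content.strip()) >= MIN_CONTENT_CHARS:
        return content

    # Fallback: the page likely depends on client-side rendering.
    return extract({"url": url, "render_js": True})
```

With this approach, server-rendered pages cost one fast request, and only pages that come back nearly empty pay for a second, rendered request.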

Use Asynchronous Processing for Large Workloads

For long-running extractions, use asynchronous processing.

request.json
{
  "processing_mode": "async"
}

This helps avoid request timeouts and lets your application continue working while the extraction runs in the background.

Use Geo-Targeting Only When Required

Proxy routing may increase processing time.

Only specify a country when content differs by location.

request.json
{
  "proxy": {
    "country": "DE",
    "type": "residential"
  }
}
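A small helper can keep geo-targeting opt-in by adding the `proxy` object only when a country is actually passed. This is a minimal sketch: the `build_payload` name and the `url` field are assumptions, while `proxy.country` and `proxy.type` follow the request example above.

```python
from typing import Optional


def build_payload(url: str, country: Optional[str] = None,
                  proxy_type: str = "residential") -> dict:
    """Build a request body, adding `proxy` only when a country is given.

    The `url` field is an assumption; `proxy.country` and `proxy.type`
    follow the documented request example.
    """
    payload: dict = {"url": url}
    if country:
        # Geo-target only when the content differs by location.
        payload["proxy"] = {"country": country.upper(), "type": proxy_type}
    return payload
```

Calling `build_payload("https://example.com")` returns a body without any `proxy` key, so requests are geo-targeted only when the caller asks for it.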

Reuse Job IDs

When using asynchronous extraction:

  1. Create the extraction job once.
  2. Store the returned job_id.
  3. Poll the job status endpoint.

Avoid creating duplicate extraction jobs for the same request.
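One way to avoid duplicates is to key each job by a hash of its request payload and look it up before creating a new one. The sketch below is illustrative only: `JobRegistry` and the `create_job` callable are hypothetical names, and the in-memory dictionary would need to be replaced by a persistent store if jobs must survive restarts.

```python
import hashlib
import json
from typing import Callable, Dict


class JobRegistry:
    """Remember the job_id returned for each distinct request payload."""

    def __init__(self, create_job: Callable[[dict], str]) -> None:
        # `create_job` is a hypothetical callable that submits the async
        # request and returns the job_id from the response.
        self._create_job = create_job
        self._jobs: Dict[str, str] = {}

    @staticmethod
    def _key(payload: dict) -> str:
        # Canonical JSON so key order does not produce different hashes.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_create(self, payload: dict) -> str:
        key = self._key(payload)
        if key not in self._jobs:
            # First time this request is seen: create the job once.
            self._jobs[key] = self._create_job(payload)
        return self._jobs[key]
```

Two payloads with the same fields in a different order map to the same key, so they share one job.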

Enable Link Extraction Only When Needed

Enable link extraction only when you need URLs from the page:

request.json
{
  "extract_links": true
}

Leaving it off otherwise keeps responses smaller and easier to process.

Monitor Job Status

Before retrieving extraction results, check the job status.

GET /v1/extract/{job_id}

Wait until the status becomes:

completed

before processing the result.
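The wait can be sketched as a polling loop with capped exponential backoff. The code below assumes a hypothetical `get_status(job_id)` callable that performs `GET /v1/extract/{job_id}` and returns the status string. The `"failed"` status, the delays, and the timeout are assumptions; only `completed` is taken from this page.

```python
import time
from typing import Callable


def wait_for_completion(
    job_id: str,
    get_status: Callable[[str], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the job reports "completed", or raise on failure/timeout.

    `get_status` is a hypothetical callable that performs
    GET /v1/extract/{job_id} and returns the status string.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status(job_id)
        if status == "completed":
            return
        if status == "failed":  # assumed failure status; not documented here
            raise RuntimeError(f"Extraction job {job_id} failed")
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"Job {job_id} not completed after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

Backing off between polls keeps the number of status requests low for long-running jobs, and the deadline ensures the caller is never blocked indefinitely.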

Store Extracted Content

If content does not change frequently, consider storing extraction results instead of repeatedly extracting the same page.

This can reduce costs and improve performance.
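A minimal way to do this is a time-to-live (TTL) cache in front of the extraction call. In the sketch below, `ExtractionCache` and the `extract(url)` callable are hypothetical, and the one-hour default TTL is an arbitrary assumption that should be tuned to how often the content changes.

```python
import time
from typing import Callable, Dict, Tuple


class ExtractionCache:
    """Keep extraction results per URL and reuse them until they expire."""

    def __init__(self, extract: Callable[[str], str], ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # `extract` is a hypothetical callable that runs an extraction
        # for a URL and returns the content.
        self._extract = extract
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # still fresh: skip the extraction
        content = self._extract(url)
        self._entries[url] = (now, content)
        return content
```

Repeated calls to `get` for the same URL within the TTL return the stored content without triggering a new extraction.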

Success

You now know the recommended practices for building reliable extraction workflows.

Next Steps

Continue to Common Errors to learn how to troubleshoot common extraction issues.
