Extraction Workflows

The Extraction API supports a variety of options that can be combined depending on your use case.

This guide demonstrates common extraction workflows and the recommended settings for each scenario.

Standard Website Extraction

Use the default extraction settings when working with traditional websites that do not rely heavily on JavaScript.

Request

request.sh
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'

Best For

  • Blogs
  • News websites
  • Documentation sites
  • Static webpages

AI and RAG Workflows

Markdown is often the preferred format when content will be processed by AI systems. Request it by setting formats to ["markdown"].

Request

request.sh
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.python.org/3/library/json.html",
    "formats": ["markdown"]
  }'

Best For

  • Vector databases
  • RAG pipelines
  • Embedding generation
  • Knowledge bases
  • LLM applications

JavaScript-Powered Websites

Some websites render their content in the browser, so JavaScript must run before anything can be extracted. Set render_js to true for these pages.

Request

request.sh
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://geonode.com/",
    "render_js": true
  }'

Best For

  • React applications
  • Next.js websites
  • Vue applications
  • Single-page applications (SPAs)

Geo-Targeted Extraction

Use a proxy configuration when content changes based on visitor location.

Request

request.sh
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "proxy": {
      "country": "DE",
      "type": "residential"
    }
  }'

Best For

  • Local search results
  • Country-specific pricing
  • Regional content
  • Localized webpages
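
To compare how a page differs by location, the geo-targeted request can be wrapped in a small helper and called once per country. The following is a minimal sketch, not an official client: the names API_KEY, build_geo_payload, and extract_from are illustrative, and country codes other than DE are assumed to follow the same two-letter form.

```shell
#!/bin/sh
# Sketch only: send the same geo-targeted request for different countries.
# API_KEY, build_geo_payload and extract_from are illustrative names,
# not part of the Extraction API.

API_URL="https://scraper.geonode.io/v1/extract"

# Build the JSON body for one country.
# $1 = target URL, $2 = two-letter country code (assumed format, as in "DE").
# Note: values are inserted as-is, so they must not contain double quotes.
build_geo_payload() {
  printf '{"url": "%s", "proxy": {"country": "%s", "type": "residential"}}' "$1" "$2"
}

# Send one geo-targeted extraction request and print the raw response.
extract_from() {
  curl -sS -X POST "$API_URL" \
    -H "X-Api-Key: ${API_KEY:-YOUR_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(build_geo_payload "$1" "$2")"
}

# Usage (writes one response file per country):
#   for c in DE FR US; do
#     extract_from "https://example.com" "$c" > "response_$c.json"
#   done
```

Keeping the payload builder separate from the curl call makes it easy to inspect the exact JSON being sent before spending a request on it.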

Link Discovery

Extract page content and collect links in a single request.

Request

request.sh
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://quotes.toscrape.com/",
    "formats": ["markdown"],
    "extract_links": true
  }'

Best For

  • URL discovery
  • Content analysis
  • Link collection
  • Website research

Large Async Extractions

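Set processing_mode to "async" for pages that may take a long time to extract. The request returns a job ID immediately, and you check the job's status with a separate request.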

Start the Extraction

request.sh
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://geonode.com/",
    "render_js": true,
    "processing_mode": "async"
  }'

Response

response.json
{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "queued",
  "status_url": "/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48"
}

Check Job Status

request.sh
curl -X GET "https://scraper.geonode.io/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48" \
  -H "X-Api-Key: YOUR_API_KEY"
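
In practice the status check is usually repeated until the job is no longer queued. The following is a minimal polling sketch under stated assumptions: this guide only shows the "queued" status, so the helper treats any other status as a signal to stop and print the response. The names wait_for_job, job_status, PENDING_STATUSES, POLL_INTERVAL, and MAX_ATTEMPTS are illustrative and not part of the API.

```shell
#!/bin/sh
# Sketch only: poll an async extraction job until its status changes.
# Assumption: "queued" is the only in-progress status shown in this guide.
# Add any other in-progress values you observe to PENDING_STATUSES
# (space-separated). All helper and variable names here are illustrative.

API_BASE="https://scraper.geonode.io"

# Read JSON on stdin and print the "status" value from the first line that
# contains one. sed keeps the sketch dependency-free; a real JSON parser
# is more robust.
job_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# $1 = job_id returned by the async request.
wait_for_job() {
  attempt=0
  while [ "$attempt" -lt "${MAX_ATTEMPTS:-30}" ]; do
    response=$(curl -sS -X GET "$API_BASE/v1/extract/$1" \
      -H "X-Api-Key: ${API_KEY:-YOUR_API_KEY}")
    status=$(printf '%s' "$response" | job_status)
    if [ -z "$status" ]; then
      echo "no status field in response for job $1" >&2
      return 2
    fi
    case " ${PENDING_STATUSES:-queued} " in
      *" $status "*) ;;                          # still pending: wait, retry
      *) printf '%s\n' "$response"; return 0 ;;  # status changed: print it
    esac
    attempt=$((attempt + 1))
    sleep "${POLL_INTERVAL:-2}"
  done
  echo "job $1 still pending after $attempt checks" >&2
  return 1
}

# Usage:
#   wait_for_job "4844831a-a222-4cac-b5e6-7e3f2dd07b48" > result.json
```

The attempt limit keeps a stuck job from blocking a script indefinitely. A longer POLL_INTERVAL reduces the number of status requests for slow pages.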

Best For

  • Large pages
  • Slow websites
  • Background processing
  • High-volume workflows

Scenario                    Recommended Configuration
Standard webpage            Default request
AI and RAG workflows        formats: ["markdown"]
JavaScript websites         render_js: true
Country-specific content    proxy
Link discovery              extract_links: true
Long-running extractions    processing_mode: "async"
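
Because these options can be combined, a single request can cover several scenarios at once. The sketch below builds one request with Markdown output, JavaScript rendering, and a German residential proxy. It assumes the options shown individually in this guide are accepted together in one body, which this guide does not demonstrate. The names build_combined_payload and extract_combined are illustrative.

```shell
#!/bin/sh
# Sketch only: combine several options from this guide in one request.
# Assumption: the API accepts these fields together in a single body.
# build_combined_payload and extract_combined are illustrative names.

API_URL="https://scraper.geonode.io/v1/extract"

# $1 = target URL, $2 = two-letter country code.
# Values are inserted as-is, so they must not contain double quotes.
build_combined_payload() {
  cat <<EOF
{
  "url": "$1",
  "formats": ["markdown"],
  "render_js": true,
  "proxy": { "country": "$2", "type": "residential" }
}
EOF
}

# Send the combined request and print the raw response.
extract_combined() {
  curl -sS -X POST "$API_URL" \
    -H "X-Api-Key: ${API_KEY:-YOUR_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(build_combined_payload "$1" "$2")"
}

# Usage:
#   extract_combined "https://example.com" "DE" > response.json
```

Start from the smallest configuration that returns the content you need and add options one at a time, so it stays clear which option changes the result.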

Success

You now know how to combine extraction features for common real-world workflows.

Next Steps

Continue to Best Practices to learn how to improve extraction performance, reliability, and efficiency.
