Extraction Workflows

The Extraction API accepts several request options, and you can combine them in a single request to fit your use case.

This guide demonstrates common extraction workflows and the recommended settings for each scenario.
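As a minimal sketch of how these options combine, the helper below assembles one request body from short option names. The function name `build_payload` and its option keywords (`markdown`, `render_js`, `links`, `async`, `country=XX`) are hypothetical and exist only for this example; the JSON fields they produce are the ones used in the requests in this guide. It assumes the options can be mixed freely in one request.

```shell
#!/bin/sh
# build_payload URL [OPTION...]
# Hypothetical helper: prints a JSON request body for the extract endpoint.
# Values are inserted verbatim, so the URL must not contain double quotes
# or backslashes.
build_payload() {
  bp_body="\"url\": \"$1\""
  shift
  for bp_opt in "$@"; do
    case $bp_opt in
      markdown)  bp_body="$bp_body, \"formats\": [\"markdown\"]" ;;
      render_js) bp_body="$bp_body, \"render_js\": true" ;;
      links)     bp_body="$bp_body, \"extract_links\": true" ;;
      async)     bp_body="$bp_body, \"processing_mode\": \"async\"" ;;
      country=*) bp_body="$bp_body, \"proxy\": {\"country\": \"${bp_opt#country=}\", \"type\": \"residential\"}" ;;
      *) echo "unknown option: $bp_opt" >&2; return 1 ;;
    esac
  done
  printf '{%s}\n' "$bp_body"
}

# Markdown output from a JavaScript-rendered page, plus its links.
build_payload "https://geonode.com/" render_js markdown links
```

The printed body can be piped straight into a request by passing `-d @-` to `curl`, which reads the request data from standard input.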

Standard Website Extraction

Use the default extraction settings when working with traditional websites that do not rely heavily on JavaScript.

Request

request.sh

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'

Best For

Blogs
News websites
Documentation sites
Static webpages

AI and RAG Workflows

Markdown is often the preferred format when content will be processed by AI systems, because it keeps document structure such as headings and lists while dropping most HTML markup.

Request

request.sh

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.python.org/3/library/json.html",
    "formats": ["markdown"]
  }'

Best For

Vector databases
RAG pipelines
Embedding generation
Knowledge bases
LLM applications

JavaScript-Powered Websites

Some websites render their content in the browser, so the page must execute JavaScript before there is anything to extract. Set `render_js` to `true` for these sites.

Request

request.sh

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://geonode.com/",
    "render_js": true
  }'

Best For

React applications
Next.js websites
Vue applications
Single-page applications (SPAs)
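If you are unsure whether a page falls into this category, one rough check is to look at its raw HTML for an empty mount element. The sketch below is only a heuristic: it assumes the application mounts into `<div id="root">` or `<div id="app">`, which is a common convention in React and Vue templates, not something the Extraction API defines. The function name `looks_client_rendered` is hypothetical.

```shell
#!/bin/sh
# looks_client_rendered: reads HTML on stdin and succeeds (exit 0) when it
# finds an empty mount element, a hint that content is rendered in the browser.
# Heuristic only: pages that use other element ids will not be detected.
looks_client_rendered() {
  # Join lines first so an element split across lines is still matched.
  tr -d '\n' | grep -Eq '<div id="(root|app)">[[:space:]]*</div>'
}
```

For example, `curl -s "https://example.com" | looks_client_rendered && echo "consider render_js"` prints the hint only when the empty element is present.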

Geo-Targeted Extraction

Add a `proxy` object with a `country` code and a proxy `type` when the content changes based on the visitor's location.

Request

request.sh

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "proxy": {
      "country": "DE",
      "type": "residential"
    }
  }'

Best For

Local search results
Country-specific pricing
Regional content
Localized webpages
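To compare how a page differs between locations, the same request can be repeated with a different `country` value each time. The following is a sketch under a few assumptions: the helper names, the `GEONODE_API_KEY` environment variable, and the `extract_<COUNTRY>.json` output file names are all hypothetical, and country codes other than `DE` are assumed to follow the same two-letter form.

```shell
#!/bin/sh
# geo_payload URL COUNTRY: print a request body that routes through COUNTRY.
# Values are inserted verbatim, so they must not contain double quotes.
geo_payload() {
  printf '{"url": "%s", "proxy": {"country": "%s", "type": "residential"}}\n' "$1" "$2"
}

# extract_per_country URL COUNTRY...: send one request per country and save
# each response to extract_<COUNTRY>.json in the current directory.
# Reads the API key from GEONODE_API_KEY (a name chosen for this sketch).
extract_per_country() {
  epc_url=$1
  shift
  for epc_country in "$@"; do
    geo_payload "$epc_url" "$epc_country" |
      curl -s -X POST "https://scraper.geonode.io/v1/extract" \
        -H "X-Api-Key: $GEONODE_API_KEY" \
        -H "Content-Type: application/json" \
        -d @- -o "extract_$epc_country.json"
  done
}
```

Calling `extract_per_country "https://example.com" DE US` would leave two files that can then be compared with a tool such as `diff`.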

Content and Link Extraction

Set `extract_links` to `true` to extract the page content and collect its links in the same request.

Request

request.sh

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://quotes.toscrape.com/",
    "formats": ["markdown"],
    "extract_links": true
  }'

Best For

URL discovery
Content analysis
Link collection
Website research

Large Async Extractions

Set `processing_mode` to `"async"` for pages that may take a long time to extract. The API responds with a job ID and a status URL that you can query for progress.

Start the Extraction

request.sh

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://geonode.com/",
    "render_js": true,
    "processing_mode": "async"
  }'

Response

response.json

{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "queued",
  "status_url": "/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48"
}
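The `status_url` in this response is a path, so it has to be joined with the API host before it can be requested. The sketch below shows one way to pull the field out and build the full URL. The helper names are hypothetical, and the `sed` expression is a dependency-free shortcut that only handles simple string fields; a real JSON parser is the safer choice where one is available.

```shell
#!/bin/sh
API_BASE="https://scraper.geonode.io"

# json_string_field NAME: print the first string value stored under NAME
# in the JSON document read from stdin.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# absolute_url PATH_OR_URL: prefix API_BASE unless the value is already absolute.
absolute_url() {
  case $1 in
    http://*|https://*) printf '%s\n' "$1" ;;
    /*) printf '%s%s\n' "$API_BASE" "$1" ;;
    *) printf '%s/%s\n' "$API_BASE" "$1" ;;
  esac
}

# The async response shown above.
response='{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "queued",
  "status_url": "/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48"
}'

status_path=$(printf '%s\n' "$response" | json_string_field status_url)
absolute_url "$status_path"
```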

Check Job Status

request.sh

curl -X GET "https://scraper.geonode.io/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48" \
  -H "X-Api-Key: YOUR_API_KEY"
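In practice the status request is usually repeated until the job leaves the queue. The loop below is a rough sketch of that pattern, not the API's documented client behavior. It assumes that `queued` is the only pending status, since that is the only value shown in this guide; add any other in-progress values the API returns to `PENDING_STATUSES`. The function names and the `GEONODE_API_KEY` environment variable are hypothetical.

```shell
#!/bin/sh
API_BASE="https://scraper.geonode.io"
# ASSUMPTION: statuses that mean "not finished yet", separated by spaces.
PENDING_STATUSES="queued"

# job_status: print the "status" string from the JSON document on stdin.
job_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# fetch_job JOB_ID: print the current job document.
fetch_job() {
  curl -s -X GET "$API_BASE/v1/extract/$1" -H "X-Api-Key: $GEONODE_API_KEY"
}

# poll_job JOB_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Prints the job document once its status is no longer pending.
# Returns 1 if the job is still pending after MAX_ATTEMPTS checks.
poll_job() {
  poll_id=$1
  poll_interval=${2:-2}
  poll_max=${3:-30}
  poll_n=0
  while [ "$poll_n" -lt "$poll_max" ]; do
    poll_body=$(fetch_job "$poll_id")
    poll_state=$(printf '%s\n' "$poll_body" | job_status)
    case " $PENDING_STATUSES " in
      *" $poll_state "*) sleep "$poll_interval" ;;
      *) printf '%s\n' "$poll_body"; return 0 ;;
    esac
    poll_n=$((poll_n + 1))
  done
  echo "job $poll_id still pending after $poll_max checks" >&2
  return 1
}
```

With these defaults, `poll_job 4844831a-a222-4cac-b5e6-7e3f2dd07b48` checks every 2 seconds for up to 30 attempts. Bounding the number of attempts keeps a stuck job from blocking a script indefinitely.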

Best For

Large pages
Slow websites
Background processing
High-volume workflows

Recommended Settings

Scenario	Recommended Configuration
Standard webpage	Default request
AI and RAG workflows	`formats: ["markdown"]`
JavaScript websites	`render_js: true`
Country-specific content	`proxy: {"country": "DE", "type": "residential"}`
Link discovery	`extract_links: true`
Long-running extractions	`processing_mode: "async"`

Success

You now know how to combine extraction features for common real-world workflows.

Next Steps

Continue to Best Practices to learn how to improve extraction performance, reliability, and efficiency.
