Extraction Workflows
The Extraction API supports a variety of options that can be combined depending on your use case.
This guide demonstrates common extraction workflows and the recommended settings for each scenario.
Standard Website Extraction
Use the default extraction settings when working with traditional websites that do not rely heavily on JavaScript.
Request
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'
Best For
- Blogs
- News websites
- Documentation sites
- Static webpages
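
For readers calling the API from a script rather than the shell, the default request above can be sketched in Python with only the standard library. The endpoint and header names are copied from the curl example; the function name `build_extract_request` is made up for this illustration, and the commented-out lines assume the response body is JSON.

```python
import json
import urllib.request

API_URL = "https://scraper.geonode.io/v1/extract"  # endpoint from the curl example


def build_extract_request(api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) the default extraction request."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_extract_request("YOUR_API_KEY", "https://example.com")

# To actually send it (needs a valid key and network access):
# with urllib.request.urlopen(request, timeout=30) as response:
#     result = json.load(response)  # assumes a JSON response body
```

Building the request separately from sending it makes the payload easy to inspect or log before any credits are spent.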
AI and RAG Workflows
Markdown is often the preferred format when content will be processed by AI systems.
Request
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.python.org/3/library/json.html",
    "formats": ["markdown"]
  }'
Best For
- Vector databases
- RAG pipelines
- Embedding generation
- Knowledge bases
- LLM applications
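
One reason Markdown suits these pipelines is that its headings give natural split points for chunking before embedding. The sketch below is one possible approach, not part of the Extraction API: it takes a Markdown string you have already obtained (how the Markdown is located in the response is not covered here) and splits it at headings. The function name is hypothetical.

```python
import re


def split_markdown_by_heading(markdown: str) -> list[str]:
    """Split a Markdown document into chunks that each start at a heading."""
    # Zero-width split before every line that begins with 1-6 '#' and a space.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [part.strip() for part in parts if part.strip()]


sample = "# Title\nIntro text\n## Usage\nHow to use it\n"
chunks = split_markdown_by_heading(sample)
```

This simple pattern also splits on `#` comment lines inside fenced code blocks, so a production chunker would need to track fences or use a Markdown parser.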
JavaScript-Powered Websites
Some websites render content in the browser and require JavaScript execution before extraction.
Request
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://geonode.com/",
    "render_js": true
  }'
Best For
- React applications
- Next.js websites
- Vue applications
- Single-page applications (SPAs)
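
If it is unclear whether a site needs `render_js`, one rough heuristic is to look at the raw HTML and check how much visible text it contains: a page that ships an almost empty body plus scripts is probably rendered in the browser. The sketch below is an illustration only; the helper names and the 200-character threshold are arbitrary assumptions, and it works on an HTML string you supply rather than on any particular API response.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text that is not inside <script> or <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def probably_needs_js(html: str, min_chars: int = 200) -> bool:
    """Return True when the raw HTML carries very little visible text."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks)) < min_chars
```

A `True` result is only a hint that retrying with `render_js: true` may return more content; short static pages will also trip the threshold.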
Geo-Targeted Extraction
Use a proxy configuration when content changes based on visitor location.
Request
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "proxy": {
      "country": "DE",
      "type": "residential"
    }
  }'
Best For
- Local search results
- Country-specific pricing
- Regional content
- Localized webpages
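
To compare how a page looks from several locations, the same request body can be generated once per country. This is a minimal sketch: the `proxy` fields mirror the example above, `geo_payloads` is a hypothetical helper, and it assumes country codes follow the two-letter style of `"DE"`. Only the `"residential"` type appears in this guide, so other values would need checking against the API reference.

```python
def geo_payloads(
    url: str, countries: list[str], proxy_type: str = "residential"
) -> list[dict]:
    """Build one request body per country code for the same target URL."""
    return [
        {"url": url, "proxy": {"country": code.upper(), "type": proxy_type}}
        for code in countries
    ]


payloads = geo_payloads("https://example.com", ["de", "us", "jp"])
```

Each dictionary can then be serialized with `json.dumps` and sent as the body of a separate POST request.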
Content and Link Extraction
Extract page content and collect the page's links in a single request.
Request
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://quotes.toscrape.com/",
    "formats": ["markdown"],
    "extract_links": true
  }'
Best For
- URL discovery
- Content analysis
- Link collection
- Website research
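
For URL discovery, collected links usually need to be resolved, deduplicated, and limited to the site being studied. The sketch below assumes you already hold the links as a plain list of strings; the response field that carries them is not shown in this guide, so that step is left out. `internal_links` is a hypothetical helper built on the standard `urllib.parse` module.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def internal_links(base_url: str, links: list[str]) -> list[str]:
    """Resolve links against base_url and keep unique same-host URLs in order."""
    host = urlparse(base_url).netloc
    seen: set[str] = set()
    result: list[str] = []
    for link in links:
        # Make the link absolute, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


found = internal_links(
    "https://quotes.toscrape.com/",
    [
        "/page/2/",
        "https://quotes.toscrape.com/page/2/#top",
        "https://www.goodreads.com/quotes",
        "/tag/love/",
    ],
)
```

The returned list can feed the next round of extraction requests.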
Large Async Extractions
Use asynchronous processing for pages that may take longer to extract.
Start the Extraction
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://geonode.com/",
    "render_js": true,
    "processing_mode": "async"
  }'
Response
{
  "job_id": "4844831a-a222-4cac-b5e6-7e3f2dd07b48",
  "status": "queued",
  "status_url": "/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48"
}
Check Job Status
curl -X GET "https://scraper.geonode.io/v1/extract/4844831a-a222-4cac-b5e6-7e3f2dd07b48" \
  -H "X-Api-Key: YOUR_API_KEY"
Best For
- Large pages
- Slow websites
- Background processing
- High-volume workflows
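
In practice the status check is repeated until the job finishes. The following is a minimal polling sketch under stated assumptions: only the `"queued"` status appears in the sample response, so `"processing"` is a guessed name, and any other status is treated as final. `wait_for_job` and its parameters are hypothetical; the caller supplies `fetch_status`, a function that performs the GET request shown above and returns the parsed JSON.

```python
import time
from typing import Callable

# Only "queued" is confirmed by the sample response; "processing" is an
# assumed status name and should be checked against the API reference.
PENDING_STATUSES = {"queued", "processing"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job leaves a pending state or attempts run out."""
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") not in PENDING_STATUSES:
            return job
        sleep(interval)
    raise TimeoutError(f"job still pending after {max_attempts} attempts")
```

Capping the number of attempts keeps a stuck job from blocking a worker indefinitely, and passing `sleep` as a parameter lets the loop be tested without real delays.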
Recommended Settings
| Scenario | Recommended Configuration |
|---|---|
| Standard webpage | Default request |
| AI and RAG workflows | formats: ["markdown"] |
| JavaScript websites | render_js: true |
| Country-specific content | proxy: {"country": "DE", "type": "residential"} |
| Link discovery | extract_links: true |
| Long-running extractions | processing_mode: "async" |
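
Because these settings are independent fields of the same request body, they can be combined with a small helper. The sketch below is illustrative: the field names and values come from the table above, while `build_payload` and its keyword arguments are invented for this example, and it has not been verified that every combination is accepted by the API.

```python
from typing import Optional


def build_payload(
    url: str,
    markdown: bool = False,
    render_js: bool = False,
    country: Optional[str] = None,
    extract_links: bool = False,
    run_async: bool = False,
) -> dict:
    """Assemble a request body from the settings listed in the table."""
    payload: dict = {"url": url}
    if markdown:
        payload["formats"] = ["markdown"]
    if render_js:
        payload["render_js"] = True
    if country:
        # "residential" is the only proxy type shown in this guide.
        payload["proxy"] = {"country": country, "type": "residential"}
    if extract_links:
        payload["extract_links"] = True
    if run_async:
        payload["processing_mode"] = "async"
    return payload


payload = build_payload("https://geonode.com/", markdown=True, render_js=True, run_async=True)
```

Leaving every flag at its default produces the plain request used for standard webpages.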
Success
You now know how to combine extraction features for common real-world workflows.
Next Steps
Continue to Best Practices to learn how to improve extraction performance, reliability, and efficiency.