Link Extraction
By default, the Extraction API returns only the extracted page content; links are not included.
If you also need links found on the page, enable link extraction using the extract_links option.
Enable Link Extraction
Set extract_links to true in the extraction request.
Request
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://quotes.toscrape.com/",
    "formats": ["markdown"],
    "extract_links": true
  }'
Response
When link extraction is enabled, the response includes a links field inside data.
{
  "data": {
    "markdown": "...",
    "html": null,
    "links": [
      "https://quotes.toscrape.com/login",
      "https://quotes.toscrape.com/author/Albert-Einstein",
      "https://www.goodreads.com/quotes"
    ]
  }
}
Access Extracted Links
The extracted links are available in:
data.links
Each item in the array is a URL string discovered on the extracted page.
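The request and the data.links lookup can be sketched in Python as follows. This is a minimal sketch using only the standard library: the endpoint, header names, and payload fields come from the curl example, while the function names, the 30-second timeout, and the empty-list fallback for a missing links field are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this guide.
EXTRACT_URL = "https://scraper.geonode.io/v1/extract"


def build_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request with extract_links enabled."""
    payload = {"url": page_url, "formats": ["markdown"], "extract_links": True}
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def get_links(response_body: dict) -> list:
    """Return data.links, or an empty list if the field is absent or null.

    The fallback is an assumption; the guide only shows the success case.
    """
    data = response_body.get("data") or {}
    return data.get("links") or []


def fetch_links(page_url: str, api_key: str) -> list:
    """Send the request and return the extracted links (network call)."""
    request = build_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as resp:  # timeout is an assumption
        return get_links(json.load(resp))
```

Calling fetch_links("https://quotes.toscrape.com/", "YOUR_API_KEY") would send the same request as the curl example. get_links can also be applied to an already parsed response body.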
When to Use Link Extraction
Enable extract_links when you need to:
- Collect links from a webpage
- Discover related pages referenced by the content
- Build URL lists for further processing
- Analyze page relationships
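As one illustration of building a URL list for further processing, the sketch below deduplicates the returned links and optionally keeps only those on a given host. The helper name build_url_list and its same_host_as parameter are hypothetical and not part of the API; the sketch assumes the links are absolute URLs, as in the sample response.

```python
from urllib.parse import urldefrag, urlsplit


def build_url_list(links, same_host_as=None):
    """Deduplicate links (ignoring #fragments), optionally keeping one host only."""
    host = urlsplit(same_host_as).netloc if same_host_as else None
    seen = set()
    result = []
    for link in links:
        url = urldefrag(link).url  # drop any #fragment before comparing
        if host is not None and urlsplit(url).netloc != host:
            continue  # link points to a different host
        if url not in seen:
            seen.add(url)
            result.append(url)  # keep first-seen order
    return result
```

With the links from the sample response and same_host_as="https://quotes.toscrape.com/", this keeps the two quotes.toscrape.com URLs and drops the goodreads.com one.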
Important
extract_links returns links found on the extracted page.
It does not crawl those links or recursively discover additional pages.
For example:
Page A
├─ Link B
├─ Link C
└─ Link D
The API returns links B, C, and D.
It does not visit those pages automatically.
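If the content of the linked pages is also needed, each link requires its own extraction request. The sketch below shows one way to do that for a single level. It assumes a hypothetical extract callable, supplied by the caller, that takes a URL, sends one extraction request with extract_links enabled, and returns the parsed JSON response.

```python
def extract_one_level(start_url, extract):
    """Extract start_url, then each link it returns, without going deeper.

    `extract` is a hypothetical callable (URL -> parsed JSON response);
    it is not part of the API and must be supplied by the caller.
    """
    first = extract(start_url)
    links = (first.get("data") or {}).get("links") or []
    pages = {start_url: first}
    for link in links:
        if link not in pages:  # skip duplicates and links back to the start page
            pages[link] = extract(link)  # one separate request per link
    return pages
```

Links found on pages B, C, and D appear in their own responses but are not followed here, so the sketch stops after one level.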
Success
You now know how to return links alongside extracted content in a single extraction request.
Next Steps
Continue to Extraction Workflows to learn how different extraction options can be combined in real-world scenarios.