Link Extraction

By default, the Extraction API returns the extracted page content.

To also return the links found on the page, set the extract_links option to true in the extraction request.

Request

request.sh
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://quotes.toscrape.com/",
    "formats": ["markdown"],
    "extract_links": true
  }'
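
For callers working in Python rather than the shell, the same request could be sketched with the standard library. The endpoint, header names, and body fields are copied from the curl example above; the function names build_extract_request and extract are hypothetical, and error handling is left out.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
EXTRACT_URL = "https://scraper.geonode.io/v1/extract"


def build_extract_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an extraction request with link extraction on."""
    payload = {
        "url": page_url,
        "formats": ["markdown"],
        "extract_links": True,  # serialized as JSON true
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract(page_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response body."""
    request = build_extract_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the payload without making a network call.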

Response

When link extraction is enabled, the response includes a links field inside data.

response.json
{
  "data": {
    "markdown": "...",
    "html": null,
    "links": [
      "https://quotes.toscrape.com/login",
      "https://quotes.toscrape.com/author/Albert-Einstein",
      "https://www.goodreads.com/quotes"
    ]
  }
}

The extracted links are available in:

response-path.txt
data.links
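
As a minimal sketch of reading that path in Python: the helper name get_links is hypothetical, and treating a missing or null links field as an empty list is an assumption, since the response shape when extract_links is off is not described here.

```python
import json


def get_links(response_body: dict) -> list:
    """Return the URLs under data.links, or an empty list if none are present."""
    data = response_body.get("data") or {}
    links = data.get("links") or []
    # Keep only string entries so later URL handling does not fail on odd values.
    return [link for link in links if isinstance(link, str)]


# The sample response shown above, parsed from JSON.
sample = json.loads("""
{
  "data": {
    "markdown": "...",
    "html": null,
    "links": [
      "https://quotes.toscrape.com/login",
      "https://quotes.toscrape.com/author/Albert-Einstein",
      "https://www.goodreads.com/quotes"
    ]
  }
}
""")

for link in get_links(sample):
    print(link)
```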

Each item in the array is a URL string discovered on the extracted page.

Enable extract_links when you need to:

  • Collect links from a webpage
  • Discover related pages referenced by the content
  • Build URL lists for further processing
  • Analyze page relationships
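
One way to start on the last two use cases is to separate links that stay on the extracted page's host from links that point elsewhere. This sketch assumes the links are absolute URLs, as in the sample response above; split_by_host is a hypothetical helper, not part of the API.

```python
from urllib.parse import urlsplit


def split_by_host(page_url: str, links: list) -> tuple:
    """Split links into (internal, external) relative to the page's hostname.

    Duplicates are dropped and the original order is preserved.
    """
    page_host = urlsplit(page_url).hostname
    internal, external = [], []
    seen = set()
    for link in links:
        if link in seen:
            continue
        seen.add(link)
        if urlsplit(link).hostname == page_host:
            internal.append(link)
        else:
            external.append(link)
    return internal, external


internal, external = split_by_host(
    "https://quotes.toscrape.com/",
    [
        "https://quotes.toscrape.com/login",
        "https://quotes.toscrape.com/author/Albert-Einstein",
        "https://www.goodreads.com/quotes",
    ],
)
```

The internal list is a natural input for further extraction requests, while the external list shows which other sites the page references.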

Important

extract_links returns only the links found on the extracted page.

It does not crawl those links or recursively discover additional pages.

For example:

Page A
 ├─ Link B
 ├─ Link C
 └─ Link D

The API returns links B, C, and D.

It does not visit those pages automatically.
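
If you do need the pages behind B, C, and D, your own code has to issue one extraction request per link. Below is a minimal sketch of that one-level follow-up. Here fetch_links is a hypothetical callable standing in for a single extraction request with extract_links enabled that returns data.links; the limit parameter is likewise an assumption, added to cap the number of follow-up requests.

```python
from typing import Callable, Dict, List


def follow_links_once(
    start_url: str,
    fetch_links: Callable[[str], List[str]],
    limit: int = 10,
) -> Dict[str, List[str]]:
    """Extract the start page, then each of its links, exactly one level deep.

    Returns a mapping of visited URL -> links found on that page.
    Links found on the second-level pages are recorded but not visited.
    """
    results = {start_url: fetch_links(start_url)}
    for link in results[start_url][:limit]:
        if link not in results:  # skip duplicates and links back to the start page
            results[link] = fetch_links(link)
    return results


# Stand-in for the API: a tiny in-memory link graph matching the diagram above.
graph = {"A": ["B", "C", "D"], "B": ["E"], "C": [], "D": ["A"]}
visited = follow_links_once("A", lambda url: graph.get(url, []))
```

In this example the caller visits A, B, C, and D. Page E, which is only linked from B, is recorded as a link but never visited, because the loop goes one level deep.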

Success

You now know how to return links alongside extracted content in a single extraction request.

Next Steps

Continue to Extraction Workflows to learn how different extraction options can be combined in real-world scenarios.
