Working With Output Formats

The Extraction API can return content in Markdown, HTML, or both formats in a single request.

All examples in this guide use the following endpoint:

POST /v1/extract

The formats field in the request body is an array of strings that selects which output formats the response includes.

{
  "formats": ["markdown"]
}

If the formats field is omitted, the API returns HTML by default.
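As a rough illustration of how the formats field fits into a request body, the following Python sketch builds the POST request without sending it. The helper name build_extract_request is hypothetical. The host and header names are copied from the curl examples in this guide, and omitting formats simply leaves the field out so the API can apply its default.

```python
import json
import urllib.request

# Host and path as used by the curl examples in this guide.
EXTRACT_URL = "https://scraper.geonode.io/v1/extract"


def build_extract_request(url, api_key, formats=None):
    """Build (but do not send) a POST request for the Extraction API.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    payload = {"url": url}
    if formats is not None:
        # Only include "formats" when the caller asks for specific formats;
        # leaving it out lets the API fall back to its default.
        payload["formats"] = list(formats)
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_extract_request("https://example.com", "YOUR_API_KEY", ["markdown"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it. That step is left out here so the sketch runs without network access or a real API key.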

Output Formats

Markdown returns the extracted content as clean, readable text.

Request

request.sh
curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"]
  }'

Response

response.json
{
  "data": {
    "markdown": "# Example Domain..."
  }
}

The extracted content is available in the data.markdown field.
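Reading that field from a response body might look like the sketch below. The sample body mirrors the truncated response shown above, and get_markdown is a hypothetical helper name. Returning None when the field is absent is an assumption made for this example, not documented API behavior.

```python
import json

# Sample body shaped like the response above (content truncated there as well).
sample_body = '{"data": {"markdown": "# Example Domain..."}}'


def get_markdown(body):
    """Return data.markdown from a JSON response body, or None if absent."""
    data = json.loads(body).get("data") or {}
    return data.get("markdown")


print(get_markdown(sample_body))
```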

Common use cases:

  • AI and LLM workflows
  • Search indexing
  • Knowledge bases
  • Text processing pipelines

HTML returns the extracted content with a structure closer to the original webpage.

Request

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["html"]
  }'

Response

{
  "data": {
    "html": "<html>...</html>"
  }
}

The extracted content is available in the data.html field.

Common use cases:

  • Rendering content in applications
  • Preserving page structure
  • Working with HTML elements
  • Content transformation workflows
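To make the "working with HTML elements" use case concrete, here is a minimal sketch that collects link targets from the data.html field using Python's standard html.parser module. The sample response body and its link are made up for illustration, and LinkCollector is a hypothetical class name.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical response body; a real data.html value would be much longer.
sample_body = json.dumps(
    {"data": {"html": '<html><body><a href="https://example.com/about">About</a></body></html>'}}
)

html = json.loads(sample_body)["data"]["html"]
collector = LinkCollector()
collector.feed(html)
print(collector.links)
```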

Request both Markdown and HTML when your application needs both formats from the same extraction.

Request

curl -X POST "https://scraper.geonode.io/v1/extract" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "html"]
  }'

Response

{
  "data": {
    "markdown": "# Example Domain...",
    "html": "<html>...</html>"
  }
}

The extracted content is available in both the data.markdown and data.html fields.
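When both formats are requested, a client may want to confirm that each one actually arrived before using it. The sketch below does that against a sample body shaped like the response above. pick_formats is a hypothetical helper, and raising KeyError on a missing format is a design choice for this example rather than API behavior.

```python
import json


def pick_formats(body, requested):
    """Return {format: content} for each requested format under data."""
    data = json.loads(body).get("data") or {}
    missing = [name for name in requested if name not in data]
    if missing:
        raise KeyError("response is missing formats: " + ", ".join(missing))
    return {name: data[name] for name in requested}


# Sample body shaped like the combined response above.
sample_body = '{"data": {"markdown": "# Example Domain...", "html": "<html>...</html>"}}'

content = pick_formats(sample_body, ["markdown", "html"])
for name, text in content.items():
    print(name, "->", text)
```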

Choosing the Right Format

Format   | Best For
Markdown | AI workflows, search indexing, knowledge bases, and text processing
HTML     | Preserving page structure and rendering content
Both     | Applications that need both representations from a single extraction

Success

You now know how to control the format returned by the Extraction API.

Whether you need Markdown, HTML, or both, you can choose the format that best fits your workflow.

Next Steps

Continue to Extracting JavaScript Websites to learn how to extract content from pages that rely on client-side rendering.