Waiting for Dynamic Content

Many modern websites use JavaScript to load content after the initial page load. If you extract immediately, content that appears a few seconds later may be missing from the results.

The wait_config parameter gives you control over when the extraction should begin.

How wait_config Works

When a wait_config is provided, the extraction process follows this order:

  1. wait_until
  2. wait_for
  3. wait_timeout
  4. Extract content

This lets you wait for a page lifecycle event, then for a specific element, and then for a fixed additional delay before extraction starts.
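The ordering above can be sketched in Python. The helper planned_steps is hypothetical and not part of the API; it only lists the stages that would apply to a given wait_config, in the documented order, assuming wait_until falls back to domcontentloaded when omitted.

```python
# Hypothetical helper (not part of the API): lists wait stages in the documented order.
def planned_steps(wait_config):
    """Return (stage, value) pairs in the order the stages are applied."""
    # wait_until always applies; it defaults to "domcontentloaded" when omitted.
    steps = [("wait_until", wait_config.get("wait_until", "domcontentloaded"))]
    if "wait_for" in wait_config:
        steps.append(("wait_for", wait_config["wait_for"]))
    if "wait_timeout" in wait_config:
        steps.append(("wait_timeout", wait_config["wait_timeout"]))
    steps.append(("extract", None))
    return steps

# Key order in the config does not change the stage order.
for stage, value in planned_steps({"wait_timeout": 3000, "wait_for": ".product-grid"}):
    print(stage, value)
```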

wait_until

The wait_until option controls which browser lifecycle event must complete before moving to the next step.

commit

Wait until the browser receives the response headers and commits the navigation.

{
  "url": "https://example.com",
  "wait_config": {
    "wait_until": "commit"
  }
}

Best for:

  • Very fast extractions
  • Cases where you only need the initial response

domcontentloaded

Wait until the HTML is parsed and the DOM is ready.

{
  "url": "https://example.com",
  "wait_config": {
    "wait_until": "domcontentloaded"
  }
}

Best for:

  • Most websites
  • Pages where content is already present in the HTML

This is the default value when wait_until is not provided.

load

Wait until the page and all resources are fully loaded.

{
  "url": "https://example.com",
  "wait_config": {
    "wait_until": "load"
  }
}

Best for:

  • Pages that depend on images or external scripts
  • Slower websites that require additional loading time

networkidle

Wait until there has been no network activity for 500 milliseconds.

{
  "url": "https://example.com",
  "wait_config": {
    "wait_until": "networkidle"
  }
}

Best for:

  • Single-page applications (SPA)
  • React, Vue, Angular, and Next.js websites
  • Dynamic content loaded through API requests
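As a rough client-side check, the four wait_until values and the default can be captured in a small Python helper. The name resolve_wait_until is hypothetical, and rejecting other values locally is an assumption; how the service itself treats an unknown value is not stated here.

```python
# The four lifecycle events described in this section.
WAIT_UNTIL_VALUES = ("commit", "domcontentloaded", "load", "networkidle")
# Documented default when wait_until is not provided.
DEFAULT_WAIT_UNTIL = "domcontentloaded"

def resolve_wait_until(wait_config):
    """Return the effective wait_until value, or raise on an unknown one."""
    value = wait_config.get("wait_until", DEFAULT_WAIT_UNTIL)
    if value not in WAIT_UNTIL_VALUES:
        # Assumption: fail early on the client instead of sending the request.
        raise ValueError(f"unsupported wait_until: {value!r}")
    return value

print(resolve_wait_until({"wait_until": "networkidle"}))
print(resolve_wait_until({}))
```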

wait_for

The wait_for option waits until a specific element appears on the page before extraction starts.

Using a CSS Selector

{
  "url": "https://example.com",
  "wait_config": {
    "wait_for": ".product-grid"
  }
}

Extraction begins only after an element matching .product-grid is found.

Using XPath

{
  "url": "https://example.com",
  "wait_config": {
    "wait_for": "//div[@class='product-grid']"
  }
}

XPath expressions must start with // or xpath=. Otherwise, the value is treated as a CSS selector.
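The prefix rule can be illustrated with a short Python sketch. The function selector_kind is hypothetical; it mirrors the rule as described here rather than the service's actual parser.

```python
# Hypothetical helper mirroring the documented prefix rule for wait_for values.
def selector_kind(wait_for):
    """Return "xpath" if the value starts with // or xpath=, else "css"."""
    if wait_for.startswith("//") or wait_for.startswith("xpath="):
        return "xpath"
    return "css"

print(selector_kind("//div[@class='product-grid']"))
print(selector_kind("xpath=(//table)[1]"))
print(selector_kind(".product-grid"))
```

One consequence of this rule: an XPath expression that begins with a parenthesis, such as (//table)[1], would be read as a CSS selector unless it carries the xpath= prefix.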

Common Examples

Use Case              Selector
Product listing       .product-grid
Search results        .search-results
Article content       article
Table data            table
Loading completion    .loaded

wait_timeout

The wait_timeout option adds a fixed delay, in milliseconds, after all previous waits complete.

{
  "url": "https://example.com",
  "wait_config": {
    "wait_for": ".product-grid",
    "wait_timeout": 5000
  }
}

In this example:

  1. Wait for .product-grid
  2. Wait an additional 5 seconds
  3. Extract content

When to Use wait_timeout

Use this option when content continues updating after the target element appears.

Common examples include:

  • Infinite scrolling pages
  • Late-loading advertisements
  • Client-side rendering delays
  • Dynamic dashboards

Complete Example

{
  "url": "https://example.com",
  "formats": ["markdown"],
  "wait_config": {
    "wait_until": "networkidle",
    "wait_for": ".product-grid",
    "wait_timeout": 3000
  }
}

This configuration:

  1. Waits until network activity stops
  2. Waits for .product-grid to appear
  3. Waits an additional 3 seconds
  4. Extracts the content
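If you assemble request bodies in code, a small builder keeps the wait options together. This is a minimal Python sketch: build_request is a hypothetical helper, and sending the body (endpoint, authentication) is outside the scope of this page and is not shown.

```python
import json

# Hypothetical helper: assembles a request body with an optional wait_config.
def build_request(url, formats=None, **wait_options):
    body = {"url": url}
    if formats:
        body["formats"] = list(formats)
    if wait_options:
        # Keyword arguments become the wait_config object.
        body["wait_config"] = dict(wait_options)
    return body

body = build_request(
    "https://example.com",
    formats=["markdown"],
    wait_until="networkidle",
    wait_for=".product-grid",
    wait_timeout=3000,  # milliseconds
)
print(json.dumps(body, indent=2))
```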

Browser Rendering Behavior

When a non-null wait_config is provided, browser rendering is automatically enabled if render_js is not explicitly specified.

Example:

{
  "url": "https://example.com",
  "wait_config": {
    "wait_until": "networkidle"
  }
}

The extraction will automatically use browser rendering.

Invalid Configuration

The following request is rejected:

{
  "url": "https://example.com",
  "render_js": false,
  "wait_config": {
    "wait_until": "networkidle"
  }
}

This is considered ambiguous because wait_config requires browser rendering while render_js explicitly disables it.
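The interaction between render_js and wait_config can be checked before a request is sent. The Python sketch below is a client-side approximation of the rules described in this section; resolve_render_js is a hypothetical name, and the error message is illustrative rather than the service's actual response.

```python
# Hypothetical client-side check of the render_js / wait_config rules.
def resolve_render_js(request):
    """Return the effective render_js value, or raise for the rejected combination."""
    wait_config = request.get("wait_config")
    render_js = request.get("render_js")
    if wait_config is not None and render_js is False:
        # wait_config needs a browser, but rendering was explicitly disabled.
        raise ValueError("wait_config requires browser rendering, but render_js is false")
    if wait_config is not None and render_js is None:
        # Rendering is enabled automatically when render_js is not specified.
        return True
    # No wait_config: whatever was given applies (None means the service default).
    return render_js

print(resolve_render_js({"url": "https://example.com",
                         "wait_config": {"wait_until": "networkidle"}}))
```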

Best Practices

  • Use domcontentloaded for most websites.
  • Use networkidle for JavaScript-heavy applications.
  • Use wait_for when a specific element contains the data you need.
  • Use wait_timeout only when additional rendering time is required.
  • Avoid excessive delays, as they increase extraction time and cost.

Next Steps

Now that you understand how to wait for dynamic content, learn how to work with synchronous and asynchronous extraction workflows.
