
Overview

Welcome to the Geonode Scraper API! This overview summarizes the configuration options available to you. Each option controls one aspect of how the scraper loads, interacts with, and returns web pages. Let's dive in:

  • Name
    js_render
    Type
    boolean
    default
    (default = false)
    Description

    Determines if the scraper should execute JavaScript on the target page, which can be essential for dynamic websites.

  • Name
    is_json_response
    Type
    boolean
    default
    (default = false)
    Description

    Ideal for URLs that yield API responses. When enabled, the scraper expects a JSON response instead of a typical web document.

  • Name
    block_resources
    Type
    boolean
    default
    (default = true)
    Description

    Helps save bandwidth by blocking non-essential resources like images and CSS during scraping.

  • Name
    keep_headers
    Type
    boolean
    default
    (default = false)
    Description

    Ensures specific headers are forwarded to the webpage, enhancing the scraper's interaction with the target site.

  • Name
    debug
    Type
    boolean
    default
    (default = false)
    Description

    A diagnostic tool. When enabled with response_format set to 'json', it provides detailed insights about the scraping process.

  • Name
    response_format
    Type
    string (html, json)
    default
    (default = html)
    Description

    Dictates the format of the scraper's output. 'html' provides the raw page content, while 'json' offers a structured response with additional details.

  • Name
    mode
    Type
    string (SPA, longPolling, domLoaded, documentLoaded, load)
    default
    (default = load)
    Description

    Selects the page-load strategy, that is, which load condition (SPA, longPolling, domLoaded, documentLoaded, or load) the scraper waits for before returning the page.

  • Name
    waitForSelector
    Type
    string
    default
    (default = null)
    Description

    Instructs the scraper to wait for a specific DOM element to appear before completing the request. Useful for pages that load content with a delay.

  • Name
    device_type
    Type
    string
    default
    (default = null)
    Description

    Emulates a specific device type, so the content is returned as it appears on that device.

  • Name
    country_code
    Type
    string
    default
    (default = null)
    Description

    Designates a specific country for proxy usage, ensuring content is accessed as viewed from that region.

  • Name
    cookies
    Type
    object[]
    default
    (default = [])
    Description

    Enables setting browser-specific cookies, which can be crucial for accessing certain web content.

  • Name
    localStorage
    Type
    object
    default
    (default = [])
    Description

    Sets specific local storage data in the browser, further customizing the scraping environment.

  • Name
    HTMLMinifier
    Type
    boolean
    default
    (default = useMinifier: false)
    Description

    If enabled, the scraper will return a minified version of the HTML, reducing the response size.

  • Name
    collect_data_from_requests
    Type
    boolean
    default
    (default = false)
    Description

    Captures additional data from 'fetch' and 'xhr' responses made by the webpage, ensuring dynamically loaded content is also retrieved.

  • Name
    optimizations
    Type
    object
    default
    (default = { skipDomains: [], loadOnlySameOriginRequests: true })
    Description

    Enhancements to optimize bandwidth and speed. Can exclude requests to specific domains or prioritize same-origin requests.

  • Name
    retries
    Type
    object
    default
    (default = { useRetries: true, maxRetries: 2 })
    Description

    Provides resilience by controlling the number of retry attempts if a scraping request encounters issues.

  • Name
    proxy
    Type
    object
    default
    (default = { useOnlyResidential: false })
    Description

    Specifies the proxy type. When useOnlyResidential is set to true, the scraper prioritizes residential proxies, emulating typical residential users.

  • Name
    screenshot
    Type
    object
    default
    (default = { use: false, options: {} })
    Description

    Configuration settings for capturing visual snapshots of the web pages being scraped.

  • Name
    viewport
    Type
    object
    default
    (default = { width: 1280, height: 800 })
    Description

    Customizes the virtual browser window's dimensions and characteristics, ensuring web pages are rendered as desired.

  • Name
    js_scenario
    Type
    object
    default
    (default = { actions: [] })
    Description

    Enables scripted interactions with the target page, emulating user behaviors like clicks, scrolls, and form inputs.
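
To make the option list above more concrete, here is a minimal Python sketch that assembles an options object from the documented defaults and applies a few overrides. It is an illustration under stated assumptions, not an official client: the names `DEFAULTS`, `ALLOWED`, and `build_options` are hypothetical, and this overview does not say how the options are sent to the API (endpoint, authentication, target URL field), so the sketch only builds and validates the object. `localStorage` and `HTMLMinifier` are left out because their listed defaults do not match their listed types. Rejecting `debug` without `response_format` set to `json` is a choice made for this sketch, based on the `debug` description.

```python
import copy
import json

# Defaults copied from the option list above.
# localStorage and HTMLMinifier are omitted (see the note in the lead-in).
DEFAULTS = {
    "js_render": False,
    "is_json_response": False,
    "block_resources": True,
    "keep_headers": False,
    "debug": False,
    "response_format": "html",
    "mode": "load",
    "waitForSelector": None,
    "device_type": None,
    "country_code": None,
    "cookies": [],
    "collect_data_from_requests": False,
    "optimizations": {"skipDomains": [], "loadOnlySameOriginRequests": True},
    "retries": {"useRetries": True, "maxRetries": 2},
    "proxy": {"useOnlyResidential": False},
    "screenshot": {"use": False, "options": {}},
    "viewport": {"width": 1280, "height": 800},
    "js_scenario": {"actions": []},
}

# Enumerated string values listed in the "Type" field of each option.
ALLOWED = {
    "response_format": {"html", "json"},
    "mode": {"SPA", "longPolling", "domLoaded", "documentLoaded", "load"},
}


def build_options(**overrides):
    """Return a copy of DEFAULTS with the given overrides applied (hypothetical helper)."""
    options = copy.deepcopy(DEFAULTS)
    for name, value in overrides.items():
        if name not in options:
            raise KeyError(f"unknown option: {name}")
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}")
        if isinstance(options[name], dict) and isinstance(value, dict):
            # Object-typed options: override only the keys that were supplied.
            options[name] = {**options[name], **value}
        else:
            options[name] = value
    # Sketch-level check: debug output is described only for the json format.
    if options["debug"] and options["response_format"] != "json":
        raise ValueError("debug requires response_format='json'")
    return options


# Example: render JavaScript, wait for an element, and use a narrow viewport.
example = build_options(
    js_render=True,
    mode="domLoaded",
    waitForSelector="#content",
    viewport={"width": 390},
)
print(json.dumps(example, indent=2))
```

Because object-typed options are merged key by key, passing `viewport={"width": 390}` keeps the default `height` of 800 instead of dropping it.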