Realtime Mode

The Scraper provides a real-time mode that returns results directly in the HTTP response, so you don't need to set up a separate callback API. The API responds within a maximum timeout of 150 seconds. To use real-time mode, provide the URL of the web page you want to scrape in the payload. To customize the request, include a 'configurations' object in the payload.

POST
https://scraper.geonode.com/api/scraper/scrape/realtime

Parameters

  • Name
    url
    Type
    string
    Description

    (Required) The target website's URL to scrape.

  • Name
    configurations.js_render
    Type
    boolean
    Description

    Whether to run the JavaScript in the target page. Defaults to false.

  • Name
    configurations.is_json_response
    Type
    boolean
    Description

    Use this for URLs that return API responses and not documents. Defaults to false.

  • Name
    configurations.block_resources
    Type
    boolean
    Description

    Whether to block images and CSS on the page you want to scrape. Defaults to true.

  • Name
    configurations.keep_headers
    Type
    boolean
    Description

    Whether to forward particular headers to the webpage, as well as other headers generated by the Scraper. Defaults to false.

  • Name
    configurations.debug
    Type
    boolean
    Description

    When set to true and the response_format is set to json, returns information about the requests. Defaults to false.

  • Name
    configurations.response_format
    Type
    'html' | 'json'
    Description

    Can be either html or json. If set to html, returns the HTML of the page at the end of the request. If set to json, returns more information, such as headers, HTML, debug info (if set to true), and usage bandwidth. Defaults to 'html'.

  • Name
    configurations.mode
    Type
    'SPA' | 'longPolling' | 'domLoaded' | 'documentLoaded' | 'load'
    Description

    Specifies the method the Scraper will use to handle the request. Supported methods are:

    • SPA: comes in handy for SPAs that load resources with fetch requests.
    • longPolling: comes in handy for pages that do long-polling or any other side activity.
    • domLoaded: considers the request to be finished when the DOMContentLoaded event is fired.
    • documentLoaded: considers the request to be finished when the Scraper gets the document HTML response and immediately returns the HTML.
    • load: considers the request to be finished when the load event is fired.

    Defaults to 'load'.

  • Name
    configurations.waitForSelector
    Type
    string
    Description

    A CSS selector to wait for in the DOM.

  • Name
    configurations.device_type
    Type
    'desktop' | 'mobile' | 'tablet'
    Description

    Controls the device type the request will be sent from.

  • Name
    configurations.country_code
    Type
    ISO 3166-1 alpha-2
    Description

    Specifies the proxy geolocation.

  • Name
    configurations.cookies
    Type
    {name: string, value: string, domain: string}[]
    Description

    Allows you to pass custom cookies to the webpage you want to scrape. Defaults to [].

  • Name
    configurations.localStorage
    Type
    <localStorageKey | localStorageValue>
    Description

    Allows you to set custom local storage data for the webpage you want to scrape. Defaults to {}.

  • Name
    configurations.collect_data_from_requests
    Type
    boolean
    Description

    Allows you to collect additional data from the responses to 'fetch' and 'xhr' requests made inside the target page.

  • Name
    configurations.HTMLMinifier
    Type
    object
    Description

    Allows you to set the HTML minifier object to minify the HTML response. Defaults to { "useMinifier": false }. For additional information, see here.

  • Name
    configurations.screenshot
    Type
    object
    Description

    Use this to get a screenshot of the page. For additional information, see the Screenshot section.

  • Name
    configurations.optimizations
    Type
    object
    Description

    Use this to optimize the bandwidth and speed of the request. For additional information, see the Optimizations section.

  • Name
    configurations.retries
    Type
    object
    Description

    Use this to control the number of retries. For additional information, see the Retries section.

  • Name
    configurations.proxy
    Type
    object
    Description

    Use this to restrict the Scraper to residential proxies only. For additional information, see the Proxy section.

  • Name
    configurations.js_scenario
    Type
    object
    Description

    This allows you to interact with the target page. The js_scenario object can only contain the "actions" property. For additional information, see here.

  • Name
    configurations.viewport
    Type
    object
    Description

    Use this to set the browser's viewport. For additional information, see the Viewport section.
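
Several of the parameters above accept only a fixed set of values, so it can be useful to check a 'configurations' object on the client before sending it. The following is a minimal sketch, not part of the Scraper itself: the function name `find_invalid_options` and the wording of its messages are illustrative, and only the allowed values are taken from the parameter list.

```python
# Allowed values copied from the parameter list above.
ALLOWED_VALUES = {
    "response_format": {"html", "json"},
    "mode": {"SPA", "longPolling", "domLoaded", "documentLoaded", "load"},
    "device_type": {"desktop", "mobile", "tablet"},
}


def find_invalid_options(configurations):
    """Return a list of human-readable problems found in a configurations dict.

    Hypothetical client-side helper; the Scraper may report errors differently.
    """
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in configurations and configurations[key] not in allowed:
            problems.append(
                f"{key}={configurations[key]!r} is not one of {sorted(allowed)}"
            )
    # The docs state that debug information is returned only when
    # response_format is 'json' (which is not the default).
    wants_debug = configurations.get("debug", False)
    if wants_debug and configurations.get("response_format", "html") != "json":
        problems.append("debug requires response_format to be 'json'")
    return problems


print(find_invalid_options({"mode": "spa", "debug": True}))
```

The values are case-sensitive in this sketch, so "spa" is reported while "SPA" passes.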

Example Request Body

{
    "url": "https://geonode.com/",
    "configurations": {
        "js_render": true,
        "is_json_response": false,
        "keep_headers": false,
        "debug": false,
        "block_resources": false,
        "response_format": "json",
        "mode": "SPA",
        "waitForSelector": "#buttonId",
        "device_type": "desktop",
        "country_code": "tr",
        "cookies": [ ... ],
        "localStorage": { ... },
        "HTMLMinifier": {
            "useMinifier": false
        },
        "collect_data_from_requests": true,
        "optimizations": { ... },
        "retries": { ... },
        "proxy": { ... },
        "screenshot": { ... },
        "viewport": { ... },
        "js_scenario": {
            "actions": [ ... ]
        }
    }
}
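
As a rough illustration of how such a body might be sent, the sketch below builds the POST request with Python's standard library. It is not an official client. The helper names are hypothetical, and authentication is not covered in this section, so any credentials are assumed to be passed in through `extra_headers` by the caller. The client-side timeout of 160 seconds is also an assumption, chosen to sit just above the documented 150-second maximum.

```python
import json
import urllib.request

REALTIME_ENDPOINT = "https://scraper.geonode.com/api/scraper/scrape/realtime"


def build_realtime_request(target_url, configurations=None, extra_headers=None):
    """Build (but do not send) a POST request for the real-time endpoint."""
    payload = {"url": target_url}
    if configurations:
        payload["configurations"] = configurations
    headers = {"Content-Type": "application/json"}
    # Assumption: authentication headers, if required, are supplied here.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        REALTIME_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_realtime_request(request, timeout=160):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request performs no network I/O.
request = build_realtime_request(
    "https://geonode.com/",
    {"js_render": True, "response_format": "json", "mode": "SPA"},
)
print(request.get_method(), request.full_url)
```

Calling `send_realtime_request(request)` would perform the actual scrape. With "response_format" set to "json", the returned text could then be parsed with `json.loads`.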

Screenshot

The screenshot object can contain the following properties:

  • Name
    use
    Type
    boolean
    Description

    If set to true, returns the screenshot of the page.

  • Name
    options.type
    Type
    'png' | 'jpeg' | 'webp'
    Description

    The data type of the image.

  • Name
    options.fullPage
    Type
    boolean
    Description

    If set to true, captures the full page.

  • Name
    options.omitBackground
    Type
    boolean
    Description

    If set to true, removes the default white background to enable the capture of screenshots with transparency.

  • Name
    options.encoding
    Type
    'base64' | 'binary'
    Description

    The type of encoding of the image.

  • Name
    options.captureBeyondViewport
    Type
    boolean
    Description

    If set to true, captures beyond the viewport.

  • Name
    options.fromSurface
    Type
    boolean
    Description

    If set to true, captures from the surface, rather than the view.

Example Screenshot Object

{
    "use": true,
    "options": {
        "type": "png",
        "fullPage": false,
        "omitBackground": true,
        "encoding": "binary",
        "captureBeyondViewport": false,
        "fromSurface": false
    }
}

Defaults to

{
    "use": false,
    "options": { 
      "type": "jpeg", 
      "fullPage": true, 
      "omitBackground": false, 
      "encoding": "binary", 
      "captureBeyondViewport": true, 
      "fromSurface": true
    }
}
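
One way to work with these defaults on the client is to start from a copy of them and override only the options you care about, then send the complete object. The sketch below does that and rejects values outside the documented sets. The helper `screenshot_config` is hypothetical. Sending the full object is a deliberate choice here, since this section does not say how the Scraper treats a partially specified 'options' object.

```python
import copy

# Documented defaults for the screenshot object.
SCREENSHOT_DEFAULTS = {
    "use": False,
    "options": {
        "type": "jpeg",
        "fullPage": True,
        "omitBackground": False,
        "encoding": "binary",
        "captureBeyondViewport": True,
        "fromSurface": True,
    },
}

IMAGE_TYPES = {"png", "jpeg", "webp"}
ENCODINGS = {"base64", "binary"}


def screenshot_config(use=True, **options):
    """Return a complete screenshot object: defaults plus the given overrides."""
    unknown = set(options) - set(SCREENSHOT_DEFAULTS["options"])
    if unknown:
        raise ValueError(f"unknown screenshot options: {sorted(unknown)}")
    config = copy.deepcopy(SCREENSHOT_DEFAULTS)  # never mutate the defaults
    config["use"] = use
    config["options"].update(options)
    if config["options"]["type"] not in IMAGE_TYPES:
        raise ValueError(f"type must be one of {sorted(IMAGE_TYPES)}")
    if config["options"]["encoding"] not in ENCODINGS:
        raise ValueError(f"encoding must be one of {sorted(ENCODINGS)}")
    return config


print(screenshot_config(type="png", omitBackground=True, encoding="base64"))
```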

Optimizations

The optimizations object can contain the following properties:

  • Name
    skipDomains
    Type
    array
    Description

    The Geonode Scraper will not load requests for the domains in this list. Entries are matched using includes() logic, so an entry such as "geonode" matches any request that contains that string.

  • Name
    loadOnlySameOriginRequests
    Type
    boolean
    Description

    If set to true, only requests from the same origin will be loaded, which can be used to optimize the bandwidth and speed of the request.

Example Optimizations Object

{
    "skipDomains": [ "geonode" ],
    "loadOnlySameOriginRequests": false
}

Defaults to

{
    "skipDomains": [],
    "loadOnlySameOriginRequests": true
}
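
To make the two options more concrete, the sketch below approximates how they might decide whether a sub-request is loaded. This is an illustration only, not the Scraper's implementation. It assumes that includes() means a plain substring test against the request URL (as with JavaScript's String.prototype.includes) and that "same origin" means the same scheme, host, and port. The function names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def would_load(page_url, request_url, optimizations):
    """Approximate whether a sub-request would be loaded under the given options."""
    # Assumption: skipDomains entries are substring-matched against the URL.
    skip_domains = optimizations.get("skipDomains", [])
    if any(fragment in request_url for fragment in skip_domains):
        return False
    # Documented default for loadOnlySameOriginRequests is true.
    if optimizations.get("loadOnlySameOriginRequests", True):
        return origin(page_url) == origin(request_url)
    return True


page = "https://geonode.com/"
print(would_load(page, "https://geonode.com/app.js", {}))
print(would_load(page, "https://cdn.example.com/lib.js", {}))
```

Under these assumptions, a short entry such as "geonode" would also skip requests to unrelated hosts whose URLs happen to contain that string, so more specific entries are safer.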

Retries

The retries object can contain the following properties:

  • Name
    useRetries
    Type
    boolean
    Description

    If set to true, retries will be used.

  • Name
    maxRetries
    Type
    number
    Description

    The maximum number of retries.

Example Retries Object

{
    "useRetries": true,
    "maxRetries": 5
}

Defaults to

{
    "useRetries": true,
    "maxRetries": 2
}

Proxy

The proxy object can contain the following properties:

  • Name
    useOnlyResidential
    Type
    boolean
    Description

    If set to true, only residential proxies will be used.

Example Proxy Object

{
    "useOnlyResidential": true
}

Defaults to

{
    "useOnlyResidential": false
}

Viewport

The viewport object can contain the following properties:

  • Name
    width
    Type
    number
    Description

    The page width in pixels.

  • Name
    height
    Type
    number
    Description

    The page height in pixels.

  • Name
    deviceScaleFactor
    Type
    number
    Description

    The device scale factor, which can be thought of as the device pixel ratio (DPR).

  • Name
    isMobile
    Type
    boolean
    Description

    If set to true, the meta viewport tag will be taken into account when rendering the page.

  • Name
    hasTouch
    Type
    boolean
    Description

    If set to true, the viewport will support touch events.

  • Name
    isLandscape
    Type
    boolean
    Description

    If set to true, the viewport will be in landscape mode.

Example Viewport Object

{
    "width": 800,
    "height": 640,
    "deviceScaleFactor": 0.5,
    "isMobile": true,
    "hasTouch": true,
    "isLandscape": true
}

Defaults to

{
    "width": 1280,
    "height": 800,
}
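
The viewport object can also be modeled on the client so that only explicitly set fields are sent. The sketch below is one possible approach. The `Viewport` class and its methods are hypothetical, and only width and height have documented defaults, so the other fields are left unset unless provided. The `device_pixels` method applies the usual device pixel ratio relationship (physical pixels = CSS pixels × DPR) and assumes a ratio of 1 when deviceScaleFactor is not set.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Viewport:
    width: int = 1280   # documented default
    height: int = 800   # documented default
    deviceScaleFactor: Optional[float] = None
    isMobile: Optional[bool] = None
    hasTouch: Optional[bool] = None
    isLandscape: Optional[bool] = None

    def to_payload(self):
        """Return a dict for 'configurations.viewport', omitting unset fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}

    def device_pixels(self):
        """Return (width, height) in physical pixels, assuming DPR 1 if unset."""
        scale = 1.0 if self.deviceScaleFactor is None else self.deviceScaleFactor
        return (round(self.width * scale), round(self.height * scale))


example = Viewport(width=800, height=640, deviceScaleFactor=0.5, isMobile=True)
print(example.to_payload())
print(example.device_pixels())
```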