Realtime Mode
The Scraper provides a realtime mode that returns results in the response to your request, so you don't need to set up a separate callback API. The API responds within a maximum timeout of 150 seconds. To use realtime mode, provide the URL of the web page you want to scrape in the payload. To customize the request, include a 'configurations' object in the payload.
https://scraper.geonode.com/api/scraper/scrape/realtime
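As a minimal sketch of a realtime call, the snippet below builds the payload described above and sends it to the endpoint. The POST method, the JSON content type, and the absence of authentication headers are assumptions, since this section does not state them. `build_realtime_request` and `scrape_realtime` are illustrative helper names, not part of the Scraper API.

```python
import json
import urllib.request

REALTIME_ENDPOINT = "https://scraper.geonode.com/api/scraper/scrape/realtime"


def build_realtime_request(url, configurations=None):
    """Build (but do not send) a request for the realtime endpoint."""
    payload = {"url": url}
    if configurations:
        payload["configurations"] = configurations
    return urllib.request.Request(
        REALTIME_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the endpoint accepts a JSON body sent via POST.
        # Add whatever authentication headers your account requires.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape_realtime(url, configurations=None, timeout=150):
    """Send the request and return the raw response body as text."""
    request = build_realtime_request(url, configurations)
    # The default timeout mirrors the documented 150-second maximum.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes it possible to inspect or log the payload without touching the network.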
Parameters
- Name
url
- Type
- string
- Description
(Required) The target website's URL to scrape.
- Name
configurations.js_render
- Type
- boolean
- Description
Whether to run the JavaScript in the target page. Defaults to false.
- Name
configurations.is_json_response
- Type
- boolean
- Description
Use this for URLs that return API responses and not documents. Defaults to false.
- Name
configurations.block_resources
- Type
- boolean
- Description
Whether to block images and CSS on the page you want to scrape. Defaults to true.
- Name
configurations.keep_headers
- Type
- boolean
- Description
Whether to forward particular headers to the webpage, as well as other headers generated by the Scraper. Defaults to false.
- Name
configurations.debug
- Type
- boolean
- Description
When set to true and the response_format is set to json, returns information about the requests. Defaults to false.
- Name
configurations.response_format
- Type
- 'html' | 'json'
- Description
Can be either html or json. If set to html, returns the HTML of the page at the end of the request. If set to json, returns more information, such as headers, HTML, debug info (if debug is set to true), and bandwidth usage. Defaults to 'html'.
- Name
configurations.mode
- Type
- 'SPA' | 'longPolling' | 'domLoaded' | 'documentLoaded' | 'load'
- Description
Specifies the method the Scraper will use to handle the request. Supported methods are:
- SPA: comes in handy for SPAs that load resources with fetch requests.
- longPolling: comes in handy for pages that do long-polling or any other side activity.
- domLoaded: considers the request to be finished when the DOMContentLoaded event is fired.
- documentLoaded: considers the request to be finished when the Scraper gets the document HTML response and immediately returns the HTML.
- load: considers the request to be finished when the load event is fired.
Defaults to 'load'.
- Name
configurations.waitForSelector
- Type
- string
- Description
A CSS selector to wait for in the DOM.
- Name
configurations.device_type
- Type
- 'desktop' | 'mobile' | 'tablet'
- Description
Controls the device type the request will be sent from.
- Name
configurations.country_code
- Type
- ISO 3166-1 alpha-2
- Description
Specifies the proxy geolocation.
- Name
configurations.cookies
- Type
- {name: string, value: string, domain: string}[]
- Description
Allows you to pass custom cookies to the webpage you want to scrape. Defaults to [].
- Name
configurations.localStorage
- Type
- <localStorageKey | localStorageValue>
- Description
Allows you to set custom local storage data for the webpage you want to scrape. Defaults to {}.
- Name
configurations.collect_data_from_requests
- Type
- boolean
- Description
Allows you to load additional data from 'fetch' and 'xhr' responses made inside the target page.
- Name
configurations.HTMLMinifier
- Type
- object
- Description
Allows you to set the HTML minifier object to minify the HTML response. Defaults to { "useMinifier": false }. For additional information, see here.
- Name
configurations.screenshot
- Type
- object
- Description
Use this to get a screenshot of the page. For additional information, see the Screenshot section below.
- Name
configurations.optimizations
- Type
- object
- Description
Use this to optimize the bandwidth and speed of the request. For additional information, see the Optimizations section below.
- Name
configurations.retries
- Type
- object
- Description
Use this to control the number of retries. For additional information, see the Retries section below.
- Name
configurations.proxy
- Type
- object
- Description
Use this to restrict the Scraper to residential proxies only. For additional information, see the Proxy section below.
- Name
configurations.js_scenario
- Type
- object
- Description
This allows you to interact with the target page. The js_scenario object can only contain the "actions" property. For additional information, see here.
- Name
configurations.viewport
- Type
- object
- Description
Use this to set the browser's viewport. For additional information, see the Viewport section below.
Example Request Body
{
  "url": "https://geonode.com/",
  "configurations": {
    "js_render": true,
    "is_json_response": false,
    "keep_headers": false,
    "debug": false,
    "block_resources": false,
    "response_format": "json",
    "mode": "SPA",
    "waitForSelector": "#buttonId",
    "device_type": "desktop",
    "country_code": "tr",
    "cookies": [ ... ],
    "localStorage": { ... },
    "HTMLMinifier": {
      "useMinifier": false
    },
    "collect_data_from_requests": true,
    "optimizations": { ... },
    "retries": { ... },
    "proxy": { ... },
    "screenshot": { ... },
    "viewport": { ... },
    "js_scenario": {
      "actions": [ ... ]
    }
  }
}
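The enumerated values and boolean flags in the parameter list can be checked on the client before a request is sent. The following is a minimal sketch of such a check. `validate_configurations` is a hypothetical helper, and nothing here reflects how the Scraper itself validates input; it only encodes the types and allowed values listed in the Parameters section.

```python
ALLOWED_VALUES = {
    "response_format": ("html", "json"),
    "mode": ("SPA", "longPolling", "domLoaded", "documentLoaded", "load"),
    "device_type": ("desktop", "mobile", "tablet"),
}

BOOLEAN_KEYS = (
    "js_render",
    "is_json_response",
    "block_resources",
    "keep_headers",
    "debug",
    "collect_data_from_requests",
)


def validate_configurations(configurations):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key, allowed in ALLOWED_VALUES.items():
        if key in configurations and configurations[key] not in allowed:
            problems.append(f"{key} must be one of {list(allowed)}")
    for key in BOOLEAN_KEYS:
        if key in configurations and not isinstance(configurations[key], bool):
            problems.append(f"{key} must be a boolean")
    country = configurations.get("country_code")
    if country is not None:
        # ISO 3166-1 alpha-2 codes are exactly two ASCII letters.
        is_valid = (
            isinstance(country, str)
            and len(country) == 2
            and country.isascii()
            and country.isalpha()
        )
        if not is_valid:
            problems.append("country_code must be a two-letter ISO 3166-1 alpha-2 code")
    # Per the debug description, debug output requires response_format "json".
    if configurations.get("debug") and configurations.get("response_format", "html") != "json":
        problems.append("debug only has an effect when response_format is 'json'")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.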
Screenshot
The screenshot object can contain the following properties:
- Name
use
- Type
- boolean
- Description
If set to true, returns the screenshot of the page.
- Name
options.type
- Type
- 'png' | 'jpeg' | 'webp'
- Description
The data type of the image.
- Name
options.fullPage
- Type
- boolean
- Description
If set to true, captures the full page.
- Name
options.omitBackground
- Type
- boolean
- Description
If set to true, removes the default white background to enable the capture of screenshots with transparency.
- Name
options.encoding
- Type
- 'base64' | 'binary'
- Description
The type of encoding of the image.
- Name
options.captureBeyondViewport
- Type
- boolean
- Description
If set to true, captures beyond the viewport.
- Name
options.fromSurface
- Type
- boolean
- Description
If set to true, captures from the surface, rather than the view.
Example Screenshot Object
{
  "use": true,
  "options": {
    "type": "png",
    "fullPage": false,
    "omitBackground": true,
    "encoding": "binary",
    "captureBeyondViewport": false,
    "fromSurface": false
  }
}
Defaults to
{
  "use": false,
  "options": {
    "type": "jpeg",
    "fullPage": true,
    "omitBackground": false,
    "encoding": "binary",
    "captureBeyondViewport": true,
    "fromSurface": true
  }
}
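To see which settings a partial screenshot object would yield, the sketch below overlays it on the defaults listed above. It assumes the service fills omitted keys from those defaults, which this section implies but does not state outright. `effective_screenshot` is a hypothetical helper name.

```python
SCREENSHOT_DEFAULTS = {
    "use": False,
    "options": {
        "type": "jpeg",
        "fullPage": True,
        "omitBackground": False,
        "encoding": "binary",
        "captureBeyondViewport": True,
        "fromSurface": True,
    },
}


def effective_screenshot(partial):
    """Overlay a partial screenshot object on the documented defaults."""
    merged = {
        "use": partial.get("use", SCREENSHOT_DEFAULTS["use"]),
        # Copy the defaults so the module-level constant is never mutated.
        "options": dict(SCREENSHOT_DEFAULTS["options"]),
    }
    merged["options"].update(partial.get("options", {}))
    return merged
```

Note that JPEG has no alpha channel, so a transparent capture with omitBackground needs a type such as png or webp.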
Optimizations
The optimizations object can contain the following properties:
- Name
skipDomains
- Type
- array
- Description
The Geonode Scraper will not load requests for the domains in this list. Matching is based on includes() logic.
- Name
loadOnlySameOriginRequests
- Type
- boolean
- Description
If set to true, only requests from the same origin will be loaded, which can be used to optimize the bandwidth and speed of the request.
Example Optimizations Object
{
  "skipDomains": [ "geonode" ],
  "loadOnlySameOriginRequests": false
}
Defaults to
{
  "skipDomains": [],
  "loadOnlySameOriginRequests": true
}
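The two optimization rules can be illustrated with a short sketch. It assumes that "includes() logic" means substring matching in the sense of JavaScript's String.prototype.includes(), and that the match is made against the full request URL; the section does not say whether the URL or only the hostname is tested. `is_skipped`, `origin`, and `is_same_origin` are hypothetical names used for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def is_skipped(request_url, skip_domains):
    """Substring match: any listed fragment appearing in the URL skips it."""
    return any(fragment in request_url for fragment in skip_domains)


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def is_same_origin(page_url, request_url):
    """True when both URLs share scheme, host, and port."""
    return origin(page_url) == origin(request_url)
```

Under these assumptions, a short fragment such as "geonode" matches every URL containing that text, including subdomains and paths, so more specific entries reduce the chance of skipping requests unintentionally.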
Retries
The retries object can contain the following properties:
- Name
useRetries
- Type
- boolean
- Description
If set to true, retries will be used.
- Name
maxRetries
- Type
- number
- Description
The maximum number of retries.
Example Retries Object
{
  "useRetries": true,
  "maxRetries": 5
}
Defaults to
{
  "useRetries": true,
  "maxRetries": 2
}
Proxy
The proxy object can contain the following properties:
- Name
useOnlyResidential
- Type
- boolean
- Description
If set to true, only residential proxies will be used.
Example Proxy Object
{
  "useOnlyResidential": true
}
Defaults to
{
  "useOnlyResidential": false
}
Viewport
The viewport object can contain the following properties:
- Name
width
- Type
- number
- Description
The page width in pixels.
- Name
height
- Type
- number
- Description
The page height in pixels.
- Name
deviceScaleFactor
- Type
- number
- Description
The device scale factor, which can be thought of as the device pixel ratio (DPR).
- Name
isMobile
- Type
- boolean
- Description
If set to true, the meta viewport tag will be taken into account when rendering the page.
- Name
hasTouch
- Type
- boolean
- Description
If set to true, the viewport will support touch events.
- Name
isLandscape
- Type
- boolean
- Description
If set to true, the viewport will be in landscape mode.
Example Viewport Object
{
  "width": 800,
  "height": 640,
  "deviceScaleFactor": 0.5,
  "isMobile": true,
  "hasTouch": true,
  "isLandscape": true
}
Defaults to
{
  "width": 1280,
  "height": 800
}
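A viewport object can be assembled from the documented defaults plus any overrides. The sketch below does that and applies a few client-side sanity checks. The checks (positive dimensions, boolean flags) are assumptions about sensible input rather than documented Scraper rules, and `make_viewport` is a hypothetical helper.

```python
VIEWPORT_DEFAULTS = {"width": 1280, "height": 800}
NUMERIC_FIELDS = ("width", "height", "deviceScaleFactor")
BOOLEAN_FIELDS = ("isMobile", "hasTouch", "isLandscape")


def _is_positive_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (
        not isinstance(value, bool)
        and isinstance(value, (int, float))
        and value > 0
    )


def make_viewport(**overrides):
    """Return the documented defaults overlaid with the given overrides."""
    unknown = set(overrides) - set(NUMERIC_FIELDS) - set(BOOLEAN_FIELDS)
    if unknown:
        raise ValueError(f"unknown viewport properties: {sorted(unknown)}")
    viewport = {**VIEWPORT_DEFAULTS, **overrides}
    for key in NUMERIC_FIELDS:
        if key in viewport and not _is_positive_number(viewport[key]):
            raise ValueError(f"{key} must be a positive number")
    for key in BOOLEAN_FIELDS:
        if key in viewport and not isinstance(viewport[key], bool):
            raise ValueError(f"{key} must be a boolean")
    return viewport
```

For example, `make_viewport(width=800, height=640, isMobile=True)` produces an object that can be placed under configurations.viewport in the request body.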