Overview
Welcome to the Geonode Scraper API! This overview concisely explains the various configuration options available to you. Each option is designed to give you flexibility and control over how the scraper interacts with web pages. Let's dive in:
- Name
js_render
- Type
- boolean
- default
- (default = false)
- Description
Determines if the scraper should execute JavaScript on the target page, which can be essential for dynamic websites.
- Name
is_json_response
- Type
- boolean
- default
- (default = false)
- Description
Ideal for URLs that yield API responses. When enabled, the scraper expects a JSON response instead of a typical web document.
- Name
block_resources
- Type
- boolean
- default
- (default = true)
- Description
Helps save bandwidth by blocking non-essential resources like images and CSS during scraping.
- Name
keep_headers
- Type
- boolean
- default
- (default = false)
- Description
Ensures specific headers are forwarded to the webpage, enhancing the scraper's interaction with the target site.
- Name
debug
- Type
- boolean
- default
- (default = false)
- Description
A diagnostic tool. When enabled with response_format set to 'json', it provides detailed insights about the scraping process.
- Name
response_format
- Type
- string (html, json)
- default
- (default = html)
- Description
Dictates the format of the scraper's output. 'html' provides the raw page content, while 'json' offers a structured response with additional details.
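As a minimal sketch of how debug and response_format relate, the helper below checks on the client side whether a set of options meets the documented condition for detailed debug output. The function name `debug_details_expected` is hypothetical and not part of the Geonode API; only the option names and defaults come from the list above.

```python
def debug_details_expected(options: dict) -> bool:
    """Return True when the options meet the documented condition for debug output."""
    # Fall back to the documented defaults when a key is missing.
    response_format = options.get("response_format", "html")
    debug = options.get("debug", False)
    return bool(debug) and response_format == "json"


print(debug_details_expected({"debug": True, "response_format": "json"}))  # True
print(debug_details_expected({"debug": True}))  # False, response_format defaults to html
```

A check like this can catch a request that turns on debug but leaves response_format at its html default before the request is sent.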
- Name
mode
- Type
- string (SPA, longPolling, domLoaded, documentLoaded, load)
- default
- (default = load)
- Description
Selects the page-loading strategy the scraper waits on before it returns a response, such as load or domLoaded.
- Name
waitForSelector
- Type
- string
- default
- (default = null)
- Description
A powerful feature that instructs the scraper to wait for a specific DOM element to appear before concluding the request. Useful for pages with delayed content loading.
- Name
device_type
- Type
- string
- default
- (default = null)
- Description
Emulates a specific device type, so you receive the content as it appears on that kind of device.
- Name
country_code
- Type
- string
- default
- (default = null)
- Description
Designates a specific country for proxy usage, ensuring content is accessed as viewed from that region.
- Name
cookies
- Type
- object[]
- default
- (default = [])
- Description
Enables setting browser-specific cookies, which can be crucial for accessing certain web content.
- Name
localStorage
- Type
- object
- default
- (default = [])
- Description
Sets specific local storage data in the browser, further customizing the scraping environment.
- Name
HTMLMinifier
- Type
- boolean
- default
- (default = useMinifier: false)
- Description
If enabled, the scraper will return a minified version of the HTML, reducing the response size.
- Name
collect_data_from_requests
- Type
- boolean
- default
- (default = false)
- Description
Captures additional data from 'fetch' and 'xhr' responses made by the webpage, ensuring dynamically loaded content is also retrieved.
- Name
optimizations
- Type
- object
- default
- (default = { skipDomains: [], loadOnlySameOriginRequests: true })
- Description
Enhancements to optimize bandwidth and speed. Can exclude requests to specific domains or prioritize same-origin requests.
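To illustrate the shape of the optimizations object, here is a small sketch that builds it from the two documented keys. The helper name `build_optimizations` is hypothetical, and treating skipDomains entries as plain hostname strings is an assumption, since the entry format is not specified here.

```python
import json


def build_optimizations(skip_domains=(), same_origin_only: bool = True) -> dict:
    """Build the optimizations object using the documented keys and defaults."""
    return {
        "optimizations": {
            # Assumption: each entry is a hostname string.
            "skipDomains": list(skip_domains),
            "loadOnlySameOriginRequests": same_origin_only,
        }
    }


# "ads.example.com" is a placeholder domain used only for illustration.
print(json.dumps(build_optimizations(["ads.example.com"]), indent=2))
```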
- Name
retries
- Type
- object
- default
- (default = { useRetries: true, maxRetries: 2 })
- Description
Provides resilience by controlling the number of retry attempts if a scraping request encounters issues.
- Name
proxy
- Type
- object
- default
- (default = { useOnlyResidential: false })
- Description
Specifies the proxy type. When useOnlyResidential is set to true, the scraper prioritizes residential proxies, emulating typical residential users.
- Name
screenshot
- Type
- object
- default
- (default = { use: false, options: {} })
- Description
Configuration settings for capturing visual snapshots of the web pages being scraped.
- Name
viewport
- Type
- object
- default
- (default = { width: 1280, height: 800 })
- Description
Customizes the virtual browser window's dimensions and characteristics, ensuring web pages are rendered as desired.
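The following sketch combines the viewport and screenshot objects, assuming the options are assembled as a Python dict. The helper name `build_capture_options` is hypothetical. The keys accepted inside screenshot.options are not listed in this overview, so the sketch leaves that object empty rather than guessing.

```python
def build_capture_options(width: int = 1280, height: int = 800,
                          take_screenshot: bool = False) -> dict:
    """Build viewport and screenshot settings from the documented defaults."""
    if width <= 0 or height <= 0:
        raise ValueError("viewport width and height must be positive")
    return {
        "viewport": {"width": width, "height": height},
        # The contents of "options" are not documented here, so it stays empty.
        "screenshot": {"use": take_screenshot, "options": {}},
    }


print(build_capture_options(width=390, height=844, take_screenshot=True))
```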
- Name
js_scenario
- Type
- object
- default
- (default = { actions: [] })
- Description
Enables scripted interactions with the target page, emulating user behaviors like clicks, scrolls, and form inputs.
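To tie the options together, the sketch below collects the documented defaults in one dict and merges caller overrides on top of them. This is an illustration under stated assumptions, not Geonode's own client code: `DEFAULTS` and `merge_options` are hypothetical names, and localStorage and HTMLMinifier are left out because their listed defaults do not match their listed types.

```python
import copy
import json

# Defaults transcribed from the option list above.
DEFAULTS = {
    "js_render": False,
    "is_json_response": False,
    "block_resources": True,
    "keep_headers": False,
    "debug": False,
    "response_format": "html",
    "mode": "load",
    "waitForSelector": None,
    "device_type": None,
    "country_code": None,
    "cookies": [],
    "collect_data_from_requests": False,
    "optimizations": {"skipDomains": [], "loadOnlySameOriginRequests": True},
    "retries": {"useRetries": True, "maxRetries": 2},
    "proxy": {"useOnlyResidential": False},
    "screenshot": {"use": False, "options": {}},
    "viewport": {"width": 1280, "height": 800},
    "js_scenario": {"actions": []},
}


def merge_options(overrides: dict, defaults: dict = DEFAULTS) -> dict:
    """Return a copy of the defaults with overrides applied.

    Object-valued options are merged one level deep, so overriding a
    single nested key keeps its documented siblings.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


options = merge_options({"js_render": True, "retries": {"maxRetries": 5}})
print(json.dumps(options["retries"]))  # {"useRetries": true, "maxRetries": 5}
```

Copying the defaults before merging keeps `DEFAULTS` unchanged between calls, so each request starts from the documented values.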