# Optimizations

```js
optimizations: object;
```

## What's This?
The `optimizations` option acts as a filter for your Scraper. It tells the Scraper exactly which requests to load and which to skip, ensuring it only fetches the data you're interested in, saving time and resources.
## Properties
- `skipDomains`: A list of domains for which the Geonode Scraper will not load requests. A request is skipped when its domain matches an entry in this list (a substring match, using `includes()`). Default: `[]` (empty array).
- `loadOnlySameOriginRequests`: When set to `true`, the Scraper only loads requests from the same origin as the website you're targeting, ignoring any external sources. Default: `true`.
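Taken together, the defaults above are equivalent to passing the following configuration explicitly (written out here only for illustration; you can omit any option you aren't changing):

```javascript
// The default optimizations, spelled out explicitly.
const defaultConfig = {
  optimizations: {
    skipDomains: [],                  // no domains are skipped
    loadOnlySameOriginRequests: true, // cross-origin requests are ignored
  },
};
```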
## How Does it Work?

If you set `optimizations = { skipDomains: ["example.com"], loadOnlySameOriginRequests: true }`, the Scraper skips every request to "example.com" and only loads requests from the main website you're scraping.
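To make the filtering concrete, here is a hypothetical sketch of the skip logic described above. `shouldSkip` is not part of the API; it only illustrates the `includes()`-based domain match and the same-origin check:

```javascript
// Hypothetical helper mirroring the documented filtering behavior.
// A request is skipped if its hostname includes any skipDomains entry,
// or if loadOnlySameOriginRequests is true and the origin differs.
function shouldSkip(requestUrl, pageUrl, { skipDomains, loadOnlySameOriginRequests }) {
  const request = new URL(requestUrl);
  const page = new URL(pageUrl);
  if (skipDomains.some((domain) => request.hostname.includes(domain))) return true;
  if (loadOnlySameOriginRequests && request.origin !== page.origin) return true;
  return false;
}

const opts = { skipDomains: ['example.com'], loadOnlySameOriginRequests: true };
shouldSkip('https://ads.example.com/pixel.js', 'https://mysite.com/', opts); // skipped: domain match
shouldSkip('https://mysite.com/app.js', 'https://mysite.com/', opts);       // loaded: same origin
```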
## When Should I Use This?
Use `optimizations` when you want your scraping to be more precise, especially if you're not interested in data from ads or other external sources.
## Methods

### setLoadOnlySameOriginRequests(boolean)

**Configuration Name:** `optimizations.loadOnlySameOriginRequests`
#### Initializing Scraper

```javascript
const GeonodeScraperApi = require('geonode-scraper-api');
const scraper = new GeonodeScraperApi('<Your_username>', '<Your_password>');
```
#### Using Method

```javascript
scraper.setLoadOnlySameOriginRequests(true);
scraper.scrape('https://example.com/');
```
#### Using Configuration Object

```javascript
const config = {
  optimizations: {
    loadOnlySameOriginRequests: true,
  },
};

scraper.scrape('https://example.com/', config);
```
### addSkipDomains(string[])

**Configuration Name:** `optimizations.skipDomains`
#### Initializing Scraper

```javascript
const GeonodeScraperApi = require('geonode-scraper-api');
const scraper = new GeonodeScraperApi('<Your_username>', '<Your_password>');
```
#### Using Method

```javascript
scraper.addSkipDomains(['example.com']);
scraper.scrape('https://example.com/');
```
#### Using Configuration Object

```javascript
const config = {
  optimizations: {
    skipDomains: ['example.com'],
  },
};

scraper.scrape('https://example.com/', config);
```
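Both options can also be combined in a single configuration object. A sketch (the domain names below are placeholders, not part of the API):

```javascript
// Combined configuration: skip specific external domains while still
// allowing other cross-origin requests to load.
const config = {
  optimizations: {
    skipDomains: ['ads.example.net', 'tracker.example.org'], // placeholder domains
    loadOnlySameOriginRequests: false,
  },
};

// Passed as the second argument to scrape(), as in the examples above:
// scraper.scrape('https://example.com/', config);
```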