Optimizations

optimizations: object;

What's This?

The optimizations option acts as a filter for your Scraper. It tells the Scraper which requests to load and which to skip, so it fetches only the data you're interested in, saving time and resources.

Properties

  • skipDomains: A list of domains for which the Geonode Scraper will not load requests. A request is skipped if its URL contains any entry from this list (substring matching, as with includes()). Default is an empty array ([]).

  • loadOnlySameOriginRequests: When set to true, the Scraper only loads requests from the same origin as the website you're targeting, ignoring external sources such as third-party scripts and trackers. Default is true.
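To illustrate the skipDomains matching described above, here is a small sketch of how the documented includes() logic could behave. The shouldSkip helper is hypothetical, written only for illustration; it is not part of the Geonode API.

```javascript
// Hypothetical sketch of the documented includes() matching:
// a request is skipped when its URL contains any skipDomains
// entry as a substring.
function shouldSkip(requestUrl, skipDomains) {
  return skipDomains.some((domain) => requestUrl.includes(domain));
}

shouldSkip('https://ads.example.com/banner.js', ['example.com']); // true
shouldSkip('https://cdn.other.net/lib.js', ['example.com']);      // false
```

Note that substring matching also catches subdomains, so 'example.com' skips requests to 'ads.example.com' as well.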

How Does it Work?

If you set optimizations to { skipDomains: ["example.com"], loadOnlySameOriginRequests: true }, the Scraper skips every request to "example.com" and only loads requests from the main website you're scraping.
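As a quick sketch, the configuration described above can be written as an object and passed to the scrape call (the scraper variable and scrape() signature follow the Methods section of this page):

```javascript
// Both optimizations combined in one configuration object.
const config = {
  optimizations: {
    skipDomains: ['example.com'],        // skip any request whose URL contains this
    loadOnlySameOriginRequests: true,    // ignore cross-origin requests
  },
};

// Passed as the second argument to scrape():
// scraper.scrape('https://example.com/', config);
```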

When Should I Use This?

Use optimizations when you want your scraping to be more precise, especially if you're not interested in data from ads or other external sources.

Methods

setLoadOnlySameOriginRequests(boolean) 

Configuration Name: optimizations.loadOnlySameOriginRequests 

Initializing Scraper

const GeonodeScraperApi = require('geonode-scraper-api');
const scraper = new GeonodeScraperApi('<Your_username>', '<Your_password>');

Using Method

scraper.setLoadOnlySameOriginRequests(true);
scraper.scrape('https://example.com/');

Using Configuration Object

const config = {
  optimizations: {
    loadOnlySameOriginRequests: true,
  },
};
scraper.scrape('https://example.com/', config);

addSkipDomains(string[]) 

Configuration Name: optimizations.skipDomains 

Initializing Scraper

const GeonodeScraperApi = require('geonode-scraper-api');
const scraper = new GeonodeScraperApi('<Your_username>', '<Your_password>');

Using Method

scraper.addSkipDomains(['example.com']);
scraper.scrape('https://example.com/');

Using Configuration Object

const config = {
  optimizations: {
    skipDomains: ['example.com'],
  },
};
scraper.scrape('https://example.com/', config);