Extract From Selectors
setDataToExtract: Object | {};
What's This?
The extractFromSelectors configuration property lets you define a dictionary in which each key is the name you want to give to a piece of extracted data and each value is the CSS or XPath selector used to scrape it. This lets you extract several pieces of information from a webpage in one structured result.
Properties
- key: The name you want to give to the extracted data (e.g., 'title', 'priceWhole').
- value: The CSS or XPath selector that targets the HTML element containing the data you want to extract.
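Because the dictionary is a plain object of names mapped to selector strings, it can be checked before it is handed to the scraper. The helper below is a minimal sketch, not part of geonode-scraper-api: the name validateSelectors and the rules it applies (plain object, non-empty string values) are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of geonode-scraper-api): checks that a
// selector dictionary has the key/value shape described above.
function validateSelectors(selectors) {
  if (selectors === null || typeof selectors !== 'object' || Array.isArray(selectors)) {
    return ['selectors must be a plain object'];
  }
  const problems = [];
  for (const [name, selector] of Object.entries(selectors)) {
    // Each value is expected to be a non-empty CSS or XPath selector string.
    if (typeof selector !== 'string' || selector.trim() === '') {
      problems.push(`"${name}" must map to a non-empty selector string`);
    }
  }
  return problems;
}

console.log(validateSelectors({ title: '.title', priceWhole: '' }));
```

An empty array means every key maps to a usable selector string; the helper does not check that the selector itself is syntactically valid.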
How Does it Work?
By default, the scraper captures the entire webpage content. With extractFromSelectors, you can specify what data to extract:
- Default (extractFromSelectors={}): The scraper retrieves the entire content of the webpage.
- Example: If you set extractFromSelectors={title: '.title', priceWhole: '.priceWhole'}, the scraper extracts the data matching these selectors and returns it in a dictionary keyed by title and priceWhole.
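The mapping from selectors to a result dictionary can be pictured with a small stand-alone sketch. This is not the scraper's actual implementation: extractWithSelectors, queryText, and the fake page below are hypothetical names used only to show how each key ends up paired with the text found by its selector.

```javascript
// Hypothetical stand-in for the extraction step. `queryText` is an assumed
// callback that returns the text matched by a selector, or null if none.
function extractWithSelectors(selectors, queryText) {
  const result = {};
  for (const [name, selector] of Object.entries(selectors)) {
    result[name] = queryText(selector);
  }
  return result;
}

// Fake page for illustration: a lookup table from selector to text.
const fakePage = {
  '.title': 'Wireless Mouse',
  '.priceWhole': '19',
};
const queryFakePage = (selector) => fakePage[selector] ?? null;

const extracted = extractWithSelectors(
  { title: '.title', priceWhole: '.priceWhole' },
  queryFakePage
);
console.log(extracted); // { title: 'Wireless Mouse', priceWhole: '19' }
```

In this sketch a selector that matches nothing yields null; how the real scraper reports a missing element is not stated here.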
When Should I Use This?
Use extractFromSelectors when you want to scrape specific pieces of data from a webpage and label them with custom names. This is particularly useful for structured data extraction.
Methods
setDataToExtract(Object)
Configuration Name: extractFromSelectors
Initializing Scraper
const GeonodeScraperApi = require('geonode-scraper-api');
const scraper = new GeonodeScraperApi('<Your_username>', '<Your_password>');
Using Method
// Map each output name to the selector that locates its value.
const dataToExtract = {
  title: `.s-title-instructions-style`,
  priceWhole: `.puis-padding-right-small > span.a-price-whole`,
  priceFraction: `span > span:nth-child(2) > span.a-price-fraction`,
};
// Register the selectors, then scrape the target page.
scraper.setDataToExtract(dataToExtract);
scraper.scrape('https://example.com/');
Using Configuration Object
// Same selectors, passed for a single request through the configuration object.
const selectors = {
  title: `.s-title-instructions-style`,
  priceWhole: `.puis-padding-right-small > span.a-price-whole`,
  priceFraction: `span > span:nth-child(2) > span.a-price-fraction`,
};
const config: IConfigurations = { extractFromSelectors: selectors };
scraper.scrape('https://example.com/', config);
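Since priceWhole and priceFraction come back as separate entries, a caller will usually want to join them into one number. The sketch below assumes the extracted dictionary holds plain strings under the same keys as the selectors; that result shape, the combinePrice name, and the sample values are assumptions, not documented behavior of the library.

```javascript
// Hypothetical post-processing helper: joins the two price parts into a number.
// Assumes `extracted` holds strings under priceWhole and priceFraction.
function combinePrice(extracted) {
  // Keep digits only, dropping thousands separators and trailing dots.
  const whole = String(extracted.priceWhole ?? '').replace(/[^\d]/g, '');
  const fraction = String(extracted.priceFraction ?? '').replace(/[^\d]/g, '');
  if (whole === '') return null; // nothing usable was extracted
  return Number(`${whole}.${fraction || '0'}`);
}

// Made-up sample result, used only to exercise the helper.
const sample = { title: 'Example item', priceWhole: '1,299.', priceFraction: '99' };
console.log(combinePrice(sample)); // 1299.99
```

Stripping every non-digit character treats commas as thousands separators, so this sketch would need adjusting for locales that use a comma as the decimal mark.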