
I’m trying to scrape data from a website and getting back basic HTML with JS functions in the body

Hi everyone,

I’m playing around with Node.js and the cheerio package as part of learning Node.js, and I’m trying to build a web scraper that gets the title and the price of an item from a shopping site. But when I console.log the html variable, it contains only a basic HTML structure with some JS functions that seem to be trying to prevent the scraping.

my code:

JavaScript

I guess it’s some kind of protection layer against scrapers, but this is what I get as a result:

JavaScript

Any idea how I can overcome this? Thanks, everyone.


Answer

This is likely not scraper protection. More probably, the site uses a web framework that builds the visible data and DOM elements on the client, only after its JS has run. A plain HTTP request therefore returns just the initial HTML shell, and cheerio, which parses HTML but does not execute scripts, never sees the rendered content. The easiest way around this is to use a library like puppeteer, which loads the page and processes it the way a real browser would. Here is a basic example of what you might want:

JavaScript
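A minimal sketch of the puppeteer approach described above, not a definitive implementation. The function name `scrapeItem`, the URL, and the CSS selectors are hypothetical placeholders; you would need to replace them with the real page address and with selectors found by inspecting the rendered page in your browser's dev tools. It assumes puppeteer is installed (`npm install puppeteer`).

```javascript
// Sketch only: `scrapeItem`, the URL, and the selectors are assumptions,
// not taken from the original question or answer.
async function scrapeItem(url, selectors) {
  // Required here so the file still loads where puppeteer is not installed.
  const puppeteer = require("puppeteer");

  // Start a headless browser that will actually execute the page's JS.
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // "networkidle2" waits until network activity has mostly settled,
    // which gives client-side rendering a chance to finish.
    await page.goto(url, { waitUntil: "networkidle2" });

    // Make sure the element we care about exists before reading it.
    await page.waitForSelector(selectors.title);

    // $eval runs the callback inside the page against the first match.
    const title = await page.$eval(selectors.title, (el) => el.textContent.trim());
    const price = await page.$eval(selectors.price, (el) => el.textContent.trim());

    return { title, price };
  } finally {
    // Always close the browser, even if a selector was not found.
    await browser.close();
  }
}

// Usage (hypothetical URL and selectors):
// scrapeItem("https://example.com/item/123", { title: "h1", price: ".price" })
//   .then(console.log)
//   .catch(console.error);
```

If you would rather keep your existing cheerio code, one option is to call `await page.content()` after `page.goto(...)` and pass the resulting HTML string to `cheerio.load`, since at that point the markup includes what the page's scripts rendered.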

You can read more about puppeteer in general, as well as about method 1, method 2, and method 3.

User contributions licensed under: CC BY-SA