When I try to scrape a React.js website using Node.js, I get only the contents of the index.html file, not the tags that the site actually renders. Here is what I have tried:
const request = require("request");
const cheerio = require("cheerio");

const URL = "https://pydata-jal.netlify.com/";

request(URL, (err, res, body) => {
  if (!err && res.statusCode == 200) {
    const $ = cheerio.load(body);
    console.log($.html());
  }
});
What should I do to get all of the tags that the React website renders?
Also, can I scrape the Hacker Noon website (just as an example), and is it legal?
Answer
Cheerio only parses the HTML it is given (e.g. static HTML); it does not execute the page's JavaScript, so it never sees what React renders. To get the React-rendered markup, use a headless browser controlled by a tool like Puppeteer.
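As a rough sketch of that approach (assuming `puppeteer` and `cheerio` are installed from npm), you could let headless Chrome render the page and then pass the resulting HTML to cheerio. The helper names `getRenderedHtml` and `scrape` are made up for this example, and the optional second parameter is only there so the function can be tried without launching a real browser. Waiting for `networkidle0` is an assumption that suits simple pages; some sites may need a wait on a specific selector instead.

```javascript
// Sketch only: render the page in a headless browser, then parse it with cheerio.
// Assumes `npm install puppeteer cheerio`. Helper names are hypothetical.

async function getRenderedHtml(url, puppeteer = require("puppeteer")) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has settled so React has had a chance to render.
    await page.goto(url, { waitUntil: "networkidle0" });
    // Serialized HTML of the page as it exists after scripts have run.
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

async function scrape(url) {
  const cheerio = require("cheerio");
  const html = await getRenderedHtml(url);
  const $ = cheerio.load(html);
  console.log($.html());
}

module.exports = { getRenderedHtml, scrape };
```

Calling `scrape("https://pydata-jal.netlify.com/")` should then print the markup after rendering rather than the bare index.html shell.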