When I try to scrape a React website using Node.js, I only get the content of the index.html file, not the tags that are actually rendered on the website. Here is what I have tried:
const request = require("request");
const cheerio = require("cheerio");

const URL = "https://pydata-jal.netlify.com/";

request(URL, (err, res, body) => {
  if (!err && res.statusCode === 200) {
    const $ = cheerio.load(body);
    console.log($.html());
  }
});
What should I do to get all of the tags that the React website renders?
Also, is it legal to scrape a website such as Hackernoon (just as an example)?
Answer
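Cheerio only parses HTML that has already been rendered (e.g. static HTML); it does not execute the page's JavaScript. A React app builds most of its markup in the browser, so the server's response is little more than the index.html shell. To get the markup React renders, you should rely on a headless browser controlled with a tool such as Puppeteer.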
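A minimal sketch of that approach, assuming `puppeteer` and `cheerio` are installed (`npm install puppeteer cheerio`). Puppeteer loads the page in headless Chromium, waits for network activity to settle, and returns the DOM serialized after React has run; that HTML is then handed to Cheerio as in the question. The file name `scrape-rendered.js` and the helper `uniqueTagNames` are hypothetical, added only for illustration, and `waitUntil: "networkidle0"` is a heuristic that may need adjusting for sites that keep connections open.

```javascript
// scrape-rendered.js: a minimal sketch, not a definitive implementation.
// Assumes: npm install puppeteer cheerio (Puppeteer downloads its own Chromium).
// Usage: node scrape-rendered.js https://pydata-jal.netlify.com/

// Render the page in headless Chromium and return the DOM serialized
// *after* the page's JavaScript (e.g. React) has run.
async function getRenderedHtml(url) {
  // Required lazily so this file can be loaded without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // "networkidle0": navigation counts as finished once there have been no
    // network connections for 500 ms, usually enough for a client-side render.
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.content();
  } finally {
    await browser.close();
  }
}

// Hypothetical helper: distinct, lower-cased, sorted tag names.
function uniqueTagNames(names) {
  return [...new Set(names.map((n) => n.toLowerCase()))].sort();
}

async function main(url) {
  const cheerio = require("cheerio");
  const html = await getRenderedHtml(url);
  // Cheerio now sees the React-rendered markup, not just the index.html shell.
  const $ = cheerio.load(html);
  const names = $("*").toArray().map((el) => el.name);
  console.log(uniqueTagNames(names).join(", "));
}

const url = process.argv[2];
if (url) {
  main(url).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
} else {
  console.log("usage: node scrape-rendered.js <url>");
}
```

Replacing the last line of `main` with `console.log($.html())` prints the full rendered markup instead of the list of tag names. Launching a whole browser is much slower than a plain HTTP request, so it is worth reusing one `browser` instance if you scrape many pages.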