Getting index.html content while trying to scrape a React website

When I try to scrape a React website using Node.js, I only get the content of the index.html file, not the tags that are actually used on the page. Here is what I have tried:

    const request = require("request");
    const cheerio = require("cheerio");

    const URL = "https://pydata-jal.netlify.com/";

    request(URL, (err, res, body) => {
      if (err) {
        console.error(err);
        return;
      }
      if (res.statusCode === 200) {
        // body is the raw HTML returned by the server
        const $ = cheerio.load(body);
        console.log($.html());
      }
    });

What should I do to get all the tags that are used in the React website?

Also, is it legal for me to scrape the HackerNoon website (just as an example)?

Answer

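Cheerio only parses HTML that has already been rendered (e.g., static HTML); it does not execute the page's JavaScript. `request` downloads just the initial index.html, so the elements React creates in the browser at runtime never appear in what Cheerio sees. To get the React-rendered markup, you should rely on a headless browser controlled with a tool like Puppeteer.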
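
A minimal sketch of that approach, assuming Puppeteer is installed (`npm install puppeteer`) and that the page has finished rendering once network activity settles. The helper name `getRenderedHtml` is hypothetical. It launches headless Chromium, loads the URL, waits for the `networkidle0` condition, and returns the serialized DOM from `page.content()`, which includes the elements React added at runtime.

```javascript
// Sketch: render a client-side React page in headless Chromium and return
// the resulting HTML. Assumes `npm install puppeteer`; getRenderedHtml is a
// hypothetical helper name, not part of any library.

async function getRenderedHtml(url) {
  // Required inside the function so this module can be loaded even before
  // Puppeteer is installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // Treat navigation as finished once there have been no network
    // connections for at least 500 ms; for a client-side app this usually
    // means the JS bundle has loaded and React has rendered.
    await page.goto(url, { waitUntil: "networkidle0" });
    // Serialize the live DOM, including elements React created at runtime.
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

module.exports = { getRenderedHtml };

// Example usage (needs network access and `npm install cheerio`):
//   const cheerio = require("cheerio");
//   getRenderedHtml("https://pydata-jal.netlify.com/")
//     .then((html) => console.log(cheerio.load(html).html()))
//     .catch(console.error);
```

The returned string can be passed to `cheerio.load()` exactly like `body` in the question's code. If `networkidle0` resolves too early or too late for a particular site, waiting for a specific element with `page.waitForSelector()` before calling `page.content()` is a common alternative.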
