I was experimenting with Puppeteer and built a simple scraper that gets information from YouTube, and it works fine. What I was trying to add was a way to display that scraped information on my web page with <p> tags. Is there any way to do this? Where I'm stuck is that my name and avatarUrl variables are local to my scrape function, so how can I get those values and insert them into my <p> tags? As a rough sketch of what I tried, I did: document.getElementById('nameId')=name; after importing my JS script (on the HTML side), but this won't work because name is a local variable and can't be accessed outside its scope. Any help is appreciated. Thanks in advance.
const puppeteer = require('puppeteer');

async function scrapeChannel(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const [el] = await page.$x('/html/body/ytd-app/div/ytd-page-manager/ytd-browse/div[3]/ytd-c4-tabbed-header-renderer/tp-yt-app-header-layout/div/tp-yt-app-header/div[2]/div[2]/div/div[1]/div/div[1]/ytd-channel-name/div/div/yt-formatted-string');
  const text = await el.getProperty('textContent');
  const name = await text.jsonValue();
  const [el2] = await page.$x('//*[@id="img"]');
  const src = await el2.getProperty('src');
  const avatarURL = await src.jsonValue();
  await browser.close();
  console.log({ name, avatarURL });
  return { name, avatarURL };
}

scrapeChannel('https://www.youtube.com/channel/UCQOtt1RZbIbBqXhRa9-RB5g');

module.exports = {
  scrapeChannel,
};
<body onload="scrapeChannel()">
  <p id="nameId">'put the scraped name here'</p>
  <p id="avatarUrlId">'put the scraped avatar url here'</p>
  <!--
    document.getElementById('nameId')=name;
    document.getElementById('avatartUrlId')=avatarURL;
  -->
</body>
Answer
I used cheerio in one of my projects, and this is what I did in the backend and in the frontend.
Node & Express JS Backend
In order to access your backend from the frontend, you need to set up routes in your backend. All your frontend requests are redirected to these routes. For more information, read Express Routes.
E.g. Route.js code
const router = require("express").Router();
const { callscrapeChannel } = require("../scrape-code/scrape");

router.route("/scrapedata").get(async (req, res) => {
  const Result = await callscrapeChannel();
  return res.json(Result);
});

module.exports = router;
scrapeChannel.js file
const puppeteer = require('puppeteer');

async function scrapeChannel(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const [el] = await page.$x('/html/body/ytd-app/div/ytd-page-manager/ytd-browse/div[3]/ytd-c4-tabbed-header-renderer/tp-yt-app-header-layout/div/tp-yt-app-header/div[2]/div[2]/div/div[1]/div/div[1]/ytd-channel-name/div/div/yt-formatted-string');
  const text = await el.getProperty('textContent');
  const name = await text.jsonValue();
  const [el2] = await page.$x('//*[@id="img"]');
  const src = await el2.getProperty('src');
  const avatarURL = await src.jsonValue();
  await browser.close();
  console.log({ name, avatarURL });
  return { name, avatarURL };
}

async function callscrapeChannel() {
  const data = await scrapeChannel('https://www.youtube.com/channel/UCQOtt1RZbIbBqXhRa9-RB5g');
  return data;
}

module.exports = {
  callscrapeChannel,
};
in your server.js file
const express = require("express");
const cors = require("cors");
const scrapeRoute = require("./Routes/routes");
require("dotenv").config({ debug: process.env.DEBUG });

const port = process.env.PORT || 5000;
const app = express();

app.use(cors());
app.use(express.json());
app.use("/api", scrapeRoute);

app.listen(port, () => {
  console.log(`server is running on port: http://localhost:${port}`);
});
dependencies you need (package.json)
"dependencies": { "axios": "^0.21.1", "body-parser": "^1.19.0", "cors": "^2.8.5", "cross-env": "^7.0.3", "dotenv": "^8.2.0", "esm": "^3.2.25", "express": "^4.17.1", "nodemon": "^2.0.7", "puppeteer": "^8.0.0" }
Frontend
In the frontend, I have used fetch. You need to send a GET request to your backend. All you have to do is:
<html>
<head>
  <script>
    async function callScrapeData() {
      const res = await fetch("http://localhost:5000/api/scrapedata");
      const response = await res.json();
      console.log(response);
      document.getElementById("nameId").innerHTML = response.name;
      document.getElementById("avatartUrlId").innerHTML = response.avatarURL;
    }
  </script>
</head>
<body>
  <div>
    <h1>scrape</h1>
    <p id="nameId"></p>
    <p id="avatartUrlId"></p>
    <button onclick="callScrapeData()">click</button>
  </div>
</body>
</html>
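The pattern that resolves the original scoping problem is worth isolating: the local variables never leave the function directly; their values escape by being returned (and, across the network, by being serialized as JSON). A minimal sketch with the Puppeteer code stubbed out (the stub values are placeholders, not real scraped data):

```javascript
// A local variable inside an async function is not visible outside it,
// but its *value* can escape by being returned and awaited.
// scrapeChannel here is a stub standing in for the real Puppeteer code.
async function scrapeChannel() {
  const name = "Example Channel";                      // local variable
  const avatarURL = "https://example.com/avatar.png";  // local variable
  return { name, avatarURL };                          // values escape here
}

async function show() {
  const { name, avatarURL } = await scrapeChannel();
  // In the browser you would now write into the DOM, e.g.
  // document.getElementById("nameId").textContent = name;
  console.log(name, avatarURL);
}

show();
```

This is why the route/fetch split works: the backend awaits the return value and ships it as JSON, and the frontend awaits the JSON and writes it into the DOM.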
Remember, my backend server is running on port 5000
The above code is just an example that I have modified to fit your question. It's straightforward, and I hope it helps you to some extent. Let me know if you have any questions.
Note: I assume you have a server.js file in your backend and it is configured properly.