Puppeteer in a Chrome extension, without puppeteer-web

Question

Is it possible to create a Chrome extension containing a Puppeteer script that scrapes pages and does some browser automation? I would like to build one where a user enters a URL and clicks a button, and then a Puppeteer script runs. Is this possible, and if so, what would be the best way to implement it? I have seen some answers referring to puppeteer-web.

Accepted Answer

The short answer is: no, it is not possible.

Puppeteer runs only on Node.js at the moment, which means it is a backend solution. There is no way to run your script other than on a server (a browser extension is considered client-side).

In theory*, however, you could use Express to expose your Puppeteer results through an API endpoint, passing the page you want to scrape as a GET URL parameter (e.g. Google's homepage: https://my-server.com/my-puppeteer-endpoint?url=https://google.com). Your extension's click handler could then call this endpoint.

Note: this means https://my-server.com should be available 24/7 to serve your extension. As an example, this is how the Grammarly or Google Translate browser extensions communicate with their official APIs.

Fragments of the advised solution:

```javascript
// puppeteer
const getPage = async (url) => {
  // ...
  await page.goto(url)
  // ...
  return resultsOfScraping
}

// express
app.get('/my-puppeteer-endpoint', async (req, res) => {
  try {
    const url = req.query.url
    const response = await getPage(url)
    res.json(response)
    console.log(`/my-puppeteer-endpoint?url=${url} endpoint has been called!`)
  } catch (e) {
    console.error(e)
    // respond on failure too, otherwise the request hangs until it times out
    res.status(500).json({ error: 'Scraping failed' })
  }
})
```

You can get more ideas from Thomas Dondorf's evergreen answer on client-side Puppeteer usage: How to make Puppeteer work with a ReactJS application on the client-side

On the extension side, you need to make sure that you give your server https://my-server.com permission to be called without CORS errors, see this question/answer.

* EDIT/WARNING: on the server you will need the '--no-sandbox' Puppeteer launch flag, which is risky when the pages being visited are chosen by users. In general, I advise setting up your own sandbox on a Linux server instead if you go this way (see the link above).

Another possible way would be to create a whitelist of domains you trust: pages on those domains would be allowed and all others would be rejected. This check has to be implemented on the server side.
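To make the whitelist idea and the extension-side call a bit more concrete, here is a minimal sketch in plain JavaScript. It is only one possible shape, not the answer's actual implementation: the names `ALLOWED_HOSTS`, `isAllowedUrl` and `buildEndpointUrl` are hypothetical, the listed domains are placeholders, and the endpoint URL is the example one used above.

```javascript
// Hypothetical whitelist (assumption): replace with the domains you actually trust.
const ALLOWED_HOSTS = ['example.com', 'google.com'];

// Server side: returns true only for http(s) URLs whose host is on the list,
// either as an exact match or as a subdomain of a listed host.
function isAllowedUrl(rawUrl, allowedHosts = ALLOWED_HOSTS) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return false; // rejects file:, javascript:, data: and similar schemes
  }
  const host = parsed.hostname.toLowerCase();
  return allowedHosts.some(
    (allowed) => host === allowed || host.endsWith('.' + allowed)
  );
}

// Extension side: builds the request URL for the endpoint and lets
// URLSearchParams percent-encode the target, so its own query string survives.
function buildEndpointUrl(
  targetUrl,
  base = 'https://my-server.com/my-puppeteer-endpoint'
) {
  const endpoint = new URL(base);
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.toString();
}
```

In the Express handler, `isAllowedUrl(req.query.url)` could be checked before calling `getPage`, answering with a 403 status when it returns false. In the extension, the button's click handler could pass `buildEndpointUrl(userInput)` to `fetch`. Matching on the parsed hostname rather than on the raw string is a deliberate choice here: a substring check would also accept hosts such as `google.com.evil.org`.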

Answer