
Puppeteer in chrome extension, without puppeteer-web

Is it possible to create a Chrome extension that contains a Puppeteer script to scrape pages and do some browser automation?

I would like to create one where a user enters a URL and clicks a button, and then a Puppeteer script runs. Is this possible, and if so, what would be the best way to implement it?

I have seen some answers referring to puppeteer-web, but it seems the Puppeteer team has removed puppeteer-web. Is there a new way of implementing this?


Answer

The short answer is: no, it is not possible.

Puppeteer runs only on Node.js at the moment, which makes it a backend-side solution. A browser extension is considered client-side, so there is no way to run your script other than running it on a server.

In theory:*
However, you could use Express to expose your puppeteer results to an API endpoint, where you could define which page you want to scrape with a GET url parameter (e.g. Google’s homepage: https://my-server.com/my-puppeteer-endpoint?url=https://google.com). This could be called by your extension’s click.

Note: this means https://my-server.com should be available 24/7 to serve your extension. As an example, this is how Grammarly or Google Translate browser extensions communicate with their official APIs.

Fragments of the advised solution:

// puppeteer
const getPage = async (url) => {
...
  await page.goto(url)
...
  return resultsOfScraping
}
// express
app.get('/my-puppeteer-endpoint', async (req, res) => {
  try {
    const url = req.query.url
    const response = await getPage(url)
    res.json(response)
    console.log(`/my-puppeteer-endpoint?url=${url} endpoint has been called!`)
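  } catch (e) {
    console.error(e)
    // always answer, otherwise the extension's request hangs until it times out
    res.status(500).json({ error: 'Scraping failed' })
  }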
})
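The extension side of this idea could be sketched roughly as follows. This is only an illustration, not part of the original answer: the helper names buildEndpointUrl and scrapeViaServer, the element ids "url-input" and "run-button", and the injectable fetchImpl parameter are all assumptions, and the endpoint is the example URL from above.

```javascript
// Hypothetical endpoint, matching the example URL used above.
const ENDPOINT = 'https://my-server.com/my-puppeteer-endpoint';

// Build the request URL; URLSearchParams percent-encodes the target URL
// so its own "?" and "&" characters survive as a single parameter.
function buildEndpointUrl(targetUrl, endpoint = ENDPOINT) {
  const url = new URL(endpoint);
  url.searchParams.set('url', targetUrl);
  return url.toString();
}

// Ask the server to scrape targetUrl and return the parsed JSON body.
// fetchImpl is injectable so the function can be exercised without a network.
async function scrapeViaServer(targetUrl, fetchImpl = fetch) {
  const response = await fetchImpl(buildEndpointUrl(targetUrl));
  if (!response.ok) {
    throw new Error(`Server responded with status ${response.status}`);
  }
  return response.json();
}

// Wire-up for a popup page; only runs where a DOM exists.
// The element ids "url-input" and "run-button" are assumptions.
if (typeof document !== 'undefined') {
  document.getElementById('run-button')?.addEventListener('click', async () => {
    const targetUrl = document.getElementById('url-input').value;
    try {
      console.log(await scrapeViaServer(targetUrl));
    } catch (e) {
      console.error(e);
    }
  });
}
```

Encoding the target through searchParams matters because a scraped URL often carries its own query string, which would otherwise be split into separate parameters on the server.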

You can get more ideas from Thomas Dondorf’s evergreen answer on client-side puppeteer usage: How to make Puppeteer work with a ReactJS application on the client-side


On the extension side, you need to make sure the extension is permitted to call your server https://my-server.com, otherwise its requests will fail with CORS errors; see this question/answer.


*EDIT/WARNING: on the server you will need the '--no-sandbox' Puppeteer launch flag, which is risky when the pages to visit are chosen by users. In general, I advise setting up your own sandbox on a Linux server instead if you go this way (see the link above).

Another option would be to keep a whitelist of domains you trust: pages on those domains would be allowed, and everything else would be rejected. The check has to be implemented on the server side, not only in the extension.
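The trusted-domain check described above might look something like this on the server. It is a minimal sketch under assumptions: the function name isAllowedUrl and the contents of ALLOWED_HOSTS are hypothetical, and matching is by exact hostname only.

```javascript
// Hypothetical list of hostnames the server is willing to scrape.
const ALLOWED_HOSTS = new Set(['google.com', 'www.google.com']);

// Return true only for well-formed http(s) URLs whose hostname is on the list.
function isAllowedUrl(rawUrl, allowedHosts = ALLOWED_HOSTS) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
    return false; // rejects file:, javascript:, data: and similar schemes
  }
  // Exact match on the parsed hostname, so "google.com.evil.example" fails.
  return allowedHosts.has(parsed.hostname);
}
```

In the Express handler, this check would run before getPage(url) is called, with the handler answering with an error status (for example 403) when it returns false.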

User contributions licensed under: CC BY-SA