How do I do web scraping with Puppeteer and deploy it?

I would like to do web scraping with Puppeteer. When the user clicks a button in my application, the application should visit an external URL, fill out a form, click a button, read the data returned, and display it to the user. It is an internal project and, at least at first, a small one.

I wrote some test code for Puppeteer using this website:
https://try-puppeteer.appspot.com/

It worked perfectly. Great!

  1. However, I was unable to get my code to run on my own domain, which is on shared hosting at Locaweb. It seems that running Puppeteer depends on changes to the server. Is that right?

  2. Is there a free service where I can host my code and run Puppeteer, as I did at https://try-puppeteer.appspot.com/?
    If there is no free option, could you suggest a low-cost one that works?

Thank you!

Answer

Steps:

  1. Create a simple Express.js API that runs the Puppeteer code on the server.
  2. Host the API somewhere (there are thousands of VPS and cloud hosting providers, such as DigitalOcean and Linode).
  3. Call that REST API from your frontend (typically with an AJAX call).
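
The server-side part of these steps could look roughly like the sketch below. It is only an illustration under several assumptions: `TARGET_URL` and the three selectors are hypothetical placeholders for the external form, and `scrapeForm` and `handleScrape` are names made up for this example. To keep the block dependency-free, it describes the few Puppeteer calls it needs (`launch`, `newPage`, `goto`, `type`, `click`, `waitForSelector`, `$eval`, `close`) as small interfaces instead of importing the library, and it returns a plain status/body object rather than touching Express directly.

```typescript
// Hypothetical target page and selectors; replace with the real form's values.
const TARGET_URL = "https://example.com/search";
const INPUT_SELECTOR = "#query";
const SUBMIT_SELECTOR = "#submit";
const RESULT_SELECTOR = "#result";

// Minimal shapes for the parts of Puppeteer this sketch relies on.
interface PageLike {
  goto(url: string): Promise<unknown>;
  type(selector: string, text: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForSelector(selector: string): Promise<unknown>;
  $eval(
    selector: string,
    fn: (el: { textContent: string | null }) => string
  ): Promise<string>;
}
interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}
interface LauncherLike {
  launch(options?: { headless?: boolean }): Promise<BrowserLike>;
}

// Visit the page, fill out the form, submit it, and read the result text.
async function scrapeForm(launcher: LauncherLike, query: string): Promise<string> {
  const browser = await launcher.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(TARGET_URL);
    await page.type(INPUT_SELECTOR, query);
    await page.click(SUBMIT_SELECTOR);
    await page.waitForSelector(RESULT_SELECTOR);
    return await page.$eval(RESULT_SELECTOR, (el) => (el.textContent ?? "").trim());
  } finally {
    // Always close the browser, even when a step above throws.
    await browser.close();
  }
}

interface ApiResponse {
  status: number;
  body: { result?: string; error?: string };
}

// Framework-independent request handler: validate input, scrape, map errors.
async function handleScrape(
  launcher: LauncherLike,
  query: string | undefined
): Promise<ApiResponse> {
  if (!query) {
    return { status: 400, body: { error: "missing 'q' parameter" } };
  }
  try {
    return { status: 200, body: { result: await scrapeForm(launcher, query) } };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: 502, body: { error: message } };
  }
}
```

In an Express app, a route such as `app.get("/scrape", ...)` would call `handleScrape` with the imported `puppeteer` module and the `q` query parameter, then reply with `res.status(r.status).json(r.body)`. The real `puppeteer` object should fit the `LauncherLike` shape, though a type cast may be needed depending on the Puppeteer version. The frontend would then request `/scrape?q=...` when the user clicks the button.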

There are indeed some free services, but they are a bit more complex to set up, and you would need to learn about serverless functions. Try searching for:

  • AWS Lambda
  • Netlify Functions
  • Cloud Functions for Firebase
  • Google Cloud Functions
  • Google Cloud Run