
How to get the complete html AFTER javascript on RPi in a file

I have an RPi 4 and I want to generate, from the terminal, a website.html file that contains the complete rendered HTML of a webpage. I want this so that I can, for example, search the whole page for a string or pattern. I can do this with something like wget or curl, for example:

wget -O website.html https://www.example.com

The above is all I want, but it doesn’t execute JavaScript.

Some websites (like Google) build almost everything with JavaScript, so I cannot get the final HTML that way.

  • I have been searching all day for a working solution, and I have found that I need something like a headless browser. I have tried things like PhantomJS, but they don’t work and are no longer maintained.
  • I have tried Puppeteer, but I was only able to grab a screenshot, not the HTML. I thought that page.content() had what I wanted, but I couldn’t write it to a file. When I logged it with console.log, I saw JavaScript in there as well… If someone knows how to write a file with the final HTML using Puppeteer, please tell me.

Isn’t there an ‘easy’ solution like wget that also runs JavaScript? Isn’t there a simple workflow or set of instructions to achieve something like this?

If you know some working commands to do this, please tell me. I find some tools very complicated, and I am not familiar enough with every programming language to make this work.

Any help would be greatly appreciated.


Answer

Once Node.js and Puppeteer are installed, you can use this simple script to get the HTML after the page’s JavaScript has executed. Use it as:

node script.js url pagename

For test purposes, the default url is 'http://example.com/' and the default pagename is 'page-<timestamp>.html' in the current directory, where <timestamp> is the value of Date.now() when the script starts.

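const fs = require('fs');
const puppeteer = require('puppeteer');

// Command-line arguments, with defaults for quick testing.
const url = process.argv[2] || 'http://example.com/';
const path = process.argv[3] || `page-${Date.now()}.html`;

(async function main() {
  // Launch a headless browser and reuse the tab it opens by default.
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();

  // Wait until the network has gone idle so the page's scripts have had a chance to run.
  await page.goto(url, { waitUntil: 'networkidle0' });
  // page.content() returns the serialized DOM, including any <script> tags.
  fs.writeFileSync(path, await page.content());

  await browser.close();
})().catch(console.error);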
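
Since the goal in the question is to search the saved page for a string or pattern, and the HTML from page.content() still contains the page's script tags, here is a rough sketch of a follow-up step. It is not part of the script above. The file name search.js and the helpers stripScriptsAndStyles and findMatches are made up for illustration, and the regex-based stripping is only an approximation (a real HTML parser would be more robust).

```javascript
const fs = require('fs');

// Hypothetical helper: drop <script> and <style> elements so that matches
// come from the markup and visible text, not from inline code.
// Regex stripping is an approximation and can be fooled by unusual markup.
function stripScriptsAndStyles(html) {
  return html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '');
}

// Hypothetical helper: return every case-insensitive match of `pattern`
// (a regular expression source string) in the stripped HTML.
function findMatches(html, pattern) {
  const regex = new RegExp(pattern, 'gi');
  return Array.from(stripScriptsAndStyles(html).matchAll(regex), (m) => m[0]);
}

// Only run the search when a file and a pattern are given on the command line.
if (process.argv.length >= 4) {
  const [file, pattern] = process.argv.slice(2);
  const matches = findMatches(fs.readFileSync(file, 'utf8'), pattern);
  console.log(`${matches.length} match(es)`);
  matches.forEach((m) => console.log(m));
}
```

Assuming the page was saved as website.html, it could be used as:

node search.js website.html "some pattern"

If you would rather search the raw HTML including the scripts, plain grep on the saved file works too.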
User contributions licensed under: CC BY-SA