I have an RPi 4 and I want, via the terminal, to generate a website.html that contains the complete rendered HTML of a webpage.
I want this, for example, so that I can search the whole page for a string or pattern.
I can do this using something like wget or curl, for example:
wget -O website.html https://www.example.com
The above is all I want, but it doesn’t execute JavaScript.
Some websites (like Google) build almost everything with JavaScript, so I cannot get the final HTML that way.
- I have been searching all day for a working solution, and I have found that I need something like a headless browser. I have tried things like PhantomJS, but they don’t work and are no longer maintained.
- I have tried Puppeteer, but I was only able to grab a screenshot, not the HTML. I thought that page.content() had what I wanted, but I couldn’t get it or write it to a file. When I console.log-ed it, I saw JavaScript there as well… If someone knows how to do that (write a file with the final HTML) using Puppeteer, then please tell me.
Isn’t there any ‘easy’ solution like wget that does JavaScript as well?
Isn’t there a simple workflow or set of instructions to achieve something like this?
If you know some working commands to do this, please tell me. I find some tools very complicated, and I am not familiar enough with programming languages to make this work.
Any help would be greatly appreciated.
Answer
If you get Node.js and Puppeteer installed, you can use this simple script to get the HTML with JavaScript executed. Use it as:
node script.js url pagename
For test purposes, the default url is 'http://example.com/' and the default pagename is 'page-timestamp.html' in the current directory.
const fs = require('fs');
const puppeteer = require('puppeteer');

// Command-line arguments, with defaults for testing.
const url = process.argv[2] || 'http://example.com/';
const path = process.argv[3] || `page-${Date.now()}.html`;

(async function main() {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();
  // Wait until the network has gone idle, so scripts have had time to run.
  await page.goto(url, { waitUntil: 'networkidle0' });
  // Write the rendered HTML to the output file.
  fs.writeFileSync(path, await page.content());
  await browser.close();
})().catch(console.error);