Why does headless need to be false for Puppeteer to work?

Question

I'm creating a web API that scrapes a given URL and sends the result back, and I'm using Puppeteer to do it. I asked this question: "Puppeteer not behaving like in Developer Console" and received an answer suggesting it would only work if `headless` was set to `false`. I don't want to be constantly opening up a browser UI.

Accepted Answer

The reason it might work in UI mode but not headless is that sites that aggressively fight scraping will detect that you are running a headless browser. Some possible workarounds:

Use `puppeteer-extra`

Found here: https://github.com/berstend/puppeteer-extra

Check out their docs for how to use it. It has a couple of plugins that might help in getting past headless-mode detection:

- `puppeteer-extra-plugin-anonymize-ua`: anonymizes your User Agent. Note that this might help with getting past headless-mode detection, but as you'll see if you visit https://amiunique.org/, it is unlikely to be enough to keep you from being identified as a repeat visitor.
- `puppeteer-extra-plugin-stealth`: this might help win the cat-and-mouse game of not being detected as headless. There are many tricks employed to detect headless mode, and as many tricks to evade them.

Run a "real" Chromium instance/UI

It's possible to run a single browser UI in a way that lets you attach Puppeteer to that running instance. Here's an article that explains it: https://medium.com/@jaredpotter1/connecting-puppeteer-to-existing-chrome-window-8a10828149e0

Essentially, you start Chrome or Chromium (or Edge?) from the command line with `--remote-debugging-port=9222` (or any other port?), plus other command-line switches depending on the environment you're running it in. Then you use Puppeteer to connect to that running instance instead of letting it do its default behavior of launching a headless Chromium instance:

`const browser = await puppeteer.connect({ browserURL: ENDPOINT_URL });`
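A minimal sketch of the connect approach, assuming Chrome was already started by hand with something like `google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/pptr-profile` (the binary name and profile path are placeholders that vary by system). The helper names `parseDevToolsEndpoint`, `connectOptions` and `scrapeWithRunningBrowser` are hypothetical. It relies on `puppeteer.connect()` accepting either a `ws://` address as `browserWSEndpoint` or a plain `http://host:port` debugging address as `browserURL`.

```javascript
// Assumes Chrome was started separately, for example:
//   google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/pptr-profile
// (binary name and profile path are placeholders; adjust for your system)

// Chrome prints "DevTools listening on ws://..." on startup; extract that URL.
// Returns null when the line does not contain it.
function parseDevToolsEndpoint(line) {
  const match = /DevTools listening on (ws:\/\/\S+)/.exec(line);
  return match ? match[1] : null;
}

// puppeteer.connect() takes a ws:// address as browserWSEndpoint,
// or a plain http://host:port debugging address as browserURL.
function connectOptions(endpoint) {
  return /^wss?:\/\//.test(endpoint)
    ? { browserWSEndpoint: endpoint }
    : { browserURL: endpoint };
}

// Hypothetical helper: fetch a page's HTML through the already-running browser.
async function scrapeWithRunningBrowser(url, endpoint = 'http://127.0.0.1:9222') {
  // Required inside the function so the helpers above load without puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.connect(connectOptions(endpoint));
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content();
  } finally {
    await page.close();
    // disconnect() detaches Puppeteer but leaves the browser running;
    // close() would shut down the browser you started by hand.
    await browser.disconnect();
  }
}

module.exports = { parseDevToolsEndpoint, connectOptions, scrapeWithRunningBrowser };
```

For example, `scrapeWithRunningBrowser('https://example.com').then((html) => console.log(html.length));`. Disconnecting rather than closing is what makes this setup useful for a web API: the one browser process stays up between requests.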
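Read the Puppeteer docs here for more info: https://pptr.dev/#?product=Puppeteer&version=v5.2.1&show=api-puppeteerlaunchoptions

The ENDPOINT_URL comes from the terminal: when you launch the browser from the command line with the `--remote-debugging-port=9222` option, it prints a line like `DevTools listening on ws://127.0.0.1:9222/devtools/browser/<id>`. Note that this `ws://` address is what the `browserWSEndpoint` option expects; `browserURL` takes the plain HTTP debugging address instead, e.g. `http://127.0.0.1:9222`.

This option is going to require some server/ops mojo, so be prepared to do a lot more Stack Overflow searches. 🙂

There are other strategies, I'm sure, but those are the two I'm most familiar with. Good luck!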
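For the `puppeteer-extra` workaround, a minimal sketch with the stealth plugin, assuming `puppeteer`, `puppeteer-extra` and `puppeteer-extra-plugin-stealth` are installed from npm. The names `looksHeadless`, `getPuppeteer` and `scrapeHeadless` are hypothetical. The `looksHeadless` check only looks for the `HeadlessChrome` token that headless Chrome has traditionally put in its user agent; real detection scripts check many more signals, so passing it does not mean the site can't tell. `puppeteer-extra-plugin-anonymize-ua` would be registered the same way, with another `puppeteer.use(...)` call.

```javascript
// Assumed install: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

// Headless Chrome has traditionally advertised "HeadlessChrome" in its
// user agent, one of the simplest signals a site can check.
function looksHeadless(userAgent) {
  return userAgent.includes('HeadlessChrome');
}

let cachedPuppeteer = null;

// Load puppeteer-extra lazily and register the stealth plugin only once,
// not on every request.
function getPuppeteer() {
  if (!cachedPuppeteer) {
    const puppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    puppeteer.use(StealthPlugin());
    cachedPuppeteer = puppeteer;
  }
  return cachedPuppeteer;
}

// Hypothetical helper: launch headless with the stealth plugin, return the page's HTML.
async function scrapeHeadless(url) {
  const browser = await getPuppeteer().launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // The user agent as the page's own scripts see it.
    const ua = await page.evaluate(() => navigator.userAgent);
    if (looksHeadless(ua)) {
      console.warn('User agent still reveals headless mode:', ua);
    }
    return await page.content();
  } finally {
    await browser.close();
  }
}

module.exports = { looksHeadless, scrapeHeadless };
```

Unlike the connect approach, this launches and closes a fresh browser per call; for an API under load you would likely want to reuse one launched browser instead.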
