Why does headless need to be false for Puppeteer to work?

Question

I'm creating a web API that scrapes a given URL and sends the result back. I am using Puppeteer to do this. I asked this question: "Puppeteer not behaving like in Developer Console" and received an answer that suggested it would only work if headless was set to false. I don't want to be constantly open…

Accepted Answer

The reason it might work in UI mode but not headless is that sites that aggressively fight scraping will detect that you are running in a headless browser.

Some possible workarounds:

Use `puppeteer-extra`

Found here: https://github.com/berstend/puppeteer-extra

Check out their docs for how to use it. It has a couple of plugins that might help in getting past headless-mode detection:

- `puppeteer-extra-plugin-anonymize-ua`: anonymizes your User Agent. Note that this might help with getting past headless-mode detection, but as you'll see if you visit https://amiunique.org/, it is unlikely to be enough to keep you from being identified as a repeat visitor.
- `puppeteer-extra-plugin-stealth`: this might help win the cat-and-mouse game of not being detected as headless. There are many tricks that are employed to detect headless mode, and as many tricks to evade them.

Run a "real" Chromium instance/UI

It's possible to run a single browser UI in a manner that lets you attach puppeteer to that running instance. Here's an article that explains it: https://medium.com/@jaredpotter1/connecting-puppeteer-to-existing-chrome-window-8a10828149e0

Essentially you're starting Chrome or Chromium (or Edge?) from the command line with `--remote-debugging-port=9222` (or any old port?) plus other command line switches depending on what environment you're running it in. Then you use puppeteer to connect to that running instance instead of having it do the default behavior of launching a headless Chromium instance: `const browser = await puppeteer.connect({ browserURL: ENDPOINT_URL });`.
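To make the connect step a bit more concrete, here is a minimal sketch, assuming Chrome is already running with `--remote-debugging-port=9222` and that `puppeteer` is installed. The helper names `connectOptionsFor` and `scrapeViaRunningChrome` are made up for illustration. As far as I know, `browserURL` expects the `http://host:port` form, while a `ws://...` DevTools address belongs in `browserWSEndpoint`, so the sketch picks the option based on the protocol.

```javascript
// Sketch only: assumes Chrome was started separately with
//   --remote-debugging-port=9222
// and that the `puppeteer` package is installed.

// Hypothetical helper: choose the puppeteer.connect() option that
// matches the kind of endpoint string we were given.
function connectOptionsFor(endpoint) {
  const { protocol } = new URL(endpoint);
  if (protocol === 'ws:' || protocol === 'wss:') {
    return { browserWSEndpoint: endpoint };
  }
  if (protocol === 'http:' || protocol === 'https:') {
    return { browserURL: endpoint };
  }
  throw new Error(`Unsupported endpoint protocol: ${protocol}`);
}

// Hypothetical helper: attach to the running browser, grab the HTML
// of one page, then detach without closing the browser.
async function scrapeViaRunningChrome(endpoint, url) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when scraping
  const browser = await puppeteer.connect(connectOptionsFor(endpoint));
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const html = await page.content();
    await page.close();
    return html;
  } finally {
    // disconnect() detaches Puppeteer; the externally launched browser keeps running.
    await browser.disconnect();
  }
}
```

Something like `scrapeViaRunningChrome('http://127.0.0.1:9222', 'https://example.com')` would then return the page HTML. Using `disconnect()` rather than `close()` is deliberate here, since the point of this workaround is to keep one long-lived browser around between requests.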
Read the puppeteer docs here for more info: https://pptr.dev/#?product=Puppeteer&version=v5.2.1&show=api-puppeteerlaunchoptions

The ENDPOINT_URL is displayed in the terminal when you launch the browser from the command line with the `--remote-debugging-port=9222` option.

This option is going to require some server/ops mojo, so be prepared to do a lot more Stack Overflow searches. 🙂

There are other strategies, I'm sure, but those are the two I'm most familiar with. Good luck!
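Going back to the first workaround, this is roughly how the stealth plugin gets wired in. It is a sketch, assuming `puppeteer`, `puppeteer-extra` and `puppeteer-extra-plugin-stealth` are installed from npm; `launchOptions`, `getStealthPuppeteer` and `scrapeWithStealth` are hypothetical helper names, not part of either library. There is no guarantee it gets past any particular site's detection.

```javascript
// Sketch only: assumes
//   npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

// Hypothetical helper: default launch options, overridable per call
// (e.g. { headless: false } as a fallback while debugging).
function launchOptions(overrides = {}) {
  return { headless: true, ...overrides };
}

let stealthPuppeteer; // created on first use so the plugin is registered once

// Hypothetical helper: puppeteer-extra with the stealth plugin enabled.
function getStealthPuppeteer() {
  if (!stealthPuppeteer) {
    stealthPuppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    stealthPuppeteer.use(StealthPlugin());
  }
  return stealthPuppeteer;
}

// Hypothetical helper: launch a browser, fetch one page's HTML, close the browser.
async function scrapeWithStealth(url, overrides = {}) {
  const browser = await getStealthPuppeteer().launch(launchOptions(overrides));
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content();
  } finally {
    await browser.close();
  }
}
```

If a site still blocks the headless run, calling `scrapeWithStealth(url, { headless: false })` is a quick way to check whether headless detection is really the cause before reaching for the remote-debugging setup.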
