Skip to content
Advertisement

Puppeteer , bringing back blank array

I’m trying to grab products from ebay and open them on amazon.

So far, I have them being searched on amazon but I’m struggling with getting the products selected from the search results.

Currently its outputting a blank array and im not sure why. Have tested in a separate script without the grabTitles and the for loop. So im guessing there is something in that causing an issue.

Is there something i am missing here thats preventing the data coming back for prodResults?

const puppeteer = require('puppeteer');

const URL = "https://www.amazon.co.uk/";
const selectors = {
  searchBox: '#twotabsearchtextbox',
  productLinks: 'span.a-size-base-plus.a-color-base.a-text-normal',
  productTitle: '#productTitle'
};

(async() => {
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.goto('https://www.ebay.co.uk/sch/jmp_supplies/m.html?_trkparms=folent%3Ajmp_supplies%7Cfolenttp%3A1&rt=nc&_trksid=p2046732.m1684');

  //Get product titles from ebay
  const grabTitles = await page.evaluate(() => {
    const itemTitles = document.querySelectorAll('#e1-11 > #ResultSetItems > #ListViewInner > li > .lvtitle > .vip');
    var items = []
    itemTitles.forEach((tag) => {
      items.push(tag.innerText)
    })
    return items
  })

  //Search for the products on amazon in a new tab for each product 
  for (i = 0; i < grabTitles.length; i++) {

    const page = await browser.newPage();

    await page.goto(URL)
    await page.type(selectors.searchBox, grabTitles[i++])
    await page.keyboard.press('Enter');

    //get product titles from amazon search results
    const prodResults = await page.evaluate(() => {
      const prodTitles = document.querySelectorAll('span.a-size-medium.a-color-base.a-text-normal');
      let results = []
      prodTitles.forEach((tag) => {
        results.push(tag.innerText)
      })
      return results
    })
    console.log(prodResults)
  }
})()

Advertisement

Answer

There are a few potential problems with the script:

  1. await page.keyboard.press('Enter'); triggers a navigation, but your code never waits for the navigation to finish before trying to select the result elements. Use waitForNavigation, waitForSelector or waitForFunction (not waitForTimeout).

    If you do wait for a navigation, there’s a special pattern using Promise.all needed to avoid a race condition, shown in the docs.

    Furthermore, you might be able to skip a page load by going directly to the search URL by building the string yourself. This should provide a significant speedup.

  2. Your code spawns a new page for every item that needs to be processed, but these pages are never closed. I see grabTitles.length as 60. So you’ll be opening 60 tabs. That’s a lot of resources being wasted. On my machine, it’d probably hang everything. I’d suggest making one page and navigating it repeatedly, or close each page when you’re done. If you want parallelism, consider a task queue or run a few pages simultaneously.

  3. grabTitles[i++] — why increment i here? It’s already incremented by the loop, so this appears to skip elements, unless your selectors have duplicates or you have some other reason to do this.

  4. span.a-size-medium doesn’t work for me, which could be locality-specific. I see a span.a-size-base-plus.a-color-base.a-text-normal, but you may need to tweak this to taste.

Here’s a minimal example. I’ll just do the first 2 items from the eBay array since that’s coming through fine.

const puppeteer = require("puppeteer"); // ^13.5.1

let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
  await page.setExtraHTTPHeaders({"Accept-Language": "en-US,en;q=0.9"});
  await page.setUserAgent(ua);
  const titles = [
    "Chloraethyl | Dr. Henning | Spray 175 ml",
    "Elmex Decays Prevention Toothpaste 2 x 75ml",
  ];

  for (const title of titles) {
    await page.goto("https://www.amazon.co.uk/");
    await page.type("#twotabsearchtextbox", title);
    await Promise.all([
      page.keyboard.press("Enter"),
      page.waitForNavigation(),
    ]);
    const titleSel = "a span.a-size-base-plus.a-color-base.a-text-normal";
    await page.waitForSelector(titleSel);
    const results = await page.$$eval(titleSel, els =>
      els.map(el => el.textContent)
    );
    console.log(title, results.slice(0, 5));
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;

Output:

Chloraethyl | Dr. Henning | Spray 175 ml [
  'Chloraethyl | Dr. Henning | Spray 175 ml',
  'Wild Fire (Shetland)',
  'A Dark Sin: A chilling British detective crime thriller (The Hidden Norfolk Murder Mystery Series Book 8)',
  'A POLICE DOCTOR INVESTIGATES: the Sussex murder mysteries (books 1-3)',
  'Rites of Spring: Sunday Times Crime Book of the Month (Seasons Quartet)'
]
Elmex Decays Prevention Toothpaste 2 x 75ml [
  'Janina Ultra White Whitening Toothpaste (75ml) – Diamond Formula. Extra Strength. Clinically Proven. Low Abrasion. For Everyday Use. Excellent for Stain Removal',
  'Elmex Decays Prevention Toothpaste 2 x 75ml',
  'Elmex Decays Prevention Toothpaste 2 x 75ml by Elmex',
  'Elmex Junior Toothpaste 2 x 75ml',
  'Elmex Sensitive Professional 2 x 75ml'
]

Note that I added user agents and headers to be able to use headless: true but it’s incidental to the main solution above. You can return to headless: false or check out canonical threads like How to avoid being detected as bot on Puppeteer and Phantomjs? and Why does headless need to be false for Puppeteer to work? if you have further issues with detection.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement