
Puppeteer & cycling a process through multiple users

I’m trying to scrape information from a webpage behind a login wall for two users. As it stands, I’ve managed to get the code to do what I want for the first user, i.e. go to the webpage, log in, gather the links associated with properties in a saved list, then use that list to gather more details and log them to the console.

The challenge I have now is getting the code to repeat this for the second user without having to dupe the code. How would you suggest I go about it?

Secondly I need to make the array for each user, declared as uniquePropertyLinks in the below, accessible outside of the function userProcess.

How can I produce a new array for each user?

How can I access the array outside the function?

Here is the code:

const puppeteer = require('puppeteer');

//Code to locate text and enable it to be clicked
const escapeXpathString = str => {
  const splitedQuotes = str.replace(/'/g, `', "'", '`);
  return `concat('${splitedQuotes}', '')`;
};

const clickByText = async (page, text) => {
  const escapedText = escapeXpathString(text);
  const linkHandlers = await page.$x(`//a[contains(text(), ${escapedText})]`);
  
  if (linkHandlers.length > 0) {
    await linkHandlers[0].click();
  } else {
    throw new Error(`Link not found: ${text}`);
  }
};

//User credentials
const userAEmail = 'abc@hotmail.com';
const userAPassword = '123';
const userBEmail = 'def@hotmail.com';
const userBPassword = '456';
  
//Logout
const LogOut = async (page) => {
  await page.goto('https://www.website.com');
  //Start waiting for the navigation before clicking so it is not missed
  await Promise.all([
    page.waitForNavigation({waitUntil: 'load'}),
    clickByText(page, 'Log out'),
  ]);
  console.log('Signed out');
};


/////////////////////////// 
//SCRAPE PROCESS
async function userProcess() {
  try {

  const browser = await puppeteer.launch({ headless : false });
  const page = await browser.newPage();
  await page.setUserAgent('BLAHBLAHBLAH');

  //Go to Website saved list
  await page.goto('https://www.website.com/shortlist.html', {waitUntil: 'networkidle2'});
  console.log('Page loaded');

  
  //User A log in
  await page.type('input[name=email]', userAEmail, {delay: 10});
  await page.type('input[name=password]', userAPassword, {delay: 10});
  //Start waiting for the navigation before clicking so it is not missed
  await Promise.all([
    page.waitForNavigation({waitUntil: 'load'}),
    page.click('.mrm-button', {delay: 10}),
  ]);
  console.log('Signed in');

  //Count the titles in the saved list (page.$$ queries immediately; it does not wait)
  const propertyList = await page.$$('.title');
  console.log(propertyList.length);

  //Collecting links from saved list and de-duping into an array
  const propertyLinks = await page.evaluate(() => Array.from(document.querySelectorAll('.sc-jbKcbu'), e => e.href));
  let uniquePropertyLinks = [...new Set(propertyLinks)];
  console.log(uniquePropertyLinks);

  //Sign out (awaited so errors are caught and the function does not return mid-logout)
  await LogOut(page);

  } catch (err) {
    console.log('Our error - ', err.message);
  }
}

userProcess();


Answer

Let’s see some of the things you might need to complete your task. I think it’s better to take time and develop the skills yourself, but I can perhaps point out a few key things.

You use:

const userAEmail = 'abc@hotmail.com';
const userAPassword = '123';
const userBEmail = 'def@hotmail.com';
const userBPassword = '456';

but then you’re talking about looping. With separate variables like these, it will be difficult to loop over the two users. I recommend putting them into an object like so:

const users = {
    a: {
        email: 'abc@hotmail.com',
        password: '123',
    },
    b: {
        email: 'def@hotmail.com',
        password: '456',
    },
};

then you can easily loop over the users, for example with for...in:

for (const user in users) {
    console.log(users[user]);
}

or with .forEach():

Object.values(users).forEach(user => {
    console.log(user);
});
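One caveat when the loop body is asynchronous, as the Puppeteer steps are: .forEach() does not wait for an async callback to finish, so both users would be started at once. A for...of loop inside an async function does wait. Below is a minimal sketch of that; processUser and processAllUsers are hypothetical names, and processUser is only a stand-in for the real login and scrape steps.

```javascript
// Hypothetical stand-in for the real Puppeteer work (log in, scrape, log out).
const processUser = async (user) => {
  await new Promise(resolve => setTimeout(resolve, 10)); // simulate slow work
  return `processed ${user.email}`;
};

const users = {
  a: { email: 'abc@hotmail.com', password: '123' },
  b: { email: 'def@hotmail.com', password: '456' },
};

// for...of inside an async function finishes one user before starting the next.
const processAllUsers = async (allUsers) => {
  const log = [];
  for (const [key, user] of Object.entries(allUsers)) {
    log.push(`${key}: ${await processUser(user)}`);
  }
  return log;
};

processAllUsers(users).then(log => console.log(log));
```

Running the users one after another matters here because each one has to be logged out before the next one logs in.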

"...need to make the array for each user, declared as uniquePropertyLinks in the below, accessible outside of the function userProcess."

Then declare the array outside of the function:

let uniquePropertyLinks = [];

async function userProcess() {
    // you can access uniquePropertyLinks here
}

// and you can access uniquePropertyLinks here as well
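Because userProcess is async, the outer array is only filled once the function has finished, so code outside has to await it before reading. A small sketch of that timing follows; fetchLinks and main are hypothetical names, fetchLinks stands in for the page.evaluate(...) call in the question, and the URLs are placeholders.

```javascript
let uniquePropertyLinks = [];

// Hypothetical stand-in for the page.evaluate(...) call in the question;
// the URLs are placeholders.
const fetchLinks = async () => [
  'https://www.website.com/property/1',
  'https://www.website.com/property/1',
  'https://www.website.com/property/2',
];

async function userProcess() {
  const propertyLinks = await fetchLinks();
  // Assign to the outer variable; do not redeclare it with let in here.
  uniquePropertyLinks = [...new Set(propertyLinks)];
}

async function main() {
  const pending = userProcess();                   // started, not finished
  const countBefore = uniquePropertyLinks.length;  // array is still empty
  await pending;                                   // now it has been filled
  const countAfter = uniquePropertyLinks.length;
  return { countBefore, countAfter };
}

main().then(counts => console.log(counts)); // { countBefore: 0, countAfter: 2 }
```

Note that the question's code declares uniquePropertyLinks with let inside the function; that inner declaration would shadow the outer one, so it needs to become a plain assignment.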

"How can I produce a new array for each user? How can I access the array outside the function?"

Again, it’d be better to choose a different data structure, let’s say an object whose keys represent the users and whose values are arrays. It’d look like so:

let uniquePropertyLinks = {};

uniquePropertyLinks.a = [];
uniquePropertyLinks.b = [];

which looks like this:

{ a: [], b: [] }

so you can save the values for user a into the uniquePropertyLinks.a array and the values for user b into the uniquePropertyLinks.b array:

uniquePropertyLinks.a.push('new_value_for_a_user');

similarly for user b.
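If the users object from earlier is in place, the per-user arrays do not have to be written out by hand. The sketch below builds one empty array per key and adds links without duplicates; addLinks is a hypothetical helper and the link strings are placeholders.

```javascript
const users = {
  a: { email: 'abc@hotmail.com', password: '123' },
  b: { email: 'def@hotmail.com', password: '456' },
};

// One empty array per user key, without hard-coding a and b.
const uniquePropertyLinks = Object.fromEntries(
  Object.keys(users).map(key => [key, []])
);

// Hypothetical helper: store links for one user, skipping any already stored.
const addLinks = (key, links) => {
  for (const link of links) {
    if (!uniquePropertyLinks[key].includes(link)) {
      uniquePropertyLinks[key].push(link);
    }
  }
};

addLinks('a', ['link-1', 'link-1', 'link-2']); // placeholder values
addLinks('b', ['link-3']);

console.log(uniquePropertyLinks); // { a: [ 'link-1', 'link-2' ], b: [ 'link-3' ] }
```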

Now you should have all the bits you need in order to go back to your code and make the necessary changes.
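For orientation, here is one way the pieces might fit together, as a sketch rather than a finished solution. The names scrapeUser and scrapeAllUsers are made up for illustration; the URL and selectors are taken from the question; browser is assumed to be a Puppeteer Browser from puppeteer.launch(), and logOut is assumed to be a function like the question's LogOut. Pages opened from the same browser share cookies by default, which is why each user is logged out before the next one starts.

```javascript
// Sketch only: the question's steps with the user passed in as a parameter.
// `browser` is assumed to be a Puppeteer Browser; `logOut` is assumed to be
// an async function taking the page, such as the question's LogOut.
const scrapeUser = async (browser, user, logOut) => {
  const page = await browser.newPage();
  try {
    await page.goto('https://www.website.com/shortlist.html', { waitUntil: 'networkidle2' });
    await page.type('input[name=email]', user.email, { delay: 10 });
    await page.type('input[name=password]', user.password, { delay: 10 });
    // Start waiting before clicking so the navigation is not missed.
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'load' }),
      page.click('.mrm-button', { delay: 10 }),
    ]);
    const links = await page.evaluate(() =>
      Array.from(document.querySelectorAll('.sc-jbKcbu'), e => e.href));
    await logOut(page);
    return [...new Set(links)]; // de-duplicated links for this user
  } finally {
    await page.close();
  }
};

// Runs the users one after another and returns { a: [...], b: [...] }.
const scrapeAllUsers = async (browser, users, logOut) => {
  const linksByUser = {};
  for (const [key, user] of Object.entries(users)) {
    linksByUser[key] = await scrapeUser(browser, user, logOut);
  }
  return linksByUser;
};

// Usage (assumes puppeteer is installed and LogOut is the question's helper):
// const browser = await puppeteer.launch({ headless: false });
// const linksByUser = await scrapeAllUsers(browser, users, LogOut);
// await browser.close();
```

Returning the object from scrapeAllUsers is an alternative to a variable declared outside the function: the caller gets the per-user arrays only once they are complete.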

User contributions licensed under: CC BY-SA