I’m trying to scrape information from a webpage behind a login wall for two users. As it stands, I’ve managed to get the code to do what I want for the first user, i.e. go to the webpage, log in, gather the links associated with properties in a saved list, use that list to gather more details and log them to the console.
The challenge I have now is getting the code to repeat this for the second user without having to duplicate the code. How would you suggest I go about it?
Secondly, I need to make the array for each user, declared as uniquePropertyLinks in the code below, accessible outside of the function userProcess.
How can I produce a new array for each user?
How can I access the array outside the function?
Here is the code:
const puppeteer = require('puppeteer');
//Code to locate text and enable it to be clicked
const escapeXpathString = str => {
const splitedQuotes = str.replace(/'/g, `', "'", '`);
return `concat('${splitedQuotes}', '')`;
};
const clickByText = async (page, text) => {
const escapedText = escapeXpathString(text);
const linkHandlers = await page.$x(`//a[contains(text(), ${escapedText})]`);
if (linkHandlers.length > 0) {
await linkHandlers[0].click();
} else {
throw new Error(`Link not found: ${text}`);
}
};
//User credentials
const userAEmail = 'abc@hotmail.com';
const userAPassword = '123';
const userBEmail = 'def@hotmail.com';
const userBPassword = '456';
//Logout
const LogOut = async (page) => {
await page.goto('https://www.website.com');
await clickByText(page, 'Log out');
await page.waitForNavigation({waitUntil: 'load'});
console.log('Signed out');
};
///////////////////////////
//SCRAPE PROCESS
async function userProcess() {
try {
const browser = await puppeteer.launch({ headless : false });
const page = await browser.newPage();
await page.setUserAgent('BLAHBLAHBLAH');
//Go to Website saved list
await page.goto('https://www.website.com/shortlist.html', {waitUntil: 'networkidle2'});
console.log('Page loaded');
//User A log in
await page.type('input[name=email]', userAEmail, {delay: 10});
await page.type('input[name=password]', userAPassword, {delay: 10});
await page.click('.mrm-button',{delay: 10});
await page.waitForNavigation({waitUntil: 'load'})
console.log('Signed in');
//Wait for website saved list to load
const propertyList = await page.$$('.title');
console.log(propertyList.length);
//Collecting links from saved list and de-duping into an array
const propertyLinks = await page.evaluate(() => Array.from(document.querySelectorAll('.sc-jbKcbu'), e => e.href));
let uniquePropertyLinks = [...new Set(propertyLinks)];
console.log(uniquePropertyLinks);
//Sign out
await LogOut(page);
} catch (err) {
console.log('Our error - ', err.message);
}
};
userProcess();
Answer
Let’s see some of the things you might need to complete your task. I think it’s better to take time and develop the skills yourself, but I can perhaps point out a few key things.
You use:
const userAEmail = 'abc@hotmail.com';
const userAPassword = '123';
const userBEmail = 'def@hotmail.com';
const userBPassword = '456';
but then you’re talking about looping. With separate variables like these, it will be difficult to loop over the two users. I recommend putting them into an object like so:
const users = {
a: {
email: 'abc@hotmail.com',
password: '123',
},
b: {
email: 'def@hotmail.com',
password: '456',
},
};
Then you can easily loop over the users, for example with for...in:
for (const user in users) {
console.log(users[user]);
}
or with .forEach():
Object.values(users).forEach(user => {
console.log(user);
});
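One thing to keep in mind once the loop body becomes asynchronous: .forEach() does not wait for async callbacks, so a for...of loop with await is usually the safer choice when each user has to be processed one after another. A minimal sketch, assuming a hypothetical fakeScrape function that stands in for the real Puppeteer work (it is not part of your code):

```javascript
// Hypothetical stand-in for the real Puppeteer work; it only simulates a delay.
const fakeScrape = async (user) => {
  await new Promise(resolve => setTimeout(resolve, 10));
  return `scraped as ${user.email}`;
};

const users = {
  a: { email: 'abc@hotmail.com', password: '123' },
  b: { email: 'def@hotmail.com', password: '456' },
};

// for...of with await processes the users strictly one after another.
const processAllUsers = async () => {
  const results = [];
  for (const user of Object.values(users)) {
    results.push(await fakeScrape(user));
  }
  return results;
};

processAllUsers().then(results => console.log(results));
```

In your case the body of the loop would be a call to userProcess, changed so that it accepts the user as a parameter instead of reading userAEmail and userAPassword directly.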
"need to make the array for each user, declared as uniquePropertyLinks in the below, accessible outside of the function userProcess."
Then declare the array outside of the function:
let uniquePropertyLinks = [];
async function userProcess() {
// you can access uniquePropertyLinks here
}
// and you can access uniquePropertyLinks here as well
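Because userProcess is async, the outer array is only filled once the function has finished, so code outside has to wait for it before reading the values. A small sketch of that timing, assuming made-up links and a timer in place of the real scraping:

```javascript
let uniquePropertyLinks = [];

// Hypothetical stand-in for the scraping step; the links below are made up.
async function userProcess() {
  await new Promise(resolve => setTimeout(resolve, 10)); // simulates page loading
  const propertyLinks = [
    'https://www.website.com/property/1',
    'https://www.website.com/property/1',
    'https://www.website.com/property/2',
  ];
  // Spread the Set back into an array to drop the duplicate link.
  uniquePropertyLinks = [...new Set(propertyLinks)];
}

const run = userProcess();

// Read immediately: the function has not finished yet, so the array is still empty.
console.log(uniquePropertyLinks.length);

// Read after the promise resolves: the array now holds the de-duplicated links.
run.then(() => console.log(uniquePropertyLinks));
```

An alternative that avoids the shared variable is to return the array from userProcess and use the awaited result.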
"How can I produce a new array for each user? How can I access the array outside the function?"
Again, it’d be better to choose a different data structure, let’s say an object whose keys represent each user and whose values are arrays. It’d look like so:
let uniquePropertyLinks = {};
uniquePropertyLinks.a = [];
uniquePropertyLinks.b = [];
which looks like this:
{ a: [], b: [] }
so you can save whatever values you need for user a into the uniquePropertyLinks.a array and whatever values you need for user b into the uniquePropertyLinks.b array:
uniquePropertyLinks.a.push('new_value_for_a_user');
similarly for user b.
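To show how these pieces might fit together, here is a rough sketch that fills one array per user inside a loop. The collectLinks function and the URLs it returns are hypothetical placeholders for your login-and-scrape steps, not real Puppeteer calls:

```javascript
const users = {
  a: { email: 'abc@hotmail.com', password: '123' },
  b: { email: 'def@hotmail.com', password: '456' },
};

// Hypothetical stand-in for the login-and-scrape step.
// It returns made-up links, including one duplicate.
const collectLinks = async (user) => {
  const base = `https://www.website.com/property?owner=${encodeURIComponent(user.email)}`;
  return [`${base}&id=1`, `${base}&id=1`, `${base}&id=2`];
};

const collectForAllUsers = async () => {
  const uniquePropertyLinks = {};
  // Object.entries gives both the key ('a', 'b') and the user object.
  for (const [key, user] of Object.entries(users)) {
    const links = await collectLinks(user);
    uniquePropertyLinks[key] = [...new Set(links)]; // de-duplicate per user
  }
  return uniquePropertyLinks;
};

collectForAllUsers().then(links => console.log(links));
```

Returning the object from the function keeps the per-user arrays available to whatever code awaits the result, without relying on a variable declared outside.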
Now you should have all the bits you need in order to go back to your code and make the necessary changes.