I’m trying to scrape information from a webpage behind a login wall for two users. As it stands, I’ve managed to get the code to do what I want for the first user, i.e. go to the webpage, log in, gather the links associated with properties in a saved list, use that list to gather more details, and log them to the console.
The challenge I have now is getting the code to repeat this for the second user without duplicating the code. How would you suggest I go about it?
Secondly, I need to make the array for each user, declared as uniquePropertyLinks in the code below, accessible outside of the function userProcess.
How can I produce a new array for each user?
How can I access the array outside the function?
Here is the code:
const puppeteer = require('puppeteer');

//Code to locate text and enable it to be clicked
const escapeXpathString = str => {
  const splitedQuotes = str.replace(/'/g, `', "'", '`);
  return `concat('${splitedQuotes}', '')`;
};

const clickByText = async (page, text) => {
  const escapedText = escapeXpathString(text);
  const linkHandlers = await page.$x(`//a[contains(text(), ${escapedText})]`);
  if (linkHandlers.length > 0) {
    await linkHandlers[0].click();
  } else {
    throw new Error(`Link not found: ${text}`);
  }
};

//User credentials
const userAEmail = 'abc@hotmail.com';
const userAPassword = '123';
const userBEmail = 'def@hotmail.com';
const userBPassword = '456';

//Logout
const LogOut = async (page) => {
  await page.goto('https://www.website.com');
  await clickByText(page, 'Log out');
  await page.waitForNavigation({waitUntil: 'load'});
  console.log('Signed out');
};

///////////////////////////
//SCRAPE PROCESS
async function userProcess() {
  try {
    const browser = await puppeteer.launch({ headless : false });
    const page = await browser.newPage();
    page.setUserAgent('BLAHBLAHBLAH');

    //Go to Website saved list
    await page.goto('https://www.website.com/shortlist.html', {waitUntil: 'networkidle2'});
    console.log('Page loaded');

    //User A log in
    await page.type('input[name=email]', userAEmail, {delay: 10});
    await page.type('input[name=password]', userAPassword, {delay: 10});
    await page.click('.mrm-button',{delay: 10});
    await page.waitForNavigation({waitUntil: 'load'})
    console.log('Signed in');

    //Wait for website saved list to load
    const propertyList = await page.$$('.title');
    console.log(propertyList.length);

    //Collecting links from saved list and de-duping into an array
    const propertyLinks = await page.evaluate(() => Array.from(document.querySelectorAll('.sc-jbKcbu'), e => e.href));
    let uniquePropertyLinks = [...new Set(propertyLinks)];
    console.log(uniquePropertyLinks);

    //Sign out
    LogOut(page);
  } catch (err) {
    console.log('Our error - ', err.message);
  }
};

userProcess();
Answer
Let’s see some of the things you might need to complete your task. I think it’s better to take time and develop the skills yourself, but I can perhaps point out a few key things.
You use:
const userAEmail = 'abc@hotmail.com';
const userAPassword = '123';
const userBEmail = 'def@hotmail.com';
const userBPassword = '456';
but then you’re talking about looping. With separate variables like these, it will be difficult to loop over the two users. I recommend putting them into an object like so:
const users = {
  a: {
    email: 'abc@hotmail.com',
    password: '123',
  },
  b: {
    email: 'def@hotmail.com',
    password: '456',
  },
};
Then you can easily loop over them, for example with for...in:
for (const user in users) {
  console.log(users[user]);
}
or with .forEach():
Object.values(users).forEach(user => {
  console.log(user);
});
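One thing to watch for when the loop body is asynchronous, as the Puppeteer calls in the question are: .forEach() does not wait for an async callback to finish, so a for...of loop with await is usually simpler when the users should be processed one after another. Here is a minimal sketch of that idea. Note that processUser is a hypothetical stand-in for the real login-and-scrape steps and is not part of Puppeteer:

```javascript
const users = {
  a: { email: 'abc@hotmail.com', password: '123' },
  b: { email: 'def@hotmail.com', password: '456' },
};

// Hypothetical stand-in for the real login-and-scrape steps.
// It waits briefly to simulate asynchronous work, then returns the email.
async function processUser(user) {
  await new Promise(resolve => setTimeout(resolve, 10));
  return user.email;
}

// for...of with await finishes one user before starting the next.
async function processAllUsers(userMap) {
  const processed = [];
  for (const user of Object.values(userMap)) {
    processed.push(await processUser(user));
  }
  return processed;
}

processAllUsers(users).then(emails => console.log(emails));
```

Running the users in sequence keeps a single login session active at a time, which seems to match the log in, scrape, log out flow in the question.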
"need to make the array for each user, declared as uniquePropertyLinks in the below, accessible outside of the function userProcess."
Then declare the array outside of the function:
let uniquePropertyLinks = [];

async function userProcess() {
  // you can access uniquePropertyLinks here
}

// and you can access uniquePropertyLinks here as well
"How can I produce a new array for each user? How can I access the array outside the function?"
Again, it’d be better to choose a different data structure, let’s say an object whose keys represent each user and whose values are arrays. It’d look like so:
let uniquePropertyLinks = {};
uniquePropertyLinks.a = [];
uniquePropertyLinks.b = [];
which looks like this:
{ a: [], b: [] }
so you can save whatever values you need for user a into the uniquePropertyLinks.a array and whatever values you need for user b into the uniquePropertyLinks.b array:
uniquePropertyLinks.a.push('new_value_for_a_user');
and similarly for user b.
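As a rough illustration of how these pieces might fit together, the sketch below passes each user into userProcess, has it return the de-duplicated array, and stores that array under the user's key. The scrapeLinks parameter and the fakeScrapeLinks function are hypothetical stand-ins for the Puppeteer steps in the question, and the links they produce are made up so the example runs without a browser:

```javascript
// Hypothetical stand-in for the login-and-scrape steps. It returns
// made-up links (including one duplicate) instead of visiting a page.
async function fakeScrapeLinks(user) {
  const path = user.email.split('@')[0];
  return [
    `https://www.website.com/${path}/1`,
    `https://www.website.com/${path}/2`,
    `https://www.website.com/${path}/1`,
  ];
}

// Returns the de-duplicated links rather than keeping them in a
// variable that only exists inside the function.
async function userProcess(user, scrapeLinks) {
  const propertyLinks = await scrapeLinks(user);
  return [...new Set(propertyLinks)];
}

// Builds an object like { a: [...], b: [...] }, one array per user key.
async function collectAllLinks(users, scrapeLinks) {
  const uniquePropertyLinks = {};
  for (const [key, user] of Object.entries(users)) {
    uniquePropertyLinks[key] = await userProcess(user, scrapeLinks);
  }
  return uniquePropertyLinks;
}

const users = {
  a: { email: 'abc@hotmail.com', password: '123' },
  b: { email: 'def@hotmail.com', password: '456' },
};

collectAllLinks(users, fakeScrapeLinks).then(links => console.log(links));
```

Returning the array from the function is one alternative to a shared outer variable; either approach makes the per-user results available outside userProcess.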
Now you should have all the bits you need in order to go back to your code and make the necessary changes.