
Can I simplify this code to avoid the type error for reading properties?

I am writing this code to scrape a webpage. I need to extract specific information from the site, and there are many fields to scrape.

The code I wrote works, but when I run it repeatedly it throws an error on some of the lines, e.g. line 20 or line 24.

The code is below:

const browser = await puppeteer.launch()
const page = await browser.newPage();

await page.goto("https://startupjobs.asia/job/search?q=&job-list-dpl-page=1", {timeout: 3000000})

const b = (await page.$x("/html/body/div[1]/div[3]/div[1]/div/div[1]/ul/li[1]/div/div[1]/div/h5/a"))[0]
b.click()

//const elm = await page.$('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5');
//const text = await page.evaluate(elm => elm.textContent, elm[0]);

const [el1] = await page.$x('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5');
const job_name = await (await el1.getProperty('textContent')).jsonValue();

const [el2] = await page.$x('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[2]/div/h6[1]/a');
const company = await (await el2.getProperty('textContent')).jsonValue();

const [el3] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[3]/p');
const job_type= await (await el3.getProperty('textContent')).jsonValue();

const [el4] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[1]/p');
const salary = await (await el4.getProperty('textContent')).jsonValue();

const [el5] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[4]/p');
const skills = await (await el5.getProperty('textContent')).jsonValue();

There are about 13 fields I need to scrape.

The error I get is:

const salary = await (await el4.getProperty('textContent')).jsonValue();
TypeError: Cannot read properties of undefined (reading 'getProperty')


Answer

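page.$x resolves to an empty array when the XPath matches nothing, so the destructured el4 is undefined and calling getProperty on it throws. The quick fix is to check that the destructured ElementHandle actually exists before calling getProperty on it, for example: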

const [el4] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[1]/p');
const salary = !el4 ? 'Not Found' : await (await el4.getProperty('textContent')).jsonValue();
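
If the element is only missing on some runs, one possible cause (an assumption, not something the question confirms) is timing: b.click() in the question is not awaited, so the detail panel may not have rendered when page.$x runs. Below is a minimal sketch of waiting for the node instead. It assumes a Puppeteer version that still provides page.waitForXPath (like page.$x, it is deprecated or absent in recent releases). The name waitForText and its options are hypothetical.

```javascript
// Hypothetical helper: wait for an XPath match, then read its text.
// Assumes page.waitForXPath exists in the installed Puppeteer version.
async function waitForText(page, xpath, { timeout = 5000, fallback = 'Not Found' } = {}) {
    try {
        const el = await page.waitForXPath(xpath, { timeout });
        if (!el) {
            return fallback;
        }
        return await (await el.getProperty('textContent')).jsonValue();
    } catch (err) {
        // Any failure here (typically a timeout because nothing matched)
        // is treated as "not found" rather than crashing the whole scrape.
        return fallback;
    }
}
```

It would be called after await b.click(), for example const salary = await waitForText(page, xpath), where xpath is the same expression used for el4 above.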

A less repetitive script would look more like:

const elementsToFind = [
    { xpath: '//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5', propName: 'job_name' },
    { xpath: '//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[2]/div/h6[1]/a', propName: 'company' },
    // ...
];
const results = {};
for (const { xpath, propName } of elementsToFind) {
    const [el] = await page.$x(xpath);
    results[propName] = !el ? 'Not Found' : await (await el.getProperty('textContent')).jsonValue();
}

You can then iterate through the results object.
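
One way to do that iteration is with Object.entries. This sketch separates the fields that were found from those that fell back to 'Not Found'. The function name summarizeResults and the sample values are made up for illustration.

```javascript
// Hypothetical helper: split a results object into printable lines
// and a list of the property names that were not found.
function summarizeResults(results, fallback = 'Not Found') {
    const lines = [];
    const missing = [];
    for (const [propName, value] of Object.entries(results)) {
        if (value === fallback) {
            missing.push(propName);
        } else {
            // textContent often carries surrounding whitespace, so trim it
            lines.push(`${propName}: ${String(value).trim()}`);
        }
    }
    return { lines, missing };
}

// Sample data standing in for the object built by the loop above (values are invented)
const sampleResults = {
    job_name: '  Example Engineer ',
    company: 'Example Co',
    salary: 'Not Found',
};

const { lines, missing } = summarizeResults(sampleResults);
console.log(lines.join('\n'));
if (missing.length > 0) {
    console.log(`Missing: ${missing.join(', ')}`);
}
```

Keeping the missing fields in their own list makes it easier to see which XPaths fail on a given page.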
