I am writing code to scrape a webpage. I need to extract specific information from the site, and there are a lot of fields to scrape.
The code works, but when I run it repeatedly it sometimes throws an error on certain lines, e.g. line 20 and line 24.
Below is the code:
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://startupjobs.asia/job/search?q=&job-list-dpl-page=1", { timeout: 3000000 });

  // Click the first job in the search results
  const b = (await page.$x("/html/body/div[1]/div[3]/div[1]/div/div[1]/ul/li[1]/div/div[1]/div/h5/a"))[0];
  b.click();

  //const elm = await page.$('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5');
  //const text = await page.evaluate(elm => elm.textContent, elm[0]);

  // Read each field from the job detail panel
  const [el1] = await page.$x('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5');
  const job_name = await (await el1.getProperty('textContent')).jsonValue();

  const [el2] = await page.$x('//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[2]/div/h6[1]/a');
  const company = await (await el2.getProperty('textContent')).jsonValue();

  const [el3] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[3]/p');
  const job_type = await (await el3.getProperty('textContent')).jsonValue();

  const [el4] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[1]/p');
  const salary = await (await el4.getProperty('textContent')).jsonValue();

  const [el5] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[4]/p');
  const skills = await (await el5.getProperty('textContent')).jsonValue();
})();
```
There are about 13 fields I need to scrape.
The error I get is:

```
const salary = await (await el4.getProperty('textContent')).jsonValue();
TypeError: Cannot read properties of undefined (reading 'getProperty')
```
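For context, the message means `el4` is `undefined`. `page.$x` resolves to an array of element handles, and when the XPath matches nothing (possible reasons include that part of the page not having rendered yet, or a job whose layout differs) the array is empty, so array destructuring yields `undefined`. The snippet below reproduces this without a browser; the empty array `matches` is a hypothetical stand-in for the result of `page.$x`.

```javascript
// Stand-in for `await page.$x(...)` when the XPath matches nothing.
const matches = [];

// Destructuring an empty array leaves el4 undefined (no error yet).
const [el4] = matches;

let caught = null;
try {
  // Calling a method on undefined throws the TypeError seen in the question.
  el4.getProperty('textContent');
} catch (err) {
  caught = err;
}

console.log(el4); // undefined
console.log(caught instanceof TypeError); // true
```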
Answer
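The quick fix is to check that the destructured ElementHandle actually exists before calling getProperty on it. page.$x resolves to an array that is empty when the XPath matches nothing, so destructuring it gives undefined. For example: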
```javascript
const [el4] = await page.$x('/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[1]/p');
const salary = !el4 ? 'Not Found' : await (await el4.getProperty('textContent')).jsonValue();
```
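If the same check is needed for every field, it can be wrapped in a small function. `textByXPath` below is a hypothetical helper, not a Puppeteer API. It is a sketch that assumes a Puppeteer version that still provides `page.$x` (as the question's code does), and it also trims surrounding whitespace, which the one-liner above does not.

```javascript
// Hypothetical helper: textContent of the first node matching `xpath`,
// or `fallback` when nothing matches. Assumes page.$x is available.
async function textByXPath(page, xpath, fallback = 'Not Found') {
  const [el] = await page.$x(xpath); // empty array -> el is undefined
  if (!el) return fallback;
  const text = await (await el.getProperty('textContent')).jsonValue();
  return text.trim(); // scraped HTML text often has leading/trailing whitespace
}
```

Inside the question's async code this would be used as, e.g., `const salary = await textByXPath(page, '/html/body/div[1]/div[3]/div[2]/div[2]/div[1]/div[2]/div[1]/div[1]/p');`.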
A less repetitive script would look more like:
```javascript
const elementsToFind = [
  { xpath: '//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[1]/h5', propName: 'job_name' },
  { xpath: '//*[@id="suj-single-jobdetail-wrapper"]/div[1]/div[2]/div/h6[1]/a', propName: 'company' },
  // ...
];

const results = {};
for (const { xpath, propName } of elementsToFind) {
  const [el] = await page.$x(xpath);
  results[propName] = !el ? 'Not Found' : await (await el.getProperty('textContent')).jsonValue();
}
```
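The null check stops the crash, but a field will still come back as 'Not Found' whenever the detail panel has not rendered yet. Because the question's code calls `b.click()` without `await` and queries straight away, one plausible (unverified) explanation for the failures being intermittent is that the job details sometimes load after the XPath queries run. The sketch below waits for each element before reading it. It assumes an older Puppeteer release that still has `page.waitForXPath` (newer releases removed it in favour of XPath selectors passed to the regular selector methods, so check the docs for your version); `scrapeFields` and its options are hypothetical names.

```javascript
// Sketch: read each field, waiting up to `timeout` ms for it to appear first.
// Assumes page.waitForXPath exists (older Puppeteer); names are hypothetical.
async function scrapeFields(page, fields, { timeout = 5000, fallback = 'Not Found' } = {}) {
  const results = {};
  for (const { xpath, propName } of fields) {
    let el = null;
    try {
      el = await page.waitForXPath(xpath, { timeout });
    } catch (err) {
      // Only treat a timeout as "not found"; rethrow anything else.
      if (err.name !== 'TimeoutError') throw err;
    }
    results[propName] = !el
      ? fallback
      : await (await el.getProperty('textContent')).jsonValue();
  }
  return results;
}
```

Usage would be along the lines of `await b.click(); const results = await scrapeFields(page, elementsToFind, { timeout: 10000 });`. Fields are waited for one after another, so each genuinely missing field costs up to `timeout` ms; keeping the timeout modest limits that.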
And then iterate through the results object.
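Iterating could look like the following. The `results` object here is filled with made-up placeholder values purely to show the shape the loop produces; on a real page it would come from the loop above.

```javascript
// Placeholder values for illustration only.
const results = { job_name: 'Backend Engineer', company: 'Example Co', salary: 'Not Found' };

// Object.entries yields [propName, value] pairs in insertion order.
for (const [propName, value] of Object.entries(results)) {
  console.log(`${propName}: ${value}`);
}

// Collect the fields that were not found, e.g. to log or retry them.
const missing = Object.keys(results).filter((key) => results[key] === 'Not Found');
console.log(missing); // [ 'salary' ]
```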