How can I get the img src from this page with puppeteer?

I am trying to get some data from this Wikipedia page: https://en.wikipedia.org/wiki/List_of_mango_cultivars

[Screenshot: the img src that I need]

I can get everything that I need except the img src with this code:

const recordList = await page.$$eval(
      'div#mw-content-text > div.mw-parser-output > table > tbody > tr',
      (trows) => {
         let rowList = []
         trows.forEach((row) => {
            let record = { name: '', image: '', origin: '', notes: '' }

            record.image = row.querySelector('a > img').src

            const tdList = Array.from(row.querySelectorAll('td'), (column) => column.innerText) 
            const imageSrc = row.querySelectorAll('a > img').getAttribute('src')
            

            record.name = tdList[0] 
            record.origin = tdList[2]
            record.notes = tdList[3]
            rowList.push(record)
         })

         return rowList
      }
   )

The error I am getting: Evaluation failed: TypeError: Cannot read properties of null (reading 'src')

Answer

You can wrap your record.image line in a conditional like this:

// Query once and reuse the result instead of running the selector twice
const img = row.querySelector('a > img')
if (img) {
    record.image = img.src
}

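querySelector returns null when a row has no img inside an a tag (a header row, for example), and reading .src on null is what throws the TypeError. The conditional checks that the img exists first and only then adds its src to the object. Also remove the unused imageSrc line: querySelectorAll returns a NodeList, which has no getAttribute method, so that line would throw next.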
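
If you prefer to avoid the explicit if, the same null check can be written with optional chaining. The following is only a sketch: rowsToRecords is a hypothetical helper name, not part of Puppeteer, and the empty-string defaults are an assumption based on the record shape in the question.

```javascript
// Hypothetical helper, meant to be passed as the page function of page.$$eval.
// It must not use variables from the outer scope, because Puppeteer
// serializes the function and runs it inside the page.
function rowsToRecords(trows) {
  return Array.from(trows, (row) => {
    // null when the row has no <a><img> pair, e.g. a header row
    const img = row.querySelector('a > img')
    const tdList = Array.from(row.querySelectorAll('td'), (column) => column.innerText)
    return {
      name: tdList[0] ?? '',
      // img?.src is undefined when img is null, so fall back to ''
      image: img?.src ?? '',
      origin: tdList[2] ?? '',
      notes: tdList[3] ?? '',
    }
  })
}
```

You would then call it with the selector from the question, as in const recordList = await page.$$eval(selector, rowsToRecords). Rows without an image come back with image set to '', so you can filter them out afterwards if you only want rows that have one.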

User contributions licensed under: CC BY-SA