
Puppeteer: delete a node inside an element

I want to scrape a page that contains some news items. Here is a simplified version of the HTML I have:

<info id="random_number" class="news"> 
    <div class="author">
        Name of author  
    </div>
    <div class="news-body">
        <blockquote>...</blockquote>
        Here it's the news text
    </div>
</info>
<info id="random_number" class="news"> 
    <div class="author">
        Name of author  
    </div>
    <div class="news-body">
        Here it's the news text
    </div>
</info>

I want to get the author and the body text of each news item, without the blockquote. So I wrote this code:

// newsPage is the Puppeteer Page with the news loaded
const newsItems = await newsPage.$$("info.news");
for (const news of newsItems) { // Loop through each news element
  const author = await news.$eval('.author', s => s.textContent.trim());
  const textBody = await news.$eval('.news-body', s => s.textContent.trim());
  console.log('Author: ' + author);
  console.log('TextBody: ' + textBody);
}

It works, but I don't know how to remove the blockquote from the "news-body" element before reading its text. How can I do this?

EDIT: Sometimes the blockquote exists, sometimes it doesn't.


Answer

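You can use optional chaining (?.) with ChildNode.remove(). querySelector() returns null when a news item has no blockquote, and ?. then skips the remove() call, so the same code handles both cases. You may also find innerText more readable than textContent, since it reflects the text as it is rendered on the page.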

const textBody = await news.$eval('.news-body', (element) => {
  // null when there is no blockquote, so ?. skips remove(); note this edits the live page DOM
  element.querySelector('blockquote')?.remove();
  return element.innerText.trim();
});
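The snippet above removes the blockquote from the live page, and querySelector() only finds the first one. If you would rather leave the page untouched (for example, because you read .news-body again later) or a news item can contain several quotes, a minimal sketch that works on a deep clone instead is below. The name extractBodyText is hypothetical; it is meant to be passed as the pageFunction to $eval, and it assumes every quote you want to drop is a blockquote element.

```javascript
// Runs in the browser context when passed to elementHandle.$eval().
// Works on a deep clone, so the live page DOM is left unchanged.
function extractBodyText(element) {
  const copy = element.cloneNode(true);
  // querySelectorAll returns an empty NodeList when there is no blockquote,
  // so this is a no-op for news items without a quote.
  copy.querySelectorAll('blockquote').forEach((quote) => quote.remove());
  // A detached clone is not rendered, so innerText would just fall back to
  // textContent here; textContent is used directly.
  return copy.textContent.trim();
}

// Usage inside the question's loop (news is an ElementHandle):
// const textBody = await news.$eval('.news-body', extractBodyText);
```

Cloning costs a copy of the .news-body subtree, which is small for a single news item, and in exchange the scrape has no side effects on the page.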
User contributions licensed under: CC BY-SA