
How to skip a sup tag in an XPath selector

I want to extract the text content from the HTML below, but the <sup> tag is preventing me from getting the desired text.

The text I want to extract is simply (4:6, 6:7). How can I extract this text while skipping the <sup> tag?

I tried "//p/text()", but I only get the part before the <sup> tag: (4:6, 6

My HTML:

<p class="result"><span class="bold">Final result </span><strong>0:2</strong> (4:6, 6<sup>5</sup>:7)</p>


Answer

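The text you want is the only text that sits directly inside p; everything else is inside a child tag (span, strong or sup). The <sup> element splits that text into two text nodes, ' (4:6, 6' and ':7)'. The expression //p/text() matches both of them, but if you take only the first match (for example with .get()), you get just the part before <sup>. Select all the matching text nodes with getall() and join them: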

scrapy shell file:///path/to/file.html

In [1]: ''.join(response.xpath('//p[@class="result"]/text()').getall())
Out[1]: ' (4:6, 6:7)'
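
To see why the join is needed, here is a minimal sketch that uses only Python's standard library instead of Scrapy. It is an illustration, not part of the original answer: the helper name direct_text is hypothetical, and the snippet is parsed as XML, which only works here because this particular fragment happens to be well-formed. In ElementTree, the text that follows a child element is stored in that child's tail, so the direct text of p is its own text plus the tail of each child.

```python
import xml.etree.ElementTree as ET

# The snippet from the question (well-formed, so an XML parser accepts it).
html = ('<p class="result"><span class="bold">Final result </span>'
        '<strong>0:2</strong> (4:6, 6<sup>5</sup>:7)</p>')


def direct_text(element):
    """Join the text nodes that are direct children of element.

    Hypothetical helper: mirrors what ''.join(... /text() ...) does in XPath.
    """
    # element.text is the text before the first child (None if there is none).
    parts = [element.text or ""]
    # child.tail is the text that follows each child, still inside element.
    for child in element:
        parts.append(child.tail or "")
    return "".join(parts)


p = ET.fromstring(html)
result = direct_text(p)
print(repr(result))    # ' (4:6, 6:7)'
print(result.strip())  # (4:6, 6:7)
```

The text inside span, strong and sup is never touched, because it belongs to those children rather than to p. Calling strip() on the joined string removes the leading space if you do not want it.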
User contributions licensed under: CC BY-SA