I want to extract the text content from the HTML tag below, but the <sup> tag is preventing me from getting the desired text. The text I want to extract is simply (4:6, 6:7). How can I extract this text while skipping the <sup> tag?

I tried "//p/text()", but I am only getting the part before the <sup> tag: (4:6, 6

My HTML:
<p class="result"><span class="bold">Final result </span><strong>0:2</strong> (4:6, 6<sup>5</sup>:7)</p>
Answer
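The text you want is the only text that sits directly inside p; everything else is inside a child tag. The <sup> tag splits that direct text into two text nodes, so collect all of them and join them.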
scrapy shell file:///path/to/file.html
In [1]: ''.join(response.xpath('//p[@class="result"]/text()').getall())
Out[1]: ' (4:6, 6:7)'
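
To see why a single match stops at "(4:6, 6", here is a rough sketch that uses only Python's standard library (xml.etree.ElementTree as a stand-in, not Scrapy itself) to list the direct text pieces of the p element. The variable names are made up for illustration, and it assumes the snippet parses as well-formed XML, which this one does.

```python
import xml.etree.ElementTree as ET

html = ('<p class="result"><span class="bold">Final result </span>'
        '<strong>0:2</strong> (4:6, 6<sup>5</sup>:7)</p>')

p = ET.fromstring(html)

# The direct text of <p> is its own leading text plus the "tail"
# that follows each child element. Text inside the children
# (span, strong, sup) is not included.
direct_text = [p.text] + [child.tail for child in p]
direct_text = [t for t in direct_text if t]  # drop empty/None pieces

# Two separate pieces, split by <sup>
print(direct_text)

# Joining them gives the full string, like ''.join(...getall()) in Scrapy
joined = ''.join(direct_text)
print(joined.strip())
```

The list holds two pieces because <sup> sits in the middle of the text, which matches the two text nodes that text() selects in the XPath version. Call .strip() on the joined string if you do not want the leading space.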