
Parse property page URLs using xpath

I am trying to parse the main property listing page https://www.realtyatlas.co.za/search?areas%5B0%5D%5Btown%5D=Bellville&status=For%20Sale. More precisely, I would like to extract the href attribute from the a element with class property-item__wrap shown here, and then follow that link:

<div class="col-md-4">
     <a class="property-item__wrap" href="/loevenstein-apartment-for-sale-1917472">

However, all the combinations I have tried return None. I am also aware of the API (https://jf6e1ij07f.execute-api.eu-west-1.amazonaws.com/p/search); however, I do not see the property URLs in its response, so it does not seem useful for this. Am I missing something, or does anyone have an idea of what I am doing wrong?

Here is some code:

for prop in response.xpath("//div[@class='col-md-4']"):
    link = prop.xpath("./a[@class='property-item__wrap']/@href").get()

Answer

As you already discovered, the property information comes from the API call you mentioned and is not embedded in the HTML of the page you are requesting (the listings are rendered by JavaScript), which is why your XPath returns None. You need to call the API directly from your Scrapy request to get that information. It is a POST request with specific data, so you need to build it yourself.
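A rough sketch of how such a POST request could be assembled is shown below. The payload shape is only a guess that mirrors the query string of the search page (areas[0][town] and status); the real body should be copied from the request shown in the browser's network tab. The helper name build_search_request_kwargs is made up for this example.

```python
import json

# API endpoint mentioned in the question.
API_URL = "https://jf6e1ij07f.execute-api.eu-west-1.amazonaws.com/p/search"


def build_search_request_kwargs(town, status="For Sale"):
    """Return keyword arguments suitable for scrapy.Request(...)."""
    # ASSUMPTION: this payload only mirrors the search page's query string.
    # The real API may expect different field names or extra fields.
    payload = {"areas": [{"town": town}], "status": status}
    return {
        "url": API_URL,
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


kwargs = build_search_request_kwargs("Bellville")

# Inside a spider this would be used roughly as:
#     yield scrapy.Request(**kwargs, callback=self.parse_api)
# and the callback would read the JSON with json.loads(response.text).
```

Keeping the request arguments in a plain dict makes it easy to compare them with what the browser actually sends before wiring them into the spider.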

I could be wrong, but it looks like this site generates the URL on the fly from the details of each listing, so you can still build the URL yourself from the data that comes from the API:

https://www.realtyatlas.co.za/{suburb}-{propertyType}-{propertyStatus}-{propertyid}

where the placeholders in braces are replaced with the corresponding values coming from the API response.
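The URL pattern above could be filled in along these lines. This is a minimal sketch: the key names suburb, propertyType, propertyStatus and propertyid are taken from the placeholders in the pattern and are assumptions about the API response, so they may need to be adapted to the actual field names. The slugify helper is also made up for this example.

```python
import re

BASE_URL = "https://www.realtyatlas.co.za"


def slugify(value):
    """Lower-case a value and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")


def build_property_url(item):
    """Build a property page URL from one item of the API response."""
    # ASSUMPTION: these key names follow the placeholders in the URL
    # pattern; the real API response may name the fields differently.
    parts = [
        item["suburb"],
        item["propertyType"],
        item["propertyStatus"],
        item["propertyid"],
    ]
    return BASE_URL + "/" + "-".join(slugify(part) for part in parts)


# Example values chosen to match the href quoted in the question.
example = {
    "suburb": "Loevenstein",
    "propertyType": "Apartment",
    "propertyStatus": "For Sale",
    "propertyid": 1917472,
}
print(build_property_url(example))
# prints https://www.realtyatlas.co.za/loevenstein-apartment-for-sale-1917472
```

The resulting URL can then be passed to a regular Scrapy request to follow the link to the property page.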

User contributions licensed under: CC BY-SA