Scrapy + splash: can’t select element

Question

I&#8217;m learning to use scrapy with splash. As an exercise, I&#8217;m trying to visit https://www.ubereats.com/stores/, click on the address text box, enter a location and then press the Enter button to move to next page containing the restaurants available for that location. I have the following lua code: …

Accepted Answer

Not a complete solution, but here is what I have so far:import jsonimport reimport scrapyfrom scrapy_splash import SplashRequestclass UberEatsSpider(scrapy.Spider):    name = "ubereatspider"    allowed_domains = ["ubereats.com"]    def start_requests(self):        script = """        function main(splash)            local url = splash.args.url            assert(splash:go(url))            assert(splash:wait(10))            splash:set_viewport_full()            local search_input = splash:select('#address-selection-input')            search_input:send_text("Wall Street, New York")            assert(splash:wait(5))            local submit_button = splash:select('button[class^=submitButton_]')            submit_button:click()            assert(splash:wait(10))            return {                html = splash:html(),                png = splash:png(),            }          end        """        headers = {            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'        }        yield SplashRequest('https://www.ubereats.com/new_york/', self.parse, endpoint='execute', args={            'lua_source': script,            'wait': 5        }, splash_headers=headers, headers=headers)    def parse(self, response):        script = response.xpath("//script[contains(., 'cityName')]/text()").extract_first()        pattern = re.compile(r"window.INITIAL_STATE = ({.*?});", re.MULTILINE | re.DOTALL)        match = pattern.search(script)        if match:            data = match.group(1)            data = json.loads(data)            for place in data["marketplace"]["marketplaceStores"]["data"]["entity"]:                print(place["title"])Note the changes in the Lua script: I&#8217;ve located the search input, send the search text to it, then located the &#8220;Find&#8221; button and clicked it. On the screenshot, I did not see the search results loaded no matter the time delay I&#8217;ve set, but I&#8217;ve managed to get the restaurant names from the script contents. The place objects contain all the necessary information to filter the desired restaurants. Also note that the URL I&#8217;m navigating to is the &#8220;New York&#8221; one (not the general &#8220;stores&#8221;).I&#8217;m not completely sure why the search result page is not being loaded though, but hope it&#8217;ll be a good start for you and you can further improve this solution.

Advertisement

Answer