Goal
To log in to this website (https://www.reliant.com) using Python requests or similar. (I know this could be done with Selenium, PhantomJS, or something like that, but I would prefer not to.)
Problem
During the login process there are a couple of redirects where "session ID"-type params are passed. Most of these I can get, but there's one called dtPC that appears to come from a cookie that you get when first visiting the page. As far as I can tell, the cookie originates from this JS file (https://www.reliant.com/ruxitagentjs_ICA2QSVfhjqrux_10175190917092722.js). This URL is the next GET request the browser performs after the initial GET of the main URL. All the methods I've tried so far have failed to get me that cookie.
Code thus far
from requests_html import HTMLSession

url = r'https://www.reliant.com'
url2 = r'https://www.reliant.com/ruxitagentjs_ICA2QSVfhjqrux_10175190917092722.js'

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'www.reliant.com',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.3'
}

headers2 = {
    'Referer': 'https://www.reliant.com',
    'Sec-Fetch-Mode': 'no-cors',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}

s = HTMLSession()
r = s.get(url, headers=headers)
js = s.get(url2, headers=headers2).text

r.html.render()           # works but doesn't get the cookie
r.html.render(script=js)  # fails on Network error
Answer
Alright, I figured this one out, despite it fighting me the whole way. I don't know why dtPC wasn't showing up in s.cookies like it should, but I wasn't using the script keyword quite right. Apparently, whatever JS you pass to it is executed after everything else has rendered, as if you had opened the console in your browser and pasted it in there. When I actually tried that in Chrome, I got some errors. Eventually I realized I could just run a simple JS script to return the cookies generated by the other JS.
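import urllib.parse
from requests_html import HTMLSession

# url and headers are the same as in the question
s = HTMLSession()
r = s.get(url, headers=headers)
print(r.status_code)

# with script=..., render() returns the result of running that JS in the rendered page
c = r.html.render(script='document.cookie')

# cookies are separated by '; ' and a value may itself contain '=',
# so strip the whitespace and split only on the first '='
c = [x.strip().split('=', 1) for x in c.split(';') if '=' in x]
c = {x[0]: urllib.parse.unquote(x[1]) for x in c}
print(c)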
At this point, c will be a dict with 'dtPC' as a key and the corresponding value.
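If you want to check the parsing step on its own, without hitting the site or launching Chromium, here is a minimal sketch of the same idea as a standalone function. The function name parse_document_cookie and the sample cookie string are made up for illustration; the real dtPC value will look different.

```python
from urllib.parse import unquote


def parse_document_cookie(cookie_string):
    """Turn a document.cookie-style string into a dict (hypothetical helper)."""
    cookies = {}
    for pair in cookie_string.split(';'):
        pair = pair.strip()  # entries are separated by '; ', so drop the space
        if not pair:
            continue
        # partition splits on the first '=' only, so '=' inside a value survives
        name, _, value = pair.partition('=')
        cookies[unquote(name)] = unquote(value)
    return cookies


# Made-up sample string, not a real response from the site
sample = "dtPC=1$23_456h1vABC; dtCookie=v_4_srv=2; empty="
print(parse_document_cookie(sample))
```

From there, the values could presumably be copied back into the session with s.cookies.set(name, value) before making the login requests, though I haven't verified that the site accepts them that way.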