original reddit post: https://www.reddit.com/r/webscraping/comments/1kpwou5/login_form_questions/
The poster wanted to log in by calling the login API directly but received a non-200 response.
target website: https://www.costar.com/
Click the login button at the top right. The website redirects to another URL:
https://secure.costargroup.com/login?signin=8e10875e6eeb2ea3856ae6da5659d78c

Click Log In and we get a POST request.


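Look at the response. If we input the correct username and password, I guess it will redirect us to an after-login page. With wrong credentials, the returned page contains the sentence "Invalid username/password combination", so we can use it as a marker: if it shows up, the request itself was well-formed and only the credentials were rejected.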
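The marker check described above can be sketched as a small helper. This is only an illustration: the function name login_rejected is made up here, and the sample strings stand in for real response bodies.

```python
# Sentence the login page shows when the request is well-formed
# but the credentials are wrong (taken from the text above).
FAILURE_MARKER = "Invalid username/password combination"


def login_rejected(html):
    """Return True if the page contains the wrong-credentials sentence."""
    return FAILURE_MARKER.lower() in html.lower()


# Made-up sample bodies, not real responses from the site.
print(login_rejected("<div>Invalid username/password combination</div>"))
print(login_rejected("<div>Some other page</div>"))
```

With a real request, response.text would be passed in place of the sample strings.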

If I delete all cookies and send a POST request to this API, that sentence does not show up. I also noticed that if I delete idsrv.xsrf from the request parameters (the POST form data), the returned page does not show that sentence either, which means the request is not correct. After several tries I confirmed that two cookie values are necessary: SignInMessage.8e10875e6eeb2ea3856ae6da5659d78c and idsrv.xsrf.
This experiment is not a necessary step, btw XD

signin and idsrv.xsrf
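There are two idsrv.xsrf values: one in the request parameters (the POST form data) and one in the cookies. Here, we are talking about the one in the request parameters.
Search the captured requests for the value of signin. It first appears in a redirect link: the authorize request redirects to the login URL, which means that if we send a GET request to the authorize URL, response.text will be the login page.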

Note: You can't read the Location header from the response, because requests follows the redirect automatically and the response you get back is the login page itself.
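Since the redirect is followed automatically, the final address is still available as response.url, and the intermediate responses are kept in response.history. Either way, the signin value is just a query parameter. Here is a minimal sketch of pulling it out of a URL or a form action; the helper name signin_from_url is hypothetical, and the URL is the one shown earlier in this post.

```python
from urllib.parse import urlsplit, parse_qs


def signin_from_url(url):
    """Return the signin query parameter from a login URL or form action."""
    values = parse_qs(urlsplit(url).query).get("signin")
    if not values:
        raise ValueError(f"no signin parameter in {url!r}")
    return values[0]


# Works for the absolute login URL and for a relative form action alike.
login_url = "https://secure.costargroup.com/login?signin=8e10875e6eeb2ea3856ae6da5659d78c"
print(signin_from_url(login_url))
print(signin_from_url("/login?signin=8e10875e6eeb2ea3856ae6da5659d78c"))
```

Parsing the query string is a bit more robust than splitting on "=", because it keeps working if the URL ever carries more than one parameter.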
We can retrieve the values of both signin and idsrv.xsrf from the login page's HTML, which is what the authorize request returns after the redirect.
Use XPath to extract them.
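from lxml import etree

def extract(resp):
    # Parse the login page HTML
    tree = etree.HTML(resp)
    # The form action carries the signin value
    signinform = tree.xpath('//form[@id="signinform"]/@action')[0]
    print(signinform)  # /login?signin=11.......d736d93e8c1b15ee
    # The hidden input carries the idsrv.xsrf value for the form data
    xsrf_token = tree.xpath('//input[@name="idsrv.xsrf"]/@value')[0]
    print(xsrf_token)
    return signinform, xsrf_token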
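If lxml is not available, the same two values can be collected with the standard library's html.parser. This is a swapped-in alternative to the XPath version, not the method used in this post; the class and function names are made up, and SAMPLE_HTML is an invented fragment that only mimics the two elements the XPath expressions target.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the sign-in form action and the idsrv.xsrf hidden input."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.xsrf_token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "signinform":
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name") == "idsrv.xsrf":
            self.xsrf_token = attrs.get("value")


def extract_login_fields(html):
    """Return (form action, xsrf token) or raise if either is missing."""
    parser = LoginFormParser()
    parser.feed(html)
    if parser.action is None or parser.xsrf_token is None:
        raise ValueError("login form or idsrv.xsrf input not found")
    return parser.action, parser.xsrf_token


# Invented sample markup, not copied from the real login page.
SAMPLE_HTML = """
<form id="signinform" action="/login?signin=abc123" method="post">
  <input type="hidden" name="idsrv.xsrf" value="token-xyz">
</form>
"""

print(extract_login_fields(SAMPLE_HTML))
```

Raising a clear error instead of indexing with [0] makes it easier to notice when the page did not come back as the login form.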
cookies
Search for the value of SignInMessage.8e10875e6eeb2ea3856ae6da5659d78c.

Search for the value of idsrv.xsrf.

Use a requests session to keep state across requests. It stores the cookies the server sets and sends them back automatically.
session = requests.session()
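To confirm that the session really picked up the two cookies named earlier, a small check like the following can help. It is only a sketch: the helper name missing_login_cookies is hypothetical, and it takes plain cookie names so it can be tried without sending any request.

```python
def missing_login_cookies(cookie_names, signin):
    """Return the required cookie names that are absent, sorted."""
    required = {f"SignInMessage.{signin}", "idsrv.xsrf"}
    return sorted(required - set(cookie_names))


signin = "8e10875e6eeb2ea3856ae6da5659d78c"

# Only one of the two cookies present: the other one is reported.
print(missing_login_cookies(["idsrv.xsrf"], signin))

# Both present: nothing is missing.
print(missing_login_cookies(["idsrv.xsrf", f"SignInMessage.{signin}"], signin))
```

With a live session, the names can come from session.cookies.get_dict(), for example missing_login_cookies(session.cookies.get_dict(), signin).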
code
from lxml import etree
import requests


def authorize(session):
    # GET the authorize endpoint; requests follows the redirect,
    # so the returned text is the login page.
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'accept-language': 'en',
        'cache-control': 'no-cache',
        'pragma': 'no-cache',
        'priority': 'u=0, i',
        'referer': 'https://www.costar.com/',
        'sec-ch-ua': '"Chromium";v="136", "Google Chrome";v="136", "Not.A/Brand";v="99"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'cross-site',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
    }
    params = {
        'client_id': 'costar',
        'nonce': '7e746f9b-9ca8-467a-25d1-71e3fb9ba889',
        'response_type': 'code',
        'response_mode': 'form_post',
        'scope': 'openid profile email address phone offline_access product_user session',
        'redirect_uri': 'https://product.costar.com/home/auth-callback',
        'acr_values': '',
        'locale': 'en-US',
    }
    response = session.get('https://secure.costargroup.com/connect/authorize', params=params, headers=headers)
    # print(response.headers)
    return response.text


def extract(resp):
    # Parse the login page and pull out the form action and the xsrf token.
    tree = etree.HTML(resp)
    signinform = tree.xpath('//form[@id="signinform"]/@action')[0]
    print(signinform)  # /login?signin=11.......d736d93e8c1b15ee
    xsrf_token = tree.xpath('//input[@name="idsrv.xsrf"]/@value')[0]
    print(xsrf_token)
    return signinform, xsrf_token


if __name__ == '__main__':
    # One session for both requests, so the cookies set by the
    # first request are sent along with the login POST.
    session = requests.session()
    resp = authorize(session)
    signinform, xsrf_token = extract(resp)
    # The signin value is the last part of the form action.
    signin = signinform.split('=')[-1]
    params = {
        'signin': signin,
    }
    referer = 'https://secure.costargroup.com' + signinform
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'accept-language': 'en',
        'cache-control': 'no-cache',
        'content-type': 'application/x-www-form-urlencoded',
        'origin': 'https://secure.costargroup.com',
        'pragma': 'no-cache',
        'priority': 'u=0, i',
        'referer': referer,
        'sec-ch-ua': '"Chromium";v="136", "Google Chrome";v="136", "Not.A/Brand";v="99"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
    }
    # Form data for the login POST; replace the placeholders with real credentials.
    data = {
        'idsrv.xsrf': xsrf_token,
        'sessionId': '',
        'username': 'your username',
        'password': 'your password',
    }
    response = session.post("https://secure.costargroup.com/login", params=params, headers=headers, data=data)
    print(response.text)
