a login api

original reddit post: https://www.reddit.com/r/webscraping/comments/1kpwou5/login_form_questions/

The poster wanted to log in by calling the login API directly, but kept receiving a non-200 response.

target website: https://www.costar.com/

Click the login button at the top right. The website redirects to another URL:

https://secure.costargroup.com/login?signin=8e10875e6eeb2ea3856ae6da5659d78c

image-20250524230140999

Click Log In and we get a POST request.

image-20250524230323637

image-20250524230332388

Look at the response. If we input the correct username and password, I guess it will redirect us to an after-login page. Without a real account, we can still use this sentence in the returned page as a sign that the request itself was accepted and only the credentials were wrong: Invalid username/password combination.
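
A minimal sketch of that check, assuming the failure sentence quoted above appears verbatim in the returned HTML. The helper name `login_failed` is hypothetical, not something from the site or from requests.

```python
# Failure sentence quoted from the login page's response.
FAILURE_MARKER = "Invalid username/password combination"


def login_failed(html: str) -> bool:
    """Return True if the response body contains the failure sentence.

    A False result does not prove the login worked: a malformed request
    (for example one with missing cookies) also lacks this sentence.
    """
    return FAILURE_MARKER in html
```

Calling `login_failed(response.text)` on the POST response is enough to tell a rejected password apart from a page that never reached the credential check.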

image-20250524231201783


If I delete all cookies and send the POST request to this API again, that sentence does not show up. I also noticed that if I delete idsrv.xsrf from the params, the returned page does not contain the sentence either, which means the request was not accepted as a valid login attempt. After several tries, I confirmed that two cookies are required: SignInMessage.8e10875e6eeb2ea3856ae6da5659d78c and idsrv.xsrf.
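
The cookie requirement can be turned into a small pre-flight check. This is only a sketch: `missing_login_cookies` is a made-up helper, and it assumes the SignInMessage cookie is always named `SignInMessage.` followed by the current signin value, as in the example above.

```python
def missing_login_cookies(cookie_names):
    """Return the required login cookies that are absent from cookie_names."""
    names = list(cookie_names)
    missing = []
    # The suffix after "SignInMessage." changes with every signin value,
    # so only the prefix is matched here (an assumption based on one capture).
    if not any(name.startswith("SignInMessage.") for name in names):
        missing.append("SignInMessage.<signin>")
    if "idsrv.xsrf" not in names:
        missing.append("idsrv.xsrf")
    return missing
```

With a requests session, `missing_login_cookies(session.cookies.keys())` should return an empty list before the POST is sent.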

It's not a necessary step, btw XD

image-20250524231257868


signin and idsrv.xsrf

There are two idsrv.xsrf values: one in the params (the form data of the POST request) and one in the cookies. Here, we are talking about the one in the params.

Search the captured requests for the value of signin. It shows up in a redirect link, which means that if we send a GET request to the URL that issues the redirect, response.text will be the login page.

image-20250524232744058

Note: You can't read the location key from the headers of the final response, because the request has already been redirected to the login URL by then.
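
If the intermediate Location header is still of interest, requests keeps every redirect hop in `response.history`. The helper below is a hedged sketch (the name `redirect_chain` is made up) that lists each hop's status code and Location value. It relies only on the `history`, `status_code` and `headers` attributes of a requests response.

```python
def redirect_chain(response):
    """List (status_code, Location) for each redirect hop that led to `response`.

    `response.history` holds the intermediate responses, oldest first.
    It is empty when no redirect happened.
    """
    return [
        (hop.status_code, hop.headers.get("Location"))
        for hop in response.history
    ]
```

Alternatively, passing `allow_redirects=False` to `session.get` stops at the first response, so its Location header can be read directly.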


We can retrieve the values of both signin and idsrv.xsrf from the login page's response text, which is what the authorize API returns after the redirect.

image-20250524233728675

Use XPath to extract them.

def extract(resp):
    tree = etree.HTML(resp)
    signinform = tree.xpath('//form[@id="signinform"]/@action')[0]
    print(signinform) # /login?signin=11.......d736d93e8c1b15ee

    xsrf_token = tree.xpath('//input[@name="idsrv.xsrf"]/@value')[0]
    print(xsrf_token)

    return signinform, xsrf_token
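
The full script further down gets the signin value with `signinform.split('=')[-1]`, which works as long as signin is the last parameter in the form action. A slightly more defensive sketch uses the standard library's URL parser instead. The function name `signin_id` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def signin_id(action: str) -> str:
    """Extract the signin query parameter from the form's action URL."""
    query = parse_qs(urlsplit(action).query)
    # parse_qs maps each parameter name to a list of values; take the first.
    return query["signin"][0]
```

For example, `signin_id('/login?signin=8e10875e6eeb2ea3856ae6da5659d78c')` returns the bare hex value, and extra parameters in the action do not affect the result.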

cookies

Search for the value of SignInMessage.8e10875e6eeb2ea3856ae6da5659d78c.

image-20250524234720664

Search for the value of idsrv.xsrf.

image-20250524234842744

Use a requests session to keep state between the two requests. It stores the cookies set by the server and sends them back automatically.

session = requests.session()

code

from lxml import etree
import requests


def authorize():
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'accept-language': 'en',
        'cache-control': 'no-cache',
        'pragma': 'no-cache',
        'priority': 'u=0, i',
        'referer': 'https://www.costar.com/',
        'sec-ch-ua': '"Chromium";v="136", "Google Chrome";v="136", "Not.A/Brand";v="99"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'cross-site',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
    }

    params = {
        'client_id': 'costar',
        'nonce': '7e746f9b-9ca8-467a-25d1-71e3fb9ba889',
        'response_type': 'code',
        'response_mode': 'form_post',
        'scope': 'openid profile email address phone offline_access product_user session',
        'redirect_uri': 'https://product.costar.com/home/auth-callback',
        'acr_values': '',
        'locale': 'en-US',
    }

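    # `session` is the module-level requests session created in the __main__ block below.
    # The GET is redirected to the login page, and the session keeps the cookies set on the way.
    response = session.get('https://secure.costargroup.com/connect/authorize', params=params, headers=headers)
    # print(response.headers)
    return response.text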
    

def extract(resp):
    tree = etree.HTML(resp)
    signinform = tree.xpath('//form[@id="signinform"]/@action')[0]
    print(signinform) # /login?signin=11.......d736d93e8c1b15ee

    xsrf_token = tree.xpath('//input[@name="idsrv.xsrf"]/@value')[0]
    print(xsrf_token)

    return signinform, xsrf_token
    

if __name__ == '__main__':
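    # One session for both requests, so the login cookies carry over to the POST.
    session = requests.session()
    resp = authorize()
    signinform, xsrf_token = extract(resp)
    # The form action looks like /login?signin=<value>; keep only the value.
    signin = signinform.split('=')[-1]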
    params = {
        'signin': signin,
    }
    referer = 'https://secure.costargroup.com' + signinform
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'accept-language': 'en',
        'cache-control': 'no-cache',
        'content-type': 'application/x-www-form-urlencoded',
        'origin': 'https://secure.costargroup.com',
        'pragma': 'no-cache',
        'priority': 'u=0, i',
        'referer': referer,
        'sec-ch-ua': '"Chromium";v="136", "Google Chrome";v="136", "Not.A/Brand";v="99"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
    }    

    data = {
        'idsrv.xsrf': xsrf_token,
        'sessionId': '',
        'username': 'your username',
        'password': 'your password',
    }
    response = session.post("https://secure.costargroup.com/login", params=params, headers=headers, data=data)
    # With wrong credentials the page contains "Invalid username/password combination".
    print(response.text)

Source: https://zycreverse.netlify.app/posts/costargroup/
Author: 会写点代码的本子画手
Published at: 2025-05-24