
Why Does My Program To Scrape The NSE Website Get Blocked On Servers But Work Locally?

This Python code runs on my local computer but fails on DigitalOcean, Amazon AWS, Google Colab, Heroku, and many other VPS providers. It shows different errors at different times.

Solution 1:

There are two things to note.

  1. The request headers need to include 'Host' and 'User-Agent':
__request_headers = {
        'Host':'www.nseindia.com', 
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0',
        'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 
        'Accept-Language':'en-US,en;q=0.5', 
        'Accept-Encoding':'gzip, deflate, br',
        'DNT':'1', 
        'Connection':'keep-alive', 
        'Upgrade-Insecure-Requests':'1',
        'Pragma':'no-cache',
        'Cache-Control':'no-cache',    
    }
  2. The following cookies are set dynamically, so they need to be fetched and set dynamically as well:
'nsit',
'nseappid',
'ak_bmsc'

NSE sets these based on the functionality being used. In this example I fetch the top gainers/losers list; without these cookies, the request is blocked.

import logging
import requests

logger = logging.getLogger(__name__)

try:
    nse_url = 'https://www.nseindia.com/market-data/top-gainers-loosers'
    url = 'https://www.nseindia.com/api/live-analysis-variations?index=gainers'
    # First request to the HTML page yields the dynamic cookies.
    resp = requests.get(url=nse_url, headers=__request_headers)
    if resp.ok:
        req_cookies = dict(nsit=resp.cookies['nsit'],
                           nseappid=resp.cookies['nseappid'],
                           ak_bmsc=resp.cookies['ak_bmsc'])
        # Second request to the API carries those cookies along.
        tresp = requests.get(url=url, headers=__request_headers, cookies=req_cookies)
        result = tresp.json()
        res_data = result["NIFTY"]["data"] if "NIFTY" in result and "data" in result["NIFTY"] else []
        if res_data:
            __top_list = res_data
except OSError as err:
    logger.error('Unable to fetch data')

Another thing to note is that NSE blocks these requests from most cloud VMs, such as AWS and GCP. I was able to fetch the data from my personal Windows machine, but not from AWS or GCP.
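For completeness, the two-step cookie handshake above can also be written with a requests.Session, which stores the cookies from the warm-up request automatically instead of copying them by hand. This is a sketch under the same assumptions as the code above; the trimmed HEADERS dict and the name fetch_top_gainers are my own:

```python
import requests

# Subset of the headers from Solution 1; extend as needed.
HEADERS = {
    'Host': 'www.nseindia.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) '
                  'Gecko/20100101 Firefox/82.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
}

def fetch_top_gainers(timeout=10.0):
    """Warm up on the HTML page so the nsit/nseappid/ak_bmsc cookies land
    in the session's cookie jar, then hit the JSON API with that session."""
    with requests.Session() as s:
        s.headers.update(HEADERS)
        s.get('https://www.nseindia.com/market-data/top-gainers-loosers',
              timeout=timeout)
        resp = s.get('https://www.nseindia.com/api/live-analysis-variations?index=gainers',
                     timeout=timeout)
        resp.raise_for_status()
        return resp.json()

# Usage (network access required; cloud IPs may still be blocked):
# data = fetch_top_gainers()
```

The session approach also survives NSE renaming or adding cookies, since nothing is hard-coded.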

Solution 2:

I stumbled into the same problem. I do not know of a proper solution with the python-requests module; there is a high chance NSE simply blocks it.

So here is a workaround that does work. It looks lame, but I'm using it without digging deeper:

import subprocess
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))

subprocess.run('curl "https://www.nseindia.com/api/quote-derivative?symbol=BANKNIFTY" -H "authority: beta.nseindia.com" -H "cache-control: max-age=0" -H "dnt: 1" -H "upgrade-insecure-requests: 1" -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36" -H "sec-fetch-user: ?1" -H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" -H "sec-fetch-site: none" -H "sec-fetch-mode: navigate" -H "accept-encoding: gzip, deflate, br" -H "accept-language: en-US,en;q=0.9,hi;q=0.8" --compressed -o maxpain.txt', shell=True)

with open("maxpain.txt", "r") as f:
    var = f.read()
print(var)

It basically runs curl as a shell command, waits for it to finish writing the output to a file, and reads the file back. That's it.
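If the temporary file is a nuisance, the same curl call can capture stdout directly via subprocess.run, with no shell quoting to get wrong. This is a sketch: the header list mirrors a few entries from the command above, and build_curl_args / fetch_via_curl are my own names:

```python
import subprocess

# A few headers mirroring the curl command above; extend as needed.
CURL_HEADERS = [
    "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36",
    "accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-language: en-US,en;q=0.9",
]

def build_curl_args(url):
    """Assemble the curl argument list; passing a list avoids shell quoting."""
    args = ["curl", "--silent", "--compressed", url]
    for header in CURL_HEADERS:
        args += ["-H", header]
    return args

def fetch_via_curl(url):
    """Run curl, wait for it, and return its stdout as text.

    check=True raises CalledProcessError on a non-zero curl exit code."""
    result = subprocess.run(build_curl_args(url), capture_output=True,
                            text=True, check=True)
    return result.stdout

# Usage (network access required):
# body = fetch_via_curl("https://www.nseindia.com/api/quote-derivative?symbol=BANKNIFTY")
# print(body)
```

Because the arguments are passed as a list rather than one shell string, the missing-space class of bug in the one-liner above cannot occur.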
