Pandas Read_html Missing Some Tables

I am using pandas read_html to find all tables in a specific webpage; however, the process seems to be missing some of the tables. Here is the webpage: https://www.uspto.gov/web/o

Solution 1:

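It seems that pd.read_html does not find every table tag on this page. One workaround is to fetch the page yourself, locate the tables with BeautifulSoup, and pass each one to pd.read_html separately. On Python 3, urllib.request from the standard library replaces urllib2, and BeautifulSoup can be installed with pip install beautifulsoup4.

from io import StringIO
from urllib.request import urlopen  # Python 3 replacement for urllib2

import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.uspto.gov/web/offices/ac/ido/oeip/taf/mclsstc/mcls1.htm"
html_text = urlopen(url).read()
# Name the parser explicitly; html.parser ships with Python
bs_obj = BeautifulSoup(html_text, "html.parser")
tables = bs_obj.find_all("table")
dfs = []
for table in tables:
    # Parse each <table> on its own and keep the first DataFrame found
    df = pd.read_html(StringIO(str(table)))[0]
    dfs.append(df)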

As a result, you'll have all the tables (as DataFrames) in the dfs list.
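To check whether any tables are really being skipped, it can help to count the table tags independently and compare that number with what pandas returns. The following is a minimal sketch using only the standard library. The names TableCounter and count_tables and the sample markup are hypothetical and are not taken from the USPTO page.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags, including nested ones."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TABLE> is counted too
        if tag == "table":
            self.count += 1


def count_tables(html):
    """Return the number of <table> start tags in an HTML string."""
    counter = TableCounter()
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical sample markup: one outer table with a nested table, plus one more
sample = """
<TABLE><tr><td>outer
  <table><tr><td>inner</td></tr></table>
</td></tr></TABLE>
<table><tr><th>a</th></tr><tr><td>1</td></tr></table>
"""
print(count_tables(sample))
```

Running count_tables on the downloaded page text and comparing the result with the number of DataFrames you got back gives a rough idea of how many tables were missed. Nested tables are counted separately here, so the two numbers need not match exactly.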