Pandas Read_html Missing Some Tables
I am using pandas read_html to find all tables in a specific webpage; however, the process seems to be missing some of the tables. Here is the webpage: https://www.uspto.gov/web/o
Solution 1:
It seems that the pd.read_html function can't find all of the table tags on this page.
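I suggest using the BeautifulSoup and urllib2 packages for this task. BeautifulSoup can be installed with pip install beautifulsoup4. urllib2 is part of the Python 2 standard library and needs no installation; on Python 3 its equivalent is urllib.request.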
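import pandas as pd
from bs4 import BeautifulSoup
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# Download the raw HTML of the page
html_text = urlopen("https://www.uspto.gov/web/offices/ac/ido/oeip/taf/mclsstc/mcls1.htm").read()
# Parse it (naming the parser explicitly avoids a bs4 warning) and collect every <table> element
bs_obj = BeautifulSoup(html_text, "html.parser")
tables = bs_obj.find_all("table")
# Convert each table to a DataFrame on its own
dfs = []
for table in tables:
    df = pd.read_html(str(table))[0]
    dfs.append(df)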
As a result, you'll have all of the tables (as DataFrames) in the dfs list.
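To check whether any tables are still being skipped, it can help to count the table tags in the page independently and compare that number with len(dfs). The following is only a minimal sketch, assuming Python 3 and using nothing but the standard library's html.parser module. The names TableCounter, count_tables, and sample_html are hypothetical and are not part of the original answer.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags, including nested ones."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so <TABLE> matches too
        if tag == "table":
            self.count += 1


def count_tables(html_text):
    """Return the number of <table> start tags in an HTML string."""
    counter = TableCounter()
    counter.feed(html_text)
    counter.close()
    return counter.count


# Hypothetical sample page: one lower-case table and one upper-case table
sample_html = """
<html><body>
<table><tr><td>1</td></tr></table>
<TABLE><TR><TD>2</TD></TR></TABLE>
</body></html>
"""
print(count_tables(sample_html))
```

On the real page, the downloaded HTML would be decoded to a string and passed to count_tables. Nested tables are counted separately, so a small difference from the number of DataFrames does not necessarily mean that data was lost.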