Trying To Scrape Table Using Pandas From Selenium's Result
Solution 1:
You can get the table using the following code:
import time

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chrome_path = r"Path to chrome driver"
driver = webdriver.Chrome(service=Service(chrome_path))

url = 'http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS02'
driver.get(url)
time.sleep(2)  # give the JavaScript-rendered table time to load

# read_html parses every <table> in the page source; the first one is the price table
df = pd.read_html(driver.page_source)[0]
print(df.head())
This is the output:
No Code Name Rem Last Done LACP Chg % Chg Vol ('00) Buy Vol ('00) Buy Sell Sell Vol ('00) High Low
0 1 5284CB LCTITAN-CB s 0.025 0.020 0.005 +25.00 406550 19878 0.020 0.025 106630 0.025 0.015
1 2 1201 SUMATEC [S] s 0.050 0.050 - - 389354 43815 0.050 0.055 187301 0.055 0.050
2 3 5284 LCTITAN [S] s 4.470 4.700 -0.230 -4.89 367335 430 4.470 4.480 34 4.780 4.140
3 4 0176 KRONO [S] - 0.875 0.805 0.070 +8.70 300473 3770 0.870 0.875 797 0.900 0.775
4 5 5284CE LCTITAN-CE s 0.130 0.135 -0.005 -3.70 292379 7214 0.125 0.130 50 0.155 0.100
To get data from all pages, you can crawl the remaining pages and combine the per-page frames with pd.concat (DataFrame.append was deprecated and has been removed in pandas 2.0).
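The crawl loop itself depends on how the site paginates, but the combining step can be sketched as below. The per-page frames here are hard-coded stand-ins for what pd.read_html(driver.page_source)[0] would return after navigating to each page:

```python
import pandas as pd

# Stand-ins for the DataFrame scraped from each page; in the real crawl,
# each would come from pd.read_html(driver.page_source)[0] after clicking "next".
pages = [
    pd.DataFrame({'Code': ['5284CB'], 'Last Done': [0.025]}),
    pd.DataFrame({'Code': ['1201'], 'Last Done': [0.050]}),
]

# pd.concat replaces the removed DataFrame.append;
# ignore_index=True renumbers the rows 0..N-1 across all pages
all_rows = pd.concat(pages, ignore_index=True)
print(all_rows)
```

Collecting the frames in a list and concatenating once at the end is also much faster than appending inside the loop, since each append used to copy the whole frame.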
Solution 2:
Answer:
df = pd.read_html(target[0].get_attribute('outerHTML'))
Reason for target[0]: driver.find_elements_by_id('bm_equities_prices_table') returns a list of Selenium web elements; in your case there's only one element, hence [0].
Reason for get_attribute('outerHTML'): we want the HTML of the element. There are two such attributes: 'innerHTML' vs 'outerHTML'. We chose 'outerHTML' because we need to include the current element itself, where the table headers are, instead of only the inner contents of the element.
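The difference is easy to see on a small table. The markup below is a hypothetical stand-in for the element Selenium returns, and the example assumes an HTML parser such as lxml is installed (pd.read_html needs one):

```python
from io import StringIO

import pandas as pd

# outerHTML includes the <table> tag itself; innerHTML is only its contents
outer_html = "<table><tr><th>Code</th></tr><tr><td>5284</td></tr></table>"
inner_html = "<tr><th>Code</th></tr><tr><td>5284</td></tr>"

# read_html finds the table in outerHTML...
df = pd.read_html(StringIO(outer_html))[0]
print(df)

# ...but raises ValueError on innerHTML, because no <table> tag survives parsing
try:
    pd.read_html(StringIO(inner_html))
except ValueError:
    print("no <table> found in innerHTML")
```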
Reason for df[0]: pd.read_html() returns a list of data frames, the first of which is the result we want, hence [0].