
How To Get The Second Element From The Sublist Using Python

I have automated a script with Python Selenium that gets the data from the site and generates it as a list. Now, from the output sublists, I need to get only the second element. Plea

Solution 1:

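The information you are looking for is not inside any child element. It is text placed directly in the bodyContent div. Each run of text between tags (here, between the <b> labels and the <br> separators) becomes its own text node in the DOM, so you can select those text nodes with the /text() XPath step on that element.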

<div id="bodyContent"><h1><b>Search Results</b></h1><br><br><b>Query Date: </b> Wed Aug 16 2017<br><b>Latitude: </b> 33.4484<br><b>Longitude: </b> -112.0740<br><br><b>ASCE 7-10 Windspeeds <br>(3-sec peak gust in mph*):</b><br><br><b>Risk Category I:</b> 105<br><b>Risk Category II:</b> 115<br><b>Risk Category III-IV:</b> 120<br><b>MRI** 10-Year:</b> 76<br><b>MRI** 25-Year:</b> 84<br><b>MRI** 50-Year:</b> 90<br><b>MRI** 100-Year:</b> 96<br><br><b>ASCE 7-05 Windspeed:</b><br>&nbsp; 90 (3-sec peak gust in mph)<br><b>ASCE 7-93 Windspeed:</b><br>&nbsp; 72 (fastest mile in mph)<br><br><p></p><br><p>*Miles per hour<br>**Mean Recurrence Interval</p><br><p>Users should consult with local building officials <br> to determine if there are community-specific wind speed <br> requirements that govern.</p><br><form id="createPDF" action="/pdf/create.php" method="post"><input type="hidden" name="cont" value="<h1><b>Search Results</b></h1><br/><br/><b>Query Date: </b> Wed Aug 16 2017<br/><b>Latitude: </b> 33.4484<br/><b>Longitude: </b> -112.0740<br/><br/><b>ASCE 7-10 Windspeeds <br/>(3-sec peak gust in mph*):</b><br/><br/><b>Risk Category I:</b> 105<br/><b>Risk Category II:</b> 115<br/><b>Risk Category III-IV:</b> 120<br/><b>MRI** 10-Year:</b> 76<br/><b>MRI** 25-Year:</b> 84<br/><b>MRI** 50-Year:</b> 90<br/><b>MRI** 100-Year:</b> 96<br/><br/><b>ASCE 7-05 Windspeed:</b><br/> &nbsp; 90 (3-sec peak gust in mph)<br/><b>ASCE 7-93 Windspeed:</b><br/> &nbsp; 72 (fastest mile in mph)<br/><br/><p></p><br/><p>*Miles per hour<br/>**Mean Recurrence Interval</p><br/><p>Users should consult with local building officials <br/> to determine if there are community-specific wind speed <br/> requirements that govern.</p><br/>"><input type="hidden" name="lat" value="33.4484"><input type="hidden" name="lan" value="-112.0740"><input type="hidden" name="zoom" id="google-map-zoom" value="4"><input type="hidden" name="maptype" id="google-map-maptype" value="roadmap"><!-- <a href="#" 
onclick="document.getElementById('createPDF').submit(); return false;"><img src="/images/pdf.png" border=0 /> Download a PDF of your results</a> --></form><br><a href="#" onclick="window.print(); return false;"><img src="/images/print.png" border="0"> Print your results</a><br><br></div>
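For comparison, an ordinary HTML parser can pick out these direct text nodes without a browser. The sketch below uses Python's standard-library html.parser; the DirectTextCollector class and direct_text helper are illustrative names, not part of any library. It approximates ./text() on the element with a given id, skips whitespace-only nodes, and assumes every non-void tag is explicitly closed, which holds for the sample above but not for all real-world HTML.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DirectTextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes that are direct children of
    the element with the given id, roughly what ./text() selects there."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)  # turns &nbsp; into '\xa0'
        self.target_id = target_id
        self.depth = 0  # 0 = outside the target, 1 = directly inside it
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # entering a child (or deeper) element
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target element itself

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only text seen at depth 1 is a direct child of the target element.
        if self.depth == 1:
            stripped = data.strip()  # str.strip() also removes '\xa0'
            if stripped:
                self.texts.append(stripped)


def direct_text(html, element_id):
    """Return the trimmed, non-empty direct text children of #element_id."""
    parser = DirectTextCollector(element_id)
    parser.feed(html)
    parser.close()
    return parser.texts


sample = ('<div id="bodyContent"><b>Latitude: </b> 33.4484<br>'
          '<b>Longitude: </b> -112.0740<br>'
          '<b>ASCE 7-05 Windspeed:</b><br>&nbsp; 90 (3-sec peak gust in mph)</div>')
print(direct_text(sample, "bodyContent"))
# ['33.4484', '-112.0740', '90 (3-sec peak gust in mph)']
```

In a Selenium session the same helper could be applied to driver.page_source, for example direct_text(driver.page_source, "bodyContent"), keeping in mind that page_source is the browser's serialization of the DOM rather than the raw server response.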

Most scraping frameworks and HTML parsers let you extract text nodes directly, but Selenium's find_element methods only return elements, so an XPath that ends in text() fails there. Instead, run some JavaScript in the page: the document.evaluate method evaluates an XPath expression and returns all of its matches. The JavaScript would be:

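script_extract_data = """
// Select every text node that is a direct child of #bodyContent.
var data = document.evaluate("./text()",
             document.getElementById("bodyContent"), null,
             XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

var text = [];

// Trim each node's text (this also strips the &nbsp; before the last two values).
for (var i = 0; i < data.snapshotLength; i++)
{
   text.push(data.snapshotItem(i).textContent.trim());
}

// execute_script hands this array back to Python as a list of strings.
return text;
"""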

You can then run the script from Python with:

data = driver.execute_script(script_extract_data)
print(data)

This produces the following output:

In[4]: driver.execute_script(script_extract_data)
Out[4]: 
['Wed Aug 16 2017',
 '33.4484',
 '-112.0740',
 '105',
 '115',
 '120',
 '76',
 '84',
 '90',
 '96',
 '90 (3-sec peak gust in mph)',
 '72 (fastest mile in mph)']
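The question itself asks for the second element of each sublist. The sublists from the question aren't shown, so rows below is hypothetical sample data; the point is only that index 1 selects the second item of a sequence, and a list comprehension (or operator.itemgetter) applies that to every sublist.

```python
from operator import itemgetter

# Hypothetical sublists, e.g. label/value pairs scraped row by row.
rows = [
    ["Query Date", "Wed Aug 16 2017"],
    ["Latitude", "33.4484"],
    ["Longitude", "-112.0740"],
]

# Index 1 is the second element; the comprehension applies it to every sublist.
second_items = [row[1] for row in rows]
print(second_items)  # ['Wed Aug 16 2017', '33.4484', '-112.0740']

# Equivalent form using itemgetter.
second_items_alt = list(map(itemgetter(1), rows))

# Sublists with fewer than two items would raise IndexError, so skip them if needed.
safe_second = [row[1] for row in rows if len(row) > 1]

# The flat list returned by execute_script is indexed the same way.
data = ['Wed Aug 16 2017', '33.4484', '-112.0740', '105']
print(data[1])  # 33.4484
```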
