Python Beautifulsoup Paragraph Text Only
I am very new to web scraping and, as I understand it, Requests and BeautifulSoup are the way to go for that. I want to write a program which emails me only one paragrap
Solution 1:
Loop over soup.find_all('p') to find all the p tags, then use .text to get their text. Do all of that inside the div with the class rte, since you don't want the footer paragraphs:
from bs4 import BeautifulSoup
import requests

url = 'https://fs.blog/mental-models/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Restrict the search to the div(s) with class "rte" to skip the footer
div_tags = soup.find_all("div", {"class": "rte"})
for div in div_tags:
    p_tags = div.find_all('p')
    # Trim the last two irrelevant-looking paragraphs
    for p in p_tags[:-2]:
        print(p.text)
OUTPUT:
Mental models are how we understand the world. Not only do they shape what we think and how we understand but they shape the connections and opportunities that we see.
.
.
.
5. Mutually Assured Destruction
Somewhat paradoxically, the stronger two opponents become, the less likely they may be to destroy one another. This process of mutually assured destruction occurs not just in warfare, as with the development of global nuclear warheads, but also in business, as with the avoidance of destructive price wars between competitors. However, in a fat-tailed world, it is also possible that mutually assured destruction scenarios simply make destruction more severe in the event of a mistake (pushing destruction into the “tails” of the distribution).
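The div-scoped lookup above can be tried offline on a small HTML string before pointing it at the real page. This is only a sketch: the sample markup and the helper name paragraphs_in are made up for illustration, and it uses a CSS selector via select() instead of the nested find_all() calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates a content div followed by a footer
SAMPLE_HTML = """
<div class="rte"><p>First paragraph.</p><p>Second paragraph.</p></div>
<footer><p>Footer text.</p></footer>
"""

def paragraphs_in(html, css_class="rte"):
    """Return the text of every <p> nested inside a div with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.rte p" matches p tags that are descendants of <div class="rte">
    return [p.get_text(strip=True) for p in soup.select(f"div.{css_class} p")]

print(paragraphs_in(SAMPLE_HTML))
```

Because the selector only matches paragraphs under the div, the footer paragraph is left out without any slicing.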
Solution 2:
If you want the text of all the p tags, you can just loop over them using the find_all method:
from bs4 import BeautifulSoup
import requests

url = 'https://fs.blog/mental-models/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Print the text of every <p> tag on the page
data = soup.find_all('p')
for p in data:
    text = p.get_text()
    print(text)
EDIT:
Here is the code to collect them separately in a list. You can then loop over the result list to remove empty strings, unwanted characters like \n, etc.
from bs4 import BeautifulSoup
import requests

url = 'https://fs.blog/mental-models/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Collect the text of every <p> tag in a list
data = soup.find_all('p')
result = []
for p in data:
    result.append(p.get_text())
print(result)
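The cleanup loop mentioned above could look roughly like this. It is a minimal sketch: clean_paragraphs is a hypothetical helper name, and the sample list stands in for the scraped result list.

```python
def clean_paragraphs(paragraphs):
    """Normalize whitespace in each string and drop entries that end up empty."""
    cleaned = []
    for text in paragraphs:
        # split() with no arguments splits on any whitespace, including \n,
        # so joining with a single space also removes stray line breaks
        text = " ".join(text.split())
        if text:
            cleaned.append(text)
    return cleaned

# Stand-in for the scraped list; real data would come from find_all('p')
sample = ["  Mental models\n", "", "\n", "First\nprinciples "]
print(clean_paragraphs(sample))  # ['Mental models', 'First principles']
```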
Solution 3:
Here is the solution:
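from bs4 import BeautifulSoup
import requests
# Assumption: "Clock" means Kivy's clock; a bare "import Clock" is not a standard module
from kivy.clock import Clock

url = 'https://fs.blog/mental-models/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

data = soup.find_all('p')
result = []
for p in data:
    result.append(p.get_text())

# Pass a callable rather than the return value of print(); Kivy calls it with
# the elapsed time dt. The callback only fires while a Kivy event loop is running.
Clock.schedule_interval(lambda dt: print(result), 60)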
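The repeat-every-60-seconds idea above can also be sketched with only the standard library, without any Clock object. The helper name run_every and its parameters are hypothetical, and the sleep function is passed in so the loop can be exercised without actually waiting.

```python
import time

def run_every(task, interval_seconds, repetitions, sleep=time.sleep):
    """Call task() a fixed number of times, pausing between consecutive calls."""
    outputs = []
    for i in range(repetitions):
        outputs.append(task())
        # No pause is needed after the final run
        if i < repetitions - 1:
            sleep(interval_seconds)
    return outputs

# Example: run a stand-in task three times with a very short pause
print(run_every(lambda: "scraped", 0.01, 3))
```

In a real script the task would be a function that fetches the page and returns the paragraph list, and the interval would be 60.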