Python Beautifulsoup Paragraph Text Only

June 11, 2024

I am very new to anything web scraping related, and as I understand it, Requests and BeautifulSoup are the way to go for that. I want to write a program which emails me only one paragrap

Solution 1:

Loop over soup.find_all('p') to get all the p tags, then use .text to get the text of each one.

Do all of that inside the div with the class rte, since you don't want the footer paragraphs.

from bs4 import BeautifulSoup
import requests

url = 'https://fs.blog/mental-models/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Only look inside <div class="rte"> so the footer paragraphs are skipped
divTags = soup.find_all("div", {"class": "rte"})
for div in divTags:
    pTags = div.find_all('p')
    for tag in pTags[:-2]:  # trim the last two irrelevant-looking lines
        print(tag.text)

OUTPUT:

Mental models are how we understand the world. Not only do they shape what we think and how we understand but they shape the connections and opportunities that we see.
.
.
.
5. Mutually Assured Destruction
Somewhat paradoxically, the stronger two opponents become, the less likely they may be to destroy one another. This process of mutually assured destruction occurs not just in warfare, as with the development of global nuclear warheads, but also in business, as with the avoidance of destructive price wars between competitors. However, in a fat-tailed world, it is also possible that mutually assured destruction scenarios simply make destruction more severe in the event of a mistake (pushing destruction into the “tails” of the distribution).
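
To try the same idea without hitting the site, here is a minimal sketch that runs on an inline HTML string. The markup, the helper name article_paragraphs and its parameters are assumptions made up for illustration; the real page layout may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page layout may differ.
SAMPLE_HTML = """
<div class="rte">
  <p>First model.</p>
  <p>Second model.</p>
  <p>Share this page.</p>
  <p>Subscribe.</p>
</div>
<footer><p>Footer text.</p></footer>
"""


def article_paragraphs(html, container_class="rte", trailing_to_skip=2):
    """Return paragraph texts inside div.<container_class>, minus trailing ones."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for div in soup.find_all("div", class_=container_class):
        found = div.find_all("p")
        # Slice by length rather than [:-n] so that n == 0 keeps everything.
        keep = found[: max(len(found) - trailing_to_skip, 0)]
        texts.extend(p.get_text() for p in keep)
    return texts


paragraphs = article_paragraphs(SAMPLE_HTML)
print(paragraphs)
```

Slicing by length instead of [:-2] is only a convenience: pTags[:-0] would return an empty list, so a skip count of zero needs the explicit form.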

Solution 2:

If you want the text of all the p tags, you can just loop over them using the find_all method:

from bs4 import BeautifulSoup
import requests


url = 'https://fs.blog/mental-models/'

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Print the text of every <p> tag on the page
data = soup.find_all('p')
for p in data:
    text = p.get_text()
    print(text)
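
If the paragraphs contain inline tags or stray whitespace, get_text() also accepts a separator and a strip flag. The sketch below uses a made-up HTML snippet (an assumption, not taken from the page) to compare the two calls.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with inline tags and stray whitespace.
html = "<p>  Mental <b>models</b>\n are how we <a href='#'>understand</a> the world. </p>"
p = BeautifulSoup(html, "html.parser").find("p")

# Default: text pieces are concatenated exactly as they appear in the markup.
raw = p.get_text()

# Strip every text piece, drop the empty ones, and join the rest with a space.
tidy = p.get_text(" ", strip=True)

print(repr(raw))
print(repr(tidy))
```

One caveat: with a space separator, punctuation that directly follows an inline tag ends up with a space in front of it, so this may not suit every page.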

EDIT:

Here is the code to collect them separately in a list. You can then loop over the result list to remove empty strings, unwanted characters like \n, etc.

from bs4 import BeautifulSoup
import requests


url = 'https://fs.blog/mental-models/'

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Collect the text of every <p> tag in a list
data = soup.find_all('p')
result = []
for p in data:
    result.append(p.get_text())

print(result)
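
The clean-up pass over the result list could look something like this. It is only a sketch: the function name clean_paragraphs and the sample strings are made up for illustration, and what counts as "unwanted" depends on the page.

```python
def clean_paragraphs(paragraphs):
    """Collapse whitespace in each string and drop the ones left empty."""
    cleaned = []
    for text in paragraphs:
        # str.split() with no arguments splits on any run of whitespace,
        # including \n and \t, so joining with " " collapses it.
        collapsed = " ".join(text.split())
        if collapsed:
            cleaned.append(collapsed)
    return cleaned


# Stand-in for the scraped list; these strings are hypothetical.
raw = [
    "\n",
    "Mental models are how\nwe understand the world.  ",
    "",
    "5. Mutually Assured Destruction",
]
print(clean_paragraphs(raw))
```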

Solution 3:

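Here is a solution that prints the list every 60 seconds:

from bs4 import BeautifulSoup
import requests
import time

url = 'https://fs.blog/mental-models/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
data = soup.find_all('p')

result = []

for p in data:
    result.append(p.get_text())

# The original called Clock.schedule_interval(print(result), 60), which runs
# print() once and passes its return value (None) as the callback. Clock is
# presumably Kivy's clock, which only fires inside a running Kivy app, so a
# plain loop with time.sleep is used here instead.
while True:
    print(result)
    time.sleep(60)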
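
Whatever scheduler you use, it needs the function itself rather than the result of calling it. A minimal sketch of that idea using only the standard library follows; the helper name run_every and its parameters are assumptions for illustration, not part of any library.

```python
import time


def run_every(task, interval_seconds, repeats, sleep=time.sleep):
    """Call task() `repeats` times, pausing interval_seconds between calls."""
    results = []
    for i in range(repeats):
        results.append(task())  # task is called here, on every pass
        if i < repeats - 1:
            sleep(interval_seconds)
    return results


result = ["first paragraph", "second paragraph"]  # stand-in for the scraped list

# Pass a callable (here a lambda), not print(result): the latter would run
# once, immediately, and hand its return value None to the scheduler.
run_every(lambda: print(result), interval_seconds=0.01, repeats=2)
```

The sleep parameter is there so the pause can be swapped out, for example when testing.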
