Parsing A Script Tag With Dicts In Beautifulsoup

December 27, 2023

Working on a partial answer to this question, I came across a bs4.element.Tag that is a mess of nested dicts and lists (s, below). Is there a way to return a list of urls contain

Solution 1:

You can use s.text to get the content of the script. It's JSON, so you can then just parse it with json.loads. From there, it's simple dictionary access:

import json

from bs4 import BeautifulSoup
import requests

link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
r = requests.get(link)

soup = BeautifulSoup(r.text, 'html.parser')

s = soup.find('script', type='application/ld+json')

urls = [el['url'] for el in json.loads(s.text)['itemListElement']]

print(urls)
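The dictionary-access step can be tried without a network request. The sketch below is a minimal offline illustration under stated assumptions: the sample payload is hypothetical (made-up example.com URLs, and the assumption that entries in itemListElement usually carry a url key, as the code above implies), and extract_urls is a helper name introduced here, not part of any library.

```python
import json

# Hypothetical JSON-LD payload; the real page's structure may differ.
SAMPLE = """
{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "url": "https://example.com/jobs/1"},
    {"@type": "ListItem", "position": 2, "url": "https://example.com/jobs/2"},
    {"@type": "ListItem", "position": 3}
  ]
}
"""


def extract_urls(payload):
    """Return the 'url' of every itemListElement entry that has one."""
    data = json.loads(payload)
    # .get() avoids a KeyError when the top-level key is missing.
    items = data.get("itemListElement", [])
    # Skip entries without a 'url' key instead of raising.
    return [item["url"] for item in items if "url" in item]


urls = extract_urls(SAMPLE)
print(urls)
```

Skipping entries that lack a url key is a design choice for this sketch; the plain list comprehension above would raise a KeyError on such an entry, which may be preferable if a missing URL should be treated as an error.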

Solution 2:

A simpler variant, using s.string:

import json

from bs4 import BeautifulSoup
import requests

link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
r = requests.get(link)
soup = BeautifulSoup(r.text, 'html.parser')

s = soup.find('script', type='application/ld+json')

# s.string is the tag's single text child; parse it as JSON.
# Use a name other than `json` so the module is not shadowed.
data = json.loads(s.string)
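Note that soup.find returns None when no tag matches, and .string is None when a tag has no single text child, so either case would make json.loads fail. A guarded version might look like the sketch below. It runs offline against a hypothetical HTML snippet standing in for the downloaded page, and load_ld_json is a helper name introduced here for illustration.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page.
HTML = """
<html><head>
<script type="application/ld+json">
{"itemListElement": [{"url": "https://example.com/jobs/1"},
                     {"url": "https://example.com/jobs/2"}]}
</script>
</head><body></body></html>
"""


def load_ld_json(html):
    """Parse the first application/ld+json script in html, or return None."""
    soup = BeautifulSoup(html, "html.parser")
    s = soup.find("script", type="application/ld+json")
    # Guard against a missing tag or a tag without a single text child.
    if s is None or s.string is None:
        return None
    return json.loads(s.string)


data = load_ld_json(HTML)
print([el["url"] for el in data["itemListElement"]])
```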
