Silent Error Handling In Python?
Solution 1:
Try defining a function that does the "car" check first, and then use the `.apply` method of a pandas Series to get your `1`, `0` or `Wrong URL`. The following should help:
```python
import pandas as pd
import requests

data = [{"URLs": "https://www.mercedes-benz.de", "electric": 1},
        {"URLs": "https://www.audi.de", "electric": 0},
        {"URLs": "https://ww.audo.e", "electric": 0},
        {"URLs": "NaN", "electric": 0}]

def contains_car(link):
    """Return 1 if 'car' appears in the page text, 0 if not, or a marker string on failure."""
    try:
        return int('car' in requests.get(link, timeout=10).text)
    except requests.exceptions.RequestException:
        # Invalid, unreachable or missing URLs (e.g. the string "NaN") end up here
        return "Wrong/Missing URL"

df = pd.DataFrame(data)
df['extra_column'] = df.URLs.apply(contains_car)
print(df)
#                            URLs  electric       extra_column
# 0  https://www.mercedes-benz.de         1                  1
# 1           https://www.audi.de         0                  1
# 2             https://ww.audo.e         0  Wrong/Missing URL
# 3                           NaN         0  Wrong/Missing URL
```
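A bare `except:` also swallows unrelated problems (typos in your own code, even `KeyboardInterrupt`), which is what makes this kind of error handling "silent". The sketch below is one way to stay quiet only for expected failures: it checks for a missing URL before making any request and catches only `(OSError, ValueError)`. The names `is_missing`, `contains_car_safe` and the `fetch` parameter are hypothetical; `fetch` stands in for something like `lambda u: requests.get(u, timeout=10).text`, which fits the default because requests' `RequestException` derives from `IOError` (an alias of `OSError` in Python 3). The fake fetcher lets it run without network access.

```python
import math
from typing import Callable, Tuple, Type, Union


def is_missing(link: object) -> bool:
    """Treat None, float NaN, '' and the string 'NaN' as a missing URL (an assumption)."""
    if link is None:
        return True
    if isinstance(link, float) and math.isnan(link):
        return True
    return isinstance(link, str) and link.strip().lower() in {"", "nan"}


def contains_car_safe(
    link: object,
    fetch: Callable[[str], str],
    errors: Tuple[Type[BaseException], ...] = (OSError, ValueError),
) -> Union[int, str]:
    """Return 1/0 for 'car' found/not found, or a marker string for bad input."""
    if is_missing(link):
        return "Missing URL"  # no request is made at all
    try:
        text = fetch(str(link))
    except errors:
        # Only the listed exception types are silenced; anything else propagates
        return "Wrong URL"
    return int("car" in text)


# Hypothetical stand-in for requests.get(url, timeout=10).text
PAGES = {"https://example.com/cars": "new electric car models"}


def fake_fetch(url: str) -> str:
    if url not in PAGES:
        # The built-in ConnectionError is a subclass of OSError
        raise ConnectionError(f"cannot reach {url}")
    return PAGES[url]


for link in ["https://example.com/cars", "https://ww.audo.e", "NaN", float("nan")]:
    print(link, contains_car_safe(link, fake_fetch))
```

With pandas, `Series.apply` forwards extra keyword arguments to the function, so this could be used as `df.URLs.apply(contains_car_safe, fetch=my_fetch)`, where `my_fetch` is your own requests-based wrapper. Keeping "Missing URL" and "Wrong URL" separate is a design choice; returning `None` instead of strings would keep the column closer to numeric.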
Edit:
You can search for more than one keyword in the text returned by your HTTP request. Depending on the condition you want, this can be done with either the built-in function `any` or the built-in function `all`. With `any`, finding at least one of the keywords returns 1; with `all`, every keyword has to be matched in order to return 1. In the following example I use `any` with the keywords 'car', 'automobile' and 'vehicle':
```python
import pandas as pd
import requests

data = [{"URLs": "https://www.mercedes-benz.de", "electric": 1},
        {"URLs": "https://www.audi.de", "electric": 0},
        {"URLs": "https://ww.audo.e", "electric": 0},
        {"URLs": "NaN", "electric": 0}]

def contains_keywords(link, keywords):
    """Return 1 if any keyword appears in the page text, 0 if none does."""
    try:
        output = requests.get(link, timeout=10).text
    except requests.exceptions.RequestException:
        return "Wrong/Missing URL"
    # any() stops at the first keyword found; use all() to require every keyword
    return int(any(x in output for x in keywords))

df = pd.DataFrame(data)
mykeywords = ('car', 'vehicle', 'automobile')
df['extra_column'] = df.URLs.apply(lambda l: contains_keywords(l, mykeywords))
print(df)
```
Should yield:

```
                           URLs  electric       extra_column
0  https://www.mercedes-benz.de         1                  1
1           https://www.audi.de         0                  1
2             https://ww.audo.e         0  Wrong/Missing URL
3                           NaN         0  Wrong/Missing URL
```
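To see the difference between `any` and `all` without making a request, here is a small sketch on a hard-coded string (the text and the helper names `match_any` / `match_all` are made up for illustration):

```python
def match_any(text: str, keywords) -> int:
    """1 if at least one keyword is a substring of text, else 0."""
    return int(any(k in text for k in keywords))


def match_all(text: str, keywords) -> int:
    """1 only if every keyword is a substring of text, else 0."""
    return int(all(k in text for k in keywords))


page_text = "This vehicle is an electric car."
keywords = ("car", "vehicle", "automobile")

print(match_any(page_text, keywords))  # 1: "car" and "vehicle" are present
print(match_all(page_text, keywords))  # 0: "automobile" is missing
```

One edge case: `all` over an empty keyword tuple is `True`, so an empty keyword list would mark every page as a match, while `any` would mark none.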
I hope this helps.
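A caveat with the substring test `x in output`: it is case-sensitive and also matches inside longer words ('car' in 'carbon'), and `output` is raw HTML, so markup such as class names containing "card" can match too. If that matters, one option, sketched here with the standard `re` module (the helper `keyword_pattern` is hypothetical), is a case-insensitive whole-word pattern built from the keywords:

```python
import re
from typing import Iterable


def keyword_pattern(keywords: Iterable[str]) -> re.Pattern:
    """Compile a case-insensitive regex matching any keyword as a whole word."""
    # re.escape keeps characters like '.' in a keyword from acting as regex syntax
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


pattern = keyword_pattern(("car", "vehicle", "automobile"))

for text in ["Our new Car is electric", "carbon fibre body", "Automobile news"]:
    # int(...) mirrors the 1/0 convention; compare with the plain substring test
    print(text, "->", int(pattern.search(text) is not None),
          "| substring:", int("car" in text))
```

Inside `contains_keywords` this would become `return int(pattern.search(output) is not None)`. The alternation behaves like `any`; an `all` variant would need one search per keyword.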
Solution 2:
I hope I understand you correctly that 'NaN' is a "wrong/missing" URL. In that case you can simply check for it. There are many ways to indicate a missing URL; I'd prefer a missing value in the `car` column. Try this:
```python
import pandas as pd

csv = [{"URLs": "www.mercedes-benz.de", "electric": 1},
       {"URLs": "www.audi.de", "electric": 0},
       {"URLs": "ww.audo-car.e", "electric": 0},
       {"URLs": "NaN", "electric": 0}]
df = pd.DataFrame(csv)
print(df)

df['car'] = None  # object column, so it can hold True, False and None
for i, row in df.iterrows():
    page_content = row['URLs']
    # Compare strings with ==, not with "is" (which tests object identity)
    if page_content is None or page_content == "NaN":
        df.loc[i, 'car'] = None
    elif "car" in page_content:
        df.loc[i, 'car'] = True
    else:
        df.loc[i, 'car'] = False
    print(df.loc[i, 'car'])
print(df)
```
I changed a few more things in your code because they did not work, e.g. the line `page_content = requests.get(row['URLs'])`: `requests` is not defined. I guess you meant `row`.
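`iterrows` with a per-row `.loc` assignment works but gets slow on large frames. A vectorized sketch of the same logic (like the loop above, it looks for "car" in the URL string itself, not in the page content) builds a missing-value mask first and then uses `Series.str.contains`:

```python
import pandas as pd

csv = [{"URLs": "www.mercedes-benz.de", "electric": 1},
       {"URLs": "www.audi.de", "electric": 0},
       {"URLs": "ww.audo-car.e", "electric": 0},
       {"URLs": "NaN", "electric": 0}]
df = pd.DataFrame(csv)

# Rows with a real missing value or the literal string "NaN" count as missing
missing = df["URLs"].isna() | (df["URLs"] == "NaN")

# regex=False: plain substring test; na=False: real missing cells give False for now
df["car"] = df["URLs"].str.contains("car", regex=False, na=False).astype(object)

# Mark the missing URLs as missing in the result column
df.loc[missing, "car"] = None
print(df)
```

If you want to check the page content rather than the URL, as in Solution 1, the request still has to be made once per row, so vectorizing only speeds up the string test itself.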