Python - Efficient Way Of Checking If Part Of String Is In The List
Solution 1:
Turn your text into a set of words and intersect it with the set of bad words. Set lookups take constant time on average, so the whole check is roughly linear in the number of words:
text = "The Dormouse's story. Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well....badword..."
badwords = {"badword", "badword1"}  # ...and so on
textwords = set(text.split())
for badword in badwords.intersection(textwords):
    print("The bad word '{}' was found in the text".format(badword))
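One caveat: str.split() only splits on whitespace, so punctuation stays attached to the tokens. In the sample text, "well....badword..." is a single token and would not match "badword". A minimal sketch of one way around this, assuming a "word" is a run of letters, digits, or underscores and that matching should ignore case (find_bad_words is a hypothetical helper name, not part of the original answer):

```python
import re

def find_bad_words(text, badwords):
    """Return the set of bad words that occur as whole words in text."""
    # \w+ extracts runs of word characters, which drops punctuation.
    textwords = set(re.findall(r"\w+", text.lower()))
    return textwords & {w.lower() for w in badwords}

text = "They lived at the bottom of a well....badword..."
print(find_bad_words(text, {"badword", "badword1"}))
```

Note that this tokenizer also splits on apostrophes, so "Dormouse's" becomes "dormouse" and "s"; adjust the pattern if that matters for your word list.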
Solution 2:
There is no need to split the text into words; you can check directly whether one string occurs in another, e.g.:
In [1]: 'bad word' in 'do not say bad words!'
Out[1]: True
So you can just do:
for bad_word in bad_words_list:
    if bad_word in huge_string:
        print("BAD!!")
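Keep in mind that the in operator on strings is a plain substring test: it is case-sensitive and also fires inside longer words ("bad" is found in "badge"). A small sketch, assuming you want case-insensitive matching and a list of which words were hit (substring_hits is a hypothetical name used only for illustration):

```python
def substring_hits(huge_string, bad_words_list):
    """Return the bad words that appear anywhere in huge_string."""
    haystack = huge_string.lower()  # lowercase once, not once per word
    return [w for w in bad_words_list if w.lower() in haystack]

print(substring_hits("Do not say BAD words!", ["bad word", "worse"]))
```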
Solution 3:
You can use any():
To test whether any bad word occurs as a substring (for example as a prefix or suffix of a longer word):
>>> bad_words = ["badword", "badword1"]
>>> text ="some text with badwords or not"
>>> any(i in text for i in bad_words)
True
>>> text ="some text with words or not"
>>> any(i in text for i in bad_words)
False
This checks whether any item of bad_words occurs in text as a substring.
To test exact matches:
>>> text ="some text with badwords or not"
>>> any(i in text.split() for i in bad_words)
False
>>> text ="some text with badword or not"
>>> any(i in text.split() for i in bad_words)
True
This checks whether any item of bad_words is in text.split(), that is, whether it exactly matches a whitespace-separated word.
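The exact-match expression calls text.split() again for every bad word and then scans the resulting list. A minimal sketch that splits once and looks words up in a set instead (has_exact_match is a hypothetical helper name):

```python
bad_words = ["badword", "badword1"]

def has_exact_match(text, bad_words):
    """True if any bad word equals a whitespace-separated word of text."""
    words = set(text.split())  # split once; set lookups are O(1) on average
    return any(w in words for w in bad_words)

print(has_exact_match("some text with badwords or not", bad_words))  # False
print(has_exact_match("some text with badword or not", bad_words))   # True
```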
Solution 4:
Here s is the long string. Use the & operator or the set.intersection method.
In [123]: set(s.split()) & set(bad_words)
Out[123]: {'badword'}
In [124]: bool(set(s.split()) & set(bad_words))
Out[124]: True
Or, even better, use set.isdisjoint. This will short-circuit as soon as a match is found.
In [127]: bad_words = set(bad_words)
In [128]: not bad_words.isdisjoint(s.split())
Out[128]: True
In [129]: not bad_words.isdisjoint('for bar spam'.split())
Out[129]: False
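The short circuit only saves the lookups: s.split() still builds the complete word list before isdisjoint sees it. A sketch, assuming CPython's behavior of consuming the iterable one item at a time, that feeds isdisjoint a lazy generator so tokenizing also stops at the first hit (contains_bad_word is a hypothetical name):

```python
import re

def contains_bad_word(s, bad_words):
    """True if any whitespace-separated token of s is in bad_words."""
    bad_words = set(bad_words)
    # finditer yields matches lazily; \S+ mirrors what str.split() returns.
    tokens = (m.group() for m in re.finditer(r"\S+", s))
    return not bad_words.isdisjoint(tokens)

print(contains_bad_word("some text with badword or not", ["badword"]))
print(contains_bad_word("for bar spam", ["badword"]))
```

For short strings the difference is negligible; it only starts to matter when s is very large and a match tends to appear early.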
Solution 5:
On top of all the excellent answers, the "for now, whole words" clause in your comment points in the direction of regular expressions.
You may want to build a composed expression like bad|otherbad|yetanother:
import re
# re.escape keeps any regex metacharacters in the words literal
r = re.compile("|".join(map(re.escape, badwords)))
r.search(text)
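As written, the alternation matches substrings, not whole words. A minimal sketch of a whole-word variant, assuming \b word boundaries are an acceptable definition of "whole word" for your data (the sample word list is illustrative):

```python
import re

badwords = ["badword", "badword1"]
# re.escape guards against regex metacharacters in the words;
# \b on both sides requires the match to start and end at a word boundary.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, badwords)) + r")\b")

print(bool(pattern.search("some text with badwords or not")))  # False
print(bool(pattern.search("some text with badword or not")))   # True
```

Pass re.IGNORECASE as a second argument to re.compile if the match should ignore case.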