
Python - Efficient Way Of Checking If Part Of String Is In The List

I have a huge string like: The Dormouse's story. Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom

Solution 1:

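Turn your text into a set of words and compute its intersection with the set of bad words. Building the sets is linear in the size of the input, and each set membership test is O(1) on average, so this scales well to a huge text: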

import re

text = "The Dormouse's story. Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well....badword..."

badwords = {"badword", "badword1"}  # add the rest of your bad words here

# Split on runs of word characters so punctuation (e.g. "well....badword...") does not hide a word
textwords = set(re.findall(r"\w+", text))
for badword in badwords.intersection(textwords):
    print("The bad word '{}' was found in the text".format(badword))
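A caveat worth checking with this approach: str.split() only splits on whitespace, so punctuation stays attached to the neighbouring word, and a token like "well....badword..." will never equal "badword". Below is a minimal sketch, assuming a "word" is a run of \w characters (letters, digits, underscore); the helper name find_bad_words is hypothetical:

```python
import re

def find_bad_words(text, badwords):
    """Return the bad words that occur as whole words in text, sorted."""
    # Hypothetical helper: tokenize on runs of word characters so that
    # surrounding punctuation does not hide a match.
    textwords = set(re.findall(r"\w+", text))
    # Set intersection: O(1) average membership test per token.
    return sorted(set(badwords) & textwords)

text = "at the bottom of a well....badword..."
print(set(text.split()) & {"badword"})                # set()  (whitespace split misses it)
print(find_bad_words(text, {"badword", "badword1"}))  # ['badword']
```

Sorting the result is only there to make the output deterministic; drop it if order does not matter.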

Solution 2:

There is no need to extract all the words of the text; you can check directly whether one string is a substring of another, e.g.:

In [1]: 'bad word' in 'do not say bad words!'
Out[1]: True

So you can just do:

for bad_word in bad_words_list:
    if bad_word in huge_string:
        print("BAD!!")
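If you want to know which bad words matched rather than just printing a warning, the same substring test fits in a list comprehension. Keep in mind that `in` on strings is a substring test, so it also matches inside longer words, as the 'bad word' / 'bad words' example shows. The sample values below are illustrative:

```python
bad_words_list = ["bad word", "worse"]
huge_string = "do not say bad words!"

# Each `in` is a substring search over huge_string, so the total cost grows
# roughly with len(bad_words_list) * len(huge_string).
found = [w for w in bad_words_list if w in huge_string]
print(found)  # ['bad word']
```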

Solution 3:

You can use any:

To test whether any bad word occurs anywhere in the text as a substring (which includes prefixes and suffixes of longer words):

>>> bad_words = ["badword", "badword1"]
>>> text ="some text with badwords or not"
>>> any(i in text for i in bad_words)
True
>>> text ="some text with words or not"
>>> any(i in text for i in bad_words)
False

This checks whether any item of bad_words occurs in text as a substring.
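any() stops at the first bad word it finds. If you also want to know which word that was, next() with a default value over the same generator returns it (or None when nothing matches). A small sketch using the sample values from the session above:

```python
bad_words = ["badword", "badword1"]
text = "some text with badwords or not"

# any() short-circuits: it stops at the first bad word found in text.
print(any(w in text for w in bad_words))  # True

# next() with a default retrieves that first matching word instead of a bool.
first = next((w for w in bad_words if w in text), None)
print(first)  # badword
```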

To test exact matches:

>>> text ="some text with badwords or not"
>>> any(i in text.split() for i in bad_words)
False
>>> text ="some text with badword or not"
>>> any(i in text.split() for i in bad_words)
True

This checks whether any item of bad_words is an exact element of text.split(), that is, a whole whitespace-separated word.
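In the exact-match version, text.split() is evaluated again for every bad word, and each `in` on the resulting list is a linear scan. A small variation that keeps the same whole-token semantics splits once and stores the tokens in a set:

```python
bad_words = ["badword", "badword1"]
text = "some text with badword or not"

# Split the text once and keep the tokens in a set for O(1) average lookups.
tokens = set(text.split())
print(any(w in tokens for w in bad_words))  # True
```

Like text.split(), this still treats "badword," (with trailing punctuation) as a different token from "badword".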


Solution 4:

Here s is the long string. Use the & operator or the set.intersection method:

In [123]: set(s.split()) & set(bad_words)
Out[123]: {'badword'}

In [124]: bool(set(s.split()) & set(bad_words))
Out[124]: True

Even better, use set.isdisjoint, which short-circuits as soon as a match is found and accepts any iterable, so s.split() does not have to be turned into a set first:

In [127]: bad_words = set(bad_words)

In [128]: not bad_words.isdisjoint(s.split())
Out[128]: True

In [129]: not bad_words.isdisjoint('for bar spam'.split())
Out[129]: False
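Note that s.split() still builds the full word list before isdisjoint starts. Because isdisjoint accepts any iterable, feeding it a generator of tokens lets it stop reading the text at the first match. A sketch, assuming words are runs of \w characters; has_bad_word is a hypothetical name:

```python
import re

def has_bad_word(text, bad_words):
    """Return True if any token of text is in bad_words."""
    bad = frozenset(bad_words)  # O(1) average membership tests
    # re.finditer yields matches lazily, so isdisjoint can return as soon
    # as one token is found in bad, without tokenizing the rest of text.
    tokens = (m.group() for m in re.finditer(r"\w+", text))
    return not bad.isdisjoint(tokens)

print(has_bad_word("for bar spam", ["badword"]))          # False
print(has_bad_word("a well....badword...", ["badword"]))  # True
```

If you check many texts against the same list, build the frozenset once outside the function and pass it in.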

Solution 5:

On top of all the excellent answers: the "for now, whole words" clause in your comment points in the direction of regular expressions.

You may want to build a combined expression like bad|otherbad|yetanother:

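import re
# re.escape makes regex metacharacters match literally; \b restricts matches to whole words
r = re.compile(r"\b(?:" + "|".join(map(re.escape, badwords)) + r")\b")
r.search(text)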
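To see what a whole-word alternation does and does not match, here is a small sketch. It assumes every bad word starts and ends with a word character (otherwise \b behaves differently), and build_bad_word_pattern is a hypothetical helper name:

```python
import re

def build_bad_word_pattern(words):
    """Compile one regex that matches any of words as a whole word."""
    # re.escape makes characters such as "." or "+" match literally;
    # \b anchors each alternative at word boundaries.
    return re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")

pattern = build_bad_word_pattern(["badword", "badword1", "a.b"])
print(pattern.findall("some text with badword or not"))   # ['badword']
print(pattern.findall("some text with badwords or not"))  # []
print(pattern.findall("axb and a.b"))                     # ['a.b']
```

Without re.escape, "a.b" would be read as "a, any character, b" and would also match "axb".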
