
Regular Expression To Find A Series Of Uppercase Words In A String

text = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE.'
pattern = '[A-Z]+[A-Z]+[A-Z]*[\s]+'
re.findall(pattern, text) gives an output

Solution 1:

You may use this regex:

\b[A-Z]+(?:\s+[A-Z]+)*\b


RegEx Details:

  • \b: Word boundary
  • [A-Z]+: Match a word comprising only uppercase letters
  • (?:\s+[A-Z]+)*: Match one or more whitespace characters followed by another all-uppercase word; the group as a whole may repeat zero or more times
  • \b: Word boundary
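
The breakdown above can also be carried inside the pattern itself. The following is a minimal sketch, assuming Python's re.VERBOSE flag; the names UPPER_RUN and runs are made up for illustration, and the sample string is the one from the question.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece has its own comment.
# UPPER_RUN is a hypothetical name, not part of the original answer.
UPPER_RUN = re.compile(
    r"""
    \b              # word boundary: start of the first uppercase word
    [A-Z]+          # one word made only of ASCII uppercase letters
    (?:             # non-capturing group, repeated 0 or more times:
        \s+         #   one or more whitespace characters
        [A-Z]+      #   another all-uppercase word
    )*
    \b              # word boundary: end of the last uppercase word
    """,
    re.VERBOSE,
)

text = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE.'
runs = UPPER_RUN.findall(text)
print(runs)  # ['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
```

Because the group is non-capturing, findall returns the whole match for each run rather than only the last word of it.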

Code:

>>> import re
>>> s = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE'
>>> print(re.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b', s))
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
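
One thing to watch with this pattern: [A-Z]+ also accepts a single capital letter, so standalone words such as "I" or "A" count as uppercase words. The sketch below illustrates this on a made-up sentence and shows a possible stricter variant using {2,}. The variant and the names any_length, two_plus, loose and strict are assumptions for illustration, not part of the answer.

```python
import re

# Pattern from Solution 1: single capital letters such as "I" or "A" also qualify.
any_length = re.compile(r'\b[A-Z]+(?:\s+[A-Z]+)*\b')

# Hypothetical stricter variant: every word in the run needs at least 2 letters.
two_plus = re.compile(r'\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b')

sample = 'I saw A BIG RED DOG and NASA.'  # made-up sentence for illustration

loose = any_length.findall(sample)
strict = two_plus.findall(sample)

print(loose)   # ['I', 'A BIG RED DOG', 'NASA']
print(strict)  # ['BIG RED DOG', 'NASA']
```

Which behaviour is preferable depends on whether one-letter words should be treated as part of an uppercase series.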

Solution 2:

  1. To improve the regex: you want at least 2 uppercase letters, so use the dedicated quantifier {2,} (2 or more), and add word boundaries to make sure the whole word is matched

    r'\b[A-Z]{2,}\b'
  2. Process each sentence separately: split the text into sentences with a basic regex, then, for each sentence, find the uppercase words and join them with a space before appending them to the result list

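    import re

    pattern = r'\b[A-Z]{2,}\b'  # regex from step 1
    result = []
    # A sentence is a run of non-period characters plus its closing period
    sentences = re.findall(r"[^.]+\.?", text)  # `text` is the string from the question
    for sentence in sentences:
        uppercase = re.findall(pattern, sentence)  # every all-caps word in this sentence
        result.append(" ".join(uppercase))
    print(result)  # ['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']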

As a list comprehension, this becomes:

res = [" ".join(re.findall(pattern, sentence)) for sentence in re.findall(r"[^.]+\.?", text)]
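
The two solutions agree on the question's text, but they can differ when one sentence contains more than one separate run of uppercase words. The following sketch shows the difference on a made-up sentence; the names sample, per_sentence and per_run are hypothetical.

```python
import re

sample = 'The FIRST RUN ends here and ANOTHER RUN starts.'  # made-up sentence

# Per-sentence approach (Solution 2): collect all-caps words, then join them.
per_sentence = [
    " ".join(re.findall(r'\b[A-Z]{2,}\b', sentence))
    for sentence in re.findall(r'[^.]+\.?', sample)
]

# Contiguous-run approach (Solution 1): each run of adjacent all-caps words.
per_run = re.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b', sample)

print(per_sentence)  # ['FIRST RUN ANOTHER RUN']
print(per_run)       # ['FIRST RUN', 'ANOTHER RUN']
```

The per-sentence version merges every uppercase word in a sentence into one string, while the single-regex version keeps each contiguous series separate. A sentence with no uppercase words would also add an empty string to the per-sentence result.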
