Python Histogram Using Matplotlib On Top Words

I am reading a file and calculating the frequency of the top 100 words. I am able to find that and create the following list: [('test', 510), ('Hey', 362), ('please', 753), ('take'

Solution 1:

It's a little unclear exactly what you want to graph, and how relevant the matplotlib demo you are adapting actually is.

I'll run through some options and try to answer your specific questions in each case:

  • Using the matplotlib demo, you only need to give ax.hist the list of word frequencies, i.e. x = [w[1] for w in words]. But this just gives you the distribution of the frequencies themselves: most of the words occur fewer than 100 times, while a couple of words occur much more frequently. This is why your code above returns a histogram of equal bars: you are giving ax.hist the numbers from 0 to 99 once each. Note that this approach doesn't show the individual words.

  • Otherwise, I think you want a bar chart with each bar labelled as a different word.

This worked for me.

import matplotlib.pyplot as plt

words = [('test', 510), ('Hey', 362), ("please", 753), ('take', 446),     ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191), ('simple', 175),     ('harry', 172), ('woman', 170), ('basil', 153), ('things', 129), ('think', 126), ('bye', 124), ('thing', 120), ('love', 107), ('quite', 107), ('face', 107), ('eyes', 107), ('time', 106), ('himself', 105), ('want', 105), ('good', 105), ('really', 103), ('away',100), ('did', 100), ('people', 99), ('came', 97), ('say', 97), ('cried', 95), ('looked', 94), ('tell', 92), ('look', 91), ('world', 89), ('work', 89), ('project', 88), ('room', 88), ('going', 87), ('answered', 87), ('mr', 87), ('little', 87), ('yes', 84), ('silly', 82), ('thought', 82), ('shall', 81), ('circle', 80), ('hallward', 80), ('told', 77), ('feel', 76), ('great', 74), ('art', 74), ('dear',73), ('picture', 73), ('men', 72), ('long', 71), ('young', 70), ('lady', 69), ('let', 66), ('minute', 66), ('women', 66), ('soul', 65), ('door', 64), ('hand',63), ('went', 63), ('make', 63), ('night', 62), ('asked', 61), ('old', 61), ('passed', 60), ('afraid', 60), ('night', 59), ('looking', 58), ('wonderful', 58), ('gutenberg-tm', 56), ('beauty', 55), ('sir', 55), ('table', 55), ('turned', 54), ('lips', 54), ("one's", 54), ('better', 54), ('got', 54), ('vane', 54), ('right',53), ('left', 53), ('course', 52), ('hands', 52), ('portrait', 52), ('head', 51), ("can't", 49), ('true', 49), ('house', 49), ('believe', 49), ('black', 49), ('horrible', 48), ('oh', 48), ('knew', 47), ('curious', 47), ('myself', 47)]
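# Build a word -> count mapping. Note that a dict keeps only one entry per
# word, so the duplicate 'night' pair in the list collapses into one bar.
wordsdict = {}
for word, count in words:
    wordsdict[word] = count

# list(...) keeps this working on Python 3, where values() and keys() return view objects.
plt.bar(range(len(wordsdict)), list(wordsdict.values()), align='center')
plt.xticks(range(len(wordsdict)), list(wordsdict.keys()), rotation=90)

plt.show()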
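
For the first option above (a histogram of the counts themselves), a minimal sketch might look like the following. The helper names count_values and plot_count_histogram, the bins default, and the shortened sample list are illustrative assumptions, not part of the original code.

```python
def count_values(pairs):
    """Return only the counts from a list of (word, count) pairs."""
    return [count for _, count in pairs]


def plot_count_histogram(pairs, bins=10):
    """Draw a histogram of the counts: each bar covers a range of counts,
    and its height is the number of words that fall in that range."""
    # Imported here so count_values() can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(count_values(pairs), bins=bins)
    ax.set_xlabel('Occurrences of a word')
    ax.set_ylabel('Number of words')
    plt.show()


# Hypothetical shortened sample: the first eight pairs of the list above.
sample = [('test', 510), ('Hey', 362), ('please', 753), ('take', 446),
          ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191)]
```

Calling plot_count_histogram(words) on the full list should show one tall bar for the many words with low counts and a few short bars for the rare high-count words, without naming any word.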

Solution 2:

Perhaps this is what you want. This code produces a bar chart in which each bar represents an individual word and the vertical axis gives the frequency of that word in your text.

The counts list you provided is not sorted as I had expected, though.

import numpy as np
import matplotlib.pyplot as plt

counts = [('test', 510), ('Hey', 362), ("please", 753), ('take', 446), ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191), ('simple', 175), ('harry', 172), ('woman', 170), ('basil', 153), ('things', 129), ('think', 126), ('bye', 124), ('thing', 120), ('love', 107), ('quite', 107), ('face', 107), ('eyes', 107), ('time', 106), ('himself', 105), ('want', 105), ('good', 105), ('really', 103), ('away',100), ('did', 100), ('people', 99), ('came', 97), ('say', 97), ('cried', 95), ('looked', 94), ('tell', 92), ('look', 91), ('world', 89), ('work', 89), ('project', 88), ('room', 88), ('going', 87), ('answered', 87), ('mr', 87), ('little', 87), ('yes', 84), ('silly', 82), ('thought', 82), ('shall', 81), ('circle', 80), ('hallward', 80), ('told', 77), ('feel', 76), ('great', 74), ('art', 74), ('dear',73), ('picture', 73), ('men', 72), ('long', 71), ('young', 70), ('lady', 69), ('let', 66), ('minute', 66), ('women', 66), ('soul', 65), ('door', 64), ('hand',63), ('went', 63), ('make', 63), ('night', 62), ('asked', 61), ('old', 61), ('passed', 60), ('afraid', 60), ('night', 59), ('looking', 58), ('wonderful', 58), ('gutenberg-tm', 56), ('beauty', 55), ('sir', 55), ('table', 55), ('turned', 54), ('lips', 54), ("one's", 54), ('better', 54), ('got', 54), ('vane', 54), ('right',53), ('left', 53), ('course', 52), ('hands', 52), ('portrait', 52), ('head', 51), ("can't", 49), ('true', 49), ('house', 49), ('believe', 49), ('black', 49), ('horrible', 48), ('oh', 48), ('knew', 47), ('curious', 47), ('myself', 47)]
# Separate the words from their counts.
words = [x[0] for x in counts]
values = [int(x[1]) for x in counts]
print(words)
# label= gives plt.legend() an entry to display.
mybar = plt.bar(range(len(words)), values, color='green', alpha=0.4, label='Frequency')

plt.xlabel('Word Index')
plt.ylabel('Frequency')
plt.title('Word Frequency Chart')
plt.legend()

plt.show()

You can see that the graph follows a Zipfian (power-law) curve. Modify the code to suit your needs.
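
One way to check the power-law shape is to sort the counts from largest to smallest and plot frequency against rank on log-log axes, where a power law shows up as a roughly straight line. The sketch below assumes that approach. The names rank_frequency and plot_rank_frequency and the shortened sample list are made up for illustration.

```python
def rank_frequency(pairs):
    """Sort the counts in descending order and pair each one with its rank
    (rank 1 is the most frequent word)."""
    ordered = sorted((count for _, count in pairs), reverse=True)
    ranks = list(range(1, len(ordered) + 1))
    return ranks, ordered


def plot_rank_frequency(pairs):
    """Plot frequency against rank with both axes on a log scale."""
    # Imported here so rank_frequency() can be used without matplotlib.
    import matplotlib.pyplot as plt

    ranks, freqs = rank_frequency(pairs)
    fig, ax = plt.subplots()
    ax.loglog(ranks, freqs, marker='o', linestyle='none')
    ax.set_xlabel('Rank')
    ax.set_ylabel('Frequency')
    ax.set_title('Rank-frequency plot (log-log)')
    plt.show()


# Hypothetical shortened sample: the first eight pairs of the counts list.
sample = [('test', 510), ('Hey', 362), ('please', 753), ('take', 446),
          ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191)]
```

Because rank_frequency sorts the counts itself, it does not matter that the input list is not strictly ordered. Pass the full counts list to plot_rank_frequency to see all 100 words.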
