Skip to content Skip to sidebar Skip to footer

In Spell Checker How To Get The Word That Are 3 Edits Away(norvig)

I have been trying to use spell corrector for my database table to correct the address from one table, for which I have used the reference of http://norvig.com/spell-correct.html U

Solution 1:

We can use existing 1 edit list and make 1 edit for member in that list

Algorithm: One_Edit_Words = edits1(word) for each in One_Edit_Words do edits1(each)

def edit2(word): new = edits1(word) # Get list of all the one edits for i in edits1(word): # Iterate through all the objects in one edit list new.update(edits1(i)) # make one edit for each object and add in list return new # Return list with all the edits

Similarly we can use same method to get any number of edits Below Function will help you in getting 3 edits

def edit3(word): new = edit2(word) for i in edit2my(word): new.update(edits1(i)) return new Warning : Takes too much of time even for small computation (Time complexity High)

Solution 2:

The above answers are ok, but there is a faster solution than checking the exponentially increasing set of strings of edit distance k. Suppose we had a data structure that stored the set of all words in a tree structure. This is useful because we know, for example, that we need not search paths in which there are no words. This is both memory efficient and computationally efficient.

Suppose we have a vocabulary stored in a set, dict, or a ideally, a collections.Counter object, then we can set up the data structure as follows:

classVocabTreeNode:def__init__(self):
        self.children = {}
        self.word = None
        
    defbuild(self, vocab):
        for w invocab:self.insert(w)

    definsert( self, word):
        node = selffor letter inword:if letter notin node.children: 
                node.children[letter] = VocabTreeNode()
            node = node.children[letter]
        node.word = word

To search only the set of elements of edit distance k from the word, we may endow this structure with a recursive search.

defsearch(self, word, maxCost):
        currentRow = range( len(word) + 1 )    
        results = []
        for letter in self.children:
            self.searchRecursive(self.children[letter], letter, 
                                 word, currentRow, results, 
                                 maxCost)   
        return results
            
    defsearchRecursive(self, node, letter, word, previousRow, 
                        results, maxCost):
        columns = len( word ) + 1
        currentRow = [ previousRow[0] + 1 ]
        for column inrange( 1, columns ):
            insertCost = currentRow[column - 1] + 1
            deleteCost = previousRow[column] + 1if word[column - 1] != letter:
                replaceCost = previousRow[ column - 1 ] + 1else:                
                replaceCost = previousRow[ column - 1 ]
            currentRow.append( min( insertCost, deleteCost, replaceCost ) )
    
        if currentRow[-1] <= maxCost and node.word != None:
            results.append( (node.word, currentRow[-1] ) )
        ifmin( currentRow ) <= maxCost:
            for next_letter in node.children:
                self.searchRecursive( node.children[next_letter], next_letter, word,
                                      currentRow, results, maxCost)

There is just one problem that I'm not sure how to overcome; transpositions are not valid as paths, so i'm not sure how to incorporate transpositions as edit distance 1 without a somewhat complicated hack.

My corpus of words was 97722 (the set of words in almost any linux distro).

sleep(1)
start=time()

for i inrange(100):
    x = V.search('elephant',3)
    
print(time()-start)

>>>17.5

Which equates to edit distance 3 calculations for this word every 0.175 seconds. Edit distance 4 was able to be done in .377 seconds, whereas consecutive edit distances using the edits1 will quickly cause your system to run out of memory.

With the caveat of not easily handling transpositions, this is a fast effective way of implementing a Norvig-type algorithm for high edit distances.

Post a Comment for "In Spell Checker How To Get The Word That Are 3 Edits Away(norvig)"