Wolf Garbe
1 min readJul 9, 2020

--

Are you referring to "So all possible combinations of the 4 spelling error types (insert, delete, replace and adjacent switch) are generated. This is quite expensive with e.g. 114,324 candidate term generated for a word of length=9 and edit distance=2." ?

I did not use a formula, but I counted the number of generated candidates, using/modifying the faithful C# port from Lorenzo Stoakes of Peter Norvig’s algorithm https://github.com/lorenzo-stoakes/spell-correct/blob/master/cs/faith.cs

114,324 different candidates are generated for edit distance=2, word length=9 and alphabet size=26.

494 different candidates are generated for edit distance=1, word length=9 and alphabet size=26.

So for a maximum edit distance <=2 we need to add 494+114,324=114,818 suggestions.

Please note that I counted *distinct* candidates only.

The formula given by Norvig for edit distance=1 contains also non-distinct candidates, because weather or not identical candidates are created by different operations depends on the specific letters in the word, which are not covered by the formula: number of candidates = n deletions + n-1 transpositions + a*n alterations + a*(n+1) insertions.

Of course this formula can be amended recursively to cover also edit distances > 2.

--

--

Responses (1)