Wolf Garbe
Nov 14, 2020

Thank you very much for the feedback.

>> BTW, one aspect that is missing in your blog is accuracy (or recall and precision) data. Beyond speed, the quality of the suggestions does matter. Any number on this?

Of course, quality does matter. I didn't include an F1 score in my benchmark simply because SymSpell doesn't change accuracy at all, for better or for worse.

Because its similarity calculation is still based on the Damerau-Levenshtein edit distance, it returns the same results, with the same precision, as other Damerau-Levenshtein-based solutions (e.g. https://norvig.com/spell-correct.html).
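
For reference, here is a minimal Python sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment). Insertions, deletions, substitutions, and transpositions of two adjacent characters each cost 1. The function name `osa_distance` is hypothetical, and which variant (restricted or unrestricted) a particular library uses is an assumption here, not something stated above.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("teh", "the"))           # 1 (one adjacent transposition)
print(osa_distance("spelling", "speling"))  # 1 (one deletion)
print(osa_distance("ca", "abc"))            # 3 (the unrestricted variant gives 2)
```

The last line shows where the restricted variant differs: it never edits a substring twice, so "ca" cannot be transposed and then have a character inserted in between.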

While SymSpell's precision is identical to that of other Damerau-Levenshtein-based algorithms, its lookup in a large dictionary is much faster, because it generates only deletes (for the dictionary terms in advance, and for the input term at lookup time) instead of deletes + transposes + replaces + inserts.
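
The symmetric delete idea can be sketched in a few lines: every dictionary term is indexed under its delete variants, and a lookup generates the delete variants of the input term and collects every term that shares one. This is a minimal illustration with hypothetical names (`deletes`, `build_index`, `candidates`), not SymSpell's actual implementation, which includes further optimizations. Note that it only produces candidates: a shared delete can over-report, so each candidate still has to be checked with the real edit distance.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word: str, max_distance: int) -> set[str]:
    """The word itself plus every string made by deleting 1..max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for drop in combinations(range(len(word)), k):
            variants.add("".join(c for i, c in enumerate(word) if i not in drop))
    return variants


def build_index(dictionary: list[str], max_distance: int) -> dict[str, set[str]]:
    """Precompute: map each delete variant to the dictionary terms that produce it."""
    index = defaultdict(set)
    for term in dictionary:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def candidates(index: dict[str, set[str]], query: str, max_distance: int) -> set[str]:
    """Lookup: only deletes of the query are generated, then hash lookups."""
    found = set()
    for variant in deletes(query, max_distance):
        found |= index.get(variant, set())
    return found


index = build_index(["the", "then", "than", "spelling", "ab"], max_distance=1)
print(sorted(candidates(index, "thn", 1)))  # ['than', 'the', 'then']
print(sorted(candidates(index, "ca", 1)))   # ['ab'] (false positive: true distance is 2)
```

The second lookup shows why verification is needed: "ca" and "ab" both reduce to "a" with one delete, yet their edit distance is 2.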

SymSpell is an algorithm, rather than a turn-key spelling correction product.

Its purpose is to find, in a very short time, all terms/candidates from a large dictionary that are within a maximum edit distance (Damerau-Levenshtein) of an input term. Its possible applications go far beyond spell checking.

For spelling correction, we should think of SymSpell as the candidate pre-filtering step in a pipeline. We can then rank and re-order those candidates with other algorithms, for example ones that take the context of the whole sentence into account using word embeddings and word vectors:

https://blog.usejournal.com/a-simple-spell-checker-built-from-word-vectors-9f28452b6f26

https://towardsdatascience.com/embedding-for-spelling-correction-92c93f835d79

>> However, I did achieve interesting results using a delete + transpose + replace + insert + split + soundex comparison routine and moderate suggestions based on word frequency. Up until now, I was able to achieve rather good performance of 0.1 to 0.08 ms/word

Sounds impressive. But of course, it all depends on the maximum edit distance within which the lookup can find similar terms, the size of the dictionary, and the average lookup word length.
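
To make the dependence on word length and maximum edit distance concrete, the sketch below counts the strings a single lookup has to generate. The distance-1 count for deletes + transposes + replaces + inserts follows the n + (n-1) + 26n + 26(n+1) = 54n+25 figure from Norvig's article; the delete-only count is the sum of binomial coefficients C(n, k) for k = 1..d. Both are upper bounds before duplicates are removed, and the function names are hypothetical.

```python
from math import comb


def delete_count(n: int, max_distance: int) -> int:
    """Delete-only variants of a length-n term with 1..max_distance deletions.

    Upper bound: repeated letters produce duplicate strings that a set would merge.
    Independent of the alphabet size.
    """
    return sum(comb(n, k) for k in range(1, min(max_distance, n) + 1))


def edits1_count(n: int, alphabet_size: int = 26) -> int:
    """Deletes + transposes + replaces + inserts at edit distance 1 (duplicates included)."""
    return n + (n - 1) + alphabet_size * n + alphabet_size * (n + 1)


for n in (5, 9):
    print(n, edits1_count(n), [delete_count(n, d) for d in (1, 2, 3)])
# 5 295 [5, 15, 25]
# 9 511 [9, 45, 129]
```

Even at maximum edit distance 3, a 9-letter term yields fewer delete variants than the full edit set yields at distance 1, and the gap widens with larger alphabets, since only replaces and inserts depend on the alphabet size.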
