Wolf Garbe
1 min read · Aug 23, 2020

The benchmark results (100x to 1000x faster) given in the SymSpell blog post refer solely to spelling correction, not to word segmentation. In that post, SymSpell was compared to other spelling correction algorithms, not to word segmentation algorithms.

Also, there is an easier way to call a C# library from Python: https://stackoverflow.com/questions/7367976/calling-a-c-sharp-library-from-python

I understand that Python is dominant in the NLP world. But if fast pre-processing in production is critical, then Python is a performance liability that is difficult to overcome, even with smart algorithms.

From your output examples, I can see that SymSpell word segmentation has some problems with punctuation (among other things). I will look into this and fix it.
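
Until that is fixed, one possible caller-side workaround is to segment only the runs of word characters and leave punctuation untouched. The following is a minimal sketch of that idea, not the planned fix and not part of SymSpell: `segment_keeping_punctuation` is a hypothetical helper, and `segment` stands for whatever segmentation function you already call (assumed here to take a string and return the segmented string).

```python
import re
from typing import Callable

# Runs of word characters (letters, digits, underscore) and apostrophes.
# Everything else (punctuation, whitespace) is left exactly as it is.
_WORD_RUN = re.compile(r"[\w']+")


def segment_keeping_punctuation(text: str, segment: Callable[[str], str]) -> str:
    """Apply `segment` to each word run only, so punctuation never reaches it.

    `segment` is a placeholder for an existing word segmentation call;
    its signature (str -> str) is an assumption of this sketch.
    """
    return _WORD_RUN.sub(lambda match: segment(match.group(0)), text)


if __name__ == "__main__":
    # Toy stand-in for a real segmenter, for demonstration only.
    toy = {"thequick": "the quick", "brownfox": "brown fox"}
    print(segment_keeping_punctuation("thequick,brownfox!", lambda s: toy.get(s, s)))
```

This keeps each punctuation mark in its original position, but it also means the segmenter never sees context across a punctuation boundary, which may or may not matter for your data.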
