Wolf Garbe
Sep 30, 2018


>> What do I need to build my own freq. dictionary using SymSpell.CreateDictionary?

>> Do I need MS Visual Studio to be able to generate this?

The CreateDictionary(string corpus) method expects the path/filename of a large text file (the corpus). SymSpell will automatically generate a frequency dictionary from that text file by splitting the text into words and counting how often each unique word occurs in that text.

Problem 1: the Python SymSpell port you have chosen is not a complete port of the original C# version. It does not implement the CreateDictionary method.

You can try to add this method though (untested, I’m not a Python expert):

import os
import re

# add this as a method of the SymSpell class
def create_dictionary(self, corpus):
    if os.path.exists(corpus):
        with open(corpus) as file:
            print("Creating dictionary...")
            for line in file:
                # separate words at non-alphabetical characters
                words = re.findall('[a-z]+', line.lower())
                for word in words:
                    # count each occurrence of the word once
                    self._create_dictionary_entry(word, 1)
    # make sure the deletes dictionary exists
    if self._deletes is None:
        self._deletes = dict()

Problem 2: CreateDictionary() builds the frequency dictionary internally so that SymSpell can use it for spelling correction, but there is currently no method to export the created dictionary to a text file of word/frequency pairs. So either you implement such an export method yourself, or your custom frequency dictionary is rebuilt from the corpus text file every time you start SymSpell.
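One possible way around this is to count the words outside of SymSpell and write the word/frequency pairs to a text file yourself. The following is only a rough sketch (Python 3, standard library only). The function names count_words, export_frequency_dictionary and build_and_export are made up for this example and are not part of SymSpell or of the Python port, and the "word count" line layout with a single space as separator is an assumption that you should check against whatever dictionary loading method your port offers.

```python
import re
from collections import Counter


def count_words(lines):
    """Count lowercase words, splitting at non-alphabetical characters."""
    counts = Counter()
    for line in lines:
        counts.update(re.findall(r"[a-z]+", line.lower()))
    return counts


def export_frequency_dictionary(counts, path):
    """Write one 'word count' pair per line, most frequent word first."""
    with open(path, "w", encoding="utf-8") as out:
        for word, count in counts.most_common():
            out.write(f"{word} {count}\n")


def build_and_export(corpus_path, dictionary_path):
    """Hypothetical helper: read a corpus file and save its word frequencies."""
    with open(corpus_path, encoding="utf-8") as corpus:
        counts = count_words(corpus)
    export_frequency_dictionary(counts, dictionary_path)
    return counts
```

You would call build_and_export("corpus.txt", "my_frequency_dictionary.txt") once (both file names are placeholders), and afterwards load the saved file at startup instead of re-reading the whole corpus each time.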
