>> What do I need to build my own freq. dictionary using SymSpell.CreateDictionary?
>>Do I need MS Visual Studio to be able to generate this?
The CreateDictionary(string corpus) method expects the path/filename of a large text file (corpus). SymSpell will automatically generate a frequency dictionary from that text file by splitting the text into words and counting the frequency of each unique word within that text.
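To make the "split into words and count" step concrete, here is a minimal standalone Python 3 sketch. It is not SymSpell code: the function names count_words and count_words_in_file are made up for illustration, and the word rule (lowercase, then every run of a-z is a word) is a simplifying assumption that may differ from how the C# version tokenizes.

```python
import re
from collections import Counter


def count_words(lines):
    """Count how often each word occurs in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Assumption: lowercase the line, then treat every run of a-z as a word.
        counts.update(re.findall(r"[a-z]+", line.lower()))
    return counts


def count_words_in_file(path):
    """Same as count_words, but reads the corpus line by line from a file."""
    with open(path, encoding="utf-8") as corpus:
        return count_words(corpus)
```

For example, count_words(["The cat saw the dog."]) counts "the" twice and "cat", "saw" and "dog" once each.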
Problem 1: the Python SymSpell port you have chosen is not a complete port of the original C# version. It does not implement the CreateDictionary method.
You can try to add this method though (untested, I’m not a Python expert):
import os
import re

def create_dictionary(self, corpus):
    if os.path.exists(corpus):
        with open(corpus) as file:
            print("Creating dictionary...")
            for line in file:
                # separate words at non-alphabetical characters
                words = re.findall('[a-z]+', line.lower())
                for word in words:
                    self._create_dictionary_entry(word, 1)
    # make sure the deletes dictionary exists even if no words were added
    if self._deletes is None:
        self._deletes = dict()
Problem 2: CreateDictionary() builds the frequency dictionary in memory so that SymSpell can use it for spelling correction, but there is currently no method to export the created dictionary to a text file of word/frequency pairs. So either you implement such an export method yourself, or your custom frequency dictionary is rebuilt from the corpus text file every time you start SymSpell.
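A rough Python 3 sketch of what such an export could look like. The function name export_frequency_dictionary is hypothetical and not part of SymSpell or the port. It takes a plain dict of word -> count, because how the port stores its words internally is not covered here, so you would first have to collect the counts from the port's own data structure. The output layout (one "word count" pair per line, separated by a space) is also an assumption; adjust it to whatever your loading method expects.

```python
def export_frequency_dictionary(word_counts, path):
    """Write word/count pairs to a text file, one 'word count' pair per line."""
    # Most frequent words first; ties are sorted alphabetically for stable output.
    ordered = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    with open(path, "w", encoding="utf-8") as out:
        for word, count in ordered:
            # Assumed layout: word, one space, count.
            out.write("{} {}\n".format(word, count))
```

With a file like this you only need to process the corpus once and can load the saved word/frequency pairs on later starts.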