The accuracy of automatic spelling correction should always be seen in relation to the percentage of spelling errors in the queries. Spelling correction is only beneficial if it corrects more errors than it introduces, which is only the case if error rate of spelling correction < error rate of input. This is difficult because there are only a few misspelled queries that we can improve, but many correct queries that we can mess up.
The reported percentage of spelling errors in search queries varies widely:
- 5%: Microsoft Speller Challenge, TREC data based on the 2008 Million Query Track
- 10–15%: "Spelling correction as an iterative process that exploits the collective knowledge of web users", Silviu Cucerzan and Eric Brill
- 26%: "Spelling Correction in the PubMed Search Engine"
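To make the break-even condition concrete, here is a rough sketch of the arithmetic. The two-parameter model is my own simplification, not something taken from the sources above: `fix_rate` is the assumed fraction of misspelled queries the corrector repairs, and `break_rate` is the assumed fraction of correct queries it damages. All numbers in the usage note are illustrative.

```python
def error_rate_after(input_error_rate: float, fix_rate: float, break_rate: float) -> float:
    """Fraction of queries that are wrong after correction (simplified model).

    input_error_rate: fraction of incoming queries that are misspelled
    fix_rate:         assumed fraction of misspelled queries that get repaired
    break_rate:       assumed fraction of correct queries that get damaged
    """
    # Misspelled queries the corrector failed to repair.
    remaining = input_error_rate * (1.0 - fix_rate)
    # Correct queries the corrector turned into wrong ones.
    introduced = (1.0 - input_error_rate) * break_rate
    return remaining + introduced


def max_break_rate(input_error_rate: float, fix_rate: float) -> float:
    """Largest break_rate at which correction still breaks even.

    Derived from: input_error_rate * fix_rate >= (1 - input_error_rate) * break_rate
    """
    return input_error_rate * fix_rate / (1.0 - input_error_rate)


def is_beneficial(input_error_rate: float, fix_rate: float, break_rate: float) -> bool:
    """True if fewer queries are wrong after correction than before."""
    return error_rate_after(input_error_rate, fix_rate, break_rate) < input_error_rate
```

Under this model, with 5% misspelled input and a corrector that repairs 80% of those, correction stops paying off once it damages more than about 4.2% of the correct queries (`max_break_rate(0.05, 0.8)`). At 26% misspelled input the same corrector could damage roughly 28% of correct queries before breaking even, which is why the input error rate matters so much.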
Could you share some performance data, e.g. how long it takes to correct a word (on average, in milliseconds)?
Have you thought about using the number of results a query receives from your index as feedback for choosing between two spelling variants of the query? For example, you could apply spelling correction only if the query returns no results, which would prevent introducing new errors into an already correct query.
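A minimal sketch of what I mean by the zero-result fallback. `search` and `correct` are hypothetical stand-ins for your index lookup and your spelling corrector; I am only assuming that the first returns a list of hits and the second returns a suggested query string.

```python
from typing import Callable, List, Tuple


def search_with_fallback(
    query: str,
    search: Callable[[str], List[str]],   # hypothetical: returns hits for a query
    correct: Callable[[str], str],        # hypothetical: returns a suggested spelling
) -> Tuple[str, List[str]]:
    """Run the query as typed; consult the corrector only if nothing is found."""
    results = search(query)
    if results:
        # The original query already matches something, so leave it alone.
        return query, results

    suggestion = correct(query)
    if suggestion != query:
        corrected_results = search(suggestion)
        if corrected_results:
            # The index confirms the suggestion, so use it.
            return suggestion, corrected_results

    # No variant produced hits; return the original query unchanged.
    return query, results
```

With this design a query that already has hits is never rewritten, so the corrector cannot break it. The trade-off is that a misspelling which happens to match some documents is never corrected either.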