Arabic spell checking technique
Abstract
An Arabic spelling error detection and correction method for identifying real-word spelling errors. The method uses a corpus of Arabic text together with n-gram statistical techniques to detect erroneous words within the text. After identifying an erroneous word, the method uses a dictionary formed from the corpus of Arabic text to retrieve candidate correction words to replace it. The candidate correction words are generated and ranked by n-gram statistical models in order of most probable correction. The generated and ranked correction words are assessed and the best correction word is selected. A final assessment of the correction is conducted, and if the result is positive, the erroneous word is replaced with the statistically most probable correction.
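The pipeline in the abstract (detect, generate candidates, rank, assess, replace) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the toy corpus, the function names, the use of bigrams as the n-gram model, the rule that a word is flagged when both of its neighbouring bigrams are unseen, the edit-distance limit of 1 for candidates, and the rule that the final assessment is positive only when the best candidate fits the context better than the original word.

```python
from collections import Counter

# Toy corpus (assumed for illustration); each sentence is already tokenised.
CORPUS = [
    ["ذهب", "الولد", "إلى", "المدرسة"],
    ["ذهب", "الرجل", "إلى", "العمل"],
    ["نبح", "الكلب", "في", "الليل"],
    ["ينبض", "القلب", "بسرعة"],
    ["ينبض", "القلب", "بقوة"],
]

def bigram_counts(sentences):
    """Count adjacent word pairs, with sentence boundary markers."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts

def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def context_score(counts, left, word, right):
    """How often the word was seen next to its left and right neighbours."""
    return counts[(left, word)] + counts[(word, right)]

def correct(tokens, corpus, max_dist=1):
    counts = bigram_counts(corpus)
    vocab = {w for sentence in corpus for w in sentence}  # dictionary built from the corpus
    padded = ["<s>"] + list(tokens) + ["</s>"]
    for i in range(1, len(padded) - 1):
        left, word, right = padded[i - 1], padded[i], padded[i + 1]
        original = context_score(counts, left, word, right)
        if original > 0:
            continue  # assumed detection rule: flag only words with no supporting bigram
        # Candidate generation: dictionary words within max_dist edits.
        candidates = [w for w in vocab if w != word and edit_distance(w, word) <= max_dist]
        # Ranking: best context fit first, ties broken alphabetically.
        ranked = sorted(candidates, key=lambda w: (-context_score(counts, left, w, right), w))
        # Final assessment (assumed): replace only if the top candidate beats the original.
        if ranked and context_score(counts, left, ranked[0], right) > original:
            padded[i] = ranked[0]
    return padded[1:-1]

fixed = correct(["ينبض", "الكلب", "بسرعة"], CORPUS)
```

In this toy example "الكلب" (the dog) is a valid dictionary word, so a plain dictionary lookup would not catch it; it is only flagged because it never appears next to "ينبض" or "بسرعة" in the corpus. A real system would presumably use a much larger corpus, higher-order n-grams, and smoothed probabilities rather than raw counts.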
18 Claims
1. An Arabic language spelling error detection and correction method, comprising:
- comparing digitized text with a corpus database of Arabic words and sentences to identify text errors and grammatical errors in the digitized text;
- identifying a plurality of candidate correction options from a dictionary of text for at least one of the text errors and the grammatical errors identified in the comparing, wherein the candidate correction options are detected with a plurality of n-gram statistic model sequences,
- ranking the candidate correction options in order of highest probability of most correct alternative using the n-gram statistical model sequences,
- selecting a highest probability correction word according to the context of the text,
- conducting a final end of file evaluation by comparing the highest probability correction word with the corresponding text error or grammatical error to assess the accuracy of the highest probability correction word; and
- initiating a word correction by substituting the text error or the grammatical error with the highest probability correction word when the accuracy of the selected word is positive.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. An apparatus for detecting and correcting spelling errors, the apparatus comprising:
circuitry configured
- to generate a word co-occurrence model for a given set of words to be analyzed;
- to generate an n-gram language model for the given set of words; and
- to check an output made by the word co-occurrence model and an output made by the n-gram language model and to compare the two outputs of two models,
wherein if the two outputs of the two models are determined to be the same, then this output is considered the output of the combined method of the word co-occurrence model and the n-gram language model and a correct word is output.
Dependent claims: 16, 17, 18
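The agreement check in claim 15 can be sketched in Python as follows. This is a hedged illustration only: the toy corpus, the function names, the sentence-wide co-occurrence window, the bigram form of the n-gram model, and the candidate list passed in by the caller are all assumptions. The claim also does not say what happens when the two models disagree; this sketch returns None in that case.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (assumed for illustration); each sentence is already tokenised.
CORPUS = [
    ["نبح", "الكلب", "في", "الليل"],
    ["ينبض", "القلب", "بسرعة"],
    ["ينبض", "القلب", "بقوة"],
]

def train_cooccurrence(sentences):
    """Count how often two distinct words share a sentence (order ignored)."""
    counts = Counter()
    for tokens in sentences:
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts

def train_bigrams(sentences):
    """Count adjacent word pairs, with sentence boundary markers."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts

def cooccurrence_choice(counts, context, candidates):
    """Candidate that co-occurs most often with the other words of the sentence."""
    def score(word):
        return sum(counts[tuple(sorted((word, c)))] for c in context if c != word)
    return max(sorted(candidates), key=score)

def ngram_choice(counts, left, right, candidates):
    """Candidate that best fits its immediate left and right neighbours."""
    return max(sorted(candidates), key=lambda w: counts[(left, w)] + counts[(w, right)])

def combine(cooc_output, ngram_output):
    """Accept a word only when both models output the same one."""
    return cooc_output if cooc_output == ngram_output else None

def combined_correction(tokens, i, candidates, corpus):
    cooc = train_cooccurrence(corpus)
    bigrams = train_bigrams(corpus)
    padded = ["<s>"] + list(tokens) + ["</s>"]
    context = tokens[:i] + tokens[i + 1:]
    a = cooccurrence_choice(cooc, context, candidates)
    b = ngram_choice(bigrams, padded[i], padded[i + 2], candidates)
    return combine(a, b)

result = combined_correction(["ينبض", "الكلب", "بسرعة"], 1, ["الكلب", "القلب"], CORPUS)
```

Requiring agreement trades recall for precision: a correction is emitted only when the wider sentence context and the immediate neighbours point to the same word.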
Specification