Method and apparatus for automatic detection of spelling errors in one or more documents
First Claim
1. A method for detecting a spelling error in one or more documents, comprising:
obtaining a maximum edit distance at which a word, w, is to be considered a possible misspelling of another word, w′;
determining if at least one given word in said one or more documents satisfies a predefined misspelling criteria, wherein said predefined misspelling criteria comprises said at least one given word having a frequency below a predefined low threshold and said at least one given word being within the obtained maximum edit distance of one or more other words in said one or more documents having a frequency above a predefined high threshold;
identifying a given word as a potentially misspelled word if said given word satisfies said predefined misspelling criteria; and
maintaining a lexicon such that said lexicon will include said given word if said given word does not satisfy said predefined misspelling criteria and will exclude said given word if said given word satisfies said predefined misspelling criteria, wherein one or more of said steps are performed by a processor.
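The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Several choices here are assumptions made only for illustration: Levenshtein distance stands in for the "edit distance", frequency is taken as a raw occurrence count, words are lowercase alphabetic tokens, and the names edit_distance and build_lexicon are hypothetical.

```python
import re
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed metric; the claim only says 'edit distance')."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def build_lexicon(documents, max_distance, low_threshold, high_threshold):
    """Return (lexicon, suspects) for the given documents.

    A word is a suspect if its count is below low_threshold and it lies
    within max_distance of some other word whose count is above
    high_threshold. Suspects are excluded from the lexicon; every other
    word is included.
    """
    # Assumed tokenization: lowercase runs of ASCII letters.
    counts = Counter(
        word
        for doc in documents
        for word in re.findall(r"[a-z]+", doc.lower())
    )
    frequent = [w for w, c in counts.items() if c > high_threshold]

    suspects = set()
    for word, count in counts.items():
        if count >= low_threshold:
            continue  # not rare enough to be a candidate misspelling
        if any(other != word and edit_distance(word, other) <= max_distance
               for other in frequent):
            suspects.add(word)

    lexicon = set(counts) - suspects
    return lexicon, suspects


docs = [
    "please receive the package",
    "we receive mail daily",
    "did you receive it",
    "i will recieve it",
]
lexicon, suspects = build_lexicon(docs, max_distance=2,
                                  low_threshold=2, high_threshold=2)
print(sorted(suspects))
```

The sketch compares each rare word against every frequent word, which is quadratic in the worst case. That is acceptable for illustration, but a large corpus would likely need an index over candidate words.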
Abstract
Methods and apparatus are provided for automatically detecting spelling errors in one or more documents, such as documents being processed for the creation of a lexicon. According to one aspect of the invention, a spelling error is detected in one or more documents by determining if at least one given word in the one or more documents satisfies a predefined misspelling criteria, wherein the predefined misspelling criteria comprises the at least one given word having a frequency below a predefined low threshold and the at least one given word being within a predefined edit distance of one or more other words in the one or more documents having a frequency above a predefined high threshold; and identifying a given word as a potentially misspelled word if the given word satisfies the predefined misspelling criteria.
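The misspelling criteria described in the abstract can be written compactly. The notation is introduced here for illustration only and does not appear in the source: V is the set of distinct words in the documents, f(w) is the frequency of word w, θ_low and θ_high are the predefined low and high thresholds, d is the edit distance, and k is the predefined edit distance. Reading "within" as inclusive (d ≤ k) is an assumption.

```latex
\[
\operatorname{misspelled}(w) \iff
f(w) < \theta_{\text{low}}
\;\wedge\;
\exists\, w' \in V \setminus \{w\} :\;
f(w') > \theta_{\text{high}}
\;\wedge\;
d(w, w') \le k
\]
```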
25 Claims
1. A method for detecting a spelling error in one or more documents, comprising:
obtaining a maximum edit distance at which a word, w, is to be considered a possible misspelling of another word, w′;
determining if at least one given word in said one or more documents satisfies a predefined misspelling criteria, wherein said predefined misspelling criteria comprises said at least one given word having a frequency below a predefined low threshold and said at least one given word being within the obtained maximum edit distance of one or more other words in said one or more documents having a frequency above a predefined high threshold;
identifying a given word as a potentially misspelled word if said given word satisfies said predefined misspelling criteria; and
maintaining a lexicon such that said lexicon will include said given word if said given word does not satisfy said predefined misspelling criteria and will exclude said given word if said given word satisfies said predefined misspelling criteria, wherein one or more of said steps are performed by a processor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system for detecting a spelling error in one or more documents, said system comprising:
a memory; and
at least one processor, coupled to the memory, operative to:
obtain a maximum edit distance at which a word, w, is to be considered a possible misspelling of another word, w′;
determine if at least one given word in said one or more documents satisfies a predefined misspelling criteria, wherein said predefined misspelling criteria comprises said at least one given word having a frequency below a predefined low threshold and said at least one given word being within the obtained maximum edit distance of one or more other words in said one or more documents having a frequency above a predefined high threshold;
identify a given word as a potentially misspelled word if said given word satisfies said predefined misspelling criteria; and
maintain a lexicon such that said lexicon will include said given word if said given word does not satisfy said predefined misspelling criteria and will exclude said given word if said given word satisfies said predefined misspelling criteria.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
24. An article of manufacture for detecting a spelling error in one or more documents, comprising a tangible machine readable recordable medium containing one or more programs which when executed implement the steps of:
obtaining a maximum edit distance at which a word, w, is to be considered a possible misspelling of another word, w′;
determining if at least one given word in said one or more documents satisfies a predefined misspelling criteria, wherein said predefined misspelling criteria comprises said at least one given word having a frequency below a predefined low threshold and said at least one given word being within the obtained maximum edit distance of one or more other words in said one or more documents having a frequency above a predefined high threshold;
identifying a given word as a potentially misspelled word if said given word satisfies said predefined misspelling criteria;
maintaining a lexicon such that said lexicon will include said given word if said given word does not satisfy said predefined misspelling criteria and will exclude said given word if said given word satisfies said predefined misspelling criteria.
Dependent claims: 25
Specification