System for underlying spelling recovery
First Claim
1. In a grammar checking system, a system for establishing a correct lexical entry for a word in a sentence to permit unambiguous dictionary lookup of said word by identifying if said word is intrinsically capitalized comprising:
pre-processing means for providing a modified training corpus having words that are not proper nouns or intrinsically capitalized words converted to lower case even if they are at the beginning of a sentence;
means for generating two versions of said sentence in which said word appears in each version in capitalized and uncapitalized form respectively;
means coupled to said modified training corpus for establishing which of said two versions is the more likely; and
means responsive to said two versions for determining by said more likely version of said sentence if said word is an intrinsically capitalized word.
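The claimed arrangement can be illustrated with a minimal Python sketch. It assumes the training corpus arrives tagged, with a hypothetical `NP` tag marking proper nouns and other intrinsically capitalized words. It also substitutes a plain word-trigram model with add-one smoothing for the unspecified means of "establishing which of said two versions is the more likely". The names `preprocess_corpus`, `WordTrigramModel` and `is_intrinsically_capitalized`, the toy corpus and the smoothing choice are illustrative assumptions, not the patented implementation.

```python
import math
from collections import Counter

# Hypothetical tag marking proper nouns and other intrinsically capitalized words.
PROPER_TAGS = {"NP"}


def preprocess_corpus(tagged_sentences):
    """Build the modified training corpus: lower-case every word that is not
    tagged as a proper noun, including sentence-initial words."""
    return [[word if tag in PROPER_TAGS else word.lower() for word, tag in sent]
            for sent in tagged_sentences]


class WordTrigramModel:
    """Word-trigram language model with add-one smoothing (an assumed choice)."""

    def __init__(self, sentences):
        self.tri, self.hist = Counter(), Counter()
        self.vocab = set()
        for sent in sentences:
            padded = ["<s>", "<s>"] + sent + ["</s>"]
            self.vocab.update(padded)
            for i in range(2, len(padded)):
                self.tri[tuple(padded[i - 2:i + 1])] += 1
                self.hist[tuple(padded[i - 2:i])] += 1

    def log_prob(self, words):
        """Smoothed log probability of a whole sentence."""
        padded = ["<s>", "<s>"] + words + ["</s>"]
        size = len(self.vocab) + 1  # one extra slot for unseen words
        return sum(
            math.log((self.tri[tuple(padded[i - 2:i + 1])] + 1)
                     / (self.hist[tuple(padded[i - 2:i])] + size))
            for i in range(2, len(padded)))


def is_intrinsically_capitalized(model, words, index):
    """Generate capitalized and uncapitalized versions of the sentence and
    report whether the capitalized one is the more likely."""
    word = words[index]
    capitalized = words[:index] + [word[:1].upper() + word[1:]] + words[index + 1:]
    lowered = words[:index] + [word.lower()] + words[index + 1:]
    return model.log_prob(capitalized) > model.log_prob(lowered)


if __name__ == "__main__":
    raw = [
        [("The", "AT"), ("bill", "NN"), ("passed", "VBD")],
        [("Bill", "NP"), ("Smith", "NP"), ("arrived", "VBD")],
        [("Bill", "NP"), ("arrived", "VBD"), ("late", "RB")],
        [("The", "AT"), ("bill", "NN"), ("arrived", "VBD")],
    ]
    model = WordTrigramModel(preprocess_corpus(raw))
    print(is_intrinsically_capitalized(model, ["Bill", "Smith", "arrived"], 0))  # True
    print(is_intrinsically_capitalized(model, ["The", "bill", "passed"], 0))    # False
```

Lower-casing the sentence-initial "The" during pre-processing keeps the model from learning a spurious capitalized "The", so the comparison prefers "the" at the start of "The bill passed" while still preferring "Bill" in "Bill Smith arrived".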
Abstract
In a grammar checking system that first tags a sentence as to parts of speech, underlying spelling is recovered by removing the effects of capitalization on a word, so that the system can suggest the appropriate inflection and/or spelling. To determine the underlying spelling, the system determines whether a noun is a proper noun by using a part-of-speech tagger and part-of-speech trigram probabilities, with the capitalized and uncapitalized versions of the word having different trigram probabilities. The system also establishes whether a word is an ordinary word as opposed to a proper noun or other intrinsically capitalized word, and further determines which of the two interpretations of the word is the best one.
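The part-of-speech variant described in the abstract can be sketched in the same hedged way. Each version of the sentence is scored by tag-trigram probabilities plus the probability of the word form given its tag, maximized over candidate tags for the word in question, so "Bill" can score well as a proper noun and "bill" as a common noun. The class `PosTrigramScorer`, the function `prefers_capitalized`, the Brown-style tags, the toy corpus, the add-one smoothing, and the assumption that a separate tagger supplies the tags of the surrounding words are all hypothetical.

```python
import math
from collections import Counter


class PosTrigramScorer:
    """Tag-trigram model plus word-given-tag probabilities, add-one smoothed.
    A hypothetical stand-in for part-of-speech trigram probabilities."""

    def __init__(self, tagged_sentences):
        self.tri, self.hist = Counter(), Counter()
        self.emit, self.tag_total = Counter(), Counter()
        self.vocab = set()
        for sent in tagged_sentences:
            for word, tag in sent:
                self.emit[(word, tag)] += 1
                self.tag_total[tag] += 1
                self.vocab.add(word)
            seq = ["<s>", "<s>"] + [tag for _, tag in sent] + ["</s>"]
            for i in range(2, len(seq)):
                self.tri[tuple(seq[i - 2:i + 1])] += 1
                self.hist[tuple(seq[i - 2:i])] += 1
        self.tags = sorted(self.tag_total)

    def _log_tag_seq(self, tags):
        # log P(t1..tn) as a sum of smoothed log P(t_i | t_{i-2}, t_{i-1}).
        seq = ["<s>", "<s>"] + tags + ["</s>"]
        outcomes = len(self.tags) + 1  # every tag plus the end marker
        return sum(
            math.log((self.tri[tuple(seq[i - 2:i + 1])] + 1)
                     / (self.hist[tuple(seq[i - 2:i])] + outcomes))
            for i in range(2, len(seq)))

    def _log_emit(self, word, tag):
        # Smoothed log P(word | tag); "Bill" and "bill" are distinct words here.
        return math.log((self.emit[(word, tag)] + 1)
                        / (self.tag_total[tag] + len(self.vocab) + 1))

    def best_score(self, word, context_tags, index):
        """Best log score over candidate tags for `word` at position `index`.
        Other tags are held fixed; their emission terms are identical for
        both versions of the sentence, so they are omitted."""
        best = -math.inf
        for tag in self.tags:
            tags = context_tags[:index] + [tag] + context_tags[index + 1:]
            best = max(best, self._log_tag_seq(tags) + self._log_emit(word, tag))
        return best


def prefers_capitalized(scorer, words, context_tags, index):
    """True if the capitalized version of words[index] scores higher."""
    word = words[index]
    capitalized = word[:1].upper() + word[1:]
    return (scorer.best_score(capitalized, context_tags, index)
            > scorer.best_score(word.lower(), context_tags, index))


if __name__ == "__main__":
    corpus = [
        [("the", "AT"), ("bill", "NN"), ("passed", "VBD")],
        [("Bill", "NP"), ("Smith", "NP"), ("arrived", "VBD")],
        [("Bill", "NP"), ("arrived", "VBD"), ("late", "RB")],
        [("the", "AT"), ("bill", "NN"), ("arrived", "VBD")],
    ]
    scorer = PosTrigramScorer(corpus)
    # The entry at the word's own position in context_tags is ignored.
    print(prefers_capitalized(scorer, ["Bill", "arrived", "late"], [None, "VBD", "RB"], 0))  # True
    print(prefers_capitalized(scorer, ["The", "bill", "passed"], [None, "NN", "VBD"], 0))   # False
```

Holding the neighbors' tags fixed keeps the sketch short; a fuller version would rerun the tagger on each version of the sentence before comparing them.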
8 Claims
Claim 1, set out above under First Claim, is the independent claim; claims 2, 3, 4, 5, 6, 7 and 8 are its dependent claims.
Specification