Adaptive and scalable method for resolving natural language ambiguities
Abstract
A method for resolving ambiguities in natural language by organizing the task into multiple iterations of analysis done in successive levels of depth. We refer to this method as Adaptive Language Processing (ALP), in that the processing done is adaptive to the users' need for accuracy and efficiency. At each level of the ALP model the most accurate disambiguation is made based on the available information. As more analysis is done, additional knowledge is incorporated in a systematic manner to improve disambiguation accuracy. This multi-level approach allows for time-consuming steps to be parlayed or omitted, based on the needs of the users. Associated with each level of processing is a measure of confidence, used to gauge the confidence of a process in its disambiguation accuracy. An overall confidence measure is also used to reflect the level of the analysis done, the particular NLP techniques used in the disambiguation, and the amount of training data available. This measure allows for better scalability to technological advances, such as improved algorithms and added training data. Applications based on these disambiguated outputs will automatically improve by accounting for these measures of confidence.
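The multi-level idea in the abstract can be illustrated with a minimal sketch. It assumes each level of analysis is a function returning a sense and a confidence in [0, 1], and that processing stops once the caller's confidence threshold is met, so deeper and slower levels are skipped. The names `adaptive_disambiguate`, `pos_level`, and `context_level`, the sense labels, and the confidence values are all hypothetical and do not come from the patent.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical level signature: (token, context words) -> (sense, confidence).
Level = Callable[[str, List[str]], Tuple[str, float]]


def adaptive_disambiguate(
    token: str,
    context: List[str],
    levels: List[Level],
    min_confidence: float,
    max_levels: Optional[int] = None,
) -> Tuple[Optional[str], float, int]:
    """Run levels in order of depth; stop early once confidence is sufficient."""
    best_sense, best_conf, used = None, 0.0, 0
    for level in levels[:max_levels]:  # max_levels=None means "all levels"
        sense, conf = level(token, context)
        used += 1
        if conf > best_conf:
            best_sense, best_conf = sense, conf
        if best_conf >= min_confidence:
            break  # deeper, more expensive levels are omitted
    return best_sense, best_conf, used


def pos_level(token: str, context: List[str]) -> Tuple[str, float]:
    # Shallow level (assumed): only a coarse guess, low confidence.
    return ("bank/NOUN", 0.4)


def context_level(token: str, context: List[str]) -> Tuple[str, float]:
    # Deeper level (assumed): uses neighboring words, higher confidence.
    if "river" in context:
        return ("bank/shore", 0.9)
    return ("bank/finance", 0.7)
```

With a threshold of 0.8 both levels run; with a threshold of 0.3 the shallow level alone is enough, which is the accuracy-versus-efficiency trade-off the abstract describes.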
30 Claims
1. A method for resolving natural language ambiguities within text documents, comprising the steps of:
training probabilistic classifiers from annotated training data containing a sense tag for each polysemous word;
processing said text documents into tokens and determining their part-of-speech tags;
computing a measure of confidence using said probabilistic classifiers for each known sense of said tokens defined within a semantic lexicon based on contextual features and assigning a default sense for tokens absent from said semantic lexicon based on their part-of-speech tags;
determining assignment of word senses for each said token in said sentence such that the combined probability across said sentence is maximized; and
integrating additional contextual features as generated by additional natural language processing modules into said probabilistic classifiers whereby said measures of confidence is improved.
(Dependent claims: 2 to 15)
16. An apparatus for use in a natural language processing system for resolving natural language ambiguities within text documents, comprising:
a trainer that trains probabilistic classifiers from annotated training data containing a sense tag for each polysemous word;
a part-of-speech processor that processes said text documents into tokens and determines their part-of-speech tags;
a classifier module that computes a measure of confidence using said probabilistic classifiers for each known sense of said tokens defined within a semantic lexicon based on contextual features and assigns a default sense for tokens absent from said semantic lexicon based on their part-of-speech tags;
a word sense disambiguator that determines assignment of word senses for each said token in said sentence such that the combined probability across said sentence is maximized; and
a context integrator that integrates additional contextual features as generated by additional natural language processing apparatuses into said probabilistic classifiers whereby said measures of confidence is improved.
(Dependent claims: 17 to 30)
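The steps recited in claims 1 and 16 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: a count-based naive Bayes model with add-one smoothing stands in for the "probabilistic classifiers", a dictionary lookup stands in for the part-of-speech tagger, and the training data, lexicon, sense labels, and function names (`train`, `sense_confidences`, `disambiguate`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical annotated training data: (word, sense tag, context words).
TRAINING = [
    ("bank", "bank%finance", ["money", "loan"]),
    ("bank", "bank%finance", ["deposit", "money"]),
    ("bank", "bank%river", ["river", "water"]),
]
# Hypothetical semantic lexicon: word -> known senses.
LEXICON = {"bank": ["bank%finance", "bank%river"]}
# Toy part-of-speech lookup standing in for a real tagger.
POS = {"the": "DET", "a": "DET", "bank": "NOUN", "river": "NOUN",
       "loan": "NOUN", "flooded": "VERB", "approved": "VERB"}


def train(examples):
    """Collect sense priors and context-feature counts (naive Bayes, assumed)."""
    sense_counts = defaultdict(Counter)    # word -> sense -> count
    feature_counts = defaultdict(Counter)  # sense -> context word -> count
    vocab = set()
    for word, sense, context in examples:
        sense_counts[word][sense] += 1
        feature_counts[sense].update(context)
        vocab.update(context)
    return sense_counts, feature_counts, vocab


def sense_confidences(word, context, model):
    """Return a normalized confidence for each known sense of `word`."""
    sense_counts, feature_counts, vocab = model
    senses = LEXICON[word]
    total = sum(sense_counts[word].values())
    log_scores = {}
    for sense in senses:
        log_p = math.log((sense_counts[word][sense] + 1) / (total + len(senses)))
        denom = sum(feature_counts[sense].values()) + len(vocab)
        for feat in context:
            if feat in vocab:  # ignore context words never seen in training
                log_p += math.log((feature_counts[sense][feat] + 1) / denom)
        log_scores[sense] = log_p
    top = max(log_scores.values())
    weights = {s: math.exp(v - top) for s, v in log_scores.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def disambiguate(sentence, model):
    """Tokenize, tag, score senses, then pick the jointly most probable senses."""
    tokens = sentence.lower().split()
    tags = [POS.get(tok, "NOUN") for tok in tokens]  # unknown words default to NOUN
    candidates = []
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        context = tokens[:i] + tokens[i + 1:]
        if tok in LEXICON:
            candidates.append(list(sense_confidences(tok, context, model).items()))
        else:
            # Token absent from the lexicon: default sense based on its POS tag.
            candidates.append([(f"{tok}%default-{tag}", 1.0)])
    # Exhaustive search for the assignment maximizing the combined probability.
    best = max(product(*candidates),
               key=lambda combo: sum(math.log(p) for _, p in combo))
    return [sense for sense, _ in best]


model = train(TRAINING)
```

Because this sketch scores each token independently, maximizing the combined probability reduces to a per-token argmax; the exhaustive search only makes the sentence-level objective explicit. The final step of the claims, integrating features from additional NLP modules, would correspond here to adding more entries to each `context` list before scoring.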
Specification