Methods for part-of-speech determination and usage
Abstract
Methods for determination of parts of speech of words in a text or other non-verbal record are extended to include so-called Viterbi optimization based on stored statistical data relating to actual usage and to include noun-phrase parsing. The part-of-speech tagging method optimizes the product of individual word lexical probabilities and normalized three-word contextual probabilities. Normalization involves dividing by the contained two-word contextual probabilities. The method for noun phrase parsing involves optimizing the choices of, typically non-recursive, noun phrases by considering all possible beginnings and endings thereof, preferably based on the output of the part-of-speech tagging method.
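Read literally, the tagging objective in the abstract can be written as a single maximization. The block below is a sketch of one reading, not the specification's own notation. The symbols w_i (the i-th word) and t_i (its part-of-speech tag) are introduced here for illustration, and so is the assumption that each three-tag window consists of tag i and the two tags that follow it, with padding tags past the end of the message.

```latex
\hat{t}_1,\dots,\hat{t}_n
  = \arg\max_{t_1,\dots,t_n}
    \prod_{i=1}^{n}
      \underbrace{P(t_i \mid w_i)}_{\text{lexical}}
      \cdot
      \underbrace{\frac{P(t_i,\, t_{i+1},\, t_{i+2})}{P(t_{i+1},\, t_{i+2})}}_{\text{normalized contextual}}
```

The fraction equals the conditional probability P(t_i | t_{i+1}, t_{i+2}), which is why dividing the three-tag probability by the two-tag probability it contains acts as a normalization.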
333 Citations
6 Claims
1. An automated method for assigning parts of speech to words in a message, of the type comprising the steps of:
storing data in a computer system which is relevant to words likely to be in the message and usages of said words as various parts of speech, and
employing processing means in the computer system to select for each word in the message a likely part of speech responsive to a likely part of speech for at least an adjacent word in the message and responsive to said stored data,
said method being characterized in that:
the storing step comprises storing statistical data relating to:
(1) the actual likelihood of occurrence of each one of said likely words as a particular part of speech (hereinafter, the lexical likelihood), and
(2) the actual likelihoods of occurrence of each of said words as a particular part of speech when occurring adjacent to words that are particular parts of speech (hereinafter, the contextual likelihood), and
the selecting step comprises maximizing, for each word in the message, its overall likelihood of being a particular part of speech by a finite-state optimization technique commonly known as the "Viterbi" optimization technique, said technique being responsive to both the stored lexical likelihoods for each said word and the stored contextual likelihoods for at least said adjacent word.
Dependent claims: 2, 3, 5, 6
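The selecting step of claim 1 can be illustrated with a small Viterbi search over tag sequences. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `viterbi_tag`, the toy tables `lexical` and `contextual`, the tag names, the `<s>` start marker, and the `floor` value for unseen tag pairs are all hypothetical. For brevity it conditions each tag only on the previous tag (a two-tag context) rather than on a three-word window.

```python
import math

# Hypothetical lexical likelihoods: lexical[word][tag] ~ P(tag | word).
lexical = {
    "the": {"DET": 1.0},
    "can": {"VERB": 0.9, "NOUN": 0.1},
}

# Hypothetical contextual likelihoods: contextual[(prev_tag, tag)] ~ P(tag | prev_tag).
contextual = {
    ("<s>", "DET"): 0.5,
    ("DET", "NOUN"): 0.7,
    ("DET", "VERB"): 0.001,
}


def viterbi_tag(words, lexical, contextual, floor=1e-6):
    """Return the highest-scoring tag sequence for `words`.

    Assumes every word has an entry in `lexical`; unseen tag pairs
    fall back to the small probability `floor`.
    """
    if not words:
        return []

    # columns[i][tag] = (best log score of a path ending in `tag`, previous tag)
    columns = []
    for i, word in enumerate(words):
        column = {}
        for tag, p_lex in lexical[word].items():
            if i == 0:
                p_ctx = contextual.get(("<s>", tag), floor)
                column[tag] = (math.log(p_lex) + math.log(p_ctx), None)
            else:
                # Pick the predecessor tag that maximizes the path score.
                prev_tag, path_score = max(
                    (
                        (pt, ps + math.log(contextual.get((pt, tag), floor)))
                        for pt, (ps, _) in columns[i - 1].items()
                    ),
                    key=lambda pair: pair[1],
                )
                column[tag] = (path_score + math.log(p_lex), prev_tag)
        columns.append(column)

    # Trace back from the best final tag.
    tag = max(columns[-1], key=lambda t: columns[-1][t][0])
    tags = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = columns[i][tag][1]
        tags.append(tag)
    return tags[::-1]


print(viterbi_tag(["the", "can"], lexical, contextual))  # ['DET', 'NOUN']
```

In this toy example "can" is far more likely to be a verb in isolation (0.9 versus 0.1), but the contextual likelihood after a determiner outweighs that, so the search selects the noun reading. Working in log space avoids underflow when many probabilities are multiplied.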
4. An automated method for determining beginning and end boundaries of noun phrases in a message comprising a sequence of words, said method being characterized by the steps of:
storing data in a computer system, the data regarding the probability of noun phrase boundaries occurring between said words, and
in processing means in the computer system, performing the steps of assigning all possible noun phrase boundaries, eliminating all non-paired boundaries, and selecting optimum choices for said boundaries using contextual noun phrase boundary probabilities based on said stored data.
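The three processing steps of claim 4 (assign all possible boundaries, eliminate non-paired ones, select the optimum) can be sketched as a brute-force search over a tagged sentence. Everything named here is an assumption made for illustration: the function names, the `p_start` and `p_end` tables keyed by adjacent tag pairs, the `<s>` and `</s>` padding tags, the `floor` value, and the choice to treat start and end probabilities as independent factors. The enumeration is exponential in sentence length, so it is only meant to make the steps concrete, not to suggest how the patent carries them out.

```python
from itertools import product

# Possible boundary marks in the gap between two adjacent tags.
CHOICES = ("", "[", "]", "][")


def is_paired(marks):
    """True if every '[' is closed by a later ']' with no nesting."""
    inside = False
    for mark in marks:
        for ch in mark:
            if ch == "[":
                if inside:
                    return False
                inside = True
            else:  # ch == "]"
                if not inside:
                    return False
                inside = False
    return not inside


def bracket_noun_phrases(tags, p_start, p_end, floor=1e-6):
    """Return the best-scoring tuple of boundary marks, one per gap.

    There are len(tags) + 1 gaps: before the first tag, between each
    adjacent pair, and after the last tag.
    """
    padded = ["<s>"] + list(tags) + ["</s>"]
    gaps = list(zip(padded, padded[1:]))

    best_score, best_marks = -1.0, None
    # Step 1: assign all possible boundaries.
    for marks in product(CHOICES, repeat=len(gaps)):
        # Step 2: eliminate non-paired boundaries.
        if not is_paired(marks):
            continue
        # Step 3: score with contextual boundary probabilities.
        score = 1.0
        for gap, mark in zip(gaps, marks):
            ps = p_start.get(gap, floor)
            pe = p_end.get(gap, floor)
            score *= ps if "[" in mark else 1.0 - ps
            score *= pe if "]" in mark else 1.0 - pe
        if score > best_score:
            best_score, best_marks = score, marks
    return best_marks


# Hypothetical boundary probabilities for a toy tag set.
p_start = {("<s>", "DET"): 0.9, ("VERB", "DET"): 0.9}
p_end = {("NOUN", "VERB"): 0.9, ("NOUN", "</s>"): 0.9}

tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
print(bracket_noun_phrases(tags, p_start, p_end))
# ('[', '', ']', '[', '', ']')
```

The result reads as `[DET NOUN] VERB [DET NOUN]`: a phrase opens before each determiner and closes after each noun. Taking tags rather than raw words as input follows the abstract's stated preference for running noun-phrase parsing on the output of the part-of-speech tagger.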
Specification