Method of and apparatus for recognizing and labeling instances of name classes in textual environments
First Claim
Patent Images
1. A named-entity recognition system comprising:
- a multilevel hidden Markov model, said hidden Markov model including;
(a) at least one upper level representing named entities; and
(b) at least one lower level representing words generated by the upper level.
12 Assignments
0 Petitions
Accused Products
Abstract
A computer assisted method for recognizing and labeling instances of name classes in textual environments is described. The invention receives training text having instances of named-entity classes labeled, from which it generates a database of bigram and unigram occurrences. The invention uses the database of bigram and unigram occurrences to form a two level Hidden Markov Model with single output states at the lower level. The invention also receives a series of input text to be processed and labeled with respect to the name classes, and the invention uses the two level Hidden Markov Model to recognize and label instances of named-entity classes in the input text.
71 Citations
16 Claims
-
1. A named-entity recognition system comprising:
a multilevel hidden Markov model, said hidden Markov model including; (a) at least one upper level representing named entities; and (b) at least one lower level representing words generated by the upper level. - View Dependent Claims (2, 3, 4, 5)
-
6. A named-entity recognition system comprising:
a multilevel hidden Markov model, said hidden Markov model including; (a) at least one upper level including a first plurality of states of a predefined classification of named entities, and (b) at least one lower level including a second plurality of states of a predefined classification of words, wherein at least one of the transitions between the upper states is conditioned on the output of the previous lower states, and at least one of the transitions between the lower state is conditioned on the output of the previous upper state. - View Dependent Claims (7, 8, 15)
-
9. A computer-assisted method for recognizing and labeling instances of name classes in textual environments, comprising the following steps:
-
A. receiving a marked string of words and using said marked string to estimate a plurality of HMM parameters, and using said plurality of HMM parameters to generate an HMM having N states, each of said N states representing a name class and having a plurality of member states; B. receiving an unmarked string of K words, K being an integer; C. recognizing and labeling a plurality of name class instances in said unmarked string of K words, including the following substeps; i. receiving a kth and a (k+1)th word from said unmarked string of K words; ii. determining, from said HMM parameters, a transition probability representing a likelihood of said (k+1)th word being a member of an ith state, given said kth word being a member of a jth state, and given said (k+1)th word immediately following said kth word, said ith and kth states being member states of said N states; iii. repeating step (ii) for all i from 1 to N and for all j from 1 to N, whereby N2 transition probabilities are determined, and retaining the highest transition probability to the (k+1)th word in each of the N states; iv. repeating steps (i), (ii) and (iii) for all k from 1 to K, whereby N·
(K-1) transition probabilities are retained;v. identifying a highest probability path from the Kth word back to the first word; and
,vi. labeling each of said K words with said name class state through which said highest probability path from the N identified paths passes. - View Dependent Claims (10, 11, 12, 13, 14)
-
-
16. A computer system for recognizing and labeling instances of name classes in textual environments, comprising:
-
A. means for providing a marked string of words and using said marked string to estimate a plurality of HMM parameters, and using said plurality of HMM parameters to generate an HMM having N states, each of said N states representing a name class and having a plurality of member states; B. means for receiving an unmarked string of K words, K being an integer; C. means for recognizing and labeling a plurality of name class instances in said unmarked string of K words, including; i. means for receiving a kth and a (k+1)th word from said unmarked string of K words; ii. means for determining, from said HMM parameters, a transition probability representing a likelihood of said (k+1)th word being a member of an ith state, given said kth word being a member of a jth state, and given said (k+1)th word immediately following said kth word, said ith and kth states being member states of said N states; iii. means for repeating step (ii) for all i from 1 to N and for all j from 1 to N, whereby N2 transition probabilities are determined, and retaining the highest transition probability to the (k+1)th word in each of the N states; iv. means for repeating steps (i), (ii) and (iii) for all k from 1 to K, whereby N·
(K-1) transition probabilities are retained;v. means for identifying a highest probability path from the Kth word back to the first word; and
,vi. means for labeling each of said K words with said name class state through which said highest probability path from the N identified paths passes.
-
Specification