Method for named-entity recognition and verification
Abstract
A method for named-entity (NE) recognition and verification is provided. The method extracts at least one to-be-tested segment from an article according to a text window and uses a predefined grammar to parse the to-be-tested segments and remove ill-formed ones. A statistical verification model then calculates the confidence measure of each to-be-tested segment to determine whether it contains a named entity. If the confidence measure is less than a predefined threshold, the to-be-tested segment is rejected; otherwise, it is accepted.
18 Claims
1. A method for named-entity recognition and verification, comprising the steps of:
(A) segmenting text data from an article into at least one to-be-tested segment according to a text window;
(B) parsing the to-be-tested segments to remove ill-formed segments from the to-be-tested segments according to a predefined grammar;
(C) using a hypothesis test to assess a confidence measure of each to-be-tested segment, wherein the confidence measure is determined by dividing the probability $P(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z} \mid H_0)$, under the hypothesis $H_0$ that the to-be-tested segment has a named entity, by the probability $P(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z} \mid H_1)$, under the hypothesis $H_1$ that the to-be-tested segment does not have a named entity, where $o_{C,1}^{C,y}$ is a candidate, $o_{L,1}^{L,x}$ is the left context of the candidate, and $o_{R,1}^{R,z}$ is the right context of the candidate; and
(D) determining that the to-be-tested segment has a named entity if the confidence measure is greater than a predefined threshold, wherein the confidence measure is expressed by the log likelihood ratio
$$\mathrm{LLR}(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z}) = \log \frac{P(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z} \mid H_0)}{P(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z} \mid H_1)}.$$
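The four steps of claim 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the window scheme (every span of 1 to `max_len` tokens, with up to `ctx` tokens of context on each side), the names `segment_text`, `log_likelihood_ratio`, and `verify`, and the caller-supplied scorers `log_p_ne` and `log_p_anti` (standing in for the NE and anti-NE models) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# (left context, candidate, right context), each a tuple of tokens
Segment = Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]
Scorer = Callable[[Segment], float]


def segment_text(tokens: Sequence[str], max_len: int, ctx: int) -> List[Segment]:
    """Step (A), under an assumed window scheme: every span of 1..max_len tokens
    is a candidate, paired with up to `ctx` tokens of left and right context."""
    segments = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            left = tuple(tokens[max(0, start - ctx):start])
            cand = tuple(tokens[start:end])
            right = tuple(tokens[end:end + ctx])
            segments.append((left, cand, right))
    return segments


def log_likelihood_ratio(seg: Segment, log_p_ne: Scorer, log_p_anti: Scorer) -> float:
    """Step (C): log P(seg | H0) - log P(seg | H1), i.e. the log of the ratio."""
    return log_p_ne(seg) - log_p_anti(seg)


def verify(tokens: Sequence[str], max_len: int, ctx: int,
           is_well_formed: Callable[[Tuple[str, ...]], bool],
           log_p_ne: Scorer, log_p_anti: Scorer,
           threshold: float) -> List[Tuple[str, ...]]:
    """Run steps (A)-(D) and return the candidates accepted as named entities."""
    accepted = []
    for seg in segment_text(tokens, max_len, ctx):
        if not is_well_formed(seg[1]):  # step (B): drop ill-formed candidates
            continue
        if log_likelihood_ratio(seg, log_p_ne, log_p_anti) > threshold:  # step (D)
            accepted.append(seg[1])
    return accepted
```

For example, with a toy grammar filter that accepts only capitalized tokens, a toy `log_p_ne` that returns 0.0 for the candidate `("John", "Smith")` and -10.0 otherwise, a constant `log_p_anti` of -5.0, and a threshold of 0, calling `verify` on `"met John Smith today".split()` with `max_len=2` and `ctx=1` accepts only `("John", "Smith")`.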
5. The method as claimed in claim 4, wherein in step (D), the confidence measure is determined by using the Neyman-Pearson lemma.
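The Neyman-Pearson lemma says that, for a fixed rate of false acceptances, a test that thresholds the likelihood ratio is the most powerful one, so the threshold of step (D) can be tied to a tolerated false-acceptance rate α. One possible way to choose it, assumed here rather than taken from the claims, is empirical: compute the LLR of a held-out set of segments known not to contain named entities and take the smallest threshold that at most a fraction α of them exceed. The function name `neyman_pearson_threshold` and the held-out data are hypothetical.

```python
import math
from typing import Sequence


def neyman_pearson_threshold(anti_scores: Sequence[float], alpha: float) -> float:
    """Smallest threshold t such that at most a fraction `alpha` of the held-out
    non-entity LLR scores satisfy score > t (empirical false-acceptance rate)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scores = sorted(anti_scores, reverse=True)
    k = math.floor(alpha * len(scores))  # number of false acceptances tolerated
    if k >= len(scores):
        return -math.inf                 # every held-out score may pass
    # Scores at positions >= k are <= scores[k], so at most k exceed it;
    # any smaller threshold would let scores[0..k] (k + 1 of them) pass.
    return scores[k]
```

Because step (D) accepts only when the confidence measure is strictly greater than the threshold, returning `scores[k]` keeps the empirical false-acceptance count at or below `alpha * len(anti_scores)`.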
6. The method as claimed in claim 1, wherein a named-entity model (NE model) is used to determine $\log P(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z} \mid H_0)$, where […] approximates […] and […] approximates […].
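7. The method as claimed in claim 6, wherein $\log P(o_{L,1}^{L,x})$ approximates $\log \prod_{i=1}^{x} P_0(o_{L,i} \mid o_{L,i-N+1}^{L,i-1})$, which equals $\sum_{i=1}^{x} \log P_0(o_{L,i} \mid o_{L,i-N+1}^{L,i-1})$, where N is a positive integer.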
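8. The method as claimed in claim 6, wherein $\log P(o_{R,1}^{R,z})$ approximates $\log \prod_{i=1}^{z} P_0(o_{R,i} \mid o_{R,i-N+1}^{R,i-1})$, which equals $\sum_{i=1}^{z} \log P_0(o_{R,i} \mid o_{R,i-N+1}^{R,i-1})$, where N is a positive integer.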
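Claims 7 and 8 score each context with an N-gram model. The sketch below estimates $P_0$ with add-one smoothing from a toy corpus; the smoothing method, the class name `NGramModel`, and the choice to shorten the history at the start of a context (the claims do not say what happens when $i - N + 1 < 1$) are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple


class NGramModel:
    """Add-one smoothed N-gram estimate of P0 (smoothing choice is illustrative)."""

    def __init__(self, n: int, sequences: Iterable[Sequence[str]]):
        if n < 1:
            raise ValueError("N must be a positive integer")
        self.n = n
        self.pair_counts: Counter = Counter()  # (history, token) -> count
        self.hist_counts: Counter = Counter()  # history -> count
        self.vocab = set()
        for seq in sequences:
            for i, tok in enumerate(seq):
                hist = self._history(seq, i)
                self.pair_counts[(hist, tok)] += 1
                self.hist_counts[hist] += 1
                self.vocab.add(tok)

    def _history(self, seq: Sequence[str], i: int) -> Tuple[str, ...]:
        # o_{i-N+1} .. o_{i-1}, shortened at the start of the sequence (assumption)
        return tuple(seq[max(0, i - self.n + 1):i])

    def log_prob(self, tok: str, hist: Tuple[str, ...]) -> float:
        v = len(self.vocab) + 1  # one extra slot shared by all unseen tokens
        return math.log((self.pair_counts[(hist, tok)] + 1) / (self.hist_counts[hist] + v))

    def score(self, seq: Sequence[str]) -> float:
        """sum_{i=1}^{len(seq)} log P0(o_i | o_{i-N+1} .. o_{i-1})."""
        return sum(self.log_prob(tok, self._history(seq, i)) for i, tok in enumerate(seq))
```

The same `score` call serves the left context of claim 7 and the right context of claim 8; only the sequence passed in changes.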
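9. The method as claimed in claim 6, wherein $\log P(o_{C,1}^{C,y})$ equals $\log \sum_{T} \prod_{A \to \alpha \in T} P_0(\alpha \mid A)$ and approximates $\max_{T} \sum_{A \to \alpha \in T} \log P_0(\alpha \mid A)$, where T is a possible parsing tree and $A \to \alpha$ is a rule in the parsing tree T.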
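The approximation in claim 9 replaces the sum over parsing trees with the single best tree, which a Viterbi version of the CKY algorithm finds in time cubic in the candidate length for a grammar in Chomsky normal form. In the sketch below, the CNF requirement, the rule-table layout, and the function name `viterbi_parse_score` are assumptions; a result of negative infinity means the grammar cannot parse the candidate, the same situation in which step (B) would discard it as ill-formed.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical grammar layout in Chomsky normal form, storing log P0(alpha | A):
BinaryRules = Dict[str, List[Tuple[str, str, float]]]  # A -> [(B, C, log P0(B C | A))]
LexicalRules = Dict[str, List[Tuple[str, float]]]      # word -> [(A, log P0(word | A))]


def viterbi_parse_score(words: Sequence[str], start: str,
                        binary: BinaryRules, lexical: LexicalRules) -> float:
    """Return max over parse trees T of sum_{A->alpha in T} log P0(alpha | A),
    or -inf when the grammar cannot derive `words` from `start`."""
    n = len(words)
    if n == 0:
        return -math.inf
    # best[i][j][A] = best log score of nonterminal A deriving words[i:j]
    best = [[defaultdict(lambda: -math.inf) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for a, lp in lexical.get(w, []):
            best[i][i + 1][a] = max(best[i][i + 1][a], lp)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for a, rules in binary.items():
                    for b, c, lp in rules:
                        s = best[i][k][b] + best[k][j][c] + lp
                        if s > best[i][j][a]:
                            best[i][j][a] = s
    return best[0][n][start]
```

Replacing `max` with a log-sum-exp over the same recursion gives the inside algorithm, i.e. the exact sum over trees on the left-hand side of claim 9.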
10. The method as claimed in claim 9, wherein the NE model is
$$\log P(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z} \mid H_0) = \sum_{i=1}^{x} \log P_0(o_{L,i} \mid o_{L,i-N+1}^{L,i-1}) + \sum_{i=1}^{z} \log P_0(o_{R,i} \mid o_{R,i-N+1}^{R,i-1}) + \max_{T} \sum_{A \to \alpha \in T} \log P_0(\alpha \mid A).$$
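Claim 10 adds three separately computed scores. The sketch below composes them from caller-supplied pieces so that it stays agnostic about how $P_0$ and the parser are built; `log_p0`, `parse_score`, `ngram_sum`, `ne_score`, and the history clipping at the start of each context are hypothetical choices. Note that the left and right contexts are each scored only within themselves, whereas the anti-NE model of claims 11 to 13 lets histories run across segment boundaries.

```python
from typing import Callable, Sequence, Tuple

LogProb = Callable[[str, Tuple[str, ...]], float]  # log P0(token | history)
ParseScore = Callable[[Sequence[str]], float]      # max_T sum log P0(alpha | A)


def ngram_sum(seq: Sequence[str], n: int, log_p0: LogProb) -> float:
    # History is clipped at the start of `seq` (assumption: the claims do not
    # say how the NE model handles i - N + 1 < 1).
    return sum(log_p0(tok, tuple(seq[max(0, i - n + 1):i])) for i, tok in enumerate(seq))


def ne_score(left: Sequence[str], cand: Sequence[str], right: Sequence[str],
             n: int, log_p0: LogProb, parse_score: ParseScore) -> float:
    """Left-context N-gram + right-context N-gram + best parse of the candidate."""
    return ngram_sum(left, n, log_p0) + ngram_sum(right, n, log_p0) + parse_score(cand)
```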
11. The method as claimed in claim 1, wherein an anti-named-entity model (anti-NE model) is used to determine $\log P(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z} \mid H_1)$, where […] approximates […] and N is a positive integer.
12. The method as claimed in claim 11, wherein $o_{R,j}$ equals $o_{C,y+j}$ if $j = 0, -1, -2, \ldots$, $o_{C,j}$ equals $o_{L,x+j}$ if $j = 0, -1, -2, \ldots$, and […] equals […].
13. The method as claimed in claim 11, wherein the anti-NE model is
$$\log P(o_{L,1}^{L,x}, o_{C,1}^{C,y}, o_{R,1}^{R,z} \mid H_1) = \sum_{i=1}^{x} \log P_1(o_{L,i} \mid o_{L,i-N+1}^{L,i-1}) + \sum_{i=1}^{y} \log P_1(o_{C,i} \mid o_{C,i-N+1}^{C,i-1}) + \sum_{i=1}^{z} \log P_1(o_{R,i} \mid o_{R,i-N+1}^{R,i-1}).$$
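Claim 12 lets each N-gram history reach back across segment boundaries: the first symbols of the right context are conditioned on the end of the candidate, and the first symbols of the candidate on the end of the left context. Under that reading, the three sums of claim 13 equal a single N-gram pass over the concatenation of left context, candidate, and right context, as sketched below. The function name `anti_ne_score`, the callable `log_p1`, and clipping the history at the start of the left context are assumptions.

```python
from typing import Callable, Sequence, Tuple

LogProb = Callable[[str, Tuple[str, ...]], float]  # log P1(token | history)


def anti_ne_score(left: Sequence[str], cand: Sequence[str], right: Sequence[str],
                  n: int, log_p1: LogProb) -> float:
    """Sum of the three N-gram terms of claim 13, with histories that cross the
    left/candidate and candidate/right boundaries as described in claim 12."""
    full = list(left) + list(cand) + list(right)
    total = 0.0
    for i, tok in enumerate(full):
        # Up to N - 1 preceding symbols, possibly taken from the previous part;
        # clipped at the start of the left context (assumption).
        history = tuple(full[max(0, i - n + 1):i])
        total += log_p1(tok, history)
    return total
```

Comparing the recorded histories with those of the NE-model terms in claim 10 shows the design difference: the anti-NE model treats the candidate as ordinary running text, while the NE model scores it with the grammar.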
14. The method as claimed in claim 1, wherein the candidate $o_{C,1}^{C,y}$ is composed of random variables $o_{C,1}, o_{C,2}, \ldots, o_{C,y}$, where y is the number of characters of the candidate.
15. The method as claimed in claim 1, wherein the left context $o_{L,1}^{L,x}$ is composed of random variables $o_{L,1}, o_{L,2}, \ldots, o_{L,x}$, where x is the number of characters of the left context.
16. The method as claimed in claim 1, wherein the right context $o_{R,1}^{R,z}$ is composed of random variables $o_{R,1}, o_{R,2}, \ldots, o_{R,z}$, where z is the number of characters of the right context.
17. The method as claimed in claim 2, wherein each random variable is a Chinese character.
18. The method as claimed in claim 2, wherein each random variable is an English word.
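Claims 17 and 18 differ only in the unit of each random variable. A minimal sketch of turning text into such units, assuming whitespace is discarded for Chinese and used as the word separator for English (the function name `to_random_variables` and the unit labels are hypothetical):

```python
from typing import List


def to_random_variables(text: str, unit: str) -> List[str]:
    """Split text into the random variables of claims 17 and 18."""
    if unit == "chinese_char":
        # One variable per Chinese character; whitespace is dropped (assumption).
        return [ch for ch in text if not ch.isspace()]
    if unit == "english_word":
        # One variable per whitespace-separated English word (assumption).
        return text.split()
    raise ValueError(f"unknown unit: {unit!r}")
```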
Specification