Hybrid system for named entity resolution
Abstract
A method for named entity resolution includes parsing an input text string to identify a context in which an identified named entity of the input text string is used. The identified context is compared with at least one stored context in which the named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective named entity. A named entity class is assigned to the identified named entity from the plurality of named entity classes, based on at least one of the identified context and the comparison.
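The parsing step summarised above identifies the context in which a named entity is used. A minimal sketch follows. It assumes the parser output is already available as lemmatised (head, relation, dependent) dependencies. A context is written head.REL when the entity is the dependent and REL.dependent when it is the head. The Dependency type, the contexts_for helper, and the SUBJ/OBJ/MOD labels are illustrative and do not reproduce the output format of any particular parser.

```python
from typing import NamedTuple

class Dependency(NamedTuple):
    head: str  # governing lemma
    rel: str   # syntactic relation label (hypothetical label set)
    dep: str   # dependent lemma

def contexts_for(entity: str, deps: list[Dependency]) -> list[str]:
    """Collect the contexts in which `entity` occurs in one parsed sentence."""
    ctxs = []
    for d in deps:
        if d.dep == entity:
            ctxs.append(f"{d.head}.{d.rel}")   # entity fills the dependent slot
        if d.head == entity:
            ctxs.append(f"{d.rel}.{d.dep}")    # entity fills the head slot
    return ctxs
```

For "western Germany signed the treaty", the entity Germany would receive the contexts sign.SUBJ and MOD.western.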
21 Claims
1. A method for named entity resolution comprising:
providing a stored global distribution space comprising triples, each triple having the form w1.R.w2, where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities;
with a computing device, parsing an input text string to identify a context in which an identified named entity of the input text string is used, the context including a lexical unit which is in an identified syntactic relation with the identified named entity;
comparing the identified context with a plurality of stored contexts, each stored context comprising a respective lexical unit which is in an identified syntactic relation with another named entity and in which the other named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective other named entity, the comparing comprising:
from the stored global distribution space, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w1.R or R.w2 and a lexical unit w2 or w1, respectively, in which the stored context w1.R or R.w2 is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the sub-space comprises another named entity;
for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the distance being computed as a function of a difference between a frequency of occurrence, in a distribution space derived from a training corpus, of the identified named entity in the identified context and a frequency of occurrence, in the distribution space, of the other named entity in the stored context;
computing a score for each of the plurality of named entity classes based on the computed distances; and
assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the scores.
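The sub-space selection and frequency-based distance recited in claim 1 can be illustrated with a short Python sketch (3.9+). It is a sketch under stated assumptions, not the patented implementation. The tagged-tuple representation of a context, the helper names (split, sub_space, filler_freq, frequency_distance), storing the distribution space as triple counts, and using an absolute difference as the distance function are all hypothetical choices.

```python
from collections import Counter
from typing import Iterator, NamedTuple

class Triple(NamedTuple):
    w1: str   # governing lexical unit
    rel: str  # syntactic relation R
    w2: str   # dependent lexical unit

# A context is tagged with the slot it leaves open:
# ("w1.R", w1, R) is filled by w2, and ("R.w2", R, w2) is filled by w1.
Context = tuple[str, str, str]

def split(t: Triple) -> Iterator[tuple[Context, str]]:
    """Decompose w1.R.w2 into its two (context, filler) readings."""
    yield ("w1.R", t.w1, t.rel), t.w2
    yield ("R.w2", t.rel, t.w2), t.w1

def sub_space(space: dict[Triple, int], entity: str) -> list[tuple[Context, str]]:
    """Keep every (context, filler) pair whose context also occurs with `entity`."""
    shared = {ctx for t in space for ctx, filler in split(t) if filler == entity}
    return [(ctx, filler) for t in space for ctx, filler in split(t) if ctx in shared]

def filler_freq(space: dict[Triple, int]) -> Counter:
    """Count how often each lexical unit fills each context."""
    freq = Counter()
    for t, n in space.items():
        for ctx, filler in split(t):
            freq[(filler, ctx)] += n
    return freq

def frequency_distance(freq: Counter, entity: str, ctx: Context,
                       other: str, stored_ctx: Context) -> int:
    """Assumed distance: absolute difference of the two occurrence counts."""
    return abs(freq[(entity, ctx)] - freq[(other, stored_ctx)])
```

With a toy space in which vote.SUBJ.Germany is counted three times, vote.SUBJ.France once and visit.OBJ.Paris twice, the sub-space for Germany holds the pairs (vote.SUBJ, Germany) and (vote.SUBJ, France), and the distance between Germany and France in that shared context is 2.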
2. A method for named entity resolution comprising:
with a computing device, parsing an input text string to identify a context in which an identified named entity of the input text string is used;
comparing the identified context with a plurality of stored contexts in which the named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective named entity; and
assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the comparison, wherein the comparing comprises:
from a stored global distribution space comprising triples, each triple comprising a lexical unit and a context in which the lexical unit is found in a training corpus, each triple having the form w1.R.w2, where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w1.R or R.w2 and a lexical unit w2 or w1, respectively, the stored context w1.R or R.w2 being one that is also found in a triple with the identified named entity in the global distribution space;
for triples in the sub-space, determining whether the named entity in the stored context is associated with a class of named entity selected from the plurality of classes, and if so, assigning the class to the named entity;
for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the computing of the distance being computed for stored contexts in the sub-space;
computing a score for each of the plurality of named entity classes based on the computed distances; and
assigning one of the named entity classes to the identified named entity, based on the computed scores.
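The class-labelling and scoring steps of claim 2 can be sketched as follows, assuming the distances for the sub-space have already been computed. The lexicon ne_class, the class labels (literal, place-for-people), and the scoring rule are hypothetical. The rule sums 1 / (1 + d) per class so that closer contexts contribute more. The claim itself only requires that a score per class be derived from the distances.

```python
from collections import defaultdict
from typing import Iterable, Optional

def score_classes(neighbours: Iterable[tuple[str, float]],
                  ne_class: dict[str, str]) -> dict[str, float]:
    """Sum a similarity weight per class over the labelled sub-space entities.

    neighbours: (other named entity, distance to the identified context) pairs.
    ne_class:   hypothetical lexicon of entities whose class is already known.
    """
    scores = defaultdict(float)
    for other, d in neighbours:
        cls = ne_class.get(other)
        if cls is None:          # class unknown: this stored context is not used
            continue
        scores[cls] += 1.0 / (1.0 + d)   # assumed weighting: closer contexts count more
    return dict(scores)

def assign_class(scores: dict[str, float]) -> Optional[str]:
    """Return the highest-scoring class, or None if no labelled neighbour was found."""
    return max(scores, key=scores.get) if scores else None
```

Entities that the lexicon cannot classify are skipped. This mirrors the "if so" condition in the claim: only stored contexts whose named entity has a known class take part in the vote.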
13. A hybrid system for named entity resolution comprising:
memory which stores:
a symbolic component for identifying a context in which an identified named entity of an input text string is used;
a data structure which stores a subset of triples identified from a global distribution space comprising triples, each triple in the global distribution space having the form w1.R.w2 where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities, each of the triples in the subset comprising a stored context w1.R or R.w2 and a respective lexical unit w2 or w1, in which the stored context w1.R or R.w2 is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the subset of triples comprises another named entity; and
a distribution component for computing a distance between the identified context in which the named entity is being used and another context in which the named entity is used in a known metonymic sense, the distance being computed as a function of a difference between a frequency of occurrence, in the distribution space, of the identified named entity in the identified context and a frequency of occurrence, in the distribution space, of another named entity in the stored context;
a processor which implements the symbolic component and distribution component;
the system assigning a class to the identified named entity, based on at least one of the identified context and the computed distance.
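One way to picture the hybrid arrangement of claim 13 is a resolver that consults the symbolic component first and falls back to the distribution component when no rule fires. In this Python sketch, the rule table, the reference examples, and the nearest-reference decision are hypothetical. They stand in for a real parser's rules and for whatever combination logic the system actually uses.

```python
from dataclasses import dataclass
from typing import Optional

# A context is (relation, governing lemma), e.g. ("SUBJ", "vote") for "Germany voted".
Context = tuple[str, str]

@dataclass
class SymbolicComponent:
    """Hypothetical rule table mapping a syntactic context to a class."""
    rules: dict[Context, str]

    def classify(self, ctx: Context) -> Optional[str]:
        return self.rules.get(ctx)

@dataclass
class DistributionComponent:
    """Picks the class of the reference example whose frequency is closest."""
    freq: dict[tuple[str, Context], int]        # (entity, context) -> corpus count
    references: list[tuple[str, Context, str]]  # (entity, context, known class)

    def classify(self, entity: str, ctx: Context) -> Optional[str]:
        f = self.freq.get((entity, ctx), 0)
        best_cls, best_d = None, None
        for other, other_ctx, cls in self.references:
            # Distance as the difference of the two occurrence counts (assumed form).
            d = abs(f - self.freq.get((other, other_ctx), 0))
            if best_d is None or d < best_d:
                best_cls, best_d = cls, d
        return best_cls

@dataclass
class HybridResolver:
    symbolic: SymbolicComponent
    distribution: DistributionComponent

    def resolve(self, entity: str, ctx: Context) -> Optional[str]:
        # Trust a matching rule first; otherwise use the distributional guess.
        return self.symbolic.classify(ctx) or self.distribution.classify(entity, ctx)
```

Putting the rules first reflects the usual motivation for a hybrid design: precise symbolic evidence decides when it is available, and corpus statistics cover the remaining contexts.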
21. A method for document annotation comprising:
providing a stored global distribution space comprising triples, each triple having the form w1.R.w2, where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities;
inputting, to a computer system, a document comprising at least one text string;
with a processor of the computer system:
parsing the text string to identify a context in which an identified named entity of the text string is used;
comparing the identified context with at least one stored context in which the named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective named entity, the comparing comprising:
from the stored global distribution space, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w1.R or R.w2 and a lexical unit w2 or w1, respectively, in which the stored context w1.R or R.w2 is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the sub-space comprises another named entity;
for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the computing of the distance being computed for stored contexts in the sub-space; and
assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the computed distances; and
annotating the document based on the assigned class.
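The final annotating step of claim 21 can be illustrated by wrapping each resolved mention in an inline tag. The Mention type, the character offsets, and the <ne class="..."> markup are hypothetical. A real system might instead emit standoff annotations.

```python
from typing import NamedTuple

class Mention(NamedTuple):
    start: int  # character offset where the named entity begins
    end: int    # character offset just past its last character
    cls: str    # class assigned by the resolution step

def annotate(text: str, mentions: list[Mention]) -> str:
    """Wrap each resolved mention in an inline tag carrying its class.

    Assumes mentions do not overlap; edits are applied right to left so that
    earlier offsets stay valid after each insertion.
    """
    out = text
    for m in sorted(mentions, key=lambda m: m.start, reverse=True):
        out = out[:m.start] + f'<ne class="{m.cls}">' + out[m.start:m.end] + "</ne>" + out[m.end:]
    return out
```

For example, annotating "Germany voted against the plan." with a single mention over characters 0 to 7 labelled place-for-people yields `<ne class="place-for-people">Germany</ne> voted against the plan.`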
Specification