Hybrid system for named entity resolution

US 8,374,844 B2
Filed: 08/29/2007
Issued: 02/12/2013
Est. Priority Date: 06/22/2007
Status: Active Grant

7 Assignments

0 Petitions

Abstract

A method for named entity resolution includes parsing an input text string to identify a context in which an identified named entity of the input text string is used. The identified context is compared with at least one stored context in which the named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective named entity. A named entity class is assigned to the identified named entity from the plurality of named entity classes, based on at least one of the identified context and the comparison.
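
The parse-and-compare loop in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the parser output is assumed to be a list of lemmatized (head, relation, dependent) edges, a context is encoded as a triple with the named entity's slot removed, and the class is taken by majority vote over exactly matching stored contexts. The function names, the relation labels such as "subj", and the class labels are hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical parser output: (head, relation, dependent) edges over lemmas.
Edge = tuple[str, str, str]
# A context is a triple with the named entity's slot removed:
# ("w1.R", head, rel) when the entity is the dependent (w2),
# ("R.w2", rel, dep) when the entity is the head (w1).
Context = tuple[str, str, str]


def identify_contexts(entity: str, edges: list[Edge]) -> list[Context]:
    """Return every context in which `entity` occurs in the parsed sentence."""
    contexts = []
    for head, rel, dep in edges:
        if dep == entity:
            contexts.append(("w1.R", head, rel))
        if head == entity:
            contexts.append(("R.w2", rel, dep))
    return contexts


def resolve(entity: str, edges: list[Edge],
            stored: dict[Context, str]) -> Optional[str]:
    """Majority class of the stored contexts matched by the entity's contexts."""
    votes = Counter(stored[c] for c in identify_contexts(entity, edges)
                    if c in stored)
    if not votes:
        return None  # no stored context matches
    return votes.most_common(1)[0][0]


# "France voted against the treaty", parsed and lemmatized (toy example).
edges = [("vote", "subj", "France"), ("vote", "obj", "treaty")]
stored = {("w1.R", "vote", "subj"): "place-for-people",   # hypothetical labels
          ("w1.R", "visit", "obj"): "literal"}
print(resolve("France", edges, stored))  # place-for-people
```

When no stored context matches, the sketch returns None; that is the situation the distance-based comparison in the claims is meant to cover.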

21 Claims

1. A method for named entity resolution comprising:
- providing a stored global distribution space comprising triples, each triple having the form w₁.R.w₂, where w₁ and w₂ are lexical units, and R is a syntactic relation between the lexical units w₁ and w₂, at least some of the lexical units being named entities;
  
  with a computing device, parsing an input text string to identify a context in which an identified named entity of the input text string is used, the context including a lexical unit which is in an identified syntactic relation with the identified named entity;
  
  comparing the identified context with a plurality of stored contexts, each stored context comprising a respective lexical unit which is in an identified syntactic relation with another named entity and in which the other named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective other named entity, the comparing comprising:
  
  from the stored global distribution space, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w₁.R or R.w₂ and a lexical unit w₂ or w₁, respectively, in which the stored context w₁.R or R.w₂ is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the subspace comprises another named entity;
  
  for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the distance being computed as a function of a difference between a frequency of occurrence, in a distribution space derived from a training corpus, of the identified named entity in the identified context and a frequency of occurrence, in the distribution space, of the other named entity in the stored context;
  
  computing a score for each of the plurality of named entity classes based on the computed distances; and
  
  assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the scores.
- Dependent claims (3-12):
  - 3. The method of claim 1, wherein the parsing includes applying a set of dependency rules to the input text string, each of the dependency rules specifying a syntactic relation between a first lexical unit based on a named entity and a second lexical unit, the rule being satisfied when the relation is present in the input text string.
  - 4. The method of claim 3, wherein the context is based on the syntactic relation and the named entity.
  - 5. The method of claim 1, wherein the class is selected from a set of classes including a literal class and at least one metonymic class.
  - 6. The method of claim 5, wherein the at least one metonymic class comprises at least one location-specific metonymic class and at least one organization-specific metonymic class.
  - 7. The method of claim 6, wherein the at least one organization-specific metonymic class comprises at least two organization-specific classes selected from the group consisting of:
    - a class in which an organization name stands for its members;
      
      a class in which the organization name refers to an event associated with the organization;
      
      a class in which the organization name refers to its products;
      
      a class in which the organization name stands for the facility that houses the organization;
      
      a class in which the organization name is used as an index indicating its value;
      
      a class in which the name is used as a string; and
      
      a class in which the organization name refers to a representation;
      
      the organization-specific classes optionally further including an additional class for all other types of organization-specific metonymy not otherwise covered.
  - 8. The method of claim 6, wherein the at least one location-specific metonymic class comprises at least two location-specific classes selected from the group consisting of:
    - a class in which a location name stands for persons or an organization associated with it;
      
      a class in which the location name stands for an event that happened there;
      
      a class in which the location name stands for a product developed there;
      
      a class in which the location name is used as a reference to another name;
      
      a class in which the location name refers to a representation;
      
      the location-specific classes optionally further including an additional class for all other types of location-specific metonymy not otherwise covered.
  - 9. The method of claim 1, wherein the parsing includes assigning a preliminary class to the named entity selected from a set of preliminary classes including a literal class, at least one metonymic class, and an unknown class.
  - 10. The method of claim 9, wherein when the preliminary class assigned is an unknown class, the assigning of the named entity class from the plurality of named entity classes is based on the comparison.
  - 11. The method of claim 1, further comprising annotating a document in which the text string occurs in accordance with the assigned class.
  - 12. A computer program product comprising a non-transitory recording medium encoding instructions which, when executed on a computer, perform the method of claim 1.
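
The comparing steps of claim 1 (sub-space selection, one distance per stored context, one score per class, then assignment) can be sketched as below. The claim requires only that the distance be "a function of a difference" between the two frequencies and does not say how scores are derived, so the absolute difference and the summed 1/(1 + distance) score are assumptions, as are the helper names contexts_of, sub_space and classify and the toy frequencies.

```python
from collections import Counter
from typing import Optional

# A triple (w1, R, w2); a context drops one slot:
# ("w1.R", w1, R) is the context of w2, ("R.w2", R, w2) is the context of w1.
Triple = tuple[str, str, str]
Context = tuple[str, str, str]


def contexts_of(unit: str, space: Counter) -> Counter:
    """Frequency of each context in which `unit` fills a slot of a triple."""
    found: Counter = Counter()
    for (w1, rel, w2), freq in space.items():
        if w2 == unit:
            found[("w1.R", w1, rel)] += freq
        if w1 == unit:
            found[("R.w2", rel, w2)] += freq
    return found


def sub_space(entity: str, space: Counter) -> Counter:
    """Triples whose w1.R or R.w2 context is also found with `entity`."""
    shared = set(contexts_of(entity, space))
    return Counter({(w1, rel, w2): freq
                    for (w1, rel, w2), freq in space.items()
                    if ("w1.R", w1, rel) in shared or ("R.w2", rel, w2) in shared})


def classify(entity: str, context: Context, space: Counter,
             labelled: dict[tuple[str, Context], str]) -> Optional[str]:
    """Score each class from distances to labelled stored contexts; pick the best."""
    sub = sub_space(entity, space)
    f_entity = contexts_of(entity, sub)[context]
    scores: Counter = Counter()
    for (other, stored_ctx), cls in labelled.items():
        f_other = contexts_of(other, sub)[stored_ctx]
        if f_other == 0:
            continue  # stored context is not in the sub-space
        distance = abs(f_entity - f_other)     # assumed: absolute difference
        scores[cls] += 1.0 / (1.0 + distance)  # assumed: closer counts more
    return max(scores, key=scores.get) if scores else None


# Toy global distribution space: triple -> corpus frequency.
space = Counter({("vote", "subj", "France"): 3, ("vote", "subj", "Germany"): 2,
                 ("visit", "obj", "France"): 4, ("visit", "obj", "Paris"): 5,
                 ("eat", "obj", "apple"): 7})
# Stored contexts whose named entity carries a known (made-up) class label.
labelled = {("Germany", ("w1.R", "vote", "subj")): "place-for-people",
            ("Paris", ("w1.R", "visit", "obj")): "literal"}
print(classify("France", ("w1.R", "vote", "subj"), space, labelled))  # place-for-people
```

The triple ("eat", "obj", "apple") shares no context with France and is left out of the sub-space. With the identified context ("w1.R", "visit", "obj") instead, France's frequency (4) is closer to Paris (5) than to Germany (2), so the same call returns "literal".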

2. A method for named entity resolution comprising:
- with a computing device, parsing an input text string to identify a context in which an identified named entity of the input text string is used;
  
  comparing the identified context with a plurality of stored contexts in which the named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective named entity; and
  
  assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the comparison, wherein the comparing comprises:
  
  from a stored global distribution space comprising triples, each triple comprising a lexical unit and a context in which the lexical unit is found in a training corpus, each triple having the form w₁.R.w₂, where w₁ and w₂ are lexical units, and R is a syntactic relation between the lexical units w₁ and w₂, at least some of the lexical units being named entities, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w₁.R or R.w₂ and a lexical unit w₂ or w₁, respectively, the stored context w₁.R or R.w₂ being one that is also found in a triple with the identified named entity in the global distribution space;
  
  for triples in the sub-space, determining whether the named entity in the stored context is associated with a class of named entity selected from the plurality of classes, and if so, assigning the class to the named entity;
  
  for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the computing of the distance being computed for stored contexts in the sub-space;
  
  computing a score for each of the plurality of named entity classes based on the computed distances; and
  
  assigning one of the named entity classes to the identified named entity, based on the computed scores.
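
Claim 2 adds an explicit labelling step: sub-space triples whose named entity already has a known class pass that class on, and distances are computed only for those stored contexts. A minimal sketch, assuming the sub-space has already been selected and is given as triple frequencies, that known classes come from a hypothetical entity-to-class dictionary, and that a class is scored by its nearest stored context (smallest frequency distance wins); the claim fixes none of these choices.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (w1, R, w2)


def label_sub_space(sub: dict[Triple, int], target: str,
                    known: dict[str, str]) -> list[tuple[str, str, int]]:
    """Pass known classes to sub-space triples whose other entity is labelled.

    Returns (other_entity, class, frequency) for each such triple; triples
    holding only the target entity or unlabelled units are skipped.
    """
    labelled = []
    for (w1, _rel, w2), freq in sub.items():
        for unit in (w1, w2):
            if unit != target and unit in known:
                labelled.append((unit, known[unit], freq))
    return labelled


def assign(f_target: int,
           labelled: list[tuple[str, str, int]]) -> Optional[str]:
    """Class whose nearest stored context has the smallest frequency distance."""
    best: dict[str, int] = {}
    for _other, cls, freq in labelled:
        distance = abs(f_target - freq)  # assumed distance function
        best[cls] = min(best.get(cls, distance), distance)
    return min(best, key=best.get) if best else None


# Toy sub-space for "France" (triple -> frequency); class labels are made up.
sub = {("vote", "subj", "France"): 3, ("vote", "subj", "Germany"): 2,
       ("visit", "obj", "France"): 4, ("visit", "obj", "Paris"): 5,
       ("visit", "obj", "IBM"): 9}
known = {"Germany": "place-for-people", "Paris": "literal"}  # IBM: no class
pairs = label_sub_space(sub, "France", known)
print(assign(3, pairs))  # place-for-people
```

Here France occurs 3 times in the identified context; Germany's stored context (2) is nearer than Paris's (5), and IBM contributes nothing because it has no known class.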

13. A hybrid system for named entity resolution comprising:
- memory which stores:
  
  a symbolic component for identifying a context in which an identified named entity of an input text string is used;
  
  a data structure which stores a subset of triples identified from a global distribution space comprising triples, each triple in the global distribution space having the form w₁.R.w₂ where w₁ and w₂ are lexical units, and R is a syntactic relation between the lexical units w₁ and w₂, at least some of the lexical units being named entities, each of the triples in the subset comprising a stored context w₁.R or R.w₂ and a respective lexical unit w₂ or w₁, in which the stored context w₁.R or R.w₂ is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the subset of triples comprises another named entity;
  
  a distribution component for computing a distance between the identified context in which the named entity is being used and another context in which the named entity is used in a known metonymic sense, the distance being computed as a function of a difference between a frequency of occurrence, in the distribution space, of the identified named entity in the identified context and a frequency of occurrence, in the distribution space, of another named entity in the stored context; and
  
  a processor which implements the symbolic component and distribution component;
  
  the system assigning a class to the identified named entity, based on at least one of the identified context and the computed distance.
- Dependent claims (14-20):
  - 14. The system of claim 13, wherein the symbolic component comprises a parser which applies a set of dependency rules to the input text string, each of the dependency rules specifying a syntactic relation between a first lexical unit based on a named entity and a second lexical unit, the rule being satisfied when the relation is present in the input text string.
  - 15. The system of claim 14, wherein the context is based on the relation and the named entity.
  - 16. The system of claim 13, wherein the class is selected from a set of classes including a literal class and at least one metonymic class.
  - 17. The system of claim 16, wherein the at least one metonymic class comprises at least one location-specific metonymic class and at least one organization-specific metonymic class.
  - 18. The system of claim 13, wherein the symbolic component assigns a class to the named entity selected from a set of classes including a literal class, at least one metonymic class, and an unknown class.
  - 19. The system of claim 18, wherein when the class assigned by the symbolic component is an unknown class, the class assigned by the system to the identified named entity is based on the computed distance.
  - 20. The system of claim 13, wherein the distribution component assigns a score to the named entity based on the computed distance between the context in which the named entity is being used and another context in which the named entity is used in a known metonymic sense.
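
Claims 13 and 18-19 combine the two components: the symbolic component's class is used unless it is "unknown", in which case the class comes from the distance computation. The sketch below shows only that control flow. The enum members are hypothetical names for a few of the literal and metonymic classes of claims 5-8, and the one-rule symbolic component and the stubbed distributional component are placeholders, not the patent's parser or distance model.

```python
from enum import Enum
from typing import Callable

Context = tuple[str, str, str]  # ("w1.R", w1, R) or ("R.w2", R, w2)


class NEClass(Enum):
    """Hypothetical labels for a subset of the classes in claims 5-8 and 16-17."""
    LITERAL = "literal"
    ORG_FOR_MEMBERS = "org-for-members"    # organization stands for its members
    PLACE_FOR_PEOPLE = "place-for-people"  # location stands for people there
    UNKNOWN = "unknown"                    # symbolic component cannot decide


Component = Callable[[str, Context], NEClass]


def hybrid_resolve(entity: str, context: Context,
                   symbolic: Component, distributional: Component) -> NEClass:
    """Use the symbolic class unless it is UNKNOWN, then fall back (claims 18-19)."""
    preliminary = symbolic(entity, context)
    if preliminary is not NEClass.UNKNOWN:
        return preliminary
    return distributional(entity, context)


# Toy components: a single hypothetical dependency rule, and a stub standing
# in for the distance-based distribution component.
RULES = {("w1.R", "vote", "subj"): NEClass.PLACE_FOR_PEOPLE}


def symbolic(entity: str, context: Context) -> NEClass:
    return RULES.get(context, NEClass.UNKNOWN)


def distributional(entity: str, context: Context) -> NEClass:
    return NEClass.LITERAL  # stub; a real component would compare contexts


print(hybrid_resolve("France", ("w1.R", "vote", "subj"), symbolic, distributional))
print(hybrid_resolve("France", ("w1.R", "visit", "obj"), symbolic, distributional))
```

Passing the components in as functions keeps the fallback logic independent of how either component is built.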

21. A method for document annotation comprising:
- providing a stored global distribution space comprising triples, each triple having the form w₁.R.w₂, where w₁ and w₂ are lexical units, and R is a syntactic relation between the lexical units w₁ and w₂, at least some of the lexical units being named entities;
  
  inputting, to a computer system, a document comprising at least one text string;
  
  with a processor of the computer system:
  
  parsing the text string to identify a context in which an identified named entity of the text string is used;
  
  comparing the identified context with at least one stored context in which the named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective named entity, the comparing comprising:
  
  from the stored global distribution space, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w₁.R or R.w₂ and a lexical unit w₂ or w₁, respectively, in which the stored context w₁.R or R.w₂ is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the subspace comprises another named entity;
  
  for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the computing of the distance being computed for stored contexts in the sub-space; and
  
  assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the computed distances; and
  
  annotating the document based on the assigned class.
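
The final annotating step of claim 21 can be illustrated with inline markup. This sketch assumes resolved mentions arrive as non-overlapping (start, end, class) character offsets; the <ne class="..."> tag is a hypothetical format, since the claim does not specify how the document is annotated.

```python
from html import escape


def annotate(text: str, mentions: list[tuple[int, int, str]]) -> str:
    """Wrap each (start, end, class) span in a hypothetical <ne class="..."> tag.

    Mentions are character offsets into `text` and must not overlap.
    """
    parts, cursor = [], 0
    for start, end, cls in sorted(mentions):
        parts.append(escape(text[cursor:start]))  # untagged text before mention
        parts.append(f'<ne class="{escape(cls)}">{escape(text[start:end])}</ne>')
        cursor = end
    parts.append(escape(text[cursor:]))  # trailing untagged text
    return "".join(parts)


print(annotate("France voted against the treaty.", [(0, 6, "place-for-people")]))
# <ne class="place-for-people">France</ne> voted against the treaty.
```

Escaping the surrounding text keeps the output well-formed when the document itself contains characters such as "&" or "<".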

Current Assignee
Xerox Corporation (Xerox Holdings Corp.)
Original Assignee
Xerox Corporation (Xerox Holdings Corp.)
Inventors
Brun, Caroline; Ehrmann, Maud; Jacquet, Guillaume
Primary Examiner(s)
GUERRA-ERAZO, EDGAR X

Application Number
US11/846,740
Publication Number
US 20080319978A1
Time in Patent Office
1,994 Days
Field of Search
704/1-10, 704/240, 704/257, 704/270, 707/736
US Class Current
704/9
CPC Class Codes
G06F 40/295 Named entity recognition
