
Hybrid system for named entity resolution

  • US 8,374,844 B2
  • Filed: 08/29/2007
  • Issued: 02/12/2013
  • Est. Priority Date: 06/22/2007
  • Status: Active Grant
First Claim

1. A method for named entity resolution comprising:

  • providing a stored global distribution space comprising triples, each triple having the form w1.R.w2, where w1 and w2 are lexical units, and R is a syntactic relation between the lexical units w1 and w2, at least some of the lexical units being named entities;

    with a computing device, parsing an input text string to identify a context in which an identified named entity of the input text string is used, the context including a lexical unit which is in an identified syntactic relation with the identified named entity;

    comparing the identified context with a plurality of stored contexts, each stored context comprising a respective lexical unit which is in an identified syntactic relation with another named entity and in which the other named entity in the stored context is associated with a class of named entity, the named entity class being selected from a plurality of classes, at least one of the plurality of classes corresponding to a metonymic use of a respective other named entity, the comparing comprising:

    from the stored global distribution space, identifying a sub-space comprising a subset of the triples, each of the triples in the subset comprising a stored context w1.R or R.w2 and a lexical unit w2 or w1, respectively, in which the stored context w1.R or R.w2 is one that is also found in a triple with the identified named entity in the global distribution space and the lexical unit for at least some of the triples in the subspace comprises another named entity;

    for each of the plurality of stored contexts, computing a distance between the identified context and the stored context, the distance being computed as a function of a difference between a frequency of occurrence, in a distribution space derived from a training corpus, of the identified named entity in the identified context and a frequency of occurrence, in the distribution space, of the other named entity in the stored context;

    computing a score for each of the plurality of named entity classes based on the computed distances; and

    assigning a named entity class from the plurality of named entity classes to the identified named entity based on at least one of the identified context and the scores.
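The triple store and sub-space selection recited in the claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: triples are plain `(w1, R, w2)` tuples, the global distribution space is assumed to be a frequency count of (context, lexical unit) pairs, and the helper names `contexts_of`, `build_space` and `sub_space` are hypothetical. For simplicity the sub-space keeps only pairs whose lexical unit is a named entity, whereas the claim requires this of only some of the triples.

```python
from collections import Counter


def contexts_of(triple):
    """Split a triple w1.R.w2 into its two (context, lexical unit) pairs:
    the context "w1.R" paired with w2, and the context "R.w2" paired with w1."""
    w1, rel, w2 = triple
    return [(f"{w1}.{rel}", w2), (f"{rel}.{w2}", w1)]


def build_space(triples):
    """Global distribution space (assumed representation): how often each
    lexical unit occurs in each context across the stored triples."""
    space = Counter()
    for triple in triples:
        for pair in contexts_of(triple):
            space[pair] += 1
    return space


def sub_space(space, entity, named_entities):
    """Pairs whose context also occurs with `entity` in the global space.
    Simplification (assumption): only other named entities are kept."""
    entity_contexts = {ctx for (ctx, unit) in space if unit == entity}
    return Counter({
        (ctx, unit): freq
        for (ctx, unit), freq in space.items()
        if ctx in entity_contexts and unit != entity and unit in named_entities
    })


# Hypothetical data: "France" and "Germany" share the contexts visit.obj and
# vote.subj; "committee" is not a named entity, and "Japan" shares no context.
triples = [
    ("visit", "obj", "France"),
    ("visit", "obj", "Germany"),
    ("vote", "subj", "France"),
    ("vote", "subj", "Germany"),
    ("vote", "subj", "committee"),
    ("sign", "subj", "Japan"),
]
space = build_space(triples)
print(sub_space(space, "France", {"France", "Germany", "Japan"}))
```

Here the sub-space for `France` contains only `Germany`, in the two contexts `visit.obj` and `vote.subj` that both entities share.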
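The distance, scoring and assignment steps can likewise be sketched. The claim states only that the distance is "a function of a difference" between two frequencies and that class scores are "based on the computed distances". The absolute difference, the `1 / (1 + d)` similarity summed per class, and the names `distance`, `class_scores` and `assign_class` are therefore assumptions made for illustration. The class labels in the example (`literal`, `place-for-people`) and the frequencies are also hypothetical, standing in for a literal and a metonymic reading of a place name.

```python
from collections import defaultdict


def distance(freq, entity, context, other_entity, stored_context):
    """Assumed distance: absolute difference between the frequency of the
    identified entity in its context and that of the other entity in the
    stored context. Pairs absent from `freq` count as frequency 0."""
    return abs(freq.get((context, entity), 0)
               - freq.get((stored_context, other_entity), 0))


def class_scores(freq, entity, context, stored):
    """Score each class by summing 1 / (1 + d) over its stored contexts,
    so that closer stored contexts contribute more (an assumed rule)."""
    scores = defaultdict(float)
    for stored_context, other_entity, ne_class in stored:
        d = distance(freq, entity, context, other_entity, stored_context)
        scores[ne_class] += 1.0 / (1.0 + d)
    return dict(scores)


def assign_class(freq, entity, context, stored):
    """Assign the class with the highest score."""
    scores = class_scores(freq, entity, context, stored)
    return max(scores, key=scores.get)


# Hypothetical training-corpus frequencies, keyed by (context, named entity).
freq = {
    ("vote.subj", "France"): 8,
    ("vote.subj", "Germany"): 7,
    ("visit.obj", "Italy"): 3,
    ("live.in", "Spain"): 2,
}
# Hypothetical annotated stored contexts: (context, other named entity, class).
stored = [
    ("vote.subj", "Germany", "place-for-people"),
    ("visit.obj", "Italy", "literal"),
    ("live.in", "Spain", "literal"),
]
print(assign_class(freq, "France", "vote.subj", stored))  # prints place-for-people
```

Summing similarities favours classes with many stored examples; taking the maximum similarity per class (a nearest-neighbour rule) would be an equally plausible reading of the claim.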

  • 6 Assignments