Measuring entity extraction complexity

  • US 8,140,567 B2
  • Filed: 04/13/2010
  • Issued: 03/20/2012
  • Est. Priority Date: 04/13/2010
  • Status: Active Grant
First Claim

1. A method implemented in a computing device, the method comprising:

  • receiving a named entity input;

    identifying a target sense for which the named entity input is to be extracted from a set of documents; and

    generating by the computing device, based at least in part on both the named entity input and the set of documents, an extraction complexity feature indicating how difficult it is deemed to be to identify the named entity input for the target sense in the set of documents by:

    performing a graph-based spreading activation technique to generate a language model including:

    building an undirected graph based on the named entity input and the set of documents, the undirected graph including multiple vertices and multiple edges,

    incrementing scores of selected ones of the multiple vertices by propagating a relevance of one or more of the multiple vertices through the undirected graph, and

    normalizing, after incrementing the scores of the selected ones of the multiple vertices, scores of the multiple vertices to obtain the language model;

    performing a clustering technique to refine the language model resulting in a refined language model that includes multiple clusters each including one or more documents; and

    determining an extraction complexity measurement for the named entity input by:

    determining a relatedness of each of the multiple clusters in the refined language model to the target sense;

    assigning, for each of the multiple clusters, a score to the cluster based on the relatedness of the cluster to the target sense;

    determining an average cluster score that is an average of the scores of the multiple clusters;

    identifying, as a value |C*|, a number of documents in clusters having a score greater than the average cluster score;

    identifying, as a value |D|, a number of documents in the set of documents; and

    determining the extraction complexity measurement as;
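
The graph and cluster-counting steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `decay` and `rounds` parameters, and the propagation rule (each active vertex passes a decayed share of its activation to its neighbors) are assumptions chosen for illustration, and the cluster scores are taken as given rather than computed from a relatedness measure. The sketch stops at the values |C*| and |D| because the claim text shown here ends before the final formula.

```python
from collections import defaultdict


def spreading_activation(edges, seeds, decay=0.5, rounds=1):
    """Illustrative spreading activation over an undirected graph.

    edges  -- iterable of (u, v) vertex pairs (hypothetical input format)
    seeds  -- dict mapping seed vertices to initial relevance
    decay  -- assumed fraction of activation passed to each neighbor
    rounds -- assumed number of propagation steps
    """
    # Build the undirected graph as an adjacency map.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    scores = {v: 0.0 for v in neighbors}
    scores.update({v: float(w) for v, w in seeds.items()})

    # Propagate relevance outward, incrementing the scores of reached vertices.
    frontier = {v: float(w) for v, w in seeds.items()}
    for _ in range(rounds):
        next_frontier = defaultdict(float)
        for vertex, activation in frontier.items():
            for neighbor in neighbors[vertex]:
                next_frontier[neighbor] += activation * decay
        for vertex, gain in next_frontier.items():
            scores[vertex] += gain
        frontier = next_frontier

    # Normalize after incrementing so the scores sum to 1.
    total = sum(scores.values())
    if total == 0:
        return scores
    return {v: s / total for v, s in scores.items()}


def count_above_average(cluster_docs, cluster_scores, documents):
    """Return (|C*|, |D|) for already-scored clusters.

    cluster_docs   -- dict mapping cluster id to its list of documents
    cluster_scores -- dict mapping cluster id to its relatedness-based score
    documents      -- the full set of documents
    """
    average = sum(cluster_scores.values()) / len(cluster_scores)
    # |C*|: documents in clusters scoring strictly above the average.
    c_star = sum(
        len(cluster_docs[cid])
        for cid, score in cluster_scores.items()
        if score > average
    )
    # |D|: documents in the whole set.
    return c_star, len(documents)
```

For example, calling `spreading_activation([("apple", "iphone"), ("apple", "fruit")], {"apple": 1.0})` spreads the seed's relevance to both neighbors before normalizing, and `count_above_average` returns the two counts that the final step of the claim combines.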
