Measuring entity extraction complexity
First Claim
1. A method implemented in a device, the method comprising:
receiving a named entity input;
identifying a target sense for which the named entity input is to be extracted from a set of documents; and
generating, based at least in part on both the named entity input and the set of documents, an extraction complexity feature that indicates how difficult it is deemed to be to identify the named entity input for the target sense in the set of documents, the generating including building an undirected graph based on the named entity input and the set of documents and looking for contexts in the undirected graph that are related to the target sense, the undirected graph including multiple vertices and multiple edges.
Abstract
A named entity input is received and a target sense for which the named entity input is to be extracted from a set of documents is identified. An extraction complexity feature is generated based on the named entity input, the target sense, and the set of documents. The extraction complexity feature indicates how difficult or complex it is deemed to be to identify the named entity input for the target sense in the set of documents.
20 Citations
26 Claims
1. A method implemented in a device, the method comprising:
receiving a named entity input;
identifying a target sense for which the named entity input is to be extracted from a set of documents; and
generating, based at least in part on both the named entity input and the set of documents, an extraction complexity feature that indicates how difficult it is deemed to be to identify the named entity input for the target sense in the set of documents, the generating including building an undirected graph based on the named entity input and the set of documents and looking for contexts in the undirected graph that are related to the target sense, the undirected graph including multiple vertices and multiple edges.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. One or more computer storage media having stored thereon multiple instructions that, when executed by one or more processors of a computing device, cause the one or more processors to:
receive a named entity input from a source;
identify a target sense for the named entity input, wherein the target sense is a particular desired usage of the named entity input in a document set; and
generate, based at least in part on both the named entity input and the document set, an extraction complexity measurement that indicates a complexity of identifying the named entity input in the document set for the target sense, wherein to generate the extraction complexity measurement is to build an undirected graph based on the named entity input and the document set, the undirected graph including multiple vertices and multiple edges, the multiple vertices comprising co-occurring contexts surrounding occurrences of the named entity input in the set of documents.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
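The graph-building step recited in claim 11 can be illustrated with a minimal Python sketch. It is not the patented implementation. The whitespace tokenizer, the single-token entity, the `window` parameter, the function name `build_context_graph`, and the use of co-occurrence counts as edge weights are all assumptions made for illustration; the claim itself only requires an undirected graph whose vertices comprise co-occurring contexts surrounding occurrences of the named entity input.

```python
from itertools import combinations


def build_context_graph(entity, documents, window=3):
    """Build an undirected graph as {vertex: {neighbor: weight}}.

    Vertices are context words found within `window` tokens of an
    occurrence of `entity`; an edge links two context words that share
    such a window. Tokenization and weighting are illustrative assumptions.
    """
    graph = {}
    target = entity.lower()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            contexts = sorted({t for t in tokens[lo:hi] if t != target})
            for c in contexts:
                graph.setdefault(c, {})  # keep vertices that have no edges
            for a, b in combinations(contexts, 2):
                # Undirected: record the edge in both directions.
                graph[a][b] = graph[a].get(b, 0) + 1
                graph[b][a] = graph[b].get(a, 0) + 1
    return graph


# Hypothetical document set for the ambiguous entity "Up" (movie sense).
docs = ["I watched Up at the cinema", "the movie Up was great"]
graph = build_context_graph("Up", docs, window=2)
```

In this toy run, "cinema" lies outside the two-token window and so does not become a vertex, while "the" appears in both windows and connects the two documents' contexts.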
20. One or more computer storage media having stored thereon multiple instructions that, when executed by one or more processors of a computing device, cause the one or more processors to:
receive a named entity input from a source;
identify a target sense for the named entity input, wherein the target sense is a particular desired usage of the named entity input in a set of documents; and
generate, based at least in part on both the named entity input and the set of documents, an extraction complexity measurement that indicates how difficult it is deemed to be to identify the named entity input in the set of documents for the target sense, wherein to generate the extraction complexity measurement is to:
perform a graph-based spreading activation technique to generate a language model by:
building an undirected graph based on the named entity input and the set of documents, the undirected graph including multiple vertices and multiple edges,
incrementing scores of selected ones of the multiple vertices by propagating a relevance of one or more of the multiple vertices through the undirected graph, and
normalizing, after incrementing the scores of the selected ones of the multiple vertices, scores of the multiple vertices to obtain the language model;
perform a graph-based clustering technique to refine the language model; and
determine the extraction complexity measurement based on the refined language model.
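The spreading activation steps of claim 20 (propagate relevance through the graph, increment vertex scores, then normalize) can be sketched as follows. This is a hedged illustration, not the claimed method: the `seeds`, `decay`, and `max_hops` parameters, the proportional split of activation across edge weights, and the hand-written example graph are assumptions. The clustering refinement and the final complexity measurement are left out because the claim does not specify how they are computed.

```python
def spreading_activation(graph, seeds, decay=0.5, max_hops=2):
    """Return normalized vertex scores (a unigram-style language model).

    `graph` is an undirected graph as {vertex: {neighbor: weight}}.
    `seeds` are vertices assumed to be relevant to the target sense.
    """
    scores = {v: 0.0 for v in graph}
    frontier = {s: 1.0 for s in seeds if s in graph}
    for s, act in frontier.items():
        scores[s] += act
    for _ in range(max_hops):
        nxt = {}
        for v, act in frontier.items():
            total = sum(graph[v].values())
            if total == 0:
                continue  # isolated vertex: nothing to propagate
            for n, w in graph[v].items():
                # Pass on a decayed share proportional to the edge weight.
                nxt[n] = nxt.get(n, 0.0) + act * decay * w / total
        for n, act in nxt.items():
            scores[n] = scores.get(n, 0.0) + act  # increment vertex scores
        frontier = nxt
    z = sum(scores.values())
    # Normalize after incrementing so the scores sum to 1.
    return {v: s / z for v, s in scores.items()} if z else scores


# Hypothetical context graph; "pixar" is assumed to signal the movie sense.
toy_graph = {
    "pixar": {"movie": 2, "balloon": 1},
    "movie": {"pixar": 2},
    "balloon": {"pixar": 1},
    "stairs": {},
}
model = spreading_activation(toy_graph, seeds=["pixar"])
```

Contexts that activation never reaches, such as "stairs" here, keep a score of zero, so the resulting distribution concentrates on vertices connected to the seed.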
21. A computing device comprising:
one or more processors; and
one or more computer storage media having stored thereon multiple instructions that, responsive to execution by the one or more processors, cause the one or more processors to perform acts comprising:
receiving a named entity input;
identifying a target sense for which the named entity input is to be extracted from a set of documents; and
generating, based at least in part on both the named entity input and the set of documents, an extraction complexity feature that indicates how difficult it is deemed to be to identify the named entity input for the target sense in the set of documents, the generating including building an undirected graph based on the named entity input and the set of documents and looking for contexts in the undirected graph that are related to the target sense, the undirected graph including multiple vertices and multiple edges.
Dependent claims: 22, 23
24. A computing device comprising:
one or more processors; and
one or more computer storage media having stored thereon multiple instructions that, responsive to execution by the one or more processors, cause the one or more processors to:
receive a named entity input from a source;
identify a target sense for the named entity input, wherein the target sense is a particular desired usage of the named entity input in a document set; and
generate, based at least in part on both the named entity input and the document set, an extraction complexity measurement that indicates a complexity of identifying the named entity input in the document set for the target sense, wherein to generate the extraction complexity measurement is to build an undirected graph based on the named entity input and the document set, the undirected graph including multiple vertices and multiple edges, the multiple vertices comprising co-occurring contexts surrounding occurrences of the named entity input in the set of documents.
Dependent claims: 25, 26
Specification