Automatic disambiguation based on a reference resource
First Claim
1. A computing system comprising:
- a processor; and
memory storing instructions which, when executed by the processor, configure the computing system to:
identify a source text having a plurality of words;
analyze the source text to identify a surface form in the source text, the surface form being an ambiguous orthographic representation of a proper name for an entity;
based on the identification of the surface form in the source text, access a surface form record representing the surface form, the surface form record identifying at least a first named entity and a second named entity that are different from one another, and are each associated with the surface form and denoted by a proper name, wherein the surface form record comprises a first pointer to a first named entity record that is separate from the surface form record, the first named entity record corresponding to the first named entity and including a first set of context indicators that represents a context of the first named entity, and wherein the surface form record comprises a second pointer to a second named entity record that is separate from the surface form record, the second named entity record corresponding to the second named entity and including a second set of context indicators that represents a context of the second named entity;
use the first pointer to retrieve the first set of context indicators from the first named entity record;
generate a first correlation measure based on a number of occurrences in the source text of the first set of context indicators;
use the second pointer to retrieve the second set of context indicators from the second named entity record;
generate a second correlation measure based on a number of occurrences in the source text of the second set of context indicators;
based on a comparison of the first and second correlation measures, select one of the first or second named entities as corresponding to the surface form in the source text; and
generate a representation of a user interface display that displays the source text and visually associates the surface form and the selected named entity.
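The record layout and scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The class names, the `ENTITIES` and `SURFACE_FORMS` tables, the sample entities, and the choice of a raw token count as the correlation measure are all assumptions made for illustration. "Pointers" are modeled here as dictionary keys into a separate entity table.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class NamedEntityRecord:
    """Hypothetical named entity record holding its context indicators."""
    name: str
    context_indicators: set[str]


@dataclass
class SurfaceFormRecord:
    """Hypothetical surface form record; entity_ids act as the 'pointers'."""
    surface_form: str
    entity_ids: list[str]


# Entity records are stored separately from the surface form records.
ENTITIES = {
    "columbia_shuttle": NamedEntityRecord(
        "Space Shuttle Columbia", {"nasa", "shuttle", "orbit", "astronauts"}
    ),
    "columbia_university": NamedEntityRecord(
        "Columbia University", {"university", "campus", "students", "professor"}
    ),
}

SURFACE_FORMS = {
    "columbia": SurfaceFormRecord(
        "Columbia", ["columbia_shuttle", "columbia_university"]
    ),
}


def correlation(source_text: str, indicators: set[str]) -> int:
    """Count tokens of the source text that are context indicators.

    Assumption: single-word indicators and a plain occurrence count.
    """
    tokens = re.findall(r"[a-z0-9]+", source_text.lower())
    return sum(1 for token in tokens if token in indicators)


def disambiguate(source_text: str, surface_form: str):
    """Follow each pointer, score each candidate, and pick the best one."""
    record = SURFACE_FORMS[surface_form.lower()]
    scores = {
        entity_id: correlation(source_text, ENTITIES[entity_id].context_indicators)
        for entity_id in record.entity_ids
    }
    # On a tie, max() keeps the first candidate listed in the record.
    best = max(record.entity_ids, key=lambda entity_id: scores[entity_id])
    return ENTITIES[best], scores


text = "NASA confirmed that Columbia reached orbit with seven astronauts aboard."
entity, scores = disambiguate(text, "Columbia")
print(entity.name, scores)
```

In this sketch the shuttle record matches three tokens of the sample sentence ("nasa", "orbit", "astronauts") and the university record matches none, so the shuttle is selected. A fuller version would likely need phrase-level indicators and some normalization of the counts.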
Abstract
A novel system for automatically indicating the specific identity of ambiguous named entities is provided. An automatic disambiguation data collection is created using a reference resource. Explicit named entities are catalogued from the reference resource, together with various abbreviated, alternative, and casual ways of referring to the named entities. Entity indicators, such as labels and context indicators associated with the named entities in the reference resource, are also catalogued. The automatic disambiguation collection can then be used as a basis for evaluating ambiguous references to named entities in text content provided in different applications. The content surrounding the ambiguous reference may be compared with the entity indicators to find a good match, indicating that the named entity associated with the matching entity indicators is the intended identity of the ambiguous reference, which can be automatically provided to a user.
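The cataloguing step described in the abstract might look roughly like the following Python sketch. The entry format (`title`, `aliases`, `labels`, `context`), the sample entries, and the function names are hypothetical, since the abstract does not specify how the reference resource is structured.

```python
from collections import defaultdict

# Hypothetical entries extracted from a reference resource.
REFERENCE_ENTRIES = [
    {
        "title": "Space Shuttle Columbia",
        "aliases": ["Columbia", "OV-102"],
        "labels": ["spacecraft"],
        "context": ["NASA", "orbit"],
    },
    {
        "title": "Columbia University",
        "aliases": ["Columbia", "Columbia U."],
        "labels": ["university"],
        "context": ["campus", "Manhattan"],
    },
]


def build_collection(entries):
    """Catalogue surface forms and entity indicators for each named entity."""
    surface_forms = defaultdict(list)  # surface form -> candidate entity titles
    entity_indicators = {}             # entity title -> labels and context terms
    for entry in entries:
        title = entry["title"]
        entity_indicators[title] = {
            "labels": {label.lower() for label in entry["labels"]},
            "context": {term.lower() for term in entry["context"]},
        }
        # The explicit name and every alternative form map to the same entity.
        for form in [title, *entry["aliases"]]:
            key = form.lower()
            if title not in surface_forms[key]:
                surface_forms[key].append(title)
    return dict(surface_forms), entity_indicators


def ambiguous_forms(surface_forms):
    """Return only the surface forms that refer to more than one entity."""
    return {form: names for form, names in surface_forms.items() if len(names) > 1}


surface_forms, entity_indicators = build_collection(REFERENCE_ENTRIES)
print(ambiguous_forms(surface_forms))
```

Only the surface forms returned by `ambiguous_forms` need disambiguation at lookup time. Forms with a single candidate, such as "ov-102" here, resolve directly.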
59 Citations
20 Claims
1. A computing system comprising:
- a processor; and
memory storing instructions which, when executed by the processor, configure the computing system to:
identify a source text having a plurality of words;
analyze the source text to identify a surface form in the source text, the surface form being an ambiguous orthographic representation of a proper name for an entity;
based on the identification of the surface form in the source text, access a surface form record representing the surface form, the surface form record identifying at least a first named entity and a second named entity that are different from one another, and are each associated with the surface form and denoted by a proper name, wherein the surface form record comprises a first pointer to a first named entity record that is separate from the surface form record, the first named entity record corresponding to the first named entity and including a first set of context indicators that represents a context of the first named entity, and wherein the surface form record comprises a second pointer to a second named entity record that is separate from the surface form record, the second named entity record corresponding to the second named entity and including a second set of context indicators that represents a context of the second named entity;
use the first pointer to retrieve the first set of context indicators from the first named entity record;
generate a first correlation measure based on a number of occurrences in the source text of the first set of context indicators;
use the second pointer to retrieve the second set of context indicators from the second named entity record;
generate a second correlation measure based on a number of occurrences in the source text of the second set of context indicators;
based on a comparison of the first and second correlation measures, select one of the first or second named entities as corresponding to the surface form in the source text; and
generate a representation of a user interface display that displays the source text and visually associates the surface form and the selected named entity.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A computer-implemented method comprising:
identifying a portion of text having a plurality of words;
generating a representation of a user interface display that displays the portion of text;
identifying a polysemic word in the portion of text;
based on the identification of the polysemic word, accessing a surface form record representing the polysemic word, the surface form record identifying at least a first named entity and a second named entity that are different from one another and each associated with the polysemic word, wherein the surface form record comprises a first pointer to a first named entity record that is separate from the surface form record, the first named entity record corresponding to the first named entity and including a first set of context indicators that represents a context of the first named entity, and wherein the surface form record comprises a second pointer to a second named entity record that is separate from the surface form record, the second named entity record corresponding to the second named entity and including a second set of context indicators that represents a context of the second named entity;
based on the first pointer, retrieving the first set of context indicators from the first named entity record;
generating a first correlation measure based on a number of occurrences in the source text of the first set of context indicators;
based on the second pointer, retrieving the second set of context indicators from the second named entity record;
generating a second correlation measure based on a number of occurrences in the source text of the second set of context indicators;
based on a comparison of the first and second correlation measures, selecting one of the first or second named entities as corresponding to the polysemic word; and
generating a representation of a user interface element in the user interface display that indicates that the selected named entity is associated with the portion of text.
Dependent claims: 15, 16, 17
18. A hardware computer-readable storage medium storing instructions which, when executed by a computer, perform a method comprising:
receiving source text having a plurality of words;
identifying, in the source text, a surface form that is an ambiguous orthographic representation of a proper name for an entity;
based on the identification of the surface form in the source text, accessing a surface form record representing the surface form, the surface form record identifying at least a first named entity and a second named entity that are different from one another, and are each associated with the surface form and denoted by a proper name, wherein the surface form record comprises a first pointer to a first named entity record that is separate from the surface form record, the first named entity record corresponding to the first named entity and including a first set of context indicators that represents a context of the first named entity, and wherein the surface form record comprises a second pointer to a second named entity record that is separate from the surface form record, the second named entity record corresponding to the second named entity and including a second set of context indicators that represents a context of the second named entity;
using the first pointer to retrieve the first set of context indicators from the first named entity record;
generating a first correlation measure based on a number of occurrences in the source text of the first set of context indicators;
using the second pointer to retrieve the second set of context indicators from the second named entity record;
generating a second correlation measure based on a number of occurrences in the source text of the second set of context indicators;
based on a comparison of the first and second correlation measures, selecting one of the first or second named entities as corresponding to the surface form in the source text; and
generating a representation of a user interface element that indicates that the selected named entity is associated with the corresponding portion of text.
Dependent claims: 19, 20
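The final step of the independent claims, generating a representation of a user interface element that associates the text with the selected entity, could be sketched as follows. The claims do not prescribe any markup. The HTML `span` with a `title` tooltip, the `entity` class name, and the `annotate` function are assumptions chosen purely for illustration.

```python
import html
import re


def annotate(source_text: str, surface_form: str, entity_name: str) -> str:
    """Return HTML in which each occurrence of the surface form is wrapped
    in a span whose tooltip names the selected entity (hypothetical markup)."""
    pattern = re.compile(r"\b" + re.escape(surface_form) + r"\b")
    parts = []
    last = 0
    for match in pattern.finditer(source_text):
        # Escape the untouched text between matches so it renders literally.
        parts.append(html.escape(source_text[last:match.start()]))
        parts.append(
            '<span class="entity" title="{}">{}</span>'.format(
                html.escape(entity_name), html.escape(match.group(0))
            )
        )
        last = match.end()
    parts.append(html.escape(source_text[last:]))
    return "".join(parts)


print(annotate("Columbia reached orbit.", "Columbia", "Space Shuttle Columbia"))
```

A display layer could render this string directly, so that hovering over the highlighted surface form shows the selected named entity.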
Specification