Automatic disambiguation based on a reference resource
Abstract
A novel system for automatically indicating the specific identity of ambiguous named entities is provided. An automatic disambiguation data collection is created using a reference resource. Explicit named entities are catalogued from the reference resource, together with various abbreviated, alternative, and casual ways of referring to the named entities. Entity indicators, such as labels and context indicators associated with the named entities in the reference resource, are also catalogued. The automatic disambiguation collection can then be used as a basis for evaluating ambiguous references to named entities in text content provided in different applications. The content surrounding the ambiguous reference may be compared with the entity indicators to find a good match, indicating that the named entity associated with the matching entity indicators is the intended identity of the ambiguous reference, which can be automatically provided to a user.
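The comparison described in the abstract can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than the patented implementation: the toy dictionaries `SURFACE_FORMS` and `ENTITY_INDICATORS`, the helper names, and the use of simple word overlap as the "good match" criterion are all hypothetical choices made here for clarity.

```python
import re

# Hypothetical toy data. A real collection would be catalogued from a
# reference resource rather than written by hand.
SURFACE_FORMS = {
    "columbia": ["Columbia University", "Space Shuttle Columbia"],
}
ENTITY_INDICATORS = {
    "Columbia University": {"university", "campus", "students", "york"},
    "Space Shuttle Columbia": {"nasa", "shuttle", "orbit", "mission", "launch"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(surface_form, text):
    """Return the candidate entity whose indicators overlap the text most.

    Word overlap is an assumed stand-in for the correlation measure;
    returns None when the surface form is not in the collection.
    """
    candidates = SURFACE_FORMS.get(surface_form.lower(), [])
    if not candidates:
        return None
    context = tokenize(text)
    return max(candidates, key=lambda e: len(ENTITY_INDICATORS[e] & context))
```

For example, `disambiguate("Columbia", "NASA prepared Columbia for its next mission in orbit.")` selects the shuttle entity, because three of its indicators appear in the surrounding content and none of the university's do.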
20 Claims
1. A method comprising:
identifying a surface form of a named entity in a text, wherein the surface form is associated in a surface form reference with one or more reference named entities, and each of the reference named entities is associated in a named entity reference with one or more entity indicators;
evaluating one or more measures of correlation among one or more of the entity indicators, and the text;
identifying one of the reference named entities for which the associated entity indicators have a relatively high correlation to the text; and
providing a disambiguation output that indicates the identified reference named entity to be associated with the surface form of the named entity in the text.
19. A computer-readable medium comprising computer-executable instructions which, when executed by a computing device, enable the computing device to prepare and apply an automatic disambiguation system, comprising steps of:
extracting a collection of surface forms associated with named entities from an information resource;
extracting a collection of labels associated with the named entities from the information resource;
extracting a collection of context indicators associated with the named entities from the information resource;
when provided with a surface form in a text sample, evaluating a measure of correlation of entity indicators associated with the surface form in the text sample with the labels and the context indicators associated with the named entities associated with the surface form in the collection of surface forms; and
providing an output, based on the measure of correlation, indicating one of the named entities to be a disambiguation of the surface form in the text sample.
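The three extraction steps and the correlation step of claim 19 can be sketched as follows. This is only an illustration under stated assumptions: the `RESOURCE` records, their field names (`title`, `aliases`, `labels`, `links`), and the overlap count used as the correlation measure are hypothetical and are not taken from the claim.

```python
from collections import defaultdict

# Hypothetical miniature information resource. Each record mimics an
# encyclopedia-style entry; the field names are assumptions.
RESOURCE = [
    {"title": "Mercury (planet)", "aliases": ["Mercury"],
     "labels": ["Planets"], "links": ["Sun", "orbit", "Venus"]},
    {"title": "Mercury (element)", "aliases": ["Mercury", "quicksilver", "Hg"],
     "labels": ["Chemical elements"], "links": ["metal", "thermometer", "liquid"]},
]


def build_collections(resource):
    """Extract surface forms, labels, and context indicators per entity."""
    surface_forms = defaultdict(set)   # surface form -> candidate entities
    labels = {}                        # entity -> set of labels
    contexts = {}                      # entity -> set of context indicators
    for entry in resource:
        entity = entry["title"]
        for alias in entry["aliases"]:
            surface_forms[alias.lower()].add(entity)
        labels[entity] = {label.lower() for label in entry["labels"]}
        contexts[entity] = {link.lower() for link in entry["links"]}
    return surface_forms, labels, contexts


def correlate(surface_form, text_indicators, collections):
    """Score each candidate by how many text indicators it shares.

    Counting shared items is an assumed, simplified correlation measure.
    """
    surface_forms, labels, contexts = collections
    indicators = {t.lower() for t in text_indicators}
    return {
        entity: len(indicators & (labels[entity] | contexts[entity]))
        for entity in surface_forms.get(surface_form.lower(), ())
    }


def disambiguate(surface_form, text_indicators, collections):
    """Return the highest-scoring entity, or None if there are no candidates."""
    scores = correlate(surface_form, text_indicators, collections)
    return max(scores, key=scores.get) if scores else None
```

Keeping the labels and context indicators in separate collections mirrors the separate extraction steps in the claim, even though this sketch merges them when scoring.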
20. A computer-readable medium comprising one or more databases, the one or more databases comprising a collection of reference surface forms and a collection of named entities, wherein each of the reference surface forms is associated with one or more of the named entities, and each of the named entities is associated with one or more entity indicators extracted from one or more reference works in which the one or more entity indicators are associated with the named entities;
wherein the computer-readable medium further comprises computer-executable instructions that, when executed by a computing device, configure the computing device to disambiguate one or more polysemic words in a sample of text, comprising steps of:
identifying a respective polysemic word with one of the reference surface forms;
evaluating a similarity measure between the entity indicators associated with the named entities associated with the reference surface forms, and entity indicators associated with the respective polysemic word in the sample of text;
identifying one of the named entities as having a relatively high similarity measure between its associated entity indicators and the entity indicators associated with the respective polysemic word; and
providing a disambiguation output indicating the polysemic word to be associated with the identified named entity.
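Claim 20 does not name a particular similarity measure. As one possible reading, the sketch below uses cosine similarity between bag-of-words indicator vectors; that choice, the `REFERENCE` toy database, and the function names are assumptions introduced here for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


# Hypothetical database: reference surface form -> named entities, each with
# a vector of entity indicators assumed to come from reference works.
REFERENCE = {
    "jaguar": {
        "Jaguar (animal)": Counter(["cat", "jungle", "predator", "prey"]),
        "Jaguar Cars": Counter(["engine", "sedan", "car", "british"]),
    },
}


def disambiguate_word(word, sample_indicators):
    """Return (best entity, all scores) for a polysemic word, or None.

    sample_indicators are the indicators found near the word in the text.
    """
    candidates = REFERENCE.get(word.lower())
    if not candidates:
        return None
    sample = Counter(t.lower() for t in sample_indicators)
    scores = {e: cosine_similarity(vec, sample) for e, vec in candidates.items()}
    return max(scores, key=scores.get), scores
```

Calling `disambiguate_word("Jaguar", ["engine", "sedan", "dealer"])` picks the car maker, since two of the three sample indicators match its vector and none match the animal's.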
Specification