AUTOMATIC DISAMBIGUATION BASED ON A REFERENCE RESOURCE
Abstract
A novel system for automatically indicating the specific identity of ambiguous named entities is provided. An automatic disambiguation data collection is created using a reference resource. Explicit named entities are catalogued from the reference resource, together with various abbreviated, alternative, and casual ways of referring to the named entities. Entity indicators, such as labels and context indicators associated with the named entities in the reference resource, are also catalogued. The automatic disambiguation collection can then be used as a basis for evaluating ambiguous references to named entities in text content provided in different applications. The content surrounding the ambiguous reference may be compared with the entity indicators to find a good match, indicating that the named entity associated with the matching entity indicators is the intended identity of the ambiguous reference, which can be automatically provided to a user.
20 Claims
1. A computer-implemented disambiguation system comprising:
a surface forms database having a plurality of surface form records, each of the surface form records corresponding to a surface form that is an ambiguous orthographic representation of a common name for an entity, each of the surface form records having an indication of the corresponding surface form and indications of named entities that are associated with the surface form;
a named entities database having a plurality of named entity records, each of the named entity records corresponding to one of the named entities, each of the named entity records having labels representative of the named entity and contextual information associated with the named entity; and
a computer processor that identifies one of the surface forms in a text, the computer processor evaluating the identified one of the surface forms against the labels and the contextual information to determine which one of the named entities is most associated with the one of the surface forms.
Dependent claims: 2-16
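For illustration only, the two databases and the evaluation step recited in claim 1 might be modeled along the following lines. This is a minimal sketch, not the patented implementation: the class names, field names, and sample records are hypothetical, and counting shared words is an assumed stand-in for whatever evaluation the specification actually describes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedEntityRecord:
    """One record of the named entities database (field names are assumptions)."""
    name: str
    labels: frozenset    # labels representative of the entity
    context: frozenset   # contextual information associated with the entity


@dataclass(frozen=True)
class SurfaceFormRecord:
    """One record of the surface forms database (field names are assumptions)."""
    surface_form: str
    entity_names: tuple  # named entities associated with the surface form


# Hypothetical sample data, invented for this sketch.
NAMED_ENTITIES = {
    "Columbia (space shuttle)": NamedEntityRecord(
        "Columbia (space shuttle)",
        frozenset({"shuttle", "spacecraft"}),
        frozenset({"nasa", "orbit", "crew", "mission"}),
    ),
    "Columbia University": NamedEntityRecord(
        "Columbia University",
        frozenset({"university", "college"}),
        frozenset({"campus", "faculty", "degree", "students"}),
    ),
}
SURFACE_FORMS = {
    "columbia": SurfaceFormRecord(
        "columbia", ("Columbia (space shuttle)", "Columbia University")
    ),
}


def disambiguate(text):
    """Return (surface_form, entity_name) for the first known surface form, else None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for form, record in SURFACE_FORMS.items():
        if form not in words:
            continue

        def score(name):
            # Assumed evaluation: count words shared with labels and context.
            entity = NAMED_ENTITIES[name]
            return len(words & (entity.labels | entity.context))

        return form, max(record.entity_names, key=score)
    return None
```

Under these assumptions, a sentence that mentions "Columbia" alongside "crew" and "mission" resolves to the shuttle record, while one that mentions "degree" and "faculty" resolves to the university record.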
17. A computer-readable storage medium comprising one or more databases, the one or more databases comprising a collection of reference surface forms and a collection of named entities, wherein each of the reference surface forms is associated with one or more of the named entities that are different from the associated surface form, and each of the named entities is associated with one or more entity indicators extracted from one or more reference works in which the one or more entity indicators are associated with the named entities;
wherein the computer-readable medium further comprises computer-executable instructions that, when executed by a computing device having a processor, configure the computing device to disambiguate one or more polysemic words in a sample of text, comprising steps of:
identifying, with the processor, a plurality of overlapping polysemic words in the sample of text each having one of the reference surface forms;
evaluating, with the processor, a similarity measure between the entity indicators associated with the named entities associated with the reference surface forms, and entity indicators associated with each of the overlapping polysemic words in the sample of text;
identifying, with the processor, one of the named entities as having a relatively high similarity measure between its associated entity indicators and the entity indicators associated with one of the overlapping polysemic words; and
providing a disambiguation output, with the processor, indicating the one of the polysemic words to be associated with the identified named entity.
Dependent claims: 18-20
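The steps of claim 17 can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the method of the specification: the reference data is invented, entity indicators are assumed to be plain words, the indicators of the text are assumed to be all of its words, and cosine similarity over word counts is one possible choice of similarity measure.

```python
import math
import re
from collections import Counter

# Hypothetical reference data, invented for this sketch.
SURFACE_FORMS = {
    "new york": ["New York City", "New York (state)"],
    "new york times": ["The New York Times"],
}
ENTITY_INDICATORS = {
    "New York City": ["city", "manhattan", "borough", "mayor"],
    "New York (state)": ["state", "albany", "governor"],
    "The New York Times": ["newspaper", "editor", "article", "published"],
}


def cosine(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_surface_forms(tokens):
    """Return (start, end, phrase) for every match, including overlapping ones."""
    matches = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            phrase = " ".join(tokens[start:end])
            if phrase in SURFACE_FORMS:
                matches.append((start, end, phrase))
    return matches


def disambiguate(text):
    """Return (score, phrase, entity) for the best-scoring candidate, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    context = Counter(tokens)  # assumed indicators of the text: all of its words
    best = None
    for _start, _end, phrase in find_surface_forms(tokens):
        for entity in SURFACE_FORMS[phrase]:
            score = cosine(context, Counter(ENTITY_INDICATORS[entity]))
            if best is None or score > best[0]:
                best = (score, phrase, entity)
    return best
```

In a sentence such as "An article published in the New York Times quoted the editor.", both "new york" and "new york times" match as overlapping candidates, and the newspaper entity scores highest here because its assumed indicators share words with the surrounding text.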
Specification