FINDING AND DISAMBIGUATING REFERENCES TO ENTITIES ON WEB PAGES
First Claim
1. A method for identifying texts referring to an entity, the entity being associated with a first set of features, the method comprising:
at a computer having one or more processors and memory storing programs for execution by the one or more processors;
identifying a first set of text as associated with the entity in accordance with a first set of features that are sufficient for identifying a document referring to the entity;
identifying a second set of text as associated with the entity in accordance with a second set of features that are sufficient for identifying a document referring to the entity, wherein the second set of features is distinct from the first set of features;
identifying a representative feature associated with the entity, in accordance with the first set of features and the second set of features;
wherein the first set of text and the second set of text are within a same document.
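The steps recited in the claim can be sketched in a few lines of Python. This is only one possible reading, not the patented implementation: substring matching stands in for feature detection, a "set of text" is treated as a passage of the document, and the rule for choosing the representative feature (the feature that appears in the most matched passages) is an assumption made for illustration. The names `find_texts`, `representative_feature`, and the sample passages are hypothetical.

```python
def find_texts(passages, feature_set):
    """Return the passages that contain every feature in feature_set.

    Assumption: a feature set is 'sufficient' when all of its features
    occur together in a passage (case-insensitive substring match).
    """
    wanted = [f.lower() for f in feature_set]
    return [p for p in passages if all(f in p.lower() for f in wanted)]


def representative_feature(texts, features):
    """Pick the feature found in the most texts (assumed selection rule).

    Ties are resolved in favour of the feature listed first.
    """
    def coverage(feature):
        return sum(feature.lower() in t.lower() for t in texts)
    return max(features, key=coverage)


# Hypothetical passages taken from a single document.
document = [
    "Singer Michael Jackson released Thriller in 1982.",
    "The singer popularized the moonwalk.",
    "Ticket prices rose last year.",
]

first_features = ["Michael Jackson", "Thriller"]   # first set of features
second_features = ["singer", "moonwalk"]           # distinct second set

first_texts = find_texts(document, first_features)
second_texts = find_texts(document, second_features)
rep = representative_feature(first_texts + second_texts,
                             first_features + second_features)
```

In this toy input each feature set picks out a different passage of the same document, and "singer" is chosen because it is the only feature present in both matched passages.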
Abstract
A system and method for disambiguating references to entities in a document. In one embodiment, an iterative process is used to disambiguate references to entities in documents. An initial model is used to identify documents referring to an entity based on features contained in those documents. The occurrence of various features in these documents is measured. From the number of occurrences of features in these documents, a second model is constructed. The second model is used to identify documents referring to the entity based on features contained in the documents. The process can be repeated, iteratively identifying documents referring to the entity and improving subsequent models based on those identifications. Additional features of the entity can be extracted from documents identified as referring to the entity.
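The iterative process in the abstract can be illustrated with a minimal sketch. It assumes that each document is represented as a set of feature strings, that a model maps features to weights, and that a document "refers to the entity" when the matched share of the model's weight reaches a threshold. The scoring rule, the threshold value, and the names `score` and `refine` are assumptions for illustration and do not come from the patent.

```python
from collections import Counter


def score(doc, model):
    """Share of the model's total weight carried by features present in doc."""
    total = sum(model.values())
    if total == 0:
        return 0.0
    return sum(w for f, w in model.items() if f in doc) / total


def refine(docs, initial_model, threshold=0.4, max_rounds=10):
    """Alternate between selecting documents and rebuilding the model."""
    model = dict(initial_model)
    selected = set()
    for _ in range(max_rounds):
        # Identify documents that the current model says refer to the entity.
        hits = {i for i, d in enumerate(docs) if score(d, model) >= threshold}
        if not hits or hits == selected:
            break  # nothing found, or the selection has stabilised
        selected = hits
        # Measure feature occurrences in the selected documents and
        # build the next model from those counts.
        counts = Counter(f for i in selected for f in docs[i])
        model = {f: c / len(selected) for f, c in counts.items()}
    return selected, model


# Hypothetical documents, each reduced to a set of features.
docs = [
    {"michael jackson", "thriller", "singer"},
    {"michael jackson", "singer", "moonwalk"},
    {"michael jackson", "beer", "critic"},      # a different person
    {"singer", "moonwalk", "neverland"},        # the name is never mentioned
]

selected, model = refine(docs, {"thriller": 1.0})
print(sorted(selected))  # [0, 1, 3]
```

Starting from the single feature "thriller", the loop first finds document 0, then picks up document 1 through the learned features, and finally document 3, which never mentions the name. Document 2 shares only the name and stays below the threshold. The final model also contains features such as "neverland" that were not in the initial model, which corresponds to the extraction of additional features mentioned in the abstract.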
20 Claims
1. A method for identifying texts referring to an entity, the entity being associated with a first set of features, the method comprising:
at a computer having one or more processors and memory storing programs for execution by the one or more processors;
identifying a first set of text as associated with the entity in accordance with a first set of features that are sufficient for identifying a document referring to the entity;
identifying a second set of text as associated with the entity in accordance with a second set of features that are sufficient for identifying a document referring to the entity, wherein the second set of features is distinct from the first set of features;
identifying a representative feature associated with the entity, in accordance with the first set of features and the second set of features;
wherein the first set of text and the second set of text are within a same document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A system for identifying texts referring to an entity, the entity being associated with a first set of features, the system comprising one or more instructions for:
identifying a first set of text as associated with the entity in accordance with a first set of features that are sufficient for identifying a document referring to the entity;
identifying a second set of text as associated with the entity in accordance with a second set of features that are sufficient for identifying a document referring to the entity, wherein the second set of features is distinct from the first set of features;
identifying a representative feature associated with the entity, in accordance with the first set of features and the second set of features;
wherein the first set of text and the second set of text are within a same document.
Dependent claims: 17, 18
19. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
identifying a first set of text as associated with the entity in accordance with a first set of features that are sufficient for identifying a document referring to the entity;
identifying a second set of text as associated with the entity in accordance with a second set of features that are sufficient for identifying a document referring to the entity, wherein the second set of features is distinct from the first set of features;
identifying a representative feature associated with the entity, in accordance with the first set of features and the second set of features;
wherein the first set of text and the second set of text are within a same document.
Dependent claims: 20
Specification