Finding and disambiguating references to entities on web pages
First Claim
1. A method for identifying texts referring to an entity, the method comprising:
at a computer having one or more processors and memory storing programs for execution by the one or more processors;
storing an object representing the entity;
storing a plurality of facts, wherein at least one of the plurality of facts is associated with the object;
determining a first set of features from the stored plurality of facts that are associated with the object, wherein the first set of features are sufficient for identifying a document referring to the entity;
determining a second set of features from the stored plurality of facts that are associated with the object, wherein the second set of features are sufficient for identifying a document referring to the entity, and the second set of features are distinct from the first set of features;
identifying a first text from one of the stored plurality of facts associated with the first set of features;
identifying a second text from one of the stored plurality of facts associated with the second set of features;
identifying a representative document as associated with the entity, wherein the first text and the second text are within the representative document; and
associating a fact selected from the representative document with the object.
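The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Fact`, `EntityObject`, `feature_sets`, `is_representative`, `extract_facts` and `update_object` are hypothetical, the rule for splitting facts into two distinct feature sets (name facts versus all other facts) is an assumption, and the "attribute: value" line extractor is a toy stand-in for real fact extraction.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    attribute: str
    value: str


@dataclass
class EntityObject:
    """Stored object representing an entity, with its associated facts."""
    name: str
    facts: list = field(default_factory=list)


def feature_sets(obj):
    # Assumed split: name facts form the first set, all other facts the second.
    first = [f for f in obj.facts if f.attribute == "name"]
    second = [f for f in obj.facts if f.attribute != "name"]
    return first, second


def is_representative(document, first, second):
    # The document qualifies only if a text from each feature set occurs in it.
    text = document.lower()
    has_first = any(f.value.lower() in text for f in first)
    has_second = any(f.value.lower() in text for f in second)
    return has_first and has_second


def extract_facts(document):
    # Toy extractor: treats every "attribute: value" line as a candidate fact.
    found = []
    for line in document.splitlines():
        if ":" in line:
            attribute, value = line.split(":", 1)
            found.append(Fact(attribute.strip().lower(), value.strip()))
    return found


def update_object(obj, document):
    """Associate new facts from the document with the object if it matches."""
    first, second = feature_sets(obj)
    if not is_representative(document, first, second):
        return False
    for fact in extract_facts(document):
        if fact not in obj.facts:
            obj.facts.append(fact)
    return True


ada = EntityObject("ada", [Fact("name", "Ada Lovelace"), Fact("birth year", "1815")])
matching = "Ada Lovelace was born in 1815.\nOccupation: mathematician"
unrelated = "Ada, Oklahoma is a city.\nPopulation: 16000"
matched = update_object(ada, matching)
skipped = update_object(ada, unrelated)
```

Requiring a text from both feature sets before accepting a document is what keeps the unrelated page about a city named Ada from contributing facts to the object.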
Abstract
A system and method for disambiguating references to entities in a document. In one embodiment, an iterative process is used to disambiguate references to entities in documents. An initial model is used to identify documents referring to an entity based on features contained in those documents. The occurrence of various features in these documents is measured. From the number of occurrences of features in these documents, a second model is constructed. The second model is used to identify documents referring to the entity based on features contained in the documents. The process can be repeated, iteratively identifying documents referring to the entity and improving subsequent models based on those identifications. Additional features of the entity can be extracted from documents identified as referring to the entity.
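The iterative process described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the method disclosed in the specification: a model is assumed to be a mapping from word features to weights, a document is selected when its summed weight reaches a threshold, and the next model weights each feature by how much more often it occurs in the selected documents than in the corpus as a whole. The function names, the threshold and the sample corpus are all hypothetical.

```python
from collections import Counter


def tokens(doc):
    # Features are assumed to be the distinct lowercase words of a document.
    return set(doc.lower().split())


def score(model, doc):
    # Sum the weights of the model features that occur in the document.
    return sum(model.get(t, 0.0) for t in tokens(doc))


def build_model(selected, corpus):
    # Assumed weighting: document frequency among selected documents
    # minus document frequency in the whole corpus.
    in_selected = Counter(t for d in selected for t in tokens(d))
    in_corpus = Counter(t for d in corpus for t in tokens(d))
    return {
        t: in_selected[t] / len(selected) - in_corpus[t] / len(corpus)
        for t in in_selected
    }


def refine(initial_model, corpus, threshold=0.9, max_rounds=5):
    """Alternate between selecting documents and rebuilding the model."""
    model, selected = dict(initial_model), []
    for _ in range(max_rounds):
        current = [d for d in corpus if score(model, d) >= threshold]
        if not current or current == selected:
            break  # nothing selected, or the selection has stabilized
        selected = current
        model = build_model(selected, corpus)
    return model, selected


corpus = [
    "michael jackson singer thriller album",
    "michael jackson pop singer moonwalk",
    "michael jackson football coach season",
    "weather report for the season",
]
initial = {"jackson": 0.5, "thriller": 1.0}
model, selected = refine(initial, corpus)
```

In this toy corpus the initial model selects only the first document. The rebuilt model then gives weight to "singer", which brings in the second document while the football coach page stays below the threshold.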
327 Citations
23 Claims
1. A method for identifying texts referring to an entity, as recited in full under First Claim.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A system for identifying texts referring to an entity, the system comprising one or more instructions for:
storing an object representing the entity;
storing a plurality of facts, wherein at least one of the plurality of facts is associated with the object;
determining a first set of features from the stored plurality of facts that are associated with the object, wherein the first set of features are sufficient for identifying a document referring to the entity;
determining a second set of features from the stored plurality of facts that are associated with the object, wherein the second set of features are sufficient for identifying a document referring to the entity, and the second set of features are distinct from the first set of features;
identifying a first text from one of the stored plurality of facts associated with the first set of features;
identifying a second text from one of the stored plurality of facts associated with the second set of features;
identifying a representative document as associated with the entity, wherein the first text and the second text are within the representative document; and
associating a fact selected from the representative document with the object.
Dependent claims: 18, 19, 20
21. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
storing an object representing the entity;
storing a plurality of facts, wherein at least one of the plurality of facts is associated with the object;
determining a first set of features from the stored plurality of facts that are associated with the object, wherein the first set of features are sufficient for identifying a document referring to the entity;
determining a second set of features from the stored plurality of facts that are associated with the object, wherein the second set of features are sufficient for identifying a document referring to the entity, and the second set of features are distinct from the first set of features;
identifying a first text from one of the stored plurality of facts associated with the first set of features;
identifying a second text from one of the stored plurality of facts associated with the second set of features;
identifying a representative document as associated with the entity, wherein the first text and the second text are within the representative document; and
associating a fact selected from the representative document with the object.
Dependent claims: 22, 23
Specification