Finding and disambiguating references to entities on web pages
Abstract
A system and method for disambiguating references to entities in a document. In one embodiment, an iterative process is used to disambiguate references to entities in documents. An initial model is used to identify documents referring to an entity based on features contained in those documents. The occurrence of various features in these documents is measured. From the number of occurrences of features in these documents, a second model is constructed. The second model is used to identify documents referring to the entity based on features contained in the documents. The process can be repeated, iteratively identifying documents referring to the entity and improving subsequent models based on those identifications. Additional features of the entity can be extracted from documents identified as referring to the entity.
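The iterative process described in the abstract can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the patented implementation: the function names (`matches`, `build_model`, `disambiguate`), the `min_fraction` threshold, the choice to form rules from pairs of frequent features, and the sample feature strings are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def matches(features, rules):
    """A document matches a model if it contains every feature of at least one rule."""
    return any(rule <= features for rule in rules)


def build_model(matched_docs, min_fraction=0.6, rule_size=2):
    """Measure feature occurrences in the matched documents and build new rules.

    Assumption: a feature is kept if it occurs in at least `min_fraction` of the
    matched documents, and every combination of `rule_size` kept features is a rule.
    """
    counts = Counter(f for feats in matched_docs for f in feats)
    threshold = min_fraction * len(matched_docs)
    frequent = sorted(f for f, n in counts.items() if n >= threshold)
    return [frozenset(c) for c in combinations(frequent, rule_size)]


def disambiguate(docs, initial_rules, rounds=3):
    """Alternate between identifying documents and rebuilding the model."""
    rules = list(initial_rules)
    matched = set()
    for _ in range(rounds):
        newly = {d for d, feats in docs.items() if matches(feats, rules)}
        if newly <= matched:  # no new documents found, so stop early
            break
        matched |= newly
        rules = list(initial_rules) + build_model([docs[d] for d in matched])
    return matched, rules


# Hypothetical documents, each reduced to a set of feature strings.
docs = {
    "a": {"name:Jane Doe", "born:1950", "job:physicist", "employer:Acme Labs"},
    "b": {"name:Jane Doe", "born:1950", "job:physicist"},
    "c": {"name:Jane Doe", "job:physicist", "employer:Acme Labs"},
    "d": {"name:Jane Doe", "job:painter"},
}
initial_rules = [frozenset({"name:Jane Doe", "born:1950"})]

matched, rules = disambiguate(docs, initial_rules)
```

In this toy data the initial model finds only documents `a` and `b`; the rebuilt model then also accepts `c`, which lacks the birth year, while `d` (a different Jane Doe) is never matched. Pairing frequent features is a deliberately crude rule generator and would admit overly loose rules on real data.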
20 Claims
1. A method for identifying documents referring to an entity, the entity being associated with a first set of features, the method comprising:
at a computer having one or more processors and memory storing programs for execution by the one or more processors;
identifying a first set of documents based on a first model and the first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to the entity, and each document in the first set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the first model;
determining a second model based on features included in one or more documents in the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
identifying a second set of documents based on the second model, wherein each document in the second set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model, and wherein the second set of documents includes at least one document not included in the first set of documents; and
extracting one or more facts from the second set of documents and associating the extracted facts with the entity.
16. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
identifying a first set of documents based on a first model and a first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to an entity, and each document in the first set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to an entity according to the first model;
determining a second model based on features included in one or more documents in the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
identifying a second set of documents based on the second model, wherein each document in the second set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model, and wherein the second set of documents includes at least one document not included in the first set of documents; and
extracting one or more facts from the second set of documents and associating the extracted facts with the entity.
19. A computer system comprising:
a processor;
memory; and
one or more programs, wherein the one or more programs comprise instructions for:
identifying a first set of documents based on a first model and a first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to an entity, and each document in the first set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the first model;
determining a second model based on features included in one or more documents in the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
identifying a second set of documents based on the second model, wherein each document in the second set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model, and wherein the second set of documents includes at least one document not included in the first set of documents; and
extracting one or more facts from the second set of documents and associating the extracted facts with the entity.
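The independent claims describe each model as a set of rules, where each rule names a combination of features that is sufficient on its own to identify a document as referring to the entity. One possible reading, offered only as a sketch, represents a rule as a set of features and a model as a list of such sets. The names `Rule`, `refers_to_entity`, and `identify`, the sample feature strings, the hand-written second model, and the treatment of "facts" as previously unknown features are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set

# A rule is one combination of features that is sufficient on its own.
Rule = FrozenSet[str]


def refers_to_entity(doc_features: Set[str], rules: List[Rule]) -> bool:
    """Return True if the document contains every feature of at least one rule."""
    return any(rule <= doc_features for rule in rules)


def identify(docs: Dict[str, Set[str]], rules: List[Rule]) -> Set[str]:
    """Return the ids of all documents that satisfy the model."""
    return {doc_id for doc_id, feats in docs.items() if refers_to_entity(feats, rules)}


# Hypothetical feature strings for a made-up entity.
entity_features = {"name:Jane Doe", "born:1950"}
docs = {
    "page1": {"name:Jane Doe", "born:1950", "job:physicist"},
    "page2": {"name:Jane Doe", "job:physicist", "employer:Acme Labs"},
    "page3": {"name:Jane Doe", "job:painter"},
}

first_model = [frozenset({"name:Jane Doe", "born:1950"})]
# Hand-written stand-in for a second model determined from the first set of documents.
second_model = first_model + [frozenset({"name:Jane Doe", "job:physicist"})]

first_set = identify(docs, first_model)
second_set = identify(docs, second_model)

# Assumption: "facts" are modeled here as features not yet associated with the entity.
new_facts = set().union(*(docs[d] for d in second_set)) - entity_features
```

Here the second set contains `page2`, which the first model could not reach, matching the claim requirement that the second set include at least one document not in the first set.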
Specification