Finding and disambiguating references to entities on web pages
First Claim
1. A method for identifying documents referring to an entity, the entity being associated with a first set of features, the method comprising:
identifying a first set of documents based on a first model and the first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to the entity, each document of the first set of documents comprising a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the first model;
determining a second model based on the features of the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
identifying a second set of documents based on the second model and the first set of features, each document of the second set of documents comprising a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model;
identifying a second set of features based on the second set of documents;
determining if the second set of features are associated with the entity; and
responsive to determining that the second set of features are associated with the entity, identifying a third set of documents based on a third model and the second set of features, the third set of documents each comprising a sufficient number of features in common with the second set of features to identify a document referring to the entity according to the third model.
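The rule-based model in the claim can be illustrated with a minimal sketch. It assumes that features are plain strings, that a rule is a set of features, and that a document satisfies a model when it contains every feature of at least one rule. The names Rule, refers_to_entity and identify_documents, and the sample data, are hypothetical and do not come from the patent.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Assumption: a rule is one combination of features that is sufficient on its own.
Rule = FrozenSet[str]


def refers_to_entity(doc_features: Set[str], rules: Iterable[Rule]) -> bool:
    """Return True if the document contains every feature of at least one rule."""
    return any(rule <= doc_features for rule in rules)


def identify_documents(docs: Dict[str, Set[str]], rules: List[Rule]) -> List[str]:
    """Return the sorted ids of documents that satisfy the model."""
    return sorted(d for d, feats in docs.items() if refers_to_entity(feats, rules))


# Hypothetical documents, each reduced to the set of features found in it.
docs = {
    "a": {"name:michael jordan", "team:chicago bulls", "born:1963"},
    "b": {"name:michael jordan", "field:machine learning"},
    "c": {"name:michael jordan", "born:1963"},
}

# Hypothetical first model: the name together with either the team or the birth year.
first_model = [
    frozenset({"name:michael jordan", "team:chicago bulls"}),
    frozenset({"name:michael jordan", "born:1963"}),
]

first_set = identify_documents(docs, first_model)
```

In this sketch, document "b" shares the name but matches no complete rule, so it is left out of the first set of documents.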
2 Assignments
0 Petitions
Abstract
A system and method for disambiguating references to entities in a document. In one embodiment, an iterative process is used to disambiguate references to entities in documents. An initial model is used to identify documents referring to an entity based on features contained in those documents. The occurrence of various features in these documents is measured. From the number of occurrences of features in these documents, a second model is constructed. The second model is used to identify documents referring to the entity based on features contained in the documents. The process can be repeated, iteratively identifying documents referring to the entity and improving subsequent models based on those identifications. Additional features of the entity can be extracted from documents identified as referring to the entity.
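The iterative process described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a learned rule pairs a fixed anchor feature (for example the entity's name) with any feature that occurs in at least min_count of the currently identified documents. The function names, the anchor convention, the threshold and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Rule = FrozenSet[str]


def matches(doc_features: Set[str], rules: Iterable[Rule]) -> bool:
    """A document matches if it contains every feature of at least one rule."""
    return any(rule <= doc_features for rule in rules)


def build_model(matched: List[Set[str]], anchor: str, min_count: int) -> List[Rule]:
    """Measure feature occurrences in the matched documents and derive new rules."""
    counts = Counter(f for feats in matched for f in feats if f != anchor)
    return [frozenset({anchor, f}) for f, n in counts.items() if n >= min_count]


def iterate(docs: Dict[str, Set[str]], initial_rules: List[Rule], anchor: str,
            min_count: int = 2, rounds: int = 5) -> Tuple[Set[str], List[Rule]]:
    """Alternate between identifying documents and rebuilding the model."""
    rules = list(initial_rules)
    matched: Set[str] = set()
    for _ in range(rounds):
        new_matched = {d for d, feats in docs.items() if matches(feats, rules)}
        if new_matched == matched:  # no new documents identified, so stop early
            break
        matched = new_matched
        learned = build_model([docs[d] for d in matched], anchor, min_count)
        rules = list(set(rules) | set(learned))
    return matched, rules


# Hypothetical documents reduced to feature sets.
docs = {
    "d1": {"name", "team", "college"},
    "d2": {"name", "team", "college", "born"},
    "d3": {"name", "college"},
    "d4": {"name", "professor"},
}

matched, rules = iterate(docs, [frozenset({"name", "team"})], anchor="name")
```

With this data the initial model identifies d1 and d2. The "college" feature occurs in both, so the rebuilt model also identifies d3, while d4 never matches a complete rule.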
348 Citations
28 Claims
1. A method for identifying documents referring to an entity, the entity being associated with a first set of features, the method comprising:
identifying a first set of documents based on a first model and the first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to the entity, each document of the first set of documents comprising a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the first model;
determining a second model based on the features of the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
identifying a second set of documents based on the second model and the first set of features, each document of the second set of documents comprising a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model;
identifying a second set of features based on the second set of documents;
determining if the second set of features are associated with the entity; and
responsive to determining that the second set of features are associated with the entity, identifying a third set of documents based on a third model and the second set of features, the third set of documents each comprising a sufficient number of features in common with the second set of features to identify a document referring to the entity according to the third model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions to:
identify a first set of documents based on a first model and a first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to an entity, each document of the first set of documents comprising a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the first model;
determine a second model based on the features of the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
identify a second set of documents based on the second model and the first set of features, each document of the second set of documents comprising a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model;
identify a second set of features based on the second set of documents;
determine if the second set of features are associated with the entity; and
responsive to determining that the second set of features are associated with the entity, identifying a third set of documents based on a third model and the second set of features, each of the third set of documents comprising a sufficient number of features in common with the second set of features to identify a document referring to the entity according to the third model.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
Specification