Finding and disambiguating references to entities on web pages

US 8,751,498 B2
Filed: 02/01/2012
Issued: 06/10/2014
Est. Priority Date: 10/20/2006
Status: Active Grant

Abstract

A system and method for disambiguating references to entities in a document. In one embodiment, an iterative process is used to disambiguate references to entities in documents. An initial model is used to identify documents referring to an entity based on features contained in those documents. The occurrence of various features in these documents is measured. From the number of occurrences of features in these documents, a second model is constructed. The second model is used to identify documents referring to the entity based on features contained in the documents. The process can be repeated, iteratively identifying documents referring to the entity and improving subsequent models based on those identifications. Additional features of the entity can be extracted from documents identified as referring to the entity.
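The iterative process in the abstract can be sketched as a bootstrapping loop. This is a minimal sketch under stated assumptions, not the patented implementation: documents are reduced to sets of feature strings, a "model" is a set of rules (each rule a set of features that is jointly sufficient to flag a document), the first model treats any two seed features as sufficient, and the refinement step pairs each newly frequent feature with one seed feature. The function names (`identify`, `initial_model`, `refine_model`, `disambiguate`), the `min_fraction` threshold, and the toy "jaguar" corpus are all hypothetical; the patent text does not say how rules are derived from the occurrence counts.

```python
from collections import Counter
from itertools import combinations


def identify(docs, model):
    """Indices of documents that satisfy at least one rule of `model`.

    A rule is a frozenset of features; a document (a set of features)
    satisfies it when it contains every feature in the rule.
    """
    return {i for i, doc in enumerate(docs) if any(rule <= doc for rule in model)}


def initial_model(entity_features, k=2):
    """Hypothetical first model: any k seed features together are sufficient."""
    return {frozenset(c) for c in combinations(sorted(entity_features), k)}


def refine_model(docs, matched, entity_features, model, min_fraction=1.0):
    """Hypothetical refinement step.

    Count how often each feature occurs in the matched documents. A feature
    that is not already a seed feature and occurs in at least `min_fraction`
    of them is "learned"; each learned feature paired with any one seed
    feature becomes a new sufficient rule.
    """
    counts = Counter(f for i in matched for f in docs[i])
    learned = {
        f for f, c in counts.items()
        if f not in entity_features and c / len(matched) >= min_fraction
    }
    new_rules = {frozenset({f, e}) for f in learned for e in entity_features}
    return model | new_rules, learned


def disambiguate(docs, entity_features, max_iters=10):
    """Alternate identification and model refinement until nothing changes."""
    model = initial_model(entity_features)
    matched = identify(docs, model)
    learned = set()
    for _ in range(max_iters):
        if not matched:  # nothing to learn from
            break
        model, learned = refine_model(docs, matched, entity_features, model)
        new_matched = identify(docs, model)
        if new_matched == matched:  # fixed point reached
            break
        matched = new_matched
    return matched, learned


if __name__ == "__main__":
    # Toy corpus: "jaguar" is ambiguous between the car maker and the animal.
    docs = [
        {"jaguar", "coventry", "sedan", "xj6"},
        {"jaguar", "sedan", "xj6"},
        {"jaguar", "rainforest", "cat"},
        {"xj6", "coventry"},
        {"rainforest", "cat", "prey"},
    ]
    matched, learned = disambiguate(docs, {"jaguar", "coventry", "sedan"})
    print(sorted(matched), sorted(learned))  # [0, 1, 3] ['xj6']
```

The loop stops once a refined model finds no new documents. Document 3 is picked up only on the second pass, through the learned feature `xj6` paired with the seed feature `coventry`, while document 2 (the animal) never matches. The returned `learned` set stands in for the additional entity features the abstract says can be extracted.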

20 Claims

1. A method for identifying documents referring to an entity, the entity being associated with a first set of features, the method comprising:
- at a computer having one or more processors and memory storing programs for execution by the one or more processors:
  
  identifying a first set of documents based on a first model and the first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to the entity, and each document in the first set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the first model;
  
  determining a second model based on features included in one or more documents in the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
  
  identifying a second set of documents based on the second model, wherein each document in the second set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model, and wherein the second set of documents includes at least one document not included in the first set of documents; and
  
  extracting one or more facts from the second set of documents and associating the extracted facts with the entity.
  - 2. The method of claim 1, wherein the first set of features is stored as a set of facts in a fact repository in association with a second object that corresponds to the entity.
  - 3. The method of claim 1, wherein the first model is different than the second model.
  - 4. The method of claim 1, wherein determining the second model comprises determining a number of occurrences of the first set of features in the first set of documents.
  - 5. The method of claim 1, further comprising:
    - identifying a second set of features based on the second set of documents;
      
      determining if the second set of features are associated with the entity; and
      
      responsive to determining that the second set of features are associated with the entity, identifying a third set of documents based on a third model and the second set of features, each document of the third set of documents comprising a sufficient number of features in common with the second set of features to identify a document referring to the entity according to the third model.
  - 6. The method of claim 5, wherein the second set of features includes at least one feature not included in the first set of features.
  - 7. The method of claim 5, wherein the first set of features includes at least one feature not included in the second set of features.
  - 8. The method of claim 5, further comprising:
    - storing at least one feature of the second set of features as a fact in the fact repository.
  - 9. The method of claim 1, further comprising:
    - estimating importance of the entity based on the second set of documents.
  - 10. The method of claim 1, further comprising:
    - estimating importance of the entity based on a number of documents in the second set of documents.
  - 11. The method of claim 1, further comprising:
    - estimating importance of the entity based on an estimated importance of at least one of the documents in the second set of documents.
  - 12. The method of claim 1, further comprising:
    - associating at least one of the documents of the second set of documents with the entity.
  - 13. The method of claim 1, wherein identifying a second set of documents based on the second model and the first set of features comprises estimating a probability that a document of the second set of documents refers to the entity.
  - 14. The method of claim 1, wherein the first set of features comprises at least a first feature and a second feature, and wherein the second model specifies that an occurrence of the first feature is sufficient to identify a document referring to the entity.
  - 15. The method of claim 14, wherein the second model specifies that an occurrence of the second feature is not sufficient to identify a document referring to the entity.
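Dependent claims 10, 11 and 13 name the quantities (a per-document probability, a document count, per-document importance) but not how they are computed. The sketch below assumes a logistic score over per-feature weights for claim 13 and a plain sum of per-document importance scores for claim 11; the weights, the `bias` value, the document scores, and every function name are hypothetical illustrations rather than the method the patent describes.

```python
import math


def reference_probability(doc, feature_weights, bias=-2.0):
    """Claim 13 style (assumed form): logistic score that `doc` refers to the entity.

    `feature_weights` maps features to hypothetical weights; features
    without a weight contribute nothing.
    """
    z = bias + sum(feature_weights.get(f, 0.0) for f in doc)
    return 1.0 / (1.0 + math.exp(-z))


def importance_by_count(doc_ids):
    """Claim 10 style: importance is the number of documents referring to the entity."""
    return len(doc_ids)


def importance_by_doc_scores(doc_ids, doc_importance):
    """Claim 11 style: aggregate hypothetical per-document importance scores.

    Summing is an assumption; the claim does not fix the aggregation.
    """
    return sum(doc_importance[i] for i in doc_ids)


if __name__ == "__main__":
    weights = {"xj6": 2.0, "coventry": 1.5, "jaguar": 0.5}  # hypothetical
    print(reference_probability({"jaguar", "xj6"}, weights))  # sigmoid(0.5)
    print(importance_by_count({0, 1, 3}))
    print(importance_by_doc_scores({0, 1, 3}, {0: 0.5, 1: 0.25, 3: 0.25}))
```

Summing document scores lets a few highly important pages outweigh many minor ones, whereas the claim 10 count treats every page equally; which behavior is preferable is a design choice this sketch does not settle.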

16. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
- identifying a first set of documents based on a first model and a first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to an entity, and each document in the first set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to an entity according to the first model;
  
  determining a second model based on features included in one or more documents in the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
  
  identifying a second set of documents based on the second model, wherein each document in the second set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model, and wherein the second set of documents includes at least one document not included in the first set of documents; and
  
  extracting one or more facts from the second set of documents and associating the extracted facts with the entity.
  - 17. The non-transitory computer readable storage medium of claim 16, wherein the first set of features is stored as a set of facts in the fact repository in association with a second object that corresponds to the entity.
  - 18. The non-transitory computer readable storage medium of claim 16, wherein the first model is different than the second model.

19. A computer system comprising:
- a processor;
  
  memory; and
  
  one or more programs, wherein the one or more programs comprise instructions for:
  
  identifying a first set of documents based on a first model and a first set of features, wherein the first model includes a first set of rules specifying at least one combination of features from the first set of features that are sufficient for identifying a document referring to an entity, and each document in the first set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the first model;
  
  determining a second model based on features included in one or more documents in the first set of documents, wherein the second model includes a second set of rules specifying at least one combination of features from the first set of documents that are sufficient for identifying a document referring to the entity;
  
  identifying a second set of documents based on the second model, wherein each document in the second set of documents includes a sufficient number of features in common with the first set of features to identify a document referring to the entity according to the second model, and wherein the second set of documents includes at least one document not included in the first set of documents; and
  
  extracting one or more facts from the second set of documents and associating the extracted facts with the entity.
  - 20. The system of claim 19, wherein the first set of features is stored as a set of facts in the fact repository in association with a second object that corresponds to the entity.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Laroco, Leonardo A. Jr.; Jevtic, Nikola; Yakovenko, Nikolai V.; Reynar, Jeffrey
Primary Examiner(s): HASAN, SYED HAROON
Application Number: US13/364,244
Publication Number: US 20120203777A1
Time in Patent Office: 860 Days
Field of Search: None
US Class Current: 707/737
CPC Class Codes:
G06F 16/93   Document management systems
G06F 16/955   using information identifie...
G06N 20/00   Machine learning
G06N 5/04   Inference or reasoning models
