Providing an explanation of a missing fact estimate

US 9,659,056 B1
Filed: 12/30/2013
Issued: 05/23/2017
Est. Priority Date: 12/30/2013
Status: Active Grant


Abstract

Systems and methods are disclosed for providing an explanation of an estimate for information missing from a data graph. An example method may include receiving a query that requests information for a first entity and receiving an estimate for the information, the estimate being based on a plurality of features of a joint distribution model. The method may include determining respective contribution scores for the plurality of features, selecting a quantity of the features with the highest contribution scores, generating, using the selected quantity of features, an explanation for the estimate, and providing the explanation and the estimate as part of a search result for the query.
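The selection and explanation steps summarized in the abstract can be illustrated with a minimal Python sketch. It assumes the contribution scores have already been computed; the function names `select_top_features` and `build_explanation`, the feature names, and the score values are all hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple


def select_top_features(contributions: Dict[str, float], quantity: int) -> List[Tuple[str, float]]:
    """Return the `quantity` features with the highest contribution scores."""
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:quantity]


def build_explanation(estimate: str, selected: List[Tuple[str, float]]) -> str:
    """Compose a short human-readable explanation from the selected features."""
    reasons = ", ".join(name for name, _ in selected)
    return f"Estimated value: {estimate} (based on: {reasons})"


# Hypothetical contribution scores for an estimated birth year.
scores = {"spouse birth year": 0.5, "graduation year": 0.25, "first film year": 0.125}
top = select_top_features(scores, quantity=2)
explanation = build_explanation("about 1960", top)
```

Here the explanation text is a plain string; how a search result would actually present it is not specified by the abstract.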


18 Claims

1. A method comprising:
- determining, using at least one processor, that information for an entity is absent from a data graph;
- determining, using the at least one processor, an estimate for the information based on a plurality of features from a joint distribution model related to the information;
- adding the estimate to the data graph so that the estimate is linked to the entity via a relationship indicating that the estimate is not verified;
- selecting a subset of the plurality of features, wherein the subset is a first subset and selecting the subset includes:
  - determining a contribution value for each of the plurality of features;
  - determining that a second subset of the features are related based on clustering the entity for which the information is missing with other entities having a similar feature;
  - aggregating the features in the second subset to generate an aggregate feature;
  - aggregating the contribution values for the features in the second subset to generate a new contribution score; and
  - selecting the second subset as the first subset;
- for each feature in the subset of the plurality of features:
  - adding the feature in the data graph, and linking the feature to the estimate;
- receiving, using the at least one processor, a query that requests the information for the entity;
- generating an explanation based on the subset of features linked to the estimate in the data graph, wherein the explanation and the estimate are based on the aggregate feature and the new contribution score; and
- providing the explanation and the estimate as part of a search result for the query.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein selecting the subset includes:
    - selecting a feature with a highest contribution value when the highest contribution value meets a threshold; and
    - selecting a quantity of features with highest contribution values when a combination of the contribution values meets the threshold.
  - 3. The method of claim 2, wherein determining the contribution value for a particular feature includes:
    - determining a first estimate for the information using the particular feature;
    - determining a second estimate for the information without using the particular feature; and
    - determining a difference between the first estimate and the second estimate.
  - 4. The method of claim 2, wherein the contribution value for a particular feature is related to a statistical descriptor associated with the particular feature.
  - 5. The method of claim 1, wherein the query is received prior to determining that the information is absent, and the determining is performed in response to receiving the query.
  - 6. The method of claim 1, further comprising:
    - analyzing search records to determine that the information has previously been requested for other entities; and
    - determining whether the information is absent for the entity in response to determining that the information has previously been requested.
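
As an illustration of claims 2 and 3, the following Python sketch computes a contribution value per feature as the difference between an estimate made with the feature and one made without it, then selects features until a threshold is met. Several things here are assumptions rather than claim language: `toy_estimate` (a simple mean) stands in for the joint distribution model, the difference is taken as an absolute value, the "combination" of contribution values is taken to be their sum, and all feature names and numbers are hypothetical.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def toy_estimate(features: Features) -> float:
    """Stand-in estimator: the mean of the feature values (not a joint distribution model)."""
    return sum(features.values()) / len(features)


def contribution_values(features: Features, estimate: Callable[[Features], float]) -> Dict[str, float]:
    """Claim 3 style: compare the estimate with and without each feature."""
    if len(features) < 2:
        raise ValueError("need at least two features to leave one out")
    full = estimate(features)
    values = {}
    for name in features:
        reduced = {k: v for k, v in features.items() if k != name}
        # Assumption: the "difference" is used as an absolute value.
        values[name] = abs(full - estimate(reduced))
    return values


def select_by_threshold(values: Dict[str, float], threshold: float) -> List[str]:
    """Claim 2 style: take the top feature if it alone meets the threshold,
    otherwise keep adding the next-highest features until their sum does.
    Assumption: if the threshold is never met, every feature is returned."""
    ranked = sorted(values, key=values.get, reverse=True)
    selected: List[str] = []
    total = 0.0
    for name in ranked:
        selected.append(name)
        total += values[name]
        if total >= threshold:
            break
    return selected


# Hypothetical features, each expressed as a candidate birth year.
features = {
    "spouse_birth_year": 1960.0,
    "sibling_birth_year": 1962.0,
    "school_start_year_minus_6": 1970.0,
}
values = contribution_values(features, toy_estimate)
single = select_by_threshold(values, threshold=2.5)
combined = select_by_threshold(values, threshold=4.0)
```

With these numbers, one feature alone meets the lower threshold, while the higher threshold needs the two highest-contributing features together.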

7. A computer program product tangibly embodied in a non-transitory storage medium, the computer program product including instructions that when executed cause a processor to perform operations including:
- determining, using at least one processor, that information for an entity is absent from a data graph;
- determining, using the at least one processor, an estimate for the information based on a plurality of features from a joint distribution model related to the information;
- adding the estimate to the data graph so that the estimate is linked to the entity via a relationship indicating that the estimate is not verified;
- selecting a subset of the plurality of features, wherein the subset is a first subset and selecting the subset includes:
  - determining a contribution value for each of the plurality of features;
  - determining that a second subset of the features are related based on clustering the entity for which the information is missing with other entities having a similar feature;
  - aggregating the features in the second subset to generate an aggregate feature;
  - aggregating the contribution values for the features in the second subset to generate a new contribution score; and
  - selecting the second subset as the first subset;
- for each feature in the subset of the plurality of features:
  - adding the feature in the data graph, and linking the feature to the estimate;
- receiving, using the at least one processor, a query that requests the information for the entity;
- generating an explanation based on the subset of features linked to the estimate in the data graph, wherein the explanation and the estimate are based on the aggregate feature and the new contribution score; and
- providing the explanation and the estimate as part of a search result for the query.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The computer program product of claim 7, wherein selecting the subset includes:
    - selecting a feature with a highest contribution value when the highest contribution value meets a threshold; and
    - selecting a quantity of features with highest contribution values when a combination of the contribution values meets the threshold.
  - 9. The computer program product of claim 8, wherein determining the contribution value for a particular feature includes:
    - determining a first estimate for the information using the particular feature;
    - determining a second estimate for the information without using the particular feature; and
    - determining a difference between the first estimate and the second estimate.
  - 10. The computer program product of claim 8, wherein the contribution value for a particular feature is related to a statistical descriptor associated with the particular feature.
  - 11. The computer program product of claim 7, wherein the query is received prior to determining that the information is absent, and the determining is performed in response to receiving the query.
  - 12. The computer program product of claim 7, further comprising:
    - analyzing search records to determine that the information has previously been requested for other entities; and
    - determining whether the information is absent for the entity in response to determining that the information has previously been requested.
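
The data graph steps recited in the independent claims (linking an estimate to the entity through a relationship that marks it as not verified, linking the selected features to the estimate, and answering a query from those links) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the triple-list `DataGraph` class, the `estimated_` relationship prefix, the `value` and `explained_by` relationship names, and the node identifiers are all hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


class DataGraph:
    """Toy data graph stored as (subject, relationship, object) triples."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, subject: str, relationship: str, obj: str) -> None:
        self.triples.append((subject, relationship, obj))

    def objects(self, subject: str, relationship: str) -> List[str]:
        return [o for s, r, o in self.triples if s == subject and r == relationship]


def answer(graph: DataGraph, entity: str, relationship: str) -> Optional[Dict[str, object]]:
    """Return a verified fact if present, else an estimate with its explanation."""
    verified = graph.objects(entity, relationship)
    if verified:
        return {"value": verified[0], "verified": False is True or True, "explanation": []}
    # Assumption: an "estimated_" prefix marks the relationship as not verified.
    estimate_nodes = graph.objects(entity, f"estimated_{relationship}")
    if not estimate_nodes:
        return None
    node = estimate_nodes[0]
    return {
        "value": graph.objects(node, "value")[0],
        "verified": False,
        "explanation": graph.objects(node, "explained_by"),
    }


graph = DataGraph()
# The estimate is linked to the entity by an unverified relationship.
graph.add("alice", "estimated_birth_year", "est:alice:birth_year")
graph.add("est:alice:birth_year", "value", "1960")
# Each selected feature is added to the graph and linked to the estimate.
graph.add("est:alice:birth_year", "explained_by", "spouse born in 1962")
graph.add("est:alice:birth_year", "explained_by", "started school in 1966")

result = answer(graph, "alice", "birth_year")
```

Keeping the estimate on its own node lets the explanation features attach to the estimate rather than to the entity, which mirrors the linking described in the claims.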

13. A system comprising:
- a processor; and
- a computer program product tangibly embodied in a non-transitory storage medium, the computer program product including instructions that when executed cause the processor to perform operations including:
  - determining, using at least one processor, that information for an entity is absent from a data graph;
  - determining, using the at least one processor, an estimate for the information based on a plurality of features from a joint distribution model related to the information;
  - adding the estimate to the data graph so that the estimate is linked to the entity via a relationship indicating that the estimate is not verified;
  - selecting a subset of the plurality of features, wherein the subset is a first subset and selecting the subset includes:
    - determining a contribution value for each of the plurality of features;
    - determining that a second subset of the features are related based on clustering the entity for which the information is missing with other entities having a similar feature;
    - aggregating the features in the second subset to generate an aggregate feature;
    - aggregating the contribution values for the features in the second subset to generate a new contribution score; and
    - selecting the second subset as the first subset;
  - for each feature in the subset of the plurality of features:
    - adding the feature in the data graph, and linking the feature to the estimate;
  - receiving, using the at least one processor, a query that requests the information for the entity;
  - generating an explanation based on the subset of features linked to the estimate in the data graph, wherein the explanation and the estimate are based on the aggregate feature and the new contribution score; and
  - providing the explanation and the estimate as part of a search result for the query.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The system of claim 13, wherein selecting the subset includes:
    - selecting a feature with a highest contribution value when the highest contribution value meets a threshold; and
    - selecting a quantity of features with highest contribution values when a combination of the contribution values meets the threshold.
  - 15. The system of claim 14, wherein determining the contribution value for a particular feature includes:
    - determining a first estimate for the information using the particular feature;
    - determining a second estimate for the information without using the particular feature; and
    - determining a difference between the first estimate and the second estimate.
  - 16. The system of claim 14, wherein the contribution value for a particular feature is related to a statistical descriptor associated with the particular feature.
  - 17. The system of claim 13, wherein the query is received prior to determining that the information is absent, and the determining is performed in response to receiving the query.
  - 18. The system of claim 13, further comprising:
    - analyzing search records to determine that the information has previously been requested for other entities; and
    - determining whether the information is absent for the entity in response to determining that the information has previously been requested.
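
The aggregation steps in the independent claims (combining related features into an aggregate feature and combining their contribution values into a new contribution score) might look like the Python sketch below. It assumes the clustering step has already produced a mapping from each related feature to a group label, and it assumes the contribution values are combined by summation; the claims do not specify either detail. The function name `aggregate_related`, the feature names, and the values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_related(
    contributions: Dict[str, float],
    group_of: Dict[str, str],
) -> Dict[str, Tuple[float, Tuple[str, ...]]]:
    """Merge related features into aggregate features.

    `group_of` maps a feature to the label of its aggregate feature; features
    without an entry stay on their own. Returns, per aggregate feature, the
    new contribution score and the member features it was built from.
    """
    members: Dict[str, List[str]] = defaultdict(list)
    for feature in contributions:
        members[group_of.get(feature, feature)].append(feature)
    return {
        label: (sum(contributions[f] for f in feats), tuple(feats))
        for label, feats in members.items()
    }


# Hypothetical contribution values and grouping (the grouping is assumed
# to be the output of a clustering step that is not shown here).
contributions = {
    "acted in film A (1985)": 0.25,
    "acted in film B (1987)": 0.25,
    "spouse born in 1962": 0.125,
}
group_of = {
    "acted in film A (1985)": "acted in films in the 1980s",
    "acted in film B (1987)": "acted in films in the 1980s",
}
aggregated = aggregate_related(contributions, group_of)
```

In this example the two film features individually tie with each other but together outweigh the remaining feature, which is the kind of effect aggregation is meant to surface in the explanation.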


Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Leviathan, Yaniv; Manor, Ran El; Tzur, Yoav; Segalis, Eyal; Farkash, Efrat; Matias, Yossi; Chechik, Gal
Primary Examiner(s): Andersen, Kris
Application Number: US14/143,904
Time in Patent Office: 1,240 Days
CPC Class Codes:
- G06F 16/245   Query processing
- G06F 16/2462   Approximate or statistical ...
- G06F 16/248   Presentation of query results
