Exploring large textual data sets via interactive aggregation

US 8,732,160 B2
Filed: 11/11/2008
Issued: 05/20/2014
Est. Priority Date: 11/11/2008
Status: Expired due to Fees

Abstract

A method and a system are provided for exploring a large textual data set via interactive aggregation. In one example, the method includes receiving the large textual data set and an original query template, building an index for the query template, wherein the building the index comprises ordering the index a particular way to optimize query time, receiving one or more bindings for the original query template, computing an answer to the original query template using the index and the one or more bindings, and anticipating one or more future queries that a user may submit and that are related to the original query template.
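
The flow in the abstract (build an index for a template, bind it, compute an answer, anticipate related queries) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the toy `PAGES` data, the function names `build_index`, `answer`, and `anticipate`, and the choice of a per-group count as the aggregate.

```python
from collections import Counter

# Hypothetical toy data: one attribute dictionary per web page.
PAGES = [
    {"lang": "en", "topic": "sports"},
    {"lang": "en", "topic": "news"},
    {"lang": "fr", "topic": "news"},
    {"lang": "en", "topic": "news"},
]

def build_index(pages, template_attrs):
    """Map (attribute, value) to the set of page ids that carry it."""
    index = {}
    for page_id, page in enumerate(pages):
        for attr in template_attrs:
            if attr in page:
                index.setdefault((attr, page[attr]), set()).add(page_id)
    return index

def answer(index, pages, bindings, group_by):
    """Count the pages that satisfy every binding, grouped by `group_by`."""
    matching = set(range(len(pages)))
    for attr, value in bindings.items():
        matching &= index.get((attr, value), set())
    return Counter(pages[i][group_by] for i in matching if group_by in pages[i])

def anticipate(bindings, group_by, current_answer):
    """Assumed drill-downs: the original bindings plus one additional term."""
    return [{**bindings, group_by: value} for value in sorted(current_answer)]

INDEX = build_index(PAGES, ["lang", "topic"])
RESULT = answer(INDEX, PAGES, {"lang": "en"}, "topic")
FOLLOW_UPS = anticipate({"lang": "en"}, "topic", RESULT)
```

Each entry in `FOLLOW_UPS` keeps the original restriction and adds one more, which is one plausible reading of a "future query" related to the original template.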

6 Citations

21 Claims

1. A method comprising:
- receiving, via at least one computer, a data set and a query template, the data set comprising a plurality of attributes for a plurality of web pages;
  
  organizing, via the at least one computer, the query template based on a number of seeks of the data set needed to fetch data associated with the query template, the organizing comprising identifying a plurality of dense attributes from the plurality of attributes, each dense attribute having a selectivity exceeding a maximum query selectivity threshold;
  
  building, via the at least one computer, an index for the query template after the query template is organized, the index is organized into a plurality of sections based on a number of seeks of the data set needed to fetch data associated with the query template, the plurality of sections comprising a plurality of primary sections and a plurality of secondary sections, each dense attribute of the plurality having a corresponding primary section and each secondary section having a corresponding dense attribute that is denser than one or more other dense attributes of the plurality;
  
  receiving, via the at least one computer, one or more bindings for the query template, the bindings comprising query restrictions;
  
  computing, via the at least one computer, an answer to the query template by using the index and the bindings; and
  
  precomputing, via the at least one computer, answers for one or more future queries that a user may submit to explore the data set, wherein the future queries comprise query terms of the query template and at least one additional query term.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, further comprising computing one or more answers to the one or more future queries using the index and the one or more bindings.
  - 3. The method of claim 1, wherein the building the index further comprises semi-clustering the data set according to the query template.
  - 4. The method of claim 3, wherein the semi-clustering does not involve random ordering of the index, and wherein a time to answer a particular user query with the semi-clustering is substantially less than a time to answer the particular user query without the semi-clustering.
  - 5. The method of claim 1, further comprising at least one of:
    - precomputing a set of drill-down queries from the query template;
      
      determining that the set of drill-down queries is small; and
      
      computing answers to the drill-down queries using an independent lookup strategy.
  - 6. The method of claim 1, further comprising inserting forward bitmaps into the index.
  - 7. The method of claim 6, further comprising at least one of:
    - precomputing drill-down queries from the query template;
      
      determining that the set of drill-down queries is large; and
      
      computing answers to the drill-down queries simultaneously via a single index lookup.
  - 8. The method of claim 1, further comprising at least one of:
    - determining that the query template is a candidate for precomputing; and
      
      precomputing an answer to the query in an online fashion.
  - 9. The method of claim 8, wherein the query is a dense query, and wherein the precomputing occurs in a background in an interruptible process that terminates soon after a new query is received.
  - 10. The method of claim 1, wherein a selectivity of the future queries is above a threshold.
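
Claims 1, 5, and 7 mention dense attributes selected by a selectivity threshold and two ways of answering drill-down queries (independent lookups for a small set, a single lookup for a large set). The sketch below is one possible reading under stated assumptions, not the claimed implementation: selectivity is taken to mean the fraction of pages that carry an attribute, a plain scan stands in for an index lookup, and the names `selectivity`, `dense_attributes`, `answer_drilldowns`, and the `small_cutoff` parameter are hypothetical.

```python
from collections import Counter

def selectivity(pages, attr):
    """Assumed definition: fraction of pages that carry `attr`."""
    return sum(1 for page in pages if attr in page) / len(pages)

def dense_attributes(pages, attrs, threshold):
    """Attributes whose selectivity exceeds the threshold."""
    return [attr for attr in attrs if selectivity(pages, attr) > threshold]

def drilldown_independent(pages, base, attr, values):
    """Small drill-down set: one separate scan per drill-down value."""
    return {
        value: sum(1 for page in pages if base(page) and page.get(attr) == value)
        for value in values
    }

def drilldown_single_pass(pages, base, attr, values):
    """Large drill-down set: answer every drill-down in one pass over the data."""
    counts = Counter(page[attr] for page in pages if base(page) and attr in page)
    return {value: counts.get(value, 0) for value in values}

def answer_drilldowns(pages, base, attr, values, small_cutoff=2):
    """Pick a strategy from the number of drill-downs (cutoff is hypothetical)."""
    if len(values) <= small_cutoff:
        return drilldown_independent(pages, base, attr, values)
    return drilldown_single_pass(pages, base, attr, values)
```

Both strategies return the same counts; the difference in this sketch is only how many passes over the data are made, which loosely mirrors the seek-count concern stated in claim 1.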

11. A system, comprising at least one processor to execute and memory to store instructions to:
- receive the data set and a query template, the data set comprising a plurality of attributes for a plurality of web pages;
  
  organize the query template based on a number of seeks of the data set needed to fetch data associated with the query template, the organizing comprising identifying a plurality of dense attributes from the plurality of attributes, each dense attribute having a selectivity exceeding a maximum query selectivity threshold;
  
  build an index for the query template after the query template is organized, the index is organized into a plurality of sections based on a number of seeks of the data set needed to fetch data associated with the query template, the plurality of sections comprising a plurality of primary sections and a plurality of secondary sections, each dense attribute of the plurality having a corresponding primary section and each secondary section having a corresponding dense attribute that is denser than one or more other dense attributes of the plurality;
  
  receive one or more bindings for the query template, the bindings comprising query restrictions;
  
  compute an answer to the query template by using the index and the bindings; and
  
  precompute answers for one or more future queries that a user may submit to explore the data set, wherein the future queries comprise query terms of the query template and at least one additional query term.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The system of claim 11, the instructions further comprising instructions to compute one or more answers to the one or more future queries using the index and the one or more bindings.
  - 13. The system of claim 11, the instructions further comprising instructions to semi-cluster the data set according to the query template.
  - 14. The system of claim 13, wherein the semi-clustering does not involve random ordering of the index, and wherein a time to answer a particular user query with the semi-clustering is substantially less than a time to answer the particular user query without the semi-clustering.
  - 15. The system of claim 11, the instructions further comprising instructions to perform for at least one of:
    - precompute a set of drill-down queries from the query template;
      
      determine that the set of drill-down queries is small; and
      
      compute answers to the drill-down queries using an independent lookup strategy.
  - 16. The system of claim 11, the instructions further comprising instructions to insert forward bitmaps into the index.
  - 17. The system of claim 16, the instructions further comprising instructions to perform at least one of:
    - precompute drill-down queries from the query template;
      
      determine that the set of drill-down queries is large; and
      
      compute answers to the drill-down queries simultaneously via a single index lookup.
  - 18. The system of claim 11, the instructions further comprising instructions to perform at least one of:
    - determine that the query template is a candidate for precomputing; and
      
      precompute an answer to the query in an online fashion.
  - 19. The system of claim 18, wherein the query is a dense query, and wherein the precomputing occurs in a background in an interruptible process that terminates soon after a new query is received.
  - 20. The system of claim 11, wherein a selectivity of the future queries is above a threshold.
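
Claims 9 and 19 describe precomputing in a background, interruptible process that terminates soon after a new query arrives. A minimal sketch of that idea follows, assuming a Python thread and a `threading.Event` as the interruption signal; the names `precompute` and `start_background`, and the choice to check for interruption between queries, are illustrative assumptions rather than details from the patent.

```python
import threading

def precompute(queries, compute, cache, stop):
    """Fill `cache` with answers, checking `stop` before each query."""
    for query in queries:
        if stop.is_set():
            break  # a new user query arrived; give up the remaining work
        cache[query] = compute(query)

def start_background(queries, compute, cache, stop):
    """Run `precompute` on a daemon thread so it never blocks the caller."""
    worker = threading.Thread(
        target=precompute, args=(queries, compute, cache, stop), daemon=True
    )
    worker.start()
    return worker
```

A caller would invoke `stop.set()` when the user submits a new query. Because the flag is checked only between queries, the worker finishes the query in progress and then exits, so "soon after" here means at most one query's worth of work.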

21. A non-transitory computer readable medium carrying one or more instructions that when executed by one or more processors, cause the one or more processors to:
- receive the data set and a query template, the data set comprising a plurality of attributes for a plurality of web pages;
  
  organize the query template into a plurality of sections based on a number of seeks of the data set needed to fetch data associated with the query template, the organizing comprising identifying a plurality of dense attributes from the plurality of attributes, each dense attribute having a selectivity exceeding a maximum query selectivity threshold;
  
  build an index for the query template after the query template is organized, the index is organized into a plurality of sections based on a number of seeks of the data set needed to fetch data associated with the query template, the plurality of sections comprising a plurality of primary sections and a plurality of secondary sections, each dense attribute of the plurality having a corresponding primary section and each secondary section having a corresponding dense attribute that is denser than one or more other dense attributes of the plurality;
  
  receive one or more bindings for the query template, the bindings comprising query restrictions;
  
  compute an answer to the query template by using the index and the one or more bindings; and
  
  precompute answers for one or more future queries that a user may submit to explore the data set, wherein the future queries comprise query terms of the query template and at least one additional query term.

Current Assignee: R2 Solutions LLC (Acacia Research Corporation)
Original Assignee: Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors: Olston, Christopher
Primary Examiner(s): Pham, Hung Q
Assistant Examiner(s): Spieler, William
Application Number: US12/268,956
Publication Number: US 20100121847A1
Time in Patent Office: 2,016 Days
Field of Search: 707/715, 707/741, 707/719
US Class Current: 707/715
CPC Class Codes: G06F 16/3325 Reformulation based on resu...
