Exploring large textual data sets via interactive aggregation
9 Assignments
0 Petitions
Abstract
A method and a system are provided for exploring a large textual data set via interactive aggregation. In one example, the method includes receiving the large textual data set and an original query template, building an index for the query template, wherein the building the index comprises ordering the index a particular way to optimize query time, receiving one or more bindings for the original query template, computing an answer to the original query template using the index and the one or more bindings, and anticipating one or more future queries that a user may submit and that are related to the original query template.
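The flow described in the abstract (build an index, receive bindings, compute an answer, anticipate follow-up queries) can be illustrated with a toy sketch. It is not the patented index layout. It assumes that each web page is a set of attribute values and that a binding is a conjunctive restriction on those values. The answer is taken to be the set of matching pages, whose size is a COUNT aggregate. An anticipated future query is modeled as the current bindings plus one additional term, which follows the claims' wording. The names `build_index`, `aggregate` and `precompute_refinements` are hypothetical, and so is the sample data.

```python
from typing import Dict, Set, Tuple

# Assumed toy model: each web page id maps to the set of attribute values it carries.
Pages = Dict[int, Set[str]]
Index = Dict[str, Set[int]]


def build_index(pages: Pages) -> Index:
    """Plain inverted index (hypothetical): attribute value -> ids of pages carrying it."""
    index: Index = {}
    for page_id, attrs in pages.items():
        for attr in attrs:
            index.setdefault(attr, set()).add(page_id)
    return index


def aggregate(index: Index, all_ids: Set[int], bindings: Tuple[str, ...]) -> Set[int]:
    """Apply the bindings as a conjunction of restrictions and return the matching pages."""
    matching = set(all_ids)
    for term in bindings:
        matching &= index.get(term, set())
    return matching


def precompute_refinements(
    pages: Pages, index: Index, bindings: Tuple[str, ...], matching: Set[int]
) -> Dict[Tuple[str, ...], int]:
    """Cache COUNT answers for follow-up queries = current bindings + one extra term.

    Only terms that occur in the current result are considered, because any other
    extra term would produce an empty answer.
    """
    candidates: Set[str] = set()
    for page_id in matching:
        candidates |= pages[page_id]
    candidates -= set(bindings)
    return {bindings + (term,): len(matching & index[term]) for term in sorted(candidates)}


if __name__ == "__main__":
    pages: Pages = {
        1: {"python", "tutorial", "example.com"},
        2: {"python", "news", "example.org"},
        3: {"java", "tutorial", "example.com"},
        4: {"python", "tutorial", "example.org"},
    }
    index = build_index(pages)
    matching = aggregate(index, set(pages), ("python",))
    print(len(matching))  # 3
    cache = precompute_refinements(pages, index, ("python",), matching)
    print(cache[("python", "tutorial")])  # 2
```

With this approach, a user who adds "tutorial" to the "python" query gets the count from the cache without touching the index again. The trade-off is the cost of computing counts for refinements that may never be requested.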
6 Citations
21 Claims
1. A method comprising:
receiving, via at least one computer, a data set and a query template, the data set comprising a plurality of attributes for a plurality of web pages;
organizing, via the at least one computer, the query template based on a number of seeks of the data set needed to fetch data associated with the query template, the organizing comprising identifying a plurality of dense attributes from the plurality of attributes, each dense attribute having a selectivity exceeding a maximum query selectivity threshold;
building, via the at least one computer, an index for the query template after the query template is organized, the index is organized into a plurality of sections based on a number of seeks of the data set needed to fetch data associated with the query template, the plurality of sections comprising a plurality of primary sections and a plurality of secondary sections, each dense attribute of the plurality having a corresponding primary section and each secondary section having a corresponding dense attribute that is denser than one or more other dense attributes of the plurality;
receiving, via the at least one computer, one or more bindings for the query template, the bindings comprising query restrictions;
computing, via the at least one computer, an answer to the query template by using the index and the bindings; and
precomputing, via the at least one computer, answers for one or more future queries that a user may submit to explore the data set, wherein the future queries comprise query terms of the query template and at least one additional query term.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.

11. A system, comprising at least one processor to execute and memory to store instructions to:
receive the data set and a query template, the data set comprising a plurality of attributes for a plurality of web pages;
organize the query template based on a number of seeks of the data set needed to fetch data associated with the query template, the organizing comprising identifying a plurality of dense attributes from the plurality of attributes, each dense attribute having a selectivity exceeding a maximum query selectivity threshold;
build an index for the query template after the query template is organized, the index is organized into a plurality of sections based on a number of seeks of the data set needed to fetch data associated with the query template, the plurality of sections comprising a plurality of primary sections and a plurality of secondary sections, each dense attribute of the plurality having a corresponding primary section and each secondary section having a corresponding dense attribute that is denser than one or more other dense attributes of the plurality;
receive one or more bindings for the query template, the bindings comprising query restrictions;
compute an answer to the query template by using the index and the bindings; and
precompute answers for one or more future queries that a user may submit to explore the data set, wherein the future queries comprise query terms of the query template and at least one additional query term.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.

21. A non-transitory computer readable medium carrying one or more instructions that when executed by one or more processors, cause the one or more processors to:
receive the data set and a query template, the data set comprising a plurality of attributes for a plurality of web pages;
organize the query template into a plurality of sections based on a number of seeks of the data set needed to fetch data associated with the query template, the organizing comprising identifying a plurality of dense attributes from the plurality of attributes, each dense attribute having a selectivity exceeding a maximum query selectivity threshold;
build an index for the query template after the query template is organized, the index is organized into a plurality of sections based on a number of seeks of the data set needed to fetch data associated with the query template, the plurality of sections comprising a plurality of primary sections and a plurality of secondary sections, each dense attribute of the plurality having a corresponding primary section and each secondary section having a corresponding dense attribute that is denser than one or more other dense attributes of the plurality;
receive one or more bindings for the query template, the bindings comprising query restrictions;
compute an answer to the query template by using the index and the one or more bindings; and
precompute answers for one or more future queries that a user may submit to explore the data set, wherein the future queries comprise query terms of the query template and at least one additional query term.
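The index organization recited in the claims can be sketched as follows. This is one hypothetical reading of the claim language, not the patented design. An attribute's selectivity is taken to be the fraction of pages that carry it. A "dense" attribute is one whose selectivity exceeds a hypothetical `max_selectivity` threshold. Each dense attribute gets a primary section holding its sorted page list. Each secondary section is assumed to hold the pages shared by a pair of dense attributes, keyed under the denser attribute of the pair. The intuition, also an assumption, is that a restriction on two dense attributes can then be served by reading one precomputed section, rather than by seeking into and intersecting two long lists. The functions `dense_attributes`, `build_sections` and `fetch` are illustrative names only.

```python
from typing import Dict, List, Set, Tuple

Index = Dict[str, Set[int]]
Primary = Dict[str, List[int]]
Secondary = Dict[Tuple[str, str], List[int]]


def dense_attributes(index: Index, n_pages: int, max_selectivity: float) -> List[str]:
    """Attributes whose selectivity (fraction of pages carrying them) exceeds the
    threshold, ordered densest first (ties broken by name)."""
    dense = [a for a, ids in index.items() if len(ids) / n_pages > max_selectivity]
    return sorted(dense, key=lambda a: (-len(index[a]), a))


def build_sections(index: Index, dense: List[str]) -> Tuple[Primary, Secondary]:
    """Assumed layout: one primary section per dense attribute and one secondary
    section per pair of dense attributes, keyed (denser, less dense)."""
    primary: Primary = {a: sorted(index[a]) for a in dense}
    secondary: Secondary = {}
    for i, a in enumerate(dense):       # `dense` is sorted densest first,
        for b in dense[i + 1:]:         # so `a` is at least as dense as `b`
            secondary[(a, b)] = sorted(index[a] & index[b])
    return primary, secondary


def fetch(primary: Primary, secondary: Secondary, dense: List[str],
          terms: Tuple[str, ...]) -> List[int]:
    """Serve a restriction on one or two dense attributes from a single section."""
    rank = {a: i for i, a in enumerate(dense)}
    if len(terms) == 1:
        return primary[terms[0]]
    a, b = sorted(terms, key=rank.__getitem__)  # put the denser attribute first
    return secondary[(a, b)]


if __name__ == "__main__":
    index: Index = {
        "python": {1, 2, 4}, "tutorial": {1, 3, 4}, "example.com": {1, 3},
        "example.org": {2, 4}, "news": {2}, "java": {3},
    }
    dense = dense_attributes(index, n_pages=4, max_selectivity=0.5)
    print(dense)  # ['python', 'tutorial']
    primary, secondary = build_sections(index, dense)
    print(fetch(primary, secondary, dense, ("tutorial", "python")))  # [1, 4]
```

The number of secondary sections grows quadratically with the number of dense attributes, so the threshold controls the space and seek trade-off in this sketch.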
Specification