System and method for analyzing data records
Abstract
A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
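The flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the record format (`country,bytes`), the `query`, `process_group`, `aggregate`, and `analyze` names, and the two table names are all hypothetical, and a thread pool stands in for the parallel processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def query(record, emit):
    """Hypothetical per-record query; produces zero or more values."""
    fields = record.split(",")
    if len(fields) != 2:
        return  # malformed record: zero values produced
    country, nbytes = fields
    emit("requests", country, 1)
    emit("bytes", country, int(nbytes))


def process_group(group):
    """One first-phase worker: fills an intermediate structure for its group."""
    intermediate = {"requests": Counter(), "bytes": Counter()}

    def emit(table, key, value):
        # Assumed emit behavior: add the value to a keyed running total.
        intermediate[table][key] += value

    for record in group:
        query(record, emit)
    return intermediate


def aggregate(intermediates):
    """Merge the per-worker intermediate structures into output data."""
    output = {"requests": Counter(), "bytes": Counter()}
    for intermediate in intermediates:
        for table, counts in intermediate.items():
            output[table].update(counts)  # Counter.update adds counts
    return output


def analyze(records, num_groups=3):
    # Allocate groups of records to workers (round-robin split, an assumption).
    groups = [records[i::num_groups] for i in range(num_groups)]
    with ThreadPoolExecutor(max_workers=num_groups) as pool:
        intermediates = list(pool.map(process_group, groups))
    return aggregate(intermediates)


records = ["us,100", "de,50", "us,25", "bad-record", "fr,10"]
result = analyze(records)
```

Because each worker touches only its own intermediate structure, no locking is needed until the final merge, which is what lets the first phase run fully in parallel.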
30 Claims
1. A computer-implemented method of analyzing data records, comprising:
storing the data records in one or more data centers;
allocating groups of the stored data records to respective processes of a first plurality of processes executing in parallel;
after allocating the groups of the stored data records to the respective processes of the first plurality of processes executing in parallel, in each respective process of the first plurality of processes;
for each data record in at least a subset of the group of the stored data records allocated to the respective process;
creating a parsed representation of the data record;
applying a procedural language query to the parsed representation of the data record to extract one or more values, wherein the procedural language query is applied independently to each parsed representation; and
applying a respective emit operator to at least one of the extracted one or more values to add corresponding information to a respective intermediate data structure, wherein the respective emit operator implements one of a predefined set of application-independent statistical information processing functions;
in each process of a second plurality of processes, aggregating information from a subset of the intermediate data structures to produce aggregated data; and
combining the produced aggregated data to produce output data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer-implemented method of analyzing data records, comprising:
storing the data records in one or more data centers;
allocating groups of the stored data records to respective processes of a first plurality of processes executing in parallel;
after allocating the groups of the stored data records to the respective processes of the first plurality of processes executing in parallel, in each respective process of the first plurality of processes;
for each data record in at least a subset of the group of stored data records allocated to the respective process;
creating a parsed representation of the data record;
applying a procedural language query to the parsed representation of the data record to extract one or more values; and
applying a respective operator to at least one of the extracted one or more values to add corresponding information to a respective intermediate data structure;
in each process of a second plurality of processes, aggregating information from a subset of the intermediate data structures to produce aggregated data; and
combining the produced aggregated data to produce output data.
14. A computer system with one or more processors and memory for analyzing data records, wherein the data records are stored in one or more data centers, the computer system comprising:
a first plurality of processes operating in parallel, each of which is allocated a group of stored data records to process;
each respective process of the first plurality of processes including instructions for;
creating a parsed representation of each data record in at least a subset of the group of stored data records allocated to the respective process after the group of stored data records is allocated to the respective process;
applying a procedural language query to the parsed representation of each stored data record in at least the subset of the group of stored data records allocated to the respective process to produce one or more values; and
applying one or more emit operators to each of the one or more produced values to add corresponding information to an intermediate data structure; and
at least one aggregating process for aggregating information from a plurality of the intermediate data structures to produce output data.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
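Claims 1 and 13 recite a second plurality of processes that each aggregate a subset of the intermediate data structures before the results are combined, and claim 1 limits the emit operator to one of a predefined set of application-independent statistical functions. A minimal sketch of how such an arrangement might look is given below. Everything in it is assumed for illustration: the `EmitOperator` type, the three operator names, the `key=value;key=value` record format, and the table names do not come from the patent, and both phases run sequentially here for brevity.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


def _max(a, b):
    """Maximum that treats None as 'no value seen yet'."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


@dataclass(frozen=True)
class EmitOperator:
    init: Callable[[], Any]          # empty state
    add: Callable[[Any, Any], Any]   # fold one emitted value into a state
    merge: Callable[[Any, Any], Any] # combine two states


# Hypothetical predefined operator set, independent of any application.
OPERATORS: Dict[str, EmitOperator] = {
    "sum": EmitOperator(lambda: 0, lambda s, v: s + v, lambda a, b: a + b),
    "maximum": EmitOperator(lambda: None, _max, _max),
    "collection": EmitOperator(list, lambda s, v: s + [v], lambda a, b: a + b),
}


def parse(record: str) -> Dict[str, str]:
    """Create a parsed representation of a raw 'k=v;k=v' record."""
    return dict(field.split("=", 1) for field in record.split(";"))


def first_phase(group: List[str], tables: Dict[str, str]) -> Dict[str, Any]:
    """Process one group; `tables` maps table name -> operator name."""
    state = {name: OPERATORS[op].init() for name, op in tables.items()}

    def emit(table: str, value: Any) -> None:
        state[table] = OPERATORS[tables[table]].add(state[table], value)

    for record in group:
        parsed = parse(record)
        latency = int(parsed["latency_ms"])  # the "query": extract one value
        emit("total_latency", latency)
        emit("worst_latency", latency)
    return state


def merge_states(states: List[Dict[str, Any]], tables: Dict[str, str]) -> Dict[str, Any]:
    """Aggregate several states table by table using each operator's merge."""
    merged = {name: OPERATORS[op].init() for name, op in tables.items()}
    for st in states:
        for name, op in tables.items():
            merged[name] = OPERATORS[op].merge(merged[name], st[name])
    return merged


TABLES = {"total_latency": "sum", "worst_latency": "maximum"}
groups = [
    ["host=a;latency_ms=12", "host=b;latency_ms=40"],
    ["host=a;latency_ms=7"],
    ["host=c;latency_ms=95", "host=b;latency_ms=3"],
    ["host=c;latency_ms=18"],
]
intermediates = [first_phase(g, TABLES) for g in groups]
# Second phase: each aggregator handles a subset of the intermediates.
partials = [merge_states(intermediates[:2], TABLES),
            merge_states(intermediates[2:], TABLES)]
# Final step: combine the aggregated data into output data.
output = merge_states(partials, TABLES)
```

The sketch relies on each operator's `merge` being associative, so the same function can serve both the second-phase aggregators and the final combining step regardless of how the intermediates are partitioned.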
Specification