System and method for analyzing data records
First Claim
1. A computer-implemented method of processing a plurality of data records, performed on a system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the computer-implemented method, comprising:
allocating subgroups of the plurality of data records to respective processes of a first plurality of processes;
after the allocating, executing in parallel, in each respective process of the first plurality of processes, application-specific and application-independent operations comprising:
for at least one data record in at least a subset of the subgroups of data records allocated to the respective process:
extracting information from the at least one data record by using one or more application-specific data processing operators provided by an application programmer;
applying a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more values, wherein at least one step in the multi-step script includes selecting a respective application-independent emit operator on an application-specific basis and applying the respective application-independent emit operator to the information extracted from the at least one data record; and
storing the one or more values in one or more intermediate data structures in a plurality of intermediate data structures; and
in each process of a second plurality of processes, aggregating values from a subset of the plurality of intermediate data structures to produce output data.
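The two-phase method of claim 1 can be illustrated with a short sketch. Everything below is an assumption for illustration, not the patented implementation: the names `EMIT_OPERATORS`, `map_worker`, `reduce_worker` and `run` are hypothetical, threads stand in for the first and second pluralities of processes, and CRC-32 hashing is used only as one possible way to decide which intermediate table (and therefore which second-phase worker) receives each key. The application supplies the extraction operator, the script steps, and the *name* of an application-independent emit operator (`"sum"` or `"max"` here), which is how the sketch models selecting an emit operator on an application-specific basis.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Table = Dict[str, float]

# Application-independent emit operators (hypothetical names). Each is a
# combine function that folds a newly emitted value into the stored one.
EMIT_OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "sum": lambda old, new: old + new,
    "max": max,
}


def emit(table: Table, op: str, key: str, value: float) -> None:
    """Apply the named emit operator to one (key, value) pair."""
    combine = EMIT_OPERATORS[op]
    table[key] = combine(table[key], value) if key in table else value


def map_worker(records, extract, script, op, num_partitions) -> List[Table]:
    """First-phase worker: extract, run the script, emit into partitioned tables."""
    tables: List[Table] = [{} for _ in range(num_partitions)]
    for record in records:
        info = extract(record)      # application-specific operator
        for step in script:         # multi-step script, applied in order
            info = step(info)
        key, value = info           # assumed: the final step yields (key, value)
        part = zlib.crc32(key.encode()) % num_partitions  # deterministic partition
        emit(tables[part], op, key, value)
    return tables


def reduce_worker(tables: List[Table], op: str) -> Table:
    """Second-phase worker: aggregate one subset of the intermediate tables."""
    out: Table = {}
    for table in tables:
        for key, value in table.items():
            emit(out, op, key, value)
    return out


def run(records: Sequence[str], extract, script, op: str,
        num_mappers: int = 3, num_reducers: int = 2) -> Table:
    subgroups = [records[i::num_mappers] for i in range(num_mappers)]  # allocate
    with ThreadPoolExecutor() as pool:  # threads stand in for processes
        intermediate = list(pool.map(
            lambda group: map_worker(group, extract, script, op, num_reducers),
            subgroups))
        partials = list(pool.map(
            lambda r: reduce_worker([t[r] for t in intermediate], op),
            range(num_reducers)))
    output: Table = {}
    for partial in partials:
        output.update(partial)  # partitions hold disjoint key sets
    return output


if __name__ == "__main__":
    logs = ["cats,120", "dogs,80", "cats,200", "fish,50", "dogs,40"]
    extract = lambda line: line.split(",")        # -> ["cats", "120"]
    script = [lambda f: (f[0], float(f[1])),      # step 1: parse the value
              lambda kv: (kv[0].upper(), kv[1])]  # step 2: normalise the key
    print(sorted(run(logs, extract, script, "max").items()))
    # prints [('CATS', 200.0), ('DOGS', 80.0), ('FISH', 50.0)]
```

Because each emit operator is associative and commutative, a second-phase worker can combine its subset of intermediate tables in any order and still reach the same result; swapping `"max"` for `"sum"` changes the application's output without touching the framework code.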
Abstract
A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
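The abstract's point that a query may produce zero or more values per record, and that zero or more emit operators are applied to each value, can be sketched as follows. The query (extracting alphabetic words), the two emit operators (a word count and a word-length histogram), and all function names are hypothetical examples chosen for illustration; threads again stand in for the parallel processes, and `collections.Counter` plays the role of the intermediate data structure.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List

Intermediate = Dict[str, Counter]


def query(record: str) -> Iterator[str]:
    """Hypothetical query: yields zero or more lowercase words from one record."""
    for token in record.split():
        if token.isalpha():
            yield token.lower()


# Hypothetical emit operators: each records information about one produced
# value in a named table of the intermediate data structure.
def emit_word_count(inter: Intermediate, value: str) -> None:
    inter["word_count"][value] += 1


def emit_length_hist(inter: Intermediate, value: str) -> None:
    inter["length_hist"][len(value)] += 1


EMITTERS = [emit_word_count, emit_length_hist]


def new_intermediate() -> Intermediate:
    return {"word_count": Counter(), "length_hist": Counter()}


def process_group(records: List[str]) -> Intermediate:
    """One first-phase worker: query each record and emit every produced value."""
    inter = new_intermediate()
    for record in records:
        for value in query(record):   # zero or more values per record
            for emitter in EMITTERS:  # zero or more emitters per value
                emitter(inter, value)
    return inter


def aggregate(intermediates: Iterable[Intermediate]) -> Intermediate:
    """Merge per-worker tables; Counter.update adds counts key by key."""
    output = new_intermediate()
    for inter in intermediates:
        for name, table in inter.items():
            output[name].update(table)
    return output


def run(groups: List[List[str]]) -> Intermediate:
    with ThreadPoolExecutor() as pool:  # threads stand in for processes
        return aggregate(pool.map(process_group, groups))


if __name__ == "__main__":
    groups = [["the cat sat", "2024"], ["The horse ran"]]
    result = run(groups)
    print(dict(result["word_count"]))
    print(dict(result["length_hist"]))
```

In this example the record `"2024"` yields no values at all, so no emit operator runs for it, while every word from the other records feeds both tables.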
18 Claims
9. A system for processing data records, comprising:
one or more processors; and
memory storing one or more programs to be executed by the one or more processors, the one or more programs comprising instructions for:
allocating subgroups of the plurality of data records to respective processes of a first plurality of processes;
after the allocating, executing in parallel, in each respective process of the first plurality of processes, application-specific and application-independent operations comprising:
for at least one data record in at least a subset of the subgroups of data records allocated to the respective process:
extracting information from the at least one data record by using one or more application-specific data processing operators provided by an application programmer;
applying a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more values, wherein at least one step in the multi-step script includes selecting a respective application-independent emit operator on an application-specific basis and applying the respective application-independent emit operator to the information extracted from the at least one data record; and
storing the one or more values in one or more intermediate data structures in a plurality of intermediate data structures; and
in each process of a second plurality of processes, aggregating values from a subset of the plurality of intermediate data structures to produce output data.
14. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
allocating subgroups of the plurality of data records to respective processes of a first plurality of processes;
after the allocating, executing in parallel, in each respective process of the first plurality of processes, application-specific and application-independent operations comprising:
for at least one data record in at least a subset of the subgroups of data records allocated to the respective process:
extracting information from the at least one data record by using one or more application-specific data processing operators provided by an application programmer;
applying a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more values, wherein at least one step in the multi-step script includes selecting a respective application-independent emit operator on an application-specific basis and applying the respective application-independent emit operator to the information extracted from the at least one data record; and
storing the one or more values in one or more intermediate data structures in a plurality of intermediate data structures; and
in each process of a second plurality of processes, aggregating values from a subset of the plurality of intermediate data structures to produce output data.
Specification