System and method for analyzing data records
First Claim
1. A computer-implemented method of processing a plurality of data records, performed on a system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the computer-implemented method, comprising:
partitioning the plurality of data records into groups and assigning each group of data records to a respective process of a first plurality of processes;
executing the first plurality of processes in parallel, wherein for each group the assigned process:
extracts information from the data records in the group;
applies a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more intermediate values;
stores the one or more intermediate values in a respective intermediate data structure in a plurality of intermediate data structures; and
updates a status of the group to indicate completion;
determining whether at least a predefined threshold percentage of the plurality of data records are completed based on the status updates provided by the first plurality of processes, wherein the predefined threshold percentage is a predetermined value that is less than all the first plurality of data records;
when it is determined that the predefined threshold percentage of the plurality of data records are completed, assigning each group of data records that is not completed to a respective second process of the first plurality of processes;
when it is determined that each of the groups in the plurality of groups has been completed by at least one process, executing a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data, wherein the aggregation includes intermediate values only once for each group in the plurality of groups.
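The scheduling steps of the claim (threshold determination, backup assignment, and the completion check before aggregation) can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Group` type, the function names, the 0.9 default threshold, and the limit of two processes per group are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One partition of the input records (hypothetical bookkeeping type)."""
    group_id: int
    num_records: int
    completed: bool = False
    assigned: list = field(default_factory=list)  # ids of processes given this group


def completed_fraction(groups):
    """Fraction of all data records that belong to completed groups."""
    total = sum(g.num_records for g in groups)
    done = sum(g.num_records for g in groups if g.completed)
    return done / total if total else 1.0


def assign_backups(groups, idle_processes, threshold=0.9):
    """Once the threshold is reached, give each unfinished group a second process.

    Returns a dict mapping group_id -> backup process id. Nothing is assigned
    while the completed fraction is still below the threshold.
    """
    # The claim requires a threshold that is less than all of the records.
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    if completed_fraction(groups) < threshold:
        return {}
    idle = list(idle_processes)
    backups = {}
    for g in groups:
        # Assumption: at most one backup (two processes in total) per group.
        if not g.completed and len(g.assigned) < 2 and idle:
            proc = idle.pop(0)
            g.assigned.append(proc)
            backups[g.group_id] = proc
    return backups


def ready_to_aggregate(groups):
    """True when every group has been completed by at least one process."""
    return all(g.completed for g in groups)
```

In this sketch the threshold is measured over records rather than groups, following the claim wording; a scheduler would call `assign_backups` each time a status update arrives and start the aggregation phase only once `ready_to_aggregate` returns true.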
Abstract
A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script of information processing commands sequentially to the extracted information to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When a predefined threshold percentage of the data records has been completed, the method assigns each group that is not yet completed to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes intermediate values only once for each group.
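The per-group processing and the once-per-group aggregation described in the abstract can be sketched as follows. This is an illustrative example only, under stated assumptions: the records are short text strings, the "script" is a hypothetical three-step word count, threads stand in for the parallel processes, and the names `extract`, `SCRIPT`, `process_group`, `aggregate_once`, and `run_first_phase` do not come from the patent.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def extract(record):
    """Hypothetical extraction: split a text record into lowercase words."""
    return record.lower().split()


# A multi-step script: commands applied one after another to the extracted data.
SCRIPT = [
    lambda words: [w.strip(".,") for w in words],  # step 1: trim punctuation
    lambda words: [w for w in words if w],         # step 2: drop empty strings
    lambda words: Counter(words),                  # step 3: emit per-record counts
]


def process_group(group_id, records):
    """Run extraction and the script over one group; return its intermediate structure."""
    intermediate = Counter()
    for record in records:
        value = extract(record)
        for command in SCRIPT:
            value = command(value)
        intermediate.update(value)
    return group_id, intermediate


def aggregate_once(results):
    """Merge intermediate values, counting each group only once.

    `results` may hold several entries for the same group (original plus
    backup); only the first entry seen for each group id is used.
    """
    seen, output = set(), Counter()
    for group_id, intermediate in results:
        if group_id in seen:
            continue
        seen.add(group_id)
        output.update(intermediate)
    return output


def run_first_phase(groups):
    """Process all groups in parallel (threads here, for brevity)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda item: process_group(*item), groups.items()))


if __name__ == "__main__":
    groups = {0: ["a b.", "b"], 1: ["a, c"]}
    results = run_first_phase(groups)
    # Simulate a backup process that also completed group 0.
    results.append(process_group(0, groups[0]))
    print(aggregate_once(results))
```

Skipping any later result for an already-seen group id is one simple way to keep a backup's output from being counted twice; the method itself only requires that each group contribute once.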
19 Claims
1. A computer-implemented method of processing a plurality of data records, performed on a system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the computer-implemented method, comprising:
partitioning the plurality of data records into groups and assigning each group of data records to a respective process of a first plurality of processes;
executing the first plurality of processes in parallel, wherein for each group the assigned process:
extracts information from the data records in the group;
applies a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more intermediate values;
stores the one or more intermediate values in a respective intermediate data structure in a plurality of intermediate data structures; and
updates a status of the group to indicate completion;
determining whether at least a predefined threshold percentage of the plurality of data records are completed based on the status updates provided by the first plurality of processes, wherein the predefined threshold percentage is a predetermined value that is less than all the first plurality of data records;
when it is determined that the predefined threshold percentage of the plurality of data records are completed, assigning each group of data records that is not completed to a respective second process of the first plurality of processes;
when it is determined that each of the groups in the plurality of groups has been completed by at least one process, executing a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data, wherein the aggregation includes intermediate values only once for each group in the plurality of groups.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for processing a plurality of data records, comprising:
one or more processors; and
memory storing one or more programs to be executed by the one or more processors;
the one or more programs comprising instructions for:
partitioning the plurality of data records into groups and assigning each group of data records to a respective process of a first plurality of processes;
executing the first plurality of processes in parallel, wherein for each group the assigned process:
extracts information from the data records in the group;
applies a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more intermediate values;
stores the one or more intermediate values in a respective intermediate data structure in a plurality of intermediate data structures; and
updates a status of the group to indicate completion;
determining whether at least a predefined threshold percentage of the plurality of data records are completed based on the status updates provided by the first plurality of processes, wherein the predefined threshold percentage is a predetermined value that is less than all the first plurality of data records;
when it is determined that the predefined threshold percentage of the plurality of data records are completed, assigning each group of data records that is not completed to a respective second process of the first plurality of processes;
when it is determined that each of the groups in the plurality of groups has been completed by at least one process, executing a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data, wherein the aggregation includes intermediate values only once for each group in the plurality of groups.
Dependent claims: 11, 12, 13, 14
15. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
partitioning a plurality of data records into groups and assigning each group of data records to a respective process of a first plurality of processes;
executing the first plurality of processes in parallel, wherein for each group the assigned process:
extracts information from the data records in the group;
applies a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more intermediate values;
stores the one or more intermediate values in a respective intermediate data structure in a plurality of intermediate data structures; and
updates a status of the group to indicate completion;
determining whether at least a predefined threshold percentage of the plurality of data records are completed based on the status updates provided by the first plurality of processes, wherein the predefined threshold percentage is a predetermined value that is less than all the first plurality of data records;
when it is determined that the predefined threshold percentage of the plurality of data records are completed, assigning each group of data records that is not completed to a respective second process of the first plurality of processes;
when it is determined that each of the groups in the plurality of groups has been completed by at least one process, executing a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data, wherein the aggregation includes intermediate values only once for each group in the plurality of groups.
Dependent claims: 16, 17, 18, 19
Specification