Data clustering, segmentation, and parallelization
Abstract
A first set of original records is processed by a first processing entity to generate a second set of records that includes the original records and one or more copies of each original record, each original record including one or more fields. The processing of each of at least some of the original records includes: generating at least one copy of the original record, and associating a first segment value with the original record and associating a second segment value with the copy. The method also includes partitioning the second set of records among a plurality of recipient processing entities based on the segment values associated with the records in the second set, and, at each recipient processing entity, performing an operation based on one or more data values of the records received at the recipient processing entity to generate results.
40 Claims
1. A method, including:
processing a first set of original records by a first processing entity to generate a second set of records that includes the original records and one or more copies of each original record, each original record including one or more fields, and the processing of each of at least some of the original records including
generating at least one copy of the original record, and
associating a first segment value with the original record and associating a second segment value with the copy, where the first segment value corresponds to a first portion of one or more data values of respective fields of the original record and the second segment value corresponds to a second portion of the one or more data values of the respective fields of the original record, and where the second portion is different from the first portion, and
partitioning the second set of records among a plurality of recipient processing entities based on the segment values associated with the records in the second set, and, at each recipient processing entity, performing an operation based on one or more data values of the records received at the recipient processing entity to generate results.
Dependent claims: 2-27.
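For illustration only, the steps recited in claim 1 might be sketched as follows. Nothing here is taken from the specification: the field name `name`, the choice of a prefix and a suffix as the two "portions", the CRC-based routing, and all function names are assumptions made for this sketch, and the recipients are simulated sequentially rather than run in parallel.

```python
import zlib
from collections import defaultdict


def segment_values(record, field="name", width=3):
    """Hypothetical segmentation rule: the first segment value is taken
    from the leading characters of the field, the second from the
    trailing characters (two different portions of the same value)."""
    value = record[field]
    return value[:width], value[-width:]


def replicate_and_segment(records, field="name", width=3):
    """Build the 'second set': each original record tagged with its first
    segment value, plus one copy tagged with the second segment value."""
    second_set = []
    for rec in records:
        first, second = segment_values(rec, field, width)
        second_set.append((first, rec))         # original record
        second_set.append((second, dict(rec)))  # copy of the record
    return second_set


def partition(second_set, n_recipients):
    """Route each (segment value, record) pair to a recipient. A CRC of
    the segment value is used so routing is stable across runs; records
    with equal segment values always reach the same recipient."""
    parts = [[] for _ in range(n_recipients)]
    for seg, rec in second_set:
        idx = zlib.crc32(seg.encode("utf-8")) % n_recipients
        parts[idx].append((seg, rec))
    return parts


def recipient_operation(received):
    """Example operation at one recipient: group the ids of the records
    it received by segment value."""
    groups = defaultdict(list)
    for seg, rec in received:
        groups[seg].append(rec["id"])
    return dict(groups)


if __name__ == "__main__":
    records = [
        {"id": 1, "name": "smith"},
        {"id": 2, "name": "smyth"},
        {"id": 3, "name": "blacksmith"},
    ]
    for part in partition(replicate_and_segment(records), n_recipients=4):
        print(recipient_operation(part))
```

In this sketch, "smith" and "blacksmith" share no prefix but share the suffix segment "ith", so the copies carrying that segment value are routed to the same recipient and can be compared there. Replicating each record under a second segment value is what makes that comparison possible.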
28. A computer program stored on a non-transitory computer-readable medium, the computer program including instructions for causing a computing system to:
process a first set of original records by a first processing entity to generate a second set of records that includes the original records and one or more copies of each original record, each original record including one or more fields, and the processing of each of at least some of the original records including
generating at least one copy of the original record, and
associating a first segment value with the original record and associating a second segment value with the copy, where the first segment value corresponds to a first portion of one or more data values of respective fields of the original record and the second segment value corresponds to a second portion of the one or more data values of the respective fields of the original record, and where the second portion is different from the first portion, and
partition the second set of records among a plurality of recipient processing entities based on the segment values associated with the records in the second set, and, at each recipient processing entity, perform an operation based on one or more data values of the records received at the recipient processing entity to generate results.
29. A computing system, including:
a first processing entity that includes at least one processor configured to process a first set of original records to generate a second set of records that includes the original records and one or more copies of each original record, each original record including one or more fields, and the processing of each of at least some of the original records including
generating at least one copy of the original record, and
associating a first segment value with the original record and associating a second segment value with the copy, where the first segment value corresponds to a first portion of one or more data values of respective fields of the original record and the second segment value corresponds to a second portion of the one or more data values of the respective fields of the original record, and where the second portion is different from the first portion, and
a plurality of recipient processing entities receiving respective subsets of the second set of records partitioned based on the segment values associated with the records in the second set, each recipient processing entity configured to perform an operation based on one or more data values of the records received at the recipient processing entity to generate results.
30. A method, including:
partitioning a set of records by a first processing entity into multiple subsets of records; and
processing different subsets of the set of records by different respective recipient processing entities and storing results in data storage accessible to each of the recipient processing entities, the processing by each recipient processing entity including
performing an operation on at least one record in the subset based on determining whether or not there is an approximate match between one or more values of one or more fields of the record and an entry within local reference information maintained by the recipient processing entity, and
performing an operation on at least one record in the subset based on determining whether or not there is an approximate match between one or more values of one or more fields of the record and an entry within the data storage that was provided by any of the other recipient processing entities, and
updating the data storage based on the local reference information maintained by the recipient processing entity.
Dependent claims: 31-38.
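As a rough sketch of the flow recited in claim 30, and not the patented implementation, each recipient below checks a value first against its own local reference information, then against entries that other recipients have published to shared storage, and finally publishes its own local entries. The `Recipient` class, the 0.8 similarity threshold, the dictionary used as shared storage, and the use of `difflib.SequenceMatcher` as the approximate-match test are all assumptions chosen for this example. The recipients run one after another here, so no concurrency control is shown.

```python
from difflib import SequenceMatcher


def approx_match(a, b, threshold=0.8):
    """Assumed approximate-match test: similarity ratio of two strings."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


class Recipient:
    """Hypothetical recipient processing entity."""

    def __init__(self, name, shared_store):
        self.name = name
        self.local = []             # local reference information
        self.shared = shared_store  # dict: recipient name -> published entries

    def _matches_other_recipients(self, value):
        # Only entries provided by *other* recipients are consulted here.
        return any(
            approx_match(value, entry)
            for owner, entries in self.shared.items()
            if owner != self.name
            for entry in entries
        )

    def process(self, subset):
        results = []
        for value in subset:
            if any(approx_match(value, entry) for entry in self.local):
                results.append((value, "local match"))
            elif self._matches_other_recipients(value):
                results.append((value, "shared match"))
            else:
                self.local.append(value)  # becomes a new reference entry
                results.append((value, "new"))
        # Update the shared data storage from the local reference information.
        self.shared[self.name] = list(self.local)
        return results


if __name__ == "__main__":
    shared = {}
    a = Recipient("a", shared)
    b = Recipient("b", shared)
    print(a.process(["john smith", "jon smith", "mary jones"]))
    print(b.process(["jhon smith", "peter pan"]))
    print(shared)
```

With these inputs, recipient `a` treats "jon smith" as a local match for "john smith", and recipient `b`, which has no local entry yet, finds "jhon smith" through the entries that `a` published to shared storage.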
39. A computer program stored on a non-transitory computer-readable medium, the computer program including instructions for causing a computing system to:
partition a set of records by a first processing entity into multiple subsets of records; and
process different subsets of the set of records by different respective recipient processing entities and store results in data storage accessible to each of the recipient processing entities, the processing by each recipient processing entity including
performing an operation on at least one record in the subset based on determining whether or not there is an approximate match between one or more values of one or more fields of the record and an entry within local reference information maintained by the recipient processing entity, and
performing an operation on at least one record in the subset based on determining whether or not there is an approximate match between one or more values of one or more fields of the record and an entry within the data storage that was provided by any of the other recipient processing entities, and
updating the data storage based on the local reference information maintained by the recipient processing entity.
40. A computing system, including:
a first processing entity that includes at least one processor configured to partition a set of records into multiple subsets of records; and
a plurality of recipient processing entities each configured to process a different respective subset of the set of records and store results in data storage accessible to each of the recipient processing entities, the processing by each recipient processing entity including
performing an operation on at least one record in the subset based on determining whether or not there is an approximate match between one or more values of one or more fields of the record and an entry within local reference information maintained by the recipient processing entity, and
performing an operation on at least one record in the subset based on determining whether or not there is an approximate match between one or more values of one or more fields of the record and an entry within the data storage that was provided by any of the other recipient processing entities, and
updating the data storage based on the local reference information maintained by the recipient processing entity.
Specification