PROFILING IN A MASSIVE PARALLEL PROCESSING ENVIRONMENT
Abstract
A computer-implemented method of profiling a data set in a parallel processing environment includes vertically partitioning an initial data set. One or more attribute subsets are then profiled. A list of subjects is generated, each subject corresponding to a specific attribute value identified in the profiling. Values of multiple attributes are extracted for each identified subject, and the sample results are assembled and merged.
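The sequence of steps in the abstract can be illustrated with a minimal, single-process Python sketch. It is not taken from the patent: the toy rows, the helper names (`partition_vertically`, `profile`, `subjects_with_value`, `extract`, `merge`), and the choice of a value-frequency count as the "profile" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data set: one row per subject (the horizontal component).
ROWS = [
    {"subject": "s1", "age": 34, "city": "Paris", "plan": "gold"},
    {"subject": "s2", "age": 51, "city": "Lyon", "plan": "basic"},
    {"subject": "s3", "age": 29, "city": "Paris", "plan": "basic"},
    {"subject": "s4", "age": 42, "city": "Nice", "plan": "gold"},
]


def partition_vertically(rows, attribute_subsets, key="subject"):
    """Build one projection per attribute subset; each keeps the subject key."""
    return [
        [{key: r[key], **{a: r[a] for a in subset}} for r in rows]
        for subset in attribute_subsets
    ]


def profile(projection, attribute):
    """Profile one attribute as a value-frequency count (an assumed notion of profiling)."""
    return Counter(r[attribute] for r in projection)


def subjects_with_value(projection, attribute, value, key="subject"):
    """List the subjects whose attribute equals the value picked out by the profile."""
    return [r[key] for r in projection if r[attribute] == value]


def extract(projection, subjects, key="subject"):
    """Pull the attribute values of the listed subjects from one projection."""
    wanted = set(subjects)
    return {
        r[key]: {k: v for k, v in r.items() if k != key}
        for r in projection
        if r[key] in wanted
    }


def merge(samples):
    """Merge per-projection samples into one record per subject."""
    merged = {}
    for sample in samples:
        for subject, values in sample.items():
            merged.setdefault(subject, {}).update(values)
    return merged


projections = partition_vertically(ROWS, [("age",), ("city", "plan")])
city_profile = profile(projections[1], "city")
value = city_profile.most_common(1)[0][0]  # most frequent city in the toy data
subjects = subjects_with_value(projections[1], "city", value)
samples = [extract(p, subjects) for p in projections]
profiled_subset = merge(samples)
print(profiled_subset)
```

Here the profiled subset contains full records only for the subjects that share the attribute value singled out by the profile, which is the point of extracting across all vertical partitions before merging.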
20 Claims
1. A computer-implemented method of profiling a data set in a parallel processing environment, comprising:
partitioning an initial data set vertically according to multiple attribute subsets;
profiling one or more of the attribute subsets;
generating a list of subjects or otherwise horizontal component values corresponding to a specific attribute value identified in the profiling;
extracting values of multiple attributes for each said identified subject or otherwise horizontal component values;
assembling sample results of said identified subjects or otherwise horizontal component values;
merging the sample results to form a profiled subset of the initial data set; and
transmitting, displaying or storing the profiled subset of the initial data set, a further processed version, or combinations thereof.
Dependent claims: 2, 3, 4, 5, 6
7. A computer network including a parallel processing environment, comprising:
a data source;
multiple projection computers; and
one or more client computers connected to the multiple projection computers and having computer-readable code embedded therein for programming the projection computers to perform a method of profiling a data set in the parallel processing environment, wherein the method comprises:
partitioning an initial data set vertically or otherwise according to multiple attribute subsets;
profiling one or more of the attribute subsets;
generating a list of subjects or otherwise horizontal component values corresponding to a specific attribute value identified in the profiling;
extracting values of multiple attributes for each said identified subject or otherwise horizontal component values;
assembling sample results of said identified subjects or otherwise horizontal component values; and
merging the sample results to form a profiled subset of the initial data set.
Dependent claims: 8, 9, 10, 11, 12, 13
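As a rough illustration of the arrangement in claim 7, the sketch below lets a client drive several workers, each holding one vertical projection. It is an assumption-laden stand-in, not the patented system: threads from Python's `concurrent.futures.ThreadPoolExecutor` play the role of the projection computers, and `DATA_SOURCE`, `ATTRIBUTE_SUBSETS`, `build_projection`, `profile_projection`, and `extract_sample` are hypothetical names introduced here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data source; "id" identifies the subject (horizontal component).
DATA_SOURCE = [
    {"id": 1, "country": "FR", "device": "phone", "spend": 12.0},
    {"id": 2, "country": "DE", "device": "laptop", "spend": 30.5},
    {"id": 3, "country": "FR", "device": "laptop", "spend": 8.25},
]

# One attribute subset per simulated projection computer.
ATTRIBUTE_SUBSETS = [("country",), ("device", "spend")]


def build_projection(subset):
    """Worker task: hold only the attributes of one vertical partition."""
    return {row["id"]: {a: row[a] for a in subset} for row in DATA_SOURCE}


def profile_projection(projection):
    """Worker task: count value frequencies for each attribute in the projection."""
    counts = {}
    for values in projection.values():
        for attr, value in values.items():
            counts.setdefault(attr, Counter())[value] += 1
    return counts


def extract_sample(projection, subject_ids):
    """Worker task: return this projection's values for the listed subjects."""
    return {sid: projection[sid] for sid in subject_ids}


# The client side: threads stand in for separate projection computers.
with ThreadPoolExecutor(max_workers=len(ATTRIBUTE_SUBSETS)) as pool:
    projections = list(pool.map(build_projection, ATTRIBUTE_SUBSETS))
    profiles = list(pool.map(profile_projection, projections))

    # Pick the attribute value of interest from the profile (most frequent country).
    target = profiles[0]["country"].most_common(1)[0][0]
    subject_ids = [
        sid for sid, values in projections[0].items() if values["country"] == target
    ]

    # Each worker assembles its sample for the same list of subjects.
    samples = list(pool.map(lambda p: extract_sample(p, subject_ids), projections))

# The client merges the per-projection samples into the profiled subset.
profiled_subset = {sid: {} for sid in subject_ids}
for sample in samples:
    for sid, values in sample.items():
        profiled_subset[sid].update(values)
print(profiled_subset)
```

Only the subject list travels between workers in this sketch; each worker touches just its own attribute subset, and the merge happens once on the client.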
14. One or more processor-readable media having embedded therein processor-readable code to program one or more processors to perform a method of profiling a data set in a parallel processing environment, wherein the method comprises:
partitioning an initial data set vertically or otherwise according to multiple attribute subsets;
profiling one or more of the attribute subsets;
generating a list of subjects or otherwise horizontal component values corresponding to a specific attribute value identified in the profiling;
extracting values of multiple attributes for each said identified subject or otherwise horizontal component values;
assembling sample results of said identified subjects or otherwise horizontal component values; and
merging the sample results to form a profiled subset of the initial data set.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification