Managing data profiling operations related to data type
Abstract
Processing data in a computing system includes receiving a plurality of records that each have one or more values for respective fields of a plurality of fields. Data type information associates each of one or more data types with at least one identifier. Processing a plurality of data values from the records includes: generating a plurality of data units from the records, each data unit including a field identifier that uniquely identifies one of the fields and a binary value from one of the records, the binary value extracted from the field of that record identified by the field identifier; aggregating information about binary values from a plurality of the data units; generating a list of entries for each of one or more of the fields, at least some of the entries each including one of the binary values and information about that binary value aggregated from a plurality of the data units; retrieving a data type associated with a first identifier from the data type information, and associating the retrieved data type with at least one binary value included in an entry of one of the lists; and generating profile information for at least one of the fields based at least in part on a retrieved data type of a particular binary value appearing in the field, after aggregating information about binary values from a plurality of the data units.
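The processing order described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `DATA_TYPE_INFO`, `generate_data_units`, `aggregate`, `build_entry_lists`, and `profile` are hypothetical, the field index is assumed to double as the identifier used to look up the data type, and the int32 and UTF-8 encodings are chosen only for the example. The point it shows is that raw binary values are counted first, and type-dependent decoding runs once per distinct value afterwards.

```python
import struct
from collections import Counter, defaultdict

# Hypothetical data type information: identifier -> data type.
# Assumption: the field index is also the identifier used for lookup.
DATA_TYPE_INFO = {
    0: {"name": "int32", "decode": lambda b: struct.unpack("<i", b)[0]},
    1: {"name": "utf8", "decode": lambda b: b.decode("utf-8")},
}

def generate_data_units(records):
    """Yield one (field_id, binary_value) data unit per field value."""
    for record in records:
        for field_id, raw in enumerate(record):
            yield field_id, raw

def aggregate(data_units):
    """Count each raw binary value per field, without decoding it."""
    counts = defaultdict(Counter)
    for field_id, raw in data_units:
        counts[field_id][raw] += 1
    return counts

def build_entry_lists(counts):
    """Build one list of (binary_value, count) entries per field."""
    return {fid: list(counter.items()) for fid, counter in counts.items()}

def profile(entry_lists, type_info):
    """Type-dependent step, applied per distinct value after aggregation."""
    result = {}
    for fid, entries in entry_lists.items():
        dtype = type_info[fid]  # retrieve the data type by identifier
        typed = [(dtype["decode"](raw), n) for raw, n in entries]
        values = [value for value, _ in typed]
        result[fid] = {
            "type": dtype["name"],
            "distinct": len(typed),
            "total": sum(n for _, n in typed),
            "min": min(values),
            "max": max(values),
        }
    return result

# Three records with two fields each; values arrive as raw bytes.
records = [
    (struct.pack("<i", 7), b"ann"),
    (struct.pack("<i", 3), b"bob"),
    (struct.pack("<i", 7), b"ann"),
]
profiles = profile(
    build_entry_lists(aggregate(generate_data_units(records))),
    DATA_TYPE_INFO,
)
```

In this sketch the decode function is called twice per field even though three records were read, because duplicates collapse during aggregation. Minimum and maximum are computed on the decoded values, since ordering the raw bytes would not generally match the ordering of the typed values.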
55 Claims
1. A method for processing data in a computing system, the method including:
receiving, over an input device or port of the computing system, a plurality of records that each have one or more values for respective fields of a plurality of fields;
storing, in a storage medium of the computing system, data type information that associates each of one or more data types with at least one identifier; and
processing, using at least one processor of the computing system, a plurality of data values from the records, the processing including:
generating a plurality of data units from the records, each data unit including a field identifier that uniquely identifies one of the fields and a binary value from one of the records, the binary value extracted from the field of that record identified by the field identifier;
aggregating information about binary values from a plurality of the data units;
generating a list of entries for each of one or more of the fields, at least some of the entries each including one of the binary values and information about that binary value aggregated from a plurality of the data units;
retrieving a data type associated with a first identifier from the data type information, for type-dependent processing, and associating the retrieved data type with at least one binary value included in an entry of one of the lists; and
generating profile information for at least one of the fields based at least in part on a retrieved data type of a particular binary value appearing in the field, after aggregating information about binary values from a plurality of the data units.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. Software stored in a non-transitory form on a computer-readable medium, the software including instructions for causing a computing system to:
receive, over an input device or port of the computing system, a plurality of records that each have one or more values for respective fields of a plurality of fields;
store, in a storage medium of the computing system, data type information that associates each of one or more data types with at least one identifier; and
process, using at least one processor of the computing system, a plurality of data values from the records, the processing including:
generating a plurality of data units from the records, each data unit including a field identifier that uniquely identifies one of the fields and a binary value from one of the records, the binary value extracted from the field of that record identified by the field identifier;
aggregating information about binary values from a plurality of the data units;
generating a list of entries for each of one or more of the fields, at least some of the entries each including one of the binary values and information about that binary value aggregated from a plurality of the data units;
retrieving a data type associated with a first identifier from the data type information, for type-dependent processing, and associating the retrieved data type with at least one binary value included in an entry of one of the lists; and
generating profile information for at least one of the fields based at least in part on a retrieved data type of a particular binary value appearing in the field, after aggregating information about binary values from a plurality of the data units.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 54, 55
35. A computing system including:
an input device or port of the computing system configured to receive a plurality of records that each have one or more values for respective fields of a plurality of fields;
a storage medium of the computing system configured to store data type information that associates each of one or more data types with at least one identifier; and
at least one processor of the computing system configured to process a plurality of data values from the records, the processing including:
generating a plurality of data units from the records, each data unit including a field identifier that uniquely identifies one of the fields and a binary value from one of the records, the binary value extracted from the field of that record identified by the field identifier;
aggregating information about binary values from a plurality of the data units;
generating a list of entries for each of one or more of the fields, at least some of the entries each including one of the binary values and information about that binary value aggregated from a plurality of the data units;
retrieving a data type associated with a first identifier from the data type information, for type-dependent processing, and associating the retrieved data type with at least one binary value included in an entry of one of the lists; and
generating profile information for at least one of the fields based at least in part on a retrieved data type of a particular binary value appearing in the field, after aggregating information about binary values from a plurality of the data units.
Dependent claims: 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52
53. A computing system including:
means for receiving a plurality of records that each have one or more values for respective fields of a plurality of fields;
means for storing data type information that associates each of one or more data types with at least one identifier; and
means for processing a plurality of data values from the records, the processing including:
generating a plurality of data units from the records, each data unit including a field identifier that uniquely identifies one of the fields and a binary value from one of the records, the binary value extracted from the field of that record identified by the field identifier;
aggregating information about binary values from a plurality of the data units;
generating a list of entries for each of one or more of the fields, at least some of the entries each including one of the binary values and information about that binary value aggregated from a plurality of the data units;
retrieving a data type associated with a first identifier from the data type information, for type-dependent processing, and associating the retrieved data type with at least one binary value included in an entry of one of the lists; and
generating profile information for at least one of the fields based at least in part on a retrieved data type of a particular binary value appearing in the field, after aggregating information about binary values from a plurality of the data units.
Specification