Method and Apparatus for Accelerated Data Quality Checking
Abstract
Disclosed herein is a method and apparatus for hardware-accelerating various data quality checking operations. Incoming data streams can be processed with respect to a plurality of data quality check operations using offload engines (e.g., reconfigurable logic such as field programmable gate arrays (FPGAs)). Accelerated data quality checking can be highly advantageous for use in connection with Extract, Transfer, and Load (ETL) systems.
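As a rough software illustration of the idea (not the hardware design disclosed here), the sketch below applies several rule conditions to each record of a stream and produces one check result per rule per record. The `RuleCondition` type, the example rules, and the field names `name`, `zip`, and `age` are all hypothetical and chosen only for illustration; an offload engine would evaluate such checks in hardware rather than in a Python loop.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]  # field name -> field data


@dataclass(frozen=True)
class RuleCondition:
    """Hypothetical rule condition: a named predicate on one data field."""
    name: str
    field: str
    predicate: Callable[[str], bool]


def _is_int_in_range(value: str, low: int, high: int) -> bool:
    # Accept only plain ASCII digits so that int() cannot fail.
    return re.fullmatch(r"[0-9]+", value) is not None and low <= int(value) <= high


# Illustrative rules only; they are assumptions, not taken from the patent.
RULES: List[RuleCondition] = [
    RuleCondition("zip_is_5_digits", "zip",
                  lambda v: re.fullmatch(r"[0-9]{5}", v) is not None),
    RuleCondition("name_not_blank", "name", lambda v: v.strip() != ""),
    RuleCondition("age_in_range", "age", lambda v: _is_int_in_range(v, 0, 130)),
]


def check_record(record: Record, rules: List[RuleCondition]) -> Dict[str, bool]:
    """Return one rule condition check result per rule for a single record."""
    return {r.name: r.predicate(record.get(r.field, "")) for r in rules}


def check_stream(records: Iterable[Record],
                 rules: List[RuleCondition]) -> Iterator[Tuple[Record, Dict[str, bool]]]:
    """Yield each record together with its check results, in stream order."""
    for record in records:
        yield record, check_record(record, rules)


stream = [
    {"name": "Ada", "zip": "63101", "age": "36"},
    {"name": "  ", "zip": "6310", "age": "200"},
]
results = [flags for _, flags in check_stream(stream, RULES)]
```

Each entry of `results` is a dictionary keyed by rule name, so a downstream step could route, flag, or reject records based on which conditions they satisfied.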
54 Claims
1. A method of integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, the method comprising:
- processing the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions; and
- generating a plurality of rule condition check results for the records based on the processing step.
(Dependent claims: 2-29)
30. An apparatus for integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, the apparatus comprising:
- a coprocessor configured to (1) process the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions, and (2) generate a plurality of rule condition check results for the records based on the processing operation.
(Dependent claims: 31-48)
49. A method of preprocessing a data stream for an extract, transfer, and load (ETL) operation with respect to a data store, the method comprising:
- receiving streaming data at a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising a plurality of parallel processing paths;
- the parallel processing paths of the pipeline performing different data quality check operations on the streaming data in parallel to generate data indicative of whether the streaming data satisfies predetermined criteria; and
- downstream from the pipeline, loading the streaming data into a data store.
(Dependent claims: 50, 51)
52. An apparatus for preprocessing a data stream for an extract, transfer, and load (ETL) operation with respect to a data store, the apparatus comprising:
- a processor;
- a reconfigurable logic device; and
- a data store;
- wherein the reconfigurable logic device is configured to receive streaming data, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising a plurality of parallel processing paths;
- wherein the parallel processing paths of the pipeline are configured to perform different data quality check operations on the streaming data in parallel to generate data indicative of whether the streaming data satisfies predetermined criteria; and
- wherein the processor is configured to, downstream from the pipeline, load the streaming data into the data store.
(Dependent claims: 53, 54)
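The arrangement recited in claims 49 and 52 (parallel paths that each perform a different check, followed by a downstream load) can be pictured with a small software analogy. In the sketch below, a thread pool stands in for the parallel processing paths of the reconfigurable logic device and a Python list stands in for the data store. The check functions, the `PATHS` table, and the field names `id` and `amount` are hypothetical; the threads only mirror the structure of the claim and say nothing about the performance of an FPGA pipeline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]
Check = Callable[[Record], bool]


def has_required_fields(record: Record) -> bool:
    """Path 1 (hypothetical): both fields are present and non-empty."""
    return all(record.get(key) for key in ("id", "amount"))


def amount_is_numeric(record: Record) -> bool:
    """Path 2 (hypothetical): the amount field parses as a number."""
    try:
        float(record.get("amount", ""))
        return True
    except ValueError:
        return False


def id_is_short(record: Record) -> bool:
    """Path 3 (hypothetical): the id field is at most 8 characters."""
    return len(record.get("id", "")) <= 8


# One entry per parallel processing path; each performs a different check.
PATHS: Dict[str, Check] = {
    "required_fields": has_required_fields,
    "amount_numeric": amount_is_numeric,
    "id_short": id_is_short,
}


def run_pipeline(stream: Iterable[Record],
                 paths: Dict[str, Check],
                 data_store: List[Tuple[Record, Dict[str, bool]]]) -> None:
    """Run every path on each record in parallel, then load downstream."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        for record in stream:
            # Fan the same record out to all paths at once.
            futures = {name: pool.submit(check, record)
                       for name, check in paths.items()}
            # Collect the data indicating which criteria were satisfied.
            flags = {name: future.result() for name, future in futures.items()}
            # Downstream from the pipeline: load the record and its flags.
            data_store.append((record, flags))


store: List[Tuple[Record, Dict[str, bool]]] = []
run_pipeline(
    [{"id": "A1", "amount": "12.50"}, {"id": "TOO-LONG-ID", "amount": "n/a"}],
    PATHS,
    store,
)
```

After the call, `store` holds every input record in arrival order, each paired with the per-path results, which matches the claim's ordering of checking first and loading afterward.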
Specification