Method and apparatus for accelerated data quality checking
2 Assignments
0 Petitions
Abstract
Disclosed herein is a method and apparatus for hardware-accelerating various data quality checking operations. Incoming data streams can be processed with respect to a plurality of data quality check operations using offload engines (e.g., reconfigurable logic such as field programmable gate arrays (FPGAs)). Accelerated data quality checking can be highly advantageous for use in connection with Extract, Transform, and Load (ETL) systems.
339 Citations
43 Claims
1. A method of integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, the method comprising:
processing the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions; and
generating a plurality of rule condition check results for the records based on the processing step;
wherein the computer system comprises a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising a range check module, wherein the processing step comprises the range check module determining whether the data in a data field of interest in a record falls within a defined range of data values, and wherein the generating step comprises the range check module generating a rule condition check result indicative of whether the data in the data field of interest falls within the defined range in response to the determining step.
Dependent claims: 2-13
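The range check recited in claim 1 can be illustrated with a minimal software sketch. It models only the logical behavior of the module (one pass/fail rule condition check result per record), not an FPGA implementation. The record layout (a dict of field name to string data) and the names `RangeRule`, `CheckResult`, and `range_check` are hypothetical, and treating missing or non-numeric data as failing the check is an assumption the claim does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical record model: field name -> field data as text.
Record = dict[str, str]

@dataclass(frozen=True)
class RangeRule:
    """Hypothetical rule condition: the field's value must lie in [low, high]."""
    field: str
    low: float
    high: float

@dataclass(frozen=True)
class CheckResult:
    """One rule condition check result for one record."""
    record_index: int
    field: str
    passed: bool

def range_check(records: Iterable[Record], rule: RangeRule) -> Iterator[CheckResult]:
    """Software model of a range check stage: yields one result per record."""
    for i, record in enumerate(records):
        raw = record.get(rule.field)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # Assumption: missing (None) or non-numeric data fails the range constraint.
            yield CheckResult(i, rule.field, False)
            continue
        # Inclusive bounds on both ends.
        yield CheckResult(i, rule.field, rule.low <= value <= rule.high)
```

For example, checking `[{"age": "42"}, {"age": "130"}, {"age": "n/a"}, {}]` against `RangeRule("age", 0, 120)` yields results whose `passed` flags are `True, False, False, False`. Yielding results lazily mirrors the streaming nature of the claimed pipeline: each record is checked as it arrives rather than after the whole stream is buffered.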
14. A method of integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, wherein the data in a plurality of the data fields of the records are expressed by a plurality of characters, the method comprising:
processing the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions; and
generating a plurality of rule condition check results for the records based on the processing step;
wherein the computer system comprises a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising a character check module, wherein the processing step comprises the character check module determining whether any of the characters in a data field of interest in a record are not members of a defined character set, and wherein the generating step comprises the character check module generating a rule condition check result indicative of whether the characters in the data field of interest are members of the defined character set in response to the determining step.
Dependent claims: 15-18
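The character check of claim 14 asks whether any character in a field falls outside a defined character set. A minimal sketch, assuming a single-byte encoding (Latin-1), represents the set as a 256-entry membership table so each character costs one table lookup; that table layout, and the names `build_char_table`, `first_invalid_char`, `character_check`, and `DIGITS`, are hypothetical and not taken from the patent.

```python
from typing import Optional

def build_char_table(allowed: str) -> list[bool]:
    """Build a hypothetical 256-entry membership table, one flag per byte value.

    Assumes a single-byte encoding (Latin-1 here); characters outside that
    encoding raise UnicodeEncodeError.
    """
    table = [False] * 256
    for byte in allowed.encode("latin-1"):
        table[byte] = True
    return table

def first_invalid_char(data: bytes, table: list[bool]) -> Optional[int]:
    """Return the offset of the first byte not in the character set, or None."""
    for offset, byte in enumerate(data):
        if not table[byte]:
            return offset
    return None

def character_check(data: bytes, table: list[bool]) -> bool:
    """Rule condition check result: True when every character is a set member."""
    return first_invalid_char(data, table) is None

# Hypothetical character set for a numeric-only field.
DIGITS = build_char_table("0123456789")
```

Here `character_check(b"63101", DIGITS)` returns `True`, while `first_invalid_char(b"631O1", DIGITS)` returns `3` because the letter O is not a member of the set. An empty field passes vacuously; whether an empty field should instead fail is a rule-design choice the claim leaves open.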
19. A method of integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, the method comprising:
processing the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions; and
generating a plurality of rule condition check results for the records based on the processing step;
wherein the computer system comprises a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising an exact matching module, wherein the processing step comprises the exact matching module determining whether the data in a data field of interest in a record is a member of a defined value set, and wherein the generating step comprises the exact matching module generating a rule condition check result indicative of whether the data in the data field of interest is a member of the defined value set in response to the determining step.
Dependent claims: 20-22
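The exact matching check of claim 19 reduces to a set-membership test on each field's data. In the minimal sketch below, a Python `frozenset` (a hash set) stands in for whatever lookup structure the hardware module might use; the claim does not specify one, so that choice, the function name `exact_match_check`, and the `STATE_CODES` value set are all hypothetical.

```python
from typing import Iterable, Iterator

def exact_match_check(
    records: Iterable[dict[str, str]], field: str, value_set: frozenset[str]
) -> Iterator[bool]:
    """Yield one rule condition check result per record.

    The result is True only when the field's data equals a member of value_set
    exactly (case-sensitive, no trimming). A missing field yields False.
    """
    for record in records:
        # record.get returns None for a missing field, which is never in value_set.
        yield record.get(field) in value_set

# Hypothetical value set for a two-letter state-code field.
STATE_CODES = frozenset({"CA", "MO", "NY", "TX"})
```

For example, `list(exact_match_check([{"state": "MO"}, {"state": "mo"}, {}], "state", STATE_CODES))` returns `[True, False, False]`: the lowercase `"mo"` fails because the match is exact.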
23. A method of integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, wherein the data in a plurality of the data fields of the records are expressed by a plurality of characters, the method comprising:
processing the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions; and
generating a plurality of rule condition check results for the records based on the processing step;
wherein the computer system comprises a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising a plurality of parallel processing paths, and wherein the processing step comprises the parallel processing paths performing different data quality check operations on the data stream in parallel, and wherein the parallel processing paths comprise a first processing path and a second processing path, and wherein the processing step further comprises:
the first processing path (1) filtering the data stream to identify data fields for records in the data stream that have a range constraint, and (2) performing range check operations on the data in the identified data fields having the range constraint to generate data indicative of whether the identified data fields having the range constraint comply with the range constraint; and
the second processing path (1) filtering the data stream to identify data fields for records in the data stream that have a character set constraint, and (2) performing character check operations on the characters in the identified data fields having the character set constraint to generate data indicative of whether the identified data fields having the character set constraint comply with the character set constraint; and
wherein the first processing path and the second processing path perform their respective operations in parallel at hardware processing speeds.
Dependent claims: 24-26
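Claim 23's two parallel paths, each filtering the stream and then checking only the fields that carry its constraint, can be sketched in software as two independent functions run concurrently. This is a dataflow model only: Python threads do not provide hardware-speed parallelism, and the field-level stream representation `(record index, field name, field data)`, the constraint tables, and the names `range_path`, `charset_path`, and `run_parallel` are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical field-level view of the data stream: (record index, field name, data).
FieldItem = tuple[int, str, str]
Result = tuple[int, str, bool]

def range_path(stream: list[FieldItem],
               ranges: dict[str, tuple[float, float]]) -> list[Result]:
    """First path: (1) keep fields with a range constraint, (2) range-check them."""
    results = []
    for rec, name, data in stream:
        if name not in ranges:  # (1) filter out fields without a range constraint
            continue
        low, high = ranges[name]
        try:
            ok = low <= float(data) <= high  # (2) inclusive range check
        except ValueError:
            ok = False  # assumption: non-numeric data fails the constraint
        results.append((rec, name, ok))
    return results

def charset_path(stream: list[FieldItem],
                 charsets: dict[str, frozenset[str]]) -> list[Result]:
    """Second path: (1) keep fields with a character set constraint, (2) check them."""
    results = []
    for rec, name, data in stream:
        allowed = charsets.get(name)
        if allowed is None:  # (1) filter out fields without a character set constraint
            continue
        results.append((rec, name, all(ch in allowed for ch in data)))  # (2)
    return results

def run_parallel(stream: list[FieldItem],
                 ranges: dict[str, tuple[float, float]],
                 charsets: dict[str, frozenset[str]]) -> tuple[list[Result], list[Result]]:
    """Run both paths concurrently over the same stream; return both result lists."""
    items = list(stream)  # both paths see every field of every record
    with ThreadPoolExecutor(max_workers=2) as pool:
        range_future = pool.submit(range_path, items, ranges)
        charset_future = pool.submit(charset_path, items, charsets)
        return range_future.result(), charset_future.result()
```

Because each path does its own filtering, neither depends on the other's output, which is what allows the two to run side by side; fields with neither constraint (such as a free-text note) are simply dropped by both filters.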
27. An apparatus for integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, the apparatus comprising:
a coprocessor configured to (1) process the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions, and (2) generate a plurality of rule condition check results for the records based on the processing operation;
wherein the coprocessor comprises a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising a range check module, wherein the range check module is configured to (1) determine whether the data in a data field of interest in a record falls within a defined range of data values, and (2) generate a rule condition check result indicative of whether the data in the data field of interest falls within the defined range in response to the determination.
Dependent claims: 28-30
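The apparatus claims place the check modules in a pipeline deployed on the coprocessor. A rough software analogue, offered as a sketch under stated assumptions, is a chain of generators: each stage consumes the stream, appends one rule condition check result to a per-record result list, and passes the record on. The per-record result list, the names `make_stage` and `run_pipeline`, and the example rules `age_stage` and `zip_stage` are hypothetical; the claims do not say how results are attached to records.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, str]
Tagged = tuple[Record, list[bool]]  # a record plus its accumulated check results
Stage = Callable[[Iterator[Tagged]], Iterator[Tagged]]

def make_stage(check: Callable[[Record], bool]) -> Stage:
    """Wrap a per-record check as a streaming stage that appends one result."""
    def stage(items: Iterator[Tagged]) -> Iterator[Tagged]:
        for record, results in items:
            results.append(check(record))
            yield record, results
    return stage

def run_pipeline(records: Iterable[Record], stages: list[Stage]) -> Iterator[Tagged]:
    """Chain stages lazily so each record flows through every stage in order."""
    stream: Iterator[Tagged] = ((record, []) for record in records)
    for stage in stages:
        stream = stage(stream)
    return stream

def _age_in_range(record: Record) -> bool:
    value = record.get("age", "")
    return value.isdecimal() and 0 <= int(value) <= 120

def _zip_is_five_digits(record: Record) -> bool:
    value = record.get("zip", "")
    return len(value) == 5 and value.isdecimal()

# Hypothetical rule set: a range check stage followed by a character check stage.
age_stage = make_stage(_age_in_range)
zip_stage = make_stage(_zip_is_five_digits)
```

Running `run_pipeline([{"age": "42", "zip": "63101"}, {"age": "200", "zip": "631"}], [age_stage, zip_stage])` produces result lists `[True, True]` and `[False, False]`. Because the generators are lazy, the first record can leave the last stage before the second record enters the first, loosely mirroring how a hardware pipeline keeps several records in flight at once.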
31. An apparatus for integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, wherein the data in a plurality of the data fields of the records are expressed by a plurality of characters, the apparatus comprising:
a coprocessor configured to (1) process the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions, and (2) generate a plurality of rule condition check results for the records based on the processing operation;
wherein the coprocessor comprises a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising a character check module, wherein the character check module is configured to (1) determine whether any of the characters in a data field of interest in a record are not members of a defined character set, and (2) generate a rule condition check result indicative of whether the characters in the data field of interest are members of the defined character set in response to the determination.
Dependent claims: 32-34
35. An apparatus for integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, the apparatus comprising:
a coprocessor configured to (1) process the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions, and (2) generate a plurality of rule condition check results for the records based on the processing operation;
wherein the coprocessor comprises a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising an exact matching module, wherein the exact matching module is configured to (1) determine whether the data in a data field of interest in a record is a member of a defined value set, and (2) generate a rule condition check result indicative of whether the data in the data field of interest is a member of the defined value set in response to the determination.
Dependent claims: 36-38
39. An apparatus for integrating a data stream within an enterprise computing system, the data stream comprising a plurality of records, each record having at least one data field, each data field having data therein, the apparatus comprising:
a coprocessor configured to (1) process the data stream with a plurality of hardware-accelerated data quality check operations, the data quality check operations corresponding to a plurality of rule conditions for the data fields and being configured to determine whether the data within the data fields of the data stream satisfy any of the rule conditions, and (2) generate a plurality of rule condition check results for the records based on the processing operation;
wherein the coprocessor comprises a reconfigurable logic device, the reconfigurable logic device having a pipeline deployed thereon, the pipeline comprising a plurality of parallel processing paths, and wherein the parallel processing paths are configured to perform different data quality check operations on the data stream in parallel.
Dependent claims: 40-43
Specification