Method and apparatus for accelerated format translation of data in a delimited data format
Abstract
Various methods and apparatuses are described for performing high speed format translations of incoming data, where the incoming data is arranged in a delimited data format. As an example, the data in the delimited data format can be translated to a mapped variable field format using pipelined operations. A reconfigurable logic device can be used in exemplary embodiments as a platform for the format translation.
72 Claims
1. A computer-implemented method comprising:
at least one member of a group consisting of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP) receiving an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters and a plurality of field delimiter characters, the field delimiter characters defining a plurality of boundaries between the fields;
the at least one member processing the received byte stream to identify the field delimiter characters that are present in the received byte stream; and
the at least one member translating the received byte stream to an outgoing byte stream arranged in a mapped variable field format based on the identified field delimiter characters, the outgoing byte stream comprising (1) a plurality of the data characters of the received byte stream arranged in a plurality of variable-size fields, and (2) header information, wherein the header information comprises a plurality of byte offset values that identify where a plurality of subsequent variable-size fields are located in the outgoing byte stream.
Dependent claims: 2-24, 58-60
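For illustration only, the translation recited in claim 1 can be sketched in software. The sketch below is not taken from the patent: the byte layout (a 16-bit field count followed by 16-bit little-endian offsets), the function name `to_mapped_variable`, and the sample record are all assumptions, and a plain Python function stands in for the claimed hardware.

```python
import struct

def to_mapped_variable(record: bytes, delimiter: int = ord(",")) -> bytes:
    """Translate one delimited record into a hypothetical mapped variable field layout.

    Assumed layout (not specified by the source text):
      uint16 field_count | uint16 offsets[field_count + 1] | field bytes
    offsets[i] is the byte position, from the start of the output, where
    field i begins; the final offset marks the end of the record.
    """
    # Step 1: identify the field delimiter characters in the incoming bytes.
    delimiter_positions = [i for i, b in enumerate(record) if b == delimiter]

    # Step 2: slice the data characters into variable-size fields.
    fields, start = [], 0
    for pos in delimiter_positions:
        fields.append(record[start:pos])
        start = pos + 1
    fields.append(record[start:])

    # Step 3: build header information made of byte offset values.
    header_size = 2 + 2 * (len(fields) + 1)
    offsets, cursor = [], header_size
    for field in fields:
        offsets.append(cursor)
        cursor += len(field)
    offsets.append(cursor)

    header = struct.pack(f"<H{len(offsets)}H", len(fields), *offsets)
    return header + b"".join(fields)

encoded = to_mapped_variable(b"AAPL,189.5,100")
```

In this assumed layout the delimiters no longer appear in the output; the offsets in the header carry the boundary information instead.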
25. An apparatus comprising:
at least one member of a group consisting of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP), the at least one member configured to
(1) receive an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters and a plurality of field delimiter characters, the field delimiter characters defining a plurality of boundaries between the fields,
(2) process the received byte stream to identify the field delimiter characters that are present in the received byte stream, and
(3) translate the received byte stream to an outgoing byte stream arranged in a mapped variable field format based on the identified field delimiter characters, the outgoing byte stream comprising (1) a plurality of the data characters of the received byte stream arranged in a plurality of variable-size fields, and (2) header information, wherein the header information comprises a plurality of byte offset values that identify where a plurality of subsequent variable-size fields are located in the outgoing byte stream.
Dependent claims: 61-64
26. A computer-implemented method comprising:
at least one member of a group consisting of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP) receiving an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters and a plurality of field delimiter characters, the field delimiter characters defining a plurality of boundaries between the fields;
the at least one member processing the received byte stream to identify the field delimiter characters that are present in the received byte stream; and
the at least one member translating the received byte stream to an outgoing byte stream based on the identified field delimiter characters, the outgoing byte stream arranged in a structured format and being representative of the data in the fields of the received byte stream, the outgoing byte stream comprising (1) a plurality of the data characters of the received byte stream, and (2) header information indicative of where boundaries exist between a plurality of fields in the outgoing byte stream, the structured format comprising a mapped variable field format that is configured to permit a downstream processing component to jump from field to field in the outgoing byte stream based on the header information without analyzing the data characters of the outgoing byte stream.
Dependent claims: 27-42, 65-67, 72
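The "jump from field to field" property recited in claim 26 can be illustrated with a small reader. This is a sketch under assumptions, not the patented format: it presumes a hypothetical layout of a 16-bit field count followed by 16-bit little-endian offsets, and the name `field_at` and the hand-built sample record are invented for the example.

```python
import struct

def field_at(record: bytes, index: int) -> bytes:
    """Return field `index` using only the header, never scanning data characters.

    Assumed layout (hypothetical):
      uint16 field_count | uint16 offsets[field_count + 1] | field bytes
    """
    (count,) = struct.unpack_from("<H", record, 0)
    if not 0 <= index < count:
        raise IndexError(f"record has {count} fields, asked for {index}")
    # Two adjacent offsets bound the requested field.
    start, end = struct.unpack_from("<2H", record, 2 + 2 * index)
    return record[start:end]

# A hand-built record with three fields: "AAPL", "189.5", "100".
record = struct.pack("<H4H", 3, 10, 14, 19, 22) + b"AAPL189.5100"

price = field_at(record, 1)
quantity = field_at(record, 2)
```

Each lookup costs two header reads and one slice regardless of field length, which is the practical difference from re-parsing a delimited record, where reaching field N requires examining every preceding character.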
43. A computer-implemented method comprising:
receiving data in a delimited data format;
converting the received data to a mapped variable field format, wherein the converted data includes (1) a plurality of records, wherein each of a plurality of the records includes a plurality of variable-size fields, and (2) header data indicative of where boundaries exist between a plurality of records in the converted data and where boundaries exist between a plurality of fields in the converted data, and wherein the header data includes a plurality of byte offset values that identify where boundaries exist between a plurality of subsequent variable-size fields in the converted data; and
performing a plurality of processing operations on the converted data to generate processed data in the mapped variable field format; and
loading the processed data into a database; and
wherein the converting step is performed by at least one member of a group consisting of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP).
Dependent claims: 44-48
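The receive, convert, process, and load sequence of claim 43 can be pictured with the following software-only sketch. Everything in it is an assumption made for illustration: the `(offsets, payload)` pair standing in for the mapped variable field format, the helper names, the `trades` table, the sample rows, and the single whitespace-trimming operation standing in for the claimed plurality of processing operations. Record boundaries are modeled simply by keeping each record as a separate pair.

```python
import sqlite3

def convert(line: bytes, delim: bytes = b","):
    """Convert one delimited record to (offsets, payload).

    offsets[i] and offsets[i + 1] bound field i inside payload.
    """
    fields = line.split(delim)
    offsets = [0]
    for field in fields:
        offsets.append(offsets[-1] + len(field))
    return offsets, b"".join(fields)

def fields_of(offsets, payload):
    """Slice fields out of the payload using only the offsets."""
    return [payload[a:b] for a, b in zip(offsets, offsets[1:])]

def trim_fields(offsets, payload):
    """Example processing operation: strip spaces, re-emit in the same layout."""
    trimmed = [f.strip() for f in fields_of(offsets, payload)]
    new_offsets = [0]
    for field in trimmed:
        new_offsets.append(new_offsets[-1] + len(field))
    return new_offsets, b"".join(trimmed)

def load(records, conn):
    """Load processed records into a hypothetical three-column table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades (symbol TEXT, price REAL, qty INTEGER)"
    )
    for offsets, payload in records:
        symbol, price, qty = (f.decode("ascii") for f in fields_of(offsets, payload))
        conn.execute(
            "INSERT INTO trades VALUES (?, ?, ?)", (symbol, float(price), int(qty))
        )
    conn.commit()

raw = [b"AAPL, 189.5 ,100", b"MSFT,410.25, 50"]
conn = sqlite3.connect(":memory:")
load((trim_fields(*convert(line)) for line in raw), conn)
rows = conn.execute("SELECT symbol, price, qty FROM trades ORDER BY symbol").fetchall()
```

The processing step consumes and produces the same offset-based layout, so further operations could be chained between `convert` and `load` without reintroducing delimiters.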
49. An apparatus comprising:
a reconfigurable logic device comprising a data translation pipeline, the pipeline comprising
(1) a first hardware logic circuit configured to convert incoming data arranged in a delimited data format to an internal format, the incoming data in the delimited data format comprising a plurality of data characters, a plurality of field delimiter characters, a plurality of record delimiter characters, and a plurality of shield characters, the converted data having the internal format being stripped of field delimiter characters and record delimiter characters while preserving data characters of incoming fields, and wherein the converted data having the internal format includes associated control data indicative of where boundaries exist between a plurality of records in the converted data and where boundaries exist between a plurality of fields in the converted data, and
(2) a second hardware logic circuit downstream from the first hardware logic circuit, the second hardware logic circuit configured to remove shield characters from the converted data having the internal format;
a hardware-accelerated data processing stage configured to perform a data processing operation on output from the second hardware logic circuit to thereby generate processed data.
Dependent claims: 50, 57
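The two-stage pipeline of claim 49 can be modeled in software as two chained generators. This is a sketch under assumptions rather than the claimed circuits: the internal format is represented as `(field bytes, end_of_record)` pairs, the shield character is assumed to be a double quote with CSV-style doubling for literal quotes, and the names `to_internal`, `remove_shields`, and `group_records` are hypothetical. The hardware-accelerated processing stage is not modeled.

```python
from typing import Iterable, Iterator, List, Tuple

Field = Tuple[bytes, bool]  # (field bytes, True if this field ends a record)

def to_internal(data: bytes, fdl: int = ord(","), rdl: int = ord("\n"),
                shield: int = ord('"')) -> Iterator[Field]:
    """Stage 1: drop unshielded delimiters, keep data and shield characters.

    The boolean in each pair is the control data marking record boundaries;
    each yielded pair marks a field boundary.
    """
    field = bytearray()
    shielded = False
    mid_record = False
    for b in data:
        if b == shield:
            shielded = not shielded  # a doubled shield toggles twice
            field.append(b)          # shields are preserved for stage 2
            mid_record = True
        elif not shielded and (b == fdl or b == rdl):
            yield bytes(field), b == rdl
            field.clear()
            mid_record = b == fdl
        else:
            field.append(b)          # includes delimiters inside shields
            mid_record = True
    if mid_record:                   # final record without a trailing newline
        yield bytes(field), True

def remove_shields(fields: Iterable[Field], shield: bytes = b'"') -> Iterator[Field]:
    """Stage 2: strip enclosing shields and collapse doubled shields."""
    for field, end_of_record in fields:
        if len(field) >= 2 and field.startswith(shield) and field.endswith(shield):
            field = field[1:-1].replace(shield * 2, shield)
        yield field, end_of_record

def group_records(fields: Iterable[Field]) -> Iterator[List[bytes]]:
    """Downstream consumer: rebuild records from the control data alone."""
    row: List[bytes] = []
    for field, end_of_record in fields:
        row.append(field)
        if end_of_record:
            yield row
            row = []

data = b'1,"Smith, John","say ""hi"""\n2,plain,x\n'
rows = list(group_records(remove_shields(to_internal(data))))
```

Splitting the work this way keeps stage 1 a simple per-byte state machine, since it only has to know whether it is inside a shielded region, while stage 2 operates on whole fields whose boundaries are already known.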
51. A computer-implemented method comprising:
converting incoming data arranged in a delimited data format to an internal format, the incoming data in the delimited data format comprising a plurality of data characters, a plurality of field delimiter characters, a plurality of record delimiter characters, and a plurality of shield characters, the converted data having the internal format being stripped of field delimiter characters and record delimiter characters while preserving data characters of incoming fields, and wherein the converted data having the internal format includes associated control data indicative of where boundaries exist between a plurality of records in the converted data and where boundaries exist between a plurality of fields in the converted data;
removing shield characters from the converted data;
performing at least one hardware-accelerated processing operation on at least a portion of the converted data to generate processed data;
loading the processed data into a database; and
wherein the converting step is performed by at least one member of a group consisting of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP).
Dependent claims: 52-56, 68-71
Specification