Methods and computer program products for compression of sequencing data
Abstract
A compression method includes measuring a waveform associated with a chemical event occurring on a sensor array, wherein the waveform comprises at least one region associated with expected measured values and at least one region associated with unpredictable measured values; applying a first compression process to the waveform, the first compression process including an averaging of one or more frames in one or more portions of the waveform; and applying a second compression process to the waveform, the second compression process including a truncating of data corresponding to a portion of the waveform that is not related to a nucleotide incorporation component of the waveform.
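The two lossy steps named in the abstract, truncation and frame averaging, can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names `truncate` and `frame_average`, the choice of which frames to discard, and the group sizes are all assumptions made for illustration only.

```python
from typing import List, Sequence


def truncate(frames: Sequence[float], start: int, stop: int) -> List[float]:
    """Keep only frames[start:stop]; everything outside is discarded (lossy).

    Which frames are unrelated to nucleotide incorporation is an assumption
    supplied by the caller; the source does not specify it here.
    """
    return list(frames[start:stop])


def frame_average(frames: Sequence[float], group_sizes: Sequence[int]) -> List[float]:
    """Replace each consecutive group of frames by its mean.

    group_sizes[i] is the number of input frames averaged into output frame i.
    Larger groups suit regions with expected (slowly changing) values, and
    groups of one preserve regions with unpredictable values.
    """
    if any(size <= 0 for size in group_sizes):
        raise ValueError("every group must contain at least one frame")
    if sum(group_sizes) != len(frames):
        raise ValueError("group sizes must cover every frame exactly once")
    averaged = []
    pos = 0
    for size in group_sizes:
        group = frames[pos:pos + size]
        averaged.append(sum(group) / size)
        pos += size
    return averaged


# Hypothetical 12-frame trace for one sensor: flat, rising, flat.
raw = [10, 10, 10, 10, 12, 20, 31, 40, 42, 42, 42, 42]
kept = truncate(raw, 2, 12)                         # drop the first two frames
averaged = frame_average(kept, [2, 1, 1, 1, 1, 4])  # 10 frames become 6
print(averaged)
```

In this sketch the flat regions are averaged in groups of two and four frames while the rising region keeps one output frame per input frame, so the output has fewer frames than the input.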
20 Claims
1. A compression method, comprising:
measuring a waveform associated with a chemical event occurring on a sensor array, the measuring including digitizing voltage signals using an analog to digital converter to produce a plurality of frames of measured values for the waveform, the voltage signals generated by the sensor array in response to the chemical event, wherein the chemical event is indicative of a number of nucleotide incorporations in a genetic sequencing reaction, wherein the waveform comprises at least one region associated with expected measured values and at least one region associated with unpredictable measured values;
applying a first compression process to the waveform using a processor, the first compression process including an averaging of one or more frames in one or more portions of the waveform to form frame-averaged data, wherein a number of frames of frame-averaged data is less than a number of frames in the plurality of frames of measured values;
applying a keyframe delta compression to the frame-averaged data using the processor, wherein the keyframe delta compression comprises calculating a difference between a current frame of the frame-averaged data and a previous frame of the frame-averaged data associated with the waveform;
forming a compressed data structure including a keyframe of the frame-averaged data and a plurality of the calculated differences subsequent to the keyframe, wherein the compressed data structure represents the keyframe and the plurality calculated differences in a number of bytes that is less than an original number of bytes representing the frame-averaged data;
determining compression information corresponding to one or more compressed data structures;
storing the compression information and the one or more compressed data structures in a memory; and
applying a second compression process to the waveform using the processor, the second compression process including a truncating of data corresponding to a portion of the waveform that is not related to a nucleotide incorporation component of the waveform.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
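The keyframe delta step recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the byte layout (unsigned 16-bit keyframe values, signed 8-bit differences), the integer-valued frames, the contents of the `info` dictionary, and the function names are all hypothetical.

```python
import struct
from typing import List, Tuple


def keyframe_delta_encode(frames: List[List[int]]) -> Tuple[bytes, dict]:
    """Encode frames as one keyframe followed by frame-to-frame differences.

    Assumed layout (hypothetical): the keyframe is stored as little-endian
    unsigned 16-bit values, and each later frame as signed 8-bit differences
    from the frame before it. Each frame holds one value per sensor.
    """
    keyframe = frames[0]
    n = len(keyframe)
    payload = struct.pack(f"<{n}H", *keyframe)
    previous = keyframe
    for current in frames[1:]:
        deltas = [c - p for c, p in zip(current, previous)]
        if any(d < -128 or d > 127 for d in deltas):
            # A fuller scheme might start a new keyframe here instead.
            raise ValueError("difference does not fit in 8 bits")
        payload += struct.pack(f"<{n}b", *deltas)
        previous = current
    # Stand-in for the "compression information" needed to decode.
    info = {"sensors": n, "frames": len(frames)}
    return payload, info


def keyframe_delta_decode(payload: bytes, info: dict) -> List[List[int]]:
    """Rebuild the frames by accumulating differences onto the keyframe."""
    n = info["sensors"]
    frames = [list(struct.unpack_from(f"<{n}H", payload, 0))]
    offset = 2 * n
    for _ in range(info["frames"] - 1):
        deltas = struct.unpack_from(f"<{n}b", payload, offset)
        frames.append([p + d for p, d in zip(frames[-1], deltas)])
        offset += n
    return frames


# Three hypothetical frame-averaged frames for three sensors.
frames = [[1000, 2000, 3000], [1003, 1998, 3010], [1010, 1990, 3050]]
payload, info = keyframe_delta_encode(frames)
original_bytes = 2 * info["sensors"] * info["frames"]  # 16 bits per value
print(len(payload), original_bytes)
```

Because neighbouring frames differ by small amounts, the differences fit in fewer bytes than the full values, and decoding the payload with `info` recovers the input frames exactly, so this step is lossless.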
16. A computer program product comprising a computer-usable medium having computer program logic recorded thereon that, when executed by one or more processors, compresses data from a sensor array, wherein the data comprise digitized voltage signals generated by the sensor array in response to a chemical event occurring on the sensor array to produce a plurality of frames of measured values of a waveform, wherein the chemical event is indicative of a number of nucleotide incorporations in a genetic sequencing reaction, the computer program logic comprising:
first computer readable program code that enables a processor to obtain the plurality of frames of measured values of the waveform associated with the chemical event occurring on the sensor array, wherein the waveform comprises at least one portion associated with expected measured values and at least one portion associated with unpredictable measured values;
second computer readable program code that enables the processor to apply a first compression process to the waveform, the first compression process including an averaging of one or more frames in one or more portions of the waveform to form frame-averaged data, wherein a number of frames of the frame-averaged data is less than a number of frames in the plurality of frames of measured values;
third computer readable program code that enables the processor to:
apply a keyframe delta compression to the frame-averaged data, wherein the keyframe delta compression comprises calculating a difference between a current frame of the frame-averaged data and a previous frame of the frame-averaged data associated with the waveform,
form a compressed data structure including a keyframe of the frame-averaged data and a plurality of the calculated differences subsequent to the keyframe, wherein the compressed data structure represents the keyframe and the plurality calculated differences in a number of bytes that is less than an original number of bytes representing the frame-averaged data,
determine compression information corresponding to one or more compressed data structures, and
store the compression information and the one or more compressed data structures in a memory; and
fourth computer readable program code that enables the processor to apply a second compression process to the waveform, the second compression process including a truncating of data corresponding to a portion of the waveform that is not related to a nucleotide incorporation component of the waveform.
17. A method for compressing nucleic acid sequencing data, comprising:
obtaining, at a processor, raw data comprising digitized voltage signals from a semiconductor-based genetic sequencing sensor array comprising a plurality of sensors during a data acquisition time period, the raw data comprising at least a non-informative portion corresponding to a subinterval of the data acquisition time period having a location within the data acquisition time period that varies for different sensors according to a position of the sensor in the sensor array;
transforming, using the processor, the raw data into compressed data using both a lossless compression process including a keyframe delta compression process and lossy compression processes including a variable frame averaging process and a data truncation process, the data truncation process being related for each sensor to the position of the sensor in the sensor array and configured to discard the non-informative portion of the raw data, wherein the variable frame averaging process produces variable frame-averaged data, wherein the keyframe delta compression process comprises:
calculating a difference between a current frame of variable frame-averaged data and a previous frame of the variable frame-averaged data;
forming a compressed data structure comprising a keyframe of the variable frame-averaged data and a plurality of the calculated differences subsequent to the keyframe, wherein the compressed data structure represents the keyframe and the plurality calculated differences in a number of bytes that is less than an original number of bytes representing the variable frame-averaged data;
determining compression information corresponding to one or more compressed data structures; and
storing the compression information and the one or more compressed data structures in a memory.
Dependent claims: 18, 19, 20
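Claim 17 ties the truncation window to each sensor's position in the array. The sketch below shows one way such a position-dependent truncation could look. The linear relationship between column index and the start of the informative portion, the parameter names, and the default values are purely assumptions for illustration; the claim does not state how position maps to the discarded subinterval.

```python
from typing import List


def informative_start(column: int, frames_per_column: float, base_start: int) -> int:
    """Hypothetical model: the informative signal begins later for sensors
    in higher-numbered columns, by a fixed number of frames per column."""
    return base_start + int(column * frames_per_column)


def truncate_by_position(
    trace: List[int],
    column: int,
    frames_per_column: float = 0.5,
    base_start: int = 2,
    keep: int = 4,
) -> List[int]:
    """Keep `keep` frames starting at the position-dependent start frame and
    discard the rest of the trace as non-informative (lossy)."""
    start = informative_start(column, frames_per_column, base_start)
    return trace[start:start + keep]


# Hypothetical 12-frame trace, identical for both sensors so that only the
# position-dependent window differs.
trace = list(range(100, 112))
near = truncate_by_position(trace, column=0)  # window starts at frame 2
far = truncate_by_position(trace, column=4)   # window starts at frame 4
print(near, far)
```

Both sensors keep the same number of frames, but the retained window shifts with the column index, so the discarded subinterval differs from sensor to sensor.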
Specification