Method and system for compression and decompression using variable-sized offset and length fields
First Claim
1. In a computer system, a method for compressing a sequence of data, comprising:
- (a) dividing the sequence of data into a series of blocks;
(b) identifying a pattern of data located at a given location in a block that also occurs earlier in the data at a previous location in the block;
(c) encoding the pattern of data at the given location in the block as a copy token having a fixed number of bits, wherein said copy token includes an offset field that identifies an offset between the pattern of data at the given location in the block and the pattern of data at the previous location in the block at which the pattern of data also occurred and wherein how many bits that are included in the offset field depends upon the offset between the given location in the block and the previous location in the block for the pattern of data.
2 Assignments
0 Petitions
Accused Products
Abstract
A computer system includes a compression engine for compressing a decompressed sequence of data to produce a compressed sequence of data. The compression engine encodes each piece of data in the decompressed sequence of data as either a portion of a copy token or as a literal token. Tokens are grouped together into groups of up to 8 tokens and a bitmap holding 8 bits is provided to identify the respective tokens as either copy tokens or literal tokens. The copy tokens encode sub-sequences of data that have previously occurred in the decompressed data sequence. Each copy token is of a like size but includes a variable-sized offset field for encoding an offset between a current occurrence of a sub-sequence of data and a previous occurrence of a sub-sequence of data. The offset field is variable-sized to encode the offset in a minimal number of bits. The computer system also includes a decompression engine for decompressing data sequences that have been compressed using the compression engine.
-
Citations
34 Claims
-
1. In a computer system, a method for compressing a sequence of data, comprising:
-
(a) dividing the sequence of data into a series of blocks; (b) identifying a pattern of data located at a given location in a block that also occurs earlier in the data at a previous location in the block; (c) encoding the pattern of data at the given location in the block as a copy token having a fixed number of bits, wherein said copy token includes an offset field that identifies an offset between the pattern of data at the given location in the block and the pattern of data at the previous location in the block at which the pattern of data also occurred and wherein how many bits that are included in the offset field depends upon the offset between the given location in the block and the previous location in the block for the pattern of data. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. In a computer system, a method of compressing a sequence of data into chunks of compressed data, comprising:
-
(a) dividing the sequence of data into a series of blocks; (b) processing a first portion of the data at a location in a block to identify at least one sub-sequence of data in the first portion to compress; (c) determining another sub-sequence of data to be compressed at another location in a second portion of the block; (d) determining whether at least part of the other sub-sequence of data in the second portion of the block matches at least part of the sub-sequence of data in the first portion of the block; (e) where at least part of the other sub-sequence of data in the second portion of the block does not match at least part of the sub-sequence of data in the first portion of the block, encoding the not matched part of the other sub-sequence of data in the second portion of the block as a literal token, said literal token being added to a chunk of compressed data that is associated with the block; and (f) where at least part of the other sub-sequence of data in the second portion of the block does match at least part of the sub-sequence of data in the first portion of the block, encoding the matched part of the other sub-sequence of data in the second portion of the block into a copy token with a fixed number of bits that is added to the chunk of compressed data associated with the block the copy token including an offset field that identifies an offset between the other location of the matched part of the other sub-sequence of data in the second portion of the block and the location of the matched part of the sub-sequence of data in the first portion of the block, and a length field that identifies a length of the matched part of the sub-sequence of data in the first portion of the block, wherein how many bits are in the offset field depends on the position of the matched part at the other location of the other sub-sequence of data in the second portion of the block. - View Dependent Claims (8, 9, 10, 11)
-
-
12. In a computer system, a method of compressing a file comprising pieces of data, comprising:
-
(a) dividing the file into blocks of data; (b) separately compressing each block of data into a chunk of compressed data by performing the following; (i) sequentially examining each sub-sequence of data in each block of data; (ii) encoding each subsequent sub-sequence of data in a block that is over a minimum threshold length and that has occurred previously in the block as a copy token of a predetermined number of bits, each copy token including an offset field that specifies an offset between the subsequent occurrence of the sub-sequence of data at a location in the block and another location in the block that the sub-sequence of data first occurred and a length field that specifies a length of the first occurrence of the sub-sequence of data at the other location in the block, wherein how many bits are used in the offset field depends on the location of the subsequent occurrence of the sub-sequence of data within the block; and (iii) encoding each sub-sequence of data in the block, that is not encoded as a copy token, as a literal token in the compressed chunk. - View Dependent Claims (13, 14, 15)
-
-
16. In a computer system, a method of compressing a sequence of blocks of data, comprising the computer-implemented steps of:
-
(a) compressing a first sub-sequence of data at a first location in a block of data by encoding the first sub-sequence of data as a first copy token having a fixed number of bits, said first copy token including an offset field that has a first number of bits and that encodes an offset between the first location and a previous occurrence of the first sub-sequence of data in the block of data; and (b) compressing a second sub-sequence of data at a second location in the block of data as a second copy token having the fixed number of bits, said second copy token having another offset field that has a second number of bits that differs from the first number of bits and that encodes an offset between the second location in the block of data and a previous occurrence of the second sub-sequence of data in the block of data. - View Dependent Claims (17, 18)
-
-
19. In a computer system, a method of decompressing an item in a chunk of a compressed file having a number of separate chunks, comprising:
-
(a) identifying which chunk of the compressed file holds the item, said identified chunk including like-sized copy tokens and also including literal tokens; (b) decompressing the chunk of the compressed file that has been identified as holding the item while keeping other chunks compressed, said decompressing comprising; (i) identifying a first of the copy tokens that encodes a current sub-sequence of data that includes the item, said first copy token including an offset field that specifies an offset between the current sub-sequence of data and a previous occurrence of the sub-sequence of data that is included in a sequence of literal tokens; (ii) identifying how many bits are in the offset field by identifying a location of a first of the sequence of literal tokens that encode the previous occurrence of the sub-sequence of data; and (iii) decompressing the first copy token by replacing the first copy token with the previous occurrence of the sub-sequence of data that the sequence of literal tokens encode.
-
-
20. In a computer system, a method of decompressing a sequence of chunks of compressed data containing copy tokens and literal tokens, wherein the copy tokens each contain an offset field and a number of bits in the offset field that depends upon a location in a sequence of data prior to compressing the sequence of data into the sequence of chunks of compressed data, said method comprising:
-
for each of the copy tokens, (i) identifying a number of bits in the offset field by determining a location of a sub-sequence of data prior to encoding by the copy token; (ii) using the identified offset field to locate the sub-sequence of data that is encoded by the copy token; (iii) replacing the copy token with the sub-sequence of data that is encoded by the copy token; and for each literal token, keeping the literal token in the sequence.
-
-
21. A computer system comprising:
-
(a) a storage for storing a sequence of data; and (b) a compression engine for compressing the sequence of data into at least one chunk of compressed data that includes a sequence of copy tokens and literal tokens, each copy token encoding a sub-sequence of data as a copy of a like sub-sequence of data that has previously occurred in the sequence of data and each literal token encoding a literal piece of data wherein each copy token is of like size and includes a variable-sized offset field having a number of bits that is based on the location of the sub-sequence encoded by the copy token in the sequence of data. - View Dependent Claims (22)
-
-
23. A computer system comprising:
-
(a) a storage for storing at least one chunk of compressed data, each chunk of compressed data including copy tokens that encode copies of previously occurring sub-sequences of data and literal tokens that literally encode pieces of data, wherein the copy tokens are all of a like size and include a variable-length offset field that encodes an offset to a previous occurrence of the sub-sequence of data; and (b) a decompression engine for decompressing each chunk of compressed data into a decompressed sequence of data, said decompression engine including a copy token decompressor for decompressing the copy tokens, said copy token decompressor identifying how many bits are in the offset fields of each copy token based on a location of the previous occurrence of the sub-sequence of data prior to encoding as the copy token.
-
-
24. A computer-readable storage media comprising:
a compression engine for compressing a sequence of data into at least one chunk of compressed data, each chunk including a sequence of copy tokens and literal tokens, each copy token encoding a sub-sequence of data as a copy of an identical sub-sequence of data that has previously occurred in the decompressed sequence of data and each literal token encoding a literal piece of data wherein each copy token is of like size and includes a variable-sized offset field having a number of bits that is based on a location in the sequence of data of a subsequent occurrence of the identical sub-sequence of data prior to the encoding of the subsequent occurrence of the identical sub-sequence as the copy token.
-
25. A computer-readable storage media comprising:
a decompression engine for decompressing a chunk of compressed data into a block of data, said decompression engine including a copy token decompressor for decompressing the copy tokens, said copy token decompressor identifying how many bits are in the offset fields of each copy token based on a previous location in the data of an identical sub-sequence of data prior to the encoding of a subsequent occurrence of the identical sub-sequence of data as the copy token.
-
26. In a computer system, a method for compressing a sequence of data, comprising:
-
(a) dividing the sequence of the data into a series of consecutive blocks, each block having a predetermined amount of data; (b) compressing each block of data into a chunk of compressed data, the compression of a block of data comprising; (i) sequentially examining data in the block to identify a pattern of data; (ii) encoding a copy token to represent each subsequent occurrence of each pattern in the block of data that has initially occurred at a previous location in the block of data; (iii) encoding a literal token to represent each initial occurrence of each pattern in the block of data; and (iv) producing a sequence of each literal token and each copy token. - View Dependent Claims (27, 28, 29, 30, 31, 32, 33, 34)
-
Specification