Method and Apparatus for Data Compression
First Claim
1. A data compression method comprising:
- processing a first input stream of uncompressed data for a first file, including dividing the input stream into a plurality of segments;
for each segment, applying a hash to a first segment and associating an offset and length with the hashed segment for identifying the location and size of the hash;
storing a hash and corresponding offset and length for the first segment into a hash table;
comparing a subsequent segment within the input stream with all other hashes in the hash table;
based upon the comparison, writing a reference to a prior hash for an identified duplicate segment with a new offset location for the subsequent segment and absent applying the hash to the identified duplicate, and applying a new hash to an identified non-duplicate segment and storing the new hash and corresponding new offset into the hash table; and
a compressed output stream of data from the hash table retained on storage media.
1 Assignment
0 Petitions
Accused Products
Abstract
A method, system, and article for compressing an input stream of uncompressed data. The input stream is divided into one or more data segments. A hash is applied to a first data segment, and an offset and length are associated with this first segment. This hash, together with the offset and length data for the first segment, is stored in a hash table. Thereafter, a subsequent segment within the input stream is evaluated and compared with all other hash entries in the hash table, and a reference is written to a prior hash for an identified duplicate segment. The reference includes a new offset location for the subsequent segment. Similarly, a new hash is applied to an identified non-duplicate segment, with the new hash and its corresponding offset stored in the hash table. A compressed output stream of data is created from the hash table retained on storage media.
116 Citations
23 Claims
-
1. A data compression method comprising:
-
processing a first input stream of uncompressed data for a first file, including dividing the input stream into a plurality of segments; for each segment, applying a hash to a first segment and associating an offset and length with the hashed segment for identifying the location and size of the hash; storing a hash and corresponding offset and length for the first segment into a hash table; comparing a subsequent segment within the input stream with all other hashes in the hash table; based upon the comparison, writing a reference to a prior hash for an identified duplicate segment with a new offset location for the subsequent segment and absent applying the hash to the identified duplicate, and applying a new hash to an identified non-duplicate segment and storing the new hash and corresponding new offset into the hash table; and a compressed output stream of data from the hash table retained on storage media. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A system for data compression, comprising:
-
a processor in communication with storage media; a first input stream of data for a first file local to the storage media configured to be processed for compression; a compression manager in communication with the first input stream, the manager to divide the first input stream into a plurality of segments; the compression manager to apply a hash to each segment, and an offset and length identifier associated with the hashed segment to identify the location and size of the hash; the compression manager to store the hash and corresponding offset and length for each unique segment in a hash table; a director in communication with the compression manager, the director to compare of a subsequent segment within the first file with all other hashes in the hash table; based upon the comparison, the director to reference a prior hash written with a new offset for an identified duplicate segment absent application of the hash to the identified duplicate, and to apply a new hash to an identified non-duplicate segment and to store the new hash and corresponding new offset and length data into the hash table; and a compressed output stream of data retained on storage media. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. An article for compressing data, comprising:
-
a first input stream of uncompressed data for a first file; a computer readable carrier including computer program instructions configured to compress data of the first file, the instructions comprising; instructions to divide the input stream into a plurality of segments; for each segment, instructions to apply a hash to the segment and to associate an offset with the hashed segment to identify the location of the hash; instructions to store a hash and corresponding offset and a length for each unique segment into a hash table; instructions to compare a subsequent segment within the same file with all other hashes in the hash table; and based upon the comparison, instructions to write a reference to a prior hash with a new offset location for an identified duplicate segment absent applying the hash to the identified duplicate, and instructions to apply a new hash to an identified non-duplicate segment and to store the new hash and corresponding new offset and length into the hash table; and a compressed output stream of data retained on storage media. - View Dependent Claims (18, 19, 20, 21, 22, 23)
-
Specification