Method and means for managing linear mapped address spaces storing compressed data at the storage subsystem control unit or device level
First Claim
1. A method of preserving locality of referencing of strings in a system for update writing-in-place of compressed images of fixed length symbol strings in counterpart addresses in a bounded (constant sum) addressable storage medium, the method comprising the steps of:
- (a) partitioning the bounded addressable storage medium into N1 segments in a linearly addressable space and N2 segments in a linked list addressable space, each segment in N1 being of a first length less than an estimated average uncompressed symbol string length, each segment in N2 being of a second length less than the estimated average uncompressed symbol string length;
(b) populating segments in N1 one-to-one and ONTO with compressed images of ones of the fixed length symbol strings;
(c) updating-in-place ones of the compressed images of the fixed length symbol strings stored among said N1 address segments including;
(1) writing portions of any compressed symbol string image exceeding the size of the segment in N1 to which it is addressed into one or more counterpart segments in N2, and(2) embedding a token in the counterpart segment in N1 pointing to the segment or segments in N2; and
(d) recursively adjusting the size of all of the segments in N1 in a direction so that the number of segments in N2 lie within a predetermined percentage range of the current number of segments in N2.
1 Assignment
0 Petitions
Accused Products
Abstract
A constant size storage can be managed to preserve locality of referencing where it is partitioned into linear addressable storage space for compressed symbol strings and a linked list addressable space for overflowing portions of each compressed string, a token to the overflow being embedded in the linear address. The linear space is readjusted periodically in a direction so as to maintain the amount of available overflow within to lie within a certain range of current usage. Changes in compression statistics result in changing overflow usage requiring readjustment to minimize internal fragmentation etc.
-
Citations
10 Claims
-
1. A method of preserving locality of referencing of strings in a system for update writing-in-place of compressed images of fixed length symbol strings in counterpart addresses in a bounded (constant sum) addressable storage medium, the method comprising the steps of:
-
(a) partitioning the bounded addressable storage medium into N1 segments in a linearly addressable space and N2 segments in a linked list addressable space, each segment in N1 being of a first length less than an estimated average uncompressed symbol string length, each segment in N2 being of a second length less than the estimated average uncompressed symbol string length; (b) populating segments in N1 one-to-one and ONTO with compressed images of ones of the fixed length symbol strings; (c) updating-in-place ones of the compressed images of the fixed length symbol strings stored among said N1 address segments including; (1) writing portions of any compressed symbol string image exceeding the size of the segment in N1 to which it is addressed into one or more counterpart segments in N2, and (2) embedding a token in the counterpart segment in N1 pointing to the segment or segments in N2; and (d) recursively adjusting the size of all of the segments in N1 in a direction so that the number of segments in N2 lie within a predetermined percentage range of the current number of segments in N2. - View Dependent Claims (2)
-
-
3. A method for updating in place and rewriting compressed images of strings into a storage medium in a system responsive to fixed length strings generated by a plurality of dynamic Markov symbol string sources and means for writing compressed images of said fixed length strings into counterpart addresses of a fixed sized cyclic storage medium, the method comprising the steps of:
-
(a) determining a compression ratio bound r as the superimposition of a global Gaussian or Laplacian symbol probability distribution having a lower tail over said string sources onto a partially ordered range r0, r1, . . . ,rm, . . . of compression ratios; (b) partitioning said fixed size cyclic storage medium into N1 linearly addressable segments and N2 linked list addressable segments, each of the N1 segments approximating the size of a constant block, record, or track of symbols divided by the compression ratio bound r coincident at a predetermined standard deviation in the lower tail (2 Sigma) of said distribution; (c) populating ones of the segments in N1 one-to-one and ONTO with compressed images of ones of said fixed length symbol strings, (d) updating the strings in place in ones of the segments in N1 including; (1) writing portions of any compressed images exceeding the size of the segment to which they are addressed in segments in N1 into one or more counterpart segments in N2, and (2) embedding a token in the counterpart segment in N1 pointing to the segment or segments in N2; and (e) recursively adjusting the size of all of the segments in N1 in a direction so that the number of segments in N2 lie within a predetermined percentage range of the current number of segments in N2. - View Dependent Claims (4, 5, 6)
-
-
7. A method for preserving locality of referencing strings in a bounded addressable storage medium in a system for update writing-in-place of compressed images of fixed length symbol strings in counterpart addresses in the bounded (constant sum) addressable storage medium, the method comprising the steps of:
-
(a) partitioning the bounded storage medium into N1 segments in a linearly addressable space and N2 segments in a linked list addressable space, each segment in N1 being of a first length<
an estimated average uncompressed symbol string length, each segment in N2 being of a second length<
the estimated average uncompressed string length;(b) populating segments in N1 one-to-one and ONTO with compressed images of ones of the fixed length symbol strings; (c) updating-in-place ones of the compressed images of the symbol strings stored among said N1 address segments including; (1) writing portions of any compressed symbol string image exceeding the size of the segment in N1 to which it is addressed into one or more counterpart segments in N2, and (2) embedding a token in the counterpart segment in N1 pointing to the segment or segments in N2; and (d) recursively adjusting the size of all of the segments in N1 in a direction so that the number of segments in N2 lie within a predetermined percentage range of the current number of segments in N2, said recursive adjustment of size including either (1) reclaiming segments for use in N1 from segments assigned to N2 where the segment size in N1 is being increased, and (2) reclaiming segments for use in N2 from segments assigned to N1 where the segment size N1 is being decreased. - View Dependent Claims (8)
-
-
9. In a storage subsystem having a plurality of cyclic storage devices and a control unit coupling said cyclic storage devices and responsive to commands for moving uncompressed data between said control unit and an attached CPU, each cyclic storage device having means for compressing symbol strings as each string is recorded at addressable locations on the device, and means for generating uncompressed images from said recorded symbol strings stored at addressable locations on said device, wherein the improvement comprises:
-
means for partitioning a constant size address space over said devices into a plurality of linear addressable segments and a plurality of link list addressable segments, the size of each linear addressable segment being an average estimate length of a compressed symbol string; means for populating one-to-one and ONTO the linear addressable segments over the devices; means at each device for updating-in-place selected ones of said linear addressable segments, for writing portions of any compressed symbol string exceeding the size of the linear addressable segment into one or more counterpart link list addressable segments, and for embedding a token in the counterpart linear addressable segment pointing to the link list addressable segment or segments; and means for recursively adjusting the size of all of the linear addressable segments in a direction so that the number of link listed segments lie within a predetermined percentage range of the current number of link listed segments. - View Dependent Claims (10)
-
Specification