Compression of data transmitted over a network
First Claim
1. A method of managing a data transfer over a network, comprising:
- determining an input matching data segment of input data by;
i) selecting a candidate input matching data segment from the input data based on a fitness function value computed for each of a plurality of portions of the input data, by;
sliding a window over portions of the input data, and for each change in position of the sliding window, computing each fitness function value from a portion of the input data within the sliding window to generate a plurality of fitness function values,selecting from the plurality of fitness function values, a best fitness function value based on a defined criteria,selecting a portion of the input data corresponding to the selected best fitness function value as the candidate input matching data segment,using a hash value that is determined from the selected candidate input matching data as an index into a dictionary to obtain a file identifier and offset, andemploying the file identifier and offset to locate within the synchronized store the candidate store matching segment of data,ii) determining a candidate store matching segment of the data in a synchronized store, andiii) selectively revising at least one boundary of the candidate input matching data segment by comparing data contiguous with the candidate store matching segment with data contiguous with the candidate input matching segment;
determining an unmatched portion of data within the input data that is distinct from the input matching segment;
determining an encoded representation of the input matching data segment at least partly based on the synchronized store; and
generating a data structure that includes at least a portion of the unmatched portion and the encoded representation.
1 Assignment
0 Petitions
Accused Products
Abstract
A system, method, and apparatus are directed towards identifying adaptive length segments of redundant data for encoding a data structure. Initial boundaries are identified for an input matching segment within input data and for a candidate store matching segment in a synchronized store. The data prior to and after the boundaries are compared to identify matching data. As matching data is identified, at least one of the boundaries of the matching segments is revised. An encoded representation of the resulting input matching segment is then generated based in part on pointers and offsets into the synchronized store. A data structure is generated based on the encoded representation and unmatched portion, which is sent to a receiver. The receiver uses the data structure to extract matching data from the synchronized store, and together with the unmatched input data in the data structure, reconstruct the input data.
104 Citations
27 Claims
-
1. A method of managing a data transfer over a network, comprising:
-
determining an input matching data segment of input data by; i) selecting a candidate input matching data segment from the input data based on a fitness function value computed for each of a plurality of portions of the input data, by; sliding a window over portions of the input data, and for each change in position of the sliding window, computing each fitness function value from a portion of the input data within the sliding window to generate a plurality of fitness function values, selecting from the plurality of fitness function values, a best fitness function value based on a defined criteria, selecting a portion of the input data corresponding to the selected best fitness function value as the candidate input matching data segment, using a hash value that is determined from the selected candidate input matching data as an index into a dictionary to obtain a file identifier and offset, and employing the file identifier and offset to locate within the synchronized store the candidate store matching segment of data, ii) determining a candidate store matching segment of the data in a synchronized store, and iii) selectively revising at least one boundary of the candidate input matching data segment by comparing data contiguous with the candidate store matching segment with data contiguous with the candidate input matching segment; determining an unmatched portion of data within the input data that is distinct from the input matching segment; determining an encoded representation of the input matching data segment at least partly based on the synchronized store; and generating a data structure that includes at least a portion of the unmatched portion and the encoded representation. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A network device for use in managing a data transfer over a network, comprising:
-
a transceiver to send and receive the data over the network; and a data reduction component that is operative to perform actions, including; selecting a candidate input matching data segment from input data based on a fitness function value computed for each of a plurality of portions of the input data, by; sliding a window over portions of the input data, and for each change in position of the sliding window, computing each fitness function value from a portion of the input data within the sliding window to generate a plurality of fitness function values, selecting from the plurality of fitness function values, a best fitness function value based on a defined criteria, selecting a portion of the input data corresponding to the selected best fitness function value as the candidate input matching data segment, using a hash value that is determined from the selected candidate input matching data as an index into a dictionary to obtain a file identifier and offset, and employing the file identifier and offset to locate within the synchronized store the candidate store matching segment of data; determining a candidate store matching segment of the data in a synchronized store; determining an input matching data segment of the input data and a corresponding store matching data segment in the synchronized store by selectively revising at least one boundary of the candidate input matching data segment by comparing data that is contiguous with and prior to a left boundary of the candidate store matching segment with data that is contiguous with and prior to a left boundary of the input matching segment; determining an unmatched portion of data within the input data that is distinct from the determined input matching segment; determining an encoded representation of the determined input matching data segment at least partly based on the synchronized store; and sending a data structure over the network, that includes at least a portion of the unmatched portion and the encoded representation. - View Dependent Claims (10, 11, 12, 13, 14, 15)
-
-
16. A system of managing a data transfer over a network, comprising:
-
a synchronized data store; a computer processor that is configured to store data and program instructions that when executed on a computer system perform actions comprising selecting a candidate input matching data segment from input data based on a fitness function value computed for each of a plurality of portions of the input data, and determining a candidate store matching segment of data in the synchronized data store by; sliding a window over portions of the input data, and for each change in position of the sliding window, computing the fitness function value from a portion of the input data within the sliding window to generate a plurality of fitness function values, selecting from the plurality of fitness function values, a best fitness function value based on a defined criteria, selecting a portion of the input data corresponding to the selected best fitness function value as the candidate input matching data segment, using a hash value that is determined from the selected candidate input matching data as an index into a dictionary to obtain a file identifier and offset, and employing the file identifier and offset to locate within the synchronized store the candidate store matching segment of data; means for selectively revising the candidate store matching segment and input matching data segment; and the computer processor that is further configured to store data and other program instructions that when executed on the computer system are operative to determine an unmatched portion of data within the input data, generate an encoded representation of the input data matching data segment, and generate a data structure including at least a portion of the unmatched portion and the encoded representation.
-
-
17. A system of managing a data transfer over a network, comprising:
-
a synchronized store; and a first network device that includes program instructions operative to perform actions, including; determining an input matching data segment of input data and a corresponding store matching data segment of data in a synchronized store by; i) selecting a candidate input matching data segment from the input data based on a fitness function value computed for each of a plurality of portions of the input data, by; sliding a window over portions of the input data, and for each change in position of the sliding window, computing the fitness function value from a portion of the input data within the sliding window to generate a plurality of fitness function values, selecting from the plurality of fitness function values, a best fitness function value based on a defined criteria, selecting a portion of the input data corresponding to the selected best fitness function value as the candidate input matching data segment, using a hash value that is determined from the selected candidate input matching data as an index into a dictionary to obtain a file identifier and offset, and employing the file identifier and offset to locate within the synchronized store the candidate store matching segment of data; ii) determining a candidate store matching segment of the data in the synchronized store, and iii) selectively revising at least one boundary of the candidate store matching segment and a corresponding boundary of the candidate input matching data segment by comparing data contiguous with the candidate store matching segment with data contiguous with the candidate input matching segment; determining an unmatched portion of data within the input data that is distinct from the input matching segment; determining an encoded representation of the input matching data segment at least partly based on the synchronized store; and generating a data structure that includes at least a portion of the unmatched portion and the encoded representation. - View Dependent Claims (18, 19)
-
-
20. A network device for use in managing a data transfer over a network, comprising:
-
a synchronized data store; and program code that is operative to perform actions, including; determining a store matching segment of data in the synchronized store; determining the input matching data segment of input data by selectively modifying a length of a candidate input matching data segment to include data based on a comparison of data contiguous to the store matching segment with data contiguous to the candidate input matching segment, wherein the candidate input matching data segment is selected based on a fitness function value computed for each of a plurality of portions of the input data by; sliding a window over portions of the input data, and for each change in position of the sliding window, computing the fitness function value from a portion of the input data within the sliding window to generate a plurality of fitness function values, selecting from the plurality of fitness function values, a best fitness function value based on a defined criteria, selecting a portion of the input data corresponding to the selected best fitness function value as the candidate input matching data segment, using a hash value that is determined from the selected candidate input matching data as an index into a dictionary to obtain a file identifier and offset, and employing the file identifier and offset to locate within the synchronized store the candidate store matching segment of data; determining an unmatched portion of data within the input data; and generating a data structure that includes at least a portion of the unmatched portion and an encoded representation of the input matching data segment. - View Dependent Claims (21, 22, 23)
-
-
24. A network device for use in managing a data transfer over a network, comprising:
-
a transceiver to send and receive the data over the network; and a data reduction component that is operative to perform actions, including; determining an input matching data segment of input data by; i) selecting a candidate input matching data segment from the input data based on a fitness function value computed for each of a plurality of portions of the input data, by; sliding a window over portions of the input data, and for each change in position of the sliding window, computing the fitness function value from a portion of the input data within the sliding window to generate a plurality of fitness function values, selecting from the plurality of fitness function values, a best fitness function value based on a defined criteria, selecting a portion of the input data corresponding to the selected best fitness function value as the candidate input matching data segment, using a hash value that is determined from the selected candidate input matching data as an index into a dictionary to obtain a file identifier and offset, and employing the file identifier and offset to locate within the synchronized store the candidate store matching segment of data; ii) determining a candidate store matching segment of the data in a synchronized store, and iii) selectively revising at least one boundary of the candidate input matching data segment by comparing data prior to and within the candidate store matching segment with data prior to and within the candidate input matching segment; iv) determining a portion of the candidate input matching data segment to include in the input matching data segment based on a comparison of at least a portion of the candidate input matching data segment with the candidate store matching data segment; determining an unmatched portion of data within the input data that is distinct from the input matching segment; determining an encoded representation of the input matching data segment at least partly based on the synchronized store; and generating a data structure that includes at least a portion of the unmatched portion and the encoded representation. - View Dependent Claims (25, 26, 27)
-
Specification