Hardware efficient fingerprinting
First Claim
1. A system comprising:
- a fingerprint pipeline configured to compute fingerprints for a data chunk, the fingerprint pipeline comprising;
a Fresh module configured to;
split a first shingle of data from the data chunk into a plurality of portions;
perform a first Fresh function on a first portion of the plurality of portions; and
perform a second Fresh function on a second portion of the plurality of portions using a result of the first Fresh function to compute a first fingerprint from the first shingle of data from the data chunk;
a first Shift module communicatively coupled with an output of the Fresh module, wherein the first Shift module is configured to compute a second fingerprint using the first fingerprint, the first shingle of data from the data chunk, and a second shingle of data from the data chunk;
a plurality of sampling modules communicatively coupled with the fingerprint pipeline, the plurality of sampling modules configured to sample candidate fingerprints for generating a sketch for the data chunk; and
a fingerprint selection module communicatively coupled with the plurality of sampling modules, the fingerprint selection module configured to select a plurality of fingerprints to create a sketch of the data chunk.
7 Assignments
0 Petitions
Accused Products
Abstract
An approach for fingerprinting large data objects at the wire speed has been disclosed. The techniques include Fresh/Shift pipelining, split Fresh, optimization, online channel sampling, and pipelined selection. The architecture can also be replicated to work in parallel for higher system throughput. Fingerprinting may provide an efficient mechanism for identifying duplication in a data stream, and deduplication based on the identified fingerprints may provide reduced storage costs, reduced network bandwidth consumption, reduced processing time and other benefits. In some embodiments, fingerprinting may be used to ensure or verify data integrity and may facilitate detection of corruption or tampering. An efficient manner of generating fingerprints (either via hardware, software, or a combination) may reduce a computation load and/or time required to generate fingerprints.
-
Citations
18 Claims
-
1. A system comprising:
-
a fingerprint pipeline configured to compute fingerprints for a data chunk, the fingerprint pipeline comprising; a Fresh module configured to; split a first shingle of data from the data chunk into a plurality of portions; perform a first Fresh function on a first portion of the plurality of portions; and perform a second Fresh function on a second portion of the plurality of portions using a result of the first Fresh function to compute a first fingerprint from the first shingle of data from the data chunk; a first Shift module communicatively coupled with an output of the Fresh module, wherein the first Shift module is configured to compute a second fingerprint using the first fingerprint, the first shingle of data from the data chunk, and a second shingle of data from the data chunk; a plurality of sampling modules communicatively coupled with the fingerprint pipeline, the plurality of sampling modules configured to sample candidate fingerprints for generating a sketch for the data chunk; and a fingerprint selection module communicatively coupled with the plurality of sampling modules, the fingerprint selection module configured to select a plurality of fingerprints to create a sketch of the data chunk. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method comprising:
-
receiving a data chunk including a plurality of shingles of data; performing a Fresh function to compute a first fingerprint for a first shingle of data of the plurality of shingles of data k splitting the first shingle of data into a plurality of portions; performing a first Fresh function on a first portion of the plurality of portions; and performing a second Fresh function on a second portion of the plurality of portions using a result of the first Fresh function; and performing a Shift function to compute a second fingerprint for a second shingle of data of the plurality of shingles of data, wherein the Shift function uses the first fingerprint for the first shingle of data as an input. - View Dependent Claims (9, 10, 11, 12, 13)
-
-
14. A method comprising:
-
performing a plurality of Fresh functions in parallel to compute a first plurality of fingerprints for a first shingle of data, wherein each Fresh function comprises; splitting the first shingle of data into a plurality of portions; performing a first Fresh function on a first portion of the plurality of portions; and performing a second Fresh function on a second portion of the plurality of portions using a result of the first Fresh function; and performing a first plurality of Shift functions in parallel to compute a second plurality of fingerprints for a second shingle of data, wherein the first plurality of Shift functions use the first plurality of fingerprints for the first shingle of data, a plurality of portions of the first shingle of data, and a plurality of portions of the second shingle of data as inputs. - View Dependent Claims (15, 16, 17, 18)
-
Specification