IDENTIFICATION OF PORTIONS OF DATA
Abstract
Disclosed is a novel system and process for automating the identification of deleted file chunks. The present invention has two components: a client component that identifies data chunks, and a server component that provides storage and indexing technology for the more than 1 billion records relating to the data chunks that are necessary to run the software.
20 Claims
1. A computer-implemented method for identifying portions of data on a storage media, the method comprising:
accessing a storage media communicatively coupled to a first computer;
determining if the storage media has a block size with a multiple of 4096 bytes;
in response to the storage media having a block size that is a multiple of 4096 bytes, performing:
a) retrieving a next 4096 byte cluster and position;
b) using a first hash function on the 4096 byte cluster to produce a first hash value;
c) applying a first bloom filter to the first hash value;
d) in response to the first bloom filter returning a possibility of the first hash value in a first set of data, performing:
e) using a second hash function on the 4096 byte cluster to produce a second hash value;
f) applying a second bloom filter to the second hash value;
g) in response to the second bloom filter returning a possibility of the second hash value in a second set of data, transmitting the second hash value and the position to a second computer;
h) determining whether there are more 4096 byte clusters to be examined; and
i) in response to more 4096 byte clusters to be examined, returning to step a.
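The loop in steps a) to i) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `BloomFilter` class, the function names, the choice of BLAKE2b as the first hash and SHA-256 as the second, and the `send` callback standing in for transmission to the second computer are all assumptions made for illustration; the claim does not name specific hash functions or filter parameters.

```python
import hashlib
import io

CLUSTER_SIZE = 4096  # cluster size named in the claim


class BloomFilter:
    """Tiny Bloom filter (hypothetical helper): false positives are possible,
    false negatives are not."""

    def __init__(self, num_bits=1 << 16, num_probes=4):
        self.num_bits = num_bits
        self.num_probes = num_probes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_probes bit positions by salting a hash with the probe index.
        for i in range(self.num_probes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def first_hash(cluster):
    # Assumed cheap first-stage hash; the claim does not specify one.
    return hashlib.blake2b(cluster, digest_size=8).digest()


def second_hash(cluster):
    # Assumed stronger second-stage hash; the claim does not specify one.
    return hashlib.sha256(cluster).digest()


def scan_media(stream, block_size, first_filter, second_filter, send):
    """Walk the media cluster by cluster; return how many hits were sent."""
    if block_size % CLUSTER_SIZE != 0:
        return 0  # the claimed steps only run for multiples of 4096 bytes
    sent = 0
    position = 0
    while True:
        cluster = stream.read(CLUSTER_SIZE)            # a) next cluster
        if len(cluster) < CLUSTER_SIZE:                # h) no full cluster left
            break
        h1 = first_hash(cluster)                       # b)
        if first_filter.might_contain(h1):             # c), d)
            h2 = second_hash(cluster)                  # e)
            if second_filter.might_contain(h2):        # f)
                send(h2, position)                     # g)
                sent += 1
        position += CLUSTER_SIZE                       # i) loop back to a)
    return sent


# Usage: one known cluster hidden between two unknown ones.
known = b"\x01" * CLUSTER_SIZE
unknown = b"\x02" * CLUSTER_SIZE
stage1, stage2 = BloomFilter(), BloomFilter()
stage1.add(first_hash(known))
stage2.add(second_hash(known))
hits = []
scan_media(io.BytesIO(unknown + known + unknown), 4096, stage1, stage2,
           lambda h, p: hits.append((h, p)))
```

In this sketch the second hash is computed only for clusters that pass the first filter, which mirrors the ordering of steps d) and e): most clusters are rejected after a single hash and a single filter lookup.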
15. A computer-implemented method for identifying portions of data on a storage media, the method on a second computer comprising:
receiving a group of hash values and a physical location of a data block on a storage media corresponding to each of the hash values, the storage media communicatively coupled to a first computer being reviewed for portions of data;
in response to receiving the group of hash values, for each hash value in the group of hash values, performing:
a) looking up a next hash value in a database;
b) in response to the next hash value matching in the database, determining if the hash value that matches is unique to a set of target data file values;
c) in response to the hash value being unique to a set of target data file values, updating metrics for every data file that matches the target data file values, otherwise returning to step a;
d) determining if there are more hash values to be examined in the group of hash values;
e) in response to more hash values being available, returning to step a, otherwise proceeding to step f;
f) selecting target data files with a highest percentage of hash values that match the target data file values;
g) removing hash values that match across all target data files and determining if the remaining number of hash values passes a threshold; and
h) in response to the remaining number of target hashes being above a threshold, returning to step f, otherwise sending results data to the first computer.
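The server-side steps of claim 15 can be sketched in Python as follows. This is one possible reading of the claim, not the patented implementation. The in-memory `database` dictionary, the `hashes_per_file` table, the function name `rank_target_files`, and the interpretation of steps f) to h) as "pick the best-covered file, drop its hashes, repeat while enough hashes remain" are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_target_files(received, database, hashes_per_file, threshold=0):
    """Rank target files by how much of each file the received hashes cover.

    received:        list of (hash_value, position) pairs from the first computer
    database:        hash_value -> set of file ids containing that hash (assumed shape)
    hashes_per_file: target file id -> total number of cluster hashes in that file
    threshold:       stop once this many or fewer unexplained hashes remain
    """
    matched = defaultdict(set)  # file id -> received hashes found in that file

    for hash_value, _position in received:             # a), d), e) loop over hashes
        files = database.get(hash_value)               # a) database lookup
        if not files:                                  # b) no match in the database
            continue
        if not files.issubset(hashes_per_file):        # b) hash also occurs outside the target set
            continue
        for file_id in files:                          # c) update per-file metrics
            matched[file_id].add(hash_value)

    remaining = set().union(*matched.values())         # hashes not yet attributed

    def coverage(file_id):
        # Share of the file's hashes that are matched and still unattributed.
        return len(matched[file_id] & remaining) / hashes_per_file[file_id]

    results = []
    while remaining:
        best = max(matched, key=coverage)              # f) highest percentage
        results.append((best, coverage(best)))
        remaining -= matched[best]                     # g) drop hashes already explained
        if len(remaining) <= threshold:                # h) below threshold: stop
            break
    return results                                     # results data for the first computer


# Usage with toy data: file "a" is mostly present, file "b" only partly.
database = {
    b"h1": {"a"}, b"h2": {"a"}, b"h3": {"a", "b"}, b"h4": {"b"},
    b"hx": {"not-a-target"},
}
hashes_per_file = {"a": 4, "b": 4}
received = [(b"h1", 0), (b"h2", 4096), (b"h3", 8192),
            (b"h4", 12288), (b"hx", 16384), (b"unknown", 20480)]
ranking = rank_target_files(received, database, hashes_per_file)
```

A production system holding more than 1 billion records would use an indexed database rather than a Python dictionary; the dictionary here only stands in for the lookup in step a).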
16. A system for identifying portions of data on a storage media, the system comprising:
a computer memory capable of storing machine instructions; and
a hardware processor in communication with the computer memory, the hardware processor configured to access the computer memory, the hardware processor performing:
accessing a storage media communicatively coupled to a first computer;
determining if the storage media has a block size with a multiple of 4096 bytes;
in response to the storage media having a block size that is a multiple of 4096 bytes, performing:
a) retrieving a next 4096 byte cluster and position;
b) using a first hash function on the 4096 byte cluster to produce a first hash value;
c) applying a first bloom filter to the first hash value;
d) in response to the first bloom filter returning a possibility of the first hash value in a first set of data, performing:
e) using a second hash function on the 4096 byte cluster to produce a second hash value;
f) applying a second bloom filter to the second hash value;
g) in response to the second bloom filter returning a possibility of the second hash value in a second set of data, transmitting the second hash value and the position to a second computer;
h) determining whether there are more 4096 byte clusters to be examined; and
i) in response to more 4096 byte clusters to be examined, returning to step a.
17. A non-transitory computer program product tangibly embodying computer readable instructions which, when implemented, cause a computer to carry out the steps of a method for identifying portions of data on a storage media, comprising:
accessing a storage media communicatively coupled to a first computer;
determining if the storage media has a block size with a multiple of 4096 bytes;
in response to the storage media having a block size that is a multiple of 4096 bytes, performing:
a) retrieving a next 4096 byte cluster and position;
b) using a first hash function on the 4096 byte cluster to produce a first hash value;
c) applying a first bloom filter to the first hash value;
d) in response to the first bloom filter returning a possibility of the first hash value in a first set of data, performing:
e) using a second hash function on the 4096 byte cluster to produce a second hash value;
f) applying a second bloom filter to the second hash value;
g) in response to the second bloom filter returning a possibility of the second hash value in a second set of data, transmitting the second hash value and the position to a second computer;
h) determining whether there are more 4096 byte clusters to be examined; and
i) in response to more 4096 byte clusters to be examined, returning to step a.
Specification