Locating potentially identical objects across multiple computers based on stochastic partitioning of workload
First Claim
1. A method, implemented in a computer that is part of a plurality of computers in a network, comprising:
selecting a portion of file information corresponding to a file stored on one of the plurality of computers;
comparing, for each of the plurality of computers, the selected portion to a portion of a computer identifier associated with the computer;
identifying which of the computer identifiers have portions matching the selected portion of the file information;
communicating, for identification of potentially identical files stored on the plurality of computers, the file information to each of the computers associated with a computer identifier having a portion matching the selected portion of the file information; and
wherein a value W represents the size of the portion of the file information, wherein a value M represents a count of computers that the one computer is aware of in the network, wherein a value R is a system configuration value calculated based on an average number of computers that a particular file identifier should be communicated to, wherein lg is a base 2 logarithm function, wherein floor brackets indicate the largest integer that is no greater than the enclosed value, and wherein the value W is determined as follows;
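The formula for W does not survive in this copy of the claim. The derivation below is offered only as an assumption consistent with the definitions of W, M, R, lg and the floor brackets, not as the patented formula. If computer identifiers are uniformly distributed, each identifier matches a W-bit portion with probability 2^{-W}, so about M/2^W of the M known computers receive the file information. Taking the largest integer W that keeps this expected count at or above R (with M >= R > 0) gives:

```latex
\mathbb{E}[\text{recipients}] = \frac{M}{2^{W}} \ge R
\iff W \le \lg\frac{M}{R}
\quad\Longrightarrow\quad
W = \left\lfloor \lg\frac{M}{R} \right\rfloor
```

For example, with M = 16 known computers and R = 2, this reading gives W = 3, and each file's information goes to about 16/8 = 2 computers.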
Abstract
Potentially identical objects (e.g., files) are located across multiple computers based on stochastic partitioning of workload. For each of a plurality of objects stored on a plurality of computers in a network, a portion of object information corresponding to the object is selected. The object information can be generated in a variety of manners (e.g., based on hashing the object, based on characteristics of the object, and so forth). Any of a variety of portions of the object information can be used (e.g., the least significant bits of the object information). A stochastic partitioning process is then used to identify which of the plurality of computers to communicate the object information to for identification of potentially identical objects on the plurality of computers.
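A minimal Python sketch of the process the abstract describes, under stated assumptions: SHA-256 stands in for the unspecified hash, computer identifiers are plain integers, the selected portion is the W least significant bits, and `portion_width` picks W as the largest integer for which M / 2^W is at least R. All function names are hypothetical. Because identical files yield identical file information, every copy is routed to the same computers, which is what lets a recipient notice the match.

```python
import hashlib
from collections import defaultdict


def file_info(content: bytes) -> int:
    """File information as an integer (assumption: a SHA-256 hash of the contents)."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big")


def low_bits(value: int, w: int) -> int:
    """Return the w least significant bits of value (the selected portion)."""
    return value & ((1 << w) - 1)


def portion_width(m: int, r: float) -> int:
    """Hypothetical rule: largest W with m / 2**W >= r, i.e. about r expected recipients."""
    if r <= 0:
        raise ValueError("r must be positive")
    w = 0
    while m / 2 ** (w + 1) >= r:
        w += 1
    return w


def recipients(info: int, computer_ids: list[int], w: int) -> list[int]:
    """Computers whose identifier's low w bits match the low w bits of the file information."""
    selected = low_bits(info, w)
    return [cid for cid in computer_ids if low_bits(cid, w) == selected]


def find_potential_duplicates(stores: dict[int, list[bytes]], w: int) -> dict[int, set[int]]:
    """Simulate the exchange: each holder sends each file's information to its matching
    computers; each receiver reports information that arrived from more than one holder.
    Returns a mapping from file information to the set of holders."""
    ids = list(stores)
    # receiver id -> file information -> ids of computers holding a file with that information
    inbox = defaultdict(lambda: defaultdict(set))
    for holder, files in stores.items():
        for content in files:
            info = file_info(content)
            for receiver in recipients(info, ids, w):
                inbox[receiver][info].add(holder)
    duplicates: dict[int, set[int]] = {}
    for reports in inbox.values():
        for info, holders in reports.items():
            if len(holders) > 1:
                duplicates[info] = set(holders)
    return duplicates


if __name__ == "__main__":
    # Hypothetical network of 16 computers with identifiers 0..15, targeting R = 2.
    ids = list(range(16))
    w = portion_width(len(ids), 2)
    print(w)  # 3
    info = file_info(b"example file contents")
    print(recipients(info, ids, w))  # the two identifiers sharing the file's 3 low bits
    stores = {0: [b"a", b"b"], 5: [b"a"], 9: [b"c"]}
    print(find_potential_duplicates(stores, portion_width(len(stores), 1)))
```

With sparse or unevenly distributed identifiers, a file's information may match no known computer at all; the sketch simply skips such a file, whereas a deployed system might need a fallback.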
20 Claims
12. A computer, which is part of a plurality of computers in a network, comprising:
means for selecting a portion of file information corresponding to a file stored on one of the plurality of computers;
means for comparing, for each of the plurality of computers, the selected portion to a portion of a computer identifier associated with the computer;
means for identifying which of the computer identifiers have portions matching the selected portion of the file information;
means for communicating, for identification of potentially identical files stored on the plurality of computers, the file information to each of the computers associated with a computer identifier having a portion matching the selected portion of the file information; and
wherein a value W represents the size of the portion of the file information, wherein a value M represents a count of computers that the one computer is aware of in the network, wherein a value R is a system configuration value calculated based on an average number of computers that a particular file identifier should be communicated to, wherein lg is a base 2 logarithm function, wherein floor brackets indicate the largest integer that is no greater than the enclosed value, and wherein the value W is determined as follows;
Specification