Locating potentially identical objects across multiple computers based on stochastic partitioning of workload
First Claim
1. One or more computer-readable storage media having stored thereon a plurality of instructions that, when executed by one or more processors of a computer that is part of a plurality of computers in a network, cause the one or more processors to perform a method, the method comprising:
- selecting a portion of file information corresponding to a file stored on one of the plurality of computers, wherein the selected portion of the file information comprises a set of least significant bits of the file information, and wherein a size of the portion of the file information is based at least in part on a count of computers that the one computer is aware of in the network;
- comparing, for each of the plurality of computers, the selected portion to a portion of a computer identifier associated with the computer;
- identifying which of the computer identifiers have portions matching the selected portion of the file information; and
- communicating, for identification of potentially identical files stored on the plurality of computers, the file information to each of the computers associated with a computer identifier having a portion matching the selected portion of the file information.
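The selecting, comparing, and identifying steps of this claim can be illustrated with a minimal sketch. The claim says only that the size of the selected portion is based at least in part on the count of known computers; the log2-style rule in `select_width`, the use of the low-order bits of the computer identifier as the compared portion, and all function names are assumptions made for illustration, not details taken from the patent.

```python
from typing import List


def select_width(known_computers: int) -> int:
    """Hypothetical sizing rule: floor(log2(count)) low-order bits.

    The claim only requires that the size depend on the count of
    computers; this particular formula is an assumption.
    """
    return max(known_computers.bit_length() - 1, 0)


def low_bits(value: int, width: int) -> int:
    """Return the `width` least significant bits of `value`."""
    return value & ((1 << width) - 1)


def forwarding_targets(file_info: int, computer_ids: List[int]) -> List[int]:
    """Return the computer identifiers whose low-order bits match
    the selected low-order bits of the file information."""
    width = select_width(len(computer_ids))
    selected = low_bits(file_info, width)
    # Assumption: the compared portion of each computer identifier
    # is also its least significant bits.
    return [cid for cid in computer_ids if low_bits(cid, width) == selected]


# Four known computers give a 2-bit portion under the assumed rule.
ids = [0b0101, 0b1101, 0b0010, 0b1010]
targets = forwarding_targets(0b110101, ids)  # low bits of file info: 0b01
```

In this sketch the file information would be communicated to the computers with identifiers `0b0101` and `0b1101`, because both end in the same two bits as the file information.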
Abstract
Potentially identical objects (e.g., files) are located across multiple computers based on stochastic partitioning of workload. For each of a plurality of objects stored on a plurality of computers in a network, a portion of object information corresponding to the object is selected. The object information can be generated in a variety of manners (e.g., based on hashing the object, based on characteristics of the object, and so forth). Any of a variety of portions of the object information can be used (e.g., the least significant bits of the object information). A stochastic partitioning process is then used to identify which of the plurality of computers to communicate the object information to for identification of potentially identical objects on the plurality of computers.
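As a rough illustration of why this partitioning brings potentially identical objects together, the sketch below derives object information by hashing content and then takes its least significant bits. SHA-256, the 8-bit width, and the function names are assumptions chosen for the example; the abstract states only that hashing is one of several ways to generate the object information.

```python
import hashlib


def object_info(data: bytes) -> int:
    """Derive object information from content.

    Hashing is one option named in the abstract; the choice of
    SHA-256 here is an assumption.
    """
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def selected_portion(info: int, width: int) -> int:
    """Select the `width` least significant bits of the object information."""
    return info & ((1 << width) - 1)


# Two copies of the same content, possibly stored on different computers.
copy_a = object_info(b"quarterly report")
copy_b = object_info(b"quarterly report")
other = object_info(b"holiday photo")

# Identical content yields identical object information, so both copies
# select the same portion and would be sent to the same computers.
same_portion = selected_portion(copy_a, 8) == selected_portion(copy_b, 8)
```

Unrelated objects can also share the same low-order bits, which is consistent with the abstract describing the result as locating potentially identical objects rather than confirmed duplicates.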
187 Citations
18 Claims
1. One or more computer-readable storage media having stored thereon a plurality of instructions that, when executed by one or more processors of a computer that is part of a plurality of computers in a network, cause the one or more processors to perform a method, the method comprising:
- selecting a portion of file information corresponding to a file stored on one of the plurality of computers, wherein the selected portion of the file information comprises a set of least significant bits of the file information, and wherein a size of the portion of the file information is based at least in part on a count of computers that the one computer is aware of in the network;
- comparing, for each of the plurality of computers, the selected portion to a portion of a computer identifier associated with the computer;
- identifying which of the computer identifiers have portions matching the selected portion of the file information; and
- communicating, for identification of potentially identical files stored on the plurality of computers, the file information to each of the computers associated with a computer identifier having a portion matching the selected portion of the file information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A system for identification of potentially identical files in the system, the system comprising:
- a memory;
- a processor;
- an interface electronically coupled to the processor and configured to allow the system to communicate with a plurality of other computers; and
- a forwarding location determination module, coupled to the interface, configured to perform a method, the method comprising:
  - identifying one or more of the plurality of computers to which to communicate file information corresponding to a file;
  - identifying a set of bits of file information associated with the file, wherein the set of bits of file information comprises a set of least significant bits of the file information, and wherein a size of the portion of the file information is based at least in part on a count of computers that the one computer is aware of in the network; and
  - identifying ones of the one or more computers that each have a computer identifier that has a set of bits that match the set of bits of the file information.
Dependent claims: 16, 17, 18
Specification