Locating potentially identical objects across multiple computers based on stochastic partitioning of workload
1 Assignment
0 Petitions
Abstract
Potentially identical objects (e.g., files) are located across multiple computers based on stochastic partitioning of workload. For each of a plurality of objects stored on a plurality of computers in a network, a portion of object information corresponding to the object is selected. The object information can be generated in a variety of manners (e.g., based on hashing the object, based on characteristics of the object, and so forth). Any of a variety of portions of the object information can be used (e.g., the least significant bits of the object information). A stochastic partitioning process is then used to identify which of the plurality of computers to communicate the object information to for identification of potentially identical objects on the plurality of computers.
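The abstract says object information can come from hashing the object or from its characteristics, and that a portion such as the least significant bits is then used. The Python sketch below illustrates both options under stated assumptions. SHA-256, the name-plus-size choice of characteristics, and the helper names are all hypothetical, because the abstract names no hash function and no specific characteristics.

```python
import hashlib


def object_information(content: bytes) -> int:
    """Hypothetical object information: the SHA-256 digest of the object's bytes, as an integer."""
    # Big-endian, so the last digest byte becomes the least significant byte of the integer.
    return int.from_bytes(hashlib.sha256(content).digest(), "big")


def characteristic_information(name: str, size: int) -> int:
    """Hypothetical alternative: hash cheap characteristics (name and size) instead of contents."""
    return object_information(f"{name}\x00{size}".encode("utf-8"))


def partition_key(info: int, num_bits: int) -> int:
    """Return the num_bits least significant bits of the object information."""
    return info & ((1 << num_bits) - 1)
```

Identical contents always yield identical object information, so they always yield the same partition key and reach the same destination. Different contents share a key only by chance in the low bits. The characteristics-based variant is cheaper to compute but coarser. That coarseness fits the abstract's wording of *potentially* identical objects.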
179 Citations
25 Claims
1. A method, implemented in a computer that is one of a plurality of computers in a network, the method comprising:
grouping the plurality of computers into a plurality of groups, wherein the grouping is based at least in part on the number of the plurality of computers in the network of which the computer is aware;
selecting a portion of object information corresponding to an object, wherein the portion of object information comprises a set of least significant bits of the object information and a number of bits in the set of least significant bits is based at least in part on the number of other computers in the network of which the computer is aware; and
identifying to which of the plurality of computers to communicate the object information for identification of potentially identical objects stored on the plurality of computers, wherein the identifying is based at least in part on comparing at a bit level the selected portion of the object information to a portion of a computer identifier of one or more of the plurality of computers such that the bits in the portion of the computer identifier are compared to the bits in the selected portion of the object information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17.
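The bit-level comparison in claim 1 can be illustrated with a short Python sketch. The claim says only that the number of selected bits depends on how many computers are known. The rule used here is an assumption: ceil(log2(n)) bits, with a minimum of one. The helper names `lsb_count`, `select_lsbs`, and `forwarding_targets` are hypothetical. Object information and computer identifiers are modeled as plain integers.

```python
def lsb_count(num_known_computers: int) -> int:
    """Number of low-order bits to select (hypothetical rule: ceil(log2(n)), at least 1)."""
    if num_known_computers <= 1:
        return 1
    # For n >= 2, (n - 1).bit_length() equals ceil(log2(n)), computed exactly on integers.
    return (num_known_computers - 1).bit_length()


def select_lsbs(value: int, num_bits: int) -> int:
    """Keep only the num_bits least significant bits of value."""
    return value & ((1 << num_bits) - 1)


def forwarding_targets(object_info: int, computer_ids: list[int]) -> list[int]:
    """Return the identifiers whose low bits equal the selected low bits of the object information."""
    k = lsb_count(len(computer_ids))
    key = select_lsbs(object_info, k)
    # Bit-level comparison: the same k low bits of each computer identifier versus the object's key.
    return [cid for cid in computer_ids if select_lsbs(cid, k) == key]
```

For example, with five known computers the sketch selects three bits. An object whose information ends in `110` is therefore sent only to computers whose identifiers also end in `110`.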
18. One or more computer storage media having stored thereon a plurality of instructions that, when executed by one or more processors of a computer that is one of a plurality of computers in a network, causes the one or more processors to perform the following acts:
selecting ones of the plurality of computers to populate a plurality of groups, wherein the selecting is based at least in part on the number of computers in the network that the computer is aware of;
selecting a plurality of bits of file information corresponding to a file, wherein the plurality of bits of file information comprises a set of least significant bits of the file information and a number of bits in the set of least significant bits is based at least in part on the number of other computers in the network of which the computer is aware; and
identifying which of the selected ones of the plurality of computers to communicate the file information to for identification of potentially identical files on the plurality of computers, wherein the identifying is based at least in part on comparing the selected plurality of bits of the file information to a corresponding plurality of bits of a computer identifier of one or more of the selected ones of the plurality of computers.
Dependent claims: 19, 20, 21, 22, 23, 24.
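Claim 18 populates groups based on how many computers are known, but it does not give a formula. One way to picture it is sketched below in Python under assumptions that do not come from the claim. The sketch aims for roughly `n / target_group_size` groups, rounded down to a power of two. Each computer then goes into the group named by the low bits of its identifier. `target_group_size` and the helper names are hypothetical.

```python
from collections import defaultdict


def group_bits(num_known: int, target_group_size: int = 4) -> int:
    """Bits used to name groups (hypothetical rule: about num_known / target_group_size groups)."""
    desired_groups = max(1, num_known // target_group_size)
    # floor(log2(desired_groups)): the largest power of two not exceeding the desired count.
    return desired_groups.bit_length() - 1


def populate_groups(computer_ids: list[int], target_group_size: int = 4) -> dict[int, list[int]]:
    """Place each known computer in the group named by the low bits of its identifier."""
    k = group_bits(len(computer_ids), target_group_size)
    mask = (1 << k) - 1  # k == 0 gives mask 0, so every computer lands in group 0
    groups: dict[int, list[int]] = defaultdict(list)
    for cid in computer_ids:
        groups[cid & mask].append(cid)
    return dict(groups)
```

With 16 known identifiers and a target size of 4, the sketch forms four groups keyed `0` to `3`. If identifiers are assigned randomly, the groups are balanced only in expectation. That is one plausible reading of the "stochastic partitioning" in the title, not a statement of the patent's method.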
25. A computing device that facilitates locating potentially identical objects across a plurality of computing devices, wherein the computing device is one of the plurality of computing devices, the computing device comprising:
a processing unit;
a memory coupled to the processing unit;
a distributed file system interface connecting the computing device to a network, wherein the network comprises the plurality of computing devices;
a grouping module stored in the memory and executed on the processing unit to determine a plurality of groups of the plurality of computing devices, wherein criteria for determining the plurality of groups comprises:
determining a number of computing devices of which the computing device is aware; and
determining a number of the plurality of groups of the plurality of computing devices as a function of the number of computing devices of which the computing device is aware;
a file information generation module stored in the memory and executed on the processing unit to generate object information for use in locating potentially identical objects across the plurality of computing devices;
a forwarding location determination module stored in the memory and executed on the processing unit to perform a bit level comparison of the object information and a computer identifier representing a particular one of the plurality of computing devices; and
in an event that the object information matches the computer identifier:
identifying a particular one of the plurality of groups to which the particular one of the plurality of computing devices belongs;
facilitating each of the plurality of computing devices that belongs to the particular one of the plurality of groups to determine potentially identical objects.
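Claim 25 describes a flow in which object information goes to the group whose identifiers match, and that group's members then determine potentially identical objects. The Python sketch below simulates this flow in a single process. It rests on several assumptions. Reports are modeled as `(device_id, path, object_information)` tuples with integer object information, and `k` is the number of low bits used for matching. The network is replaced by an in-memory inbox per group. `group_for` and `find_potential_duplicates` are hypothetical names.

```python
from collections import defaultdict
from collections.abc import Iterable


def group_for(object_info: int, k: int) -> int:
    """Group key: the k low bits that the object information shares with the matching identifiers."""
    return object_info & ((1 << k) - 1)


def find_potential_duplicates(
    reports: Iterable[tuple[int, str, int]], k: int
) -> dict[int, list[tuple[int, str]]]:
    """Forward each report to its group, then let each group collate identical object information."""
    # inbox[group_key][object_info] -> list of (device_id, path) locations reported to that group
    inbox: defaultdict = defaultdict(lambda: defaultdict(list))
    for device_id, path, info in reports:
        inbox[group_for(info, k)][info].append((device_id, path))

    duplicates: dict[int, list[tuple[int, str]]] = {}
    for by_info in inbox.values():
        for info, locations in by_info.items():
            # Only full matches of the object information count as potentially identical.
            if len(locations) > 1:
                duplicates[info] = sorted(locations)
    return duplicates
```

Landing in the same group is necessary but not sufficient. Two objects whose information shares only the low `k` bits reach the same group, but they are reported only when their full object information matches.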
Specification