Locating potentially identical objects across multiple computers based on stochastic partitioning of workload
Abstract
Potentially identical objects (e.g., files) are located across multiple computers based on stochastic partitioning of workload. For each of a plurality of objects stored on a plurality of computers in a network, a portion of object information corresponding to the object is selected. The object information can be generated in a variety of manners (e.g., based on hashing the object, based on characteristics of the object, and so forth). Any of a variety of portions of the object information can be used (e.g., the least significant bits of the object information). A stochastic partitioning process is then used to identify which of the plurality of computers to communicate the object information to for identification of potentially identical objects on the plurality of computers.
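The selection and matching steps described in the abstract can be illustrated with a minimal sketch. It assumes that the object information is a SHA-256 digest of the file contents and that each computer has an integer identifier; the hash choice, the function names, and the bit width are hypothetical and are not taken from the patent.

```python
import hashlib


def object_fingerprint(data: bytes) -> int:
    """Hypothetical object information: SHA-256 of the contents as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def low_bits(value: int, width: int) -> int:
    """Return the `width` least significant bits of `value`."""
    return value & ((1 << width) - 1)


def recipients(fingerprint: int, machine_ids: list[int], width: int) -> list[int]:
    """Machines whose identifier matches the fingerprint in its low `width` bits.

    Assumption: the comparison is an exact match on the least significant
    bits, which is only one of the portions the abstract allows.
    """
    target = low_bits(fingerprint, width)
    return [m for m in machine_ids if low_bits(m, width) == target]
```

Under these assumptions, two computers holding byte-identical files compute the same fingerprint and therefore send it to the same recipients, which can then notice the match.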
127 Citations
27 Claims
1. A method, implemented in a computer that is one of a plurality of computers in a network, the method comprising:
- grouping, into a plurality of groups, selected ones of the plurality of computers, wherein the grouping is based at least in part on the number of the plurality of computers in the network that the computer is aware of;
- selecting a portion of object information corresponding to an object; and
- identifying which of the selected ones of the plurality of computers to communicate the object information to for identification of potentially identical objects stored on the plurality of computers, wherein the identifying is based at least in part on comparing the selected portion of the object information to a portion of a computer identifier of one or more of the selected ones of the plurality of computers.
Dependent claims: 2-20
21. One or more computer-readable media having stored thereon a plurality of instructions that, when executed by one or more processors of a computer that is one of a plurality of computers in a network, causes the one or more processors to perform the following acts:
- selecting ones of the plurality of computers to populate a plurality of groups, wherein the selecting is based at least in part on the number of computers in the network that the computer is aware of;
- selecting a plurality of bits of file information corresponding to a file; and
- identifying which of the selected ones of the plurality of computers to communicate the file information to for identification of potentially identical files on the plurality of computers, wherein the identifying is based at least in part on comparing the selected plurality of bits of the file information to a corresponding plurality of bits of a computer identifier of one or more of the selected ones of the plurality of computers.
Dependent claims: 22-27
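The grouping element of the independent claims, where group membership depends on how many computers the local computer is aware of, might be sketched as follows. The rule used here (pick the largest bit width that still leaves about `redundancy` members per group, then group by the low bits of the identifier) is an illustrative assumption; the claims do not state a formula, and `choose_width`, `group_machines`, and `redundancy` are hypothetical names.

```python
from collections import defaultdict


def choose_width(known_count: int, redundancy: int = 2) -> int:
    """Largest width such that known_count // 2**width is at least `redundancy`.

    Assumption: more known computers means more bits compared, hence
    more and smaller groups. Returns 0 when too few computers are known.
    """
    width = 0
    while known_count // (2 ** (width + 1)) >= redundancy:
        width += 1
    return width


def group_machines(machine_ids: list[int], redundancy: int = 2):
    """Group known machines by the low `width` bits of their identifiers."""
    width = choose_width(len(machine_ids), redundancy)
    mask = (1 << width) - 1
    groups = defaultdict(list)
    for machine in machine_ids:
        groups[machine & mask].append(machine)
    return width, dict(groups)
```

With 16 known computers numbered 0 to 15 and the default `redundancy`, this sketch compares 3 bits and forms 8 groups of 2. A computer that knows of fewer peers would compare fewer bits and form larger groups.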
Specification