Locating potentially identical objects across multiple computers based on stochastic partitioning of workload
First Claim
1. A method implemented on a processor coupled to a memory, comprising:
determining a size for an imprint identification code of an object stored at a computer, wherein the size of the imprint identification code is based at least in part on a count of other computers in a network coupled to the computer, wherein each of the other computers in the network have communicated to the said computer information indicating that each other computer is actively available in the network;
identifying a particular imprint identification code based on the size for each object stored at the computer, wherein the imprint identification code is a calculated set of bits of object information corresponding to the object and derived from the object;
in an event that the object information is based on the data in the object, the object information is a semi-unique value based at least in part on the data in the object;
in an event that the object information is based on characteristics of the object, the object information is a value based at least in part on one or more characteristics of the object;
accessing the imprint identification code to compute a mapping that maps the imprint identification code to one or more computer identifiers associated with one or more computers that are active on the network, wherein the one or more computer identifiers have the same set of bits as the imprint identification code;
identifying one or more computers to receive the object information based at least in part on the mapping; and
sending the object information to each of the identified one or more computers.
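The steps of claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the imprint size is assumed to be ceil(log2(count)) of the active computers (the claim says only that the size depends on the count), the object information is assumed to be a SHA-256 hash of the object's data, and the imprint is assumed to be the least significant bits of that value.

```python
import hashlib


def imprint_size(active_peer_count):
    """Number of bits in the imprint identification code.

    Assumption: ceil(log2(count)) with a minimum of 1 bit. The claim only
    requires that the size depend on the count of active computers.
    """
    return max(1, (active_peer_count - 1).bit_length())


def object_information(data):
    """Semi-unique value derived from the object's data (assumed: SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def imprint(info, size):
    """Take the `size` least significant bits of the object information."""
    return info & ((1 << size) - 1)


def recipients(code, size, active_ids):
    """Computers whose identifier has the same low-order bits as the imprint."""
    mask = (1 << size) - 1
    return [cid for cid in active_ids if cid & mask == code]


def route(data, active_ids):
    """Return the object information and the computers that should receive it."""
    size = imprint_size(len(active_ids))
    info = object_information(data)
    return info, recipients(imprint(info, size), size, active_ids)
```

With eight active computers numbered 0 to 7, the imprint is three bits wide and every object maps to exactly one recipient, so identical objects stored on different computers are always reported to the same machine.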
Abstract
Potentially identical objects (e.g., files) are located across multiple computers based on stochastic partitioning of workload. For each of a plurality of objects stored on a plurality of computers in a network, a portion of object information corresponding to the object is selected. The object information can be generated in a variety of manners (e.g., based on hashing the object, based on characteristics of the object, and so forth). Any of a variety of portions of the object information can be used (e.g., the least significant bits of the object information). A stochastic partitioning process is then used to identify which of the plurality of computers to communicate the object information to for identification of potentially identical objects on the plurality of computers.
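To illustrate why the partitioning helps locate potentially identical objects, the following sketch simulates several computers exchanging object information. It is a toy model under assumptions not stated in the abstract: `find_potential_duplicates` and the `stores` layout are hypothetical, object information is a SHA-256 hash of the contents, and the selected portion is the `bits` least significant bits, matched against the low-order bits of each computer's identifier.

```python
import hashlib
from collections import defaultdict


def object_info(data):
    """Object information derived by hashing the object (assumed: SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def find_potential_duplicates(stores, bits):
    """Simulate the exchange and return groups of potentially identical objects.

    `stores` maps a hypothetical computer id to {file name: contents}.
    """
    mask = (1 << bits) - 1
    # inbox[recipient][object information] -> set of (sender id, file name)
    inbox = defaultdict(lambda: defaultdict(set))
    for sender, files in stores.items():
        for name, data in files.items():
            info = object_info(data)
            # Send only to computers whose id shares the selected bits.
            for recipient in stores:
                if recipient & mask == info & mask:
                    inbox[recipient][info].add((sender, name))
    matches = []
    for records in inbox.values():
        for holders in records.values():
            # The same object information from two or more senders marks
            # the objects as potentially identical.
            if len({sender for sender, _ in holders}) > 1:
                matches.append(sorted(holders))
    return matches


stores = {
    0: {"a.txt": b"same contents"},
    1: {"b.txt": b"same contents", "c.txt": b"different"},
    2: {},
    3: {},
}
print(find_potential_duplicates(stores, bits=2))
```

Each computer compares only the records routed to it, so the comparison work is spread across the network instead of every computer comparing against every other.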
8 Claims
Specification