Locating potentially identical objects across multiple computers based on stochastic partitioning of workload
Abstract
Potentially identical objects (e.g., files) are located across multiple computers based on stochastic partitioning of workload. For each of a plurality of objects stored on a plurality of computers in a network, a portion of object information corresponding to the object is selected. The object information can be generated in a variety of manners (e.g., based on hashing the object, based on characteristics of the object, and so forth). Any of a variety of portions of the object information can be used (e.g., the least significant bits of the object information). A stochastic partitioning process is then used to identify which of the plurality of computers to communicate the object information to for identification of potentially identical objects on the plurality of computers.
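As a rough illustration of the first step the abstract describes, the sketch below derives object information by hashing an object and then keeps only its least significant bits. SHA-256, the function names, and the bit widths are assumptions made for this example; the abstract says only that hashing is one way to generate object information and that the least significant bits are one possible portion.

```python
import hashlib


def object_information(data: bytes) -> int:
    """Hypothetical object information: the SHA-256 digest read as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def select_portion(info: int, width: int) -> int:
    """Keep the `width` least significant bits of the object information."""
    if width < 0:
        raise ValueError("width must be non-negative")
    return info & ((1 << width) - 1)


# Identical content always yields the same portion, so copies of one
# object are steered toward the same computers for comparison.
first_copy = select_portion(object_information(b"same bytes"), 4)
second_copy = select_portion(object_information(b"same bytes"), 4)
```

Because the portion depends only on the object's content, two computers holding the same file compute the same value without coordinating with each other.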
131 Citations
97 Claims
1. A method comprising:
selecting, for each of a plurality of objects stored on a plurality of computers in a network, a portion of object information corresponding to the object;
using a stochastic partitioning process to identify which of the plurality of computers to communicate the object information to for identification of potentially identical objects on the plurality of computers.
Dependent claims: 2, 3, 4, 5

6. One or more computer-readable media having stored thereon a plurality of instructions that, when executed by one or more processors of a computer that is part of a plurality of computers in a network, causes the one or more processors to perform the following acts:
selecting a portion of file information corresponding to a file stored on one of the plurality of computers;
comparing, for each of the plurality of computers, the selected portion to a portion of a computer identifier associated with the computer;
identifying which of the computer identifiers have portions matching the selected portion of the file information; and
communicating, for identification of potentially identical files stored on the plurality of computers, the file information to each of the computers associated with a computer identifier having a portion matching the selected portion of the file information.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
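The compare-and-identify acts of claim 6 can be pictured with a short sketch. It assumes, purely for illustration, that both the file information and the computer identifiers are integers and that the compared portion is the same number of low-order bits in each; the claim itself does not fix which bits are used, and `matching_computers` is a hypothetical name.

```python
def matching_computers(file_info: int, computer_ids: list[int], width: int) -> list[int]:
    """Return the identifiers whose low `width` bits equal those of `file_info`.

    Assumption: the "portion" on both sides is the `width` least
    significant bits. Other bit selections would work the same way.
    """
    mask = (1 << width) - 1
    selected = file_info & mask
    # Compare the selected portion against every known computer identifier.
    return [cid for cid in computer_ids if (cid & mask) == selected]


# Hypothetical 4-bit identifiers for four computers.
ids = [0b0001, 0b0101, 0b0110, 0b1001]
targets = matching_computers(0b1101, ids, width=2)
```

In this example the low two bits of the file information are `01`, so the file information would go to the computers with identifiers `0b0001`, `0b0101`, and `0b1001`, but not to `0b0110`.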

23. A method comprising:
generating an imprint for an object stored at a computer, wherein the imprint comprises a first set of bits of object information corresponding to the object;
identifying one or more additional computers each having a computer identifier that includes a second set of bits that match the imprint; and
sending the object information to each of the identified one or more additional computers.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32

33. A system comprising:
an interface configured to allow the system to communicate with a plurality of other computers; and
a forwarding location determination module, coupled to the interface, configured to identify one or more of the plurality of computers to communicate file information corresponding to a file to, for identification of potentially identical files in the system, by, identifying a set of bits of file information associated with a file, and identifying ones of the one or more computers that each have a computer identifier that has a set of bits that match the set of bits of the file information.
Dependent claims: 34, 35, 36

37. A method comprising:
identifying an imprint for an object stored at a computer, wherein the imprint is a set of bits of object information corresponding to the object;
accessing an imprint to computer mapping;
identifying one or more computers to receive the object information based at least in part on the accessed mapping; and
sending the object information to at least one of the identified one or more computers.
Dependent claims: 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49
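Claim 37 replaces bit matching against identifiers with a lookup in an imprint-to-computer mapping. A minimal sketch of that flow follows; the dictionary contents, the computer names, the choice of low-order bits as the imprint, and the `send` callback are all hypothetical stand-ins, since the claim does not say how the mapping is stored or how information is transmitted.

```python
from typing import Callable


def forward_by_mapping(
    object_info: int,
    mapping: dict[int, list[str]],
    width: int,
    send: Callable[[str, int], None],
) -> list[str]:
    """Look up the imprint in the mapping and send the object information.

    Assumption: the imprint is the `width` least significant bits.
    Returns the computers the information was sent to.
    """
    imprint = object_info & ((1 << width) - 1)
    recipients = mapping.get(imprint, [])
    for computer in recipients:
        send(computer, object_info)
    return recipients


# Hypothetical mapping from 2-bit imprints to computer names.
imprint_map = {0b00: ["alpha"], 0b01: ["bravo", "charlie"], 0b10: ["delta"]}
sent: list[tuple[str, int]] = []
chosen = forward_by_mapping(0b1101, imprint_map, 2, lambda c, info: sent.append((c, info)))
```

Keeping the mapping separate from the identifiers means the assignment of imprints to computers can be changed without renaming any computer, which is one plausible reason to prefer a lookup over direct bit matching.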

50. One or more computer-readable media having stored thereon a plurality of instructions that, when executed by one or more processors of one of a plurality of computers in a network, causes the one or more processors to perform the following acts:
selecting a portion of file information corresponding to a file;
identifying a mapping of the portion to one or more computers; and
communicating the file information to each of the identified one or more computers for identification of potentially identical files on the one or more computers.
Dependent claims: 51, 52, 53, 54, 55, 56

57. A system comprising:
an interface configured to allow the system to communicate with a plurality of other computers; and
a forwarding location determination module, coupled to the interface, configured to identify one or more of the plurality of other computers to communicate file information corresponding to a file to for identification of potentially identical files stored on the plurality of other computers by accessing a mapping of a portion of the file information to one or more computers.
Dependent claims: 58, 59, 60, 61

62. A method comprising:
receiving file information corresponding to a file stored at a computer;
comparing the received file information to a file information database;
checking whether the received file information matches any of the file information in the database;
determining that two potentially identical files exist if the received file information matches any of the file information in the database; and
forwarding the received file information to another computer for storage in a file information database at the other computer.
Dependent claims: 63
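The receiving side described in claim 62 can be sketched as a small class. Everything concrete here is an assumption for illustration: the in-memory dictionary standing in for the file information database, the `Receiver` name, and the direct method call that stands in for forwarding over a network.

```python
from typing import Optional


class Receiver:
    """Hypothetical computer that collects file information from others."""

    def __init__(self, peer: Optional["Receiver"] = None) -> None:
        # File information database: file info -> computers that reported it.
        self.database: dict[int, set[str]] = {}
        # Another computer that keeps its own copy of the database.
        self.peer = peer

    def receive(self, file_info: int, source: str) -> list[str]:
        """Record file info and return other computers that hold a match."""
        holders = self.database.setdefault(file_info, set())
        # A match from a different computer signals potentially identical files.
        potential_duplicates = sorted(holders - {source})
        holders.add(source)
        # Forward so the other computer can store it in its database.
        if self.peer is not None:
            self.peer.receive(file_info, source)
        return potential_duplicates


backup = Receiver()
primary = Receiver(peer=backup)
first = primary.receive(0xABC, "pc1")   # nothing on record yet
second = primary.receive(0xABC, "pc2")  # matches the record from pc1
```

A match only indicates that two files are potentially identical; a later step outside this sketch would still have to confirm it.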

64. One or more computer-readable media having stored thereon a plurality of instructions that, when executed by one or more processors, causes the one or more processors to perform the following acts:
receiving, from a requesting computer, a request for an imprint to computer mapping, wherein the imprint comprises a portion of object information corresponding to an object;
accessing a mapping database to identify one or more computers associated with the imprint; and
returning an identification of at least one of the one or more computers to the requesting computer.
Dependent claims: 65, 66, 67, 68, 69

70. A method, implemented in a computer that is one of a plurality of computers in a network, the method comprising:
grouping, into a plurality of groups, selected ones of the plurality of computers, wherein the grouping is based at least in part on the number of the plurality of computers in the network that the computer is aware of;
selecting a portion of object information corresponding to an object; and
identifying which of the selected ones of the plurality of computers to communicate the object information to for identification of potentially identical objects stored on the plurality of computers, wherein the identifying is based at least in part on comparing the selected portion of the object information to a portion of a computer identifier of one or more of the selected ones of the plurality of computers.
Dependent claims: 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89
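Claim 70 ties the grouping to how many computers the local computer knows about, without stating a rule in the text above. The sketch below uses one hypothetical rule: pick the largest bit width that still leaves, on average, a target number of computers per group, then group by the low-order bits of each identifier. The rule, the names `choose_width` and `group_computers`, and the target size are assumptions, not the claimed method.

```python
from collections import defaultdict


def choose_width(known_count: int, target_group_size: int) -> int:
    """Largest width where known_count / 2**width >= target_group_size."""
    if target_group_size < 1:
        raise ValueError("target_group_size must be at least 1")
    width = 0
    while (known_count >> (width + 1)) >= target_group_size:
        width += 1
    return width


def group_computers(computer_ids: list[int], width: int) -> dict[int, list[int]]:
    """Group identifiers by their `width` least significant bits."""
    mask = (1 << width) - 1
    groups: dict[int, list[int]] = defaultdict(list)
    for cid in computer_ids:
        groups[cid & mask].append(cid)
    return dict(groups)


def recipients(object_info: int, groups: dict[int, list[int]], width: int) -> list[int]:
    """Computers whose group key equals the selected portion of object info."""
    return groups.get(object_info & ((1 << width) - 1), [])


known = list(range(8))                       # eight known computers, IDs 0..7
width = choose_width(len(known), target_group_size=2)
groups = group_computers(known, width)
targets = recipients(0b1101, groups, width)
```

With eight known computers and a target of two per group, this rule settles on two bits, giving four groups; an object whose information ends in `01` is sent to the computers with identifiers 1 and 5. As more computers become known the width grows, so each group stays roughly the same size.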

90. One or more computer-readable media having stored thereon a plurality of instructions that, when executed by one or more processors of a computer that is one of a plurality of computers in a network, causes the one or more processors to perform the following acts:
selecting ones of the plurality of computers to populate a plurality of groups, wherein the selecting is based at least in part on the number of computers in the network that the computer is aware of;
selecting a plurality of bits of file information corresponding to a file; and
identifying which of the selected ones of the plurality of computers to communicate the file information to for identification of potentially identical files on the plurality of computers, wherein the identifying is based at least in part on comparing the selected plurality of bits of the file information to a corresponding plurality of bits of a computer identifier of one or more of the selected ones of the plurality of computers.
Dependent claims: 91, 92, 93, 94, 95, 96

97. A system, coupled to a plurality of computers, the system comprising:
an interface configured to allow the system to communicate with the plurality of computers; and
a forwarding location determination module, coupled to the interface, configured to identify one or more of the plurality of computers to communicate the file information for a file to, for identification of potentially identical files stored on the plurality of computers by, grouping, into a plurality of groups, selected ones of the plurality of computers, wherein the grouping is based at least in part on the number of the plurality of computers in the network that the system is aware of, selecting a portion of the file information, and comparing the selected portion of the file information to a portion of a computer identifier of one or more of the selected ones of the plurality of computers.

Specification