Locating potentially identical objects across multiple computers
Abstract
Potentially identical objects (such as files) across multiple computers are located. In one embodiment, a computer generates object information for an object stored on the computer. The object information can be generated in a variety of manners (e.g., based on hashing the object, based on characteristics of the object, and so forth). The object information is then transferred to one or more database server computers, where the object information can be compared to object information from other computers to determine whether the object is potentially identical to another object on one of the other computers.
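As a rough illustration of the flow described in the abstract, the sketch below derives object information by hashing and lets a server group reports from several computers. SHA-256 and the names `object_info` and `potential_duplicates` are assumptions chosen for illustration; the abstract does not prescribe a specific hash or data layout.

```python
import hashlib
from collections import defaultdict

def object_info(data: bytes) -> str:
    """Derive object information from the object's contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()

def potential_duplicates(reports):
    """Group (computer, path) locations by object information.

    `reports` maps a computer name to {path: object_info}. Only the
    information reaches the server, never the objects themselves.
    """
    locations = defaultdict(list)
    for computer, infos in reports.items():
        for path, info in infos.items():
            locations[info].append((computer, path))
    # Matching information at two or more locations marks potentially identical objects.
    return {info: locs for info, locs in locations.items() if len(locs) > 1}

reports = {
    "pc1": {"/docs/a.txt": object_info(b"same bytes")},
    "pc2": {"/tmp/copy.txt": object_info(b"same bytes"),
            "/tmp/other.txt": object_info(b"different bytes")},
}
groups = potential_duplicates(reports)
```

A match on the hash only indicates that the objects are potentially identical; a byte-for-byte check would still be needed to confirm it.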
33 Claims
1. A method, implemented in a computer, the method comprising:
generating file information for a file at the computer, wherein the file information is different from the file and based at least in part on data in the file, and wherein the generating comprises generating file information only for files stored at the computer that satisfy certain criteria;
transferring the file information to a database server computer, wherein the file information is to be compared to file information from one or more other computers to determine whether the file is potentially identical to another file on one of the one or more other computers, wherein said transferring does not transfer the file associated with said file information;
associating, with the generated file information, a time to live component that identifies to how many additional database server computers the file information can be communicated, wherein the time to live component limits the number of additional database server computers to which the file information can be communicated; and
causing removal, from at least one computer, of one or more files determined to be identical to another file and causing a pointer to said another file to be set up in place of removed files.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
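The client-side steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the size-based criterion (`min_size`), the SHA-256 digest, the default time to live of 2, and the names `FileInfo`, `Pointer`, `generate_file_info`, and `replace_with_pointer` are all hypothetical, and an in-memory dictionary stands in for the computer's file storage.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class FileInfo:
    digest: str  # based on the file's data, but not the file itself
    ttl: int     # additional database servers that may receive this record

@dataclass(frozen=True)
class Pointer:
    computer: str  # where the retained copy lives
    path: str

def generate_file_info(files, min_size=4, ttl=2):
    """Build {path: FileInfo} only for files meeting the size criterion (hypothetical)."""
    return {
        path: FileInfo(hashlib.sha256(data).hexdigest(), ttl)
        for path, data in files.items()
        if isinstance(data, bytes) and len(data) >= min_size
    }

def replace_with_pointer(files, path, target):
    """Drop the local copy and leave a pointer to the retained file in its place."""
    files[path] = target

files = {"report.doc": b"quarterly numbers", "x": b"hi"}
info = generate_file_info(files)  # "x" is too small and is skipped
# Assume a server later confirmed report.doc is identical to a file on pc2.
replace_with_pointer(files, "report.doc", Pointer("pc2", "/shared/report.doc"))
```

Only the `FileInfo` records would be sent to the database server; the file contents stay on the computer until a duplicate is confirmed and replaced.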
18. A method, implemented in a database server computer, comprising:
receiving file information from a computer only for files stored at the computer that satisfy certain criteria, wherein said file information is different from the files and based on a hash value which is based, at least in part, on convergent encryption, wherein the file information does not comprise the files themselves;
comparing the file information to other file information, received from one or more other computers, corresponding to other files;
determining that a file corresponding to the received file information is potentially identical to one of the other files if the received file information is the same as any of the other file information; and
communicating the received file information to one or more other database server computers only if a time to live component associated with the received information indicates the file information can be communicated to at least one more database server computer, wherein the time to live component identifies to how many additional database server computers the received file information can be communicated, and wherein the time to live component limits the number of additional database server computers to which the received file information can be communicated.
Dependent claims: 19, 20, 21, 22, 23
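The server-side behavior in claim 18 (compare, then forward only while the time to live permits) might look like the sketch below. The `DatabaseServer` class, its single `next_server` link, and the decrement-by-one rule are assumptions made for illustration; the claim only requires that the time to live component limit how many additional database servers receive the information.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DatabaseServer:
    name: str
    next_server: Optional["DatabaseServer"] = None  # hypothetical forwarding target
    known: dict = field(default_factory=dict)       # file info -> set of source computers

    def receive(self, info: str, source: str, ttl: int) -> bool:
        """Store `info`, report whether another computer already sent the same
        information, and forward it only if the time to live allows."""
        sources = self.known.setdefault(info, set())
        potentially_identical = any(s != source for s in sources)
        sources.add(source)
        if ttl > 0 and self.next_server is not None:
            # One more server receives the information, so the budget shrinks by one.
            self.next_server.receive(info, source, ttl - 1)
        return potentially_identical

c = DatabaseServer("c")
b = DatabaseServer("b", next_server=c)
a = DatabaseServer("a", next_server=b)
first = a.receive("abc123", source="pc1", ttl=1)   # reaches a and b, but not c
second = a.receive("abc123", source="pc2", ttl=0)  # stays on a
```

With a time to live of 1, the information reaches exactly one additional server beyond the one that first received it.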
24. A system comprising:
a storage device to store a plurality of files;
a file information generation module to generate file information for one or more of the plurality of files, wherein said file information is different from the one or more of the plurality of files and based on a hash value which is based, at least in part, on convergent encryption, and wherein the file information does not comprise the files themselves; and
a forwarding location determination module, coupled to the file information generation module, to determine one or more other systems to which all of the generated file information is to be communicated, wherein the forwarding location determination module is further to associate a time to live component with each generated file information, the time to live component identifying to how many other systems the generated file information can be communicated, and wherein the time to live component limits the number of other systems to which the generated file information can be communicated.
Dependent claims: 25, 26, 27, 28, 29, 30
31. A method, implemented in a computer, the method comprising:
generating, for each of a plurality of files stored at the computer, file information, wherein the file information is different than the plurality of files and is a semi-unique value based at least in part on the data in the file, wherein the file information does not comprise the files themselves;
receiving, from another computer, a plurality of file information corresponding to a plurality of files stored at the other computer;
comparing the received file information to the generated file information;
determining that a file on the computer is potentially identical to a file on the other computer if any of the received file information match any of the generated file information;
transferring the received file information to a third computer, wherein the received file information is to be compared, at the third computer, to file information corresponding to files stored at the third computer;
maintaining a time to live component corresponding to one or more of the received file information, wherein the time to live component limits the number of computers to which the received file information can be communicated;
wherein the transferring the received file information to the third computer comprises transferring the one or more of the received file information to the third computer only if the time to live component exceeds a threshold amount; and
causing removal, from at least one computer, of one or more files determined to be identical to another file and causing a pointer to said another file to be set up in place of removed files.
Dependent claims: 32, 33
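The peer-to-peer variant in claim 31 compares received file information against locally generated information and passes it on only when the time to live exceeds a threshold. A compact sketch, assuming file information is held in Python sets and that the time to live is decremented on each transfer (both assumptions, as is the function name):

```python
def compare_and_forward(generated, received, ttl, threshold=0):
    """Return (matches, forward) for file information received from another computer.

    `generated` and `received` are sets of file-information values. `forward` is
    (received, ttl - 1) for the third computer, or None when the time to live
    does not exceed the threshold.
    """
    matches = generated & received  # potentially identical files
    forward = (received, ttl - 1) if ttl > threshold else None
    return matches, forward

matches, forward = compare_and_forward({"h1", "h2"}, {"h2", "h3"}, ttl=1)
_, stopped = compare_and_forward({"h1", "h2"}, {"h2", "h3"}, ttl=0)
```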
Specification