Information source agent systems and methods for distributed data storage and management using content signatures
Abstract
Information source agent systems and methods for distributed content storage and management using content signatures that use file identicality properties are provided. A data management system is provided that includes a content engine for managing the storage of file content, a content signature generator that generates a unique content signature for a file processed by the content engine, a content signature comparator that compares content signatures and a content signature repository that stores content signatures. Information source agents are provided that include content signature generators and content signature comparators. Methods are provided for the efficient management of files using content signatures that take advantage of file identicality properties. Content signature application modules and registries exist within information source clients and centralized servers to support the content signature methods.
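The generator, comparator, and repository named in the abstract can be illustrated with a minimal sketch. The abstract does not say how a content signature is computed, so the use of SHA-256 here is an assumption, and the names `generate_signature`, `signatures_match`, and `SignatureRepository` are hypothetical. A cryptographic hash is only unique in the practical sense that collisions are extremely unlikely.

```python
import hashlib


def generate_signature(content: bytes) -> str:
    """Content signature generator (assumption: a SHA-256 hex digest)."""
    return hashlib.sha256(content).hexdigest()


def signatures_match(sig_a: str, sig_b: str) -> bool:
    """Content signature comparator: equal signatures are treated as identical content."""
    return sig_a == sig_b


class SignatureRepository:
    """Content signature repository: maps each signature to the paths that carry it."""

    def __init__(self):
        self._paths_by_signature = {}

    def register(self, path: str, content: bytes) -> bool:
        """Record a file; return True if its content has not been seen before."""
        signature = generate_signature(content)
        is_new = signature not in self._paths_by_signature
        self._paths_by_signature.setdefault(signature, []).append(path)
        return is_new

    def paths_for(self, content: bytes) -> list:
        """Return every registered path whose content is identical to `content`."""
        return list(self._paths_by_signature.get(generate_signature(content), []))
```

In this sketch, a caller that receives `False` from `register` knows that identical content is already held and could skip storing or transferring the file body again.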
18 Claims
1. A method comprising:
generating a file content index for a file received at a computing device from an information source client;
comparing the file content index to stored file content indices to determine a similarity between the file content index and the stored file content indices and to determine whether the similarity exceeds a similarity threshold;
comparing, if the similarity exceeds the similarity threshold, differences between the received file and files that have the file content indices that exceed the similarity threshold;
determining, from among the files that have the file content indices that exceed the similarity threshold, a closest match file to the received file; and
creating a delta file of the differences between the received file and the closest match file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer readable medium having stored thereon in digital form computer-executable instructions that, in response to execution by a computing device, cause the computing device to perform operations for storing file information in an indexed archive system, the operations comprising:
generating a file content index for a file received at the computing device from an information source client;
comparing the file content index to stored file content indices to determine a similarity between the file content index and the stored file content indices and to determine whether the similarity exceeds a similarity threshold;
comparing, if the similarity exceeds the similarity threshold, differences between the received file and files that have the file content indices that exceed the similarity threshold;
determining, from among the files that have the file content indices that exceed the similarity threshold, a closest match file to the received file; and
creating a delta file of the differences between the received file and the closest match file.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
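The five steps recited in claims 1 and 10 can be sketched roughly as follows. The claims do not define the file content index, the similarity measure, or the delta format, so everything concrete here is an assumption made for illustration: the index is a set of hashed 8-byte shingles, similarity is Jaccard overlap, differences are compared with Python's `difflib.SequenceMatcher`, and the delta is a list of copy/insert operations. The names `content_index`, `similarity`, `make_delta`, `apply_delta`, and `store_file` are hypothetical.

```python
import difflib
import hashlib

SHINGLE = 8  # assumed shingle width in bytes


def content_index(data: bytes) -> frozenset:
    """Assumed file content index: hashes of overlapping byte shingles."""
    if len(data) <= SHINGLE:
        return frozenset({hashlib.sha1(data).digest()[:8]})
    return frozenset(
        hashlib.sha1(data[i:i + SHINGLE]).digest()[:8]
        for i in range(len(data) - SHINGLE + 1)
    )


def similarity(a: frozenset, b: frozenset) -> float:
    """Assumed similarity measure: Jaccard overlap of two indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def make_delta(base: bytes, new: bytes) -> list:
    """Assumed delta format: copy ranges from `base`, insert literal bytes from `new`."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" emits nothing: those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the received file from the closest match file and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def store_file(received: bytes, archive: dict, indices: dict, threshold: float = 0.5):
    """Return (closest_name, delta), or (None, None) if nothing exceeds the threshold."""
    index = content_index(received)                      # step 1: generate index
    candidates = [name for name, stored in indices.items()
                  if similarity(index, stored) > threshold]  # step 2: threshold test
    if not candidates:
        return None, None

    def closeness(name):                                 # step 3: compare differences
        return difflib.SequenceMatcher(
            None, archive[name], received, autojunk=False).ratio()

    closest = max(candidates, key=closeness)             # step 4: closest match file
    return closest, make_delta(archive[closest], received)  # step 5: delta file
```

The two-stage design in this sketch mirrors the claim structure: the cheap index comparison narrows the stored files to a few candidates, and the more expensive byte-level comparison runs only on those candidates.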
Specification