Data management systems and methods for distributed data storage and management using content signatures
Abstract
Data management systems and methods are provided for distributed content storage and management using content signatures that exploit file identicality properties. The data management system includes a content engine for managing the storage of file content, a content signature generator that generates a unique content signature for a file processed by the content engine, a content signature comparator that compares content signatures, and a content signature repository that stores content signatures. Methods are provided for the efficient management of files using content signatures that take advantage of file identicality properties. Content signature application modules and registries reside within information source clients and centralized servers to support the content signature methods.
20 Claims
1. A data management system, comprising:
a content engine for managing the storage of file content;
a content signature generator that generates a content signature for a file processed by the content engine; and
a content signature comparator that compares file content signatures to determine how to process the files with the associated file content signatures.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
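The generator and comparator of claim 1 can be illustrated with a minimal sketch. The claim does not name a signature algorithm or the processing decisions, so SHA-256 and the two decision strings below are assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib


def generate_signature(content: bytes) -> str:
    """Content signature generator: hash the file bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def compare_signatures(sig_a: str, sig_b: str) -> str:
    """Content signature comparator: decide how to process the second file.

    The two decision labels are hypothetical; the claim leaves the actions open.
    """
    return "reuse_existing_content" if sig_a == sig_b else "store_new_content"


sig_original = generate_signature(b"quarterly report")
sig_copy = generate_signature(b"quarterly report")
sig_revised = generate_signature(b"quarterly report v2")

decision_copy = compare_signatures(sig_original, sig_copy)
decision_revised = compare_signatures(sig_original, sig_revised)
```

Identical bytes yield identical signatures, so the comparator can act on the signatures alone without re-reading the file content.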
10. A method to store a file within a file management system having a set of existing file contents and file content signatures, comprising:
receiving a file;
generating a content signature for the received file;
generating or receiving metadata for the received file;
comparing the content signature for the received file to the file content signatures that exist within the data management system;
storing the file content, file content signature and metadata for the received file when the file content signature for the received file does not match any of the file content signatures already stored within the data management system; and
associating the metadata for the received file with the stored file content signature that matches the content signature for the received file and storing the metadata for the received file when the file content signature for the received file matches a file content signature already stored within the data management system.
Dependent claims: 11
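The two branches of claim 10 can be sketched as a small in-memory store: content is written only when its signature is new, while metadata is recorded in both cases and tied to the signature. `ContentStore`, its field names, and the use of SHA-256 are assumptions for illustration, not details taken from the claim.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentStore:
    contents: dict = field(default_factory=dict)  # signature -> file bytes
    metadata: dict = field(default_factory=dict)  # signature -> list of metadata records

    def store(self, content: bytes, meta: dict) -> bool:
        """Store a received file; return True only if new content was written."""
        sig = hashlib.sha256(content).hexdigest()  # assumed signature function
        is_new = sig not in self.contents
        if is_new:
            # No match: keep the content under its signature.
            self.contents[sig] = content
        # Match or no match: the metadata is stored and associated with the signature.
        self.metadata.setdefault(sig, []).append(meta)
        return is_new


store = ContentStore()
first = store.store(b"hello", {"name": "a.txt", "owner": "alice"})
second = store.store(b"hello", {"name": "copy.txt", "owner": "bob"})
```

After both calls the store holds one copy of the content and two metadata records pointing at it.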
12. A method to store a multi-segmented file, comprising:
receiving the multi-segmented file having multiple files;
generating a content signature for each file within the multi-segmented file;
comparing the generated content signatures to existing content signatures within a data management system;
determining whether the content signatures for the files within the multi-segmented file match existing content signatures; and
when all content signatures for the files within the multi-segmented file match existing content signatures, generating metadata for the multi-segmented file and storing the metadata.
Dependent claims: 13
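As a rough sketch of claim 12, a multi-segmented file can be modeled as a mapping from member name to bytes. When every member signature is already known, only a manifest is recorded. The claim does not say what happens otherwise, so returning the list of unknown members is an assumption, as are the function name, the manifest layout, and SHA-256.

```python
import hashlib


def signature(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()  # assumed signature function


def store_multi_segment(name, members, known_signatures, manifests):
    """Record metadata for a multi-segmented file if all member contents are known.

    members: mapping of member file name -> bytes.
    Returns the member names whose content is not yet known (empty if none).
    """
    sigs = {member: signature(data) for member, data in members.items()}
    missing = [member for member, sig in sigs.items() if sig not in known_signatures]
    if not missing:
        # All member contents already exist: store only the metadata.
        manifests[name] = {"members": sigs}
    return missing


known = {signature(b"a"), signature(b"b")}
manifests = {}
missing_first = store_multi_segment("bundle.zip", {"a.txt": b"a", "b.txt": b"b"}, known, manifests)
missing_second = store_multi_segment("other.zip", {"a.txt": b"a", "c.txt": b"c"}, known, manifests)
```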
14. A method for managing copyrighted files on a data management system, comprising:
receiving a file;
generating a content signature for the received file;
comparing the content signature for the received file to content signatures stored within a copyright file content signature registry;
determining whether the content signature for the received file matches a content signature stored within the copyright file content signature registry; and
when a content signature match is determined, incrementing an instance count for the content signature within the copyright file content signature registry.
Dependent claims: 15
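The instance counting in claim 14 might look like the following sketch, where the copyright registry is assumed to be a plain mapping from signature to count. The registry layout, the function name, and SHA-256 are illustrative assumptions.

```python
import hashlib


def signature(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()  # assumed signature function


def record_if_copyrighted(content: bytes, registry: dict) -> bool:
    """Increment the instance count when the file matches a registered signature."""
    sig = signature(content)
    if sig in registry:
        registry[sig] += 1
        return True
    return False


registry = {signature(b"licensed song"): 0}
matched = record_if_copyrighted(b"licensed song", registry)
matched_again = record_if_copyrighted(b"licensed song", registry)
unmatched = record_if_copyrighted(b"home video", registry)
count = registry[signature(b"licensed song")]
```

Files that are not in the registry leave it unchanged, so the counts reflect only registered works.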
16. A method for controlling confidential files, comprising:
establishing a registry of content signatures for confidential documents;
enrolling participants in the registry;
transmitting content signatures for files held by participants to the registry;
comparing content signatures from participants to content signatures of confidential documents within the registry; and
when a content signature from a participant matches a content signature of a confidential document, taking a control action.
17. A method to generate search results, comprising:
receiving search request terms;
conducting a search to identify files that include the search request terms or their equivalents;
generating content signatures for the files identified by the search;
determining statistics for the files represented by the content signatures, wherein statistics include copies of files found, number of recent deletions of files and/or number of recent additions of files; and
prioritizing search results based on the statistics.
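One way to picture the prioritization in claim 17 is to score each distinct signature from the three statistics the claim lists. The claim does not define a scoring formula, so the score used here (copies plus recent additions minus recent deletions) is purely an assumption, as are the function names and SHA-256.

```python
import hashlib
from collections import Counter


def signature(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()  # assumed signature function


def prioritize(hits, recent_additions, recent_deletions):
    """Rank distinct signatures found by a search.

    hits: list of (location, content bytes) pairs returned by the search.
    recent_additions / recent_deletions: mapping signature -> count.
    """
    copies = Counter(signature(content) for _, content in hits)

    def score(sig):
        # Hypothetical weighting: widely held, growing files rank first.
        return copies[sig] + recent_additions.get(sig, 0) - recent_deletions.get(sig, 0)

    return sorted(copies, key=score, reverse=True)


hits = [("client-1", b"popular doc"), ("client-2", b"popular doc"), ("client-3", b"stale doc")]
ranked = prioritize(hits, recent_additions={}, recent_deletions={signature(b"stale doc"): 1})
```

Because identical files share one signature, the two copies of the popular document collapse into a single ranked entry.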
18. A method to perform computer forensics, comprising:
generating or receiving a content signature for a file under investigation;
identifying information source clients that are associated with the content signature of the file under investigation;
identifying information source clients that formerly contained the content signature of the file under investigation; and
generating a file investigation report that identifies the information source clients that are associated with the content signature of the file under investigation and/or identifies information source clients that formerly were associated with the content signature of the file under investigation.
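The report in claim 18 can be sketched as two lookups over per-client signature holdings. Modeling the holdings as mappings from client identifier to a set of signatures, and the report as a dictionary, are assumptions made for illustration; the client names and signature strings are placeholders.

```python
def investigation_report(sig, current_holdings, past_holdings):
    """Build a report for the signature of a file under investigation.

    current_holdings / past_holdings: mapping client id -> set of signatures.
    """
    current = sorted(c for c, sigs in current_holdings.items() if sig in sigs)
    former = sorted(
        c
        for c, sigs in past_holdings.items()
        if sig in sigs and sig not in current_holdings.get(c, set())
    )
    return {
        "signature": sig,
        "currently_associated": current,
        "formerly_associated": former,
    }


current_holdings = {"laptop-7": {"sig-a"}, "server-2": {"sig-b"}}
past_holdings = {
    "laptop-7": {"sig-a"},
    "desktop-3": {"sig-a"},
    "server-2": {"sig-a", "sig-b"},
}
report = investigation_report("sig-a", current_holdings, past_holdings)
```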
19. A method for fetching links associated with a requested web page, comprising:
requesting a web page;
receiving a set of content signatures associated with the web page;
comparing the web page content signatures to existing content signatures stored on an information source client; and
fetching links for content signatures that do not exist within the existing content signatures.
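The selective fetch in claim 19 reduces to a set-membership filter. The sketch below assumes the signatures arrive as a mapping from link URL to content signature, which the claim does not specify; the URLs and signature strings are placeholders.

```python
def links_to_fetch(page_signatures, local_signatures):
    """Return the links whose content is not already held locally.

    page_signatures: mapping link URL -> content signature (assumed format).
    local_signatures: set of signatures already stored on the client.
    """
    return [url for url, sig in page_signatures.items() if sig not in local_signatures]


page_signatures = {
    "https://example.com/logo.png": "sig-logo",
    "https://example.com/style.css": "sig-style",
    "https://example.com/news.html": "sig-news",
}
local_signatures = {"sig-logo", "sig-style"}
to_fetch = links_to_fetch(page_signatures, local_signatures)
```

Only the link whose signature is absent from the client is fetched; content already present is reused even if it was originally obtained from a different URL.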
20. A method for identifying when identical files are independently created within a network, comprising:
receiving a file;
generating a content signature for the received file;
comparing the content signature for the received file to content signatures for existing files;
determining whether the content signature for the received file matches an existing content signature;
when a content signature match exists, examining metadata for the received file to determine if the received file was independently developed; and
when a determination is made that the received file was not independently developed, taking a control action.
Specification