Efficient file hash identifier computation
Abstract
Described is maintaining cached hash values for files in association with state data for each file that represents the state of that file's contents at the time of hashing. For example, in a journaling file system, the state data may comprise the update sequence number of the file in the journal and a journal identifier for that journal instance. A request for a hash value for a file is processed by determining whether a cached hash value is maintained for that file. If so, and the associated maintained state data matches current state data for the file, the file contents are unchanged since the last hash computation, whereby the cached hash value is returned in response to the request. Otherwise, a new hash value is computed for the file and returned, and cached for future use. Multiple types of hashes may be cached for a given file.
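The lookup flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and method names (`HashCache`, `get_hash`, `CacheEntry`) are hypothetical, and the state data is assumed to be the file's modification time and size as a portable stand-in for the journal-based state data the abstract mentions.

```python
import hashlib
import os
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    state: tuple                                  # state data captured at hashing time
    hashes: dict = field(default_factory=dict)    # algorithm name -> hex digest


class HashCache:
    """Hypothetical cache of file hashes, validated against per-file state data."""

    def __init__(self):
        self._entries = {}        # path -> CacheEntry
        self.computations = 0     # counts real hash computations (for illustration)

    @staticmethod
    def _current_state(path):
        # ASSUMPTION: (mtime, size) stands in for the journal-based state data.
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def _compute(self, path, algorithm):
        self.computations += 1
        h = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def get_hash(self, path, algorithm="sha256"):
        # State is read before hashing, so a write that races with the hash
        # computation leaves a stale state and forces a recompute next time.
        state = self._current_state(path)
        entry = self._entries.get(path)
        if entry is None or entry.state != state:
            # No entry, or contents may have changed: discard all cached hashes.
            entry = CacheEntry(state=state)
            self._entries[path] = entry
        if algorithm not in entry.hashes:
            # Several hash types can be cached under the same state data.
            entry.hashes[algorithm] = self._compute(path, algorithm)
        return entry.hashes[algorithm]
```

Calling `get_hash` twice on an unchanged file performs one computation; after the file is rewritten, the state data no longer matches and the next call recomputes and re-caches the hash.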
59 Citations
20 Claims
1. In a computing environment, a system comprising:
- a hash data store, in which each file of a plurality of files has an entry in the hash data store including at least one hash value and state data representative of a state of the file's contents at a hashing time corresponding to computing the at least one hash value for that file; and
- a hash return mechanism coupled to the hash data store to process requests for hash values for files, including by accessing the hash data store upon receiving a request for a hash value for a file to locate an entry for that file in the hash data store, by evaluating the state data associated with that file against current state data of the file to determine from the evaluation whether the file contents are unchanged since the hashing time, and if unchanged, by returning a hash value from the hash data store for that file in response to the request.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-readable medium having computer-executable instructions, which when executed perform steps, comprising:
a) maintaining cached hash values for files in association with maintained state data for each file representative of a state of that file's contents at a time of computing the hash value;
b) processing a request for a hash value for a file by determining whether a cached hash value is maintained for that file, and if not maintained for that file, advancing to step d);
c) determining from the maintained state data for that file and information associated with a current state of the file whether the file contents are unchanged since the time of the hash computation, and if the information associated with the current state of the file and the maintained state data indicate the file contents are unchanged since the time of the hash computation, selecting as a selected hash value the cached hash value for that file and advancing to step e);
d) computing a hash value for the file and selecting as a selected hash value the computed hash value; and
e) returning the selected hash value in response to the request.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16
17. In a computing environment in which a file system maintains a journal of changes to files, each change associated with an update sequence number, a method comprising:
a) maintaining a record for a file, the record including a hash set of at least one computed hash value for the file's contents and a set of file-related data at a time that the hash set was computed for the file, the file-related data including the update sequence number of the file and a journal identifier that identifies an instance of the journal;
b) processing a request for a hash value for that file by comparing the update sequence number and the journal identifier maintained with the hash set against an actual update sequence number associated with the file and a current journal identifier, respectively, and if the comparison indicates the update sequence numbers and journal identifiers are equivalent, returning a hash value from the hash set in response to the request.
Dependent claims: 18, 19, 20
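The comparison in step b) of claim 17 can be illustrated with a small sketch. It assumes the update sequence number and journal identifier have already been obtained from the file system; how they are queried is outside the sketch. The names `JournalState`, `HashRecord`, and `lookup_hash` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class JournalState:
    journal_id: int   # identifies one instance of the change journal
    usn: int          # update sequence number of the file's latest change


@dataclass
class HashRecord:
    state: JournalState        # file-related data captured when the hash set was computed
    hash_set: Dict[str, str]   # hash type -> hash value


def lookup_hash(record: Optional[HashRecord],
                current: JournalState,
                hash_type: str) -> Optional[str]:
    """Return a cached hash only if both the USN and the journal ID still match."""
    if record is None:
        return None
    # Both fields must be equal. A matching USN under a different journal
    # identifier is treated as a mismatch, because the number was issued by
    # a different journal instance and says nothing about the stored one.
    if record.state != current:
        return None
    return record.hash_set.get(hash_type)
```

A `None` result signals that the caller should compute the hash and store a fresh record. Comparing the journal identifier alongside the update sequence number guards against treating a number from one journal instance as if it belonged to another.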
Specification