Efficient file hash identifier computation
First Claim
1. A method, implemented at a computer system that includes a processing unit, for determining whether to compute a new hash value for a file, comprising:
- identifying a first state of a file;
computing a first hash value corresponding to the first state of the file;
recording the first hash value corresponding to the first state of the file and first state data corresponding to the first state of the file, the first state data comprising at least one of an update sequence number and a journal identifier that both correspond to the first state of the file, the update sequence number comprising an incrementable number that is changed every time the file is updated and the journal identifier comprising an identifier of a current instance of a journal;
receiving a request for a second hash value of the file corresponding to a current second state of the file;
based at least on receiving the request, and prior to computing the second hash value corresponding to the second state of the file, determining whether to provide the first hash value corresponding to the first state of the file or to compute the second hash value corresponding to the second state of the file based on a determination of whether or not contents of the file have changed since the first hash value corresponding to the first state of the file was computed, the determination of whether or not contents of the file have changed comprising comparing the recorded first state data corresponding to the first state of the file to current second state data of the file that indicates a second state of the file at the time of the request, to determine whether the contents of the file are the same between the first state and the second state, the contents of the file being the same when the first state data is equivalent to the second state data, at least one of the first state data of the file or the second state data of the file comprising at least one of an update sequence number and a journal identifier;
based on determining that the first state data of the file is equivalent to the second state data of the file based upon the comparison, providing the first hash value corresponding to the first state of the file in response to the request, instead of computing the second hash value; and
based on determining that the first state data of the file is not equivalent to the second state data of the file based upon the comparison, computing the second hash value for the file.
2 Assignments
0 Petitions
Accused Products
Abstract
Described is maintaining cached hash values for files in association with state data for each file that represents the state of that file'"'"'s contents at the time of hashing. For example, in a journaling file system, the state data may comprise the update sequence number of the file in the journal and a journal identifier for that journal instance. A request for a hash value for a file is processed by determining whether a cached hash value is maintained for that file. If so, and the associated maintained state data matches current state data for the file, the file contents are unchanged since the last hash computation, whereby the cached hash value is returned in response to the request. Otherwise, a new hash value is computed for the file and returned, and cached for future use. Multiple types of hashes may be cached for a given file.
21 Citations
20 Claims
-
1. A method, implemented at a computer system that includes a processing unit, for determining whether to compute a new hash value for a file, comprising:
-
identifying a first state of a file; computing a first hash value corresponding to the first state of the file; recording the first hash value corresponding to the first state of the file and first state data corresponding to the first state of the file, the first state data comprising at least one of an update sequence number and a journal identifier that both correspond to the first state of the file, the update sequence number comprising an incrementable number that is changed every time the file is updated and the journal identifier comprising an identifier of a current instance of a journal; receiving a request for a second hash value of the file corresponding to a current second state of the file; based at least on receiving the request, and prior to computing the second hash value corresponding to the second state of the file, determining whether to provide the first hash value corresponding to the first state of the file or to compute the second hash value corresponding to the second state of the file based on a determination of whether or not contents of the file have changed since the first hash value corresponding to the first state of the file was computed, the determination of whether or not contents of the file have changed comprising comparing the recorded first state data corresponding to the first state of the file to current second state data of the file that indicates a second state of the file at the time of the request, to determine whether the contents of the file are the same between the first state and the second state, the contents of the file being the same when the first state data is equivalent to the second state data, at least one of the first state data of the file or the second state data of the file comprising at least one of an update sequence number and a journal identifier; based on determining that the first state data of the file is equivalent to the second state data of the file based upon the comparison, providing the first hash value corresponding to the first state of the file in response to the request, instead of computing the second hash value; and based on determining that the first state data of the file is not equivalent to the second state data of the file based upon the comparison, computing the second hash value for the file. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 20)
-
-
10. A computer-readable hardware storage device comprising computer-executable instructions that are executable by one or more processors of a computer system to configure the computer system to determine whether to compute a new hash value for a file, the computer-executable instructions including instructions that are executable to configure the computer system to perform at least the following:
-
identify a first state of a file; compute a first hash value corresponding to the first state of the file; record the first hash value corresponding to the first state of the file and first state data corresponding to the first state of the file, the first state data comprising at least one of an update sequence number and a journal identifier that both correspond to the first state of the file, the update sequence number comprising an incrementable number that is changed every time the file is updated and the journal identifier comprising an identifier of a current instance of a journal; receive a request for a second hash value of the file corresponding to a current second state of the file; based at least on receiving the request, and prior to computing the second hash value corresponding to the second state of the file, determine whether to provide the first hash value corresponding to the first state of the file or to compute the second hash value corresponding to the second state of the file based on a determination of whether or not contents of the file have changed since the first hash value corresponding to the first state of the file was computed, the determination of whether or not contents of the file have changed comprising comparing the recorded first state data of the file corresponding to the first state of the file to current second state data of the file that indicates a second state of the file at the time of the request, to determine whether the contents of the file are the same between the first state and the second state, the contents of the file being the same when the first state data is equivalent to the second state data, at least one of the first state data of the file or the second state data of the file comprising at least one of an update sequence number and a journal identifier; based on determining that the first state data of the file is equivalent to the second state data of the file based upon the comparison, provide the first hash value corresponding to the first state of the file in response to the request instead of computing the second hash value; and based on determining that the first state data of the file is not equivalent to the second state data of the file based upon the comparison, compute the second hash value for the file.
-
-
11. A computer system for determining whether to compute a new hash value for a file, comprising:
-
one or more processing units; and at least one computer readable storage devices having stored thereon computer-executable instructions that are executable by the one or more processing units to cause the computer system to determine whether to compute a new hash value for a file, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following; identify a first state of a file; compute a first hash value corresponding to the first state of the file; record the first hash value corresponding to the first state of the file and first state data corresponding to the first state of the file, the first state data comprising at least one of an update sequence number and a journal identifier that both correspond to the first state of the file, the update sequence number comprising an incrementable number that is changed every time the file is updated and the journal identifier comprising an identifier of a current instance of a journal; receive a request for a second hash value of the file corresponding to a current second state of the file; based at least on receiving the request, and prior to computing the second hash value corresponding to the second state of the file, determine whether to provide the first hash value corresponding to the first state of the file or to compute the second hash value corresponding to the second state of the file based on a determination of whether or not contents of the file have changed since the first hash value corresponding to the first state of the file was computed, the determination of whether or not contents of the file have changed comprising comparing the recorded first state data corresponding to the first state of the file to current second state data of the file that indicates a second state of the file at the time of the request, to determine whether the contents of the file are the same between the first state and the second state, the contents of the file being the same when the first state data is equivalent to the second state data, at least one of the first state data of the file or the second state data of the file comprising at least one of an update sequence number and a journal identifier; based on determining that the first state data of the file is equivalent to the second state data of the file based upon the comparison, provide the first hash value corresponding to the first state of the file in response to the request, instead of computing the second hash value; and based on determining that the first state data of the file is not equivalent to the second state data of the file based upon the comparison, compute the second hash value for the file. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
-
Specification