Efficient file hash identifier computation

US 9,424,266 B2
Filed: 10/01/2007
Issued: 08/23/2016
Est. Priority Date: 10/01/2007
Status: Active Grant

Abstract

Described is maintaining cached hash values for files in association with state data for each file that represents the state of that file's contents at the time of hashing. For example, in a journaling file system, the state data may comprise the update sequence number of the file in the journal and a journal identifier for that journal instance. A request for a hash value for a file is processed by determining whether a cached hash value is maintained for that file. If so, and the associated maintained state data matches current state data for the file, the file contents are unchanged since the last hash computation, whereby the cached hash value is returned in response to the request. Otherwise, a new hash value is computed for the file and returned, and cached for future use. Multiple types of hashes may be cached for a given file.
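The flow in the abstract can be sketched as a small cache. This is only an illustration under stated assumptions, not the patented implementation: `StateData`, `HashCache`, `read_state`, and `read_bytes` are hypothetical names, and the file system is simulated with in-memory dictionaries rather than a real journal.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class StateData:
    """State of a file's contents at hashing time (hypothetical layout)."""
    usn: int         # update sequence number of the file in the journal
    journal_id: int  # identifier of the journal instance


@dataclass
class HashCache:
    # Injected stand-ins for file-system calls (assumptions, not a real API).
    read_state: Callable[[str], StateData]
    read_bytes: Callable[[str], bytes]
    # (path, hash algorithm) -> (state data at hashing time, hex digest)
    entries: Dict[Tuple[str, str], Tuple[StateData, str]] = field(default_factory=dict)
    computations: int = 0  # counts real hash computations, for illustration

    def get_hash(self, path: str, algorithm: str = "sha256") -> str:
        # Read the state before the contents: if the file changes in between,
        # the stored state is stale and the next request recomputes.
        current = self.read_state(path)
        cached = self.entries.get((path, algorithm))
        if cached is not None and cached[0] == current:
            return cached[1]  # contents unchanged: return the cached hash
        digest = hashlib.new(algorithm, self.read_bytes(path)).hexdigest()
        self.computations += 1
        self.entries[(path, algorithm)] = (current, digest)  # cache for future use
        return digest


# Simulated file system: contents and per-file state data.
files = {"a.txt": b"hello"}
states = {"a.txt": StateData(usn=100, journal_id=1)}
cache = HashCache(read_state=states.__getitem__, read_bytes=files.__getitem__)

first = cache.get_hash("a.txt")   # computed
second = cache.get_hash("a.txt")  # state data matches: served from the cache

files["a.txt"] = b"hello world"   # an update changes the contents...
states["a.txt"] = StateData(usn=101, journal_id=1)  # ...and the state data
third = cache.get_hash("a.txt")   # state data differs: recomputed
```

Keying the entries on both the path and the algorithm name is one way to hold multiple hash types for a given file, as the abstract mentions.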

20 Claims

1. A method, implemented at a computer system that includes a processing unit, for determining whether to compute a new hash value for a file, comprising:
- identifying a first state of a file;
  
  computing a first hash value corresponding to the first state of the file;
  
  recording the first hash value corresponding to the first state of the file and first state data corresponding to the first state of the file, the first state data comprising at least one of an update sequence number and a journal identifier that both correspond to the first state of the file, the update sequence number comprising an incrementable number that is changed every time the file is updated and the journal identifier comprising an identifier of a current instance of a journal;
  
  receiving a request for a second hash value of the file corresponding to a current second state of the file;
  
  based at least on receiving the request, and prior to computing the second hash value corresponding to the second state of the file, determining whether to provide the first hash value corresponding to the first state of the file or to compute the second hash value corresponding to the second state of the file based on a determination of whether or not contents of the file have changed since the first hash value corresponding to the first state of the file was computed, the determination of whether or not contents of the file have changed comprising comparing the recorded first state data corresponding to the first state of the file to current second state data of the file that indicates a second state of the file at the time of the request, to determine whether the contents of the file are the same between the first state and the second state, the contents of the file being the same when the first state data is equivalent to the second state data, at least one of the first state data of the file or the second state data of the file comprising at least one of an update sequence number and a journal identifier;
  
  based on determining that the first state data of the file is equivalent to the second state data of the file based upon the comparison, providing the first hash value corresponding to the first state of the file in response to the request, instead of computing the second hash value; and
  
  based on determining that the first state data of the file is not equivalent to the second state data of the file based upon the comparison, computing the second hash value for the file.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 20):
  - 2. The method of claim 1, at least one of the first state data of the file or the second state data of the file comprising the update sequence number.
  - 3. The method of claim 1, at least one of the first state data of the file or the second state data of the file comprising the journal identifier.
  - 4. The method of claim 1, at least one of the first state data of the file or the second state data of the file comprising the file size information.
  - 5. The method of claim 1, the comparing first state data of the file to second state data of the file comprising at least one of:
    - comparing the update sequence number of the first state data to an update sequence number of the second state data, and comparing the journal identifier of the first state data to a journal identifier of the second state data.
  - 6. The method of claim 1, the comparing first state data of the file to second state data of the file comprising:
    - comparing the update sequence number of the first state data to an update sequence number of the second state data, and comparing the journal identifier of the first state data to a journal identifier of the second state data.
  - 7. The method of claim 1, the first state data equivalent to the second state data when the first state data matches the second state data.
  - 8. The method of claim 1, the first state data equivalent to the second state data when at least one of:
    - the journal identifier of the first state data matches a journal identifier of the second state data, and a file size corresponding to the first state of the file is equal to a file size of the file at a time the request is received.
  - 9. The method of claim 1, the first state data equivalent to the second state data when a difference between the update sequence identifier of the first state data and an update sequence identifier of the second state data is less than a specified threshold.
  - 20. The method of claim 1, wherein the first state data comprises both the update sequence number and the journal identifier.
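The dependent claims describe several ways of deciding that two sets of state data are "equivalent". The sketch below writes a few of them as separate predicates. The `StateData` layout and all function names are hypothetical, and the absolute-difference reading of the threshold in claim 9 is an assumption, since the claim does not say how the difference is measured or how the threshold is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateData:
    """Hypothetical state data: the fields named in claims 2 through 4."""
    usn: int         # update sequence number
    journal_id: int  # journal identifier
    file_size: int   # file size information


def exact_match(first: StateData, second: StateData) -> bool:
    # Claim 7: equivalent when the first state data matches the second.
    return first == second


def usn_and_journal_match(first: StateData, second: StateData) -> bool:
    # Claim 6: compare both the update sequence numbers and the journal identifiers.
    return first.usn == second.usn and first.journal_id == second.journal_id


def journal_and_size_match(first: StateData, second: StateData) -> bool:
    # Claim 8 names these two conditions ("at least one of"); this sketch
    # takes the stricter reading and requires both.
    return first.journal_id == second.journal_id and first.file_size == second.file_size


def usn_within_threshold(first: StateData, second: StateData, threshold: int) -> bool:
    # Claim 9: difference between the update sequence values below a threshold.
    # Using the absolute difference is an assumption of this sketch.
    return abs(second.usn - first.usn) < threshold


recorded = StateData(usn=100, journal_id=1, file_size=5)
same = StateData(usn=100, journal_id=1, file_size=5)
later = StateData(usn=103, journal_id=1, file_size=5)
new_journal = StateData(usn=100, journal_id=2, file_size=5)
```

The stricter predicates trade cache hits for safety: `exact_match` never reuses a hash after any recorded update, while the threshold and size-based variants can reuse one after updates that may or may not have touched the contents.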

10. A computer-readable hardware storage device comprising computer-executable instructions that are executable by one or more processors of a computer system to configure the computer system to determine whether to compute a new hash value for a file, the computer-executable instructions including instructions that are executable to configure the computer system to perform at least the following:
- identify a first state of a file;
  
  compute a first hash value corresponding to the first state of the file;
  
  record the first hash value corresponding to the first state of the file and first state data corresponding to the first state of the file, the first state data comprising at least one of an update sequence number and a journal identifier that both correspond to the first state of the file, the update sequence number comprising an incrementable number that is changed every time the file is updated and the journal identifier comprising an identifier of a current instance of a journal;
  
  receive a request for a second hash value of the file corresponding to a current second state of the file;
  
  based at least on receiving the request, and prior to computing the second hash value corresponding to the second state of the file, determine whether to provide the first hash value corresponding to the first state of the file or to compute the second hash value corresponding to the second state of the file based on a determination of whether or not contents of the file have changed since the first hash value corresponding to the first state of the file was computed, the determination of whether or not contents of the file have changed comprising comparing the recorded first state data of the file corresponding to the first state of the file to current second state data of the file that indicates a second state of the file at the time of the request, to determine whether the contents of the file are the same between the first state and the second state, the contents of the file being the same when the first state data is equivalent to the second state data, at least one of the first state data of the file or the second state data of the file comprising at least one of an update sequence number and a journal identifier;
  
  based on determining that the first state data of the file is equivalent to the second state data of the file based upon the comparison, provide the first hash value corresponding to the first state of the file in response to the request instead of computing the second hash value; and
  
  based on determining that the first state data of the file is not equivalent to the second state data of the file based upon the comparison, compute the second hash value for the file.

11. A computer system for determining whether to compute a new hash value for a file, comprising:
- one or more processing units; and
  
  at least one computer readable storage devices having stored thereon computer-executable instructions that are executable by the one or more processing units to cause the computer system to determine whether to compute a new hash value for a file, the computer-executable instructions including instructions that are executable to cause the computer system to perform at least the following;
  
  identify a first state of a file;
  
  compute a first hash value corresponding to the first state of the file;
  
  record the first hash value corresponding to the first state of the file and first state data corresponding to the first state of the file, the first state data comprising at least one of an update sequence number and a journal identifier that both correspond to the first state of the file, the update sequence number comprising an incrementable number that is changed every time the file is updated and the journal identifier comprising an identifier of a current instance of a journal;
  
  receive a request for a second hash value of the file corresponding to a current second state of the file;
  
  based at least on receiving the request, and prior to computing the second hash value corresponding to the second state of the file, determine whether to provide the first hash value corresponding to the first state of the file or to compute the second hash value corresponding to the second state of the file based on a determination of whether or not contents of the file have changed since the first hash value corresponding to the first state of the file was computed, the determination of whether or not contents of the file have changed comprising comparing the recorded first state data corresponding to the first state of the file to current second state data of the file that indicates a second state of the file at the time of the request, to determine whether the contents of the file are the same between the first state and the second state, the contents of the file being the same when the first state data is equivalent to the second state data, at least one of the first state data of the file or the second state data of the file comprising at least one of an update sequence number and a journal identifier;
  
  based on determining that the first state data of the file is equivalent to the second state data of the file based upon the comparison, provide the first hash value corresponding to the first state of the file in response to the request, instead of computing the second hash value; and
  
  based on determining that the first state data of the file is not equivalent to the second state data of the file based upon the comparison, compute the second hash value for the file.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19):
  - 12. The computer system of claim 11, the first state data comprising at least one of:
    - the update sequence number corresponding to the first state of the file;
      
      the journal identifier corresponding to the first state of the file; and
      
      file size information describing a size of the file corresponding to the first state of the file.
  - 13. The computer system of claim 11, the comparing first state data of the file to second state data of the file comprising at least one of:
    - comparing the update sequence number of the recorded state data to an update sequence number of the current state data, and comparing the journal identifier of the recorded state data to a journal identifier of the current state data.
  - 14. The computer system of claim 11, the first state data equivalent to the second state data when the first state data matches the second state data.
  - 15. The computer system of claim 11, the first state data equivalent to the second state data when at least one of:
    - the journal identifier of the first state data matches a journal identifier of the second state data; and
      
      a file size of the file corresponding to the first state of the file is equal to a file size of the file at the time the request is received.
  - 16. The computer system of claim 11, the first state data equivalent to the second state data when a difference between the update sequence identifier of the first state data and an update sequence identifier of the second state data is less than a specified threshold.
  - 17. The computer system of claim 11, the first state data equivalent to the second state data when one or more differences between the first state data and the second state data indicate that content of the file has not been modified between when the first state data was created and when the second state data was created.
  - 18. The computer system of claim 11, the actions comprising:
    - accessing the first state data and the first hash value corresponding to the first state of the file via at least one of a file identifier or a volume identifier.
  - 19. The computer system of claim 11, the comparing first state data of the file to second state data of the file comprising:
    - determining if an update sequence number of the second state data that is not comprised in the first state data is indicative of an update of a type that alters contents of the file.
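Claim 19 can be read as inspecting the journal updates that occurred after the recorded state and checking whether any of them is of a type that alters file contents. A minimal sketch of that reading follows. The `Reason` flags, the `JournalRecord` layout, and `contents_changed` are hypothetical illustrations, not the identifiers of any real change-journal API.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Iterable


class Reason(Flag):
    """Hypothetical update types that a journal record might carry."""
    DATA_OVERWRITE = auto()
    DATA_EXTEND = auto()
    DATA_TRUNCATION = auto()
    RENAME = auto()
    SECURITY_CHANGE = auto()


# Update types assumed, for this sketch, to alter the contents of a file.
CONTENT_ALTERING = Reason.DATA_OVERWRITE | Reason.DATA_EXTEND | Reason.DATA_TRUNCATION


@dataclass(frozen=True)
class JournalRecord:
    usn: int        # update sequence number assigned to this update
    file_id: int    # file the update applies to
    reason: Reason  # what kind of update it was


def contents_changed(records: Iterable[JournalRecord], file_id: int,
                     recorded_usn: int, current_usn: int) -> bool:
    """Return True if any update after recorded_usn, up to and including
    current_usn, is of a content-altering type for the given file."""
    for record in records:
        if record.file_id != file_id:
            continue
        if recorded_usn < record.usn <= current_usn and record.reason & CONTENT_ALTERING:
            return True
    return False


records = [
    JournalRecord(usn=101, file_id=7, reason=Reason.RENAME),
    JournalRecord(usn=102, file_id=8, reason=Reason.DATA_OVERWRITE),
    JournalRecord(usn=103, file_id=7, reason=Reason.DATA_EXTEND),
]
```

Under this reading, a rename or security change advances the update sequence number without invalidating the cached hash, so only updates flagged as content-altering force a recomputation.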

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Perlin, Eric C.; Pudipeddi, Ravisankar V.
Primary Examiner(s): Mofiz, Apu
Assistant Examiner(s): DAYE, CHELCIE L
Application Number: US11/906,302
Publication Number: US 20090089337A1
Time in Patent Office: 3,249 Days
Field of Search: 707/2, 707/3, 707/7, 707/747, 711/113, 711/216
US Class Current: 1/1
CPC Class Codes:
G06F 16/152   using file content signatur...
G06F 16/1734   Details of monitoring file ...
G06F 21/565   by checking file integrity
G06F 2221/2101   Auditing as a secondary aspect
