Network file system-based data storage system

US 7,747,581 B1
Filed: 04/19/2007
Issued: 06/29/2010
Est. Priority Date: 09/15/2004
Status: Active Grant

First Claim

Patent Images

1. A file system comprising:

an index engine to maintain an index recoverable from crashes, the index engine including;

an on-disk circular buffer to store a current on-disk set of hash-table buckets to store a current on-disk set of index entries at a current location;

an in-memory merge buffer to store a current in-memory set of hash-table buckets to store a current in-memory set of index entries between merges, wherein recovery from crashes between and during merges is effected by one of the in-memory merge buffer being stored in a non-volatile memory and the current in-memory set of index entries being duplicated in an on-disk log;

a merge mechanism, coupled to the on-disk and the in-memory merge buffers, to merge the current on-disk set of hash-table buckets and the current in-memory set of hash-table buckets to create at a new location in the on-disk circular buffer a new version of the current on-disk set of hash-table buckets storing the merged on-disk and in-memory set of index entries as the current on-disk set of index entries each time the number of index entries of the current in-memory set of index entries reaches a specified value, wherein the on-disk circular buffer is used for recovery from crashes occurring during merges; and

an on-disk partial indexes buffer to store all prior in-memory sets of hash-table buckets by said merge mechanism used for said merges to facilitate index checking and reconstruction of the index;

a segment store unit to segment data into the one or more variable length data segments;

a storage unit, coupled to the index engine, to store at least one copy of each unique variable length data segment; and

the index engine and the segment store unit to identify new variable length data segments that are identical to one of the variable length data segments already stored in said storage unit, create for each of the new variable length data segments identical to one of the already stored variable length data segments a reference to the already stored identical variable length data segment, store each of the references in the storage unit, and delete each of the new variable length data segments determined identical to one of the already stored variable length data segments.

View all claims

12 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A network file system-based data storage system that converts random I/O requests into a piecewise sequential data structure to facilitate variable length data segment redundancy identification and elimination. For one embodiment of the invention a stateless network file system is employed. For one such embodiment, that provides multiple-client access to stored data, multiple Writes are buffered and then broken into variable length data segments. Redundant segment elimination is then effected. One embodiment of the invention allows sharing of the variable length data segments among files.

33 Citations

View as Search Results

25 Claims

1. A file system comprising:
- an index engine to maintain an index recoverable from crashes, the index engine including;
  
  an on-disk circular buffer to store a current on-disk set of hash-table buckets to store a current on-disk set of index entries at a current location;
  
  an in-memory merge buffer to store a current in-memory set of hash-table buckets to store a current in-memory set of index entries between merges, wherein recovery from crashes between and during merges is effected by one of the in-memory merge buffer being stored in a non-volatile memory and the current in-memory set of index entries being duplicated in an on-disk log;
  
  a merge mechanism, coupled to the on-disk and the in-memory merge buffers, to merge the current on-disk set of hash-table buckets and the current in-memory set of hash-table buckets to create at a new location in the on-disk circular buffer a new version of the current on-disk set of hash-table buckets storing the merged on-disk and in-memory set of index entries as the current on-disk set of index entries each time the number of index entries of the current in-memory set of index entries reaches a specified value, wherein the on-disk circular buffer is used for recovery from crashes occurring during merges; and
  
  an on-disk partial indexes buffer to store all prior in-memory sets of hash-table buckets by said merge mechanism used for said merges to facilitate index checking and reconstruction of the index;
  
  a segment store unit to segment data into the one or more variable length data segments;
  
  a storage unit, coupled to the index engine, to store at least one copy of each unique variable length data segment; and
  
  the index engine and the segment store unit to identify new variable length data segments that are identical to one of the variable length data segments already stored in said storage unit, create for each of the new variable length data segments identical to one of the already stored variable length data segments a reference to the already stored identical variable length data segment, store each of the references in the storage unit, and delete each of the new variable length data segments determined identical to one of the already stored variable length data segments.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The system of claim 1,wherein the index engine maps from a segment-related data to a segment location-related data for each of the one or more unique variable length data segments.
  - 3. The system of claim 2 wherein the segment-related data is a Segment ID.
  - 4. The system of claim 2 wherein the segment-related data is a segment content-related data.
  - 5. The system of claim 4 wherein each of the segment content-related data is a hash of its variable length data segment.
  - 6. The system of claim 3 wherein each of the Segment IDs includes a fingerprint or hash of its variable length data segment.
  - 7. The system of claim 1 wherein one or more of the already stored variable length data segments is compressed using an entropy compression scheme selected from the group consisting of run-length encoding, Huffman coding, Arithmetic coding, Context coding, LZ77, LZ78, LZW, Deflate, Transformation coding, and combinations thereof.
  - 8. The system of claim 1 wherein the file system implements a network file system protocol.
  - 9. The system of claim 8 wherein the network file system protocol is selected from the group consisting of Network File System, Common Internet File System, NetWare Core Protocol, Global File System, Appleshare, InterMezzo File System-, and Andrew File System.

10. A method comprising:
- maintaining by an index engine an index recoverable from crashes, said maintaining including,storing in an on-disk circular buffer at a first location a first set of index entries in a first set of hash-table buckets;
  
  storing a second set of index entries in a second set of hash-table buckets of an in-memory merge buffer, wherein recovery from crashes between and during merges is effected by one of the in-memory merge buffer being stored in a non-volatile memory and the second set of index entries being duplicated in an on-disk log;
  
  merging the first set of hash-table buckets and the second set of hash-table buckets to create at a second location in the on-disk circular buffer a subsequent version of the index when the number of index entries of the second set of index entries reaches a specified value, the subsequent version of the index organized as a third set of hash-table buckets storing the first set of index entries and the second set of index entries, wherein the on-disk circular buffer is used for recovery from crashes occurring during a given merge; and
  
  adding to an on-disk partial indexes buffer the second set of hash-table buckets prior to said step of merging to facilitate index checking and reconstruction of the index; and
  
  segmenting data into the one or more variable length data segments;
  
  storing, in a storage unit coupled to the index engine, at least one copy of each unique variable length data segment;
  
  identifying one or more new variable length data segments that are identical to one of the variable length data segments already stored in said storage unit;
  
  creating, for each of the new variable length data segments identical to one of the already stored variable length data segments, a reference to the already stored identical variable length data segment;
  
  storing each of the references to the storage unit; and
  
  deleting each of the one or more new variable length data segments identical to the already stored variable length data segment.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
- - 11. The method of claim 10wherein the index engine maps from a segment-related data to a segment location-related data for each of the one or more unique variable length data segments.
  - 12. The method of claim 11 wherein the segment-related data is a Segment ID.
  - 13. The method of claim 11 wherein the segment-related data is a segment content-related data.
  - 14. The method of claim 13 wherein each of the segment content-related data is a hash of its variable length data segment.
  - 15. The method of claim 12 wherein each of the Segment IDs includes a fingerprint or hash of its variable length data segment.
  - 16. The method of claim 10 further comprising:
    - compressing prior to storage in the storage unit of one or more of the already stored variable length data segments using an entropy compression scheme selected from the group consisting of run-length encoding, Huffman coding, Arithmetic coding, Context coding, LZ77, LZ78, LZW, Deflate, Transformation coding, and combinations thereof.
  - 17. The method of claim 10 further comprising:
    - receiving the data to be segmented through a network file system protocol.

18. An article of manufacture comprising:
- a machine-readable storage medium storing instructions, which when executed by a machine, results in the machine performing operations comprising;
  
  maintaining by an index engine an index recoverable from crashes, said maintaining including;
  
  storing in an on-disk circular buffer at a first location a first set of index entries in a first set of hash-table buckets;
  
  storing a second set of index entries in a second set of hash-table buckets of an in-memory merge buffer, wherein recovery from crashes between and during merges is effected by one of the in-memory merge buffer being stored in a non-volatile memory and the second set of index entries being stored in an on-disk log;
  
  merging the first set of hash-table buckets and the second set of hash-table buckets to create a at a second location in the on-disk circular buffer a subsequent version of the index when the number of index entries of the second set of index entries reaches a specified value, the subsequent version of the index organized as a third set of hash-table buckets storing the first set of index entries and the second set of index entries, wherein the on-disk circular buffer is used for recovery from crashes occurring during a given merge; and
  
  adding to an on-disk partial indexes buffer the second set of hash-table buckets prior to said step of merging to facilitate index checking and reconstruction of the index; and
  
  segmenting data into the one or more variable length data segments;
  
  storing, in a storage unit coupled with the index engine, at least one copy of each unique variable length data segment;
  
  identifying one or more new variable length data segments that are identical to one of the variable length data segments already stored in said storage unit;
  
  creating, for each of the new variable length data segment identical to one of the already stored variable length data segments, a reference to the already stored identical variable length data segment;
  
  storing each of the references to the storage unit; and
  
  deleting each of the one or more new variable length data segments identical to the already stored data segment.
- View Dependent Claims (19, 20, 21, 22, 23, 24, 25)
- - 19. The article of manufacture of claim 18 whereinthe index engine maps from a segment-related data to a segment location-related data for each of the one or more unique variable length data segments.
  - 20. The article of manufacture of claim 19 wherein the segment-related data is a Segment ID.
  - 21. The article of manufacture of claim 19 wherein the segment-related data is a segment content-related data.
  - 22. The article of manufacture of claim 21 wherein each of the segment content-related data is a hash of its variable length data segment.
  - 23. The article of manufacture of claim 20 wherein each of the Segment IDs includes a fingerprint or hash of its variable length data segment.
  - 24. The article of manufacture of claim 18 wherein the machine performs further operations comprising:
    - compressing prior to storage in the storage unit of one or more of the already stored variable length data segments using an entropy compression scheme selected from the group consisting of run-length encoding, Huffman coding, Arithmetic coding, Context coding, LZ77, LZ78, LZW, Deflate, Transformation coding, and combinations thereof.
  - 25. The article of manufacture of claim 18 wherein the machine performs further operations comprising:
    - receiving the data to be segmented through a network file system protocol.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Emc IP Holding Company LLC (Dell Technologies Inc.)
Original Assignee
EMC Corporation (Dell Technologies Inc.)
Inventors
Li, Kai, Patterson, R. Hugo, Zhu, Ming Benjamin, Reddy, Sazzala, Johnsson, Richard, Zabarsky, Jeffery, Bricker, Allan
Primary Examiner(s)
Cottingham; John R.
Assistant Examiner(s)
TIMBLIN, ROBERT M

Application Number

US11/788,407
Time in Patent Office

1,167 Days
Field of Search

None
US Class Current

707/686
CPC Class Codes

G06F 16/10   File systems; File servers

G06F 3/0608   Saving storage space on sto...

G06F 3/0641   De-duplication techniques

G06F 3/0659   Command handling arrangemen...

G06F 3/067   Distributed or networked st...

Network file system-based data storage system

First Claim

12 Assignments

0 Petitions

Accused Products

Abstract

33 Citations

25 Claims

Specification

Solutions

Use Cases

Quick Links

Network file system-based data storage system

First Claim

12 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

33 Citations

25 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links