Network file system-based data storage system
First Claim
Patent Images
1. A file system comprising:
- an index engine to maintain an index recoverable from crashes, the index engine including;
an on-disk circular buffer to store a current on-disk set of hash-table buckets to store a current on-disk set of index entries at a current location;
an in-memory merge buffer to store a current in-memory set of hash-table buckets to store a current in-memory set of index entries between merges, wherein recovery from crashes between and during merges is effected by one of the in-memory merge buffer being stored in a non-volatile memory and the current in-memory set of index entries being duplicated in an on-disk log;
a merge mechanism, coupled to the on-disk and the in-memory merge buffers, to merge the current on-disk set of hash-table buckets and the current in-memory set of hash-table buckets to create at a new location in the on-disk circular buffer a new version of the current on-disk set of hash-table buckets storing the merged on-disk and in-memory set of index entries as the current on-disk set of index entries each time the number of index entries of the current in-memory set of index entries reaches a specified value, wherein the on-disk circular buffer is used for recovery from crashes occurring during merges; and
an on-disk partial indexes buffer to store all prior in-memory sets of hash-table buckets by said merge mechanism used for said merges to facilitate index checking and reconstruction of the index;
a segment store unit to segment data into the one or more variable length data segments;
a storage unit, coupled to the index engine, to store at least one copy of each unique variable length data segment; and
the index engine and the segment store unit to identify new variable length data segments that are identical to one of the variable length data segments already stored in said storage unit, create for each of the new variable length data segments identical to one of the already stored variable length data segments a reference to the already stored identical variable length data segment, store each of the references in the storage unit, and delete each of the new variable length data segments determined identical to one of the already stored variable length data segments.
12 Assignments
0 Petitions
Accused Products
Abstract
A network file system-based data storage system that converts random I/O requests into a piecewise sequential data structure to facilitate variable length data segment redundancy identification and elimination. For one embodiment of the invention a stateless network file system is employed. For one such embodiment, that provides multiple-client access to stored data, multiple Writes are buffered and then broken into variable length data segments. Redundant segment elimination is then effected. One embodiment of the invention allows sharing of the variable length data segments among files.
33 Citations
25 Claims
-
1. A file system comprising:
-
an index engine to maintain an index recoverable from crashes, the index engine including; an on-disk circular buffer to store a current on-disk set of hash-table buckets to store a current on-disk set of index entries at a current location; an in-memory merge buffer to store a current in-memory set of hash-table buckets to store a current in-memory set of index entries between merges, wherein recovery from crashes between and during merges is effected by one of the in-memory merge buffer being stored in a non-volatile memory and the current in-memory set of index entries being duplicated in an on-disk log; a merge mechanism, coupled to the on-disk and the in-memory merge buffers, to merge the current on-disk set of hash-table buckets and the current in-memory set of hash-table buckets to create at a new location in the on-disk circular buffer a new version of the current on-disk set of hash-table buckets storing the merged on-disk and in-memory set of index entries as the current on-disk set of index entries each time the number of index entries of the current in-memory set of index entries reaches a specified value, wherein the on-disk circular buffer is used for recovery from crashes occurring during merges; and an on-disk partial indexes buffer to store all prior in-memory sets of hash-table buckets by said merge mechanism used for said merges to facilitate index checking and reconstruction of the index; a segment store unit to segment data into the one or more variable length data segments; a storage unit, coupled to the index engine, to store at least one copy of each unique variable length data segment; and the index engine and the segment store unit to identify new variable length data segments that are identical to one of the variable length data segments already stored in said storage unit, create for each of the new variable length data segments identical to one of the already stored variable length data segments a reference to the already stored identical variable length data segment, store each of the references in the storage unit, and delete each of the new variable length data segments determined identical to one of the already stored variable length data segments. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A method comprising:
-
maintaining by an index engine an index recoverable from crashes, said maintaining including, storing in an on-disk circular buffer at a first location a first set of index entries in a first set of hash-table buckets; storing a second set of index entries in a second set of hash-table buckets of an in-memory merge buffer, wherein recovery from crashes between and during merges is effected by one of the in-memory merge buffer being stored in a non-volatile memory and the second set of index entries being duplicated in an on-disk log; merging the first set of hash-table buckets and the second set of hash-table buckets to create at a second location in the on-disk circular buffer a subsequent version of the index when the number of index entries of the second set of index entries reaches a specified value, the subsequent version of the index organized as a third set of hash-table buckets storing the first set of index entries and the second set of index entries, wherein the on-disk circular buffer is used for recovery from crashes occurring during a given merge; and adding to an on-disk partial indexes buffer the second set of hash-table buckets prior to said step of merging to facilitate index checking and reconstruction of the index; and segmenting data into the one or more variable length data segments; storing, in a storage unit coupled to the index engine, at least one copy of each unique variable length data segment; identifying one or more new variable length data segments that are identical to one of the variable length data segments already stored in said storage unit; creating, for each of the new variable length data segments identical to one of the already stored variable length data segments, a reference to the already stored identical variable length data segment; storing each of the references to the storage unit; and deleting each of the one or more new variable length data segments identical to the already stored variable length data segment. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
-
-
18. An article of manufacture comprising:
a machine-readable storage medium storing instructions, which when executed by a machine, results in the machine performing operations comprising; maintaining by an index engine an index recoverable from crashes, said maintaining including; storing in an on-disk circular buffer at a first location a first set of index entries in a first set of hash-table buckets; storing a second set of index entries in a second set of hash-table buckets of an in-memory merge buffer, wherein recovery from crashes between and during merges is effected by one of the in-memory merge buffer being stored in a non-volatile memory and the second set of index entries being stored in an on-disk log; merging the first set of hash-table buckets and the second set of hash-table buckets to create a at a second location in the on-disk circular buffer a subsequent version of the index when the number of index entries of the second set of index entries reaches a specified value, the subsequent version of the index organized as a third set of hash-table buckets storing the first set of index entries and the second set of index entries, wherein the on-disk circular buffer is used for recovery from crashes occurring during a given merge; and adding to an on-disk partial indexes buffer the second set of hash-table buckets prior to said step of merging to facilitate index checking and reconstruction of the index; and segmenting data into the one or more variable length data segments; storing, in a storage unit coupled with the index engine, at least one copy of each unique variable length data segment; identifying one or more new variable length data segments that are identical to one of the variable length data segments already stored in said storage unit; creating, for each of the new variable length data segment identical to one of the already stored variable length data segments, a reference to the already stored identical variable length data segment; storing each of the references to the storage unit; and deleting each of the one or more new variable length data segments identical to the already stored data segment. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25)
Specification