STORING NODES REPRESENTING RESPECTIVE CHUNKS OF FILES IN A DATA STORE
Abstract
To provide a data store, nodes representing respective chunks of files are stored in a predefined structure that defines relationships among the nodes, where the files are divided into the chunks. The nodes are collected into plural groups stored in persistent storage, where some of the nodes are collected into a particular one of the groups according to a locality relationship of the some of the nodes.
63 Citations
20 Claims
1. A method executed by a computer of providing a data store, comprising:
- storing, by the computer, nodes representing respective chunks of files in a predefined structure that defines relationships among the nodes, wherein the files are divided into the chunks;
- collecting, by the computer, the nodes into plural groups stored in persistent storage, wherein some of the nodes are collected into a particular one of the groups according to a locality relationship of the some of the nodes, wherein each of the groups has a header portion and a data portion, the data portion containing payload data of respective chunks associated with the corresponding group, and the header portion containing hashes of the chunks associated with the corresponding group, wherein each of the hashes is calculated by applying a hash function on content of a corresponding one of the chunks in the corresponding group; and
- associating location indications with the nodes, wherein the location indication of a first one of the nodes includes a pending indication to indicate that the first node has not yet been written to a group in the persistent storage, and wherein the location indication of a second one of the nodes includes an indication of a group in the persistent storage that the second node is part of.
Dependent claims: 2-15.
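As an illustration only, the arrangement recited in claim 1 can be pictured with a minimal Python sketch. Every name in it (`Node`, `Group`, `PENDING`, `flush`) is hypothetical, and the choice of SHA-256, the list-based group layout, and the dictionary standing in for persistent storage are assumptions; the claim itself does not specify a hash function, a locality rule, or an on-disk format.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Union

PENDING = "pending"  # hypothetical marker: node not yet written to any group


@dataclass
class Node:
    payload: bytes                                   # content of one file chunk
    children: List["Node"] = field(default_factory=list)
    location: Union[str, int] = PENDING              # PENDING or a group id

    @property
    def digest(self) -> str:
        # Hash computed over the chunk content (SHA-256 is an assumption).
        return hashlib.sha256(self.payload).hexdigest()


@dataclass
class Group:
    group_id: int
    header: List[str] = field(default_factory=list)  # hashes of member chunks
    data: List[bytes] = field(default_factory=list)  # payloads of member chunks


def flush(nodes: List[Node], group_id: int, store: Dict[int, Group]) -> Group:
    """Write related nodes into one group and update their location indications."""
    group = Group(group_id)
    for node in nodes:
        group.header.append(node.digest)
        group.data.append(node.payload)
        node.location = group_id   # replaces the pending indication
    store[group_id] = group        # dict stands in for persistent storage
    return group
```

In this sketch, calling `flush([a, b], 0, store)` on two related nodes leaves both with location `0`, while any node not yet flushed keeps the `PENDING` indication. How nodes are judged to be related by locality is left to the caller here, since the claim does not fix a rule.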
16. A method executed by a data storage system to provide a data store, comprising:
- providing, in the data store, a graph structure that specifies relationships among nodes representing chunks of files, the nodes containing digests of the chunks, wherein plural chunks make up a particular one of the files;
- collecting, by at least one processor, some of the nodes into pages stored in persistent storage, wherein a particular one of the pages has a header portion and a data portion, the data portion containing payload data of respective chunks associated with the particular page, and the header portion containing first references to nodes in the particular page and second references to nodes that are not part of the particular page but are children of at least one node of the particular page, wherein collecting the some of the nodes into pages is according to a first algorithm that:
  - searches for a larger one of subgraphs of nodes that have not yet been written to a given page; and
  - writes at least some of the nodes of the larger one of the subgraphs to the given page.
Dependent claims: 17-19.
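One possible reading of the "first algorithm" in claim 16 is sketched below in Python. It is an assumption-laden illustration, not the patented implementation: the names `Graph`, `unwritten_subgraph`, and `fill_page` are hypothetical, the graph is reduced to a mapping from node digests to child digests, "a larger one of subgraphs" is interpreted as the largest set of unwritten nodes reachable from any unwritten node, and the page's data portion (the chunk payloads) is omitted.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # node digest -> digests of its children


def unwritten_subgraph(root: str, graph: Graph, written: Set[str]) -> List[str]:
    """Collect unwritten nodes reachable from root, in depth-first order."""
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node in seen or node in written:
            continue
        seen.add(node)
        order.append(node)
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order


def fill_page(graph: Graph, written: Set[str], capacity: int) -> Tuple[List[str], List[str]]:
    """Pick the largest unwritten subgraph and place up to `capacity` of its nodes on a page.

    Returns (first_refs, second_refs): nodes in the page, and children of
    page nodes that are not part of the page.
    """
    candidates = [unwritten_subgraph(n, graph, written) for n in graph if n not in written]
    if not candidates:
        return [], []
    largest = max(candidates, key=len)
    first_refs = largest[:capacity]
    in_page = set(first_refs)
    second_refs: List[str] = []
    for node in first_refs:
        for child in graph.get(node, []):
            if child not in in_page and child not in second_refs:
                second_refs.append(child)
    written.update(in_page)  # these nodes now have a page
    return first_refs, second_refs
```

Repeated calls to `fill_page` with the same `written` set would fill successive pages, each time starting from whatever unwritten subgraph is then largest. A page-capacity limit is itself an assumption of the sketch; the claim only requires that at least some nodes of the chosen subgraph be written to the page.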
20. A computer-readable storage medium storing instructions that when executed cause a computer to:
- store nodes representing respective chunks of files in a predefined structure that defines relationships among the nodes, wherein the files are divided into the chunks; and
- collect the nodes into plural pages stored in persistent storage, wherein some of the nodes are collected into a particular one of the pages according to a locality relationship of the some of the nodes, wherein each of the pages has a header portion and a data portion, the data portion containing payload data of respective chunks associated with the corresponding page, and the header portion containing hashes of the chunks associated with the corresponding page, wherein each of the hashes is calculated by applying a hash function on content of a corresponding one of the chunks in the corresponding page; and
- associate location indications with the nodes, wherein the location indication of a first one of the nodes includes a pending indication to indicate that the first node has not yet been written to a page in the persistent storage, and wherein the location indication of a second one of the nodes includes an indication of a page in the persistent storage that the second node is part of.
Specification