Storing data and metadata in respective virtual shards on sharded storage systems
Abstract
Techniques are provided for storing data and metadata on sharded storage arrays. In one embodiment, data is processed in a sharded distributed data storage system that stores data in a plurality of shards on one or more storage nodes by providing a plurality of addressable virtual shards within each of the shards, wherein at least a first one of the addressable virtual shards stores the data, and wherein at least a second one of the addressable virtual shards stores the metadata related to the data; obtaining the data from a compute node; and providing the data and the metadata related to the data to the sharded distributed data storage system for storage in the respective first and second addressable virtual shards. The metadata related to the data is stored together at a portion of a corresponding stripe for the data in the second one of the addressable virtual shards. A third one of the addressable virtual shards optionally stores a checksum value related to the data.
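The layout described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `VShard`, `Shard`, `put` and `get`, the in-memory dictionaries, and the use of CRC-32 as the checksum are all assumptions made for illustration. It shows one shard holding separately addressable virtual shards for data, metadata and an optional checksum, with related entries stored at the same object offset.

```python
import zlib
from dataclasses import dataclass, field
from enum import Enum


class VShard(Enum):
    """Hypothetical identifiers for the virtual shards inside one shard."""
    DATA = 0
    METADATA = 1
    CHECKSUM = 2  # optional third virtual shard mentioned in the abstract


@dataclass
class Shard:
    """One shard holding several addressable virtual shards (in-memory stand-in)."""
    vshards: dict = field(default_factory=lambda: {v: {} for v in VShard})

    def put(self, offset: int, data: bytes, metadata: bytes) -> None:
        # Data, metadata and checksum go to different virtual shards
        # but share the same object offset.
        self.vshards[VShard.DATA][offset] = data
        self.vshards[VShard.METADATA][offset] = metadata
        self.vshards[VShard.CHECKSUM][offset] = zlib.crc32(data)  # CRC-32 is an assumption

    def get(self, vshard: VShard, offset: int):
        # Each virtual shard is addressed independently by (vshard, offset).
        return self.vshards[vshard][offset]


shard = Shard()
shard.put(4096, b"payload", b"owner=alice")
print(shard.get(VShard.METADATA, 4096))
```

Because the offset is shared, a reader that knows where an object's data lives can find its metadata or checksum without a separate index lookup.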
23 Claims
1. A method for processing data in a sharded distributed data storage system, wherein the sharded distributed data storage system stores said data in a plurality of shards on one or more storage nodes, said method comprising:
providing a plurality of addressable virtual shards within each of said plurality of shards, wherein at least a first one of said plurality of addressable virtual shards stores said data and wherein at least a second different one of said plurality of addressable virtual shards separately stores metadata related to said data, wherein said data and said corresponding metadata related to said data are stored within said first and second addressable virtual shards, respectively, with a same object offset;
obtaining, at a first burst buffer appliance, said data for a given shard from at least a second burst buffer appliance connected to said first burst buffer appliance by an interconnect network; and
providing, by said first burst buffer appliance, said data for said given shard and said metadata related to said data for said given shard to said sharded distributed data storage system using a single write operation for storage in said respective first and second addressable virtual shards.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 22
11. An apparatus for processing data in a sharded distributed data storage system, wherein the sharded distributed data storage system stores said data on a plurality of shards on one or more storage nodes, said apparatus comprising:
a memory; and
at least one hardware device operatively coupled to the memory and configured to:
provide a plurality of addressable virtual shards within each of said plurality of shards, wherein at least a first one of said plurality of addressable virtual shards stores said data and wherein at least a second different one of said plurality of addressable virtual shards separately stores metadata related to said data, wherein said data and said corresponding metadata related to said data are stored within said first and second addressable virtual shards, respectively, with a same object offset;
obtain, at a first burst buffer appliance, said data for a given shard from at least a second burst buffer appliance connected to said first burst buffer appliance by an interconnect network; and
provide, by said first burst buffer appliance, said data for said given shard and said metadata related to said data for said given shard to said sharded distributed data storage system using a single write operation for storage in said respective first and second addressable virtual shards.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 23
21. An article of manufacture for processing data in a sharded distributed data storage system, wherein the sharded distributed data storage system stores said data on a plurality of shards on one or more storage nodes, said article of manufacture comprising a non-transitory machine readable recordable storage medium containing one or more programs which when executed implement the steps of:
providing a plurality of addressable virtual shards within each of said plurality of shards, wherein at least a first one of said plurality of addressable virtual shards stores said data and wherein at least a second different one of said plurality of addressable virtual shards separately stores metadata related to said data, wherein said data and said corresponding metadata related to said data are stored within said first and second addressable virtual shards, respectively, with a same object offset;
obtaining, at a first burst buffer appliance, said data for a given shard from at least a second burst buffer appliance connected to said first burst buffer appliance by an interconnect network; and
providing, by said first burst buffer appliance, said data for said given shard and said metadata related to said data for said given shard to said sharded distributed data storage system using a single write operation for storage in said respective first and second addressable virtual shards.
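The obtaining and providing steps recited in the independent claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `ShardedStore`, `BurstBufferAppliance`, `buffer`, `obtain_from` and `flush` are hypothetical names, the interconnect network is replaced by a direct method call, and the metadata is assumed to travel alongside the data between appliances.

```python
from dataclasses import dataclass, field


@dataclass
class ShardedStore:
    """Hypothetical store: shard id -> virtual shard name -> offset -> bytes."""
    shards: dict = field(default_factory=dict)
    write_ops: int = 0

    def write(self, shard_id: int, offset: int, data: bytes, metadata: bytes) -> None:
        # A single write operation carries both payloads; each lands in its
        # own virtual shard at the same object offset.
        vshards = self.shards.setdefault(shard_id, {"data": {}, "metadata": {}})
        vshards["data"][offset] = data
        vshards["metadata"][offset] = metadata
        self.write_ops += 1


@dataclass
class BurstBufferAppliance:
    """Hypothetical appliance buffering (offset, data, metadata) tuples per shard."""
    name: str
    buffered: dict = field(default_factory=dict)

    def buffer(self, shard_id: int, offset: int, data: bytes, metadata: bytes) -> None:
        self.buffered.setdefault(shard_id, []).append((offset, data, metadata))

    def obtain_from(self, peer: "BurstBufferAppliance", shard_id: int) -> None:
        # Stand-in for the transfer over the interconnect network: pull
        # everything the peer holds for the given shard.
        for item in peer.buffered.pop(shard_id, []):
            self.buffered.setdefault(shard_id, []).append(item)

    def flush(self, store: ShardedStore, shard_id: int) -> None:
        # One store write per buffered object, covering data and metadata together.
        for offset, data, metadata in self.buffered.pop(shard_id, []):
            store.write(shard_id, offset, data, metadata)


store = ShardedStore()
first = BurstBufferAppliance("bb0")
second = BurstBufferAppliance("bb1")
second.buffer(3, 0, b"abc", b"len=3")
first.obtain_from(second, 3)
first.flush(store, 3)
print(store.write_ops)
```

Gathering a shard's data on one appliance before flushing lets that appliance issue the combined write itself, rather than having each appliance write its fragment and its metadata separately.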
Specification