Cluster storage using subsegmenting
First Claim
1. A method for storing data on cluster storage comprising:
receiving a data stream or a data block;
breaking the data stream or the data block into segments; and
for each segment associated with the data stream or the data block:
assigning the segment to a cluster node, wherein the cluster node is associated with a cluster storage system comprising at least two cluster nodes and wherein each cluster node is associated with a corresponding storage, wherein the cluster node indexes and stores one or more segments managed by the cluster storage system;
breaking the segment into a plurality of portions of the segment, wherein each portion of the segment is smaller than the segment; and
identifying one of the plurality of portions of the segment that is a duplicate of a portion of another segment already managed by the assigned cluster node for determining storage of a deduplicated representation of the segment in the cluster node, wherein the identification is based at least in part on using a determined tag associated with the portion of the segment, wherein storing the segment includes at least storing a reference to the portion of the other segment already managed by the cluster node instead of the portion of the segment identified as the duplicate, wherein at least the stored reference is used to reconstruct the segment.
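The per-node steps of the claim (breaking a segment into portions, identifying a duplicate portion by its tag, storing a reference in place of the duplicate, and reconstructing from references) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the fixed `PORTION_SIZE`, the use of a SHA-256 digest as the tag, and the names `ClusterNode`, `store_segment`, and `reconstruct` are all hypothetical, and the claim does not prescribe any of them.

```python
import hashlib

PORTION_SIZE = 4  # hypothetical fixed portion size, kept tiny for illustration


def tag_of(portion: bytes) -> str:
    """Assumed tag: a SHA-256 digest of the portion's bytes."""
    return hashlib.sha256(portion).hexdigest()


class ClusterNode:
    """Hypothetical node that indexes and stores portions of segments."""

    def __init__(self) -> None:
        self.portions: dict[str, bytes] = {}      # tag -> stored portion bytes
        self.segments: dict[str, list[str]] = {}  # segment id -> ordered tags

    def store_segment(self, seg_id: str, segment: bytes) -> int:
        """Store a segment and return the number of duplicate portions found.

        A portion whose tag is already indexed is not stored again; only the
        reference (its tag) is recorded. Note that this sketch also matches
        repeats inside the same segment, which is broader than the claim.
        """
        duplicates = 0
        recipe = []
        for start in range(0, len(segment), PORTION_SIZE):
            portion = segment[start:start + PORTION_SIZE]
            tag = tag_of(portion)
            if tag in self.portions:
                duplicates += 1               # keep a reference only
            else:
                self.portions[tag] = portion  # first copy: store the bytes
            recipe.append(tag)
        self.segments[seg_id] = recipe
        return duplicates

    def reconstruct(self, seg_id: str) -> bytes:
        """Rebuild a segment by following its stored references."""
        return b"".join(self.portions[tag] for tag in self.segments[seg_id])


node = ClusterNode()
node.store_segment("a", b"AAAABBBBCCCC")
dups = node.store_segment("b", b"XXXXBBBBYYYY")  # "BBBB" is already on the node
```

Here the second segment shares one portion with the first, so the node holds five distinct portions rather than six, and `node.reconstruct("b")` returns the original bytes of the second segment.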
Abstract
Cluster storage is disclosed. A data stream or a data block is received. The data stream or the data block is broken into segments. For each segment, a cluster node is selected, and a portion of the segment smaller than the segment is identified that is a duplicate of a portion of a segment already managed by the cluster node.
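The front end described in the abstract, breaking the input into segments and selecting a cluster node for each one, might look like the sketch below. The abstract does not say how segments are delimited or how a node is chosen, so everything here is an assumption made for illustration: fixed-size segments, a hypothetical `SEGMENT_SIZE`, and routing by a hash of the segment's first few bytes so that segments with the same opening bytes reach the same node.

```python
import hashlib

SEGMENT_SIZE = 8   # hypothetical fixed segment size
ROUTING_PREFIX = 4  # hypothetical number of leading bytes used for routing


def break_into_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split a data block into consecutive fixed-size segments (assumed scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def select_node(segment: bytes, num_nodes: int) -> int:
    """Pick a node index for a segment from a hash of its leading bytes.

    The cluster is assumed to have at least two nodes, as in the claims.
    """
    if num_nodes < 2:
        raise ValueError("a cluster needs at least two nodes")
    digest = hashlib.sha256(segment[:ROUTING_PREFIX]).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


data = b"abcdefghijklmnopqrst"
segments = break_into_segments(data)
assignments = [(select_node(seg, 3), seg) for seg in segments]
```

Routing on content rather than arrival order is one way to make it more likely that a node already manages a segment with matching portions; other selection policies would fit the abstract equally well.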
42 Claims
1. A method for storing data on cluster storage comprising:
receiving a data stream or a data block;
breaking the data stream or the data block into segments; and
for each segment associated with the data stream or the data block:
assigning the segment to a cluster node, wherein the cluster node is associated with a cluster storage system comprising at least two cluster nodes and wherein each cluster node is associated with a corresponding storage, wherein the cluster node indexes and stores one or more segments managed by the cluster storage system;
breaking the segment into a plurality of portions of the segment, wherein each portion of the segment is smaller than the segment; and
identifying one of the plurality of portions of the segment that is a duplicate of a portion of another segment already managed by the assigned cluster node for determining storage of a deduplicated representation of the segment in the cluster node, wherein the identification is based at least in part on using a determined tag associated with the portion of the segment, wherein storing the segment includes at least storing a reference to the portion of the other segment already managed by the cluster node instead of the portion of the segment identified as the duplicate, wherein at least the stored reference is used to reconstruct the segment.
Dependent Claims: 2-40
41. A system for storing data on cluster storage comprising:
a processor; and
a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions which when executed cause the processor to:
receive a data stream or a data block;
break the data stream or the data block into segments; and
for each segment associated with the data stream or the data block:
assign the segment to a cluster node, wherein the cluster node is associated with a cluster storage system comprising at least two cluster nodes and wherein each cluster node is associated with a corresponding storage, wherein the cluster node indexes and stores one or more segments managed by the cluster storage system;
break the segment into a plurality of portions of the segment, wherein each portion of the segment is smaller than the segment; and
identify one of the plurality of portions of the segment that is a duplicate of a portion of another segment already managed by the assigned cluster node for determining storage of a deduplicated representation of the segment in the cluster node, wherein the identification is based at least in part on using a determined tag associated with the portion of the segment, wherein storing the segment includes at least storing a reference to the portion of the other segment already managed by the cluster node instead of the portion of the segment identified as the duplicate, wherein at least the stored reference is used to reconstruct the segment.
42. A computer program product for storing data on cluster storage, the computer program product being embodied in a computer readable storage medium and comprising computer instructions for:
receiving a data stream or a data block;
breaking the data stream or the data block into segments; and
for each segment associated with the data stream or the data block:
assigning the segment to a cluster node, wherein the cluster node is associated with a cluster storage system comprising at least two cluster nodes and wherein each cluster node is associated with a corresponding storage, wherein the cluster node indexes and stores one or more segments managed by the cluster storage system file;
breaking the segment into a plurality of portions of the segment, wherein each portion of the segment is smaller than the segment; and
identifying one of the plurality of portions of the segment that is a duplicate of a portion of another segment already managed by the assigned cluster node for determining storage of a deduplicated representation of the segment in the cluster node, wherein the identification is based at least in part on using a determined tag associated with the portion of the segment, wherein storing the segment includes at least storing a reference to the portion of the other segment already managed by the cluster node instead of the portion of the segment identified as the duplicate, wherein at least the stored reference is used to reconstruct the segment.
Specification