Data transfer reduction in scale out architectures

US 8,825,985 B2
Filed: 07/14/2011
Issued: 09/02/2014
Est. Priority Date: 07/14/2011
Status: Active Grant

Abstract

Mechanisms are provided for data transfer reduction in scale out architectures. When a compute node receives a write input/output (I/O) request for a data stream, the compute node separates the data stream into chunks and generates fingerprints for the individual chunks. Fingerprints are then sent to a scale out node and compared to fingerprints of chunks already maintained at the scale out node. Write data transfers are only made for chunks not already maintained at the scale out node. For a read I/O request for a data stream, fingerprints for chunks of the data stream are requested by the compute node from a scale out node. Fingerprints received are compared to fingerprints of chunks already maintained at the compute node and read data transfers are only made for chunks not already maintained at the compute node.
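The write path described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the fixed chunk size, the use of SHA-256 as the fingerprint, and the names `ScaleOutNode`, `missing`, `store`, and `write_stream` are all hypothetical, and the network hop is replaced by direct method calls.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4  # hypothetical fixed chunk size in bytes, kept tiny for illustration


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Separate a data stream into fixed-size chunks (an assumed chunking policy)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Fingerprint a chunk. SHA-256 is an assumption; the abstract names no algorithm."""
    return hashlib.sha256(chunk).hexdigest()


class ScaleOutNode:
    """Hypothetical in-memory stand-in for a scale out node's chunk store."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        """Return the fingerprints whose chunks are not yet maintained here."""
        return {fp for fp in fingerprints if fp not in self.chunks}

    def store(self, fp: str, chunk: bytes) -> None:
        self.chunks[fp] = chunk


def write_stream(node: ScaleOutNode, data: bytes) -> int:
    """Send fingerprints first, then only the chunks the node lacks.

    Returns the number of chunk payload bytes actually transferred.
    """
    chunks = split_into_chunks(data)
    fps = [fingerprint(c) for c in chunks]
    needed = node.missing(fps)
    sent = 0
    for fp, chunk in zip(fps, chunks):
        if fp in needed:
            node.store(fp, chunk)
            needed.discard(fp)  # a chunk repeated within the stream is sent once
            sent += len(chunk)
    return sent
```

Writing `b"aaaabbbbaaaa"` to an empty node transfers 8 bytes rather than 12, because the repeated chunk is sent once; a later write of `b"bbbbcccc"` transfers only the 4 bytes of the new chunk.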

22 Claims

1. A method, comprising:
- receiving a data stream at a compute node, the compute node having compute node local storage resources;
  
  separating the data stream into a plurality of chunks, including a first chunk and a second chunk;
  
  generating a plurality of fingerprints for the plurality of chunks, including the first chunk and the second chunk;
  
  transmitting the plurality of fingerprints to a scale out node, the scale out node having scale out node local storage resources, wherein the scale out node compares the plurality of fingerprints with fingerprints corresponding to chunks maintained in the scale out node local storage resources;
  
  creating an object map for the data stream based on a determination of whether fingerprints in the plurality of fingerprints correspond to fingerprints already stored in the scale out node; and
  
  creating a datastore suitcase corresponding to the object map including an index portion and a data portion, the data portion holding a plurality of datastore indices corresponding to the plurality of chunks and a last file entry for each of the plurality of chunks, each last file entry storing an identifier of a file which last placed a reference to the corresponding chunk.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the plurality of fingerprints are a plurality of checksums for the plurality of chunks.
  - 3. The method of claim 1, wherein the plurality of fingerprints is a plurality of hash values for the plurality of chunks.
  - 4. The method of claim 1, wherein compute node local storage resources comprise disk arrays.
  - 5. The method of claim 1, wherein the data stream is an object.
  - 6. The method of claim 1, wherein the scale out node sends a negative acknowledgement (N-ACK) to the compute node for fingerprints corresponding to chunks not maintained in the scale out node local storage resources.
  - 7. The method of claim 6, wherein the compute node transmits to the scale out node chunks not maintained in scale out node local storage resources.
  - 8. The method of claim 1, wherein the compute node is connected to the scale out node using a network interface.
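Claims 2 and 3 allow the fingerprints to be either checksums or hash values. The sketch below contrasts the two options; the choice of CRC-32 and SHA-256 is an assumption made for illustration, since the claims name no specific algorithm, and both function names are hypothetical.

```python
import hashlib
import zlib


def checksum_fingerprint(chunk: bytes) -> int:
    """CRC-32 checksum of a chunk: cheap to compute, but only 32 bits wide."""
    return zlib.crc32(chunk)


def hash_fingerprint(chunk: bytes) -> str:
    """SHA-256 digest of a chunk: slower, with a far larger fingerprint space."""
    return hashlib.sha256(chunk).hexdigest()
```

A short checksum keeps the fingerprint exchange small but makes accidental collisions more likely, so a design relying on it would probably need a byte-for-byte comparison before treating two chunks as identical; a cryptographic hash trades extra computation for a much lower collision risk.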

9. A system, comprising:
- an interface configured to receive a data stream at a compute node, the compute node having compute node local storage resources;
  
  a processor configured to separate the data stream into a plurality of chunks and generate a plurality of fingerprints for the plurality of chunks, the plurality of chunks including a first chunk and a second chunk;
  
  wherein the plurality of fingerprints are transmitted to a scale out node, the scale out node having scale out node local storage resources, wherein the scale out node compares the plurality of fingerprints with fingerprints corresponding to chunks maintained in the scale out node local storage resources;
  
  wherein the processor is further configured to create an object map for the data stream based on a determination of whether fingerprints in the plurality of fingerprints correspond to fingerprints already stored in the scale out node; and
  
  wherein the processor is further configured to create a datastore suitcase corresponding to the object map including an index portion and a data portion, the data portion holding a plurality of datastore indices corresponding to the plurality of chunks and a last file entry for each of the plurality of chunks, each last file entry storing an identifier of a file which last placed a reference to the corresponding chunk.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The system of claim 9, wherein the plurality of fingerprints are a plurality of checksums for the plurality of chunks.
  - 11. The system of claim 9, wherein the plurality of fingerprints is a plurality of hash values for the plurality of chunks.
  - 12. The system of claim 9, wherein compute node local storage resources comprise disk arrays.
  - 13. The system of claim 9, wherein the data stream is an object.
  - 14. The system of claim 9, wherein the scale out node sends a negative acknowledgement (N-ACK) to the compute node for fingerprints corresponding to chunks not maintained in the scale out node local storage resources.
  - 15. The system of claim 14, wherein the compute node transmits to the scale out node chunks not maintained in scale out node local storage resources.
  - 16. The system of claim 9, wherein the compute node is connected to the scale out node using a network interface.
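The object map and datastore suitcase recited in claims 1 and 9 can be pictured with a small data-structure sketch. The claims state only that the suitcase has an index portion and a data portion, and that the data portion holds datastore indices and a last file entry per chunk. Everything else here is an assumption: the fingerprint-to-index lookup stored in the index portion, the `(offset, datastore index)` layout of the object map, and the names `DataEntry`, `Suitcase`, `reference`, and `build_object_map`.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class DataEntry:
    """One data-portion entry: a datastore index, its chunk, and the last file entry."""
    index: int
    chunk: bytes
    last_file: str  # identifier of the file that last placed a reference to the chunk


@dataclass
class Suitcase:
    """Hypothetical datastore suitcase with an index portion and a data portion."""
    index_portion: dict[str, int] = field(default_factory=dict)  # assumed: fingerprint -> index
    data_portion: dict[int, DataEntry] = field(default_factory=dict)

    def reference(self, fp: str, chunk: bytes, file_id: str) -> int:
        """Store the chunk if its fingerprint is new; always update the last file entry."""
        idx = self.index_portion.get(fp)
        if idx is None:
            idx = len(self.data_portion) + 1
            self.index_portion[fp] = idx
            self.data_portion[idx] = DataEntry(idx, chunk, file_id)
        else:
            self.data_portion[idx].last_file = file_id
        return idx


def build_object_map(suitcase: Suitcase, file_id: str,
                     chunks: list[bytes]) -> list[tuple[int, int]]:
    """Build an assumed object map: one (stream offset, datastore index) pair per chunk."""
    object_map = []
    offset = 0
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        object_map.append((offset, suitcase.reference(fp, chunk, file_id)))
        offset += len(chunk)
    return object_map
```

If `file_a` stores chunks `xx` and `yy` and `file_b` then stores `yy` and `zz`, the chunk `yy` keeps a single datastore index while its last file entry changes from `file_a` to `file_b`.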

17. A method, comprising:
- receiving a read input/output (I/O) request for a data stream at a compute node, the compute node having compute node local storage resources;
  
  identifying a scale out node having an object map corresponding to the data stream, the object map further corresponding to a datastore suitcase including an index portion and a data portion, the data portion holding a plurality of datastore indices corresponding to a plurality of chunks, the chunks corresponding to the data stream, and a last file entry for each of the plurality of chunks, each last file entry storing an identifier of a file which last placed a reference to the corresponding chunk;
  
  requesting a plurality of fingerprints for the data stream from the scale out node;
  
  comparing the plurality of fingerprints with fingerprints corresponding to chunks maintained in compute node local storage resources;
  
  requesting from the scale out node only the chunks not maintained in the compute node local storage resources.
- Dependent claims (18, 19, 20, 21, 22):
  - 18. The method of claim 17, wherein the plurality of fingerprints are a plurality of checksums for the plurality of chunks.
  - 19. The method of claim 17, wherein the plurality of fingerprints is a plurality of hash values for the plurality of chunks.
  - 20. The method of claim 17, wherein compute node local storage resources comprise disk arrays.
  - 21. The method of claim 17, wherein the data stream is an object.
  - 22. The method of claim 17, wherein the compute node is connected to the scale out node using a network interface.
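The read path of claim 17 mirrors the write path: the compute node asks for fingerprints first and then requests only the chunks it does not already hold. The following sketch illustrates that flow under assumptions. `ScaleOutStore`, `fingerprints_for`, `fetch`, and `read_stream` are hypothetical names, SHA-256 stands in for the unspecified fingerprint, the object map is reduced to an ordered fingerprint list, and the compute node local storage is modeled as a plain dictionary.

```python
from __future__ import annotations

import hashlib


def fp(chunk: bytes) -> str:
    """Fingerprint a chunk (SHA-256 is an assumption, not taken from the claims)."""
    return hashlib.sha256(chunk).hexdigest()


class ScaleOutStore:
    """Hypothetical scale out node holding chunks and per-stream fingerprint lists."""

    def __init__(self) -> None:
        self.object_maps: dict[str, list[str]] = {}  # stream name -> ordered fingerprints
        self.chunks: dict[str, bytes] = {}

    def put(self, name: str, chunks: list[bytes]) -> None:
        self.object_maps[name] = [fp(c) for c in chunks]
        for c in chunks:
            self.chunks[fp(c)] = c

    def fingerprints_for(self, name: str) -> list[str]:
        return list(self.object_maps[name])

    def fetch(self, fingerprints: list[str]) -> dict[str, bytes]:
        return {f: self.chunks[f] for f in fingerprints}


def read_stream(store: ScaleOutStore, local_chunks: dict[str, bytes],
                name: str) -> tuple[bytes, int]:
    """Rebuild a stream, transferring only chunks absent from local storage.

    Returns the stream bytes and the number of chunks actually requested.
    """
    fps = store.fingerprints_for(name)
    # dict.fromkeys removes duplicate fingerprints while preserving order.
    missing = [f for f in dict.fromkeys(fps) if f not in local_chunks]
    local_chunks.update(store.fetch(missing))
    return b"".join(local_chunks[f] for f in fps), len(missing)
```

With a stream made of chunks `aa`, `bb`, `aa`, `cc` and a compute node that already holds `aa`, the first read requests two chunks and a repeated read requests none.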

Current Assignee: Quest Software, Inc.
Original Assignee: Dell Products LP (Dell Technologies Inc.)
Inventors: Jayaraman, Vinod; Dinkar, Abhijit
Primary Examiner(s): SCHNEE, HAL W
Application Number: US13/183,054
Publication Number: US 20130019061A1
Time in Patent Office: 1,146 Days
Field of Search: 711/154, 711/216, 711/E12.06
US Class Current: 711/216
CPC Class Codes:
- G06F 12/04   Addressing variable-length ...
- G06F 12/0864   using pseudo-associative me...
- G06F 2212/1044   Space efficiency improvement
- G06F 2212/264   Remote server
- H04L 1/0061   Error detection codes
- H04L 67/06   specially adapted for file ...
- H04L 67/1097   for distributed storage of ...
