Methods and Systems For Vectored Data De-Duplication

US 20140136490A1
Filed: 11/12/2012
Published: 05/15/2014
Est. Priority Date: 11/12/2012
Status: Active Grant

7 Assignments

0 Petitions

Abstract

The present invention is directed toward methods and systems for data de-duplication. More particularly, in various embodiments, the present invention provides systems and methods for data de-duplication that may utilize a vectoring method for data de-duplication wherein a stream of data is divided into “data sets” or blocks. For each block, a code, such as a hash or cyclic redundancy code may be calculated and stored. The first block of the set may be written normally and its address and hash can be stored and noted. Subsequent block hashes may be compared with previously written block hashes.
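
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the fixed block size, the choice of SHA-256 as the code, and the names `split_blocks` and `deduplicate` are all assumptions made for the example. The output is modeled as a list of records in which a repeated block is replaced, in place, by a vector holding the index of the earlier record.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not fix one


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Yield consecutive fixed-size blocks (the last one may be shorter)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def deduplicate(data: bytes, size: int = BLOCK_SIZE):
    """Return a list of ("block", bytes) and ("vector", index) records.

    A vector's index refers to the earlier record that stores the same block.
    """
    seen = {}    # code -> index of the record that stores the block
    output = []
    for block in split_blocks(data, size):
        code = hashlib.sha256(block).digest()  # hash chosen for illustration
        if code in seen:
            # Code matches a previously written block: store a vector instead.
            output.append(("vector", seen[code]))
        else:
            # First occurrence: write the block normally and note its position.
            seen[code] = len(output)
            output.append(("block", block))
    return output
```

Calling `deduplicate(data, 8)` on three 8-byte blocks of which the first and third are identical would store two blocks followed by one vector that refers back to record 0.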

15 Claims

1. A method, comprising:
- comparing a de-duplication code for a first block of data received as part of an input stream to a de-duplication code for a previously processed block of data;
- upon determining that the de-duplication code for the first block of data matches the code for the previously processed block of data, storing in an output stream a vector instead of the first block of data,
  - where the vector points in the output stream to one of, the previously processed block of data, or another vector,
  - where the vector is placed in a location in the output data stream where the first block of data would have been placed, and
  - where the vector contains fewer bits than the first block of data, and
- configuring the output stream to receive the next item to be stored after the end of the vector that was stored in the output stream, where the next item is to be processed from the input stream.
  - 2. The method of claim 1, where the input stream can be recreated from the output stream without reference to other de-duplication data structures.
  - 3. The method of claim 1, where the output stream includes self-describing data.
  - 4. The method of claim 1, where the stored block of data and the vector are stored in a non-tape solid state memory.
  - 5. The method of claim 1, where the vector comprises a pointer to a previous physical block.
  - 6. The method of claim 1, where the vector comprises a pointer to a previous logical block.
  - 7. The method of claim 6, where the vector comprises a start address and an offset.
  - 8. The method of claim 1, where the stored block of data and the vector are stored on a FLASH drive.
  - 9. The method of claim 1, comprising:
    - upon determining that the de-duplication code for the first block of data matches the code for the previously processed block of data, verifying that the first block of data matches the previously processed block of data.
  - 10. The method of claim 9, where verifying that the first block of data matches the previously processed block of data comprises a bit-by-bit comparison of the first block of data and the previously processed block of data.
  - 11. The method of claim 9, where verifying that the first block of data matches the previously processed block of data comprises a byte-by-byte comparison of the first block of data and the previously processed block of data.
  - 12. The method of claim 9, where verifying that the first block of data matches the previously processed block of data comprises a word-by-word comparison of the first block of data and the previously processed block of data.
  - 13. The method of claim 9, where verifying that the first block of data matches the previously processed block of data comprises computing a second different code for the first block of data and computing a second different code for the previously processed block of data and then comparing the second different code for the first block of data to the second different code for the previously processed block of data.
  - 14. The method of claim 1, comprising:
    - selectively adding filler to the output stream.
  - 15. The method of claim 1, comprising:
    - selectively adding filler to the output stream in an amount sufficient to cause the next output to the output stream to begin on a block boundary.
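
Several of the claims above (a self-describing output stream, recreation without other data structures, verification after a code match, and filler up to a block boundary) can be illustrated together in one small sketch. Everything concrete here is an assumption made for the example and is not taken from the patent: the one-byte record tags, the 4-byte length and 8-byte offset fields, the use of CRC-32 as the de-duplication code, and the function names `encode`, `decode`, `read_block`, and `pad`.

```python
import struct
import zlib

# Hypothetical record layout: tag byte, then a big-endian header field.
TAG_BLOCK, TAG_VECTOR, TAG_FILLER = b"B", b"V", b"F"


def read_block(stream, offset: int) -> bytes:
    """Follow vectors from offset until a block record is reached."""
    while stream[offset:offset + 1] == TAG_VECTOR:
        (offset,) = struct.unpack_from(">Q", stream, offset + 1)
    (length,) = struct.unpack_from(">I", stream, offset + 1)
    return bytes(stream[offset + 5:offset + 5 + length])


def pad(out: bytearray, align: int) -> None:
    """Append a filler record so the next record starts on a multiple of align."""
    gap = -len(out) % align
    if gap:
        while gap < 5:          # a filler record needs 5 bytes for its header
            gap += align
        out += TAG_FILLER + struct.pack(">I", gap - 5) + bytes(gap - 5)


def encode(data: bytes, block_size: int, align: int = 0) -> bytes:
    out = bytearray()
    seen = {}  # CRC-32 code -> offset of the block record in the output
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        code = zlib.crc32(block)
        offset = seen.get(code)
        # A code match is only trusted after comparing the actual bytes.
        if offset is not None and read_block(out, offset) == block:
            out += TAG_VECTOR + struct.pack(">Q", offset)
        else:
            seen.setdefault(code, len(out))
            out += TAG_BLOCK + struct.pack(">I", len(block)) + block
        if align:
            pad(out, align)     # filler is added only when alignment is requested
    return bytes(out)


def decode(stream: bytes) -> bytes:
    """Recreate the input using nothing but the output stream itself."""
    data = bytearray()
    pos = 0
    while pos < len(stream):
        tag = stream[pos:pos + 1]
        if tag == TAG_VECTOR:
            data += read_block(stream, pos)
            pos += 9
        else:
            (length,) = struct.unpack_from(">I", stream, pos + 1)
            if tag == TAG_BLOCK:
                data += stream[pos + 5:pos + 5 + length]
            pos += 5 + length   # filler records are skipped
    return bytes(data)
```

In this layout a vector occupies 9 bytes, so it is smaller than the block it replaces only when blocks are longer than that. The decoder needs no hash table or index, because each record announces its own type and length.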

Current Assignee
Quantum Corporation (Chi Ko Investment Co., Ltd.)
Original Assignee
Quantum Corporation (Chi Ko Investment Co., Ltd.)
Inventors
Saliba, George; White, Theron

Granted Patent

US 9,047,305 B2
Field of Search
US Class Current

707/692
CPC Class Codes

G06F 11/1453   using de-duplication of the...

G06F 16/1748   De-duplication implemented ...

G06F 16/1752   based on file chunks

G06F 16/215   Improving data quality; Dat...

G06F 16/2365   Ensuring data consistency a...
