Predictive probabilistic deduplication of storage

  • US 9,940,337 B2
  • Filed: 05/31/2015
  • Issued: 04/10/2018
  • Est. Priority Date: 05/31/2015
  • Status: Active Grant
First Claim

1. A method for probability-based deduplication of storage, said method comprising:

    receiving, by a processor, a plurality of input/output (I/O) commands, said plurality of commands including content subdivided into a first plurality of data blocks;

    setting the first plurality of data blocks as unique;

    writing the first plurality of data blocks to storage;

    sampling the first plurality of data blocks based on the first plurality of data blocks being set as unique to check for unique and duplicate blocks in the first plurality of the blocks and updating a key-value table with the sampled blocks;

    predicting, by the processor, based on the sampling, whether a second plurality of blocks is expected to be unique or duplicate, wherein said predicting is performed without writing the second plurality of blocks to the storage; and

    upon predicting that the second plurality of blocks is duplicate;

        updating the key-value table with the duplicate blocks;

        tallying unique blocks in the second plurality of blocks;

        writing the unique blocks to the storage and updating a value in a uniqueness counter corresponding to the tallying; and

        upon the value in the uniqueness counter exceeding a threshold, predicting that a next plurality of blocks is expected to be unique; and

    upon predicting that the second plurality of blocks is unique;

        writing the second plurality of blocks to the storage; and

    continuing to perform said sampling and predicting with blocks of the received plurality of I/O commands, thereby deduplicating the content.
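
The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; it only mirrors the claimed steps under several assumptions that the claim does not state: blocks are keyed by SHA-256, storage is an in-memory list, every `sample_step`-th block is sampled, a duplicate prediction is made when the sampled duplicate ratio reaches `dup_ratio`, and the uniqueness counter is reset when the prediction flips back to unique. The names `PredictiveDedup`, `block_key`, `sample_step`, `dup_ratio`, and `unique_threshold` are hypothetical.

```python
import hashlib


def block_key(block: bytes) -> str:
    """Content key for a block (SHA-256 is an assumption, not from the claim)."""
    return hashlib.sha256(block).hexdigest()


class PredictiveDedup:
    def __init__(self, sample_step=2, dup_ratio=0.5, unique_threshold=2):
        self.storage = []            # stand-in for block storage
        self.table = {}              # key -> [storage index, reference count]
        self.sample_step = sample_step
        self.dup_ratio = dup_ratio
        self.unique_threshold = unique_threshold
        self.expect_duplicate = False  # blocks start out set as unique
        self.unique_counter = 0

    def _write(self, block: bytes) -> int:
        self.storage.append(block)
        return len(self.storage) - 1

    def _handle_as_unique(self, blocks):
        """Write every block, then sample to predict the next plurality."""
        sampled = dups = 0
        for i, block in enumerate(blocks):
            index = self._write(block)
            if i % self.sample_step:
                continue  # not a sampled block
            sampled += 1
            key = block_key(block)
            if key in self.table:
                self.table[key][1] += 1
                dups += 1
            else:
                self.table[key] = [index, 1]
        # Prediction is made before any block of the next plurality is written.
        self.expect_duplicate = sampled > 0 and dups / sampled >= self.dup_ratio

    def _handle_as_duplicate(self, blocks):
        """Check every block; write only those not found in the table."""
        for block in blocks:
            key = block_key(block)
            if key in self.table:
                self.table[key][1] += 1          # record the duplicate
            else:
                self.table[key] = [self._write(block), 1]
                self.unique_counter += 1         # tally unique blocks
        if self.unique_counter > self.unique_threshold:
            self.expect_duplicate = False        # next plurality expected unique
            self.unique_counter = 0              # reset is an assumption

    def process(self, blocks):
        if self.expect_duplicate:
            self._handle_as_duplicate(blocks)
        else:
            self._handle_as_unique(blocks)
```

In this sketch, blocks written while the prediction is "unique" are stored even if they later turn out to be duplicates; only the sampled ones are looked up. That keeps the unique path cheap, at the cost of some missed deduplication, which is the trade-off the probabilistic approach appears to accept.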
