Method and system for handling object boundaries of a data stream to optimize deduplication

  • US 9,087,086 B1
  • Filed: 12/18/2012
  • Issued: 07/21/2015
  • Est. Priority Date: 12/18/2012
  • Status: Active Grant
First Claim

1. A computer-implemented method for deduplicating data, comprising:

  • receiving at a storage system over a network from a client a data stream having a sequence of a plurality of data objects, the data stream representing a file or a directory of one or more files of a file system associated with the client, wherein the data stream includes a plurality of boundary markers inserted by the client prior to being received at the storage system;

    scanning the data stream to recognize a plurality of boundary markers each being associated with each of the data objects, the boundary markers identifying boundaries of the data objects; and

    deduplicating the data stream into a plurality of deduplicated chunks in view of boundaries of the data objects marked by the boundary markers, wherein deduplicating the data stream comprises anchoring the data stream using a predetermined chunking algorithm to create a plurality of anchor points, each anchor point identifying a chunking boundary for deduplication;

    relocating at least one of the anchor points to a location that is identified by at least one boundary markers; and

    chunking the data stream into the deduplicated chunks based on the anchor points that include at least one relocated anchor point.
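
The steps recited in the claim (scan for boundary markers, anchor, relocate anchors, chunk) can be illustrated with a minimal Python sketch. It is not the patented implementation: the marker byte sequence `MARKER`, the windowed-sum anchoring test, the `tolerance` parameter, and all function names are assumptions made up for illustration, and SHA-256 stands in for whatever fingerprint a real storage system would use.

```python
import bisect
import hashlib

# Hypothetical marker a client might insert between objects (an assumption).
MARKER = b"\x00OBJ\x00"


def find_boundaries(data: bytes, marker: bytes = MARKER) -> list[int]:
    """Scan the stream and return the offset of every boundary marker."""
    offsets, pos = [], data.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(marker, pos + len(marker))
    return offsets


def find_anchors(data: bytes, window: int = 4, mask: int = 0x0F) -> list[int]:
    """Content-defined anchoring: offset i is an anchor when a simple hash
    (here, the byte sum) of the preceding window has its low bits clear.
    A stand-in for a real rolling hash such as a Rabin fingerprint."""
    return [i for i in range(window, len(data))
            if sum(data[i - window:i]) & mask == 0]


def relocate_anchors(anchors: list[int], boundaries: list[int],
                     tolerance: int) -> list[int]:
    """Move each anchor to the nearest object boundary if one lies within
    `tolerance` bytes; otherwise keep the anchor where it is."""
    boundaries = sorted(boundaries)
    result = set()
    for a in anchors:
        i = bisect.bisect_left(boundaries, a)
        # Only the neighbours on either side of `a` can be the nearest.
        candidates = boundaries[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda b: abs(b - a), default=None)
        if nearest is not None and abs(nearest - a) <= tolerance:
            result.add(nearest)
        else:
            result.add(a)
    return sorted(result)


def chunk_at(data: bytes, anchors: list[int]) -> list[bytes]:
    """Cut the stream at each anchor offset strictly inside the data."""
    cuts = [a for a in sorted(set(anchors)) if 0 < a < len(data)]
    edges = [0] + cuts + [len(data)]
    return [data[s:e] for s, e in zip(edges, edges[1:])]


def deduplicate(chunks: list[bytes]) -> tuple[list[str], dict[str, bytes]]:
    """Store each distinct chunk once, keyed by its SHA-256 fingerprint, and
    return the ordered fingerprint list needed to rebuild the stream."""
    store: dict[str, bytes] = {}
    recipe = []
    for c in chunks:
        fp = hashlib.sha256(c).hexdigest()
        store.setdefault(fp, c)
        recipe.append(fp)
    return recipe, store
```

A caller would chain these as `deduplicate(chunk_at(data, relocate_anchors(find_anchors(data), find_boundaries(data), tolerance)))`. Snapping anchors to object boundaries means that an identical object appearing at a different position in another stream is more likely to produce identical chunks.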

  • 9 Assignments