Scalable grid deduplication

  • US 10,387,374 B2
  • Filed: 02/27/2015
  • Issued: 08/20/2019
  • Est. Priority Date: 02/27/2015
  • Status: Active Grant
First Claim

1. A computer implemented method, comprising: generating a listing of a plurality of zone stamps, each zone stamp representing a data zone in the plurality of data zones in a data stream, the data stream being received from at least one data source by a network of communicatively coupled plurality of grid servers, each grid server in the plurality of grid servers storing a grid server listing of zone stamps corresponding to zones of data stored on that grid server, the generated listing including a logical arrangement of a combination of grid server listings of zone stamps obtained from each grid server in the plurality of grid servers and being accessible by the plurality of grid servers, and storing the generated listing on a coordinating grid server in the plurality of grid servers;

  • partitioning, using the coordinating grid server, the generated listing into a plurality of partitions of zone stamps, each partition in the plurality of partitions including a portion of the plurality of zone stamps, the partitioning being performed based on at least one of the following:

    a processing capability of each grid server in the plurality of grid servers, a size of each zone in the plurality of zones stored by the plurality of grid servers, a time consumed by comparing of zone stamps in the plurality of zone stamps contained in the generated listing, availability to process data zones in the data stream of each grid server in the plurality of grid servers, and any combination thereof; and

    distributing, using the coordinating grid server, each partition of zone stamps in the plurality of partitions to one or more grid servers in the plurality of grid servers for storage, the distributing being performed based on at least a processing capability of each grid server in the plurality of grid servers;

    selecting, using the coordinating grid server, a grid server in the plurality of grid servers, based on the generated listing and a partition stored on that grid server, to perform comparing a first zone stamp in the plurality of zone stamps contained in the generated listing to a second zone stamp in the plurality of zone stamps contained in the generated listing, the first zone stamp representing a first zone in the plurality of zones and the second zone stamp representing a second zone in the plurality of zones in the received data stream; and

    delta-compressing the first zone and the second zone based on a determination that the first zone stamp is substantially similar to the second zone stamp; and

    monitoring, using the coordinating grid server, the comparing and the delta-compressing, and, based on the monitoring, selecting, using the coordinating grid server, at least another grid server in the plurality of grid servers to perform the comparing and the delta-compressing upon determination that the selected grid server exceeded a predetermined amount of time to perform the comparing and the delta-compressing.
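The steps recited in the claim above (partitioning the stamp listing by processing capability, comparing stamps for similarity, and reselecting a server after a timeout) can be illustrated with a minimal Python sketch. This is not the patented implementation. Every name here (`GridServer`, `partition_by_capability`, `stamps_similar`, `reassign_if_slow`) is hypothetical, the proportional split is only one possible reading of "based on a processing capability", and the position-wise difference test is an assumed stand-in for "substantially similar", which the claim does not define. Delta compression itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class GridServer:
    name: str
    capability: float  # hypothetical relative processing capability


def partition_by_capability(stamps, servers):
    """Split the combined stamp listing into contiguous partitions whose
    sizes are proportional to each server's capability (an assumption)."""
    total = sum(s.capability for s in servers)
    partitions = {}
    start = 0
    for i, server in enumerate(servers):
        if i == len(servers) - 1:
            end = len(stamps)  # last server takes whatever remains
        else:
            share = round(len(stamps) * server.capability / total)
            end = min(start + share, len(stamps))
        partitions[server.name] = stamps[start:end]
        start = end
    return partitions


def stamps_similar(a, b, max_diff=1):
    """Assumed similarity test: equal-length stamps that differ in at most
    max_diff positions. The claim does not specify the actual test."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_diff


def find_similar_pairs(stamps):
    """Return every pair of stamps that passes the similarity test; these
    would be the candidates for delta compression."""
    return [
        (a, b)
        for i, a in enumerate(stamps)
        for b in stamps[i + 1:]
        if stamps_similar(a, b)
    ]


def reassign_if_slow(selected, servers, elapsed, limit):
    """Keep the selected server if it finished within the time limit;
    otherwise pick the most capable other server, if one exists."""
    if elapsed <= limit:
        return selected
    others = [s for s in servers if s.name != selected.name]
    if not others:
        return selected
    return max(others, key=lambda s: s.capability)


if __name__ == "__main__":
    servers = [GridServer("a", 3.0), GridServer("b", 1.0)]
    stamps = ["abcd", "abcf", "wxyz", "wxya", "mnop", "qrst", "uvwx", "ijkl"]
    print(partition_by_capability(stamps, servers))
    print(find_similar_pairs(stamps))
    print(reassign_if_slow(servers[1], servers, elapsed=12.0, limit=10.0).name)
```

In this sketch the coordinating server's role is reduced to three pure functions, which keeps each claimed step easy to test in isolation; a real grid would also need the storage, networking, and monitoring machinery that the claim only names.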

  • 6 Assignments