Stream-based data deduplication with peer node prediction
First Claim
1. A method operative in an overlay network comprising a sending peer and a receiving peer, comprising:
- maintaining a directed cyclic graph in association with the sending peer;
maintaining a directed cyclic graph in association with the receiving peer;
wherein each directed cyclic graph comprises a set of nodes and edges that collectively represent temporal and ordered relationships among blocks of data that have been seen in the data stream by the respective peer, wherein each node represents a chunk of data, and an edge between nodes represents a transition that the respective peer has seen in the data stream with respect to that chunk, the directed cyclic graph being annotated with information from which the respective peer can generate a prediction about blocks of data that are subject to a stream-based data deduplication;
the receiving peer generating a hinting request that predicts what blocks of data the sending peer is expected to utilize during stream-based data deduplication of a page;
the sending peer generating a hinting response that predicts what blocks of data are expected to compose the page;
wherein the hinting request and the hinting response are generated in software executing in a hardware element.
1 Assignment
0 Petitions
Accused Products
Abstract
Stream-based data deduplication is provided in a multi-tenant shared infrastructure but without requiring “paired” endpoints having synchronized data dictionaries. Data objects processed by the dedupe functionality are treated as objects that can be fetched as needed. As such, a decoding peer does not need to maintain a symmetric library for the origin. Rather, if the peer does not have the chunks in cache that it needs, it follows a conventional content delivery network procedure to retrieve them. In this way, if dictionaries between pairs of sending and receiving peers are out-of-sync, relevant sections are then re-synchronized on-demand. The approach does not require that libraries maintained at a particular pair of sender and receiving peers are the same. Rather, the technique enables a peer, in effect, to “backfill” its dictionary on-the-fly. On-the-wire compression techniques are provided to reduce the amount of data transmitted between the peers.
25 Citations
15 Claims
-
1. A method operative in an overlay network comprising a sending peer and a receiving peer, comprising:
-
maintaining a directed cyclic graph in association with the sending peer; maintaining a directed cyclic graph in association with the receiving peer; wherein each directed cyclic graph comprises a set of nodes and edges that collectively represent temporal and ordered relationships among blocks of data that have been seen in the data stream by the respective peer, wherein each node represents a chunk of data, and an edge between nodes represents a transition that the respective peer has seen in the data stream with respect to that chunk, the directed cyclic graph being annotated with information from which the respective peer can generate a prediction about blocks of data that are subject to a stream-based data deduplication; the receiving peer generating a hinting request that predicts what blocks of data the sending peer is expected to utilize during stream-based data deduplication of a page; the sending peer generating a hinting response that predicts what blocks of data are expected to compose the page; wherein the hinting request and the hinting response are generated in software executing in a hardware element. - View Dependent Claims (3, 4, 5, 6, 7, 8, 9)
-
-
2. A method operative in an overlay network comprising a sending peer and a receiving peer, the sending peer associated with an origin, and the receiving peer associated with an overlay network edge, the method comprising:
-
maintaining a directed cyclic graph in association with the sending peer; maintaining a directed cyclic graph in association with the receiving peer; wherein each directed cyclic graph comprises a set of nodes and edges that collectively represent temporal and ordered relationships among blocks of data that have been seen in the data stream by the respective peer, wherein each node represents a chunk of data, and an edge between nodes represents a transition that the respective peer has seen in the data stream with respect to that chunk, the directed cyclic graph being annotated with information from which the respective peer can generate a prediction about blocks of data that are subject to a stream-based data deduplication; using the annotated directed cyclic graphs to enforce a compression protocol across the sending and receiving peers; wherein the compression protocol is carried out in software executing in one or more hardware elements.
-
-
10. A method operative in an overlay network comprising a sending peer and a receiving peer, comprising:
-
maintaining a directed cyclic graph in association with the sending peer; maintaining a directed cyclic graph in association with the receiving peer; wherein each directed cyclic graph represents temporal and ordered relationships among blocks of data that have been seen in the data stream by the respective peer, the directed cyclic graph being annotated with information from which the respective peer can generate a prediction about blocks of data that are subject to a stream-based data deduplication; the receiving peer generating a hinting request that predicts what blocks of data the sending peer is expected to utilize during stream-based data deduplication of a page; the sending peer generating a hinting response that predicts what blocks of data are expected to compose the page; wherein the sending peer returns the hinting response to the receiving peer while going forward to an origin to fetch the page; wherein the hinting request and the hinting response are generated in software executing in a hardware element. - View Dependent Claims (11)
-
-
12. A method operative in an overlay network comprising a sending peer and a receiving peer, comprising:
-
maintaining a directed cyclic graph in association with the sending peer; maintaining a directed cyclic graph in association with the receiving peer; wherein each directed cyclic graph represents temporal and ordered relationships among blocks of data that have been seen in the data stream by the respective peer, the directed cyclic graph being annotated with information from which the respective peer can generate a prediction about blocks of data that are subject to a stream-based data deduplication, wherein the information identifies a URI-host and path from which one or more blocks of data originate, and a number of times that a page associated with the URI-host and path has led to given other content represented in the annotated directed cyclic graph; the receiving peer generating a hinting request that predicts what blocks of data the sending peer is expected to utilize during stream-based data deduplication of a page; the sending peer generating a hinting response that predicts what blocks of data are expected to compose the page; wherein the hinting request and the hinting response are generated in software executing in a hardware element. - View Dependent Claims (13)
-
-
14. A method operative in an overlay network comprising a sending peer and a receiving peer, comprising:
-
maintaining a directed cyclic graph in association with the sending peer; maintaining a directed cyclic graph in association with the receiving peer; wherein each directed cyclic graph represents temporal and ordered relationships among blocks of data that have been seen in the data stream by the respective peer, the directed cyclic graph being annotated with information from which the respective peer can generate a prediction about blocks of data that are subject to a stream-based data deduplication; the receiving peer generating a hinting request that predicts what blocks of data the sending peer is expected to utilize during stream-based data deduplication of a page; the sending peer generating a hinting response that predicts what blocks of data are expected to compose the page, wherein the hinting response also predicts what blocks of data are expected to be included in one or more objects that depend from the page; wherein the hinting request and the hinting response are generated in software executing in a hardware element. - View Dependent Claims (15)
-
Specification