Selective data deduplication
Abstract
Data is selectively deduplicated such that portions of data suitable for deduplication are passed to a deduplication engine (1064) and stored in a first store (1072).
12 Claims
-
1. A method of selectively deduplicating an incoming data stream having plural segments, the method comprising:
-
as the incoming data stream is received by a server having a processor from a data source over a network and prior to storing the incoming data stream into a first store, determining, by the server, which of the plural segments that are associated with corresponding types is of a predetermined type;
selecting, by the server, segments according to the predetermined type from the plural segments, wherein selecting the segments for deduplication is based on at least one predetermined condition being met, the predetermined condition relating to a time associated with the plural segments;
using a deduplication engine in the server to deduplicate the selected segments according to the predetermined type;
storing, by the server, the deduplicated data segments in the first store;
deciding, by the server, to not deduplicate segments of the incoming data stream that are according to another one of the types, wherein the selective deduplication of the selected segments and the decision not to deduplicate the segments of the another type are performed at the server without management of the incoming data stream at the data source to control deduplicated storage of the incoming data stream.
Dependent claims: 2, 3, 4, 5
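The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Segment` fields, the `not_before` time condition, the SHA-256 digest index, and the separate `raw_store` for segments that are not deduplicated are all assumptions made for illustration, since the claim does not specify them.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    data: bytes
    seg_type: str     # hypothetical type label, e.g. "document" or "video"
    timestamp: float  # hypothetical time associated with the segment


@dataclass
class SelectiveDedupServer:
    dedup_type: str   # the predetermined type selected for deduplication
    not_before: float # assumed time condition: timestamp >= not_before
    first_store: dict = field(default_factory=dict)  # digest -> unique bytes
    refs: list = field(default_factory=list)         # digests in arrival order
    raw_store: list = field(default_factory=list)    # assumed non-dedup store

    def should_dedup(self, seg: Segment) -> bool:
        # Both the type test and the time condition must hold.
        return seg.seg_type == self.dedup_type and seg.timestamp >= self.not_before

    def receive(self, seg: Segment) -> None:
        # The decision is made at the server, before anything is stored.
        if self.should_dedup(seg):
            digest = hashlib.sha256(seg.data).hexdigest()
            self.first_store.setdefault(digest, seg.data)  # keep one copy only
            self.refs.append(digest)
        else:
            self.raw_store.append(seg.data)


server = SelectiveDedupServer(dedup_type="document", not_before=100.0)
for s in [
    Segment(b"report", "document", 120.0),
    Segment(b"report", "document", 130.0),  # duplicate, stored once
    Segment(b"frames", "video", 140.0),     # other type, not deduplicated
    Segment(b"report", "document", 50.0),   # time condition not met
]:
    server.receive(s)
```

In this sketch the data source sends segments unchanged; the choice between deduplicated and plain storage is made entirely in `receive`, which mirrors the claim's requirement that the source does not manage the stream.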
-
-
6. A method, executed in an apparatus having a processor, for selectively deduplicating data, the method comprising:
-
uploading a plurality of rule definitions, including a definition of a plurality of predetermined types of data suitable for deduplication, into an interface;
applying, by the apparatus, the plurality of rule definitions to an incoming data stream in the interface;
identifying a first portion of the incoming data stream determined to be one of the predetermined types suitable for deduplication, and passing, in response to determining that a predefined condition relating to a time associated with the incoming data stream has been met, the identified first portion to a deduplication engine in the apparatus to apply deduplication on the identified first portion; and
identifying a second portion of the incoming data stream determined to not be any one of the predetermined types suitable for deduplication, and bypassing the deduplication engine for the identified second portion to avoid performing deduplication on the identified second portion,
wherein the incoming data stream is received by the apparatus from a data source over a network, and
wherein selective deduplication and non-deduplication of the identified first and second portions are performed at the apparatus without management of the incoming data stream at the data source to control deduplicated storage of the incoming data stream.
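The rule-driven routing of claim 6 might look like the following sketch. The JSON rule format, the `Interface` class, the hour-of-day window used as the time condition, and the choice to bypass a suitable portion when the time condition is not met are all hypothetical; the claim only requires that rule definitions naming the suitable types are uploaded and applied.

```python
import json

# Hypothetical rule definition: which types are suitable for deduplication,
# and an assumed time window (hours of the day) in which dedup is allowed.
RULES_JSON = """
{
  "dedup_types": ["text", "office"],
  "window": {"start_hour": 22, "end_hour": 6}
}
"""


class Interface:
    def __init__(self) -> None:
        self.dedup_types = set()
        self.window = (0, 24)

    def upload_rules(self, rules_json: str) -> None:
        rules = json.loads(rules_json)
        self.dedup_types = set(rules["dedup_types"])
        self.window = (rules["window"]["start_hour"], rules["window"]["end_hour"])

    def time_condition_met(self, hour: int) -> bool:
        start, end = self.window
        if start <= end:
            return start <= hour < end
        return hour >= start or hour < end  # window wraps past midnight

    def route(self, portion_type: str, hour: int) -> str:
        # "dedup": pass the portion to the deduplication engine.
        # "bypass": skip the engine for this portion.
        if portion_type in self.dedup_types and self.time_condition_met(hour):
            return "dedup"
        return "bypass"


iface = Interface()
iface.upload_rules(RULES_JSON)
```

Keeping the rules in data rather than in code means the set of types can be changed by uploading a new definition, without touching the routing logic.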
-
-
7. A server for selectively deduplicating an incoming data stream having plural segments, the server comprising:
-
a first store;
a processor to:
  as the incoming data stream is received from a data source over a network and prior to storing the incoming data stream into the first store, determine which of the plural segments that are associated with corresponding types is of a predetermined type; and
  select segments according to the predetermined type from the plural segments, wherein the selecting is based on a predetermined condition relating to a time associated with the plural segments being met; and
a deduplication engine to deduplicate the selected segments according to the predetermined type, and store the deduplicated segments into the first store,
wherein the processor is to decide to not deduplicate segments of the incoming data stream that are according to another one of the types, and
wherein the selective deduplication of the selected segments and the decision not to deduplicate the segments of the another type are performed at the server without management of the incoming data stream at the data source to control deduplicated storage of the incoming data stream.
Dependent claims: 8, 9
-
-
10. Apparatus for selectively deduplicating data, comprising:
-
an interface to receive a rule definition specifying a plurality of predetermined types of data suitable for deduplication, to receive an incoming data stream, and to apply the rule definition to the incoming data stream; and
a deduplication engine to deduplicate data,
wherein the interface is to further:
  identify a first portion of the incoming data stream determined to be one of the predetermined types suitable for deduplication, and pass, in response to determining that a predefined condition relating to a time associated with the incoming data stream has been met, the identified first portion to the deduplication engine to apply deduplication on the identified first portion; and
  identify a second portion of the incoming data stream determined to not be any one of the predetermined types suitable for deduplication, and bypass the deduplication engine for the identified second portion to avoid performing deduplication on the identified second portion,
wherein the incoming data stream is received by the apparatus from a data source over a network, and
wherein selective deduplication and non-deduplication of the identified first and second portions are performed at the apparatus without management of the incoming data stream at the data source to control deduplicated storage of the incoming data stream.
-
-
11. A non-transitory computer readable medium having stored thereon computer program instructions that, when executed by a computer system, cause the computer system to:
-
segment an incoming data stream received from a data source over a network into plural segments;
determine a type of each segment of the incoming data stream, wherein the segments of the incoming data stream are according to plural types;
select segments that are according to a predetermined one of the types if at least one predetermined condition is met, the predetermined condition relating to a time associated with the plural segments;
deduplicate the selected segments; and
decide to not deduplicate segments of the incoming data stream that are according to another one of the types, wherein the selective deduplication of the selected segments and the decision not to deduplicate the segments of the another type are performed at the computer system without management of the incoming data stream at the data source to control deduplicated storage of the incoming data stream.
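The first two steps of claim 11, segmenting the stream and determining a type per segment, could be sketched as below. Fixed-size segmentation and the printable-ASCII heuristic in `segment_type` are assumptions chosen for brevity; the claim does not say how segments are cut or how their types are determined.

```python
from typing import Iterable, Iterator


def segment_stream(chunks: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Re-cut arbitrarily sized network chunks into fixed-size segments."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= size:
            yield bytes(buf[:size])
            del buf[:size]
    if buf:
        yield bytes(buf)  # final, possibly shorter, segment


def segment_type(segment: bytes) -> str:
    """Toy classifier (an assumption): mostly printable ASCII means "text"."""
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in segment)
    return "text" if printable >= 0.9 * len(segment) else "binary"


# Chunks as they might arrive from the network, re-cut into 4-byte segments.
segments = list(segment_stream([b"abc", b"defg", b"h"], 4))
types = [segment_type(s) for s in segments]
```

A real system would more likely classify by file or object metadata than by byte statistics; the heuristic here only stands in for whatever type test the rules specify.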
-
-
12. A non-transitory computer readable medium having stored thereon computer program instructions that, when executed by a computer system, cause the computer system to:
-
upload a rule definition specifying a plurality of predetermined types of data suitable for deduplication, into an interface;
apply the rule definition to an incoming data stream in the interface;
identify a first portion of the incoming data stream determined to be one of the predetermined types suitable for deduplication, and pass, in response to determining that a predefined condition relating to a time associated with the incoming data stream has been met, the identified first portion to a deduplication engine to apply deduplication on the identified first portion; and
identify a second portion of the incoming data stream determined to not be any one of the predetermined types suitable for deduplication, and bypass the deduplication engine for the identified second portion to avoid performing deduplication on the identified second portion,
wherein the incoming data stream is received by the computer system from a data source over a network, and
wherein selective deduplication and non-deduplication of the identified first and second portions are performed at the computer system without management of the incoming data stream at the data source to control deduplicated storage of the incoming data stream.
-
Specification