Distinguishing data streams to enhance data storage efficiency
Abstract
Systems, methods, and computer products for communicating between a client and server by identifying and separating file data streams within a file are provided. The method comprises indicating the type of file data stream to be transmitted and transmitting the indicated file data stream. The transmitted file data stream is compared with a stored data stream. A non-redundant file data stream is stored based upon the outcome of the comparison. The transmitted file data stream and stored data stream may be compared according to a deduplication table based on data stream profiles.
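The client-side step of indicating the type of each file data stream before transmission can be pictured with a small framing sketch. This is illustrative only and is not taken from the specification: the `StreamType` values, the 5-byte header layout, and the function names `frame_streams` and `parse_streams` are all hypothetical choices made for the example.

```python
import struct
from enum import IntEnum


class StreamType(IntEnum):
    """Hypothetical type indicators for the streams inside one file."""
    CONTENT = 1      # "first data": the content to be backed up
    METADATA = 2     # fixed-size attributes describing the content
    ASSOCIATED = 3   # variable-size associated stream data


# Assumed wire format: 1-byte stream type, 4-byte big-endian payload length.
HEADER = struct.Struct(">BI")


def frame_streams(streams):
    """Serialize (StreamType, bytes) pairs into one tagged byte stream."""
    out = bytearray()
    for stream_type, payload in streams:
        out += HEADER.pack(stream_type, len(payload))
        out += payload
    return bytes(out)


def parse_streams(data):
    """Recover the (StreamType, bytes) pairs so the receiver can tell them apart."""
    offset = 0
    result = []
    while offset < len(data):
        stream_type, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        result.append((StreamType(stream_type), data[offset:offset + length]))
        offset += length
    return result
```

Because each stream carries its own type tag, a receiver in this sketch can read the metadata stream and decide how to treat the content before comparing any content bytes against stored data.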
224 Citations
17 Claims
1. A method of backing up a data file from a first system to a second system, wherein the data file comprises first data and second data, the method comprising:
identifying data included in the data file, prior to transmission of the data file in a data stream to the second system, such that the second system is able to distinguish the first data from the second data in the data file, wherein the first data comprises information stored in the data file as content that is to be backed up on storage media by the second system, and the second data comprises information defining one or more attributes associated with said content in the first data, wherein the second system processes the second data to determine whether content included in the first data is a candidate for deduplication, prior to comparing the content of the first data with third data, previously backed up on the storage media, to determine any redundancies between the content in the first data and the third data, wherein if processing the second data indicates that an attribute associated with the content in the first data is different from an attribute associated with the content in the third data, then content of the first data is not deduplicated with respect to the third data, wherein if processing the second data indicates that an attribute associated with the content in the first data is the same as an attribute associated with the content in the third data, then content of the first data is deduplicated with respect to the third data, such that portions of content in the first data that are non-redundant with respect to the third data are stored in the storage media by the second system, and portions of the content in the first data that are redundant with respect to the third data are not stored in the storage media and instead a reference pointer is provided to portions of the third data that include a copy of the redundant portions of the content in the first data, instead of duplicating the redundant portions of the content on the storage media, wherein the second data comprises metadata including attributes for identifying the content in the first 
data as belonging to a class, category or type, wherein the metadata is fixed in size, and wherein the second data further comprises associated stream data related to the content in the first data, wherein the associated stream data provides additional information about the content in the first data, wherein said additional information is not included in the metadata, wherein the associated stream data is not fixed in size, wherein the metadata is used to determine whether the associated stream data is to be deduplicated.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
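The receiving side of claim 1 (check the attributes first, compare content only when they match, and keep a reference pointer in place of redundant portions) might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `DedupStore` class, fixed-size chunking, SHA-256 digests, and treating "same attribute" as byte-for-byte equal metadata are all assumptions introduced for the example.

```python
import hashlib


class DedupStore:
    """Hypothetical second system: one deduplication table per metadata profile."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = []   # chunks physically written to storage
        self.tables = {}   # metadata profile -> {chunk digest: index in self.chunks}

    def backup(self, metadata, content):
        """Store content and return a list of reference pointers (chunk indexes)."""
        # Attribute check comes first: content is only ever compared against
        # previously stored data that was backed up under the same metadata.
        table = self.tables.setdefault(metadata, {})
        refs = []
        for start in range(0, len(content), self.chunk_size):
            chunk = content[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in table:
                # Non-redundant portion: write it to storage.
                table[digest] = len(self.chunks)
                self.chunks.append(chunk)
            # Redundant portion: only the pointer to the existing copy is kept.
            refs.append(table[digest])
        return refs

    def restore(self, refs):
        """Rebuild the content by following the reference pointers."""
        return b"".join(self.chunks[i] for i in refs)
```

In this sketch, backing up identical bytes under two different metadata values stores them twice, which mirrors the claim's rule that content with differing attributes is not deduplicated against the earlier data.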
9. A storage management system comprising:
a first system communicatively coupled to a network, such that the first system may transmit a data file to a second system communicatively coupled to the network, wherein the data file comprises first data and second data, said first system configured to identify data included in the data file, prior to transmission of the data file in a data stream to the second system, such that the second system is able to distinguish the first data from the second data in the data file, wherein the first data comprises information stored in the data file as content that is to be backed up on storage media by the second system, and the second data comprises information defining one or more attributes associated with said content in the first data, wherein the second system processes the second data to determine whether content included in the first data is a candidate for deduplication, prior to comparing the content of the first data with third data, previously backed up on the storage media, to determine any redundancies between the content in the first data and the third data, wherein if processing the second data indicates that an attribute associated with the content in the first data is different from an attribute associated with the content in the third data, then content of the first data is not deduplicated with respect to the third data, wherein if processing the second data indicates that an attribute associated with the content in the first data is the same as an attribute associated with the content in the third data, then content of the first data is deduplicated with respect to the third data, such that portions of content in the first data that are non-redundant with respect to the third data are stored in the storage media by the second system, and portions of the content in the first data that are redundant with respect to the third data are not stored in the storage media and instead a reference pointer is provided to portions of the third data that include a 
copy of the redundant portions of the content in the first data, instead of duplicating the redundant portions of the content on the storage media, wherein the second data comprises metadata including attributes for identifying the content in the first data as belonging to a class, category or type, wherein the metadata is fixed in size, and wherein the second data further comprises associated stream data related to the content in the first data, wherein the associated stream data provides additional information about the content in the first data, wherein said additional information is not included in the metadata, wherein the associated stream data is not fixed in size, wherein the metadata is used to determine whether the associated stream data is to be deduplicated.
Dependent claims: 10, 11, 12, 13, 14
15. A computer program product comprising a non-transitory computer readable storage medium including computer usable program code for transmitting a data file from a first system to a second system, said computer program product including:
computer usable program code for identifying data included in the data file, prior to transmission of the data file in a data stream to the second system, such that the second system is able to distinguish the first data from the second data in the data file, wherein the first data comprises information stored in the data file as content that is to be backed up on storage media by the second system, and the second data comprises information defining one or more attributes associated with said content in the first data, wherein the second system processes the second data to determine whether content included in the first data is a candidate for deduplication, prior to comparing the content of the first data with third data, previously backed up on the storage media, to determine any redundancies between the content in the first data and the third data, wherein if processing the second data indicates that an attribute associated with the content in the first data is different from an attribute associated with the content in the third data, then content of the first data is not deduplicated with respect to the third data, wherein if processing the second data indicates that an attribute associated with the content in the first data is the same as an attribute associated with the content in the third data, then content of the first data is deduplicated with respect to the third data, such that portions of content in the first data that are non-redundant with respect to the third data are stored in the storage media by the second system, and portions of the content in the first data that are redundant with respect to the third data are not stored in the storage media and instead a reference pointer is provided to portions of the third data that include a copy of the redundant portions of the content in the first data, instead of duplicating the redundant portions of the content on the storage media, wherein the second data comprises metadata including attributes for 
identifying the content in the first data as belonging to a class, category or type, wherein the metadata is fixed in size, and wherein the second data further comprises associated stream data related to the content in the first data, wherein the associated stream data provides additional information about the content in the first data, wherein said additional information is not included in the metadata, wherein the associated stream data is not fixed in size, wherein the metadata is used to determine whether the associated stream data is to be deduplicated.
Dependent claims: 16, 17
Specification