Distinguishing data streams to enhance data storage efficiency

  • US 8,352,540 B2
  • Filed: 03/06/2008
  • Issued: 01/08/2013
  • Est. Priority Date: 03/06/2008
  • Status: Expired due to Fees
First Claim

1. A method of backing up a data file from a first system to a second system, wherein the data file comprises first data and second data, the method comprising:

  • identifying data included in the data file, prior to transmission of the data file in a data stream to the second system, such that the second system is able to distinguish the first data from the second data in the data file,
  • wherein the first data comprises information stored in the data file as content that is to be backed up on storage media by the second system, and the second data comprises information defining one or more attributes associated with said content in the first data,
  • wherein the second system processes the second data to determine whether content included in the first data is a candidate for deduplication, prior to comparing the content of the first data with third data, previously backed up on the storage media, to determine any redundancies between the content in the first data and the third data,
  • wherein if processing the second data indicates that an attribute associated with the content in the first data is different from an attribute associated with the content in the third data, then content of the first data is not deduplicated with respect to the third data,
  • wherein if processing the second data indicates that an attribute associated with the content in the first data is the same as an attribute associated with the content in the third data, then content of the first data is deduplicated with respect to the third data,
  • such that portions of content in the first data that are non-redundant with respect to the third data are stored in the storage media by the second system, and portions of the content in the first data that are redundant with respect to the third data are not stored in the storage media and instead a reference pointer is provided to portions of the third data that include a copy of the redundant portions of the content in the first data, instead of duplicating the redundant portions of the content on the storage media,
  • wherein the second data comprises metadata including attributes for identifying the content in the first data as belonging to a class, category or type, wherein the metadata is fixed in size, and
  • wherein the second data further comprises associated stream data related to the content in the first data,
  • wherein the associated stream data provides additional information about the content in the first data,
  • wherein said additional information is not included in the metadata, wherein the associated stream data is not fixed in size,
  • wherein the metadata is used to determine whether the associated stream data is to be deduplicated.
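
The flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name here (`HEADER`, `frame`, `unframe`, `BackupStore`), the 10-byte header layout, the fixed 4-byte chunking, and the use of SHA-256 digests to compare content are assumptions made for illustration, since the claim does not specify any of them. The sketch deduplicates only the content and merely carries the associated stream data.

```python
import hashlib
import struct

# Hypothetical fixed-size metadata: class id, associated-stream length, content length.
HEADER = struct.Struct("!HII")
CHUNK_SIZE = 4  # deliberately tiny so the behaviour is easy to trace


def frame(class_id, assoc, content):
    """First system: tag the data so the receiver can tell metadata from content."""
    return HEADER.pack(class_id, len(assoc), len(content)) + assoc + content


def unframe(stream):
    """Second system: split a framed stream into class id, associated data, content."""
    class_id, assoc_len, content_len = HEADER.unpack_from(stream, 0)
    start = HEADER.size
    assoc = stream[start:start + assoc_len]
    content = stream[start + assoc_len:start + assoc_len + content_len]
    return class_id, assoc, content


class BackupStore:
    def __init__(self):
        self.chunks = []  # bytes physically written to the "storage media"
        self.index = {}   # class id -> {chunk digest -> position in self.chunks}

    def backup(self, stream):
        """Store a framed stream; return one chunk position per content chunk."""
        class_id, _assoc, content = unframe(stream)
        known = self.index.get(class_id)
        # Metadata gate: only content whose class matches earlier backups is a
        # deduplication candidate; otherwise no content comparison is made.
        is_candidate = known is not None
        if known is None:
            known = self.index[class_id] = {}
        recipe = []
        for off in range(0, len(content), CHUNK_SIZE):
            piece = content[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(piece).hexdigest()
            if is_candidate and digest in known:
                recipe.append(known[digest])  # reference pointer, nothing stored
            else:
                self.chunks.append(piece)     # non-redundant portion is stored
                known[digest] = len(self.chunks) - 1
                recipe.append(known[digest])
        return recipe

    def restore(self, recipe):
        """Rebuild content by following the stored positions."""
        return b"".join(self.chunks[i] for i in recipe)
```

In this sketch, backing up two streams with the same class id that share a leading chunk stores that chunk once and records a pointer for the second occurrence. A stream carrying identical bytes under a different class id is stored in full, because the metadata check rules it out before any content comparison happens.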
