DATA OBJECT STORE AND SERVER FOR A CLOUD STORAGE ENVIRONMENT, INCLUDING DATA DEDUPLICATION AND DATA MANAGEMENT ACROSS MULTIPLE CLOUD STORAGE SITES
Abstract
Data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, are performed within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.
120 Citations
46 Claims
1. A method for storing a secondary copy, of an original data set, on a cloud storage site using a cloud gateway, wherein the cloud gateway is coupled between multiple computers and one or more cloud storage sites via a network, the method comprising:
  identifying data blocks within a cache of the cloud gateway that satisfy certain criteria,
    wherein the original data set comprises data blocks,
    wherein the certain criteria are from a storage policy, and
    wherein the certain criteria include time-based criteria;
  performing block-level deduplication of the identified data blocks to create a deduplicated set of data, wherein the block-level deduplication includes:
    determining a size for a container file to utilize when deduplicating the identified data blocks; and
    deduplicating at least some of the identified data blocks to create one or more container files containing deduplicated data, wherein at least one of the container files has the determined size; and
  storing the deduplicated set of data on the cloud storage site by:
    buffering data for later transmission to the cloud storage site;
    repeating the following steps while the data buffer is not full:
      receiving a file system request to write a group of data to the cloud storage site; and
      adding the group of data to the buffer;
    converting a file system request to one or more application program interface calls associated with the cloud storage site; and
    transmitting contents of the buffer to the cloud storage site using the one or more application program interface calls associated with the cloud storage site.
Dependent claims: 2, 3
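The block-level deduplication step recited in claim 1 can be illustrated with a minimal Python sketch. It assumes that blocks are compared by SHA-256 digest, that containers are in-memory byte buffers, and that the determined size acts as an upper bound on each container; none of these choices is stated in the claim. The names `deduplicate_blocks`, `restore`, and `container_size` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# digest -> (container number, offset within container, block length)
Index = Dict[str, Tuple[int, int, int]]


def deduplicate_blocks(
    blocks: List[bytes], container_size: int
) -> Tuple[List[bytearray], Index, List[str]]:
    """Store each distinct block once, packed into size-limited containers.

    Returns the containers, an index locating every unique block, and a
    recipe (one digest per input block, in order) for reconstruction.
    """
    containers: List[bytearray] = [bytearray()]
    index: Index = {}
    recipe: List[str] = []
    for block in blocks:
        if len(block) > container_size:
            raise ValueError("block is larger than the container size")
        digest = hashlib.sha256(block).hexdigest()
        recipe.append(digest)
        if digest in index:
            continue  # duplicate block: already stored, keep only the reference
        if len(containers[-1]) + len(block) > container_size:
            containers.append(bytearray())  # current container is full
        offset = len(containers[-1])
        containers[-1].extend(block)
        index[digest] = (len(containers) - 1, offset, len(block))
    return containers, index, recipe


def restore(containers: List[bytearray], index: Index, recipe: List[str]) -> bytes:
    """Rebuild the original byte stream from containers, index, and recipe."""
    out = bytearray()
    for digest in recipe:
        number, offset, length = index[digest]
        out += containers[number][offset:offset + length]
    return bytes(out)
```

Under these assumptions, a repeated block adds one entry to the recipe but no bytes to any container, so the number of stored objects depends on the container size rather than on the number of blocks.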
4. A system for creating a secondary copy of an original data set using a cloud storage site, wherein the original data set is received from one or more client computers, the system comprising a memory and processor that are configured to:
  identify sub-objects of the original data set that satisfy certain criteria, wherein the certain criteria are related to a storage policy;
  perform deduplication of the identified data sub-objects to create a deduplicated set of data; and
  forward the deduplicated set of data to the cloud storage site, wherein the forwarding includes:
    buffering data for later forwarding to the cloud storage site;
    converting file system requests into application program interface calls associated with the cloud storage site; and
    forwarding the buffered data to the cloud storage site using the one or more application program interface calls associated with the cloud storage site.
Dependent claims: 5, 6, 41, 42
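The forwarding elements of claim 4 (buffering, converting file system requests into API calls, and forwarding the buffered data) might be sketched as follows. This is an illustration under stated assumptions: the class `BufferedCloudWriter`, its `capacity` parameter, and the `put_object(key, data)` callback are hypothetical stand-ins for whatever API a given cloud storage site exposes, and the mapping from a file path to an object key is a simplification.

```python
from typing import Callable, List, Tuple


class BufferedCloudWriter:
    """Collects write requests and forwards them in batches (hypothetical)."""

    def __init__(self, capacity: int, put_object: Callable[[str, bytes], None]):
        self.capacity = capacity      # buffer size in bytes (assumed)
        self.put_object = put_object  # stand-in for the site's API call
        self.pending: List[Tuple[str, bytes]] = []
        self.used = 0

    def write(self, path: str, data: bytes) -> None:
        """Accept a file-system-style write request and buffer it."""
        if self.pending and self.used + len(data) > self.capacity:
            self.flush()  # make room before accepting the new group of data
        self.pending.append((path, data))
        self.used += len(data)
        if self.used >= self.capacity:
            self.flush()  # buffer is full: forward its contents

    def flush(self) -> None:
        """Convert each buffered request to an API call and send it."""
        for path, data in self.pending:
            key = path.lstrip("/")  # file path -> object key (assumed mapping)
            self.put_object(key, data)
        self.pending.clear()
        self.used = 0
```

Passing the upload function in as a callback keeps the sketch independent of any particular storage provider; a real gateway would also need error handling and retries, which are omitted here.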
7. A computer-implemented method for copying multiple files at a cloud storage site, wherein the cloud storage site is coupled to a computer executing a file system for accessing a secondary storage computing device, the method comprising:
  receiving a copy operation request to copy n number of files at the cloud storage site, wherein each of the n number of files includes metadata and data, and wherein the n number of files exceeds a threshold;
  establishing a container size reflecting one or more factors, wherein the factors include:
    a latency associated with a network connection to the secondary storage computing device;
    a bandwidth associated with a network connection to the secondary storage computing device;
    whether the cloud storage site imposes a restriction on a namespace associated with the computer or the file system;
    whether the cloud storage site permits sparsification of data files;
    a pricing structure associated with the cloud storage site;
    a maximum specified container file size; and
    a minimum specified container file size;
  processing the n number of files by:
    copying the metadata of each of the n number of files to a first container;
    copying at least a portion of the data for the n number of files into a second container, wherein the second container is separate from the first container; and
    updating a data structure, wherein the data structure:
      tracks, for each of the n number of files, a location of the metadata for that file in the first container, and
      tracks, for the at least a portion of the data for the n number of files, a location of the data in the second container, and
  wherein the size of at least one of the first and second containers is no greater than the established container size.
Dependent claims: 8, 9
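Claim 7 lists factors for the container size but does not say how they are combined. The sketch below therefore uses an illustrative assumption only: the bandwidth-delay product of the connection, clamped between the minimum and maximum sizes. It then packs metadata and data into separate containers and records each file's locations. The names `FileEntry`, `establish_container_size`, and `pack_files` are hypothetical, and the namespace, sparsification, and pricing factors are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileEntry:
    name: str
    metadata: bytes
    data: bytes


def establish_container_size(
    latency_s: float, bandwidth_bps: float, min_size: int, max_size: int
) -> int:
    """Assumed heuristic: bandwidth-delay product in bytes, clamped."""
    bdp_bytes = int(latency_s * bandwidth_bps / 8)
    return max(min_size, min(max_size, bdp_bytes))


def pack_files(
    files: List[FileEntry], container_size: int
) -> Tuple[bytearray, bytearray, Dict[str, Dict[str, Tuple[int, int]]]]:
    """Copy metadata and data into two separate containers.

    The returned data structure maps each file name to the (offset, length)
    of its metadata in the first container and its data in the second.
    """
    meta_container = bytearray()
    data_container = bytearray()
    locations: Dict[str, Dict[str, Tuple[int, int]]] = {}
    for f in files:
        meta_loc = (len(meta_container), len(f.metadata))
        meta_container += f.metadata
        data_loc = (len(data_container), len(f.data))
        data_container += f.data
        locations[f.name] = {"metadata": meta_loc, "data": data_loc}
    # At least one container must be no greater than the established size.
    if min(len(meta_container), len(data_container)) > container_size:
        raise ValueError("both containers exceed the established size")
    return meta_container, data_container, locations
```

Keeping all metadata in one container means a listing or attribute lookup touches a single object, while the data container can be fetched only when file contents are needed.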
10. A tangible computer-readable storage medium whose contents cause a data storage system to perform a method of migrating data from local primary storage to secondary storage located on a remote cloud storage site, the method comprising:
  identifying no more than n−1 data blocks, located within the local primary storage, that satisfy a criteria,
    wherein the n−1 data blocks represent a portion of a data file consisting of n blocks, and
    wherein the n blocks contain data written by a file system associated with the local primary storage; and
  determining a size for a container file in which to store some or all of the no more than n−1 data blocks;
  transferring data contained by the identified no more than n−1 data blocks from the primary storage to the secondary storage located on a cloud storage site, wherein transferring data includes writing data first to a container file of the determined size; and
  updating an index with information associating the transferred data with information identifying blocks within the secondary storage that contain the transferred data, wherein the information includes at least one uniform resource locator or logical address that identifies at least one logical location from which the transferred data may be accessed.
Dependent claims: 11, 43, 44, 45, 46
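The partial migration in claim 10 (moving at most n−1 of a file's n blocks and indexing where they went) can be pictured with a short Python sketch. The function `migrate_blocks`, the `is_eligible` predicate, the `base_url` value, and the `.container` naming scheme are all hypothetical, and the container is an in-memory buffer rather than a file.

```python
from typing import Callable, Dict, List, Tuple, Union

IndexEntry = Dict[str, Union[str, int]]


def migrate_blocks(
    file_id: str,
    blocks: List[bytes],
    is_eligible: Callable[[int, bytes], bool],
    container_size: int,
    base_url: str,
) -> Tuple[bytes, Dict[int, IndexEntry]]:
    """Select at most n-1 eligible blocks and write them to one container.

    Returns the container contents and an index mapping each migrated block
    number to a URL plus the offset and length within the container.
    """
    n = len(blocks)
    # Never select all n blocks: cap the selection at n - 1.
    chosen = [i for i, b in enumerate(blocks) if is_eligible(i, b)][: max(n - 1, 0)]
    container = bytearray()
    index: Dict[int, IndexEntry] = {}
    url = f"{base_url}/{file_id}.container"  # assumed naming scheme
    for i in chosen:
        if len(container) + len(blocks[i]) > container_size:
            break  # container of the determined size is full
        index[i] = {"url": url, "offset": len(container), "length": len(blocks[i])}
        container += blocks[i]
    return bytes(container), index
```

For a three-block file where every block is eligible, this sketch migrates only the first two blocks and leaves the third in primary storage.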
12. A system for storing, on a cloud storage site, a secondary copy of an original data set, the system comprising:
  means for identifying a cloud storage site on which to store a secondary copy of a primary data set;
  means for updating an index of content to reflect at least some data content in the primary data set;
  means for deduplicating at least some of the data content in the primary data set;
  means for creating one or more container files containing the deduplicated data; and
  means for transferring the one or more container files to the cloud storage site.
13-40. (canceled)
Specification