DATA OBJECT STORE AND SERVER FOR A CLOUD STORAGE ENVIRONMENT, INCLUDING DATA DEDUPLICATION AND DATA MANAGEMENT ACROSS MULTIPLE CLOUD STORAGE SITES
First Claim
1. A system for storing a set of data files to a cloud storage site, the system comprising memory and a processor that are configured to:
provide multiple requests for cloud storage to two or more cloud storage sites,
  wherein the multiple requests each include a request for data storage to a cloud storage site;
  wherein the multiple requests each include:
    information associated with a total size of the set of data files to be stored, and
    requirements for the data storage for the set of files;
  wherein the multiple requests each include at least one pricing rate request; and
  wherein the two or more cloud storage sites are respectively operated by two or more independent organizations;
receiving a response from each of at least two of the two or more cloud storage sites,
  wherein each of the responses from the at least two cloud storage sites includes:
    preferences or criteria associated with data storage for that cloud storage site, and
    a pricing quote for a data storage job at that cloud storage site;
selecting one of the at least two cloud storage sites based on the received responses,
  wherein the selecting is based at least in part on the pricing quote and the preferences or criteria associated with data storage for that cloud storage site; and
providing to the selected cloud storage site the set of data files to be stored according to the provided request, the selected received response, or both the provided request and the selected received response.
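The request, response, and selection flow recited in claim 1 can be sketched roughly as follows. This is an illustration only, not the patented implementation: the names `StorageRequest`, `SiteResponse`, and `select_site`, the `max_job_gb` and `supports_encryption` criteria, and the lowest-quote rule are all hypothetical. The claim itself only requires that the selection be based at least in part on the pricing quote and on each site's preferences or criteria.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StorageRequest:
    total_size_gb: float                              # total size of the set of data files
    requirements: dict = field(default_factory=dict)  # hypothetical keys, e.g. {"encryption": True}


@dataclass(frozen=True)
class SiteResponse:
    site: str
    quote_usd: float           # pricing quote for the data storage job
    max_job_gb: float          # hypothetical site preference: largest job accepted
    supports_encryption: bool  # hypothetical site criterion


def select_site(request, responses):
    """Return the cheapest response whose criteria fit the request, or None."""
    needs_encryption = request.requirements.get("encryption", False)
    eligible = [
        r for r in responses
        if r.max_job_gb >= request.total_size_gb
        and (r.supports_encryption or not needs_encryption)
    ]
    if not eligible:
        return None
    # Assumption: among compatible sites, the lowest quote wins.
    return min(eligible, key=lambda r: r.quote_usd)
```

In this sketch a site with the lowest quote can still be rejected when its stated criteria do not fit the request, which is one way to combine the two selection factors.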
Abstract
Data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, are performed within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.
21 Claims
5. A system for identifying storage locations for a set of data files subject to a storage policy, wherein the set of data files is generated within a storage operation cell that has multiple client computers, and wherein the storage operation cell is coupled to multiple cloud storage sites via a network, the system comprising a processor that is configured to:
group the data files into at least one logical group of data files using a storage policy,
  wherein the storage policy defines performance-based classes of storage locations on which the set of data files may be stored,
  wherein the logical grouping of the set of data files facilitates deduplication of the set of data files;
determine aggregate storage requirements of a logical group of data files based at least in part on the storage policy;
estimate storage costs for each of two or more candidate storage sites based on historical or projected cost information stored within a storage manager computing device,
  wherein the storage manager computing device tracks and directs storage operations between client computing devices and secondary storage devices for the storage operation cell;
based on the estimated storage costs, identify some of the two or more candidate cloud storage sites to store a copy of the logical group of data files,
  wherein each of the two or more candidate cloud storage sites are operated by independent organizations;
generate a request for quotes for storing a copy of the logical group of data files on one of the candidate cloud storage sites,
  wherein the request for quotes includes the aggregate storage requirements of the logical group of data files;
identify a target cloud storage site from the two or more candidate cloud storage sites by evaluating, based at least in part on received quotes, storage costs of storing a copy of the logical group of data files;
  wherein the received quotes further comprise at least two of:
    a first pricing rate for an initial upload to the candidate cloud storage sites,
    a second pricing rate for downloads from the candidate cloud storage sites,
    a third pricing rate for searching or accessing the candidate cloud storage sites, and
    a fourth pricing rate for continued storage and maintenance of data on the candidate cloud storage sites;
  wherein the storage costs include estimated monetary expenses associated with storing the logical group of data objects; and
transmit to storage at least some of the logical group of data files from a client computer to the target cloud storage site.
6. A non-transitory computer-readable medium storing computer-implementable instructions for identifying storage locations for a set of data files, the method comprising:
determining storage requirements of a group of data files,
  wherein the group of data files are logically organized in a primary storage device to facilitate deduplication of the group of data files;
estimating storage costs for each of two or more candidate storage sites based on historical or projected cost information stored within a storage manager computing device,
  wherein the storage manager computing device tracks and directs storage operations between client computing devices and secondary storage devices for a storage operation cell;
based on estimating the storage costs, identifying some of the two or more candidate cloud storage sites to store a copy of the group of data files,
  wherein each of the two or more candidate cloud storage sites are operated by independent organizations;
generating a request for quotes for storing a copy of the group of data files on one of the candidate cloud storage sites,
  wherein the request for quotes includes the aggregate storage requirements of the group of data files, and
  wherein the request for quotes is provided to the two or more candidate cloud storage sites;
receiving one or more quotes from each of the two or more candidate cloud storage sites,
  wherein the received quotes include at least price rates for different types of storage media used at the two or more candidate cloud storage sites,
  wherein received quotes from individual candidate cloud storage sites further comprise at least two of:
    a first pricing rate for an initial upload to the candidate cloud storage sites,
    a second pricing rate for downloads from the candidate cloud storage sites,
    a third pricing rate for searching or accessing the candidate cloud storage sites, and
    a fourth pricing rate for continued storage and maintenance of data on the candidate cloud storage sites;
identifying a target cloud storage site from the two or more candidate cloud storage sites by evaluating, based at least in part on the received quotes, storage costs of storing a copy of the group of data files,
  wherein the storage costs include estimated monetary expenses associated with storing the group of data objects, and
  wherein evaluating storage costs is based at least in part on a reputation or reliability of the candidate cloud storage sites; and
causing at least some of the group of data files to be sent to the target cloud storage site for storage.
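The quote evaluation in claims 5 and 6 can be illustrated with a small cost model built on the four pricing rates. The sketch below is hypothetical throughout: the `Quote` and `Workload` types, the units (per GB, per 1,000 requests, per GB-month), and the choice to divide cost by a reliability score are assumptions made for illustration. The claims do not specify how the rates or the reliability factor are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    site: str
    upload_per_gb: float           # first rate: initial upload
    download_per_gb: float         # second rate: downloads
    access_per_1k_requests: float  # third rate: searching or accessing
    storage_per_gb_month: float    # fourth rate: continued storage and maintenance
    reliability: float             # hypothetical score in (0, 1]


@dataclass(frozen=True)
class Workload:
    size_gb: float      # aggregate storage requirement of the group of files
    months: int         # assumed retention period
    download_gb: float  # assumed volume restored over the period
    requests: int       # assumed number of search/access requests


def estimated_cost(quote, workload):
    """Estimated monetary expense of storing the workload under one quote."""
    return (
        quote.upload_per_gb * workload.size_gb
        + quote.download_per_gb * workload.download_gb
        + quote.access_per_1k_requests * workload.requests / 1000
        + quote.storage_per_gb_month * workload.size_gb * workload.months
    )


def pick_target(quotes, workload):
    """Pick the quote with the lowest reliability-adjusted cost."""
    # Assumption: dividing by reliability penalizes less reliable sites.
    return min(quotes, key=lambda q: estimated_cost(q, workload) / q.reliability)
```

With this weighting, a cheaper but markedly less reliable site can lose to a slightly more expensive one; a hard reliability threshold would be an equally plausible reading.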
15. A method for storing a secondary copy, of an original data set, on a cloud storage site using a cloud gateway, wherein the cloud gateway is coupled between multiple computers and one or more cloud storage sites via a network, the method comprising:
identifying data blocks within a cache of the cloud gateway that satisfy certain criteria,
  wherein the original data set comprises data blocks,
  wherein the certain criteria are from a storage policy, and
  wherein the certain criteria include time-based criteria;
performing block-level deduplication of the identified data blocks to create a deduplicated set of data, wherein the block-level deduplication includes:
  determining a size for a container file to utilize when deduplicating the identified data blocks; and
  deduplicating at least some of the identified data blocks to create one or more container files containing deduplicated data, wherein at least one of the container files has the determined size; and
storing the deduplicated set of data on the cloud storage site by:
  buffering data for later transmission to the cloud storage site;
  repeating the following steps while the data buffer is not full:
    receiving a file system request to write a group of data to the cloud storage site; and
    adding the group of data to the buffer;
  converting a file system request to one or more application program interface calls associated with the cloud storage site; and
  transmitting contents of the buffer to the cloud storage site using the one or more application program interface calls associated with the cloud storage site.
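As a rough illustration of the containerized, block-level deduplication step in claim 15, the sketch below packs unique blocks into in-memory containers of a chosen size. It is a minimal sketch under stated assumptions: the function names, the use of SHA-256 digests to detect duplicates, and the `(container, offset, length)` recipe format are hypothetical and are not taken from the claim. The containers here hold at most `container_size` bytes, so a container reaches exactly the determined size only when the blocks happen to fill it.

```python
import hashlib


def deduplicate_into_containers(blocks, container_size):
    """Pack unique blocks into containers of at most container_size bytes.

    Returns (containers, recipe). Each recipe entry is a
    (container index, offset, length) tuple locating one input block.
    """
    index = {}                  # block digest -> (container index, offset, length)
    containers = [bytearray()]
    recipe = []
    for block in blocks:
        if len(block) > container_size:
            raise ValueError("block is larger than the container size")
        digest = hashlib.sha256(block).digest()
        if digest not in index:
            # Start a new container when the current one cannot hold the block.
            if len(containers[-1]) + len(block) > container_size:
                containers.append(bytearray())
            index[digest] = (len(containers) - 1, len(containers[-1]), len(block))
            containers[-1].extend(block)
        recipe.append(index[digest])
    return [bytes(c) for c in containers], recipe


def restore(containers, recipe):
    """Rebuild the original block sequence from containers and the recipe."""
    return [containers[i][offset:offset + length] for i, offset, length in recipe]
```

Storing many small unique blocks in a few container files, rather than one file per block, is what reduces the number of entries the namespace has to track.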
18. A system for creating a secondary copy of an original data set using a cloud storage site, wherein the original data set is received from one or more client computers, the system comprising a memory and processor that are configured to:
identify sub-objects of the original data set that satisfy certain criteria, wherein the certain criteria are related to a storage policy;
perform deduplication of the identified data sub-objects to create a deduplicated set of data; and
forward the deduplicated set of data to the cloud storage site, wherein the forwarding includes:
  buffering data for later forwarding to the cloud storage site;
  converting file system requests into application program interface calls associated with the cloud storage site; and
  forwarding the buffered data to the cloud storage site using the one or more application program interface calls associated with the cloud storage site.
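The buffering and request-conversion steps recited in claims 15 and 18 might look something like the following. This is a sketch only: `CloudWriteBuffer`, its byte-based capacity, and the `put_object(key, data)` callable standing in for a cloud site's API are hypothetical names chosen for illustration, and mapping a file path to an object key by stripping the leading slash is an assumed convention.

```python
class CloudWriteBuffer:
    """Buffers file-system style writes and forwards them as API calls."""

    def __init__(self, capacity_bytes, put_object):
        self.capacity_bytes = capacity_bytes
        self.put_object = put_object  # hypothetical stand-in for a cloud API call
        self._pending = []            # buffered (path, data) write requests
        self._used = 0

    def write(self, path, data):
        """Accept a file system write request and add its data to the buffer."""
        if self._used + len(data) > self.capacity_bytes:
            self.flush()              # make room before buffering this write
        self._pending.append((path, data))
        self._used += len(data)
        if self._used >= self.capacity_bytes:
            self.flush()              # buffer is full: transmit its contents

    def flush(self):
        """Convert buffered requests to API calls and transmit them."""
        for path, data in self._pending:
            # Assumed mapping: the object key is the path without the leading slash.
            self.put_object(path.lstrip("/"), data)
        self._pending.clear()
        self._used = 0
```

Batching writes this way trades a little latency on each request for fewer round trips over a wide area network.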
21. A computer-readable medium, excluding a transitory propagating signal, that carries instructions, which when executed by a processor, utilize cloud storage resources to store at least a first portion of at least one data object within a network attached storage (NAS) device, wherein the NAS device includes a NAS file system and a non-volatile data store, and wherein the NAS device is communicatively coupled to access the cloud storage resources, comprising:
accessing calls to or from the NAS file system for reading of data from, or writing of data to, the non-volatile data store of the NAS device,
  wherein the at least one data object consists of multiple data blocks,
  wherein the non-volatile data store of the NAS device stores the multiple data blocks of the at least one data object;
  wherein the NAS file system of the NAS device controls the reading of data from or the writing of data to the multiple data blocks of the at least one data object, and
  wherein the accessing includes identifying individual blocks or groups of blocks within the multiple data blocks of the at least one data object that the NAS file system of the NAS device reads data from or writes data to;
based on the accessing, identifying a portion of the multiple data blocks of the at least one data object that satisfies a data storage criteria; and
automatically transferring the identified portion of the multiple data blocks for storage by the cloud storage resources.
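One way to picture the block-level identification in claim 21 is a tracker that records which blocks of an object the file system touches and then reports the blocks that meet a storage criterion. The sketch below is an assumption-laden illustration: `BlockAccessTracker` is a hypothetical name, and "idle for longer than a threshold" is only one possible data storage criterion, since the claim does not say what the criterion is.

```python
class BlockAccessTracker:
    """Records per-block access times observed from file system calls."""

    def __init__(self):
        self._last_access = {}  # (object id, block number) -> last access time

    def record(self, object_id, block_no, timestamp):
        """Note that a read or write touched the given block."""
        self._last_access[(object_id, block_no)] = timestamp

    def cold_blocks(self, object_id, block_count, now, max_idle_seconds):
        """Return block numbers never accessed or idle longer than the threshold."""
        cold = []
        for block_no in range(block_count):
            last = self._last_access.get((object_id, block_no))
            # Assumed criterion: untouched or long-idle blocks qualify for transfer.
            if last is None or now - last > max_idle_seconds:
                cold.append(block_no)
        return cold
```

The returned block numbers would then be handed to whatever component transfers data to the cloud storage resources, leaving recently used blocks of the same object on the NAS device.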
Specification