Cloud storage and networking agents, including agents for utilizing multiple, different cloud storage sites
4 Assignments
0 Petitions
Abstract
Systems and methods are disclosed for performing data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.
377 Citations
20 Claims
1. A system for storing, on each of multiple target cloud storage sites, a secondary copy of an original data set, the system comprising:
a memory having instructions; and
a processor coupled to the memory to execute the instructions, the instructions including:
a network agent comprising a hypertext transfer protocol (HTTP) subagent configured to establish and manage a network connection between the system and the multiple cloud storage sites,
wherein the network connection utilizes at least one of HTTP and HTTP over Transport Layer Security/Secure Sockets Layer,
wherein the multiple cloud storage sites are each operated by a different vendor, and
wherein each of the multiple cloud storage sites employs vendor-specific calls specified by an application programming interface for that specific cloud storage site; and
a cloud storage submodule configured to at least open, read, and write data files stored on each of the multiple cloud storage sites and to direct the multiple cloud storage sites to perform data storage operations, wherein the cloud storage submodule is configured to create a secondary copy of an original data set by at least:
determining that the original data set is not already buffered in a first buffer as a result of being associated with an earlier received data transfer request;
buffering in a second buffer as a substantially parallel process a series of received data transfer requests and a copy of a subset of the original data set, wherein the buffering includes:
determining a latency of the network connection,
increasing buffer size for at least the second buffer when the latency of the network connection increases, and
decreasing the buffer size for at least the second buffer when the latency of the network connection decreases, to improve data transmission efficiency;
converting a series of received generic file system commands to store the copy of the subset of the original data set into vendor-specific calls specified by the application programming interface utilized by a selected one of the multiple cloud storage sites;
indexing content associated with the secondary copy, to create an index that facilitates searching of indexed content of the secondary copy while the secondary copy resides on one or more of the multiple cloud storage sites; and
transferring the buffered copy of the subset of the original data set over the network connection established by the network agent to the selected one cloud storage site, to create the secondary copy of the original data set at the selected one cloud storage site,
wherein buffering in the second buffer the copy of the subset of the original data set occurs prior to transferring the buffered copy of the subset of the original data set to the selected one cloud storage site,
wherein indexing content of the secondary copy occurs prior to transferring the buffered copy of the subset of the original data set to the selected one cloud storage site, and
wherein the index for the indexed content is stored locally, and not at the selected one cloud storage site, to facilitate searching of the indexed content by locally networked client computing devices.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
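The latency-driven buffer sizing recited in claim 1 (grow the buffer when latency rises, shrink it when latency falls) might be sketched as follows. This is only an illustration, not the patented implementation: the class name `AdaptiveBuffer`, the default sizes, the bounds, and the doubling/halving factor are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveBuffer:
    """Hypothetical helper: tracks a buffer size that follows measured latency."""

    size: int = 1 << 20          # assumed starting size: 1 MiB
    min_size: int = 64 << 10     # assumed lower bound: 64 KiB
    max_size: int = 64 << 20     # assumed upper bound: 64 MiB
    factor: float = 2.0          # assumed growth/shrink factor
    last_latency_ms: Optional[float] = None

    def observe(self, latency_ms: float) -> int:
        """Record a latency sample and return the adjusted buffer size."""
        if self.last_latency_ms is not None:
            if latency_ms > self.last_latency_ms:
                # Latency increased: enlarge the buffer, up to the cap.
                self.size = min(self.max_size, int(self.size * self.factor))
            elif latency_ms < self.last_latency_ms:
                # Latency decreased: shrink the buffer, down to the floor.
                self.size = max(self.min_size, int(self.size / self.factor))
        self.last_latency_ms = latency_ms
        return self.size
```

The first sample only establishes a baseline; later samples are compared with the previous one, so an unchanged latency leaves the size as it was.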
12. A system for managing storage of data within various storage resources, including local storage devices and remote cloud storage resources, wherein the system forms part of a storage operation cell hierarchy, wherein the storage operation cell hierarchy includes multiple storage operation cells organized in one or more hierarchical relationships, the system comprising:
one or more computing devices;
one or more local storage devices coupled to the one or more computing devices over a local or proprietary network, wherein the one or more local storage devices are configured to store data files from the one or more computing devices; and
a storage operation cell within the storage operation cell hierarchy, wherein the storage operation cell hierarchy includes multiple storage operation cells organized in one or more hierarchical relationships, wherein the storage operation cell includes:
a data agent component for accessing the data files of the one or more computing devices or the one or more local storage devices;
a secondary storage computing component for communicating with the one or more computing devices or one or more local storage devices, wherein the secondary storage computing component further comprises:
a network agent configured to establish a network connection between the secondary storage computing component and the cloud storage resources; and
a cloud storage submodule configured to request storage of the data files via the cloud storage resources, wherein the cloud storage submodule is further configured to:
determine that the data files are not already buffered in a first buffer as a result of being associated with an earlier received storage request,
buffer in a second buffer as a substantially parallel process a copy of a subset of the data files from the one or more computing devices,
wherein buffering the copy of the subset of the data files includes:
determining a latency of the network connection, and
adjusting a size of at least the second buffer based on the determined latency of the network connection;
convert received generic file system commands to store the data files into calls specified by an interface for the cloud storage resources;
create a content index of the subset of the data files to facilitate searching of content of the data files by one or more computing devices,
wherein the content index is stored on the local or proprietary network; and
after buffering the copy of the subset of the data files and after creating the content index, send the copy of the subset of the data files over the established network connection for storage by the cloud storage resources.
View Dependent Claims (13, 14, 15, 16)
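Claim 12 fixes an order of operations: skip files already held in a first buffer, buffer the rest, build a locally stored content index, and only then send the data to cloud storage. A minimal sketch of that ordering is given below. The function names, the word-level inverted index, and the `send` callback are assumptions made for illustration; the claim does not prescribe any of them.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


def build_local_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Assumed index form: map each lowercase word to the files containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in files.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def store_files(
    files: Dict[str, str],
    first_buffer: Set[str],
    send: Callable[[str, bytes], None],
) -> Dict[str, Set[str]]:
    """Buffer, index locally, then send; returns the index, which stays local."""
    # Step 1: skip files already buffered for an earlier storage request.
    second_buffer = {n: t for n, t in files.items() if n not in first_buffer}
    # Step 2: create the content index before anything leaves the local network.
    index = build_local_index(second_buffer)
    # Step 3: only now transfer the buffered copies.
    for name, text in second_buffer.items():
        send(name, text.encode("utf-8"))
    return index
```

Because the index is returned to the caller rather than passed to `send`, a local client can look up which files contain a term without contacting the cloud storage resources.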
17. A system for storing, on each of multiple target cloud storage sites, a secondary copy of an original data set, the system comprising:
network connection means for managing communications utilizing a network protocol, and for establishing a network connection, at least indirectly, with each of the multiple cloud storage sites, wherein the multiple cloud storage sites are each operated by a different vendor, and wherein each of the multiple cloud storage sites employs differing interfaces or commands for writing data files to or reading data files from the cloud storage site;
a cloud storage means for at least opening, reading, and writing data files stored on each of the cloud storage sites and to direct the cloud storage site to perform data storage operations, wherein the cloud storage means creates the secondary copy of the original data set by at least:
determining that the original data set is not already buffered in a first buffer as a result of being associated with an earlier request to store a copy of data,
buffering in a second buffer as a substantially parallel process a copy of a subset of the original data set, wherein the buffering includes:
determining a latency of the network connection, and
adjusting buffer sizes for at least the second buffer based on the latency of the network connection;
converting a series of received generic file system commands to store the copy of the subset of the original data set into vendor-specific commands utilized by a selected one of the multiple cloud storage sites; and
indexing content of the original data set to facilitate searching the content of the original data set by one or more local computing devices while the original data set resides on one or more of the multiple cloud storage sites as the secondary copy; and
data transfer means for transferring the original data set to storage within the cloud storage site, wherein the means for transferring includes transferring the original data set after buffering the copy and after indexing content of the original data set.
View Dependent Claims (18)
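The conversion of generic file system commands into vendor-specific commands, as recited in claim 17, resembles a per-vendor translation table. The sketch below assumes two entirely hypothetical vendors (`vendor_a`, `vendor_b`) with made-up request shapes under `.example` hosts; no real cloud provider's API is represented.

```python
from typing import Callable, Dict, List, NamedTuple


class GenericCommand(NamedTuple):
    op: str            # "open", "read", or "write"
    path: str
    data: bytes = b""


class VendorCall(NamedTuple):
    method: str
    url: str
    body: bytes


def to_vendor_a(cmd: GenericCommand) -> VendorCall:
    # Hypothetical vendor A: one URL per object, operation chosen by HTTP method.
    method = {"open": "HEAD", "read": "GET", "write": "PUT"}[cmd.op]
    return VendorCall(method, f"https://vendor-a.example/bucket/{cmd.path}", cmd.data)


def to_vendor_b(cmd: GenericCommand) -> VendorCall:
    # Hypothetical vendor B: a single POST endpoint, operation passed as a parameter.
    url = f"https://vendor-b.example/api?action={cmd.op}&name={cmd.path}"
    return VendorCall("POST", url, cmd.data)


# Assumed registry keyed by site name; a new vendor only needs a new entry.
TRANSLATORS: Dict[str, Callable[[GenericCommand], VendorCall]] = {
    "vendor_a": to_vendor_a,
    "vendor_b": to_vendor_b,
}


def convert(commands: List[GenericCommand], site: str) -> List[VendorCall]:
    """Translate a series of generic commands for the selected storage site."""
    translate = TRANSLATORS[site]
    return [translate(cmd) for cmd in commands]
```

Keeping the generic commands vendor-neutral means the same series can be replayed against whichever site is selected, which is the property the claim relies on.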
19. A method for storing, on each of multiple target cloud storage sites, a secondary copy of an original data set, the method comprising:
managing communications with a network protocol for establishing a network connection, at least indirectly, with each of the multiple cloud storage sites, wherein the multiple cloud storage sites are each operated by a different vendor, and wherein each of the multiple cloud storage sites employs differing interfaces or commands for writing data files to or reading data files from the cloud storage site;
determining that the original data set is not already buffered in a first buffer as a result of being associated with an earlier received data transfer request,
buffering in a second buffer as a substantially parallel process a series of received data transfer requests and a copy of a subset of the original data set, wherein buffering includes:
determining a latency of the network connection, and
adjusting a size of at least the second buffer used for the buffering, based on the latency determined for the network connection;
deduplicating the buffered copy of the subset to reduce a duration of transmission of the buffered copy over the network connection;
creating a local copy of a content index associated with content of the buffered copy of the subset to facilitate local searching of at least part of the original data set after the subset of the original data set is transferred to a selected one of the multiple cloud storage sites;
converting a series of received generic file system commands to store the copy of the subset of the original data set into vendor-specific calls specified by an application programming interface utilized by the selected one of the multiple cloud storage sites, wherein the differing interfaces or commands are the vendor-specific calls specified by the application programming interface; and
transferring the buffered copy of the subset of the original data set over the network connection to the selected one of the multiple cloud storage sites.
View Dependent Claims (20)
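The deduplication step in claim 19 aims to shorten transmission by not resending data already seen. One common way to do this, shown here only as an assumed stand-in for whatever scheme the specification describes, is to hash each buffered chunk and drop chunks whose digest is already known. The function name and the use of SHA-256 are choices made for this sketch.

```python
import hashlib
from typing import List, Set, Tuple


def deduplicate(chunks: List[bytes], known_digests: Set[str]) -> List[Tuple[str, bytes]]:
    """Return (digest, chunk) pairs for chunks not seen before.

    `known_digests` is updated in place so later calls skip earlier chunks.
    """
    unique: List[Tuple[str, bytes]] = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in known_digests:
            known_digests.add(digest)
            unique.append((digest, chunk))
    return unique
```

Only the pairs returned by `deduplicate` would need to cross the network connection; repeated chunks can be referenced by digest instead.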
Specification