PERFORMING DATA STORAGE OPERATIONS WITH A CLOUD ENVIRONMENT, INCLUDING CONTAINERIZED DEDUPLICATION, DATA PRUNING, AND DATA TRANSFER
First Claim
1. A computer-implemented method for indexing and searching multiple content items, the method comprising:
- selecting or accessing, with a secondary copy component of a computing system, at least one secondary copy of the multiple content items, wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items, wherein the primary copy is available by the computer system over a local area network, and wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
for at least some of the multiple content items included in the secondary copy, with a content indexing component of the computing system:
analyzing content of a content item, including analyzing a summary of the content item as well as analyzing additional content of the content item; and
based upon the analysis, generating metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
storing, in a content index, the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
identifying, with an index searching component of the computing system, one or more indexed content items based on a search query and the metadata stored within the content index.
4 Assignments
0 Petitions
Abstract
Various systems and methods may be used for performing data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods for content indexing data stored within a cloud environment may facilitate later searching, including collaborative searching. Methods for performing containerized deduplication may reduce the strain on a system namespace, effectuate cost savings, etc. Methods may identify suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, the systems and methods may be used for providing a cloud gateway and a scalable data object store within a cloud environment.
189 Citations
17 Claims
1. A computer-implemented method for indexing and searching multiple content items, the method comprising:
selecting or accessing, with a secondary copy component of a computing system, at least one secondary copy of the multiple content items, wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items, wherein the primary copy is available by the computer system over a local area network, and wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
for at least some of the multiple content items included in the secondary copy, with a content indexing component of the computing system:
analyzing content of a content item, including analyzing a summary of the content item as well as analyzing additional content of the content item; and
based upon the analysis, generating metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
storing, in a content index, the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
identifying, with an index searching component of the computing system, one or more indexed content items based on a search query and the metadata stored within the content index.
View Dependent Claims (2, 3, 4, 5)
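The indexing and search steps of claim 1 can be illustrated with a minimal sketch. It assumes a simple in-memory inverted index held locally, with each entry pointing at the logical address of an item in the cloud-stored secondary copy. The names `ContentIndex` and `index_secondary_copy`, the whitespace tokenization, and the `cloud://` address format are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ContentIndex:
    """Hypothetical local index: maps lowercase terms to cloud addresses."""
    postings: dict = field(default_factory=dict)

    def add(self, address: str, summary: str, body: str) -> None:
        # Analyze both the summary and the additional content of the item,
        # recording the item's logical cloud address as its metadata.
        for term in (summary + " " + body).lower().split():
            self.postings.setdefault(term, set()).add(address)

    def search(self, query: str) -> set:
        # Return the addresses of items that contain every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        matches = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*matches)


def index_secondary_copy(items, index: ContentIndex) -> None:
    # items: iterable of (logical_address, summary, body) tuples, assumed to
    # have been read from the secondary copy rather than the primary copy.
    for address, summary, body in items:
        index.add(address, summary, body)
```

In this sketch only the index lives on the local system; the content items themselves stay at the cloud storage site and are reached through the stored addresses.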
6. A computer system for indexing and searching multiple content items, the computer system comprising:
a processor;
a memory;
a secondary copy component configured to select or access at least one secondary copy of the multiple content items, wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items, wherein the primary copy is available by the computer system over a local area network, and wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
a content indexing component configured to, for at least some of the multiple content items included in the secondary copy:
analyze content of a content item, including analyzing a summary of the content item as well as analyzing additional content of the content item; and
based upon the analysis, generate metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
store in a content index the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
an index searching component configured to identify one or more indexed content items based on a search query and the metadata stored within the content index.
View Dependent Claims (7, 8, 9, 10)
11. A computer-implemented method for copying multiple files at a cloud storage site, wherein the cloud storage site is coupled to a computer executing a file system for accessing a secondary storage computing device, the method comprising:
receiving a copy operation request to copy n number of files at the cloud storage site, wherein each of the n number of files includes metadata and data, and wherein the n number of files exceeds a threshold;
establishing a container size reflecting one or more factors, wherein the factors include:
a latency associated with a network connection to the secondary storage computing device;
a bandwidth associated with a network connection to the secondary storage computing device;
whether the cloud storage site imposes a restriction on a namespace associated with the computer or the file system;
whether the cloud storage site permits sparsification of data files;
a pricing structure associated with the cloud storage site;
a maximum specified container file size; and
a minimum specified container file size;
processing the n number of files by:
copying the metadata of each of the n number of files to a first container;
copying at least a portion of the data for the n number of files into a second container, wherein the second container is separate from the first container; and
updating a data structure, wherein the data structure:
tracks, for each of the n number of files, a location of the metadata for that file in the first container, and
tracks, for the at least a portion of the data for the n number of files, a location of the data in the second container, and
wherein the size of at least one of the first and second containers is no greater than the established container size.
View Dependent Claims (12, 13)
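The container steps of claim 11 can be sketched roughly as follows. The claim lists the factors that the container size reflects but gives no formula, so the sizing heuristic below (a multiple of the bandwidth-delay product, clamped to the specified minimum and maximum) is purely an assumption for illustration. The function names, the factor of 100, and the doubling for restricted namespaces are hypothetical, and the pricing and sparsification factors are left out for brevity.

```python
def establish_container_size(latency_ms, bandwidth_bytes_per_s,
                             min_size, max_size, namespace_restricted=False):
    # Hypothetical heuristic: make each container large enough that its
    # transfer time dwarfs the round-trip latency of the connection.
    target = bandwidth_bytes_per_s * latency_ms // 1000 * 100
    if namespace_restricted:
        # Assumption: fewer, larger containers ease namespace limits.
        target *= 2
    # Clamp to the minimum and maximum specified container file sizes.
    return max(min_size, min(max_size, target))


def pack_files(files, container_size):
    """files: list of (name, metadata_bytes, data_bytes) tuples."""
    meta_container = bytearray()   # first container: metadata only
    data_container = bytearray()   # second container: file data only
    locations = {}                 # tracks (offset, length) per file
    for name, meta, data in files:
        if (len(meta_container) + len(meta) > container_size
                or len(data_container) + len(data) > container_size):
            # Simplification: a fuller version would open a new container.
            raise ValueError("container size exceeded")
        locations[name] = {
            "meta": (len(meta_container), len(meta)),
            "data": (len(data_container), len(data)),
        }
        meta_container += meta
        data_container += data
    return bytes(meta_container), bytes(data_container), locations
```

Storing many small files in two container objects means the cloud storage site sees two names instead of 2n, which is the namespace relief the abstract mentions.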
14. A method of pruning files containing data that is performed by one or more computing systems, each computing system including a processor and memory, the method comprising:
receiving an indication to delete a first file, wherein the first file includes a first set of data, and wherein the first file is stored at a cloud storage location;
determining, by the one or more computing systems, if the first set of data references a second set of data included in a second file located at the cloud storage location;
if the first set of data references the second set of data, then:
causing to be deleted any references to the second set of data by the first set of data at the cloud storage location; and
causing to be deleted the second file at the cloud storage location;
determining, by the one or more computing systems, if the first set of data is referenced by at least a third set of data included in a third file at the cloud storage location; and
if the first set of data is referenced by at least the third set of data, then:
deleting any references to the first set of data by the third set of data at the cloud storage location; and
storing an indication to delete the first file at the cloud storage location.
View Dependent Claims (15)
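The pruning steps of claim 14 can be traced with a small sketch that mirrors the claim literally rather than implementing a general reference-counting scheme. It assumes the references between files are available as an in-memory dictionary and that no file references itself. The function name `prune` and the `pending_delete` set are hypothetical.

```python
def prune(first, references, pending_delete):
    """Apply the claimed pruning steps to the file named `first`.

    references: dict mapping each file name to the set of file names
                whose data it references (hypothetical representation).
    pending_delete: set that stores indications of files to delete.
    Returns the names of the referenced files that were deleted.
    """
    deleted = []
    # If the first file references a second file: drop the reference,
    # then cause the second file to be deleted.
    for second in sorted(references.get(first, set())):
        references[first].discard(second)
        references.pop(second, None)
        deleted.append(second)
    # If the first file is referenced by a third file: drop those
    # references and store an indication to delete the first file.
    referrers = [name for name, refs in references.items() if first in refs]
    for third in referrers:
        references[third].discard(first)
    if referrers:
        pending_delete.add(first)
    return deleted
```

The claim does not say what happens when no other file references the first file, so this sketch takes no further action in that case.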
16. A tangible computer-readable storage medium whose contents cause a data storage system to perform a method of migrating data from local primary storage to secondary storage located on a remote cloud storage site, the method comprising:
identifying no more than n−1 data blocks, located within the local primary storage, that satisfy a criteria, wherein the n−1 data blocks represent a portion of a data file consisting of n blocks and the n blocks contain data written by a file system associated with the local primary storage; and
determining a size for a container file in which to store some or all of the no more than n−1 data blocks;
transferring data contained by the identified no more than n−1 data blocks from the primary storage to the secondary storage located on a cloud storage site, wherein transferring data comprises writing data first to a container file of the determined size; and
updating an index with information associating the transferred data with information identifying blocks within the secondary storage that contain the transferred data, wherein the information includes at least one uniform resource locator or logical address that identifies at least one logical location from which the transferred data may be accessed.
View Dependent Claims (17)
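The block-level migration of claim 16 can be sketched as below. The claim does not define the selection criteria, so block age is used here only as a stand-in. The function names, the `container-<k>` naming, and the index layout are hypothetical, and each block is assumed to fit within the determined container size.

```python
def select_blocks(block_ages, min_age):
    # Pick blocks that satisfy the (assumed) age criterion, capped at
    # n-1 so that only a portion of the n-block file is migrated.
    n = len(block_ages)
    chosen = [i for i, age in enumerate(block_ages) if age >= min_age]
    return chosen[: max(n - 1, 0)]


def migrate(blocks, chosen, container_size, base_url):
    """Write chosen blocks into containers and build the index.

    blocks: list of bytes objects, one per block of the file.
    Returns (containers, index), where index maps a block number to the
    URL, offset, and length under which its data can be found.
    """
    containers = []
    index = {}
    current = bytearray()
    for i in chosen:
        data = blocks[i]
        # Start a new container when this block would overflow the
        # determined container size.
        if current and len(current) + len(data) > container_size:
            containers.append(bytes(current))
            current = bytearray()
        index[i] = {
            "url": f"{base_url}/container-{len(containers)}",
            "offset": len(current),
            "length": len(data),
        }
        current += data
    if current:
        containers.append(bytes(current))
    return containers, index
```

In this sketch the data is written to a container first and the containers would then be sent to the cloud storage site, while the index stays available locally to resolve each migrated block to a URL.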
Specification