Performing data storage operations with a cloud environment, including containerized deduplication, data pruning, and data transfer
First Claim
1. A computer-implemented method for indexing and searching multiple content items, the method comprising:
selecting or accessing, with a secondary copy component of a computing system, at least one secondary copy of the multiple content items, wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items, wherein the primary copy is available by the computer system over a local area network, and wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
for at least some of the multiple content items included in the secondary copy, with a content indexing component of the computing system;
analyzing content of a content item, including analyzing a summary of the content item;
based upon the analysis, generating metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
storing, in a content index, the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
identifying, with an index searching component of the computing system, one or more indexed content items based on a search query and the metadata stored within the content index.
Abstract
Various systems and methods may be used for performing data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods for content indexing data stored within a cloud environment may facilitate later searching, including collaborative searching. Methods for performing containerized deduplication may reduce the strain on a system namespace, effectuate cost savings, etc. Methods may identify suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, the systems and methods may be used for providing a cloud gateway and a scalable data object store within a cloud environment.
15 Claims
1. A computer-implemented method for indexing and searching multiple content items, the method comprising:
selecting or accessing, with a secondary copy component of a computing system, at least one secondary copy of the multiple content items, wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items, wherein the primary copy is available by the computer system over a local area network, and wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
for at least some of the multiple content items included in the secondary copy, with a content indexing component of the computing system;
analyzing content of a content item, including analyzing a summary of the content item;
based upon the analysis, generating metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
storing, in a content index, the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
identifying, with an index searching component of the computing system, one or more indexed content items based on a search query and the metadata stored within the content index.
Dependent claims: 2, 3, 4, 5
6. A computer system for indexing and searching multiple content items, the computer system comprising:
a processor configured to communicate with components of the computer system;
a memory configured to communicate with the processor;
a secondary copy component configured to select or access at least one secondary copy of the multiple content items, wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items, wherein the primary copy is available by the computer system over a local area network, and wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
a content indexing component configured to, for at least some of the multiple content items included in the secondary copy;
analyze content of a content item, including analyzing a summary of the content item; and
based upon the analysis, generate metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
store in a content index the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
an index searching component configured to identify one or more indexed content items based on a search query and the metadata stored within the content index.
Dependent claims: 7, 8, 9, 10
11. A computer-readable medium, excluding a transitory propagating signal, that stores instructions that when executed by a computer system cause the computer system to perform operations for indexing and searching multiple content items, the operations comprising:
selecting or accessing at least one secondary copy of the multiple content items, wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items, wherein the primary copy is available by the computer system over a local area network, and wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
for at least some of the multiple content items included in the secondary copy;
analyzing content of a content item, including analyzing a summary of the content item;
based upon the analysis, generate metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
storing, in a content index the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
identifying one or more indexed content items based on a search query and the metadata stored within the content index.
Dependent claims: 12, 13, 14, 15
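For orientation only, the flow recited in the independent claims (analyze items of a cloud-resident secondary copy, generate metadata that includes a logical address, store it in a locally held index, then answer queries from that index) might be sketched as follows. This is a minimal illustration, not the patented implementation: the names `ContentItem`, `ContentIndex`, `generate_metadata`, `index_secondary_copy`, and `search`, the `cloud://` address format, and the keyword-set matching are all assumptions introduced here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """One item of a secondary copy held at a (hypothetical) cloud storage site."""
    logical_address: str  # assumed address format; the claims do not specify one
    summary: str
    body: str


@dataclass
class ContentIndex:
    """Index kept locally, i.e. not at the cloud storage site."""
    entries: dict = field(default_factory=dict)  # logical address -> metadata


def tokenize(text: str) -> set:
    """Lower-case the text and split it into alphanumeric keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def generate_metadata(item: ContentItem) -> dict:
    """Analyze the summary and content, and record the logical address."""
    return {
        "logical_address": item.logical_address,
        "keywords": tokenize(item.summary) | tokenize(item.body),
    }


def index_secondary_copy(items, index: ContentIndex) -> None:
    """Store generated metadata for each item in the local index."""
    for item in items:
        metadata = generate_metadata(item)
        index.entries[metadata["logical_address"]] = metadata


def search(index: ContentIndex, query: str) -> list:
    """Return addresses of items whose keywords contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(
        address
        for address, metadata in index.entries.items()
        if terms <= metadata["keywords"]
    )


# Example data (invented for illustration).
secondary_copy = [
    ContentItem("cloud://site-a/container-1/report.txt",
                "Quarterly sales report",
                "Revenue grew in the third quarter."),
    ContentItem("cloud://site-a/container-1/memo.txt",
                "Staff memo",
                "Office closed on Friday."),
]
index = ContentIndex()
index_secondary_copy(secondary_copy, index)
print(search(index, "sales revenue"))
```

In this sketch a search touches only the local index, so no request to the cloud storage site is needed until a matching item is actually retrieved through its logical address.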
Specification