PERFORMING DATA STORAGE OPERATIONS WITH A CLOUD ENVIRONMENT, INCLUDING CONTAINERIZED DEDUPLICATION, DATA PRUNING, AND DATA TRANSFER

US 20130238572A1
Filed: 03/26/2013
Published: 09/12/2013
Est. Priority Date: 06/30/2009
Status: Active Grant

Abstract

Various systems and methods may be used for performing data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods for content indexing data stored within a cloud environment may facilitate later searching, including collaborative searching. Methods for performing containerized deduplication may reduce the strain on a system namespace, effectuate cost savings, etc. Methods may identify suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, the systems and methods may be used for providing a cloud gateway and a scalable data object store within a cloud environment.

17 Claims

1. A computer-implemented method for indexing and searching multiple content items, the method comprising:
- selecting or accessing, with a secondary copy component of a computing system, at least one secondary copy of the multiple content items,
  wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items,
  wherein the primary copy is available by the computer system over a local area network, and
  wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
  
  for at least some of the multiple content items included in the secondary copy, with a content indexing component of the computing system:
  
  analyzing content of a content item, including analyzing a summary of the content item as well as analyzing additional content of the content item; and
  
  based upon the analysis, generating metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
  
  storing, in a content index, the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
  
  identifying, with an index searching component of the computing system, one or more indexed content items based on a search query and the metadata stored within the content index.
- Dependent claims (2, 3, 4, 5):
  - 2. The method of claim 1 wherein the content index further comprises a file name for the content item, a logical descriptor for a client computer that originated the content item, and a size of the content item.
  - 3. The method of claim 1 wherein the content index further comprises a token to uniquely identify each of the multiple content items for the cloud storage site, and wherein the content index indexes content items accessible via the cloud storage site over an HTTP protocol.
  - 4. The method of claim 1, further comprising:
    - identifying and selecting, with the secondary copy component, from an index of secondary copies one secondary copy that is a storage medium of the computing system having higher availability than a secondary copy stored on magnetic tape storage medium.
  - 5. The method of claim 1, further comprising:
    - identifying and selecting, with the secondary copy component, from an index of secondary copies an unencrypted secondary copy versus an encrypted secondary copy.
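
As an illustration only, the indexing and searching steps recited in claim 1 might be sketched as follows. The class and method names (`ContentIndex`, `index_item`, `search`), the keyword-extraction rule, and the example URLs are assumptions made for this sketch; the claim does not prescribe any of them.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    logical_address: str  # where the item can be fetched at the cloud storage site
    keywords: set         # terms derived from the summary and the additional content


@dataclass
class ContentIndex:
    """Index held locally; the indexed content stays at the cloud storage site."""
    entries: dict = field(default_factory=dict)

    def index_item(self, item_id, summary, body, logical_address):
        # Analyze both the summary and the additional content of the item.
        words = (summary + " " + body).lower().split()
        # Hypothetical rule: keep words longer than three characters.
        keywords = {w.strip(".,;:") for w in words if len(w) > 3}
        self.entries[item_id] = IndexEntry(logical_address, keywords)

    def search(self, query):
        # Match every query term against the stored metadata only.
        terms = {t.lower() for t in query.split()}
        return sorted(
            (item_id, entry.logical_address)
            for item_id, entry in self.entries.items()
            if terms <= entry.keywords
        )


index = ContentIndex()
index.index_item("doc1", "Quarterly budget", "Budget figures for storage hardware.",
                 "https://cloud.example.com/bucket/doc1")
index.index_item("doc2", "Meeting notes", "Notes about tape rotation.",
                 "https://cloud.example.com/bucket/doc2")
hits = index.search("budget")
```

Here `hits` contains only `doc1` together with its logical address, so a caller can locate the item at the remote site without reading any content from that site during the search.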

6. A computer system for indexing and searching multiple content items, the computer system comprising:
- a processor;
  
  a memory;
  
  a secondary copy component configured to select or access at least one secondary copy of the multiple content items,
  wherein the secondary copy of the multiple content items is a copy of the multiple content items and is not a primary copy of the multiple content items,
  wherein the primary copy is available by the computer system over a local area network, and
  wherein the at least one secondary copy is stored at a cloud storage site located geographically remote from the computer system;
  
  a content indexing component configured to, for at least some of the multiple content items included in the secondary copy:
  
  analyze content of a content item, including analyzing a summary of the content item as well as analyzing additional content of the content item; and
  
  based upon the analysis, generate metadata corresponding to the content item, wherein the metadata includes at least a logical address to the cloud storage site for accessing the content item; and
  
  store in a content index the generated metadata of the content, wherein the content index is not stored at the cloud storage site, but is locally accessible by the computer system; and
  
  an index searching component configured to identify one or more indexed content items based on a search query and the metadata stored within the content index.
- Dependent claims (7, 8, 9, 10):
  - 7. The computer system of claim 6 wherein the content index further comprises a file name for the content item, a logical descriptor for a client computer that originated the content item, and a size of the content item.
  - 8. The computer system of claim 6 wherein the content index further comprises a token to uniquely identify each of the multiple content items for the cloud storage site, and wherein the content index indexes content items accessible via the cloud storage site over an HTTP protocol.
  - 9. The computer system of claim 6 wherein the secondary copy component is further configured to identify and select from an index of secondary copies one secondary copy that is on storage medium having higher availability than a secondary copy stored on magnetic tape storage medium.
  - 10. The computer system of claim 6 wherein the secondary copy component is further configured to identify and select from an index of secondary copies an unencrypted secondary copy versus an encrypted secondary copy.
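
Dependent claims 4, 5, 9, and 10 recite choosing among several secondary copies by storage medium and by encryption status. One hedged way to picture that selection is a sort key, as below. The `MEDIA_RANK` ordering, the `SecondaryCopy` fields, and the decision to weigh encryption before medium are all assumptions of this sketch, not details taken from the claims.

```python
from dataclasses import dataclass

# Hypothetical availability ranking: a lower number means faster access.
MEDIA_RANK = {"disk": 0, "cloud": 1, "tape": 2}


@dataclass
class SecondaryCopy:
    copy_id: str
    medium: str
    encrypted: bool


def select_copy(copies):
    # Prefer unencrypted copies, then the medium with higher availability.
    return min(copies, key=lambda c: (c.encrypted, MEDIA_RANK[c.medium]))


copies = [
    SecondaryCopy("copy-tape", "tape", False),
    SecondaryCopy("copy-cloud-enc", "cloud", True),
    SecondaryCopy("copy-cloud", "cloud", False),
]
chosen = select_copy(copies)
```

With these inputs `chosen` is the unencrypted cloud copy, which is then the copy handed to the content indexing component.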

11. A computer-implemented method for copying multiple files at a cloud storage site, wherein the cloud storage site is coupled to a computer executing a file system for accessing a secondary storage computing device, the method comprising:
- receiving a copy operation request to copy n number of files at the cloud storage site,
  wherein each of the n number of files includes metadata and data, and
  wherein the n number of files exceeds a threshold;
  
  establishing a container size reflecting one or more factors, wherein the factors include:
  
  a latency associated with a network connection to the secondary storage computing device;
  
  a bandwidth associated with a network connection to the secondary storage computing device;
  
  whether the cloud storage site imposes a restriction on a namespace associated with the computer or the file system;
  
  whether the cloud storage site permits sparsification of data files;
  
  a pricing structure associated with the cloud storage site;
  
  a maximum specified container file size; and
  
  a minimum specified container file size;
  
  processing the n number of files by—
  
  copying the metadata of each of the n number of files to a first container;
  
  copying at least a portion of the data for the n number of files into a second container, wherein the second container is separate from the first container; and
  
  updating a data structure, wherein the data structure—
  
  tracks, for each of the n number of files, a location of the metadata for that file in the first container, and

  tracks, for the at least a portion of the data for the n number of files, a location of the data in the second container,

  and wherein the size of at least one of the first and second containers is no greater than the established container size.
- Dependent claims (12, 13):
  - 12. A computer-implemented method of claim 11 wherein the threshold is a number of files that the file system can operate on without system degradation.
  - 13. A computer-implemented method of claim 11 wherein the threshold is related to at least one of the factors.
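
The containerized copy of claim 11 can be pictured with a small sketch. Claim 11 lists factors that the container size reflects but gives no formula, so the heuristic in `establish_container_size` (a multiple of the bandwidth-delay product, clamped to the specified minimum and maximum) is purely an assumption, and it ignores the namespace, sparsification, and pricing factors. The function names and the in-memory containers are likewise hypothetical.

```python
def establish_container_size(latency_ms, bandwidth_bytes_per_s,
                             min_size, max_size, multiplier=64):
    # Assumed heuristic: amortize round trips by sizing containers as a
    # multiple of the bandwidth-delay product, then clamp to the limits.
    target = latency_ms * bandwidth_bytes_per_s // 1000 * multiplier
    return max(min_size, min(max_size, target))


def copy_files(files, container_size):
    """files: list of (name, metadata_bytes, data_bytes) tuples."""
    meta_container = bytearray()  # first container: metadata only
    data_container = bytearray()  # second, separate container: file data
    locations = {}                # tracking structure: name -> (offset, length) pairs
    for name, meta, data in files:
        if (len(meta_container) + len(meta) > container_size
                or len(data_container) + len(data) > container_size):
            raise ValueError("container full; a fuller sketch would open a new one")
        locations[name] = {
            "meta": (len(meta_container), len(meta)),
            "data": (len(data_container), len(data)),
        }
        meta_container += meta
        data_container += data
    return bytes(meta_container), bytes(data_container), locations


size = establish_container_size(50, 1_000_000, 1024 * 1024, 4 * 1024 * 1024)
meta_c, data_c, locations = copy_files(
    [("a.txt", b"m1", b"hello"), ("b.txt", b"meta2", b"world!")], size
)
```

Two files end up as two objects at the storage site rather than four, and `locations` records the offset and length needed to read any one file's metadata or data back out of its container.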

14. A method of pruning files containing data that is performed by one or more computing systems, each computing system including a processor and memory, the method comprising:
- receiving an indication to delete a first file, wherein the first file includes a first set of data, and wherein the first file is stored at a cloud storage location;
  
  determining, by the one or more computing systems, if the first set of data references a second set of data included in a second file located at the cloud storage location;
  
  if the first set of data references the second set of data, then:
  
  causing to be deleted any references to the second set of data by the first set of data at the cloud storage location; and
  
  causing to be deleted the second file at the cloud storage location;
  
  determining, by the one or more computing systems, if the first set of data is referenced by at least a third set of data included in a third file at the cloud storage location; and
  
  if the first set of data is referenced by at least the third set of data, then:
  
  deleting any references to the first set of data by the third set of data at the cloud storage location; and
  
  storing an indication to delete the first file at the cloud storage location.
- Dependent claims (15):
  - 15. The method of claim 14, further comprising:
    - causing the second file to be deleted by converting one or more generic file system commands to one or more vendor-specific calls for the cloud storage site.
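
Reference-aware pruning of the kind recited in claim 14 is often realized with deferred deletion, and the sketch below uses that approach. It is a simplified reading rather than a literal transcription of the claim: a file that is still referenced is only marked for deletion, and a marked file is physically removed once the last file referencing it has been pruned. The `CloudStore` class and its in-memory bookkeeping are hypothetical.

```python
class CloudStore:
    def __init__(self):
        self.refs = {}        # file name -> set of files whose data it references
        self.pending = set()  # files marked for deletion but still referenced

    def add(self, name, references=()):
        self.refs[name] = set(references)

    def referenced_by(self, name):
        return {f for f, targets in self.refs.items() if name in targets}

    def prune(self, name):
        if self.referenced_by(name):
            # Another file still needs this data: store an indication to delete.
            self.pending.add(name)
            return
        # Delete the file together with its outgoing references.
        targets = self.refs.pop(name)
        self.pending.discard(name)
        for target in targets:
            # A referenced file that was awaiting deletion may now be removable.
            if target in self.pending and not self.referenced_by(target):
                self.prune(target)


store = CloudStore()
store.add("base")
store.add("delta", references=["base"])
store.prune("base")                      # deferred: "delta" still references it
deferred = "base" in store.refs and "base" in store.pending
store.prune("delta")                     # removes "delta", then the pending "base"
```

After the second call both files are gone and nothing remains pending, while `deferred` records that the first call left `base` in place.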

16. A tangible computer-readable storage medium whose contents cause a data storage system to perform a method of migrating data from local primary storage to secondary storage located on a remote cloud storage site, the method comprising:
- identifying no more than n−1 data blocks, located within the local primary storage, that satisfy a criteria, wherein the n−1 data blocks represent a portion of a data file consisting of n blocks and the n blocks contain data written by a file system associated with the local primary storage; and
  
  determining a size for a container file in which to store some or all of the no more than n−1 data blocks;
  
  transferring data contained by the identified no more than n−1 data blocks from the primary storage to the secondary storage located on a cloud storage site, wherein transferring data comprises writing data first to a container file of the determined size; and
  
  updating an index with information associating the transferred data with information identifying blocks within the secondary storage that contain the transferred data, wherein the information includes at least one uniform resource locator or logical address that identifies at least one logical location from which the transferred data may be accessed.
- Dependent claims (17):
  - 17. The tangible computer-readable storage medium of claim 16 wherein the index further comprises information associating the transferred data with information identifying tape offsets for secondary storage that contain the transferred data.
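
A minimal sketch of the block-level migration in claim 16 follows, assuming in-memory containers in place of real cloud writes. The `migrate_blocks` function, the `is_cold` predicate standing in for the claim's unspecified criteria, and the URL scheme are hypothetical.

```python
def migrate_blocks(file_id, blocks, is_cold, container_size, base_url):
    """blocks: list of bytes objects, the n blocks of one data file."""
    # Identify at most n-1 blocks that satisfy the criterion.
    selected = [i for i, b in enumerate(blocks) if is_cold(i, b)][: len(blocks) - 1]
    containers = [bytearray()]
    index = {}
    for i in selected:
        # Start a new container when the current one would exceed the size.
        if containers[-1] and len(containers[-1]) + len(blocks[i]) > container_size:
            containers.append(bytearray())
        c = len(containers) - 1
        # Index entry: (URL of the container, offset within it, block length).
        index[(file_id, i)] = (f"{base_url}/container-{c}",
                               len(containers[-1]), len(blocks[i]))
        containers[-1] += blocks[i]
    return [bytes(c) for c in containers], index


containers, index = migrate_blocks(
    "file-7",
    [b"aaaa", b"bbbb", b"cccc"],
    is_cold=lambda i, block: True,   # hypothetical criterion: every block qualifies
    container_size=6,
    base_url="https://cloud.example.com/store",
)
```

Although all three blocks satisfy the criterion, only two are migrated, and each index entry carries a URL from which the transferred block can later be read.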


Current Assignee
CommVault Systems Incorporated
Original Assignee
CommVault Systems Incorporated
Inventors
Prahlad, Anand; Muller, Marcus S.; Kottomtharayil, Rajiv; Kavuri, Srinivas; Gokhale, Parag; Vijayan, Manoj Kumar

Granted Patent

US 9,171,008 B2
US Class Current

707/692
CPC Class Codes

G06F 11/3485   for I/O devices

G06F 16/122   using management policies b...

G06F 16/1748   De-duplication implemented ...

G06F 16/1827   Management specifically ada...

G06F 16/1844   Management specifically ada...

G06F 16/41   Indexing; Data structures t...

G06F 3/06   Digital input from, or digi...

G06F 3/0605   by facilitating the interac...

G06F 3/061   Improving I/O performance

G06F 3/0626   Reducing size or complexity...

G06F 3/0631   by allocating resources to ...

G06F 3/0641   De-duplication techniques

G06F 3/0649   Lifecycle management

G06F 3/0667   at data level, e.g. file, r...

G06F 3/067   Distributed or networked st...

G06Q 30/02   Marketing; Price estimation...

G06Q 30/0206   Price or cost determination...

G06Q 50/188   Electronic negotiation

H04L 63/0428   wherein the data content is...

H04L 67/02   based on web technology, e....

H04L 67/06   specially adapted for file ...

H04L 67/1095   Replication or mirroring of...

H04L 67/1097   for distributed storage of ...

H04L 67/535   Tracking the activity of th...

H04L 67/56   Provisioning of proxy servi...

H04L 67/5682   Policies or rules for updat...

H04L 69/08   Protocols for interworking;...

Y04S 40/20   Information technology spec...
