METHOD AND SYSTEM FOR OFFLINE INDEXING OF CONTENT AND CLASSIFYING STORED DATA

US 20110093470A1
Filed: 12/20/2010
Published: 04/21/2011
Est. Priority Date: 10/17/2006
Status: Active Grant

4 Assignments

0 Petitions

Abstract

A method and system for creating an index of content without interfering with the source of the content includes an offline content indexing system that creates an index of content from an offline copy of data. The system may associate additional properties or tags with data that are not part of traditional content indexing, such as the time the content was last available or user attributes associated with the content. Users can search the created index to locate content that is no longer available, or to find content based on the associated attributes.
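The abstract's core idea, indexing an offline copy and recording when each item was last available, can be illustrated with a minimal Python sketch. It assumes the secondary copy is presented as a dictionary mapping item paths to text and uses a plain inverted index. `OfflineIndex`, `index_copy`, `search`, and the `owners` mapping are hypothetical names chosen for illustration; this is not the patented system or CommVault's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IndexEntry:
    path: str
    last_available: datetime  # time of the newest secondary copy containing the item
    owner: str                # example user attribute tagged onto the item


@dataclass
class OfflineIndex:
    postings: dict = field(default_factory=dict)  # term -> set of item paths
    entries: dict = field(default_factory=dict)   # item path -> IndexEntry

    def index_copy(self, copy_items, copy_time, owners):
        """Index every item of a secondary copy taken at copy_time.

        Only the copy is read; the live (primary) systems are never touched."""
        for path, text in copy_items.items():
            for term in set(text.lower().split()):
                self.postings.setdefault(term, set()).add(path)
            # Keep the latest time at which the item was seen in any copy.
            prev = self.entries.get(path)
            last = max(copy_time, prev.last_available) if prev else copy_time
            self.entries[path] = IndexEntry(path, last, owners.get(path, "unknown"))

    def search(self, term, live_paths):
        """Return (path, still_live, last_available) for items containing term."""
        hits = sorted(self.postings.get(term.lower(), set()))
        return [(p, p in live_paths, self.entries[p].last_available) for p in hits]
```

Because the index is built only from the copy, an item deleted from the live system can still be found: `search` reports it with `still_live` set to `False`, together with the time of the copy in which it was last seen.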

21 Claims

1. In a data management system residing within a private computer network, a method for indexing multiple content items that is performed by one or more computing systems, each computing system having a processor and memory, the method comprising:
selecting a secondary copy of the multiple content items from the private computer network,
wherein the secondary copy of the multiple content items is a copy of the multiple content items that is not a primary copy of the multiple content items,
wherein the primary copy is available from one or more live computers within the private computer network,
wherein the secondary copy of the multiple content items is created at a first time, and
wherein the secondary copy of the multiple content items is selected at a second time later than the first time;

identifying at least some of the multiple content items within the secondary copy; and

for each of the identified content items:

analyzing substantially all of the contents of an identified content item, by the one or more computing systems; and

creating and/or updating multiple index entries within a content index, by the one or more computing systems,
wherein an index entry corresponds to an identified content item,
wherein an index entry is created and/or updated based upon the analysis of substantially all of the contents of an identified content item,
wherein the analyzing and creating and/or updating are performed without impacting the one or more live computers from which the primary copy of the multiple content items is available, and
wherein the index entry is created and/or updated to include an availability criteria.

2. The method of claim 1 wherein selecting a secondary copy comprises examining backup data.
3. The method of claim 1 wherein selecting a secondary copy comprises examining a change journal.
4. The method of claim 1 wherein selecting a secondary copy comprises examining a data snapshot.
5. The method of claim 1 wherein creating and/or updating the multiple index entries within the content index comprises creating and/or updating the multiple index entries within the content index in response to receiving a search request.
6. The method of claim 1 wherein creating and/or updating the multiple index entries within the content index comprises creating and/or updating the multiple index entries within the content index in response to an index update policy.
7. The method of claim 1 further comprising before creating and/or updating the content index, eliminating duplicate content within the selected secondary copy.
8. The method of claim 1 wherein creating and/or updating the multiple index entries within the content index comprises creating and/or updating the multiple index entries within the content index incrementally based on incremental changes to one or more of the multiple content items.
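Claims 7 and 8 add two refinements: removing duplicate content from the selected copy before indexing, and updating the index incrementally. The sketch below shows one plausible way to do both, assuming items are byte strings keyed by path, using SHA-256 digests to detect duplicates and changes, and keeping a plain inverted index in dictionaries. `dedupe`, `changed_items`, and `apply_changes` are hypothetical helpers, not taken from the specification.

```python
import hashlib


def dedupe(copy_items):
    """Keep one path per distinct content: hash each item's bytes and drop
    any later item (in path order) whose digest has already been seen."""
    seen = {}
    for path, data in sorted(copy_items.items()):
        seen.setdefault(hashlib.sha256(data).hexdigest(), path)
    return {path: copy_items[path] for path in seen.values()}


def changed_items(previous, current):
    """Return (items added or modified since `previous`, paths no longer present)."""
    prev_digests = {p: hashlib.sha256(d).hexdigest() for p, d in previous.items()}
    changed = {p: d for p, d in current.items()
               if prev_digests.get(p) != hashlib.sha256(d).hexdigest()}
    removed = set(previous) - set(current)
    return changed, removed


def apply_changes(postings, doc_terms, changed, removed):
    """Update the inverted index in place, touching only changed/removed items.

    postings: term -> set of paths; doc_terms: path -> terms last indexed."""
    for path in list(changed) + list(removed):
        for term in doc_terms.pop(path, set()):
            postings[term].discard(path)
    for path, data in changed.items():
        terms = set(data.decode("utf-8", "replace").lower().split())
        doc_terms[path] = terms
        for term in terms:
            postings.setdefault(term, set()).add(path)
```

Only items whose digest differs from the previous copy are re-analyzed, so the cost of each update tracks the size of the change rather than the size of the whole copy.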

9. A computer system for indexing and searching multiple content items, the computer system comprising:
a processor;

a memory;

a secondary copy component configured to select a secondary copy of the multiple content items,
wherein the secondary copy of the multiple content items is a copy of the multiple content items that is not a primary copy of the multiple content items,
wherein the primary copy is available from one or more computer systems,
wherein the secondary copy of the multiple content items is created at a first time, and
wherein the secondary copy component selects the secondary copy of the multiple content items at a second time later than the first time;

a content indexing component configured to, for at least some of the multiple content items included in the selected secondary copy:

analyze content of a content item, including analyzing a summary of the content item as well as analyzing additional content of the content item; and

based upon the analysis, create and/or update a content index; and

an index searching component configured to identify one or more indexed content items based on a received search query,
wherein the content indexing component performs the analyzing and creating and/or updating without consuming resources of the one or more computing systems from which the primary copy of the multiple content items is available, and
wherein the index searching component provides an availability criterion for at least some of the identified one or more indexed content items.

10. The system of claim 9 wherein the content indexing component is further configured to decrypt encrypted content.
11. The system of claim 9 wherein the content indexing component is further configured to create and/or update the content index based on an indexing policy.
12. The system of claim 9 further comprising a data classification component configured to classify content and add classifications to the content index.
13. The system of claim 9 wherein the secondary copy component is further configured to select a secondary copy of the multiple content items from among multiple secondary copies of the multiple content items based on the time required to access each of the multiple secondary copies of the multiple content items.
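Claim 13 selects among several secondary copies based on the time required to access each one. A minimal Python sketch, assuming a simple cost model of fixed mount/seek latency plus size divided by read throughput. The `SecondaryCopy` type, the media labels, and the numbers in `ACCESS_PROFILE` are illustrative assumptions, not values from the patent.

```python
from dataclasses import dataclass


@dataclass
class SecondaryCopy:
    name: str
    media: str      # hypothetical media label: "disk", "tape", or "cloud"
    size_gb: float


# Hypothetical cost model per medium: (seconds to mount/seek, GB read per second).
ACCESS_PROFILE = {"disk": (1.0, 0.5), "tape": (120.0, 0.15), "cloud": (5.0, 0.05)}


def estimated_access_seconds(copy):
    """Estimate the time needed to read the whole copy for indexing."""
    mount, throughput = ACCESS_PROFILE[copy.media]
    return mount + copy.size_gb / throughput


def select_copy(copies):
    """Pick the secondary copy with the lowest estimated access time."""
    return min(copies, key=estimated_access_seconds)
```

Under this model a disk-resident copy is preferred over a tape backup of the same size, since both yield the same index but the disk copy can be read sooner.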

14. In a data management system residing within a private computer network, a method for indexing content that is performed by one or more computing systems, each computing system having a processor and memory, comprising:
selecting a secondary copy of the content from the private computer network,
wherein the secondary copy of the content is a copy of the content that is not a primary copy of the content,
wherein the primary copy is available from one or more live computers within the private computer network,
wherein the secondary copy of the content is created at a first time, and
wherein the secondary copy of the content is selected at a second time later than the first time;

identifying at least some of the content within the secondary copy;

analyzing attributes of the identified content, by the one or more computer systems;

based upon the analysis, classifying the identified content, by the one or more computer systems; and

updating a content index that includes the classifications of the identified content, by the one or more computing systems,
wherein the analyzing, classifying, and updating are performed without impacting the one or more live computers that store the primary copy of the content, and
wherein the context index includes an availability of at least the identified content.

15. The method of claim 14 wherein updating the content index comprises determining a state of protection of the identified content.
16. The method of claim 14 wherein updating the content index comprises determining whether the identified content is encrypted.
17. The method of claim 14 wherein updating the content index comprises determining whether the identified content has associated access control information.
18. The method of claim 14 wherein updating the content index comprises determining a topology of a network in which the identified content is stored.
19. The method of claim 14 wherein updating the content index comprises determining whether the identified content contains one or more specified keywords.
20. The method of claim 14 further comprising before updating the content index, eliminating duplicate content within the selected secondary copy.
21. The method of claim 14 wherein updating a content index comprises updating the content index incrementally based on incremental changes to the content.
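Claims 14 to 19 classify content found in the secondary copy and store the classifications in the content index. The sketch below derives tags for several of the claimed attributes: whether content appears encrypted, whether it has access control information, which host holds its primary copy (a crude stand-in for network topology), and which specified keywords it contains. The byte-entropy test for encryption and its 7.5 bits-per-byte threshold are heuristic assumptions, not something the claims specify; `ItemAttributes`, `classify`, and `update_index` are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ItemAttributes:
    path: str
    data: bytes
    acl: list = field(default_factory=list)  # hypothetical access-control entries
    host: str = ""                           # host holding the primary copy


def byte_entropy(data):
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_encrypted(data, threshold=7.5):
    # Heuristic assumption: near-random bytes suggest encryption
    # (compressed data can also trip this test).
    return byte_entropy(data) >= threshold


def classify(item, keywords):
    """Return a set of classification tags for one item."""
    tags = {"encrypted" if looks_encrypted(item.data) else "plaintext"}
    if item.acl:
        tags.add("access-controlled")
    if item.host:
        tags.add(f"host:{item.host}")
    text = item.data.decode("utf-8", "replace").lower()
    tags.update(f"keyword:{k}" for k in keywords if k.lower() in text)
    return tags


def update_index(index, items, keywords):
    """Store each item's classification tags in the content index (a dict here)."""
    for item in items:
        index[item.path] = classify(item, keywords)
    return index
```

A search layer could then answer queries such as "unencrypted items on `fs01` that mention `confidential`" directly from the stored tags, without touching the live systems.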

Current Assignee
CommVault Systems Incorporated
Original Assignee
CommVault Systems Incorporated
Inventors
Gokhale, Parag; Attarde, Deepak R.; Ahn, Jun H.; Kottomtharayil, Rajiv

Granted Patent

US 8,037,031 B2
US Class Current

707/741
CPC Class Codes

G06F 11/1446   Point-in-time backing up or...

G06F 16/2228   Indexing structures

G06F 16/2372   Updates performed during of...

G06F 16/319   Inverted lists

G06F 2201/84   Using snapshots, i.e. a log...
