Method and system for offline indexing of content and classifying stored data

US 9,158,835 B2
Filed: 05/01/2012
Issued: 10/13/2015
Est. Priority Date: 10/17/2006
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

A method and system for creating an index of content without interfering with the source of the content includes an offline content indexing system that creates an index of content from an offline copy of data. The system may associate additional properties or tags with data that are not part of traditional indexing of content, such as the time the content was last available or user attributes associated with the content. Users can search the created index to locate content that is no longer available, or to locate content based on the associated attributes.
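
As an illustration of the abstract, the minimal Python sketch below stores, alongside each file's keywords, the extra tags the abstract mentions (the time the content was last available and user attributes), and lets a search return content that is no longer online. The `IndexEntry` schema, the `search` helper, and the sample paths are hypothetical; the patent does not specify a data model.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IndexEntry:
    """One indexed file plus tags beyond keyword indexing (hypothetical schema)."""
    path: str
    keywords: set
    last_available: datetime  # when the content was last seen on primary storage
    user_attributes: dict = field(default_factory=dict)
    still_online: bool = True


def search(entries, keyword, include_unavailable=True, **attrs):
    """Return paths whose keywords contain `keyword` and whose attributes match `attrs`."""
    hits = []
    for entry in entries:
        if keyword not in entry.keywords:
            continue
        if not include_unavailable and not entry.still_online:
            continue  # caller asked only for content that is still online
        if all(entry.user_attributes.get(k) == v for k, v in attrs.items()):
            hits.append(entry.path)
    return hits


# Hypothetical sample data: the finance file has since left production storage.
entries = [
    IndexEntry("/finance/q3.xlsx", {"budget", "q3"}, datetime(2006, 9, 30),
               {"owner": "alice", "dept": "finance"}, still_online=False),
    IndexEntry("/eng/spec.doc", {"budget", "design"}, datetime(2006, 10, 1),
               {"owner": "bob", "dept": "eng"}),
]
print(search(entries, "budget"))                             # both files
print(search(entries, "budget", include_unavailable=False))  # only /eng/spec.doc
print(search(entries, "budget", dept="finance"))             # only /finance/q3.xlsx
```

Keeping `last_available` on each entry, instead of dropping entries once files leave production, is what allows a search to still find content that survives only in an offline copy.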

314 Citations

20 Claims

1. In a data management system residing within a private computer network, a method for indexing content, comprising:
- identifying a production copy having one or more production data files each having keywords and metadata, wherein the production copy is available from a production data server within the private computer network;
  
  identifying an offline copy from the private computer network, wherein the offline copy includes one or more offline data files each having keywords and metadata, and wherein the offline data files are copies of the one or more production data files, and wherein the offline copy of the one or more offline data files is stored in one or more secondary storage devices;
  
  restoring the identified offline copy to an intermediate server, wherein the intermediate server is different from the production data server and wherein the intermediate server has a higher availability than the secondary storage devices;
  
  identifying keywords from the restored offline copy on the intermediate server, wherein the identifying of the keywords is performed without use of the production data server and without accessing the production copy;
  
  creating a content index of the identified keywords on the intermediate server after the identifying of the keywords, wherein the content index classifies the identified keywords based on at least one or more user-defined classifications, wherein the user-defined classifications include administratively defined groups within an organization or organization departments, wherein the content index is in an unencrypted form even when the offline copy is encrypted, and wherein the creating of the content index is performed without affecting the production data server; and
  
  updating the content index by associating the offline data files with the production data files.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 13, 15, 16, 18, 19):
  - 2. The method of claim 1 wherein creating or updating the content index comprises determining a state of data protection of the identified keywords.
  - 3. The method of claim 1 wherein creating or updating the content index comprises determining whether the identified keywords are encrypted.
  - 4. The method of claim 1 wherein creating or updating the content index comprises determining whether the identified keywords have associated access control information.
  - 5. The method of claim 1 wherein creating or updating the content index comprises determining at least a portion of a topology of a network in which the identified keywords is stored.
  - 6. The method of claim 1 wherein creating or updating the content index comprises determining whether the identified keywords contain one or more specified text strings or words.
  - 7. The method of claim 1 further comprising eliminating duplicate keywords within or among the offline data files.
  - 8. The method of claim 1 wherein the identifying an offline copy includes identifying a copy to use from among multiple offline copies based on a time required to access each of the multiple offline copies.
  - 13. The method of claim 1, further comprising:
    - associating availability information with index entries, wherein the availability information is based upon a location of an offline data file on one of the secondary storage devices referenced by the index entry;
      
      searching the index for keywords;
      
      generating search results based on the searching; and
      
      ranking the search results based on availability information.
  - 15. The method of claim 1, further comprising searching the index for information about an availability of data files, wherein the availability is based upon locations of the offline data files on one or more of the secondary storage devices, and providing search results indicating times required to access the data files based upon respective locations and availabilities of the offline data files.
  - 16. The method of claim 1, wherein the offline copy includes data files not available from mounted disk media or faster media, and wherein the offline copy is stored in an archive or secondary storage location.
  - 18. The method of claim 1, wherein the identifying the keywords is performed in response to receiving a request for information related to the keywords.
  - 19. The method of claim 1, wherein the indexing the keywords is delayed until single instancing is performed on the offline copy to reduce redundant indexing of the keywords.
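
The core steps of claim 1 (identifying keywords on the intermediate server, building a content index classified by a user-defined group such as a department, and associating offline files back to production files) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: `restored_files` stands in for an offline copy already restored to the intermediate server, keyword extraction is a naive regular-expression tokenizer, and a SHA-256 content hash is one possible way to avoid re-indexing identical files, in the spirit of claims 7 and 19. All function and variable names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def extract_keywords(text):
    """Naive keyword extraction: the set of lowercase alphanumeric tokens."""
    return set(TOKEN.findall(text.lower()))


def build_content_index(restored_files, classify):
    """Build an inverted index over files restored to the intermediate server.

    restored_files: offline file path -> text content (the production copy is never read).
    classify: callable mapping a path to a user-defined classification.
    Returns (index, unique_contents), where index is keyword -> label -> paths.
    """
    index = defaultdict(lambda: defaultdict(set))
    keyword_cache = {}  # content hash -> keywords, so identical files are tokenized once
    for path, text in restored_files.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in keyword_cache:
            keyword_cache[digest] = extract_keywords(text)
        label = classify(path)
        for keyword in keyword_cache[digest]:
            index[keyword][label].add(path)
    return index, len(keyword_cache)


def production_sources(index, keyword, offline_to_production):
    """Map index hits for a keyword back to the production files they were copied from.

    Falls back to the offline path when no production mapping is known.
    """
    hits = set()
    for paths in index.get(keyword, {}).values():
        hits.update(paths)
    return sorted(offline_to_production.get(p, p) for p in hits)


restored = {
    "/restore/a.txt": "Budget report for Q3",
    "/restore/b.txt": "Budget report for Q3",  # duplicate content, tokenized once
    "/restore/c.txt": "Design spec",
}
departments = {"/restore/a.txt": "finance", "/restore/b.txt": "sales",
               "/restore/c.txt": "engineering"}
index, unique = build_content_index(restored, departments.get)
print(unique)
print(production_sources(index, "design", {"/restore/c.txt": "/prod/c.txt"}))
```

Because the index is keyed first by keyword and then by classification, a query can be scoped to one department without scanning postings for the others.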

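Claims 8, 13 and 15 turn on how quickly an offline copy can be reached. The Python sketch below assumes a hypothetical table of per-tier access-time estimates; it picks the fastest of several offline copies to restore and ranks search hits so that files on more available media come first, reporting the estimated retrieval time for each. The tier names, the numbers in `ACCESS_SECONDS`, and the helper names are illustrative assumptions, not values from the patent.

```python
from dataclasses import dataclass

# Hypothetical retrieval-time estimates (seconds) per storage tier.
ACCESS_SECONDS = {"disk": 1, "tape_in_library": 120, "tape_offsite": 86_400}


@dataclass
class OfflineCopy:
    copy_id: str
    tier: str


def pick_copy(copies):
    """Choose the offline copy with the lowest estimated access time (cf. claim 8)."""
    return min(copies, key=lambda c: ACCESS_SECONDS[c.tier])


def rank_by_availability(hits):
    """Sort (path, tier) hits fastest-first and attach estimated access times (cf. claims 13, 15)."""
    ranked = sorted(hits, key=lambda hit: ACCESS_SECONDS[hit[1]])
    return [(path, ACCESS_SECONDS[tier]) for path, tier in ranked]


copies = [OfflineCopy("weekly", "tape_offsite"), OfflineCopy("nightly", "tape_in_library")]
print(pick_copy(copies).copy_id)  # nightly
print(rank_by_availability([("/a", "tape_offsite"), ("/b", "disk"), ("/c", "tape_in_library")]))
```

A real system would likely derive these estimates from media location and drive availability rather than a fixed table; the fixed table only keeps the sketch self-contained.
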
9. A computer system for indexing and searching content of one or more data files, wherein the data files include keywords and metadata, and wherein the computer system is coupled to one or more secondary storage devices, the computer system comprising:
- a memory having instructions;
  
  a processor coupled to the memory to execute the instructions, wherein the instructions include:
  
  a production component configured to manage one or more production data files in the memory, wherein each of the production data files includes keywords and metadata;
  
  an offline copy component configured to identify an offline copy of the one or more production data files after the offline copy is stored in the one or more secondary storage devices, wherein the offline copy contains one or more offline data files, and wherein the offline copy is distinguishable from a source of the one or more production data files;
  
  a restoring component configured to restore the identified offline copy to an intermediate server, wherein the intermediate server is different from the source of the one or more production data files and wherein the intermediate server has a higher availability than the secondary storage devices;
  
  an indexing component configured to create an index of keywords from the restored offline copy on the intermediate server after the offline copy component identifies the offline copy, wherein the index contains classifications of the one or more offline data files having the keywords;
  
  wherein the classifications include a level of confidentiality for the one or more offline data files having the keywords, wherein the index is in an unencrypted form even when the offline copy is encrypted, and wherein the index of the keywords is created without consuming additional resources of a system that is the source of the one or more production data files; and
  
  an index searching component configured to select certain indexed keywords based on a received search query and the classifications contained within the index.
- Dependent claims (10, 11, 12, 14, 17):
  - 10. The computer system of claim 9 wherein the indexing component decrypts encrypted keywords.
  - 11. The computer system of claim 9 wherein the indexing component selects a copy to use for indexing from among multiple offline copies of the production data files based on a time required to access each of the multiple offline copies.
  - 12. The computer system of claim 9 wherein the indexing component selects a copy to use for indexing from among multiple offline copies of the production data files based on a server load or backup schedules associated with at least some of the multiple offline copies.
  - 14. The computer system of claim 9, wherein the indexing component is further configured to associate availability information with index entries, wherein the availability information is based upon a location of an offline data file on one of the secondary storage devices referenced in the index entry.
  - 17. The computer system of claim 9, wherein the offline copy includes keywords not available from mounted disk media or faster media, and wherein the offline copy is stored in an archive or secondary storage location.
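
The index searching component of claim 9 selects indexed keywords using both the query and the classifications stored in the index, with a level of confidentiality as the claimed classification. A minimal Python sketch follows, assuming a hypothetical four-level `Confidentiality` scale (the claims do not define one) and an index that maps each keyword to the files containing it and their levels; a file is returned only if it contains every query term and its level does not exceed the searcher's clearance. The function name `search_classified` is likewise hypothetical.

```python
from enum import IntEnum


class Confidentiality(IntEnum):
    """Hypothetical confidentiality scale; higher values are more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3


def search_classified(index, query, clearance):
    """Return files matching every query term whose level is at or below `clearance`.

    index: keyword -> {path: Confidentiality}
    """
    terms = query.lower().split()
    if not terms:
        return []
    result = None
    for term in terms:
        postings = index.get(term, {})
        # Apply the classification filter per term, then intersect across terms.
        allowed = {path for path, level in postings.items() if level <= clearance}
        result = allowed if result is None else result & allowed
    return sorted(result)


index = {
    "merger": {"/legal/plan.doc": Confidentiality.SECRET,
               "/pr/news.txt": Confidentiality.PUBLIC},
    "draft": {"/legal/plan.doc": Confidentiality.SECRET,
              "/pr/news.txt": Confidentiality.PUBLIC,
              "/hr/memo.doc": Confidentiality.INTERNAL},
}
print(search_classified(index, "merger draft", Confidentiality.INTERNAL))  # ['/pr/news.txt']
print(search_classified(index, "merger draft", Confidentiality.SECRET))
```

Filtering on the stored classification inside the search, rather than after results are returned, keeps restricted paths from ever leaving the index for an under-cleared query.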

20. A non-transitory computer-readable medium storing instructions, which when executed by at least one data processor, indexes and searches content of one or more data files, wherein the data files include keywords and metadata, comprising:
- identifying an offline copy of one or more production data files after the offline copy is stored in one or more secondary storage devices, wherein the offline copy contains one or more offline data files, wherein the offline copy is distinguishable from a source of the one or more production data files, and wherein each of the offline data files has keywords and metadata;
  
  restoring the identified offline copy to an intermediate server, wherein the intermediate server is different from the source of the one or more production data files and wherein the intermediate server has a higher availability than the secondary storage devices;
  
  creating an index of keywords from the restored offline copy on the intermediate server after the offline copy is identified, wherein the index contains classifications of the one or more offline data files having the keywords, wherein the classifications include a level of confidentiality for the one or more offline data files having the keywords, wherein the index is in an unencrypted form even when the offline copy is encrypted, and wherein the index of the keywords is created without consuming additional resources of a system that is the source of the one or more production data files; and
  
  selecting certain indexed keywords based on a received search query and the classifications contained within the index.

Current Assignee
CommVault Systems Incorporated
Original Assignee
CommVault Systems Incorporated
Inventors
Prahlad, Anand; Schwartz, Jeremy A.; Ngo, David; Brockway, Brian; Muller, Marcus S.; Gokhale, Parag; Kottomtharayil, Rajiv
Primary Examiner(s)
Ahn, Sangwoo

Application Number
US13/461,434
Publication Number
US 20120215745A1
Time in Patent Office
1,260 Days
Field of Search
None
US Class Current
1/1
CPC Class Codes

G06F 11/1446   Point-in-time backing up or...

G06F 16/2228   Indexing structures

G06F 16/2372   Updates performed during of...

G06F 16/319   Inverted lists

G06F 2201/84   Using snapshots, i.e. a log...
