METHOD AND SYSTEM FOR OFFLINE INDEXING OF CONTENT AND CLASSIFYING STORED DATA
Abstract
A method and system for creating an index of content without interfering with the source of the content includes an offline content indexing system that creates an index of content from an offline copy of data. The system may associate additional properties or tags with data that are not part of traditional indexing of content, such as the time the content was last available or user attributes associated with the content. Users can search the created index to locate content that is no longer available, or to locate content based on the associated attributes.
21 Claims
1. In a data management system residing within a private computer network, a method for indexing multiple content items that is performed by one or more computing systems, each computing system having a processor and memory, the method comprising:
selecting a secondary copy of the multiple content items from the private computer network, wherein the secondary copy of the multiple content items is a copy of the multiple content items that is not a primary copy of the multiple content items, wherein the primary copy is available from one or more live computers within the private computer network, wherein the secondary copy of the multiple content items is created at a first time, and wherein the secondary copy of the multiple content items is selected at a second time later than the first time;
identifying at least some of the multiple content items within the secondary copy; and
for each of the identified content items;
analyzing substantially all of the contents of an identified content item, by the one or more computing systems; and
creating and/or updating multiple index entries within a content index, by the one or more computing systems, wherein an index entry corresponds to an identified content item, wherein an index entry is created and/or updated based upon the analysis of substantially all of the contents of an identified content item, wherein the analyzing and creating and/or updating are performed without impacting the one or more live computers from which the primary copy of the multiple content items is available, and wherein the index entry is created and/or updated to include an availability criteria.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
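The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes the secondary copy is already restored into an in-memory mapping of item identifiers to text, that "analyzing substantially all of the contents" can be approximated by keyword extraction over the full text, and that the availability criteria is a plain string supplied by the caller. The names `IndexEntry` and `index_secondary_copy` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class IndexEntry:
    item_id: str
    keywords: set
    copy_created_at: datetime  # the "first time": when the secondary copy was made
    indexed_at: datetime       # the "second time": when the copy was selected and indexed
    availability: str          # assumed representation of the availability criteria


def index_secondary_copy(secondary_copy, copy_created_at, availability, index):
    """Create or update one index entry per item found in the secondary copy.

    Only the secondary copy is read, so the live computers that hold the
    primary copy are never contacted.
    """
    indexed_at = datetime.now(timezone.utc)
    for item_id, text in secondary_copy.items():
        # Analyze the full text of the item (assumed here: simple keyword extraction).
        words = (w.strip(".,;:!?").lower() for w in text.split())
        keywords = {w for w in words if w}
        # Create a new entry, or overwrite an existing entry for the same item.
        index[item_id] = IndexEntry(
            item_id, keywords, copy_created_at, indexed_at, availability
        )
    return index
```

Calling `index_secondary_copy` with a small mapping, the copy's creation time, and a label such as "offline-archive" fills the passed-in dictionary with one entry per item, each carrying both timestamps and the availability label.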
9. A computer system for indexing and searching multiple content items, the computer system comprising:
a processor;
a memory;
a secondary copy component configured to select a secondary copy of the multiple content items wherein the secondary copy of the multiple content items is a copy of the multiple content items that is not a primary copy of the multiple content items, wherein the primary copy is available from one or more computer systems, wherein the secondary copy of the multiple content items is created at a first time, and wherein the secondary copy component selects the secondary copy of the multiple content items at a second time later than the first time;
a content indexing component configured to, for at least some of the multiple content items included in the selected secondary copy;
analyze content of a content item, including analyzing a summary of the content item as well as analyzing additional content of the content item; and
based upon the analysis, create and/or update a content index; and
an index searching component configured to identify one or more indexed content items based on a received search query, wherein the content indexing component performs the analyzing and creating and/or updating without consuming resources of the one or more computing systems from which the primary copy of the multiple content items is available, and wherein the index searching component provides an availability criterion for at least some of the identified one or more indexed content items.
Dependent claims: 10, 11, 12, 13
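The index searching component of claim 9 might look roughly like the following Python sketch. It assumes the content index is a list of records that each hold a keyword set and an availability string, and that a query matches an item when every query term appears among that item's keywords. `IndexedItem` and `search_index` are hypothetical names, and the matching rule is an illustrative choice that the claim does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    item_id: str
    keywords: frozenset
    availability: str  # assumed representation of the availability criterion


def search_index(index, query):
    """Return (item_id, availability) pairs for items matching every query term.

    The search reads only the content index; neither the primary copy nor
    the computers that serve it are consulted.
    """
    terms = {t.lower() for t in query.split()}
    if not terms:
        return []
    hits = [
        (item.item_id, item.availability)
        for item in index
        if terms <= item.keywords  # all query terms must be present
    ]
    return sorted(hits)
```

Each result pairs the matching item with its availability, so a user can see whether an item is still online or exists only on archived media before requesting it.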
14. In a data management system residing within a private computer network, a method for indexing content that is performed by one or more computing systems, each computing system having a processor and memory, comprising:
selecting a secondary copy of the content from the private computer network, wherein the secondary copy of the content is a copy of the content that is not a primary copy of the content, wherein the primary copy is available from one or more live computers within the private computer network, wherein the secondary copy of the content is created at a first time, and wherein the secondary copy of the content is selected at a second time later than the first time;
identifying at least some of the content within the secondary copy;
analyzing attributes of the identified content, by the one or more computer systems;
based upon the analysis, classifying the identified content, by the one or more computer systems; and
updating a content index that includes the classifications of the identified content, by the one or more computing systems, wherein the analyzing, classifying, and updating are performed without impacting the one or more live computers that store the primary copy of the content, and wherein the context index includes an availability of at least the identified content.
Dependent claims: 15, 16, 17, 18, 19, 20, 21
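The classification steps of claim 14 can be sketched in Python as follows. The claim does not say which attributes are analyzed or which classes exist, so the attribute names (`extension`, `size_bytes`), the three classes, and the size threshold below are all assumptions made for illustration. `classify` and `update_content_index` are hypothetical names.

```python
def classify(attributes):
    """Assign a class from item attributes (illustrative rules only)."""
    if attributes.get("size_bytes", 0) > 10_000_000:
        return "large-file"
    if attributes.get("extension") in {".doc", ".docx", ".pdf"}:
        return "document"
    return "other"


def update_content_index(index, copy_attributes, availability):
    """Record a classification and an availability value for each item.

    `copy_attributes` maps item identifiers to attributes read from the
    secondary copy, so the live computers are not touched.
    """
    for item_id, attributes in copy_attributes.items():
        index[item_id] = {
            "classification": classify(attributes),
            "availability": availability,
        }
    return index
```

The index is keyed by item identifier, so running the update again against a newer secondary copy overwrites the stored classification and availability for any item seen before.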
Specification