Distributed indexing system for data storage
First Claim
1. A method of storing index information describing secondary copies of data, the method comprising:
receiving at a media agent executing in one or more computer processors, data copied during a first data storage operation, wherein the media agent is configured to convey data from a primary copy of data between a client computer and one or more first data storage devices associated with the media agent to create a secondary copy of the primary copy of data, the primary copy of data generated by one or more software applications running on the client computer and stored in a data store associated with the client computer that is separate from the one or more first data storage devices;
indexing with the media agent the secondary copy to determine content from the secondary copy, wherein indexing the secondary copy creates a first index of indexed data, wherein the first index is associated with a primary index server and wherein the indexed data comprises information about the content of the secondary copy and information about location of the secondary copy on the first data storage devices;
selecting at least a secondary index server among multiple available index servers based on a failover policy wherein the secondary index server is configured to store a second index that is created using the first index and is a replica of the first index, wherein the multiple index servers are networked together and collectively provide a distributed index;
sending from the media agent a reference to the indexed data associated with the secondary copy to the primary index server and to the secondary index server, wherein the primary index server and the secondary index server retrieve the indexed data from the media agent using the reference such that the second index remains a replica of the first index;
updating the distributed index using the indexed data retrieved from the media agent;
receiving an index update about migrated data associated with a migration of at least a portion of the secondary copy from the first data storage devices to at least a second data storage device based on one or more storage policies;
wherein the index update comprises information about the new location of the migrated data on the second storage device and information about the content of the migrated data;
determining that the primary index server is not available; and
updating the distributed index about the migrated data using the index update retrieved from the media agent via the secondary index server.
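The steps of claim 1 can be illustrated with a small in-memory sketch. This is only one possible reading of the claim language, not the patented implementation: the class names `MediaAgent` and `IndexServer`, the `publish` helper, the `ref:` reference format, and the sample copy and device names are all hypothetical. It shows the media agent handing out a reference, each index server pulling the indexed data with that reference, and a migration update still reaching the distributed index through the secondary server when the primary is unavailable.

```python
from dataclasses import dataclass, field


@dataclass
class MediaAgent:
    """Hypothetical media agent: keeps indexed data keyed by a reference string."""
    indexed: dict = field(default_factory=dict)

    def index_secondary_copy(self, copy_id, content, location):
        # Indexed data records both the content and the location of the copy.
        ref = f"ref:{copy_id}"
        self.indexed[ref] = {"copy_id": copy_id, "content": content, "location": location}
        return ref

    def fetch(self, ref):
        # Index servers call this to retrieve the indexed data behind a reference.
        return dict(self.indexed[ref])


@dataclass
class IndexServer:
    """Hypothetical index server holding one index of the distributed index."""
    name: str
    available: bool = True
    index: dict = field(default_factory=dict)

    def pull(self, agent, ref):
        # The server receives only a reference and retrieves the data itself.
        entry = agent.fetch(ref)
        self.index[entry["copy_id"]] = entry


def publish(agent, ref, primary, secondary):
    """Send the reference to both servers; unavailable servers are skipped.

    Returns the name of the first server that applied the update.
    """
    applied = []
    for server in (primary, secondary):
        if server.available:
            server.pull(agent, ref)
            applied.append(server.name)
    if not applied:
        raise RuntimeError("no index server available")
    return applied[0]


agent = MediaAgent()
primary, secondary = IndexServer("primary"), IndexServer("secondary")

# Initial storage operation: both servers pull the same indexed data.
ref = agent.index_secondary_copy("copy-1", ["report.doc"], "disk-A")
publish(agent, ref, primary, secondary)
replica_in_sync = primary.index == secondary.index

# Migration to a second storage device while the primary is down.
primary.available = False
ref = agent.index_secondary_copy("copy-1", ["report.doc"], "tape-B")
served_by = publish(agent, ref, primary, secondary)
```

In this sketch the primary keeps its stale entry until it comes back; how a recovered primary catches up is not covered by the claim text and is left out here.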
Abstract
A distributed indexing system spreads out the load on an index of stored data in a data storage system. Rather than maintain a single index, the distributed indexing system maintains an index in each media agent of a federated data storage system and a master index that points to the index in each media agent. In some embodiments, the distributed indexing system includes an index server (or group of servers) that handles indexing requests and forwards the requests to the appropriate distributed systems. Thus, the distributed indexing system, among other things, increases the availability and fault tolerance of a data storage index.
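The arrangement described in the abstract, a per-media-agent index plus a master index that points to it, might be sketched as follows. The `MasterIndex` class, its method names, and the sample agent and file names are assumptions made for illustration; the abstract does not specify any data layout.

```python
class MasterIndex:
    """Hypothetical master index: points each item at the media agent that indexed it."""

    def __init__(self):
        self.agents = {}    # agent name -> that agent's local index (item -> location)
        self.pointers = {}  # item -> agent name

    def register(self, agent_name):
        self.agents[agent_name] = {}

    def add(self, agent_name, item, location):
        # The detail lives in the media agent's index; the master keeps only a pointer.
        self.agents[agent_name][item] = location
        self.pointers[item] = agent_name

    def lookup(self, item):
        # Follow the pointer to the owning agent, then read that agent's index.
        agent_name = self.pointers.get(item)
        if agent_name is None:
            return None
        return agent_name, self.agents[agent_name][item]


master = MasterIndex()
master.register("media-agent-1")
master.register("media-agent-2")
master.add("media-agent-1", "mail.pst", "disk-A")
master.add("media-agent-2", "db.bak", "tape-B")
```

Because the master holds only pointers, each lookup touches a single media agent's index, which is one way the load could be spread as the abstract describes.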
15 Claims
11. A distributed index system for maintaining an index of data storage information related to secondary copies of data, the system comprising:
a memory;
one or more computer processors;
at least one media agent executing in the one or more computer processors that is configured to receive data copied during a first data storage operation, wherein the media agent is configured to convey data from a primary copy of data between a client computer and one or more first data storage devices associated with the media agent to create a secondary copy of the primary copy of data, the primary copy of data generated by one or more software applications running on the client computer and stored in a data store associated with the client computer and that is separate from the one or more first data storage devices, wherein the media agent is further configured to index the secondary copy to create index data that comprises information about content of the secondary copy and information about location of the secondary copy on the first storage devices;
multiple index server components configured to store index data for one or more index data sources, wherein each index server component is configured to:
receive a reference from a media agent to the index data of the secondary copy stored at the media agent,
retrieve the index data from the media agent using the reference received from the media agent, and
update an index associated with each index server with the index data retrieved from the media agent;
wherein the multiple index server components are networked together and collectively provide a distributed index;
a primary index server component from the index server components that has a first index created from the indexed data provided by the media agent;
a failover component that selects, based on a failover policy, at least a secondary index server component from the index server components, wherein the secondary index server component comprises a second index that is a replica of the first index created using the first index;
wherein the media agent generates an index update about migrated data associated with a migration of at least a portion of the secondary copy from the first data storage devices to at least a second data storage device based on one or more storage policies and wherein the index update comprises information about the new location of the migrated data on the second storage device and information about the content of the migrated data; and
wherein when the primary index server is not available, the distributed index is updated using the index update about the migrated data via the secondary index server.
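The failover component of claim 11 selects a secondary index server "based on a failover policy", but the claim does not say what that policy is. The sketch below assumes, purely for illustration, a least-loaded policy over hypothetical server records with `name`, `available`, and `load` fields; `least_loaded`, `select_secondary`, and `server_for_update` are invented names.

```python
def least_loaded(candidates):
    """Example policy (an assumption): prefer the candidate with the lowest load."""
    return min(candidates, key=lambda s: s["load"])


def select_secondary(servers, primary_name, policy=least_loaded):
    """Pick a secondary among the available servers other than the primary."""
    candidates = [
        s for s in servers
        if s["name"] != primary_name and s["available"]
    ]
    if not candidates:
        return None
    return policy(candidates)


def server_for_update(servers, primary_name, secondary):
    """Route an index update to the primary, or to the secondary if it is down."""
    primary = next(s for s in servers if s["name"] == primary_name)
    if primary["available"]:
        return primary
    return secondary


servers = [
    {"name": "idx-1", "available": True, "load": 0.7},
    {"name": "idx-2", "available": True, "load": 0.2},
    {"name": "idx-3", "available": False, "load": 0.0},
]

secondary = select_secondary(servers, "idx-1")

# Simulate the primary becoming unavailable before a migration update arrives.
servers[0]["available"] = False
target = server_for_update(servers, "idx-1", secondary)
```

Passing the policy as a function keeps the selection rule separate from the routing logic, so a different policy (for example, one based on network proximity) could be substituted without changing the rest of the sketch.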
Specification