Multi-site clustering

US 9,124,612 B2
Filed: 04/30/2014
Issued: 09/01/2015
Est. Priority Date: 05/15/2012
Status: Active Grant

Abstract

According to various embodiments, techniques are described for managing data within a multi-site clustered data intake and query system. A data intake and query system as described herein generally refers to a system for collecting, retrieving, and analyzing data. In this context, a clustered data intake and query system generally refers to a system environment that is configured to provide data redundancy and other features that improve the availability of data stored by the system. For example, a clustered data intake and query system may be configured to store multiple copies of data stored by the system across multiple components such that recovery from a failure of one or more of the components is possible by using copies of the data stored elsewhere in the cluster.

16 Claims

1. A method comprising:
- receiving data at a particular indexer belonging to a first user-specified grouping of indexers, the first user-specified grouping of indexers associated with a particular geographic location;
  
  storing, by the particular indexer, at least one grouped subset of the data in a data store accessible by the particular indexer;
  
  selecting, by a master node, a set of peer indexers based on both of (i) a user-specified indexer replication factor indicating a number of separate indexers at which the at least one grouped subset of the data is to be stored, and (ii) a separate user-specified site replication factor indicating a number of sites at which the at least one grouped subset of the data is to be stored, each site corresponding to a separate geographic location that is different from the first geographic location;
  
  generating, by the master node, replication instructions identifying the selected set of peer indexers;
  
  receiving, by the particular indexer, the data replication instructions identifying the selected peer indexers; and
  
  sending, by the particular indexer, the at least one grouped subset of the data to the peer indexers based on the data replication instructions.
- Dependent claims (2 to 8):
  - 2. The method of claim 1, wherein sending the at least one grouped subset of the data to the one or more peer indexers causes the one or more peer indexers to store the at least one grouped subset of the data in one or more separate data stores.
  - 3. The method of claim 1, further comprising separating the data into a plurality of events, and wherein the at least one grouped subset of the data includes one or more of the plurality of events.
  - 4. The method of claim 1, wherein the at least one grouped subset corresponds to a particular time span.
  - 5. The method of claim 1, further comprising sending to each peer indexer of the one or more peer indexers an indication of whether the peer indexer is to store a searchable or non-searchable copy of the data.
  - 6. The method of claim 1, wherein the replication instructions specify a number of peer indexers for replicating the data, and wherein the number of peer indexers corresponds to a user configured replication factor.
  - 7. The method of claim 1, further comprising receiving generation information indicating whether the indexer has primary responsibility for responding to queries for the at least one grouped subset of the data, and wherein the generation information is associated with a generation identifier.
  - 8. The method of claim 1, wherein the instructions include search affinity information indicating whether the indexer has primary responsibility for responding to queries originating from the first grouping of indexers for the at least one grouped subset of the data, and wherein the search affinity information is associated with a generation identifier.

9. One or more non-transitory computer-readable storage media, storing software instructions, which when executed by one or more processors cause performance of steps of:
- receiving data at a particular indexer belonging to a first user-specified grouping of indexers, the first user-specified grouping of indexers associated with a particular geographic location;
  
  storing, by the particular indexer, at least one grouped subset of the data in a data store accessible by the particular indexer;
  
  selecting, by a master node, a set of peer indexers based on both of (i) a user-specified indexer replication factor indicating a number of separate indexers at which the at least one grouped subset of the data is to be stored, and (ii) a separate user-specified site replication factor indicating a number of sites at which the at least one grouped subset of the data is to be stored, each site corresponding to a separate geographic location that is different from the first geographic location;
  
  generating, by the master node, replication instructions identifying the selected set of peer indexers;
  
  receiving, by the particular indexer, the data replication instructions identifying the selected peer indexers; and
  
  sending, by the particular indexer, the at least one grouped subset of the data to the peer indexers based on the data replication instructions.
- Dependent claims (10 to 16):
  - 10. The one or more non-transitory computer-readable storage media of claim 9, wherein sending the at least one grouped subset of the data to the one or more peer indexers causes the one or more peer indexers to store the at least one grouped subset of the data in one or more separate data stores.
  - 11. The one or more non-transitory computer-readable storage media of claim 9, wherein the instructions, when executed by the one or more computing devices, further cause performance of separating the data into a plurality of events, and wherein the at least one grouped subset of the data includes one or more of the plurality of events.
  - 12. The one or more non-transitory computer-readable storage media of claim 9, wherein the at least one grouped subset corresponds to a particular time span.
  - 13. The one or more non-transitory computer-readable storage media of claim 9, wherein the instructions, when executed by the one or more computing devices, further cause performance of sending an indication of whether the peer indexer is to store a searchable or non-searchable copy of the data.
  - 14. The one or more non-transitory computer-readable storage media of claim 9, wherein the replication instructions specify a number of peer indexers for replicating the data, and wherein the number of peer indexers corresponds to a user configured replication factor.
  - 15. The one or more non-transitory computer-readable storage media of claim 9, further comprising receiving generation information indicating whether the indexer has primary responsibility for responding to queries for the at least one grouped subset of the data, and wherein the generation information is associated with a generation identifier.
  - 16. The one or more non-transitory computer-readable storage media of claim 9, wherein the instructions include search affinity information indicating whether the indexer has primary responsibility for responding to queries originating from the first grouping of indexers for the at least one grouped subset of the data, and wherein the search affinity information is associated with a generation identifier.
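
The peer selection step recited in claims 1 and 9 can be illustrated with a minimal sketch. This is not the patented implementation: the names `Indexer` and `select_peers` are hypothetical, and the sketch assumes that the indexer replication factor counts all copies including the one held by the receiving indexer, and that the site replication factor counts remote sites only. The claims do not fix either interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indexer:
    """Hypothetical model of an indexer and the site (grouping) it belongs to."""
    name: str
    site: str


def select_peers(origin, indexers, replication_factor, site_replication_factor):
    """Pick peer indexers for one grouped subset of data.

    Assumptions (not taken from the claims): replication_factor counts total
    copies including the origin's own copy, and site_replication_factor counts
    remote sites that must each hold at least one copy.
    """
    peers_needed = replication_factor - 1
    candidates = [ix for ix in indexers if ix != origin]

    # Group candidate indexers located at sites other than the origin's site.
    remote_by_site = {}
    for ix in candidates:
        if ix.site != origin.site:
            remote_by_site.setdefault(ix.site, []).append(ix)

    if len(remote_by_site) < site_replication_factor:
        raise ValueError("not enough remote sites")
    if peers_needed < site_replication_factor:
        raise ValueError("replication factor too small to cover the requested sites")
    if len(candidates) < peers_needed:
        raise ValueError("not enough indexers")

    # Satisfy the site factor first: one peer from each chosen remote site.
    # Sites are sorted by name only to make the sketch deterministic.
    selected = []
    for site in sorted(remote_by_site)[:site_replication_factor]:
        selected.append(remote_by_site[site][0])

    # Fill any remaining slots with unused candidates, in input order.
    for ix in candidates:
        if len(selected) == peers_needed:
            break
        if ix not in selected:
            selected.append(ix)
    return selected


if __name__ == "__main__":
    cluster = [
        Indexer("a1", "sf"),
        Indexer("a2", "sf"),
        Indexer("b1", "ny"),
        Indexer("c1", "ldn"),
    ]
    peers = select_peers(cluster[0], cluster, replication_factor=3,
                         site_replication_factor=1)
    print([p.name for p in peers])  # prints ['c1', 'a2']
```

In this sketch the returned list plays the role of the replication instructions: the master node would send it to the receiving indexer, which then forwards the grouped subset of data to each listed peer.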

Current Assignee: Splunk Inc. (Cisco Systems, Inc.)
Original Assignee: Splunk Inc. (Cisco Systems, Inc.)
Inventors: Vasan, Sundar Rengarajan; Blank, Mitchell Neuman Jr.; Patel, Vishal; Xu, Da; Kerai, Jagannath
Primary Examiner(s): PHILLIPS, III, ALBERT M
Application Number: US14/266,817
Publication Number: US 20140236890A1
Time in Patent Office: 489 Days
US Class Current: 1/1
CPC Class Codes:
G06F 11/20   using active fault-masking,...
G06F 11/2094   Redundant storage or storag...
G06F 11/2097   maintaining the standby con...
G06F 16/27   Replication, distribution o...
H04L 67/1097   for distributed storage of ...
