System and method for self-healing a database server in a cluster

  • US 10,169,138 B2
  • Filed: 01/29/2016
  • Issued: 01/01/2019
  • Est. Priority Date: 09/22/2015
  • Status: Active Grant
First Claim

1. A system comprising:

  • a plurality of database servers, each database server in the plurality of database servers hosting shards of a database, each shard of the shards of the database having been split from a partition of the database and each partition of the database having been split from the database, each database server in the plurality of database servers having a unique identifier such that a status of each database server in the plurality of database servers can be accessed by other servers in the plurality of database servers, wherein each database server in the plurality of database servers is configured to:

    receive a triggering action comprising:

    receiving an indication that a minimum timer has expired; and

    receiving a pre-determined number of queries;

    detect a suspicious observation;

    discover that a particular server is underperforming;

    compile a plurality of statistics regarding itself, wherein the plurality of statistics is chosen from one of the following:

    memory usage, disk activity levels, CPU load, and error rates; and

    store the plurality of statistics in a data store accessible by:

    (1) each database server in the plurality of database servers; and

    (2) a load balancer; and

    the load balancer configured to:

    allocate queries among the plurality of database servers using load balancing techniques;

    determine when a condition has occurred by:

    accessing the plurality of statistics in the data store; and

    determining that a malfunctioning database server of the plurality of database servers is malfunctioning, comprising determining when one or more of the plurality of statistics stored in the data store by the malfunctioning database server does not meet performance thresholds;

    initiate an automatic self-corrective action in a database server in the plurality of database servers, the automatic self-corrective action comprising the database server taking itself out of a rotation for a predetermined amount of time configured to allow the database server to catch up; and

    perform a corrective action on the malfunctioning database server comprising:

    determining that the malfunctioning database server cannot correct itself;

    writing an entry in the data store indicating that the malfunctioning database server is not available;

    causing the malfunctioning database server to no longer receive instructions; and

    forwarding shard-level queries originally directed to the malfunctioning database server to one or more other database servers of the plurality of database servers.
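
The interaction recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The class names, threshold values, round-robin routing, and in-memory data store are all assumptions made for illustration, and the timed self-corrective step (a server taking itself out of rotation for a predetermined time) is omitted for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical limits; the claim only requires that statistics "meet performance thresholds".
THRESHOLDS = {"memory_usage": 0.90, "disk_activity": 0.85, "cpu_load": 0.95, "error_rate": 0.05}


@dataclass
class DataStore:
    """Shared store readable by every database server and by the load balancer."""
    stats: dict = field(default_factory=dict)      # server_id -> latest statistics
    unavailable: set = field(default_factory=set)  # server_ids marked not available


@dataclass
class DatabaseServer:
    server_id: str          # unique identifier, as recited in the claim
    store: DataStore
    min_queries: int = 100  # assumed "pre-determined number of queries"
    queries_seen: int = 0
    timer_expired: bool = False

    def triggered(self) -> bool:
        # Triggering action: the minimum timer has expired and enough queries were received.
        return self.timer_expired and self.queries_seen >= self.min_queries

    def report(self, stats: dict) -> bool:
        """Store self-compiled statistics in the shared data store once triggered."""
        if not self.triggered():
            return False
        self.store.stats[self.server_id] = dict(stats)
        self.queries_seen = 0
        self.timer_expired = False
        return True


class LoadBalancer:
    def __init__(self, store: DataStore, server_ids):
        self.store = store
        self.server_ids = list(server_ids)
        self._next = 0

    def is_malfunctioning(self, server_id: str) -> bool:
        # A server is flagged when any stored statistic exceeds its limit.
        stats = self.store.stats.get(server_id, {})
        return any(stats.get(name, 0.0) > limit for name, limit in THRESHOLDS.items())

    def sweep(self) -> None:
        """Write a 'not available' entry for every server that fails a threshold."""
        for server_id in self.server_ids:
            if self.is_malfunctioning(server_id):
                self.store.unavailable.add(server_id)

    def route(self) -> str:
        """Round-robin over available servers, so queries skip unavailable ones."""
        healthy = [s for s in self.server_ids if s not in self.store.unavailable]
        if not healthy:
            raise RuntimeError("no database server available")
        choice = healthy[self._next % len(healthy)]
        self._next += 1
        return choice
```

In this sketch, a server whose reported `cpu_load` exceeds the assumed limit is added to `unavailable` on the next `sweep()`, after which `route()` forwards every query to the remaining servers.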
