Fault tolerance for a distributed computing system

  • US 9,645,811 B2
  • Filed: 04/01/2014
  • Issued: 05/09/2017
  • Est. Priority Date: 04/01/2013
  • Status: Active Grant
First Claim

1. A method comprising:

    detecting a failure of a container, of a set of containers, in a controller node, the container executing a service being performed and isolated from at least one other service being performed in at least one other container on the controller node;

    terminating, by the controller node, the container executing the service;

    determining, by the controller node, a particular known state for the service, wherein the particular known state is known to be operational without including one or more changes that caused the failure, and wherein the service saves the changes to the particular known state during operation separately from the particular known state;

    restarting, by the controller node, the service in a new container that replaces the terminated container, wherein the restarted service starts from the particular known state without using the changes;

    wherein an orchestration service, configured to manage the set of containers, detects the failure;

    wherein the orchestration service detects the failure via monitoring a communication service in which a status of the service is input; and

    wherein the method is performed by at least one device including a hardware processor.
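
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names `StatusBus`, `ControllerNode`, `Container`, `apply_change`, and `orchestrate_once` are hypothetical, containers are modeled as plain in-memory objects, and the "communication service" is assumed to be a simple status table that services write to and the orchestration logic polls. The sketch also assumes that the separately saved changes are discarded on restart; the claim only requires that they not be used.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical stand-in for an isolated container running one service."""
    container_id: int
    service: str
    state: dict
    running: bool = True


class StatusBus:
    """Hypothetical communication service into which service status is input."""

    def __init__(self):
        self._status = {}

    def report(self, service, status):
        self._status[service] = status

    def failed_services(self):
        return [s for s, st in self._status.items() if st == "failed"]


class ControllerNode:
    """Hypothetical controller node that also hosts the orchestration logic."""

    def __init__(self, bus):
        self.bus = bus
        self.containers = {}       # service name -> current Container
        self.known_good = {}       # service name -> known operational state
        self.pending_changes = {}  # service name -> changes kept separately
        self._ids = itertools.count(1)

    def start_service(self, service, known_state):
        self.known_good[service] = dict(known_state)
        self.pending_changes[service] = []
        return self._launch(service)

    def _launch(self, service):
        # A new container always starts from a copy of the known state.
        container = Container(next(self._ids), service,
                              dict(self.known_good[service]))
        self.containers[service] = container
        self.bus.report(service, "ok")
        return container

    def apply_change(self, service, key, value):
        # The change is recorded separately; the known state is left untouched.
        self.pending_changes[service].append((key, value))
        self.containers[service].state[key] = value

    def orchestrate_once(self):
        """Detect failures via the status bus, then terminate and restart."""
        restarted = []
        for service in self.bus.failed_services():
            self.containers[service].running = False  # terminate the container
            self.pending_changes[service].clear()     # assumption: discard changes
            self._launch(service)                     # restart in a new container
            restarted.append(service)
        return restarted
```

A short usage example under the same assumptions: start a service from a known state, apply a change, report a failure on the status bus, and run one orchestration pass. The replacement container starts from the known state, without the change.

```python
bus = StatusBus()
node = ControllerNode(bus)
node.start_service("db", {"schema": 1})
node.apply_change("db", "schema", 2)
bus.report("db", "failed")
node.orchestrate_once()
print(node.containers["db"].state)
```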
