Mechanism for Recovery from Site Failure in a Stream Processing System
First Claim
1. A method for providing failure recovery in cooperative data stream processing, the method comprising:
- identifying a plurality of distributed sites, each site comprising one or more nodes and capable of independently hosting on the nodes applications associated with jobs derived from inquiries to process continuous dynamic streams of data; and
- using an inter-site back-up mechanism to provide failure recovery, the inter-site back-up mechanism comprising back-up sites selected from the identified plurality of sites.
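The two steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how back-up sites are chosen, only that they are drawn from the identified plurality of sites. The `Site` and `Node` classes, the `select_backup_sites` function, and the ring-based selection policy below are hypothetical assumptions made for illustration, not the claimed method itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str


@dataclass
class Site:
    """An independently administered site hosting stream applications on its nodes."""
    name: str
    nodes: list[Node]
    applications: dict[str, str] = field(default_factory=dict)  # app id -> node name


def select_backup_sites(sites: list[Site], k: int) -> dict[str, list[str]]:
    """Assign each site k back-up sites chosen from the other identified sites.

    Hypothetical policy: arrange the sites in a ring and back each site up with
    the next k sites, so no site is its own back-up and every site backs up
    exactly k others.
    """
    if not 0 < k < len(sites):
        raise ValueError("k must be between 1 and len(sites) - 1")
    names = [s.name for s in sites]
    n = len(names)
    return {names[i]: [names[(i + j) % n] for j in range(1, k + 1)] for i in range(n)}
```

With three sites A, B and C and `k=1`, each site is backed up by its successor in the ring (A by B, B by C, C by A). The ring is used here only because it keeps any site from backing itself up and spreads back-up duties evenly; a real deployment might instead weigh capacity, locality, or administrative agreements between sites.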
Abstract
A failure recovery framework is provided for cooperative data stream processing in large-scale stream data analysis environments. The framework supports a plurality of independent distributed sites, each with its own local administration and goals. The distributed sites cooperate in an inter-site back-up mechanism that allows the system to recover from a variety of failures. Through this cooperation among sites, failure recovery is both automatic and timely. Back-up sites associated with a given primary site are identified. These back-up sites are used to detect failures within the primary site, including failures of applications running on the primary site's nodes. The failed applications are reinstated on one or more nodes within the back-up sites, using job management instances local to the back-up sites in combination with previously stored state information and data values for the failed applications. In addition to the inter-site mechanism, each of the plurality of sites employs an intra-site back-up mechanism to handle failure recovery within the site.
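The recovery flow described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the abstract does not say how back-up sites detect failures, so heartbeat timeouts are assumed here, and the "previously stored state information and data values" are modeled as per-application checkpoints pushed to the back-up site. `JobManager`, `BackupSite`, their methods, and the round-robin node placement are all hypothetical. Timestamps are passed in explicitly to keep the example deterministic; a running system would read a clock such as `time.monotonic()`.

```python
from dataclasses import dataclass, field


@dataclass
class JobManager:
    """Hypothetical job management instance local to one site."""
    site: str
    nodes: list[str]
    running: dict[str, tuple[str, dict]] = field(default_factory=dict)  # app -> (node, state)
    _cursor: int = field(default=0, init=False)

    def place(self, app: str, state: dict) -> str:
        """Start (or restart) an app on one of this site's nodes, round-robin (assumed policy)."""
        if not self.nodes:
            raise RuntimeError(f"site {self.site} has no live nodes")
        node = self.nodes[self._cursor % len(self.nodes)]
        self._cursor += 1
        self.running[app] = (node, dict(state))
        return node

    def handle_node_failure(self, failed: str) -> dict[str, str]:
        """Intra-site back-up: restart the failed node's apps on the site's remaining nodes."""
        self.nodes.remove(failed)
        orphaned = {a: s for a, (n, s) in self.running.items() if n == failed}
        return {app: self.place(app, state) for app, state in orphaned.items()}


class BackupSite:
    """Inter-site back-up: monitors one primary site and reinstates its apps if it fails."""

    def __init__(self, job_manager: JobManager, primary: str, timeout: float):
        self.job_manager = job_manager  # the back-up site's own, local job manager
        self.primary = primary
        self.timeout = timeout  # assumed: seconds of silence before the primary is declared failed
        self.checkpoints: dict[str, dict] = {}  # app -> last stored state and data values
        self.last_heartbeat = 0.0

    def on_checkpoint(self, app: str, state: dict) -> None:
        self.checkpoints[app] = dict(state)

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def primary_failed(self, now: float) -> bool:
        return now - self.last_heartbeat > self.timeout

    def recover(self, now: float) -> dict[str, str]:
        """If the primary looks dead, restart every checkpointed app on this site's nodes."""
        if not self.primary_failed(now):
            return {}
        return {app: self.job_manager.place(app, state)
                for app, state in self.checkpoints.items()}
```

For example, a back-up site B that last heard from primary A at time 10, with a 5-second timeout, leaves A alone at time 14 but, at time 16, restarts every application it holds a checkpoint for on its own nodes. `handle_node_failure` covers the intra-site case, where a single node fails while the site's own job manager keeps running.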
20 Claims
1. A method for providing failure recovery in cooperative data stream processing, the method comprising:
identifying a plurality of distributed sites, each site comprising one or more nodes and capable of independently hosting on the nodes applications associated with jobs derived from inquiries to process continuous dynamic streams of data; and
using an inter-site back-up mechanism to provide failure recovery, the inter-site back-up mechanism comprising back-up sites selected from the identified plurality of sites.
Dependent claims: 2 through 18.
19. A computer-readable medium containing a computer-readable code that when read by a computer causes the computer to perform a method for providing failure recovery in cooperative data stream processing, the method comprising:
identifying a plurality of distributed sites, each site comprising one or more nodes and capable of independently hosting on the nodes applications associated with jobs derived from inquiries to process continuous dynamic streams of data; and
using an inter-site back-up mechanism to provide failure recovery, the inter-site back-up mechanism comprising back-up sites selected from the identified plurality of sites.
Dependent claim: 20.
Specification