Stream processing at scale
First Claim
1. A stream processing system, comprising:
- a cluster manager;
a cluster including a plurality of cluster nodes, wherein each cluster node includes computing resources and wherein the cluster nodes are managed by the cluster manager;
a service scheduler, wherein the service scheduler receives resource offers from the cluster manager representing computing resources available on one or more of the cluster nodes and determines resources to accept and computations to run on the accepted resources; and
a stream processor, wherein the stream processor includes one or more jobs, wherein each job includes two or more containers, including a first container and a second container, the first container including a topology master and the second container including a stream manager and one or more stream processing system (SPS) instances, wherein each SPS instance executes, on one or more of the cluster nodes, one or more tasks from a group of tasks associated with spouts and bolts,wherein each spout transfers tuples to one or more bolts and wherein each bolt performs a computation on the transferred tuples,wherein each SPS instance includes a gateway thread and a task execution thread, wherein each gateway thread controls communication and data movement in and out of the respective SPS instance,wherein the task execution thread performs a function on data in a data stream received by the gateway thread to arrive at a solution and transfers the solution through the gateway thread to the stream manager,wherein each thread executes as a task on one or more of the cluster nodes,wherein the gateway thread and the task execution thread use a queue to buffer metrics data transferred from the task execution thread to the gateway thread, andwherein the queue is reviewed periodically to determine if a size of the queue should be increased or decreased.
4 Assignments
0 Petitions
Accused Products
Abstract
A system and method for data stream processing. Two or more instances are connected as a topology, wherein at least one of the instances is a spout and at least one of the instances is a bolt. The topology is submitted to a scheduler, wherein the service scheduler receives resource offers from a cluster manager representing computing resources available on one or more of cluster nodes and determines resources to accept and computations to run on the accepted computing resources. The topology is scheduled as one or more jobs, wherein each job includes two or more containers, including a first container and a second container, the first container including a topology master and the second container including a stream manager and one or more stream processing system (SPS) instances, wherein each SPS instance represents one of the instances in the topology.
19 Citations
16 Claims
-
1. A stream processing system, comprising:
-
a cluster manager; a cluster including a plurality of cluster nodes, wherein each cluster node includes computing resources and wherein the cluster nodes are managed by the cluster manager; a service scheduler, wherein the service scheduler receives resource offers from the cluster manager representing computing resources available on one or more of the cluster nodes and determines resources to accept and computations to run on the accepted resources; and a stream processor, wherein the stream processor includes one or more jobs, wherein each job includes two or more containers, including a first container and a second container, the first container including a topology master and the second container including a stream manager and one or more stream processing system (SPS) instances, wherein each SPS instance executes, on one or more of the cluster nodes, one or more tasks from a group of tasks associated with spouts and bolts, wherein each spout transfers tuples to one or more bolts and wherein each bolt performs a computation on the transferred tuples, wherein each SPS instance includes a gateway thread and a task execution thread, wherein each gateway thread controls communication and data movement in and out of the respective SPS instance, wherein the task execution thread performs a function on data in a data stream received by the gateway thread to arrive at a solution and transfers the solution through the gateway thread to the stream manager, wherein each thread executes as a task on one or more of the cluster nodes, wherein the gateway thread and the task execution thread use a queue to buffer metrics data transferred from the task execution thread to the gateway thread, and wherein the queue is reviewed periodically to determine if a size of the queue should be increased or decreased. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A stream processing system, comprising:
-
a cluster manager; a cluster including a plurality of cluster nodes, wherein each cluster node includes computing resources and wherein the cluster nodes are managed by the cluster manager; a service scheduler, wherein the service scheduler receives resource offers from the cluster manager representing computing resources available on one or more of the cluster nodes and determines resources to accept and computations to run on the accepted resources; and a stream processor, wherein the stream processor includes one or more jobs, wherein each job includes two or more containers, including a first container and a second container, the first container including a topology master and the second container including a stream manager and one or more stream processing system (SPS) instances, wherein each SPS instance executes, on one or more of the cluster nodes, one or more tasks from a group of tasks associated with spouts and bolts, wherein each spout transfers tuples to one or more bolts and wherein each bolt performs a computation on the transferred tuples, wherein the stream processing system further comprises a distributed coordination service and a standby topology master, wherein the distributed coordination service stores state information regarding the topology master, wherein the standby topology master replaces the topology master based on the state information. - View Dependent Claims (10)
-
-
11. A stream processing system, comprising:
-
a cluster manager; a cluster including a plurality of cluster nodes, wherein each cluster node includes computing resources and wherein the cluster nodes are managed by the cluster manager; a service scheduler, wherein the service scheduler receives resource offers from the cluster manager representing computing resources available on one or more of the cluster nodes and determines resources to accept and computations to run on the accepted resources; and a stream processor, wherein the stream processor includes one or more jobs, wherein each job includes two or more containers, including a first container and a second container, the first container including a topology master and the second container including a stream manager and one or more stream processing system (SPS) instances, wherein each SPS instance executes, on one or more of the cluster nodes, one or more tasks from a group of tasks associated with spouts and bolts, wherein each spout transfers tuples to one or more bolts and wherein each bolt performs a computation on the transferred tuples, wherein the stream processing system further comprises a distributed coordination service and a standby topology master, wherein the distributed coordination service stores state information regarding the topology master, wherein, when the topology master fails, the standby topology master replaces the topology master based on the state information, the first container restarts and wherein the topology master in the first container becomes a standby topology master. - View Dependent Claims (12)
receive metrics information from the stream manager and from one or more of the SPS instances; and transfer the metrics information to a monitoring system.
-
-
13. A stream processing system, comprising:
-
a cluster manager; a cluster including a plurality of cluster nodes, wherein each cluster node includes computing resources and wherein the cluster nodes are managed by the cluster manager; a service scheduler, wherein the service scheduler receives resource offers from the cluster manager representing computing resources available on one or more of the cluster nodes and determines resources to accept and computations to run on the accepted resources; and a stream processor, wherein the stream processor includes one or more jobs, wherein each job includes two or more containers, including a first container and a second container, the first container including a topology master and the second container including a stream manager and one or more stream processing system (SPS) instances, wherein each SPS instance executes, on one or more of the cluster nodes, one or more tasks from a group of tasks associated with spouts and bolts, wherein each spout transfers tuples to one or more bolts and wherein each bolt performs a computation on the transferred tuples, wherein, when the stream manager in the second container fails, the stream manager is restarted in the second container, the stream manager rediscovers the topology master, and the stream manager retrieves a copy of a physical plan from the topology master. - View Dependent Claims (14)
-
-
15. A stream processing system, comprising:
-
a cluster manager; a cluster including a plurality of cluster nodes, wherein each cluster node includes computing resources and wherein the cluster nodes are managed by the cluster manager; a service scheduler, wherein the service scheduler receives resource offers from the cluster manager representing computing resources available on one or more of the cluster nodes and determines resources to accept and computations to run on the accepted resources; and a stream processor, wherein the stream processor includes one or more jobs, wherein each job includes two or more containers, including a first container and a second container, the first container including a topology master and the second container including a stream manager and one or more stream processing system (SPS) instances, wherein each SPS instance executes, on one or more of the cluster nodes, one or more tasks from a group of tasks associated with spouts and bolts, wherein each spout transfers tuples to one or more bolts and wherein each bolt performs a computation on the transferred tuples, wherein, when one of the SPS instances in the second container fails, the failed SPS instance is restarted in the second container and the restarted SPS instance contacts the stream manager in the second container, retrieves a copy of a physical plan, and executes user code corresponding to the physical plan. - View Dependent Claims (16)
-
Specification