System and method for data stream processing
First Claim
Patent Images
1. A method for processing a data stream using a map-reduce process, the method comprising:
- executing, until the occurrence of a cut condition defining a partition of the data stream into portions including a first portion, a map function from a set of query processing steps, the map function mapping data into groups to be processed in parallel to generate map results for each group of data in the first portion of the data stream;
executing a reduce function from the set of query processing steps, the reduce function to aggregate the map results to generate history-sensitive data for the first portion; and
rewinding the set of query processing steps, without a termination of processing for a second execution of the map function and reduce function on a second portion of the data stream.
8 Assignments
0 Petitions
Accused Products
Abstract
A method and system for processing a data stream are described. The method executes, until the occurrence of a cut condition, a map function from a set of query processing steps to generate map results for a first portion of the data stream, executes a reduce function from the set of query processing steps to generate history-sensitive data from the map results, and rewinds the set of query processing steps, without termination of processing. The history-sensitive data is maintained for a second execution of the map function and reduce function on a second portion of the data stream.
132 Citations
25 Claims
-
1. A method for processing a data stream using a map-reduce process, the method comprising:
-
executing, until the occurrence of a cut condition defining a partition of the data stream into portions including a first portion, a map function from a set of query processing steps, the map function mapping data into groups to be processed in parallel to generate map results for each group of data in the first portion of the data stream; executing a reduce function from the set of query processing steps, the reduce function to aggregate the map results to generate history-sensitive data for the first portion; and rewinding the set of query processing steps, without a termination of processing for a second execution of the map function and reduce function on a second portion of the data stream. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A system for processing a data stream comprising:
-
a first plurality of processing nodes for receiving a portion of the data stream, for processing a first part of said received portion using a cut condition to signal a halt in processing, and for forwarding processing results after halts in processing have been signaled at all of the first plurality of nodes; and a second plurality of processing nodes for receiving the forwarded processing results, wherein at least one of the second plurality of processing nodes generates an output, and processing is rewound at the first and second pluralities of nodes, without terminating processing, to maintain the output for processing a next part. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18)
-
-
19. A non-transitory machine-readable medium having stored thereon instructions that if executed by a machine, result in:
-
executing, until the occurrence of a cut condition defining a partition of the data stream into portions including a first portion, a map function from a set of query processing steps, the map function mapping data into groups to be processed in parallel to generate map results for each group of data in the first portion of the data stream; executing a reduce function from the set of query processing steps, the reduce function to aggregate the map results to generate history-sensitive data for the first portion; and rewinding the set of query processing steps, without a termination of processing, for a second execution of the map function and reduce function on a second portion of the data stream using the history-sensitive data. - View Dependent Claims (20)
-
-
21. A method for processing a data stream comprising:
-
extending a query engine to include a rewind function for continuously cycling through a set of set of query processing steps without terminating the cycle; associating, with a query compiler, program code for a stream source function; associating, with a query compiler, program code for a user-defined buffer function; receiving a query containing expressions for the program codes for the stream source function and the user-defined buffer function; processing the expressions to create a set of executable query processing steps comprising an executable stream source function and an executable user-defined buffer function; and executing the executable query processing steps to process a data stream. - View Dependent Claims (22)
-
-
23. A method for processing a data stream comprising:
-
distributing, to a first node of a first plurality of nodes, an executable map function operator for mapping a data stream portion into groups to be processed in parallel and an executable stream source function operator; distributing, to a second node of a second plurality of nodes, an executable reduce function operator for aggregating the map results of groups in the data stream portion, capable of accessing a storage element; and processing a data stream according to a cycle, wherein; the first node forwards results from an execution of the map function operator to the second node after the stream source operator at the first node signals an occurrence of a cut condition defining a partition of the data stream into a portion, the executable reduce function operator updates the storage element, and the processing of the first and second pluralities of nodes is rewound while maintaining the updated storage element. - View Dependent Claims (24, 25)
-
Specification