Data flow windowing and triggering
First Claim
1. A method comprising:
receiving, at data processing hardware, data corresponding to one of streaming data or batch data;
determining, by the data processing hardware, an event time of the data for slicing the data;
grouping, by the data processing hardware, a first subset of the data into a first window, the first window defining a first sub-event time;
aggregating, by the data processing hardware, a first aggregated result processed from the first subset of the data for the first window;
determining, by the data processing hardware, a first trigger time to:
emit the first aggregated result; and
maintain the first aggregated result in a persistent state;
when a next aggregated result of a second subset of the data associated with the first window emits after emitting the first aggregated result:
emitting a retraction of the first aggregated result from the persistent state; and
emitting a combined session result for the first window, the combined session result comprising a sum of the first aggregated result and the next aggregated result; and
when the received data corresponds to streaming data:
setting, by the data processing hardware, an input timestamp on an element of the streaming data;
when the input timestamp on the element occurs earlier than a watermark, determining, by the data processing hardware, the streaming data comprises late streaming data; and
one of:
dropping the late streaming data; or
allowing the late streaming data by creating a duplicate window in an output for the late streaming data.
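The windowing, trigger, and retraction steps recited above can be illustrated with a small sketch. It is not the patented implementation: the class names WindowedSum and WindowState, the method names add and fire, the use of fixed-size windows, and integer timestamps are all assumptions made for illustration. It shows one way a window's first aggregate could be emitted and kept in state, then retracted and replaced by the combined sum when more data for the same window is aggregated.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Emission = Tuple[str, int, int]  # (kind, window_start, value)


@dataclass
class WindowState:
    emitted: Optional[int] = None  # last emitted aggregate, kept as persistent state
    pending: int = 0               # values aggregated since the last trigger firing
    dirty: bool = False            # True when there is something new to emit


class WindowedSum:
    """Hypothetical fixed event-time windows with retraction on refinement."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.state: Dict[int, WindowState] = {}

    def add(self, event_time: int, value: int) -> int:
        """Group an element into its window by event time; return the window start."""
        start = event_time - event_time % self.window_size
        st = self.state.setdefault(start, WindowState())
        st.pending += value
        st.dirty = True
        return start

    def fire(self, start: int) -> List[Emission]:
        """Simulate a trigger firing for the window that begins at `start`."""
        st = self.state.get(start)
        if st is None or not st.dirty:
            return []
        out: List[Emission] = []
        if st.emitted is None:
            combined = st.pending
        else:
            # A result was already emitted: retract it, then emit the combined sum.
            out.append(("retract", start, st.emitted))
            combined = st.emitted + st.pending
        out.append(("result", start, combined))
        st.emitted, st.pending, st.dirty = combined, 0, False
        return out


ops = WindowedSum(window_size=60)
ops.add(event_time=5, value=3)
ops.add(event_time=42, value=4)
first = ops.fire(0)               # first trigger firing for window [0, 60)
ops.add(event_time=17, value=10)  # a second subset of data for the same window
second = ops.fire(0)              # retraction followed by the combined result
```

In this sketch the first firing yields a single result of 7, and the second firing yields a retraction of 7 followed by a combined result of 17. A downstream consumer that sums results and subtracts retractions would therefore see the correct running total.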
Abstract
A method includes receiving data corresponding to one of streaming data or batch data and a content of the received data for computation. The method also includes determining an event time of the data for slicing the data, determining a processing time to output results of the received data, and emitting at least a portion of the results of the received data based on the processing time and the event time.
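The distinction the abstract draws between event time (used to slice the data) and processing time (used to decide when to output results) can be sketched as follows. This is only an illustration under stated assumptions: PeriodicEmitter, on_element, on_processing_time, the fixed window size, and the fixed output period are hypothetical names and choices, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class PeriodicEmitter:
    """Hypothetical: slice by event time, emit partial sums on a processing-time period."""

    def __init__(self, window_size: int, period: int) -> None:
        self.window_size = window_size
        self.period = period
        self.next_fire = period                # next processing time at which to emit
        self.sums: Dict[int, int] = defaultdict(int)
        self.changed: Set[int] = set()         # windows updated since the last emission

    def on_element(self, event_time: int, value: int) -> None:
        """Assign the element to a window using its event time."""
        start = event_time - event_time % self.window_size
        self.sums[start] += value
        self.changed.add(start)

    def on_processing_time(self, now: int) -> List[Tuple[int, int]]:
        """Emit current sums of changed windows once the processing-time period elapses."""
        if now < self.next_fire:
            return []
        # Schedule the next firing at the next period boundary after `now`.
        self.next_fire = now - now % self.period + self.period
        out = [(start, self.sums[start]) for start in sorted(self.changed)]
        self.changed.clear()
        return out


e = PeriodicEmitter(window_size=60, period=10)
e.on_element(event_time=5, value=1)
e.on_element(event_time=65, value=2)
early = e.on_processing_time(3)    # period not yet elapsed, nothing emitted
due = e.on_processing_time(10)     # emits both windows touched so far
e.on_element(event_time=30, value=5)
later = e.on_processing_time(25)   # emits only the window that changed
```

Here the window an element belongs to depends only on its event time, while the moment a (possibly partial) result is output depends only on processing time.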
22 Claims
1. A method comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system comprising:
data processing hardware; and
memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising:
receiving data corresponding to one of streaming data or batch data;
determining an event time of the data for slicing the data;
grouping a first subset of the data into a first window, the first window defining a first sub-event time;
aggregating a first aggregated result processed from the first subset of the data for the first window; and
determining a first trigger time to:
emit the first aggregated result; and
maintain the first aggregated result in a persistent state; and
when a next aggregated result of a second subset of the data associated with the first window emits after emitting the first aggregated result:
emitting a retraction of the first aggregated result from the persistent state; and
emitting a combined session result for the first window, the combined session result comprising a sum of the first aggregated result and the next aggregated result; and
when the received data corresponds to streaming data:
setting an input timestamp on an element of the streaming data;
when the input timestamp on the element occurs earlier than a watermark, determining the streaming data comprises late streaming data; and
one of:
dropping the late streaming data; or
allowing the late streaming data by creating a duplicate window in an output for the late streaming data.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
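The late-data branch recited in the claims (compare the input timestamp with a watermark, then either drop the element or allow it) might look like the sketch below. Several things here are assumptions for illustration only: the function name sum_with_lateness, a single static watermark value, fixed-size windows, and the reading of "duplicate window in an output" as a separate output entry tagged "late" for the same window bounds.

```python
from typing import Dict, Iterable, Tuple

PaneKey = Tuple[int, str]  # (window_start, "on_time" or "late")


def sum_with_lateness(
    elements: Iterable[Tuple[int, int]],
    watermark: int,
    window_size: int,
    allow_late: bool,
) -> Tuple[Dict[PaneKey, int], int]:
    """Sum (timestamp, value) pairs per window; return (output, dropped_count).

    An element whose timestamp is earlier than the watermark is treated as late.
    Late elements are dropped, or, if allowed, summed into a separate "late"
    output entry (an assumed stand-in for the claimed duplicate window).
    """
    output: Dict[PaneKey, int] = {}
    dropped = 0
    for timestamp, value in elements:
        start = timestamp - timestamp % window_size
        if timestamp < watermark:
            if not allow_late:
                dropped += 1
                continue
            key = (start, "late")
        else:
            key = (start, "on_time")
        output[key] = output.get(key, 0) + value
    return output, dropped


stream = [(70, 1), (12, 5), (95, 2)]  # the element at timestamp 12 is behind the watermark
strict, strict_dropped = sum_with_lateness(stream, watermark=60, window_size=60, allow_late=False)
lenient, lenient_dropped = sum_with_lateness(stream, watermark=60, window_size=60, allow_late=True)
```

With the watermark at 60, the strict run drops the element at timestamp 12, while the lenient run keeps it in its own "late" entry for window 0. A real streaming system would advance the watermark over time rather than hold it fixed as this sketch does.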
Specification