Method and apparatus for adaptive load shedding
First Claim
1. A method for processing data streams, the method comprising:
- receiving at least a first data stream into at least a first sliding window of memory;
selecting tuples from said at least said first data stream for processing in accordance with at least one data stream operation, where said tuples that are selected represent a subset of all tuples contained within said at least said first sliding window, wherein said selecting tuples from said at least said first data stream comprises;
determining a total number of tuples to be selected for processing, wherein said determining adapts to a rate at which said at least said first data stream is received, wherein said determining said total number of tuples to be selected for processing comprises;
calculating a fraction of said at least said first sliding window, said fraction indicating how much of said at least said first sliding window can be selected for processing, wherein said calculating comprises;
counting a first number of tuples, said first number of tuples representing a number of tuples selected from said at least said first sliding window for processing in a period of time, counting a second number of tuples, said second number of tuples representing a number of tuples received by said at least said first sliding window in said period of time, and basing said fraction, at least in part, on a ratio of said first number of tuples to said second number of tuples; and
selecting specific tuples for processing in accordance with said total number of tuples, wherein said selecting specific tuples adapts to a time correlation between tuples from said at least said first data stream and tuples from at least a second data stream, wherein said selecting specific tuples comprises;
partitioning said at least said first sliding window into a first plurality of sub-windows;
partitioning at least a second sliding window for receiving said at least said second data stream into at least a second plurality of sub-windows;
sorting said first plurality of sub-windows into a first prioritized array of sub-windows, wherein said tuples from said first data stream are sorted within said first prioritized array of sub-windows in a descending order based on a number of output tuples that each of said tuples from said first data stream is expected to produce when compared to a tuple from among said tuples from said at least said second data stream;
sorting said at least said second plurality of sub-windows into a second prioritized array of sub-windows, wherein said tuples from said second data stream are sorted within said second prioritized array of sub-windows in a descending order based on a number of output tuples that each of said tuples from said second data stream is expected to produce when compared to a tuple from among said tuples from said first data stream; and
joining, using a processor, at least one tuple from said first sliding window with at least one tuple from said second sliding window in accordance with said first prioritized array of sub-windows and said second prioritized array of sub-windows; and
ignoring tuples from said at least said first data stream that are not selected for processing.
1 Assignment
0 Petitions
Accused Products
Abstract
One embodiment of the present method and apparatus adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream and the values of the individual tuples with respect to an expected output.
-
Citations
1 Claim
-
1. A method for processing data streams, the method comprising:
-
receiving at least a first data stream into at least a first sliding window of memory; selecting tuples from said at least said first data stream for processing in accordance with at least one data stream operation, where said tuples that are selected represent a subset of all tuples contained within said at least said first sliding window, wherein said selecting tuples from said at least said first data stream comprises; determining a total number of tuples to be selected for processing, wherein said determining adapts to a rate at which said at least said first data stream is received, wherein said determining said total number of tuples to be selected for processing comprises; calculating a fraction of said at least said first sliding window, said fraction indicating how much of said at least said first sliding window can be selected for processing, wherein said calculating comprises;
counting a first number of tuples, said first number of tuples representing a number of tuples selected from said at least said first sliding window for processing in a period of time, counting a second number of tuples, said second number of tuples representing a number of tuples received by said at least said first sliding window in said period of time, and basing said fraction, at least in part, on a ratio of said first number of tuples to said second number of tuples; andselecting specific tuples for processing in accordance with said total number of tuples, wherein said selecting specific tuples adapts to a time correlation between tuples from said at least said first data stream and tuples from at least a second data stream, wherein said selecting specific tuples comprises; partitioning said at least said first sliding window into a first plurality of sub-windows; partitioning at least a second sliding window for receiving said at least said second data stream into at least a second plurality of sub-windows; sorting said first plurality of sub-windows into a first prioritized array of sub-windows, wherein said tuples from said first data stream are sorted within said first prioritized array of sub-windows in a descending order based on a number of output tuples that each of said tuples from said first data stream is expected to produce when compared to a tuple from among said tuples from said at least said second data stream; sorting said at least said second plurality of sub-windows into a second prioritized array of sub-windows, wherein said tuples from said second data stream are sorted within said second prioritized array of sub-windows in a descending order based on a number of output tuples that each of said tuples from said second data stream is expected to produce when compared to a tuple from among said tuples from said first data stream; and joining, using a processor, at least one tuple from said first sliding window with at least one tuple from said second sliding window in accordance with said first prioritized array of sub-windows and said second prioritized array of sub-windows; and ignoring tuples from said at least said first data stream that are not selected for processing.
-
Specification