System and method for load shedding in data mining and knowledge discovery from stream data
First Claim
1. A method of providing load shedding in mining data streams, said method comprising the steps of:
accepting streams of data to be mined, the streams of data containing data stream elements;
ranking the importance of data stream elements;
investigating data stream elements of higher importance; and
thereafter shedding a plurality of data stream elements;
wherein the plurality of data stream elements shed have a higher quality of decision value than the data stream elements of higher importance and the quality of decision value is based on the predicted distribution of feature values in a next time unit; and
wherein the quality of decision value is
Abstract
Load shedding schemes for mining data streams are presented. A scoring function is used to rank the importance of stream elements, and those elements with high importance are investigated. Where the exact feature values of a data stream are not known, the use of a Markov model is proposed herein for predicting the feature distribution of the data stream. Based on the predicted feature distribution, classification decisions can be made to maximize the expected benefits. In addition, a quality of decision (QoD) metric is proposed herein to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme is able to learn and adapt to changing data characteristics in the data streams.
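The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the record does not reproduce the QoD formula, so the sketch assumes a stand-in metric (the gap between the best and second-best expected benefit). The function names, the two-state feature, the transition matrix and the benefit table are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def predict_next_distribution(current: Sequence[float],
                              transition: Sequence[Sequence[float]]) -> List[float]:
    """One step of a discrete Markov chain: next[j] = sum_i current[i] * T[i][j]."""
    n_states = len(transition[0])
    return [sum(current[i] * transition[i][j] for i in range(len(current)))
            for j in range(n_states)]


def expected_benefits(feature_dist: Sequence[float],
                      benefit: Sequence[Sequence[float]]) -> List[float]:
    """benefit[d][s] is the benefit of decision d when the feature is in state s."""
    return [sum(p * b for p, b in zip(feature_dist, row)) for row in benefit]


def quality_of_decision(feature_dist: Sequence[float],
                        benefit: Sequence[Sequence[float]]) -> float:
    """ASSUMED stand-in for QoD: margin between the two best expected benefits.

    A large margin means the decision is already fairly certain without
    observing the element; a small margin means it is uncertain.
    """
    scores = sorted(expected_benefits(feature_dist, benefit), reverse=True)
    return scores[0] - scores[1]


def split_streams(predicted: Dict[str, Sequence[float]],
                  benefit: Sequence[Sequence[float]],
                  capacity: int) -> Tuple[List[str], List[str]]:
    """Investigate the `capacity` streams with the lowest QoD; shed the rest."""
    ranked = sorted(predicted,
                    key=lambda sid: quality_of_decision(predicted[sid], benefit))
    return ranked[:capacity], ranked[capacity:]


if __name__ == "__main__":
    # Hypothetical two-state feature (0 = "low", 1 = "high").
    transition = [[0.9, 0.1],
                  [0.2, 0.8]]
    # Hypothetical benefit table: decision 0 is right in state 0, decision 1 in state 1.
    benefit = [[1.0, 0.0],
               [0.0, 1.0]]
    # Last known feature distribution per stream.
    last_known = {"A": [1.0, 0.0], "B": [0.5, 0.5], "C": [0.0, 1.0]}

    predicted = {sid: predict_next_distribution(dist, transition)
                 for sid, dist in last_known.items()}
    observe, shed = split_streams(predicted, benefit, capacity=1)
    print("investigate:", observe)
    print("shed:", shed)
```

Under these assumptions, the stream whose predicted distribution leaves the decision most uncertain receives the available capacity, and the streams whose decisions are already well separated are shed, which matches the claim language that shed elements have a higher quality of decision value than the elements investigated.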
8 Claims
Dependent claims: 2, 3, 4, 5, 6, 7, 8
Specification