SYSTEM AND METHOD FOR LOAD SHEDDING IN DATA MINING AND KNOWLEDGE DISCOVERY FROM STREAM DATA
Abstract
Load shedding schemes for mining data streams. A scoring function ranks the importance of stream elements, and the elements with high importance are investigated. Because the exact feature values of a data stream are not always known, a Markov model is proposed herein for predicting the feature distribution of a data stream. Based on the predicted feature distribution, classification decisions can be made that maximize the expected benefits. In addition, a quality of decision (QoD) metric is proposed herein to measure the level of uncertainty in decisions and to guide load shedding. A load shedding scheme such as presented herein assigns available resources to multiple data streams to maximize the quality of classification decisions. Furthermore, such a load shedding scheme can learn and adapt to changing data characteristics in the data streams.
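The abstract names three pieces (a Markov model that predicts the feature distribution, classification by maximum expected benefit, and a QoD metric) without giving formulas in this section. The following Python sketch shows one way they could fit together. It assumes a single discretized feature, a first-order Markov chain with a known transition matrix, a per-state class posterior from some classifier, and a benefit matrix. All function names are hypothetical, and defining QoD as the margin between the best and second-best expected benefit is an illustrative assumption, not the patented formula.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def markov_step(dist: Sequence[float], transition: Matrix) -> List[float]:
    """One step of a first-order Markov chain: p_next[j] = sum_i p[i] * P[i][j]."""
    n_states = len(transition[0])
    return [sum(dist[i] * transition[i][j] for i in range(len(dist)))
            for j in range(n_states)]


def predict_distribution(dist: Sequence[float], transition: Matrix,
                         steps: int = 1) -> List[float]:
    """Predict the feature-state distribution `steps` time units ahead.

    A stream last observed k time units ago (because it was shed in
    between) can be predicted with steps=k.
    """
    current = list(dist)
    for _ in range(steps):
        current = markov_step(current, transition)
    return current


def expected_benefits(state_dist: Sequence[float], class_given_state: Matrix,
                      benefit: Matrix) -> List[float]:
    """Expected benefit of each possible class decision.

    class_given_state[s][c]: P(class c | feature state s), from any classifier.
    benefit[k][c]: benefit of predicting class k when the true class is c.
    """
    n_classes = len(class_given_state[0])
    # P(c) = sum_s P(s) * P(c | s)
    p_class = [sum(state_dist[s] * class_given_state[s][c]
                   for s in range(len(state_dist)))
               for c in range(n_classes)]
    # E[benefit | predict k] = sum_c benefit[k][c] * P(c)
    return [sum(benefit[k][c] * p_class[c] for c in range(n_classes))
            for k in range(len(benefit))]


def decide(exp_benefits: Sequence[float]) -> int:
    """Index of the class decision with the highest expected benefit."""
    return max(range(len(exp_benefits)), key=lambda k: exp_benefits[k])


def quality_of_decision(exp_benefits: Sequence[float]) -> float:
    """Hypothetical QoD: margin between the best and second-best expected benefit.

    A small margin means the decision is uncertain.
    """
    if len(exp_benefits) < 2:
        raise ValueError("need at least two candidate decisions")
    best, second = sorted(exp_benefits, reverse=True)[:2]
    return best - second
```

For example, with transition matrix `[[0.9, 0.1], [0.3, 0.7]]` and a stream last seen in state 0, the predicted distribution is `[0.9, 0.1]` one step ahead and approximately `[0.84, 0.16]` two steps ahead. With class posteriors `[[0.95, 0.05], [0.2, 0.8]]` and benefit matrix `[[1, 0], [0, 5]]`, the one-step expected benefits are approximately `[0.875, 0.625]`, so class 0 is chosen with a QoD margin of about 0.25.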
19 Claims
1. A method of providing load shedding in mining data streams, said method comprising the steps of:
accepting streams of data to be mined, the streams of data containing data stream elements;
ranking the importance of data stream elements;
investigating data stream elements of higher importance; and
thereafter shedding a plurality of data stream elements;
wherein the plurality of data stream elements shed have a higher quality of decision value than the data stream elements of higher importance and the quality of decision value is based on the predicted distribution of feature values in a next time unit.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 19.
9. An apparatus for providing load shedding in mining data streams, said apparatus comprising:
an arrangement for accepting streams of data to be mined, the streams of data containing data stream elements;
an arrangement for ranking the importance of data stream elements;
an arrangement for investigating data stream elements of higher importance; and
an arrangement for shedding a plurality of data stream elements;
wherein the plurality of data stream elements shed have a higher quality of decision value than the data stream elements of higher importance and the quality of decision value is based on the predicted distribution of feature values in a next time unit;
a processor to rank the data stream elements.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 18.
17. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for:
accepting data streams to be mined, the data streams containing data stream elements;
ranking the importance of data stream elements;
investigating data stream elements of higher importance; and
thereafter shedding a plurality of data stream elements;
wherein the plurality of data stream elements shed have a higher quality of decision value than the data stream elements of higher importance and the quality of decision value is based on the predicted distribution of feature values in a next time unit.
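Claims 1, 9 and 17 recite the same sequence: rank elements by importance, investigate the more important ones, and shed elements whose quality of decision value is higher than that of the investigated ones. A minimal Python sketch of that selection step follows. It assumes QoD values have already been computed per stream from the predicted next-time-unit distribution, that lower QoD (a less certain decision) means higher importance, and that a fixed number of streams can be investigated per time unit. The names `shed_load` and `capacity` are hypothetical.

```python
from typing import Dict, List, Tuple


def shed_load(qod: Dict[str, float], capacity: int) -> Tuple[List[str], List[str]]:
    """Split streams into (investigated, shed) for one time unit.

    Assumed scoring: importance is inversely ordered by QoD, so streams
    whose predicted decision is least certain (lowest QoD) are investigated
    first. `capacity` is the hypothetical number of streams the system can
    afford to observe and classify in this time unit.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    # Rank by ascending QoD, i.e. descending importance; sorted() is stable,
    # so ties keep their original dict order.
    ranked = sorted(qod, key=lambda stream: qod[stream])
    investigated = ranked[:capacity]
    shed = ranked[capacity:]
    # Every shed stream has a QoD at least as high as every investigated one.
    return investigated, shed
```

For example, `shed_load({"s1": 0.25, "s2": 0.02, "s3": 0.9, "s4": 0.1}, capacity=2)` investigates `["s2", "s4"]` and sheds `["s1", "s3"]`. In a fuller system, the observed values of investigated streams would also update their Markov state for the next prediction; that step is omitted here.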
Specification