System and method for continuous diagnosis of data streams
Abstract
In connection with the mining of time-evolving data streams, there is provided a general framework that mines changes and reconstructs models from a data stream with unlabeled instances or a limited number of labeled instances. In particular, there are defined herein statistical profiling methods that extend a classification tree in order to estimate the percentage of drifts in the data stream without any labeled data. The exact error can then be estimated by actively sampling a small number of true labels. If the estimated error is significantly higher than empirical expectations, a small number of true labels are preferably re-sampled to reconstruct the decision tree from the leaf node level.
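The error check described in the abstract can be illustrated with a minimal sketch. It is not the patented procedure: it assumes uniform random sampling of true labels and a one-sided normal-approximation test for "significantly higher", neither of which is stated in the source. The names `estimate_error`, `needs_rebuild`, `label_oracle`, and the threshold `z` are hypothetical.

```python
import math
import random


def estimate_error(predictions, label_oracle, sample_size, rng):
    """Estimate the model's error rate from a small sample of true labels.

    predictions  : list of predicted labels for the unlabeled stream window
    label_oracle : hypothetical callable returning the true label of item i
                   (stands in for a costly manual labeling step)
    Returns (estimated error, standard error of that estimate).
    """
    if not 0 < sample_size <= len(predictions):
        raise ValueError("sample_size must be between 1 and len(predictions)")
    # Assumption: plain uniform sampling without replacement.
    chosen = rng.sample(range(len(predictions)), sample_size)
    wrong = sum(1 for i in chosen if predictions[i] != label_oracle(i))
    err = wrong / sample_size
    stderr = math.sqrt(err * (1.0 - err) / sample_size)
    return err, stderr


def needs_rebuild(err, stderr, expected_error, z=1.96):
    """Flag the model when the lower confidence bound exceeds the expected error.

    Assumption: a normal-approximation bound is used as the significance test.
    """
    return err - z * stderr > expected_error


# Toy window: 100 predictions, 40 of which disagree with the true labels.
true_labels = [0] * 100
predicted = [1] * 40 + [0] * 60
err, stderr = estimate_error(predicted, lambda i: true_labels[i], 100,
                             random.Random(0))
rebuild = needs_rebuild(err, stderr, expected_error=0.10)
```

In this toy window every item is sampled, so the estimate equals the true error rate of 0.4 and the rebuild flag is raised against an expected error of 0.10. In practice the sample would be far smaller than the window, which is the point of the approach.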
11 Claims
1. An apparatus for facilitating the mining of time-evolving data streams, said apparatus comprising:
an input arrangement for accepting a data stream comprising unlabeled data; and
an arrangement for determining an amount of drifts in the data stream comprising unlabeled data;
said determining arrangement being adapted to employ a signature profile of an inductive model in determining an amount of drifts in the data stream.
(Dependent claims: 2, 3, 4, 5)
6. A method of facilitating the mining of time-evolving data streams, said method comprising the steps of:
accepting a data stream comprising unlabeled data; and
determining an amount of drifts in the data stream comprising unlabeled data;
said determining step comprising employing a signature profile of an inductive model in determining an amount of drifts in the data stream.
(Dependent claims: 7, 8, 9, 10)
11. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for facilitating the mining of time-evolving data streams, said method comprising the steps of:
accepting a data stream comprising unlabeled data; and
determining an amount of drifts in the data stream comprising unlabeled data;
said determining step comprising employing a signature profile of an inductive model in determining an amount of drifts in the data stream.
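The determining step recited in the independent claims can be sketched as follows. The claims do not define the "signature profile" here, so this sketch assumes one plausible reading: the fraction of instances that a decision tree routes to each leaf, compared between training data and the unlabeled stream. The tree (`leaf_of`), the feature names `amount` and `hour`, and the half-sum-of-absolute-differences statistic are all illustrative assumptions, not taken from the source.

```python
from collections import Counter


def leaf_of(x):
    """Hypothetical stand-in for a trained decision tree: instance -> leaf id."""
    if x["amount"] < 100:
        return "leaf_a"
    return "leaf_b" if x["hour"] < 12 else "leaf_c"


def leaf_profile(instances):
    """Fraction of instances reaching each leaf. No labels are needed."""
    if not instances:
        raise ValueError("need at least one instance")
    counts = Counter(leaf_of(x) for x in instances)
    total = sum(counts.values())
    return {leaf: n / total for leaf, n in counts.items()}


def leaf_drift(train_profile, stream_profile):
    """Assumed drift statistic: half the summed absolute difference between
    the two leaf distributions, giving a value between 0 and 1."""
    leaves = set(train_profile) | set(stream_profile)
    return sum(abs(train_profile.get(leaf, 0.0) - stream_profile.get(leaf, 0.0))
               for leaf in leaves) / 2


# Profile recorded at training time (labels are not used for the profile).
training = [{"amount": 50, "hour": 9}, {"amount": 80, "hour": 15},
            {"amount": 200, "hour": 10}, {"amount": 300, "hour": 20}]
# Unlabeled instances arriving later on the stream.
stream = [{"amount": 60, "hour": 8}, {"amount": 150, "hour": 11},
          {"amount": 400, "hour": 18}, {"amount": 500, "hour": 22}]

drift = leaf_drift(leaf_profile(training), leaf_profile(stream))
```

Here the training profile is 50% / 25% / 25% across the three leaves and the stream profile is 25% / 25% / 50%, so the assumed statistic evaluates to 0.25. A value near 0 would suggest the stream still resembles the training data, at least as seen through the tree's partitioning.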
Specification