Systems and methods for providing real-time classification of continuous data streams
Abstract
Systems and methods are provided for real-time classification of streaming data. In particular, the systems and methods implement micro-clustering methods for offline and online processing of training data to build and dynamically update the training models used for classification. They also incrementally cluster the data over contiguous segments of a continuous data stream, in real time, into a plurality of micro-clusters, from which target profiles are constructed that define or model the behavior of the data in individual segments of the data stream.
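The incremental micro-clustering mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the `MicroCluster` class, the `update_micro_clusters` function, the Euclidean distance, and the fixed `radius` threshold are all assumptions made for illustration. Each micro-cluster keeps only summary information (a count and a per-dimension linear sum), so records can be absorbed one at a time without being stored.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Summary of absorbed records: a count and a per-dimension linear sum."""
    count: int
    linear_sum: list

    def centroid(self):
        # The centroid is recoverable from the summary alone.
        return [s / self.count for s in self.linear_sum]

    def absorb(self, record):
        # Update the summary in place; the raw record is not kept.
        self.count += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, record)]


def update_micro_clusters(clusters, record, radius=1.0):
    """Add one record to the nearest micro-cluster, or open a new one.

    `radius` is a hypothetical threshold: a record farther than this from
    every existing centroid starts its own micro-cluster.
    """
    best, best_dist = None, math.inf
    for mc in clusters:
        d = math.dist(mc.centroid(), record)
        if d < best_dist:
            best, best_dist = mc, d
    if best is not None and best_dist <= radius:
        best.absorb(record)
    else:
        clusters.append(MicroCluster(count=1, linear_sum=list(record)))
    return clusters


# Feed a short, made-up segment of two-dimensional records.
clusters = []
for rec in [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (0.1, -0.1)]:
    update_micro_clusters(clusters, rec, radius=1.0)

print([mc.count for mc in clusters])  # [3, 1]
```

Three of the four records fall within the assumed radius of the first centroid, so they share one micro-cluster, while the outlying record opens a second.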
23 Claims
1. A method for real-time classification of a continuous data stream by a data stream processing server comprising a processor and implementing a classification module, the method comprising:
receiving a continuous data stream by the data stream processing server;
clustering, incrementally, a set of data records in each contiguous segment of the received data stream into a plurality of micro-clusters, wherein the plurality of micro-clusters is stored as a snapshot in time, the snapshot updating with time and indicating a dominant micro-cluster in the data stream;
generating a target profile for each segment of the received data stream based on the snapshot of micro-clusters associated with each segment, wherein generating the target profile comprises generating a histogram profile for a given segment using summary information of data records associated with the micro-clusters for the given segment, wherein the histogram profile is generated based on relative frequencies of data points associated with each micro-cluster for the given segment as compared to a total number of data points in the micro-clusters for the given segment; and
classifying, by the classification module, each segment of the received data stream using the target profile associated with each segment.
Dependent claims: 2-11
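The histogram profile recited in claim 1 is a relative frequency per micro-cluster: the number of data points in that micro-cluster divided by the total number of data points across the segment's micro-clusters. The Python sketch below computes such a profile and then classifies the segment. The claim does not say how the classification module uses the profile, so the nearest-reference rule with L1 distance is an assumption, as are the function names, the micro-cluster ids, and the "speech" and "music" labels with their reference histograms.

```python
def histogram_profile(cluster_counts):
    """Relative frequency of each micro-cluster: n_i / sum of all n."""
    total = sum(cluster_counts.values())
    if total == 0:
        raise ValueError("segment has no data points")
    return {cid: n / total for cid, n in cluster_counts.items()}


def classify_segment(profile, class_profiles):
    """Pick the label whose reference histogram is closest in L1 distance.

    This decision rule is an illustrative assumption, not taken from the claim.
    """
    def l1(p, q):
        keys = set(p) | set(q)
        return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

    return min(class_profiles, key=lambda label: l1(profile, class_profiles[label]))


# Hypothetical per-micro-cluster point counts for one segment.
segment_counts = {"mc0": 6, "mc1": 3, "mc2": 1}
profile = histogram_profile(segment_counts)

# Hypothetical reference histograms, e.g. learned from training data.
class_profiles = {
    "speech": {"mc0": 0.7, "mc1": 0.2, "mc2": 0.1},
    "music": {"mc0": 0.1, "mc1": 0.3, "mc2": 0.6},
}
label = classify_segment(profile, class_profiles)
print(profile, label)
```

With these made-up numbers the segment's profile is 0.6, 0.3, and 0.1 over the three micro-clusters, and it lies closer to the "speech" reference than to the "music" reference.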
12. A non-transitory program storage device readable by machine, embodying a program of instructions executable by a processor of the machine to perform method steps for real-time classification of a continuous data stream, the method steps comprising:
receiving a continuous data stream;
clustering, incrementally, speech data in each contiguous segment of the received data stream into a plurality of micro-clusters, wherein the plurality of micro-clusters is stored as a snapshot in time, the snapshot updating with time and indicating a dominant micro-cluster in the data stream;
generating a target profile for each segment of the received data stream based on the snapshot of micro-clusters associated with each segment, wherein generating the target profile comprises generating a histogram profile for a given segment using summary information of data records associated with the micro-clusters for the given segment, wherein the histogram profile is generated based on relative frequencies of data points associated with each micro-cluster for the given segment as compared to a total number of data points in the micro-clusters for the given segment; and
classifying each segment of the received data stream using the target profile associated with each segment.
Dependent claims: 13-22
23. A system for processing continuous data streams, comprising:
a data stream capturing system to capture a target data stream over a network;
a data clustering system that incrementally clusters a set of data records in each contiguous segment of the captured data stream into a plurality of micro-clusters, wherein the plurality of micro-clusters is stored as a snapshot in time, the snapshot updating with time and indicating a dominant micro-cluster in the target data stream, and generates a target profile for each segment of the captured data stream based on the snapshot of micro-clusters associated with each segment, wherein generating the target profile comprises generating a histogram profile for a given segment using summary information of data records associated with the micro-clusters for the given segment, wherein the histogram profile is generated based on relative frequencies of data points associated with each micro-cluster for the given segment as compared to a total number of data points in the micro-clusters for the given segment; and
a classification system to classify each segment of the captured data stream using the target profile associated with each segment.
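Each independent claim recites that the micro-clusters are stored as a snapshot in time that updates with time and indicates a dominant micro-cluster. One possible reading is sketched below in Python. The `Snapshot` and `SnapshotStore` names are hypothetical, and treating the "dominant" micro-cluster as the one holding the most data points, and the counts as cumulative, are assumptions rather than definitions taken from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Micro-cluster point counts as they stood at one point in time."""
    timestamp: float
    counts: dict  # micro-cluster id -> number of data points

    def dominant(self):
        # Assumed meaning of "dominant": the micro-cluster with the most points.
        return max(self.counts, key=self.counts.get)


class SnapshotStore:
    """Accumulates assignments and records timestamped snapshots."""

    def __init__(self):
        self._counts = {}
        self.history = []

    def record(self, cluster_id):
        # Count one data point assigned to the given micro-cluster.
        self._counts[cluster_id] = self._counts.get(cluster_id, 0) + 1

    def take_snapshot(self, timestamp):
        # Copy the counts so later updates do not alter stored snapshots.
        snap = Snapshot(timestamp, dict(self._counts))
        self.history.append(snap)
        return snap


store = SnapshotStore()
for cid in ["a", "a", "b"]:
    store.record(cid)
first = store.take_snapshot(1.0)

for cid in ["b", "b", "b"]:
    store.record(cid)
second = store.take_snapshot(2.0)

print(first.dominant(), second.dominant())  # a b
```

The dominant micro-cluster shifts from "a" to "b" between the two snapshots, which is the kind of change over time that an updating snapshot would expose.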
Specification