Method and apparatus for processing data streams
First Claim
1. A method of processing a continuously progressing data stream, comprising the steps of:
maintaining a cluster structure, the cluster structure representing one or more clusters in the continuously progressing data stream, wherein the cluster structure comprises one or more data points in a multidimensional space, further wherein the one or more data points of the cluster structure fade as the continuously progressing data stream progresses;
determining a set of projected dimensions for each of the one or more clusters using the one or more data points in the cluster structure, a given set of projected dimensions of a cluster being associated with a subset of a total number of dimensions of the continuously progressing data stream and the given set of projected dimensions being represented by a d-dimensional bit vector such that dimensions of the d dimensions in the bit vector that are present in the cluster are assigned one value and dimensions of the d dimensions in the bit vector that are not in the cluster are assigned another value, the given set of projected dimensions being continuously redefined to minimize a cluster radius, wherein redefining the set of projected dimensions comprises varying the values of the bit vector based on which dimensions are present and not present in the cluster as the data stream progresses and the one or more data points fade; and
determining assignments for incoming data points of the continuously progressing data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters;
wherein one or more of the steps of maintaining the cluster structure, determining the set of projected dimensions, and determining the assignments are implemented as one or more software components that are loaded from a memory and executed by a processor.
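The maintaining and determining steps recited above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and is not taken from the claims: the `FadingCluster` class, the exponential fading factor `2 ** (-decay_rate * elapsed)`, the `decay_rate` parameter, and the rule of keeping the `l` dimensions with the smallest decayed variance as a stand-in for "minimizing a cluster radius". The claims only require that points fade and that the bit vector is redefined as the stream progresses.

```python
class FadingCluster:
    """Decayed per-dimension statistics for one cluster (illustrative only)."""

    def __init__(self, d, decay_rate=0.01):
        self.d = d                    # total number of dimensions in the stream
        self.decay_rate = decay_rate  # hypothetical fading parameter
        self.weight = 0.0             # decayed count of points
        self.s1 = [0.0] * d           # decayed sum of values per dimension
        self.s2 = [0.0] * d           # decayed sum of squared values per dimension
        self.last_t = 0.0             # time of the last update

    def fade(self, t):
        # Assumed fading rule: every stored contribution shrinks exponentially
        # with the time elapsed since the last update.
        factor = 2.0 ** (-self.decay_rate * (t - self.last_t))
        self.weight *= factor
        self.s1 = [v * factor for v in self.s1]
        self.s2 = [v * factor for v in self.s2]
        self.last_t = t

    def add(self, x, t):
        # Fade the old points first, then absorb the new point at full weight.
        self.fade(t)
        self.weight += 1.0
        for j, v in enumerate(x):
            self.s1[j] += v
            self.s2[j] += v * v

    def variances(self):
        if self.weight <= 0.0:
            raise ValueError("cluster is empty")
        out = []
        for j in range(self.d):
            mean = self.s1[j] / self.weight
            out.append(max(self.s2[j] / self.weight - mean * mean, 0.0))
        return out

    def projected_bits(self, l):
        # d-dimensional bit vector: 1 for the l dimensions in which the faded
        # points are most tightly packed, 0 for the rest.
        var = self.variances()
        keep = set(sorted(range(self.d), key=lambda j: var[j])[:l])
        return [1 if j in keep else 0 for j in range(self.d)]
```

Because the statistics are decayed before each update, calling `projected_bits` again later in the stream can return a different bit vector, which is one way to read the "continuously redefined" language of the claim.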
Abstract
A technique for processing a data stream includes the following steps/operations. A cluster structure representing one or more clusters in the data stream is maintained. A set of projected dimensions is determined for each of the one or more clusters using data points in the cluster structure. Assignments are determined for incoming data points of the data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters. Further, the cluster structure may be used for classification of data in the data stream.
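The assignment step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names `projected_distance` and `assign`, the use of a Manhattan distance, and the division by the number of projected dimensions (so that clusters with different numbers of dimensions are comparable) are choices made here, not details given in the abstract.

```python
def projected_distance(x, centroid, bits):
    """Average absolute difference over the dimensions whose bit is 1.

    The averaging is an assumption made so that clusters with different
    numbers of projected dimensions can be compared on the same scale.
    """
    dims = [j for j, b in enumerate(bits) if b]
    if not dims:
        raise ValueError("bit vector selects no dimensions")
    return sum(abs(x[j] - centroid[j]) for j in dims) / len(dims)


def assign(x, clusters):
    """Return the index of the cluster closest to x.

    `clusters` is a list of (centroid, bits) pairs; each distance is
    measured only in that cluster's own projected dimensions.
    """
    return min(
        range(len(clusters)),
        key=lambda i: projected_distance(x, clusters[i][0], clusters[i][1]),
    )


# Two hypothetical clusters in a 3-dimensional stream.
clusters = [
    ([0.0, 0.0, 0.0], [1, 1, 0]),     # defined on dimensions 0 and 1
    ([10.0, 10.0, 10.0], [0, 0, 1]),  # defined on dimension 2 only
]
nearest = assign([1.0, 1.0, 10.0], clusters)
```

Note that a point can be far from a centroid in the full space and still be assigned to that cluster, since dimensions with a 0 bit do not contribute to the distance.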
24 Claims
1. A method of processing a continuously progressing data stream, comprising the steps of:
maintaining a cluster structure, the cluster structure representing one or more clusters in the continuously progressing data stream, wherein the cluster structure comprises one or more data points in a multidimensional space, further wherein the one or more data points of the cluster structure fade as the continuously progressing data stream progresses;
determining a set of projected dimensions for each of the one or more clusters using the one or more data points in the cluster structure, a given set of projected dimensions of a cluster being associated with a subset of a total number of dimensions of the continuously progressing data stream and the given set of projected dimensions being represented by a d-dimensional bit vector such that dimensions of the d dimensions in the bit vector that are present in the cluster are assigned one value and dimensions of the d dimensions in the bit vector that are not in the cluster are assigned another value, the given set of projected dimensions being continuously redefined to minimize a cluster radius, wherein redefining the set of projected dimensions comprises varying the values of the bit vector based on which dimensions are present and not present in the cluster as the data stream progresses and the one or more data points fade; and
determining assignments for incoming data points of the continuously progressing data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters;
wherein one or more of the steps of maintaining the cluster structure, determining the set of projected dimensions, and determining the assignments are implemented as one or more software components that are loaded from a memory and executed by a processor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. Apparatus for processing a continuously progressing data stream, comprising:
a memory; and at least one processor coupled to the memory, the at least one processor being operative to:
(i) maintain a cluster structure, the cluster structure representing one or more clusters in the continuously progressing data stream, wherein the cluster structure comprises one or more data points in a multidimensional space, further wherein the one or more data points of the cluster structure fade as the continuously progressing data stream progresses;
(ii) determine a set of projected dimensions for each of the one or more clusters using the one or more data points in the cluster structure, a given set of projected dimensions of a cluster being associated with a subset of a total number of dimensions of the continuously progressing data stream and the given set of projected dimensions being represented by a d-dimensional bit vector such that dimensions of the d dimensions in the bit vector that are present in the cluster are assigned one value and dimensions of the d dimensions in the bit vector that are not in the cluster are assigned another value, the given set of projected dimensions being continuously redefined to minimize a cluster radius, wherein redefining the set of projected dimensions comprises varying the values of the bit vector based on which dimensions are present and not present in the cluster as the data stream progresses and the one or more data points fade; and
(iii) determine assignments for incoming data points of the continuously progressing data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. Apparatus, comprising:
a server, responsive to a continuously progressing data stream associated with one or more client devices, the server comprising:
a memory and at least one processor coupled to the memory, the at least one processor being operative to:
(i) maintain a cluster structure, the cluster structure representing one or more clusters in the continuously progressing data stream, wherein the cluster structure comprises one or more data points in a multidimensional space, further wherein the one or more data points of the cluster structure fade as the continuously progressing data stream progresses;
(ii) determine a set of projected dimensions for each of the one or more clusters using the one or more data points in the cluster structure, a given set of projected dimensions of a cluster being associated with a subset of a total number of dimensions of the continuously progressing data stream and the given set of projected dimensions being represented by a d-dimensional bit vector such that dimensions of the d dimensions in the bit vector that are present in the cluster are assigned one value and dimensions of the d dimensions in the bit vector that are not in the cluster are assigned another value, the given set of projected dimensions being continuously redefined to minimize a cluster radius, wherein redefining the set of projected dimensions comprises varying the values of the bit vector based on which dimensions are present and not present in the cluster as the data stream progresses and the one or more data points fade; and
(iii) determine assignments for incoming data points of the continuously progressing data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters.
24. An article of manufacture for use in processing a continuously progressing data stream, the article comprising a computer readable storage medium containing one or more programs which when executed implement the steps of:
maintaining a cluster structure, the cluster structure representing one or more clusters in the continuously progressing data stream, wherein the cluster structure comprises one or more data points in a multidimensional space, further wherein the one or more data points of the cluster structure fade as the continuously progressing data stream progresses;
determining a set of projected dimensions for each of the one or more clusters using the one or more data points in the cluster structure, a given set of projected dimensions of a cluster being associated with a subset of a total number of dimensions of the continuously progressing data stream and the given set of projected dimensions being represented by a d-dimensional bit vector such that dimensions of the d dimensions in the bit vector that are present in the cluster are assigned one value and dimensions of the d dimensions in the bit vector that are not in the cluster are assigned another value, the given set of projected dimensions being continuously redefined to minimize a cluster radius, wherein redefining the set of projected dimensions comprises varying the values of the bit vector based on which dimensions are present and not present in the cluster as the data stream progresses and the one or more data points fade; and
determining assignments for incoming data points of the continuously progressing data stream to the one or more clusters using distances associated with each set of projected dimensions for each of the one or more clusters.
Specification