System, method and computer program for multi-dimensional temporal and relative data mining framework, analysis and sub-grouping
First Claim
1. A computer implemented data mining method for controlling mining of data streams in a distributed computing environment configured to provide a distribution layer operable to maintain consistencies across multiple distributed computing systems when performing distributed data processing and analysis, wherein different attributes are associated with each of a plurality of data streams, the computer implemented data mining method comprising:
(a) using a central distribution computer system component to store and maintain consistency of a data mining framework configured to support data mining across the multiple distributed computing systems, the data mining framework including at least:
(i) a series of temporal rules deployable to a subset of multiple distributed computing systems that are targets for a query; and
(ii) relative rules adapted for relatively aligning time series multi-dimensional data based on at least one time point of interest, the central distribution computer system being configured for determining a subset of particular temporal rules that are applicable to the time series multi-dimensional data associated to a particular site, based on the different attributes associated with the data streams;
(b) distributing, from the central distribution computer system to the multiple distributed computing systems, the series of temporal rules and the relative rules to be applied by each distributed computing system of the multiple distributed computing systems to pre-process the time series multi-dimensional data and to generate new temporally abstracted and relatively aligned time series data representing trends and patterns that include one or more indications of a potential future clinical event;
(c) collecting and cleaning, at the multiple distributed computing systems, the time series multi-dimensional data, the time series multi-dimensional data obtained through one or more corresponding data streams of the plurality of data streams;
(d) temporally abstracting, at the multiple distributed computing systems, the collected and cleaned time series multi-dimensional data by accessing and applying the applicable temporal rules so as to generate temporally abstracted time series multi-dimensional data categorized both on similarity and frequency, and relatively aligning the temporally abstracted time series multi-dimensional data based on at least one time point of interest by accessing and applying the applicable relative rules; and
(e) collecting the temporally abstracted and relatively aligned time series multi-dimensional data from the multiple distributed computing systems to provide multi-dimensional, temporal, multi-site time series data for use in data mining operations.
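Steps (c) and (d) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Interval` type, the function names, the heart-rate threshold of 100 and the sample values are all hypothetical and chosen only for illustration, and the categorization "on similarity and frequency" is not modelled. A temporal rule is assumed here to be a function that maps a raw value to a label, and a relative rule is assumed to be a shift of the time axis so that the time point of interest becomes zero.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (timestamp in seconds, value or None when the reading is missing)
Sample = Tuple[float, Optional[float]]


@dataclass
class Interval:
    start: float
    end: float
    label: str


def clean(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Step (c), assumed form: drop missing readings and sort by time."""
    return sorted((t, v) for t, v in samples if v is not None)


def temporal_abstraction(
    samples: List[Tuple[float, float]], rule: Callable[[float], str]
) -> List[Interval]:
    """Step (d), first half: merge consecutive samples that the rule
    labels identically into a single labelled interval."""
    intervals: List[Interval] = []
    for t, v in samples:
        label = rule(v)
        if intervals and intervals[-1].label == label:
            intervals[-1].end = t  # extend the current interval
        else:
            intervals.append(Interval(t, t, label))
    return intervals


def relative_alignment(
    intervals: List[Interval], time_of_interest: float
) -> List[Interval]:
    """Step (d), second half: express interval bounds relative to the
    time point of interest (negative means before it)."""
    return [
        Interval(i.start - time_of_interest, i.end - time_of_interest, i.label)
        for i in intervals
    ]


def heart_rate_rule(value: float) -> str:
    """Hypothetical temporal rule; the threshold is illustrative only."""
    return "LOW" if value < 100 else "NORMAL"


raw = [(0.0, 120.0), (60.0, 118.0), (120.0, None),
       (180.0, 95.0), (240.0, 90.0), (300.0, 125.0)]
abstracted = temporal_abstraction(clean(raw), heart_rate_rule)
aligned = relative_alignment(abstracted, time_of_interest=300.0)
```

Because every site would apply the same rules and express time relative to its own point of interest, the resulting intervals from different sites become directly comparable, which is what step (e) relies on when it pools them.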
Abstract
The present invention relates to a system, method and computer program product that is a multi-dimensional data mining environment and that is operable to apply a series of temporal and relative rules (i.e., STDMn0) and is further operable in at least one of the following ways: to incorporate a framework to support temporal abstractions and relative alignments to data (i.e., STDMn0); and to derive characteristics within the data (STDMn0). The present invention may incorporate data from multiple sources, and potentially multiple centers. The analysis and alignment of the data may involve both temporal dimensions and other dimensions (or relative aspects) of the data. The present invention may further be a data mining environment that is flexible enough to permit relatively open ended queries, thereby enabling, for example, the detection of trends, including trends with new dimensions, or trends based on relatively small data sets.
20 Citations
19 Claims
13. A data mining computer system for mining data from multiple distributed computing systems, wherein different attributes may be associated with data streams, the system controlling mining of the data streams in a distributed computing environment configured to provide a distribution layer operable to maintain consistencies across the multiple distributed computing systems when performing distributed data processing and analysis, the system comprising:
(a) a central distribution computer system component configured to store and maintain a data mining framework configured to support data mining across the multiple distributed computing systems, the data mining framework including at least:
(i) a series of temporal rules deployable to a subset of multiple distributed computing systems that are targets for a query; and
(ii) relative rules adapted for relatively aligning time series multi-dimensional data based on at least one time point of interest, the central distribution computer system being configured for determining a subset of particular temporal rules that are applicable to data associated to a particular site based on the different attributes associated with the data streams;
the central distribution computer system component configured to distribute to the multiple distributed computing systems the data mining framework, including at least the series of temporal rules and the relative rules to be applied by each distributed computing system of the multiple distributed computing systems to pre-process the time series multi-dimensional data and to generate new temporally abstracted and relatively aligned time series data representing trends and patterns that include one or more indications of a potential future clinical event;
(b) one or more devices associated with two or more of the multiple distributed computing systems, the devices collecting data in a plurality of data streams at the multiple distributed computing systems; and
(c) at least one local computer at each distributed computing system connected to the central distribution computer system;
wherein the central distribution computer system is configured to manage the temporal abstraction and relative alignment of the data streams so as to support data mining operations for multi-dimensional data across the multiple sites by:
accessing, from the at least one local computer, information regarding the different attributes for the data streams;
providing, to the at least one local computer, the applicable temporal rules and applicable relative rules thereby enabling temporal abstraction of the time series multi-dimensional data to generate temporally abstracted time series multi-dimensional data, and to generate relative alignment of the temporally abstracted time series multi-dimensional data based on at least one time point of interest in a way that addresses the different attributes; and
collecting the temporally abstracted and relatively aligned time series multi-dimensional data from the multiple sites by communicating with the at least one local computer and initiating the retrieval and transfer of the temporally abstracted and relatively aligned data based on a data mining request.
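The rule selection described in this claim, where the central system determines which temporal rules apply to a site from the attributes of that site's data streams, might be sketched as follows. This is an illustrative assumption rather than the claimed system: the `TemporalRule` type, the attribute names (`signal`, `sample_rate`), the rule names and the site names are all hypothetical, and a rule is assumed to apply when every attribute it requires matches the stream.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TemporalRule:
    name: str
    applies_to: Dict[str, str]  # attribute name -> required value (hypothetical)


def applicable_rules(
    rules: List[TemporalRule], stream_attributes: Dict[str, str]
) -> List[TemporalRule]:
    """Keep the rules whose required attributes all match the stream."""
    return [
        rule
        for rule in rules
        if all(stream_attributes.get(k) == v for k, v in rule.applies_to.items())
    ]


def plan_distribution(
    rules: List[TemporalRule], sites: Dict[str, List[Dict[str, str]]]
) -> Dict[str, List[str]]:
    """For each site, list the names of the rules that apply to at least
    one of its data streams; this is what would be sent to that site."""
    plan: Dict[str, List[str]] = {}
    for site, streams in sites.items():
        names = {
            rule.name
            for attrs in streams
            for rule in applicable_rules(rules, attrs)
        }
        plan[site] = sorted(names)
    return plan


# Hypothetical rule catalogue held by the central system.
rules = [
    TemporalRule("hr_low", {"signal": "heart_rate"}),
    TemporalRule("spo2_low_1hz", {"signal": "spo2", "sample_rate": "1Hz"}),
]

# Hypothetical stream attributes reported by each site's local computer.
sites = {
    "site_a": [
        {"signal": "heart_rate", "sample_rate": "1Hz"},
        {"signal": "spo2", "sample_rate": "1Hz"},
    ],
    "site_b": [{"signal": "spo2", "sample_rate": "0.2Hz"}],
}

plan = plan_distribution(rules, sites)
```

Keeping the catalogue in one place and filtering per site is one way to read the claim's requirement that the central component "maintain consistency" of the framework while still accommodating streams with different attributes.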
Specification