Systems and methods for machine learning using classifying, clustering, and grouping time series data
First Claim
1. A system for performing data mining and statistical learning techniques on a data set, the system comprising:
a processor; and
a non-transitory computer-readable storage medium including instructions stored thereon, which when executed by the processor, cause the system to perform operations including:
receiving a plurality of time series included in a prediction hierarchy for performing statistical learning to develop the prediction hierarchy, each individual time series of the plurality of time series comprising one or more need output characteristics and a need output pattern for an object, the one or more need output characteristics including at least one of a need output data, an intermittence, or a time period of a year, the need output pattern indicating one or more time intervals for which need output for the object is greater than a threshold amount;
pre-processing data associated with each of the plurality of time series, wherein the pre-processing includes executing tasks in parallel using a grid-enabled computing environment, the tasks comprising, for each time series of the plurality of time series:
determining a classification for the individual time series based on the one or more need output characteristics;
determining a pattern group for each individual time series by comparing the need output pattern to need output patterns for other time series in the plurality of time series; and
determining a level of the prediction hierarchy at which each individual time series comprises a need output amount greater than the threshold amount, wherein determining the level further includes, for each time series in each level of the hierarchy and starting with a lowest level of the hierarchy:
determining whether the individual time series includes a sufficient volume of data by determining whether the individual time series includes an amount of need output above the threshold amount; and
based upon the determination, for each time series that does not include an amount of need output above the threshold amount, aggregating multiple time series from a particular level into a node that is one level higher than the particular level in the hierarchy;
generating an additional prediction hierarchy using the prediction hierarchy, the classification, the pattern group, and the determined level, wherein utilizing the additional prediction hierarchy generates more accurate need output predictions than need output predictions generated utilizing the prediction hierarchy; and
transmitting, to one or more nodes in the grid-enabled computing environment, prediction data related to at least one time series of the plurality of time series based on the additional prediction hierarchy.
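The level-determination steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function name `determine_levels`, the representation of the hierarchy as tuple paths of equal depth, the sample volumes, and the reading that a higher-level node's need output is the sum of all series beneath it are assumptions made for illustration only.

```python
from collections import defaultdict


def determine_levels(volumes, threshold):
    """Map each leaf series to the lowest node whose need output exceeds threshold.

    volumes: dict mapping a leaf path such as ("east", "s1", "a") to the total
    need output of that series. All paths are assumed to have the same depth.
    Returns a dict mapping each leaf path to a path prefix (the chosen node),
    or None when even the top-level node stays at or below the threshold.
    """
    depth = len(next(iter(volumes)))
    assigned = {}
    pending = set(volumes)
    # Start at the lowest level and move one level higher on each pass.
    for level in range(depth, 0, -1):
        # Assumption: a node's need output is the sum of all leaves under it.
        totals = defaultdict(float)
        for path, volume in volumes.items():
            totals[path[:level]] += volume
        for path in list(pending):
            if totals[path[:level]] > threshold:
                assigned[path] = path[:level]
                pending.discard(path)
    for path in pending:
        assigned[path] = None
    return assigned


# Hypothetical sample data: region / store / item.
volumes = {
    ("east", "s1", "a"): 120.0,
    ("east", "s1", "b"): 5.0,
    ("east", "s2", "a"): 30.0,
    ("west", "s3", "a"): 8.0,
}
levels = determine_levels(volumes, threshold=50.0)
```

In this sample, the first series is kept at its own level, the second is rolled up to its store, the third to its region, and the fourth never reaches the threshold.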
2 Assignments
0 Petitions
Abstract
Systems and methods are provided for performing data mining and statistical learning techniques on a big data set. More specifically, systems and methods are provided for linear regression using safe screening techniques. Techniques may include receiving a plurality of time series included in a prediction hierarchy for performing statistical learning to develop an improved prediction hierarchy. The techniques may also include pre-processing data associated with each of the plurality of time series, wherein the pre-processing includes tasks performed in parallel using a grid-enabled computing environment. For each time series, the system may determine a classification for the individual time series, a pattern group for the individual time series, and a level of the prediction hierarchy at which each individual time series comprises a need output amount greater than a threshold amount. The computing system may generate an additional prediction hierarchy using the first prediction hierarchy, the classification, the pattern group, and the level.
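As a rough illustration of the per-series classification and pattern grouping described in the abstract, the following Python sketch may help. The source does not specify how classes are assigned or how patterns are compared, so the zero-share rule, the Jaccard similarity measure, the greedy grouping, and every name, threshold, and sample value below are assumptions.

```python
def classify(series, zero_share=0.5):
    """Label a series 'intermittent' when at least zero_share of its periods are zero."""
    zeros = sum(1 for value in series if value == 0)
    return "intermittent" if zeros / len(series) >= zero_share else "regular"


def pattern(series, threshold):
    """Return the set of interval indices where need output exceeds the threshold."""
    return frozenset(i for i, value in enumerate(series) if value > threshold)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pattern_groups(patterns, min_similarity=0.5):
    """Greedy grouping: join the first group whose representative is similar enough."""
    groups = []  # list of (representative pattern, member names)
    for name, current in patterns.items():
        for representative, members in groups:
            if jaccard(representative, current) >= min_similarity:
                members.append(name)
                break
        else:
            groups.append((current, [name]))
    return [members for _, members in groups]


# Hypothetical sample series, one value per time interval.
data = {
    "a": [0, 0, 9, 8, 0, 0],
    "b": [0, 1, 7, 9, 0, 0],
    "c": [6, 7, 0, 0, 0, 0],
    "d": [4, 5, 4, 4, 6, 5],
}
labels = {name: classify(series) for name, series in data.items()}
groups = pattern_groups({name: pattern(s, threshold=5) for name, s in data.items()})
```

Series "a" and "b" exceed the threshold in the same intervals and land in one group, while "c" and "d" peak elsewhere and form their own groups.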
177 Citations
27 Claims
10. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause a data processing apparatus to perform operations including:
receiving a plurality of time series included in a prediction hierarchy for performing statistical learning to develop the prediction hierarchy, each individual time series of the plurality of time series comprising one or more need output characteristics and a need output pattern for an object, the one or more need output characteristics including at least one of a need output data, an intermittence, or a time period of a year, the need output pattern indicating one or more time intervals for which need output for the object is greater than a threshold value amount;
pre-processing data associated with each of the plurality of time series, wherein the pre-processing includes executing tasks in parallel using a grid-enabled computing environment, the tasks comprising, for each time series of the plurality of time series:
determining a classification for the individual time series based on the one or more need output characteristics;
determining a pattern group for each individual time series by comparing the need output pattern to need output patterns for other time series in the plurality of time series; and
determining a level of the prediction hierarchy at which each individual time series comprises a need output amount greater than the threshold amount, wherein determining the level further includes, for each time series in each level of the hierarchy and starting with a lowest level of the hierarchy:
determining whether the individual time series includes a sufficient volume of data by determining whether the individual time series includes an amount of need output above the threshold amount; and
based upon the determination, for each time series that does not include an amount of need output above the threshold amount, aggregating multiple time series from a particular level into a node that is one level higher than the particular level in the hierarchy;
generating an additional prediction hierarchy using the prediction hierarchy, the classification, the pattern group, and the determined level, wherein utilizing the additional prediction hierarchy generates more accurate need output predictions than need output predictions generated utilizing the prediction hierarchy; and
transmitting, to one or more nodes in the grid-enabled computing environment, prediction data related to at least one time series of the plurality of time series based on the additional prediction hierarchy.
20. A method for performing data mining and statistical learning techniques on a data set, the method comprising:
receiving a plurality of time series included in a prediction hierarchy for performing statistical learning to develop the prediction hierarchy, each individual time series of the plurality of time series comprising one or more need output characteristics and a need output pattern for an object, the one or more need output characteristics including at least one of a need output data, an intermittence, or a time period of a year, the need output pattern indicating one or more time intervals for which need output for the object is greater than a threshold amount;
pre-processing data associated with each of the plurality of time series, wherein the pre-processing includes executing tasks in parallel using a grid-enabled computing environment, the tasks comprising, for each time series of the plurality of time series:
determining a classification for the individual time series based on the one or more need output characteristics;
determining a pattern group for each individual time series by comparing the need output pattern to need output patterns for other time series in the plurality of time series; and
determining a level of the prediction hierarchy at which each individual time series comprises a need output amount greater than the threshold amount, wherein determining the level further includes, for each time series in each level of the hierarchy and starting with a lowest level of the hierarchy:
determining whether the individual time series includes a sufficient volume of data by determining whether the individual time series includes an amount of need output above the threshold amount; and
based upon the determination, for each time series that does not include an amount of need output above the threshold amount, aggregating multiple time series from a particular level into a node that is one level higher than the particular level in the hierarchy;
generating an additional prediction hierarchy using the prediction hierarchy, the classification, the pattern group, and the determined level, wherein utilizing the additional prediction hierarchy generates more accurate need output predictions than need output predictions generated utilizing the prediction hierarchy; and
transmitting, to one or more nodes in the grid-enabled computing environment, prediction data related to at least one time series of the plurality of time series based on the additional prediction hierarchy.
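The parallel pre-processing step recited in the claims can be sketched in Python as follows. This is only an analogy: a local `ThreadPoolExecutor` stands in for the grid-enabled computing environment, and the per-series task, the function names, and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(item):
    """Per-series task: summarize one time series independently of the others."""
    name, series = item
    total = sum(series)
    zero_share = sum(1 for value in series if value == 0) / len(series)
    return name, {"total": total, "zero_share": zero_share}


def preprocess_all(data, max_workers=4):
    """Run the per-series task for every series using a pool of workers.

    Assumption: a thread pool on one machine replaces the grid nodes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(preprocess, data.items()))


# Hypothetical sample series.
summary = preprocess_all({"a": [0, 0, 4, 6], "b": [1, 2, 3, 4]})
```

Because each task reads only its own series, the work can be split across workers without coordination, which is the property that makes the per-series steps suitable for parallel execution.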
Specification