PATTERN CHANGE DISCOVERY BETWEEN HIGH DIMENSIONAL DATA SETS
Abstract
The general problem of pattern change discovery between high-dimensional data sets is addressed. The notion of the principal angles between subspaces is introduced to measure the subspace difference between two high-dimensional data sets. Current methods either focus mainly on magnitude change detection in low-dimensional data sets or operate under supervised frameworks. Principal angles have the property of isolating subspace change from magnitude change. To address the challenge of directly computing the principal angles, matrix factorization is used as a statistical framework, and the principle of dominant subspace mapping is developed to transfer principal-angle-based detection to a matrix factorization problem. Matrix factorization can be naturally embedded into the likelihood ratio test based on the linear models. The method may be unsupervised and addresses the statistical significance of pattern changes between high-dimensional data sets.
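The property that principal angles respond to subspace change but not to magnitude change can be illustrated with a minimal sketch. It uses the standard SVD-based computation of principal angles from orthonormal bases, not the dominant subspace mapping or the likelihood ratio test described in this record. The function names, the synthetic data, the matrix sizes, and the subspace dimension k are all assumptions made for illustration.

```python
import numpy as np

def dominant_subspace(X, k):
    """Orthonormal basis (d x k) spanning the top-k left singular vectors of X.

    X is assumed to be laid out as d features by n samples (hypothetical convention).
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def principal_angles(A, B):
    """Principal angles in radians (ascending) between the column spaces of A and B.

    A and B must have orthonormal columns; the singular values of A^T B
    are then the cosines of the principal angles.
    """
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Synthetic data (illustrative only): a 3-dimensional pattern in 50 dimensions.
rng = np.random.default_rng(0)
d, n, k = 50, 200, 3
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]
X1 = basis @ rng.standard_normal((k, n)) + 0.01 * rng.standard_normal((d, n))

# Magnitude change only: same subspace, every value scaled by 10.
X2 = 10.0 * X1

# Subspace change: data generated from an unrelated random 3-dimensional basis.
other = np.linalg.qr(rng.standard_normal((d, k)))[0]
X3 = other @ rng.standard_normal((k, n)) + 0.01 * rng.standard_normal((d, n))

angles_scaled = principal_angles(dominant_subspace(X1, k), dominant_subspace(X2, k))
angles_changed = principal_angles(dominant_subspace(X1, k), dominant_subspace(X3, k))

print("magnitude change only:", angles_scaled)
print("subspace change:      ", angles_changed)
```

In this sketch the scaled data set yields angles that are numerically zero, while the data set drawn from a different basis yields large angles. Deciding whether an observed angle is statistically significant is a separate step that the sketch does not attempt.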
20 Claims
1. A method for pattern change discovery between high-dimensional data sets, comprising:
determining a linear model of a dominant subspace for each pair of high dimensional data sets using at least one automated processor, and using matrix factorization to produce a set of principal angles representing differences between the linear models;
defining a set of basis vectors under a null hypothesis of no statistically significant pattern change and under an alternative hypothesis of a statistically significant pattern change;
performing a statistical test on the basis vectors with respect to the null hypothesis and the alternate hypothesis to automatically determine whether a statistically significant difference is present; and
producing an output selectively dependent on whether the statistically significant difference is present.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for determining pattern change discovery between high-dimensional data sets, comprising:
an input configured to receive a pair of high dimensional data sets;
at least one automated processor, configured to:
determine a linear model of a dominant subspace for each pair of high dimensional data sets;
factor at least one matrix to produce a set of principal angles representing differences between the linear models;
define a set of basis vectors under a null hypothesis of no statistically significant pattern change and under an alternative hypothesis of a statistically significant pattern change; and
perform a statistical test on the basis vectors with respect to the null hypothesis and the alternate hypothesis to determine whether a statistically significant difference is present; and
an output configured to communicate data selectively dependent on whether the statistically significant difference is present.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A nontransitory computer readable medium, comprising instructions for controlling a programmable processor to perform a method comprising:
determining a linear model of a dominant subspace for each pair of high dimensional data sets using at least one automated processor, and using matrix factorization to produce a set of principal angles representing differences between the linear models;
defining a set of basis vectors under a null hypothesis of no statistically significant pattern change and under an alternative hypothesis of a statistically significant pattern change;
performing a statistical test on the basis vectors with respect to the null hypothesis and the alternate hypothesis to automatically determine whether a statistically significant difference is present; and
producing an output selectively dependent on whether the statistically significant difference is present.
Specification