Hierarchically organizing data using a partial least squares analysis (PLS-trees)
Abstract
A method and system for partitioning (clustering) large amounts of data in a relatively short processing time. The method involves providing a first data matrix and a second data matrix where each of the first and second data matrices includes one or more variables, and a plurality of data points. The method also involves determining a first score from the first data matrix using a partial least squares (PLS) analysis or orthogonal PLS (OPLS) analysis and partitioning the first and second data matrices (e.g., row-wise) into a first group and a second group based on the sorted first score, the variance of the first data matrix, and a variance of the first and second groups relative to the variances of the first and second data matrices.
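The abstract names the ingredients of a single split (a first PLS score, a sort on that score, and variance terms) but does not give a formula in this section. The following is a minimal sketch of one such split under stated assumptions, not the patented procedure itself: the function names, the weights `a` and `b`, the `min_size` limit, and the exact form of the cost are all hypothetical choices made for illustration. The first score is taken as the projection of the centered first matrix onto the dominant left singular vector of its cross-product with the centered second matrix, which is the standard first PLS weight vector.

```python
import numpy as np

def _as_2d(M):
    """Return M as a float array with one row per data point."""
    M = np.asarray(M, dtype=float)
    return M[:, None] if M.ndim == 1 else M

def first_pls_score(X, Y):
    """First PLS score t = Xc @ w, with w the dominant left singular
    vector of Xc.T @ Yc (Xc, Yc are the column-centered matrices)."""
    Xc = _as_2d(X) - _as_2d(X).mean(axis=0)
    Yc = _as_2d(Y) - _as_2d(Y).mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return Xc @ U[:, 0]

def within_fraction(V, k):
    """Pooled within-group sum of squares of V[:k] and V[k:],
    relative to the total sum of squares of V."""
    total = ((V - V.mean(axis=0)) ** 2).sum()
    if total == 0:
        return 0.0
    left = ((V[:k] - V[:k].mean(axis=0)) ** 2).sum()
    right = ((V[k:] - V[k:].mean(axis=0)) ** 2).sum()
    return (left + right) / total

def split_once(X, Y, a=0.5, b=0.3, min_size=2):
    """Split the rows of X and Y into two groups along the sorted first score.

    Assumed cost (illustrative only):
      (1 - b) * [a * score variance term + (1 - a) * Y variance term]
      + b * squared size imbalance of the two groups.
    Returns two arrays of row indices into the original matrices.
    """
    Y = _as_2d(Y)
    t = first_pls_score(X, Y)
    order = np.argsort(t)                 # rows sorted by first score
    t_sorted, Y_sorted = t[order], Y[order]
    n = len(t)
    best_k, best_cost = None, np.inf
    for k in range(min_size, n - min_size + 1):
        fit = a * within_fraction(t_sorted, k) + (1 - a) * within_fraction(Y_sorted, k)
        imbalance = ((2 * k - n) / n) ** 2
        cost = (1 - b) * fit + b * imbalance
        if cost < best_cost:
            best_k, best_cost = k, cost
    if best_k is None:
        raise ValueError("too few rows to split with the given min_size")
    return order[:best_k], order[best_k:]

# Usage: two well-separated clusters of rows, with Y driven by X.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
Y_demo = 2 * X_demo[:, :1] + rng.normal(0, 0.1, (20, 1))
group_1, group_2 = split_once(X_demo, Y_demo)
print(sorted(group_1.tolist()), sorted(group_2.tolist()))
```

Applying `split_once` again to each returned group would produce a hierarchy of groups, which is one way to read the "PLS-trees" of the title. Which group comes first depends on the sign of the singular vector, so only the grouping itself is meaningful.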
20 Claims
1. A computer-implemented method comprising:
providing, by a computer, a first data matrix and a second data matrix, each of the first and second data matrices including one or more variables, and a plurality of data points;
determining, by the computer, a first score from the first data matrix using a partial least squares (PLS) analysis or orthogonal PLS (OPLS) analysis; and
partitioning, by the computer, the first and second data matrices row-wise into a first group and a second group based on the first score of the first data matrix, and a penalty function that evaluates the first group and the second group based on (a) the variance of the first data matrix, and (b) a variance in the first and second groups relative to the variances of the first and second data matrices.
Dependent claims: 2-17.
18. A computer program product, tangibly embodied in a non-transitory computer readable medium, the computer program product including instructions being operable to cause data processing apparatus to:
receive a first data matrix and a second data matrix, each of the first and second data matrices including one or more data points;
determine a first score from the first data matrix using a partial least squares (PLS) analysis or OPLS analysis of the first and second data matrices; and
partition the first and second data matrices row-wise into a first group and a second group based on the first score of the first data matrix, and a penalty function that evaluates the first group and the second group based on (a) the variance of the first data matrix, and (b) a variance in the first and second groups relative to the variances of the first and second data matrices.
19. A system for hierarchically organizing data, the system comprising:
(a) a memory including;
(a1) a data structure including a first data matrix and a second data matrix;
(b) a processor operatively coupled to the memory, the processor comprising;
(b1) a module for determining a first score based in part on a partial least squares analysis or OPLS analysis of the first data matrix;
(b2) a module for partitioning the first and second data matrices to generate a first group and a second group based in part on the first score of the first data matrix, and a penalty function that evaluates the first group and the second group based on (a) the variance of the first data matrix, and (b) a variance in the first and second groups relative to the first and second data matrices; and
(c) a display operatively coupled to the processor to display the first and second groups and an association of the first and second groups to the first and second data matrices.
20. A system for analyzing data, the system comprising:
a data retrieval means for retrieving a first data matrix and a second data matrix from a memory, each of the first and second data matrices including one or more data points;
a data analysis means to determine a first score from the first data matrix using a partial least squares (PLS) analysis or OPLS analysis; and
a data partitioning means to divide the first and second data matrices into a first group and a second group based on the first score of the first data matrix, and a penalty function that evaluates the first group and the second group based on (a) the variance of the first data matrix, and (b) a variance in the first and second groups relative to the variances of the first and second data matrices.
Specification