COMPUTER SYSTEMS AND METHODS FOR HIERARCHICAL CLUSTER ANALYSIS OF LARGE SETS OF BIOLOGICAL DATA INCLUDING HIGHLY DENSE GENE ARRAY DATA
Abstract
A system and corresponding method analyze biological data for sets of test subjects, such as gene arrays, to group the test subjects into clusters and to order the clusters into a hierarchy based on similarities and differences in the biological data corresponding to the test subjects. A combination of nonhierarchical and hierarchical clustering methods is used to perform hierarchical clustering efficiently and effectively on biological data such as highly dense gene arrays containing many thousands of test subjects, such as genes. First, the test subjects are nonhierarchically clustered according to similarities and differences in their biological data, as determined by distance techniques. Representative values, such as mean values, of the biological data are then determined for each nonhierarchical cluster of test subjects. These representative values are used to hierarchically cluster the nonhierarchical clusters. The biological data for each test subject is displayed in a row of a table, and the rows of the table are arranged by the nonhierarchical clustering and further by the hierarchical clustering. Each value of the biological data is color coded according to its value to display patterns in the hierarchically clustered biological data.
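The abstract says only that the first, nonhierarchical step groups test subjects "as determined by distance techniques". As a minimal sketch, assume that step is k-means (Lloyd's algorithm) with Euclidean distance and centroids seeded from the first k rows; the function names `euclidean` and `kmeans_assign` are hypothetical and are not taken from the patent.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_assign(rows: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Assign each row (one test subject) to one of k nonhierarchical clusters.

    Assumption: k-means (Lloyd's algorithm) with centroids seeded from the
    first k rows, so results are deterministic but depend on row order.
    """
    if not 0 < k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    centroids = [list(row) for row in rows[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c])) for row in rows
        ]
        if new_labels == labels:
            break  # converged: no row changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its member rows.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, `kmeans_assign([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]], 2)` returns `[0, 0, 1, 1]`. Seeding from the first rows keeps the sketch reproducible; real k-means code usually uses randomized seeding (such as k-means++) with several restarts.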
36 Citations
55 Claims
1. A system for analyzing information based on measurements of at least one measurement type, a measurement of each measurement type performed on each of a plurality of biological test subjects, the system comprising:
an input component configured to receive a data file of a test matrix containing sets of measurement values, each set of measurement values containing a measurement of each measurement type for one of the plurality of biological test subjects;
a pre-conditioning component configured to assign each of the sets of measurement values to one of a plurality of nonhierarchical clusters, at least one of the nonhierarchical clusters having more than one set of measurement values assigned;
a reduction component configured to generate a data file of a reduced test matrix from the data file of the test matrix, the reduced test matrix containing one set of representative values associated with each nonhierarchical cluster, each set of representative values based on the sets of measurement values assigned to the nonhierarchical cluster associated with the each set of representative values; and
a hierarchical clustering component configured to order the sets of representative values into hierarchical clusters. (Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39, 40)
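The reduction component of claim 1 replaces all sets of measurement values in a nonhierarchical cluster with a single set of representative values. A minimal sketch, assuming the representative is the column-wise arithmetic mean (the abstract names mean values as one example) and that cluster labels are integers; `reduce_matrix` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def reduce_matrix(rows: List[List[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Build a reduced test matrix: one representative row per nonhierarchical cluster.

    Assumption: the representative value for each measurement type is the
    arithmetic mean over the cluster's member rows.
    """
    if len(rows) != len(labels):
        raise ValueError("every row needs exactly one cluster label")
    members: Dict[int, List[List[float]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)
    # zip(*group) walks the cluster column by column (one measurement type at a time).
    return {
        label: [sum(col) / len(group) for col in zip(*group)]
        for label, group in sorted(members.items())
    }
```

The payoff is in the next step: a typical agglomerative method needs all pairwise distances, so reducing n subjects to k clusters shrinks that from n(n-1)/2 to k(k-1)/2 entries (about 50 million pairs for 10,000 genes versus 124,750 for 500 clusters).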
15. An analysis system for biological data, the system comprising:
a receiver configured to receive the biological data on biological subjects, the biological subjects assigned to nonhierarchical clusters; and
a clustering component configured to hierarchically cluster the nonhierarchical clusters according to values representative of the nonhierarchical clusters.
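Claim 15 hierarchically clusters the nonhierarchical clusters "according to values representative of" them, without fixing a linkage rule. A minimal sketch, assuming agglomerative clustering with average linkage (UPGMA) and Euclidean distance over the representatives; `average_linkage`, `leaves`, and the nested-tuple `Tree` encoding are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]  # a leaf is a cluster id; a pair is a merge


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaves(tree: Tree) -> List[int]:
    """Left-to-right leaf order of a dendrogram, i.e. the hierarchical order."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def average_linkage(reps: Dict[int, Sequence[float]]) -> Tree:
    """Agglomerative clustering of cluster representatives (average linkage)."""
    if not reps:
        raise ValueError("need at least one representative")
    nodes: List[Tree] = list(reps)  # start with one singleton per cluster
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                li, lj = leaves(nodes[i]), leaves(nodes[j])
                # Average linkage: mean distance over all cross pairs of members.
                d = sum(euclidean(reps[a], reps[b]) for a in li for b in lj)
                d /= len(li) * len(lj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (nodes[i], nodes[j])
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)] + [merged]
    return nodes[0]
```

Recomputing every average at each merge keeps the sketch short but is far slower than necessary. Practical implementations update a distance matrix incrementally after each merge, for example with the Lance-Williams formula.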
19. A data conditioning system comprising:
an input component configured to receive biological data on biological samples, the biological data nonhierarchically ordered according to nonhierarchical clusters of the biological samples, the nonhierarchical clusters generated by a nonhierarchical clustering system; and
a conversion component configured to generate sets of representative data for input into a hierarchical clustering system, one of the sets of representative data being generated for each nonhierarchical cluster of biological samples, each set of representative data based on the received nonhierarchically ordered biological data of the respective nonhierarchical cluster of biological samples.
24. A computer-readable medium for storing computer-readable instructions, the instructions written to program a computer to perform a method, the method comprising:
receiving a data file of biological data for biological samples;
assigning biological sample data to nonhierarchical clusters;
generating representative values for each nonhierarchical cluster; and
ordering the nonhierarchical clusters of biological data according to a hierarchical clustering based on the representative values.
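Claim 24 begins by receiving a data file of biological data but does not fix a file format. A minimal sketch, assuming a hypothetical tab-delimited layout: a header row whose first cell is a label and whose other cells name the measurement types, then one row per test subject (an identifier followed by one numeric value per measurement type). `read_test_matrix` is a hypothetical name.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_test_matrix(handle: TextIO) -> Tuple[List[str], List[str], List[List[float]]]:
    """Parse a tab-delimited test matrix into (measurement types, subject ids, rows).

    Assumed (hypothetical) layout: a header row whose first cell is a label
    and whose other cells name the measurement types, then one row per test
    subject: an identifier followed by one numeric value per measurement type.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("empty data file")
    subject_ids: List[str] = []
    rows: List[List[float]] = []
    for record in reader:
        if not record:
            continue  # tolerate blank lines
        if len(record) != len(header):
            raise ValueError(f"subject {record[0]!r}: expected {len(header)} fields")
        subject_ids.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return header[1:], subject_ids, rows


if __name__ == "__main__":
    demo = "subject\tt0\tt1\ng1\t0.5\t-1.2\ng2\t1.1\t0.3\n"
    print(read_test_matrix(io.StringIO(demo)))
```

Checking the field count on every row catches truncated or ragged files before any clustering runs, which matters when a single gene array file holds thousands of rows.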
37. A system for displaying hierarchically clustered biological data comprising:
a color monitor;
a computer system coupled to the color monitor; and
a software program configured to instruct the computer system to display on the color monitor values representative of nonhierarchical clusters of biological data in a table having hierarchical cluster order, portions of the table colored according to the representative values.
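Claim 37 colors portions of the table according to the representative values but does not name a palette. A minimal sketch, assuming a red/black/green scale (a common convention for two-color expression ratios), a hypothetical saturation limit of 3.0, and a terminal that understands 24-bit ANSI background escapes; `value_to_rgb` and `render_row` are hypothetical names.

```python
from typing import Sequence, Tuple


def value_to_rgb(value: float, saturation: float = 3.0) -> Tuple[int, int, int]:
    """Map one value to an RGB triple on a red/black/green scale.

    Positive values shade toward red, negative values toward green, and zero
    is black; magnitudes at or beyond `saturation` get the full color.
    """
    if saturation <= 0:
        raise ValueError("saturation must be positive")
    level = round(255 * min(abs(value) / saturation, 1.0))
    return (level, 0, 0) if value >= 0 else (0, level, 0)


def render_row(values: Sequence[float], saturation: float = 3.0) -> str:
    """Render a table row as colored cells using 24-bit ANSI background escapes."""
    cells = []
    for value in values:
        r, g, b = value_to_rgb(value, saturation)
        cells.append(f"\x1b[48;2;{r};{g};{b}m  \x1b[0m")
    return "".join(cells)
```

Clipping at a saturation limit keeps a few extreme values from washing out the rest of the table, so moderate patterns across many rows stay visible.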
41. A data structure stored on a computer-readable medium, the data structure having a plurality of records containing information generated from biological samples, each of the records comprising:
a section containing the information generated from the biological samples;
a section containing a number or label indicating a nonhierarchical assignment; and
a section containing a number or label indicating a hierarchical assignment. (Dependent claim: 42)
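The record of claim 41 has three sections: the sample information, a nonhierarchical assignment, and a hierarchical assignment, each assignment being "a number or label". A minimal sketch as a Python dataclass, assuming tab-separated text as the on-medium form; the class, field, and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

Label = Union[int, str]  # the claim allows "a number or label"


@dataclass
class ClusterRecord:
    """One record of the claimed data structure; field names are hypothetical."""

    values: List[float]  # information generated from the biological sample
    nonhierarchical: Label  # e.g. the flat (k-means) cluster the sample belongs to
    hierarchical: Label  # e.g. a node label or position in the hierarchical order

    def to_tsv(self) -> str:
        """Serialize as one tab-separated line: values, then both assignments."""
        fields = [str(v) for v in self.values]
        fields += [str(self.nonhierarchical), str(self.hierarchical)]
        return "\t".join(fields)
```

For example, `ClusterRecord([0.5, -1.2], 3, "node7").to_tsv()` gives the line `0.5`, `-1.2`, `3`, `node7` joined by tabs. Storing both assignments per record lets a viewer rebuild either ordering without rerunning the clustering.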
43. A method for generating information based on biological samples, the method comprising:
receiving a data file of a test matrix containing sets of measurement values, each set of measurement values containing a measurement of each measurement type for one of the plurality of biological test subjects;
assigning each of the sets of measurement values to one of a plurality of nonhierarchical clusters, at least one of the nonhierarchical clusters having more than one set of measurement values assigned;
generating a data file of a reduced test matrix from the data file of the test matrix, the reduced test matrix containing one set of representative values associated with each nonhierarchical cluster, each set of representative values based on the sets of measurement values assigned to the nonhierarchical cluster associated with the each set of representative values; and
ordering the sets of representative values into hierarchical clusters. (Dependent claims: 44, 45, 46, 47, 48, 49, 50, 51)
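Claim 43 requires the reduced test matrix to be generated as a data file. A minimal sketch, assuming the file reuses a tab-delimited layout (one row per nonhierarchical cluster, a hypothetical identifier "C" plus the cluster index, then one representative value per measurement type), so that a hierarchical clustering program could read it like any other test matrix; `write_reduced_matrix` is a hypothetical name.

```python
import csv
import io
from typing import Dict, Sequence, TextIO


def write_reduced_matrix(
    handle: TextIO,
    measurement_types: Sequence[str],
    representatives: Dict[int, Sequence[float]],
) -> None:
    """Write the reduced test matrix: one tab-delimited row per cluster.

    Each row holds a hypothetical cluster identifier ("C" + index) followed by
    that cluster's representative value for every measurement type.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["cluster", *measurement_types])
    for cluster_id in sorted(representatives):
        values = representatives[cluster_id]
        if len(values) != len(measurement_types):
            raise ValueError(f"cluster {cluster_id}: wrong number of values")
        writer.writerow([f"C{cluster_id}", *values])


if __name__ == "__main__":
    out = io.StringIO()
    write_reduced_matrix(out, ["t0", "t1"], {0: [2.0, 3.0], 1: [10.0, 10.0]})
    print(out.getvalue(), end="")
```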
52. A method of displaying biological data comprising:
receiving biological data records or representative records, each biological data record associated with a biological sample, each representative record representing at least one biological data record, at least one of the representative records representing a nonhierarchical cluster of biological data records;
placing the biological data records or representative records in a table having fields for values of the representative records respectively, each field containing one value;
ordering each placed biological data record in the table according to the nonhierarchically ordered cluster of its associated biological sample;
arranging each placed representative record or each ordered placed biological data record in the table according to a hierarchically ordered clustering based on the placed representative record or the representative record associated with the ordered placed biological data record; and
displaying portions of the table containing values of the arranged ordered placed biological data records or arranged placed representative records, the displaying of portions of the table according to a predetermined key with respect to each displayed value. (Dependent claims: 53, 54, 55)
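Claim 52 orders the table rows by the nonhierarchical cluster of each sample and then arranges them according to the hierarchical clustering of the clusters. A minimal sketch, assuming the hierarchical result is already available as a left-to-right list of cluster ids (for example, the leaf order of a dendrogram built over the representatives); `table_row_order` is a hypothetical name.

```python
from typing import List, Sequence


def table_row_order(labels: Sequence[int], cluster_order: Sequence[int]) -> List[int]:
    """Return row indices grouped by nonhierarchical cluster, clusters in hierarchical order.

    labels[i] is the nonhierarchical cluster of row i; cluster_order is the
    hierarchical (leaf) order of those clusters. Rows within one cluster keep
    their original relative order because Python's sort is stable.
    """
    rank = {cluster: pos for pos, cluster in enumerate(cluster_order)}
    missing = set(labels).difference(rank)
    if missing:
        raise ValueError(f"clusters missing from hierarchical order: {sorted(missing)}")
    return sorted(range(len(labels)), key=lambda i: rank[labels[i]])
```

For instance, with row labels `[1, 0, 2, 0, 1]` and hierarchical order `[2, 0, 1]`, the rows are displayed in the order `[2, 1, 3, 0, 4]`. Because only the k clusters were hierarchically clustered, a single sort key per row is enough to lay out the full table.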
Specification