METHODS AND SYSTEMS FOR CUSTOMIZABLE CLUSTERING OF SUB-NETWORKS FOR BIOINFORMATICS AND HEALTH CARE APPLICATIONS
First Claim
1. A method for clustering a plurality of sub-networks comprising:
receiving, as input in a computing device, one or more expression data sets of one or more samples;
preprocessing the input expression data for obtaining a plurality of seed markers;
wherein the seed markers are biomarkers or input marker genes, and are obtained by methods such as, but not limited to, thresholding normalized expression data based on a predefined threshold value;
extracting a set of sub-networks for the input samples using expression values, the set of seed markers obtained as above, and the interaction network;
selecting sub-networks among the plurality of the extracted or input sub-networks;
building a plurality of local heaps for each cluster among a plurality of clusters by computing a first link between each cluster and remaining clusters of the plurality of clusters, wherein each of the plurality of clusters corresponds to the selected sub-networks;
building a global heap by computing a second link between each cluster among the plurality of clusters and a highest ranked cluster of each of the local heaps among the plurality of local heaps;
merging the highest ranked cluster of each local heap and a highest ranked cluster of the global heap to form a plurality of intermediate clusters;
calculating a similarity coefficient between each intermediate cluster among the plurality of intermediate clusters and each cluster in the global heap and each cluster corresponding to one of the local heaps; and
returning each intermediate cluster as a final cluster, if each of the calculated similarity coefficients is below a predefined link cutoff value.
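The heap-driven merging recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each sub-network is reduced to a set of node names, uses Jaccard overlap as a stand-in similarity coefficient, and rebuilds the local and global heaps on every round for readability instead of updating them incrementally. The names `jaccard` and `cluster_subnetworks` are hypothetical.

```python
import heapq
from itertools import count


def jaccard(a, b):
    """Assumed similarity coefficient: overlap of two node sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_subnetworks(subnets, cutoff, similarity=jaccard):
    """Greedy agglomerative clustering driven by local and global heaps.

    subnets: dict mapping a sub-network name to its set of nodes.
    Returns a list of (sorted member names, merged node set) pairs.
    """
    ids = count()
    clusters = {next(ids): ((name,), set(nodes)) for name, nodes in subnets.items()}

    def local_heap(cid):
        # Max-heap (scores negated) of links from `cid` to every other cluster.
        heap = [(-similarity(clusters[cid][1], clusters[other][1]), other)
                for other in clusters if other != cid]
        heapq.heapify(heap)
        return heap

    while len(clusters) > 1:
        local = {cid: local_heap(cid) for cid in clusters}
        # Global heap: each cluster keyed by the best entry of its local heap.
        global_heap = [(heap[0][0], cid, heap[0][1]) for cid, heap in local.items()]
        heapq.heapify(global_heap)
        neg_score, a, b = global_heap[0]
        if -neg_score < cutoff:
            break  # the strongest remaining link is below the cutoff
        names_a, nodes_a = clusters.pop(a)
        names_b, nodes_b = clusters.pop(b)
        # The merged pair becomes a new intermediate cluster.
        clusters[next(ids)] = (names_a + names_b, nodes_a | nodes_b)
    return [(sorted(names), nodes) for names, nodes in clusters.values()]


example = {"s1": {"A", "B", "C"}, "s2": {"B", "C", "D"}, "s3": {"X", "Y"}}
result = cluster_subnetworks(example, cutoff=0.3)
```

In this toy input, `s1` and `s2` share two of four nodes and are merged, while `s3` shares nothing with them and stays separate. Passing a different `similarity` callable changes the grouping without touching the heap logic.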
Abstract
Methods and devices for clustering a plurality of sub-networks of a larger interaction network using an enhanced hierarchical clustering algorithm are disclosed. The methods provide expression-based sub-network generation using differentially expressed markers. The enhanced hierarchical clustering algorithm clusters the generated sub-networks based on a user-defined, customizable similarity coefficient. The methods use non-Boolean links to cluster similar sub-networks, which allows indirect relationships among sub-networks to be taken into account. The customizable similarity coefficient enables the methods to be used for diverse applications such as biomarker detection, patient stratification, personalized therapy, drug efficacy prediction, and genetic similarity analysis in genetic diseases. The methods enable patient grouping based on the enhanced hierarchical clustering algorithm.
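As a rough illustration of what a customizable similarity coefficient and a non-Boolean link could look like, the sketch below makes two assumptions that the abstract does not state: the coefficient is a user-weighted mix of node overlap and edge overlap, and the link between two sub-networks adds their direct similarity to the similarity routed through every third sub-network. The names `make_similarity` and `weighted_links` are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets, in the range 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_similarity(node_weight=0.5, edge_weight=0.5):
    """Return a similarity function mixing node overlap and edge overlap.

    The weights are the user-tunable part (an assumption of this sketch).
    """
    def similarity(net_a, net_b):
        nodes_a, edges_a = net_a
        nodes_b, edges_b = net_b
        return (node_weight * jaccard(nodes_a, nodes_b)
                + edge_weight * jaccard(edges_a, edges_b))
    return similarity


def weighted_links(networks, similarity):
    """Real-valued link: direct similarity plus similarity via third sub-networks."""
    names = list(networks)
    sim = {(i, j): similarity(networks[i], networks[j])
           for i in names for j in names if i != j}
    links = {}
    for i in names:
        for j in names:
            if i == j:
                continue
            indirect = sum(sim[i, k] * sim[k, j] for k in names if k not in (i, j))
            links[i, j] = sim[i, j] + indirect
    return links


# Each sub-network is a (node set, edge set) pair; edges are unordered gene pairs.
networks = {
    "a": ({"A", "B"}, {frozenset(("A", "B"))}),
    "b": ({"B", "C"}, {frozenset(("B", "C"))}),
    "c": ({"C", "D"}, {frozenset(("C", "D"))}),
}
links = weighted_links(networks, make_similarity(node_weight=1.0, edge_weight=0.0))
```

Here `a` and `c` share no nodes, so a Boolean "overlaps or not" link would be zero, yet their weighted link is positive because both overlap with `b`. That is one way to read the abstract's point about indirect relationships.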
19 Claims
1. A method for clustering a plurality of sub-networks (recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 19.
10. A device for clustering a plurality of sub-networks derived from a larger network using an enhanced hierarchical clustering algorithm, wherein the device comprises:
an integrated circuit further comprising at least one processor;
at least one memory having a computer program code within the circuit;
the at least one memory and the computer program code with the at least one processor cause the device, when the computer program code is executed by the processor, to:
receive a data set representing a plurality of sub-networks derived from a network;
select sub-networks among the plurality of sub-networks;
build a plurality of local heaps for each cluster among a plurality of clusters by computing a first link between each cluster and remaining clusters of the plurality of clusters, wherein the plurality of clusters correspond to a plurality of selected sub-networks among the plurality of sub-networks;
build a global heap by computing a second link between each cluster among the plurality of clusters and a highest ranked cluster of each of the local heaps among the plurality of local heaps;
merge the highest ranked cluster of each local heap and a highest ranked cluster of the global heap to form a plurality of intermediate clusters;
calculate a similarity coefficient between each intermediate cluster among the plurality of intermediate clusters and each cluster in the global heap, each cluster corresponding to one of the local heaps; and
return each intermediate cluster as a final cluster, if each of the calculated links is below a predefined link cutoff value.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
Specification