METHODS AND SYSTEMS FOR CUSTOMIZABLE CLUSTERING OF SUB-NETWORKS FOR BIOINFORMATICS AND HEALTH CARE APPLICATIONS

US 20150213115A1
Filed: 01/23/2015
Published: 07/30/2015
Est. Priority Date: 01/24/2014
Status: Active Grant

First Claim

Patent Images

1. A method for clustering a plurality of sub-networks comprising:

receiving as input in a computing device, one or more expression data sets of one or more samples;

preprocessing the input expression data for obtaining a plurality of seed markers;

wherein the seed markers are biomarkers or input marker genes, and obtained by methods such but not limited to thresholding normalized expression data based on a predefined threshold value;

extracting a set of sub-networks for the input samples using expression values,set of seed markers obtained as above and the interaction network;

selecting sub networks among the plurality of the extracted or input sub-networks;

building a plurality of local heaps for each cluster among a plurality of clusters by computing a first link between each cluster and remaining clusters of the plurality of clusters, wherein each of the plurality of clusters correspond to the selected sub-networks;

building a global heap by computing a second link between each cluster among the plurality of clusters and a highest ranked cluster of each of the local heap among the plurality of local heaps;

merging the highest ranked cluster of each local heap and a highest ranked cluster of the global heap to form a plurality of intermediate clusters;

calculating a similarity coefficient between each intermediate cluster among the plurality of intermediate clusters and each cluster in the global heap and each cluster corresponding to one of the local heap; and

returning each intermediate cluster as a final cluster, if each the calculated similarity coefficients are below a predefined link cutoff value.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Methods and devices for clustering a plurality of sub-networks of a larger interaction network using an enhanced hierarchical clustering algorithm are disclosed. The methods provide expression based sub-network generation using differentially expressed markers. The enhanced hierarchical clustering algorithm clusters the generated sub-networks based on a user defined customizable similarity coefficient. The methods use non-Boolean links to cluster similar sub-networks. This provides consideration of indirect relationships among sub-networks. The customizable similarity coefficient enables the methods to be used for diverse applications such as biomarker detection, patient stratification, personalized therapy, drug efficacy prediction, genetic similarity analysis in genetic diseases. The methods enable patient grouping based on the enhanced hierarchical clustering algorithm.

Citations

19 Claims

1. A method for clustering a plurality of sub-networks comprising:
- receiving as input in a computing device, one or more expression data sets of one or more samples;
  
  preprocessing the input expression data for obtaining a plurality of seed markers;
  
  wherein the seed markers are biomarkers or input marker genes, and obtained by methods such but not limited to thresholding normalized expression data based on a predefined threshold value;
  
  extracting a set of sub-networks for the input samples using expression values,set of seed markers obtained as above and the interaction network;
  
  selecting sub networks among the plurality of the extracted or input sub-networks;
  
  building a plurality of local heaps for each cluster among a plurality of clusters by computing a first link between each cluster and remaining clusters of the plurality of clusters, wherein each of the plurality of clusters correspond to the selected sub-networks;
  
  building a global heap by computing a second link between each cluster among the plurality of clusters and a highest ranked cluster of each of the local heap among the plurality of local heaps;
  
  merging the highest ranked cluster of each local heap and a highest ranked cluster of the global heap to form a plurality of intermediate clusters;
  
  calculating a similarity coefficient between each intermediate cluster among the plurality of intermediate clusters and each cluster in the global heap and each cluster corresponding to one of the local heap; and
  
  returning each intermediate cluster as a final cluster, if each the calculated similarity coefficients are below a predefined link cutoff value.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 19)
- - 2. The method as in claim 1, wherein a value of the link is based on a user defined customizable similarity coefficient used for computing a functional relationship quantifier.
  - 3. The method as in claim 1, wherein the method further comprises pushing the each intermediate cluster into the global heap if the calculated link is above a predefined link cutoff value.
  - 4. The method as in claim 1, wherein the method comprises building the local heap for the each cluster by adding each cluster from the remaining clusters to each local heap if the computed link for the cluster is above a predefined link cutoff value.
  - 5. The method as in claim 1, wherein the method comprises ranking at least one cluster in each local heap and at least one cluster in the global heap to determine the highest ranked cluster in each local heap and the highest ranked cluster in the global heap based on a value of the computed link for the at least one cluster in the each local heap and the at least one cluster in the global heap.
  - 6. The method as in claim 1, wherein the method further comprises performing grouping based on the enhanced hierarchical clustering algorithm by:
    - generating the plurality of sub-networks for each sample among a plurality of samples;
      
      clustering sub-networks within the plurality of sub-networks of each sample using the enhanced hierarchical clustering algorithm based on the customizable similarity coefficient; and
      
      generating a data set of clusters by pooling clusters across the plurality of samples.
  - 7. The method as in claim 5, wherein the method further comprises:
    - initializing a plurality of clusters-of-interest from the data set of clusters;
      
      growing the clusters-of-interest using the data set of clusters;
      
      determining membership of each sample in each the plurality of cluster-of-interest based on the clustered sub-networks for the each sample; and
      
      grouping the plurality of samples into a group among a plurality of groups based on the determined membership of the sample, wherein samples in the group exhibit identical cluster memberships.
  - 8. The method as in claim 5, wherein the method further comprises generating the plurality of sub-networks by:
    - generating a set of first level sub-networks around a plurality of seed markers based on differential marker expression;
      
      growing the set of generated first level sub-networks based on a predefined scoring function;
      
      wherein the predefined scoring function is defined as but not limited to the first derivative of a log likelihood function, andmerging the set of grown first level sub-networks based on one of;
      
      the enhanced clustering algorithm and a predefined similarity coefficient to generate the plurality of sub-networks; and
      
      merging of the sub-networks using a highest differentially expressed marker in a neighborhood.
  - 9. The method as in claim 6, wherein the similarity coefficient can be based on similarity measures such as but not limited to Jaccard coefficient, Edge interaction coefficient (EIC) and Common neighborhood interaction coefficient (CNIC).
  - 19. A computer program that is implemented by hardware and is stored in a medium to execute the method of claim 1.

10. A device for clustering a plurality of sub-networks derived from a larger network using an enhanced hierarchical clustering algorithm, wherein the device comprises:
- an integrated circuit further comprising at least one processor;
  
  at least one memory having a computer program code within the circuit;
  
  the at least one memory and the computer program code with the at least one processor cause the device, when the computer program code is executed by the processor, to;
  
  receive a data set representing a plurality of sub-networks derived from a network;
  
  select sub networks among the plurality of sub-networks;
  
  build a plurality of local heaps for each cluster among a plurality of clusters by computing a first link between each cluster and remaining clusters of the plurality of clusters, wherein the plurality of clusters correspond to a plurality of selected sub-networks among the plurality of sub-networks;
  
  build a global heap by computing a second link between each cluster among the plurality of clusters and a highest ranked cluster of each the local heap among the plurality of local heaps;
  
  merge the highest ranked cluster of each local heap and a highest ranked cluster of the global heap to form a plurality of intermediate clusters;
  
  calculate a similarity coefficient between each intermediate cluster among the plurality of intermediate clusters and each cluster in the global heap, each cluster corresponding to one of the local heap; and
  
  return each intermediate cluster as a final cluster, if each the calculated link is below a predefined link cutoff value.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
- - 11. The device as in claim 10, wherein a value of the link is based on a user defined customizable similarity coefficient used for computing a functional relationship quantifier.
  - 12. The device as in claim 10, wherein the device is further configured to push each intermediate cluster into the global heap if each the calculated link is above the predefined link cutoff value.
  - 13. The device as in claim 10, wherein the device is configured to build the local heap for the each cluster by adding each cluster from the remaining clusters to each local heap if the computed link for the cluster is above the predefined link cutoff value.
  - 14. The device as in claim 10, wherein the device is configured to rank at least one cluster in each local heap and at least one cluster in the global heap to determine the highest ranked cluster in each local heap and the highest ranked cluster in the global heap based on a value of the computed link for the at least one cluster in each local heap and the at least one cluster in the global heap.
  - 15. The device as in claim 10, wherein the device is further configured to perform patient grouping based on the enhanced hierarchical clustering algorithm by:
    - generating the plurality of sub-networks for each patient among a plurality of patients;
      
      clustering sub-networks within the plurality of sub-networks of each patient using the enhanced hierarchical clustering algorithm based on the customizable similarity coefficient; and
      
      generating a data set of clusters by pooling clusters across the plurality of patients.
  - 16. The device as in claim 15, wherein the device is further configured to:
    - initialize a plurality of clusters-of-interest from the data set of clusters;
      
      grow the clusters-of-interest using the data set of clusters;
      
      determine membership of the each patient in each the plurality of cluster-of-interest based on the clustered sub-networks for the each patient; and
      
      group the plurality of patients into a group among a plurality of groups based on the determined membership of the each patient, wherein patients in the group exhibit identical cluster membership.
  - 17. The device as in claim 15, wherein the device is further configured to generate the plurality of sub-networks by:
    - generating a set of first level sub-network around a plurality of seed markers based on differential marker expression;
      
      growing the set of generated first level sub-networks based on a predefined scoring function; and
      
      merging the set of grown first level sub-networks based on one of;
      
      the enhanced hierarchical clustering algorithm and the predefined scoring function to generate the plurality of sub-networks, wherein the predefined scoring function is defined as the first derivative of a log likelihood function.
  - 18. The device as in claim 10, wherein the device is further configured to refine at least one biomarker by clustering sub-networks generated from an incomplete set of disease specific input marker genes, wherein the clustering is based on a customizable similarity coefficient.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Samsung Electronics Co. Ltd.
Original Assignee
Samsung Electronics Co. Ltd.
Inventors
Mukherjee, Subhankar, Ahn, Taejin, Bopardikar, Ajit S., Bhaduri, Anirban, Mallavarapu, Srikanth Rama

Granted Patent

US 9,690,844 B2
Time in Patent Office

Days
Field of Search
US Class Current

1/1
CPC Class Codes

G06F 16/287 Visualization; Browsing

METHODS AND SYSTEMS FOR CUSTOMIZABLE CLUSTERING OF SUB-NETWORKS FOR BIOINFORMATICS AND HEALTH CARE APPLICATIONS

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

19 Claims

Specification

Solutions

Use Cases

Quick Links

METHODS AND SYSTEMS FOR CUSTOMIZABLE CLUSTERING OF SUB-NETWORKS FOR BIOINFORMATICS AND HEALTH CARE APPLICATIONS

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

19 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links