SPANNING-TREE PROGRESSION ANALYSIS OF DENSITY-NORMALIZED EVENTS (SPADE)

US 20130060775A1
Filed: 03/02/2012
Published: 03/07/2013
Est. Priority Date: 12/27/2010
Status: Active Application

2 Assignments

0 Petitions

Abstract

Methods and systems for determining progression and other characteristics of microarray expression levels and similar information, alternatively using a network or communications medium or tangible storage medium or logic processor.

12 Citations

64 Claims

1. A computer implemented method of analyzing and sorting feature data from a large number of samples comprising:
- detecting features of said samples using a feature detecting system;
  
  determining numerical feature values representing said detected features;
  
  storing said numerical feature values in an initial sample database in a digital memory at a computer system, said initial sample database comprising an array with dimensions roughly equal to the number of said samples by the number of different feature values stored for each sample;
  
  density-dependent downsampling said sample database using executable logic at said computer system by determining a local density value for samples in said array and removing a portion of samples in dense regions of said array;
  
  storing a downsampled sample database comprising a downsampled array in said digital memory at said computer system;
  
  clustering samples in said downsampled array by agglomerative clustering using executable logic at said computer system to determine a plurality of sample clusters;
  
  storing data regarding said sample clusters in said digital memory at said computer system;
  
  determining one or more progression trees connecting said clusters using said executable logic at said computer system;
  
  storing data regarding said progression trees at said computer system, and
  
  said computer system outputting to a user multiple representations of a progression tree of said clusters, a topology of said representations indicating a progression or hierarchy of said clusters, and color or other indicators of said representations indicating different feature values of said clusters.
- View Dependent Claims (4, 5, 7, 8, 13, 18, 19, 29, 31, 32, 40, 45, 52, 54, 63)
- - 4. The method of claim 1 further comprising:
    - said samples comprise about M samples;
      
      said features comprise about N features;
      
      said initial samples database comprises an array of about M samples by about N features;
      
      said downsampled sample database comprises a downsampled array of about M* samples by about N* features;
      
      said clusters comprise C clusters.
  - 5. The method of claim 1 further comprising:
    - outputting to a user at least two visual representations of said progression trees, each representation showing data on said progression tree (such as by color-coding) to indicate intensity of one or more of said features; and
      
      outputting one representation of said progression tree for each feature of interest.
  - 7. The method of claim 1 further comprising:
    - storing additional numerical feature values in said initial samples database, said additional values not used in said downsampling or said clustering;
      
      said computer system outputting to a user multiple representations of a progression tree of said clusters, the distance and connection of nodes of said multiple trees indicating a progression of said samples using said features, and color or other indicators of said nodes or said trees indicating one or more additional numerical features.
  - 8. The method of claim 1 further comprising:
    - selecting a first subset of said features to use for said density-dependent downsampling;
      
      selecting a second subset of said features to use for said clustering;
      
      selecting a third subset of said features to use for said determining one or more progression trees; and
      
      selecting a fourth subset of said features to use for said outputting.
  - 13. The method of claim 5 further comprising:
    - outputting to a user at least one visual representation of said progression tree that allows a user to interactively select clusters of samples to perform gating.
  - 18. The method of claim 1 further wherein said samples comprise a plurality of cells and said features comprise a number of detectable cellular features or characteristics, the method further providing:
    - constructing and outputting progression trees based on a selected subset of features to perform gating.
  - 19. The method of claim 1 further wherein said density-dependent downsampling comprises:
    - determining a local density for each sample in said sample array using said computer system;
      
      heavily downsampling samples with local densities above a target density (TD), so that their local densities reduce to approximately the target density after downsampling;
      
      preventing downsampling of samples with local densities less than said target density;
      
      such that abundant and rare sample types are relatively equally represented, and rare sample types are more likely to form their own clusters;
      
      such that the number of samples in said array is significantly reduced while most samples of the rare sample type remain after downsampling and an overall distribution or shape of samples in the array of the original dataset is preserved.
  - 29. The method of claim 1 further comprising:
    - after said clustering is complete, said computer system upsamples the cluster data by assigning each sample in the original dataset to a cluster, and said upsampling comprises:
      
      calculating median intensity and other statistics of each cluster with high accuracy;
      
      assigning each sample in the original dataset to one cluster by determining its nearest neighbor in the downsampled data and assigning the sample to the cluster that the nearest neighbor in the downsampled data belongs to.
  - 31. The method of claim 1 further wherein the determining comprises:
    - defining a fully connected undirected weighted graph wherein each node represents one sample cluster;
      
      determining a weight on the edge that connects nodes i and j, said weight defined as the Euclidean distance between the feature expressions of sample clusters i and j;
      
      applying Boruvka's algorithm to derive the MST from the fully connected graph;
      
      wherein since the MST connects all the nodes using minimum total edge weights, it tends to connect sample clusters that are more similar to each other;
      
      such that starting from one cluster and moving along the edges of the MST, a gradual change of feature expression levels is observed.
  - 32. The method of claim 1 further comprising:
    - analyzing dynamics of features under different perturbations;
      
      such that for a feature and a perturbation, determining for each cluster the ratio between the median intensities of features in the unstimulated (basal) condition and the median intensities of features in the stimulated (perturbed) condition;
      
      indicating dynamics of features under perturbations in said outputting of cluster representations (e.g., by color coding nodes) in said trees using said ratio.
  - 40. The method of claim 1 further comprising:
    - determining from said progression trees, one or more progression branches;
      
      comparing said progression branches across multiple features; and
      
      associating said branches with one or more hierarchies or progression states of said samples.
  - 45. The method of claim 1 further wherein said samples comprise a plurality of cells and said features comprise a number of detectable cellular features or characteristics, the method further providing:
    - constructing and outputting progression trees based on a subset of features to identify uncharacterized features by correlating their progression behavior to well-characterized features.
  - 52. The method of claim 1 further comprising:
    - determining a progression similarity between two or more features indicating the number of progressions supported in common by the features;
      
      wherein the progression similarity is an integer count of progressions concordant with the features according to a selected threshold.
  - 54. The method of claim 52 further wherein the progression similarity matrix quantifies the progression similarity between pairs of features, wherein the (u, v) element of the progression similarity matrix is the number of MSTs that are concordant with both features u and v.
  - 63. The method of claim 1 further comprising:
    - receiving into a provided memory readable and writable by a provided CPU the flow cytometry data, said flow cytometry data containing data values indicative of a plurality of features measured for a larger plurality of samples;
      
      clustering samples into a smaller number of sample clusters, where clusters are determined by comparing features across multiple samples from the received flow cytometry data;
      
      determining per-feature progressions for one or more selected features from the flow cytometry data;
      
      identifying progression-similar features by identifying which features have high progression similarity to multiple per-feature progressions; and
      
      using the progression-concordant features to determine a most likely overall progression of samples from flow cytometry data; and
      
      outputting said most likely overall progression of flow cytometry samples to a user using a provided computer output device.
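
Claims 19 and 29 above describe density-dependent downsampling and the later upsampling step in words only. The following is a minimal sketch of one possible reading, not the patented implementation. It assumes that local density is the count of samples within a fixed radius and that a sample denser than the target density is kept with probability target density divided by its local density; neither rule is stated in the claims. The names `local_densities`, `density_dependent_downsample`, `upsample`, `radius`, and `target_density` are hypothetical, and the region labels stand in for the clustering step.

```python
import math
import random


def local_densities(samples, radius):
    """For each sample, count the samples within `radius` (the sample itself included)."""
    return [sum(1 for q in samples if math.dist(p, q) <= radius) for p in samples]


def density_dependent_downsample(samples, radius, target_density, seed=0):
    """Return indices of retained samples.

    Samples at or below `target_density` are always kept. Denser samples are
    kept with probability target_density / density (an assumed rule).
    """
    rng = random.Random(seed)
    kept = []
    for index, density in enumerate(local_densities(samples, radius)):
        if density <= target_density or rng.random() < target_density / density:
            kept.append(index)
    return kept


def upsample(samples, kept, kept_labels):
    """Give every original sample the label of its nearest retained sample."""
    def nearest(p):
        return min(range(len(kept)), key=lambda k: math.dist(p, samples[kept[k]]))
    return [kept_labels[nearest(p)] for p in samples]


# Illustrative data: 200 points in one dense blob plus 3 rare, distant points.
rng = random.Random(1)
samples = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(200)]
samples += [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

kept = density_dependent_downsample(samples, radius=0.5, target_density=5)
# Stand-in for the clustering step: label retained samples by region.
kept_labels = ["rare" if i >= 200 else "abundant" for i in kept]
all_labels = upsample(samples, kept, kept_labels)
```

In this toy data the three rare points each have a local density of 3, below the target of 5, so they are always retained, while the dense blob is thinned. The brute-force distance loops are quadratic in the number of samples and are only meant to make the logic visible.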

2. (canceled)

3. A computer implemented method for clustering and visualization of multicolor flow cytometry data comprising:
- receiving cell samples from one or more subjects;
  
  analyzing the samples using a flow cytometer, thereby yielding a multi-dimensional data set;
  
  estimating a density function for cell sample points in said multi-dimensional data set;
  
  creating a down-sampled array by removing a portion of samples in dense regions of said array;
  
  clustering cell samples in said downsampled array by agglomerative clustering to determine a plurality of sample clusters;
  
  estimating one or more progression trees in a Euclidean space having a dimensionality of three or less representing progression or hierarchy of said clusters, where the steps of creating, clustering, and estimating are executed by a processor of a computing device; and
  
  graphically displaying relationships between clusters using data in the Euclidean space on a display of the computing device.
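
Claim 31 above names Boruvka's algorithm for deriving the MST from a fully connected graph whose edge weights are Euclidean distances between clusters. The sketch below illustrates that step under stated assumptions: each cluster is represented by a single point (for example its per-feature medians), and ties between equal-weight edges are broken by comparing index pairs. The names `boruvka_mst` and `cluster_medians` are hypothetical.

```python
import math


def boruvka_mst(points):
    """Return MST edges (i, j, weight) of the complete Euclidean graph on `points`."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    components = n
    while components > 1:
        cheapest = {}  # component root -> (weight, i, j) of its cheapest outgoing edge
        for i in range(n):
            for j in range(i + 1, n):
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                candidate = (math.dist(points[i], points[j]), i, j)
                for root in (ri, rj):
                    if root not in cheapest or candidate < cheapest[root]:
                        cheapest[root] = candidate
        # Merge every component along its cheapest edge, skipping edges
        # whose endpoints were already joined earlier in this round.
        for weight, i, j in cheapest.values():
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                edges.append((i, j, weight))
                components -= 1
    return edges


# Illustrative cluster representatives in two feature dimensions.
cluster_medians = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
mst_edges = boruvka_mst(cluster_medians)
```

For these four points the tree is the chain 0-1-2-3, so walking along its edges passes through clusters in order of gradually changing feature values, which is the behavior the claim describes.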

6. (canceled)

9-12. (canceled)

14-17. (canceled)

20-28. (canceled)

30. (canceled)

33-39. (canceled)

41-44. (canceled)

46-51. (canceled)

55-60. (canceled)

61. A system for flow cytometry or biologic analysis or diagnosis comprising:
- an input component reading sample data comprising multiple feature values for each sample;
  
  a density dependent downsampling component able to reduce the density of samples in a large dataset while preserving rare-samples and overall dataset shape;
  
  a clustering component and processor clustering samples into a number of sample clusters;
  
  a feature progression and differentiation determining component determining underlying sample cluster progression and differentiation using one or more of said feature values;
  
  a progression tree output and analysis module providing output and analysis of progression trees to determine hierarchy, progression, or differentiation of said samples.
- View Dependent Claims (62)
- - 62. The system of claim 61 further comprising:
    - a progression similarity analysis processor and component that compares features to progressions determined for different features to identify features that support common progressions;
      
      a feature selection component selecting features that support common progressions;
      
      a progression and differentiation determining component determining overall underlying sample progression and differentiation.
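
Claims 52, 54, and 62 refer to a progression similarity between features, with claim 54 defining the (u, v) element of the matrix as the number of MSTs concordant with both features u and v. The sketch below shows only that counting step. How concordance between a tree and a feature is decided is not specified in the claims shown here, so the boolean table `concordant` is assumed to be given; it and the function name `progression_similarity` are hypothetical.

```python
def progression_similarity(concordant):
    """Build the progression similarity matrix.

    concordant[t][f] is True when tree t is concordant with feature f.
    Element [u][v] of the result counts trees concordant with both u and v.
    """
    n_features = len(concordant[0]) if concordant else 0
    return [
        [sum(1 for row in concordant if row[u] and row[v]) for v in range(n_features)]
        for u in range(n_features)
    ]


# Illustrative table: 3 trees (rows) by 3 features (columns).
concordant = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
]
similarity = progression_similarity(concordant)
```

The matrix is symmetric, and each diagonal entry is the number of trees concordant with that single feature. Features with large off-diagonal counts are the ones that support common progressions.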

64-73. (canceled)

Current Assignee
Board of Trustees of the Leland Stanford Junior University (Stanford Management Co.)
Original Assignee
Board of Trustees of the Leland Stanford Junior University (Stanford Management Co.)
Inventors
Qiu, Peng; Gentles, Andrew J.; Plevritis, Sylvia K.; Nolan, Garry; Sachs, Karen

Granted Patent

US 10,289,802 B2
US Class Current

707/737
CPC Class Codes

G16B 25/00   ICT specially adapted for h...

G16B 40/00   ICT specially adapted for b...

G16B 40/30   Unsupervised data analysis

G16B 45/00   ICT specially adapted for b...
