
Categorization and filtering of scientific data

  • US 9,141,913 B2
  • Filed: 03/04/2009
  • Issued: 09/22/2015
  • Est. Priority Date: 12/16/2005
  • Status: Active Grant
First Claim

1. A computer implemented method for evaluating a correlation between (i) a gene, a SNP, a SNP pattern, a portion of gene, a region of a genome, or a compound, and (ii) a disease or a genotype, the method comprising:

  • providing a taxonomy of categories of diseases and/or phenotypes arranged in a hierarchical structure comprising at least one top-level category;

    providing a plurality of feature sets, each feature set comprising (a) two or more features, (b) associated experimentally-derived statistical information indicating one or more of: differential expression of said features, abundance of said features, responses of said features to a treatment or stimulus, and effects of said features on biological systems, and (c) a feature rank indicating the importance of the feature in an experiment from which the statistical information was derived, wherein
      the features are genes, SNPs, SNP patterns, portions of genes, regions of a genome, or compounds,
      at least some of the features have different names but correspond to a same gene, SNP, SNP pattern, portion of gene, region of a genome, or compound,
      the plurality of feature sets is obtained from across different experiments, platforms, and/or organisms, and
      at least some of said feature sets are associated with one or more categories in the taxonomy;

    providing a plurality of globally unique mapping identifiers;

    identifying, for each globally unique mapping identifier, one or more features associated with the globally unique mapping identifier;

    mapping, for each globally unique mapping identifier, the identified one or more features to the globally unique mapping identifier, thereby providing mapping data indicating mapping between a plurality of features and the plurality of globally unique mapping identifiers, wherein at least some features having different names but corresponding to a same gene, SNP, SNP pattern, portion of gene, region of a genome, or compound are mapped to a same globally unique mapping identifier;

    storing the mapping data in an index set;

    identifying, for each of a plurality of the categories in the taxonomy, contributing feature sets that contribute to scoring a category under consideration by identifying all feature sets among the provided feature sets that are associated with the category under consideration and its child categories in the taxonomy;

    combining the feature ranks of all features in the contributing feature sets that can be mapped to a globally unique mapping identifier under consideration based on the mapping data in the index set to obtain an overall score; and

    evaluating a correlation between (i) a gene, a SNP, a SNP pattern, a portion of gene, a region of a genome, or a compound corresponding to the globally unique mapping identifier under consideration, and (ii) a disease or a genotype corresponding to the category under consideration based on the obtained overall score.
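
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the taxonomy, the "GID:n" identifiers, the feature set names, and the ranks are hypothetical, and the claim says only that feature ranks are "combined", so the normalized-rank sum used here is an assumed combination rule chosen for illustration.

```python
# Illustrative sketch only. All data, the "GID:n" identifiers, and the
# rank-combination rule are assumptions, not taken from the patent.

# Hypothetical taxonomy: category -> child categories.
TAXONOMY = {
    "disease": ["cancer", "metabolic disease"],
    "cancer": ["breast cancer", "lung cancer"],
    "metabolic disease": [],
    "breast cancer": [],
    "lung cancer": [],
}

# Hypothetical globally unique mapping identifiers and the differently
# named features that correspond to the same underlying gene.
SYNONYMS_BY_ID = {
    "GID:1": ["TP53", "p53", "ENSG00000141510"],
    "GID:2": ["BRCA1"],
    "GID:3": ["EGFR", "ERBB1"],
}

# Hypothetical feature sets; rank 1 is the most important feature.
FEATURE_SETS = [
    {"name": "fs_breast_array", "categories": ["breast cancer"],
     "ranks": {"p53": 1, "BRCA1": 2, "EGFR": 3}},
    {"name": "fs_lung_rnaseq", "categories": ["lung cancer"],
     "ranks": {"ERBB1": 1, "TP53": 2}},
    {"name": "fs_metabolic", "categories": ["metabolic disease"],
     "ranks": {"ENSG00000141510": 3, "BRCA1": 1, "EGFR": 2}},
]


def build_index_set(synonyms_by_id):
    """Map every feature name to its globally unique mapping identifier."""
    index_set = {}
    for global_id, names in synonyms_by_id.items():
        for name in names:
            index_set[name] = global_id
    return index_set


def descendants(category, taxonomy):
    """Return the category followed by all of its descendant categories."""
    found = [category]
    for child in taxonomy.get(category, []):
        found.extend(descendants(child, taxonomy))
    return found


def contributing_feature_sets(category, feature_sets, taxonomy):
    """Select feature sets tagged with the category or any of its children."""
    wanted = set(descendants(category, taxonomy))
    return [fs for fs in feature_sets if wanted & set(fs["categories"])]


def overall_score(global_id, category, feature_sets, taxonomy, index_set):
    """Combine ranks of all features that map to global_id (assumed rule).

    Each rank is normalized to (n - rank + 1) / n, where n is the size of
    the feature set, and the normalized values are summed.
    """
    score = 0.0
    for fs in contributing_feature_sets(category, feature_sets, taxonomy):
        n = len(fs["ranks"])
        for name, rank in fs["ranks"].items():
            if index_set.get(name) == global_id:
                score += (n - rank + 1) / n
    return score


index_set = build_index_set(SYNONYMS_BY_ID)
# "p53" (rank 1 of 3) and "TP53" (rank 2 of 2) both resolve to GID:1.
print(overall_score("GID:1", "cancer", FEATURE_SETS, TAXONOMY, index_set))  # 1.5
```

In this sketch, scoring "cancer" draws on the feature sets tagged "breast cancer" and "lung cancer" because they are child categories, and the two differently named features resolve to one identifier through the index set before their ranks are combined. A higher overall score would then be read as stronger evidence of a correlation between the identifier and the category, under the assumed scoring rule.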
