Techniques for facilitating identification of candidate genes

US 6,470,277 B1
Filed: 07/28/2000
Issued: 10/22/2002
Est. Priority Date: 07/30/1999
Status: Expired due to Fees

3 Assignments

0 Petitions

Abstract

Techniques for facilitating the identification of candidate genes from a plurality of DNA sequences. According to an embodiment of the present invention, techniques are provided for extracting and integrating information from various information sources and results of various analyses, and storing the integrated information in a form which is conducive to identification of candidate genes. The stored information may include results of a homology search for the plurality of DNA sequences, annotative information for the plurality of DNA sequences indicating the biochemical functions and physiological roles of the plurality of DNA sequences, gene expression profile data for the plurality of DNA sequences describing behavioral patterns of the plurality of DNA sequences, results from clustering the plurality of DNA sequences based on time course data as described by the gene expression profile data, and other information.
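As a rough illustration of the integration described in the abstract, the sketch below loads per-sequence homology results, annotation, a reference score, and a cluster assignment into a single table and then searches it. The schema, column names, sample rows, and the use of SQLite are all assumptions made for this example; the patent does not specify a storage format, and the raw expression profiles are reduced here to a precomputed cluster id.

```python
import sqlite3


def build_store(records):
    """Load integrated per-sequence records into an in-memory SQLite table.

    The column layout is hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE sequences (
               seq_id TEXT PRIMARY KEY,
               best_homolog TEXT,      -- from the homology search
               identity REAL,          -- percent identity to best_homolog
               annotation TEXT,        -- extracted function / role
               reference_score REAL,   -- acceptance level of the annotation
               cluster_id INTEGER      -- from expression-profile clustering
           )"""
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn


def find_candidates(conn, cluster_id, min_reference_score):
    """Return ids of sequences in a cluster whose annotation is well accepted."""
    rows = conn.execute(
        "SELECT seq_id FROM sequences "
        "WHERE cluster_id = ? AND reference_score >= ? "
        "ORDER BY seq_id",
        (cluster_id, min_reference_score),
    )
    return [row[0] for row in rows]


# Made-up sample data: seq003 has no homolog, so it carries no annotation.
RECORDS = [
    ("seq001", "kinase_A", 98.5, "protein phosphorylation", 0.9, 1),
    ("seq002", "kinase_B", 91.0, "signal transduction", 0.4, 1),
    ("seq003", None, None, None, None, 1),
    ("seq004", "channel_C", 99.1, "ion transport", 0.8, 2),
]

conn = build_store(RECORDS)
print(find_candidates(conn, cluster_id=1, min_reference_score=0.5))
```

Keeping every kind of evidence in one row per sequence is what lets a single query combine criteria from different analyses; unannotated sequences simply drop out of score-based filters because their score is NULL.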

172 Citations

11 Claims

1. In a computer system, a method of identifying candidate genes from a plurality of DNA sequences, the method comprising:
- obtaining results of a homology search for a first plurality of DNA sequences, the homology search results comprising information about homologs of the first plurality of DNA sequences;
  
  obtaining annotative information for the first plurality of DNA sequences, the annotative information comprising information about biochemical functions and physiological roles of the first plurality of DNA sequences, wherein obtaining the annotative information comprises:
  
  identifying one or more known genes from the first plurality of DNA sequences based on the homology search results, wherein a DNA sequence from the first plurality of DNA sequences is identified as a known gene if a sequence identity of the DNA sequence to a sequence stored in a first database of sequences used for the homology search is at least equal to a first threshold value;
  
  accessing one or more information sources storing annotative information for DNA sequences;
  
  extracting annotative information from the one or more information sources for the known genes, the extracted annotative information comprising information about one or more biochemical functions and physiological roles of each known gene; and
  
  assigning a reference score to the extracted annotative information for each known gene based on the level of acceptance of the roles or functions of the known gene as described by the annotative information such that annotative information with a high level of acceptance is assigned a higher reference score than annotative information with a low level of acceptance;
  
  obtaining gene expression profile data for the first plurality of DNA sequences, the gene expression profile data describing behavioral patterns of the first plurality of DNA sequences;
  
  clustering the first plurality of DNA sequences based on the behavioral patterns of the first plurality of DNA sequences as described by the gene expression profile data;
  
  storing the results of the homology search, the annotative information, the reference score assigned to the extracted annotative information for each known gene, the gene expression profile data, and results from clustering the first plurality of DNA sequences in a database;
  
  receiving a query identifying criteria for the candidate genes; and
  
  searching the database, in response to the query, to identify a set of DNA sequences from the first plurality of DNA sequences which satisfy the query criteria.
2. The method of claim 1 wherein the homology search for the first plurality of DNA sequences comprises performing BLAST analysis, Smith-Waterman analysis, Hidden Markov Model (HMM) analysis, and EMotif analysis.
3. The method of claim 2 wherein performing the BLAST analysis, the Smith-Waterman analysis, the Hidden Markov Model (HMM) analysis, and the EMotif analysis comprises:
4. The method of claim 1 wherein the one or more information sources include Genbank database, SWISS-PROT database, Medline database, and biomedical publications.
5. The method of claim 1 wherein:
- accessing the one or more information sources comprises accessing biomedical publications;
  
  assigning the reference score to the extracted annotative information for each known gene comprises:
  
  for annotative information extracted from each biomedical publication:
  
  assigning a reference score to the extracted annotative information based on characteristics of the biomedical publication, the reference score indicating the level of acceptance of the roles or functions of the known genes as described by the annotative information extracted from the biomedical publication.
6. The method of claim 5 wherein assigning the reference score comprises:
- using a score derived from a citation index database to calculate the reference score, the score derived from the citation index database indicating the number of times that the annotative information from the biomedical publication was referenced by other information sources.
7. The method of claim 5 wherein assigning the reference score further comprises:
- ranking the biomedical publications; and
  
  assigning the reference score to the annotative information extracted from the biomedical publication based on the ranking of the biomedical publication.
8. The method of claim 1 wherein clustering the first plurality of DNA sequences comprises determining relationships between clusters of DNA sequences from the first plurality of DNA sequences.
9. The method of claim 1 wherein clustering the first plurality of DNA sequences comprises clustering the first plurality of DNA sequences based on time-course data described by the gene expression profile data.
10. The method of claim 1 wherein storing the information in the database comprises correlating the annotative information for the first plurality of DNA sequences with the gene expression profile data for the first plurality of DNA sequences.
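
Two steps recited in claims 1 and 5 to 7 lend themselves to a short sketch: flagging a sequence as a known gene when its identity to a database sequence meets a threshold, and scoring extracted annotation by how well accepted its source is. The threshold value, the logarithmic citation scaling, the rank scaling, and the equal weighting below are assumptions chosen for illustration. The claims state only that a threshold exists and that higher acceptance yields a higher score, and claims 6 and 7 are separate dependent claims that are combined here purely for convenience.

```python
import math

# Assumed value; the claims say only "a first threshold value".
IDENTITY_THRESHOLD = 95.0


def known_genes(best_hit_identity, threshold=IDENTITY_THRESHOLD):
    """Return ids of sequences whose best-hit percent identity is >= threshold."""
    return sorted(
        seq_id
        for seq_id, identity in best_hit_identity.items()
        if identity >= threshold
    )


def reference_score(citation_count, publication_rank, n_publications):
    """Score annotation from one publication on a 0..1 scale (hypothetical formula).

    citation_count   -- times the publication was cited (citation-index style)
    publication_rank -- 1 for the top-ranked publication, n_publications for the last
    """
    # Log scaling so that a few highly cited papers do not dominate;
    # the score saturates at an assumed ceiling of 1000 citations.
    citation_part = min(math.log1p(citation_count) / math.log1p(1000), 1.0)
    # Linear scaling of rank: best rank -> 1.0, worst rank -> 0.0.
    rank_part = 1.0 - (publication_rank - 1) / max(n_publications - 1, 1)
    return 0.5 * citation_part + 0.5 * rank_part


hits = {"seq_a": 99.0, "seq_b": 80.0, "seq_c": 95.0}
print(known_genes(hits))
print(reference_score(citation_count=500, publication_rank=1, n_publications=10))
```

Because the comparison is "at least equal to", a sequence sitting exactly on the threshold (seq_c here) counts as a known gene.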

11. In a computer system, a method of identifying candidate genes comprising:
- configuring a query identifying criteria for the candidate genes;
  
  communicating the query to a server storing information related to a plurality of DNA sequences, the information comprising:
  
  results of a homology search for the plurality of DNA sequences, the homology search results comprising information about homologs of the plurality of DNA sequences;
  
  annotative information about the biochemical functions and physiological roles of the plurality of DNA sequences, wherein the annotative information is obtained by:
  
  identifying known genes from the plurality of DNA sequences based on the homology search results, wherein a DNA sequence from the plurality of DNA sequences is identified as a known gene if a sequence identity of the DNA sequence to a sequence stored in a database of sequences used for the homology search is at least equal to a first threshold value; and
  
  accessing one or more information sources storing annotative information for DNA sequences;
  
  extracting annotative information from the one or more information sources for the known genes, the extracted annotative information comprising information about one or more biochemical functions and physiological roles of each known gene; and
  
  assigning a reference score to the extracted annotative information for each known gene based on the level of acceptance of the roles or functions of the known gene as described by the annotative information such that annotative information with a high level of acceptance is assigned a higher reference score than annotative information with a low level of acceptance, wherein the annotative information stored by the server includes the reference score assigned to the extracted annotative information for each known gene;
  
  information describing behavioral patterns of the plurality of DNA sequences; and
  
  results from clustering the plurality of DNA sequences based on the behavioral patterns of the plurality of DNA sequences as described by the gene expression profile data; and
  
  receiving from the server, in response to the query, a first set of DNA sequences from the plurality of DNA sequences, wherein the first set of DNA sequences satisfy the criteria for the candidate genes identified in the query.
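
The client-side flow of claim 11 (configure a query, communicate it to a server, receive the matching sequences) can be sketched as a round trip over a serialized query. The `CandidateQuery` fields, the JSON encoding, and the record layout are hypothetical, and the network transport is left out: `handle_query` stands in for whatever the server would run on its stored information.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CandidateQuery:
    """Criteria for candidate genes (field names are illustrative only)."""
    cluster_id: Optional[int] = None
    min_reference_score: float = 0.0
    annotation_keyword: Optional[str] = None


def encode_query(query):
    """Client side: serialize the configured query for transmission."""
    return json.dumps(asdict(query))


def handle_query(payload, records):
    """Server side: decode the query and return ids of matching sequences."""
    q = CandidateQuery(**json.loads(payload))
    matches = []
    for rec in records:
        if q.cluster_id is not None and rec["cluster_id"] != q.cluster_id:
            continue
        if rec["reference_score"] < q.min_reference_score:
            continue
        if q.annotation_keyword and (
            q.annotation_keyword.lower() not in rec["annotation"].lower()
        ):
            continue
        matches.append(rec["seq_id"])
    return json.dumps(matches)


# Made-up server-side records.
SERVER_RECORDS = [
    {"seq_id": "seq001", "annotation": "Protein kinase", "reference_score": 0.9, "cluster_id": 1},
    {"seq_id": "seq002", "annotation": "Ion channel", "reference_score": 0.7, "cluster_id": 1},
    {"seq_id": "seq003", "annotation": "Protein kinase", "reference_score": 0.2, "cluster_id": 2},
]

query = CandidateQuery(cluster_id=1, min_reference_score=0.5, annotation_keyword="kinase")
response = handle_query(encode_query(query), SERVER_RECORDS)
print(json.loads(response))
```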

Current Assignee: AGY Therapeutics, Inc.
Original Assignee: AGY Therapeutics, Inc.
Inventors: Hendrix, Donna; Zhao, Oliver; Chin, Daniel J.
Primary Examiner(s): Brusca, John S.
Assistant Examiner(s): Kim, Young J
Application Number: US09/628,202
Time in Patent Office: 816 Days
Field of Search: 707/10, 707/104, 707/3, 707/6, 707/7, 702/27, 702/19, 706/45, 706/47
US Class Current: 702/19
CPC Class Codes

G16B 20/00   ICT specially adapted for f...

G16B 25/00   ICT specially adapted for h...

G16B 25/10   Gene or protein expression ...

G16B 30/00   ICT specially adapted for s...

G16B 30/10   Sequence alignment; Homolog...

G16B 40/00   ICT specially adapted for b...

G16B 40/20   Supervised data analysis

G16B 40/30   Unsupervised data analysis

G16B 50/00   ICT programming tools or da...

G16B 50/20   Heterogeneous data integration

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99936   Pattern matching access

Y10S 707/99937   Sorting
