Techniques for facilitating identification of candidate genes
Abstract
Techniques for facilitating the identification of candidate genes from a plurality of DNA sequences. According to an embodiment of the present invention, techniques are provided for extracting and integrating information from various information sources and results of various analyses, and storing the integrated information in a form which is conducive to identification of candidate genes. The stored information may include results of a homology search for the plurality of DNA sequences, annotative information for the plurality of DNA sequences indicating the biochemical functions and physiological roles of the plurality of DNA sequences, gene expression profile data for the plurality of DNA sequences describing behavioral patterns of the plurality of DNA sequences, results from clustering the plurality of DNA sequences based on time course data as described by the gene expression profile data, and other information.
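The clustering step named in the abstract groups DNA sequences by the behavior of their time-course expression profiles. The minimal sketch below assumes that each profile is a list of expression levels sampled at the same time points. The patent does not say which clustering method is used, so the sketch swaps in one simple technique: single-linkage grouping on Pearson correlation at a fixed threshold. The names `pearson`, `cluster_by_profile`, `threshold`, and the sample profiles are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def cluster_by_profile(profiles, threshold=0.9):
    """Single-linkage grouping: two sequences share a cluster if their profiles
    correlate at or above `threshold`, directly or through a chain of such pairs."""
    ids = list(profiles)
    parent = {i: i for i in ids}  # union-find forest over sequence IDs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels, result = {}, {}
    for i in ids:
        root = find(i)
        labels.setdefault(root, len(labels))
        result[i] = labels[root]
    return result


# Hypothetical time-course profiles (expression level at four time points).
profiles = {
    "seq1": [1.0, 2.0, 3.0, 4.0],
    "seq2": [2.0, 4.0, 6.0, 8.0],  # same shape as seq1, larger amplitude
    "seq3": [4.0, 3.0, 2.0, 1.0],  # opposite trend
}
clusters = cluster_by_profile(profiles)
print(clusters)  # {'seq1': 0, 'seq2': 0, 'seq3': 1}

# Candidates co-expressed with a known seed sequence: same cluster, seed excluded.
seed = "seq1"
co_expressed = sorted(s for s, c in clusters.items() if c == clusters[seed] and s != seed)
print(co_expressed)  # ['seq2']
```

Correlation compares the shape of a profile rather than its amplitude, which is why seq1 and seq2 land in the same cluster even though seq2 is expressed at twice the level. This is one reasonable reading of "behavioral patterns," not necessarily the one the patent intends.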
27 Claims
1. A computer-implemented method of identifying candidate genes from a plurality of DNA sequences, the method comprising:
obtaining results of a homology search for the plurality of DNA sequences, the homology search results comprising information about homologs of the plurality of DNA sequences;
obtaining annotative information for the plurality of DNA sequences, the annotative information comprising information about the biochemical functions and physiological roles of the plurality of DNA sequences;
obtaining gene expression profile data for the plurality of DNA sequences, the gene expression profile data describing behavioral patterns of the plurality of DNA sequences;
clustering the plurality of DNA sequences based on the behavioral patterns of the plurality of DNA sequences as described by the gene expression profile data;
storing the results of the homology search, the annotative information, the gene expression profile data, and results from clustering the plurality of DNA sequences in a database;
receiving a query identifying criteria for the candidate genes; and
searching the database, in response to the query, to identify a set of DNA sequences from the plurality of DNA sequences which satisfy the query criteria.
13. A method of identifying candidate genes comprising:
configuring a query identifying criteria for the candidate genes;
communicating the query to a server storing information related to a plurality of DNA sequences, the information comprising:
results of a homology search for the plurality of DNA sequences, the homology search results comprising information about homologs of the plurality of DNA sequences;
information about the biochemical functions and physiological roles of the plurality of DNA sequences;
information describing behavioral patterns of the plurality of DNA sequences; and
results from clustering the plurality of DNA sequences based on the behavioral patterns of the plurality of DNA sequences as described by the gene expression profile data; and
receiving from the server, in response to the query, a first set of DNA sequences from the plurality of DNA sequences, wherein the first set of DNA sequences satisfy the criteria for the candidate genes identified in the query.
14. A data processing system for identifying candidate genes from a plurality of DNA sequences, the system comprising:
a processor; and
a memory coupled to the processor, the memory configured to store instructions for execution by the processor, the instructions comprising:
instructions for obtaining results of a homology search for the plurality of DNA sequences, the homology search results comprising information about homologs of the plurality of DNA sequences;
instructions for obtaining annotative information for the plurality of DNA sequences, the annotative information comprising information about the biochemical functions and physiological roles of the plurality of DNA sequences;
instructions for obtaining gene expression profile data for the plurality of DNA sequences, the gene expression profile data describing behavioral patterns of the plurality of DNA sequences;
instructions for clustering the plurality of DNA sequences based on the behavioral patterns of the plurality of DNA sequences as described by the gene expression profile data;
instructions for storing the results of the homology search, the annotative information, the gene expression profile data, and results from clustering the plurality of DNA sequences in the memory; and
instructions for searching the information stored in the memory, in response to a query identifying criteria for the candidate genes, to identify a set of DNA sequences from the plurality of DNA sequences which satisfy the query criteria.
25. A system for identifying candidate genes comprising:
a communication network;
a first computer coupled to the communication network; and
a second computer coupled to the communication network, the second computer configured to store:
results of a homology search for a plurality of DNA sequences, the homology search results comprising information about homologs of the plurality of DNA sequences;
information about the biochemical functions and physiological roles of the plurality of DNA sequences;
information describing behavioral patterns of the plurality of DNA sequences; and
results from clustering the plurality of DNA sequences based on the behavioral patterns of the plurality of DNA sequences as described by the gene expression profile data;
wherein the first computer is configured to communicate a query to the second computer, the query identifying criteria for the candidate genes; and
wherein the first computer is configured to receive from the second computer, in response to the query, a first set of DNA sequences from the plurality of DNA sequences which satisfy the criteria for the candidate genes identified in the query.
26. A computer program product stored on a computer-readable storage medium for identifying candidate genes from a plurality of DNA sequences, the computer program product comprising:
code for obtaining results of a homology search for the plurality of DNA sequences, the homology search results comprising information about homologs of the plurality of DNA sequences;
code for obtaining annotative information for the plurality of DNA sequences, the annotative information comprising information about the biochemical functions and physiological roles of the plurality of DNA sequences;
code for obtaining gene expression profile data for the plurality of DNA sequences, the gene expression profile data describing behavioral patterns of the plurality of DNA sequences;
code for clustering the plurality of DNA sequences based on the behavioral patterns of the plurality of DNA sequences as described by the gene expression profile data;
code for storing the results of the homology search, the annotative information, the gene expression profile data, and results from clustering the plurality of DNA sequences in a database;
code for receiving a query identifying criteria for the candidate genes; and
code for searching the database, in response to the query, to identify a set of DNA sequences from the plurality of DNA sequences which satisfy the query criteria.
27. A computer program product stored on a computer-readable storage medium for identifying candidate genes, the computer program product comprising:
code for configuring a query identifying criteria for the candidate genes;
code for communicating the query to a server storing information related to a plurality of DNA sequences, the information comprising:
results of a homology search for the plurality of DNA sequences, the homology search results comprising information about homologs of the plurality of DNA sequences;
information about the biochemical functions and physiological roles of the plurality of DNA sequences;
information describing behavioral patterns of the plurality of DNA sequences; and
results from clustering the plurality of DNA sequences based on the behavioral patterns of the plurality of DNA sequences as described by the gene expression profile data; and
code for receiving from the server, in response to the query, a first set of DNA sequences from the plurality of DNA sequences, wherein the first set of DNA sequences satisfy the criteria for the candidate genes identified in the query.
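The storing and searching steps recited in claims 1, 14, and 26 can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the patented implementation. The schema (tables `sequences`, `homologs`, `expression` and their columns), the `search` helper, and all records are hypothetical. The example query combines three of the stored information types: an annotation keyword, a cluster assignment, and the presence of at least one homolog.

```python
import sqlite3

# Hypothetical schema integrating the information recited in the claims.
# Table and column names are illustrative assumptions, not from the patent.
SCHEMA = """
CREATE TABLE sequences (
    seq_id       TEXT PRIMARY KEY,
    bio_function TEXT,    -- annotated biochemical function
    phys_role    TEXT,    -- annotated physiological role
    cluster_id   INTEGER  -- cluster assigned from the expression profiles
);
CREATE TABLE homologs (   -- homology search results, one row per hit
    seq_id     TEXT REFERENCES sequences(seq_id),
    homolog_id TEXT
);
CREATE TABLE expression ( -- expression profile data, one row per time point
    seq_id     TEXT REFERENCES sequences(seq_id),
    time_point INTEGER,
    level      REAL
);
"""


def build_database():
    """Create an in-memory database and load a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?, ?)",
        [
            ("seq1", "protein kinase", "stress response", 0),
            ("seq2", "membrane transporter", "nutrient uptake", 0),
            ("seq3", "kinase", "cell division", 0),
            ("seq4", "kinase", "cell division", 1),
        ],
    )
    conn.executemany(
        "INSERT INTO homologs VALUES (?, ?)",
        [("seq1", "hypothetical_homolog_A"), ("seq4", "hypothetical_homolog_B")],
    )
    # Placeholder expression levels: three time points per sequence.
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?, ?)",
        [(s, t, float(t)) for s in ("seq1", "seq2", "seq3", "seq4") for t in range(3)],
    )
    conn.commit()
    return conn


def search(conn, function_keyword, cluster_id):
    """IDs of sequences whose annotated function contains `function_keyword`,
    that belong to `cluster_id`, and that have at least one homolog."""
    rows = conn.execute(
        """
        SELECT s.seq_id FROM sequences AS s
        WHERE s.bio_function LIKE '%' || ? || '%'
          AND s.cluster_id = ?
          AND EXISTS (SELECT 1 FROM homologs AS h WHERE h.seq_id = s.seq_id)
        ORDER BY s.seq_id
        """,
        (function_keyword, cluster_id),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_database()
print(search(conn, "kinase", 0))  # ['seq1']
print(search(conn, "kinase", 1))  # ['seq4']
```

Keeping homolog hits and time points in their own tables, rather than packing them into one row, lets a single SQL query filter on any combination of the stored information types. seq3 is excluded from the first result because it has no homolog row.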
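Claims 13, 25, and 27 split the work between a client that configures and sends a query and a server that stores the integrated information and returns matching DNA sequences. The minimal sketch below runs both sides in one process over a local HTTP connection, using only the standard library (`http.server`, `urllib.request`). The JSON query format, the `/query` path, the criteria keys `function_keyword` and `cluster_id`, and the reduced `STORE` are hypothetical; the patent does not specify a protocol.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical, reduced store on the server ("second computer").
STORE = {
    "seq1": {"function": "protein kinase", "cluster_id": 0},
    "seq2": {"function": "membrane transporter", "cluster_id": 0},
    "seq3": {"function": "kinase", "cluster_id": 1},
}


class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Decode the JSON criteria sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        criteria = json.loads(self.rfile.read(length))
        # A record matches only if it satisfies every criterion present.
        hits = sorted(
            seq_id
            for seq_id, rec in STORE.items()
            if criteria.get("function_keyword", "") in rec["function"]
            and ("cluster_id" not in criteria or rec["cluster_id"] == criteria["cluster_id"])
        )
        body = json.dumps(hits).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def send_query(url, criteria):
    """Client ("first computer"): POST criteria as JSON, return the matching IDs."""
    request = urllib.request.Request(
        url,
        data=json.dumps(criteria).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass any env proxy
    with opener.open(request) as response:
        return json.loads(response.read())


server = HTTPServer(("127.0.0.1", 0), QueryHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/query"

result = send_query(url, {"function_keyword": "kinase"})
result_cluster = send_query(url, {"cluster_id": 0})
print(result)          # ['seq1', 'seq3']
print(result_cluster)  # ['seq1', 'seq2']

server.shutdown()
server.server_close()
```

Sending the criteria as data, rather than as code or SQL, keeps query evaluation on the server, where the integrated information lives. The client only needs to know the criteria vocabulary.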
Specification