Transcript mapping method
Abstract
A transcript mapping method according to an embodiment of the invention is described hereinafter. It combines the efficiency of short-tag approaches (SAGE and MPSS) with the accuracy of full-length cDNA (flcDNA) for comprehensive characterization of transcriptomes. This method is also referred to as Gene Identification Signature (GIS) analysis. In this method, the 5′ and 3′ ends of full-length cDNA clones are first extracted into a ditag structure; concatemers of the ditags are then sequenced efficiently and finally mapped to the genome to define the gene structure. Because a GIS ditag represents both the 5′ and 3′ ends of a transcript, it is more informative than SAGE and MPSS tags. Segment lengths between 5′ and 3′ tag pairs are obtainable, as are orientation, ordering and chromosome family, for efficient transcript mapping and gene location identification. Furthermore, a compressed suffix array (CSA) is used to index the genome sequence, which improves mapping speed and reduces computational memory requirements.
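The indexing idea mentioned in the abstract can be illustrated with a small sketch. The code below uses a plain, uncompressed suffix array with binary search, which is a stand-in for the compressed suffix array (CSA) named above and not the patent's implementation. The function names `build_suffix_array` and `find_occurrences` and the toy sequence are assumptions made for illustration only.

```python
def build_suffix_array(genome):
    # Naive construction by sorting all suffixes; adequate for a toy
    # sequence only. A real genome index would use a compressed structure.
    return sorted(range(len(genome)), key=lambda i: genome[i:])


def find_occurrences(genome, suffix_array, tag):
    # Suffixes that start with `tag` form one contiguous block in the
    # sorted suffix array, so two binary searches locate the block.
    m = len(tag)

    def prefix(rank):
        start = suffix_array[rank]
        return genome[start:start + m]

    # Lower bound: first rank whose m-character prefix is >= tag.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < tag:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose m-character prefix is > tag.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= tag:
            lo = mid + 1
        else:
            hi = mid

    # Return genome positions in ascending order.
    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    toy_genome = "ACGTACGTTACG"  # hypothetical sequence
    index = build_suffix_array(toy_genome)
    print(find_occurrences(toy_genome, index, "ACG"))
```

Each lookup costs a logarithmic number of short string comparisons once the index exists, which is the property that makes suffix-array style indexes attractive for matching many short tags against one genome.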
39 Claims
1. A transcript mapping method comprising the steps of:
obtaining a 5′ terminal tag and a 3′ terminal tag from a transcript of a gene;
matching the 5′ terminal tag to at least a portion of a genome sequence to thereby identify at least one 5′ site therefrom, each of the at least one 5′ site having a sequence matching the 5′ terminal tag;
matching the 3′ terminal tag to at least a portion of the genome sequence to thereby identify at least one 3′ site therefrom, each of the at least one 5′ site having a sequence matching the 5′ terminal tag;
identifying at least one occurring segment, each of the at least one occurring segment being a sequence segment along the genome sequence extending from one of the at least one 5′ site to one of the at least one 3′ site, each of the at least one occurring segment having a sequence length; and
identifying at least one feasible gene location, each of the feasible gene location being one of the at least one occurring segment having a sequence length not exceeding that of a predefined gene length.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
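The steps recited in claim 1 can be sketched in a few lines of Python. This is an illustrative reading under stated assumptions, not the patented implementation: tags are matched exactly with `str.find`, an occurring segment is taken to run from the start of a 5′ site to the end of a 3′ site, and only pairs with the 3′ site at or downstream of the 5′ site are considered so that the length is positive. The names `find_sites` and `feasible_gene_locations` are hypothetical.

```python
def find_sites(genome, tag):
    # Collect every start position at which `tag` occurs in `genome`,
    # including overlapping occurrences.
    sites = []
    pos = genome.find(tag)
    while pos != -1:
        sites.append(pos)
        pos = genome.find(tag, pos + 1)
    return sites


def feasible_gene_locations(genome, tag5, tag3, max_gene_length):
    # Steps 2 and 3 of the claim: locate the 5' and 3' sites.
    sites5 = find_sites(genome, tag5)
    sites3 = find_sites(genome, tag3)

    feasible = []
    for s5 in sites5:
        for s3 in sites3:
            if s3 < s5:
                # Assumption: skip pairs where the 3' site lies upstream.
                continue
            # Step 4: the occurring segment spans [start, end).
            start, end = s5, s3 + len(tag3)
            # Step 5: keep segments no longer than the predefined length.
            if end - start <= max_gene_length:
                feasible.append((start, end))
    return feasible


if __name__ == "__main__":
    # Hypothetical toy genome with one 5' site and two 3' sites.
    toy_genome = "AAGGTTTTCC" + "A" * 12 + "CC"
    print(feasible_gene_locations(toy_genome, "GG", "CC", max_gene_length=10))
```

In this toy input the 5′ tag pairs with two candidate 3′ sites, and the length limit discards the longer pairing.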
20. A transcript mapping system comprising:
means for obtaining a 5′ terminal tag and a 3′ terminal tag from a transcript of a gene;
means for matching the 5′ terminal tag to at least a portion of a genome sequence to thereby identify at least one 5′ site therefrom, each of the at least one 5′ site having a sequence matching the 5′ terminal tag;
means for matching the 3′ terminal tag to at least a portion of the genome sequence to thereby identify at least one 3′ site therefrom, each of the at least one 3′ site having a sequence matching the 3′ terminal tag;
means for identifying at least one occurring segment, each of the at least one occurring segment being a sequence segment along the genome sequence extending from one of the at least one 5′ site to one of the at least one 3′ site, each of the at least one occurring segment having a sequence length; and
means for identifying at least one feasible gene location, each of the feasible gene location being one of the at least one occurring segment having a sequence length not exceeding that of a predefined gene length.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38
39. A transcript mapping method comprising the steps of:
obtaining a 5′ terminal tag and a 3′ terminal tag from a transcript of a gene;
matching the 5′ terminal tag to at least a portion of a genome sequence to thereby identify at least one 5′ site therefrom, each of the at least one 5′ site having a sequence matching the 5′ terminal tag;
matching the 3′ terminal tag to at least a portion of the genome sequence to thereby identify at least one 3′ site therefrom, each of the at least one 5′ site having a sequence matching the 5′ terminal tag;
identifying at least one occurring segment, each of the at least one occurring segment being a sequence segment along the genome sequence extending from one of the at least one 5′ site to one of the at least one 3′ site, each of the at least one occurring segment having a sequence length; and
identifying at least one feasible gene location from the at least one occurring segment, each of the at least one feasible gene location being one of the at least one occurring segment with at least one of the sequence length thereof not exceeding that of the predefined gene length, the sequence order thereof and of the at least one 5′ site and one of the at least one 3′ site corresponding thereto in accordance with a 5′-occurring segment-3′ structure matching the sequence order of the corresponding portion of the genome sequence, the 5′ site and one of the at least one 5′ site and one of the at least one 3′ site corresponding thereto having a 5′-3′ orientation, and one of the at least one 5′ site and one of the at least one 3′ site corresponding to each of the occurring segment being located within the same chromosome.
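The additional criteria in claim 39 (length, 5′-segment-3′ order, 5′-3′ orientation, same chromosome) can be sketched as a single predicate. The claim recites "at least one of" these criteria; the sketch below applies all four together as one possible configuration. The `Site` record, its fields, and the function `is_feasible` are hypothetical names, and coordinates are assumed to be 0-based, half-open positions on the forward strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    chromosome: str
    strand: str  # "+" or "-"
    start: int   # 0-based start on the forward strand
    end: int     # exclusive end on the forward strand


def is_feasible(site5, site3, max_gene_length):
    # Same chromosome: both tag sites must lie on one chromosome.
    if site5.chromosome != site3.chromosome:
        return False
    # Orientation: both tag sites must map to the same strand.
    if site5.strand != site3.strand:
        return False
    if site5.strand == "+":
        # Forward strand: the 5' site precedes the 3' site.
        ordered = site5.end <= site3.start
        length = site3.end - site5.start
    else:
        # Reverse strand: the 5' site has the higher forward coordinate.
        ordered = site3.end <= site5.start
        length = site5.end - site3.start
    # Order and length: 5'-segment-3' structure within the length limit.
    return ordered and length <= max_gene_length


if __name__ == "__main__":
    five = Site("chr1", "+", 100, 118)     # hypothetical 5' site
    three = Site("chr1", "+", 5000, 5018)  # hypothetical 3' site
    print(is_feasible(five, three, max_gene_length=1_000_000))
```

Keeping the checks in one predicate makes it straightforward to relax any single criterion, which matches the "at least one of" wording more loosely than the conjunctive form shown here.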
Specification