Ditag genome scanning technology
Abstract
The present invention provides a method for analyzing large genomes in which the genomic DNA is digested with a restriction enzyme that has a short (few-base-pair) recognition site. The fragments are then cloned, and a unique tag-vector-tag construct is created. The tag-vector-tag fragments are purified and religated to create a “ditag” library, which is then sequenced. In the final step, the sequenced ditags are mapped back to the genome using software containing mapping algorithms and a unique ditag reference database. This provides a way to scan large portions of the genome at reduced time and cost.
Claims
1. A system for collecting genetic information using DNA sequences comprising the steps of:
1) collecting two short tags from both ends of DNA fragments to form a ditag;
2) using the 454 sequencing system for maximal collection of ditags at the genome scale;
3) identifying the DNA fragments in the human genome sequences from which the ditags originated, and identifying the DNA fragments that differ from those in the reference human genome;
4) confirming the mapping results by using the ditag sequences directly as the sense and antisense primers in PCR amplification to detect the original DNA fragments; and
5) performing computational and experimental analysis of the ditag genome scanning (DGS) results.
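Step 4 of claim 1 uses the two halves of a ditag as PCR primers to confirm a mapping. The Python sketch below illustrates that idea in silico. It is not the patented procedure. It rests on several assumptions that the claim does not state:

- a ditag is the 5′ tag followed by the 3′ tag, both read on the same strand of the original fragment;
- sequences are uppercase strings;
- the tag length is known.

The helper names (`primers_from_ditag`, `predicted_amplicon`) are hypothetical.

```python
from typing import Optional, Tuple

# Base-pairing table for uppercase DNA (N maps to N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_from_ditag(ditag: str, tag_len: int) -> Tuple[str, str]:
    """Split a ditag into a sense primer (5' tag) and an antisense primer.

    Assumption: both tags are on the same strand of the original fragment,
    so the antisense primer is the reverse complement of the 3' tag.
    """
    if len(ditag) < 2 * tag_len:
        raise ValueError("ditag is shorter than two tags")
    sense = ditag[:tag_len]
    antisense = reverse_complement(ditag[-tag_len:])
    return sense, antisense


def predicted_amplicon(template: str, sense: str, antisense: str) -> Optional[str]:
    """Return the product the primer pair would flank on the template's top
    strand, or None if the primers are not found in the expected order."""
    start = template.find(sense)
    if start < 0:
        return None
    # The antisense primer anneals where its reverse complement appears on
    # the top strand; search only downstream of the sense primer.
    site = template.find(reverse_complement(antisense), start + len(sense))
    if site < 0:
        return None
    return template[start:site + len(antisense)]
```

Under these assumptions, a primer pair that yields a product of the expected size on the reference sequence supports the mapping. The wet-lab PCR of the claim is the experimental counterpart of this check.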
2. A method for determining the genomic origin of a ditag through the DitagMap reference database, comprising the steps of:
a. dividing the identified ditags into three groups:
classifying as mapped ditags those ditags that have been identified with reference ditags in a one-to-one correspondence, with mismatches of up to two bases, and whose p values are higher than the cutoff of 1.0e−5;
classifying as trouble-mapped ditags those identified ditags whose combined p values for mapping the two single tags in the reference ditag database are higher than the cutoff of 1.0e−3, or for which any single-tag mapping p value is larger than 1.0e−3, allowing at most one mismatch with the reference tags; and
classifying as unmapped ditags those ditags whose p values are less than the cutoff of 1.0e−3 when their two single tags are mapped to the reference ditag database;
b. selecting a reference ditag consisting of a 32-bp tag from the 5′ end and a 32-bp tag from the 3′ end of a virtual DNA fragment;
c. searching the DitagMap reference database for the experimental ditag by comparing experimental ditags shorter than 32 bp with reference ditags of the same length and counting total mismatches without allowing gaps, wherein ditags longer than 31 bp are compared with reference ditags having extra bases, such that the 16 bp at both ends of each longer ditag are aligned with the ends of each reference ditag, the extra bases between the two 16-bp ends are then compared with the bases in the reference ditag, and those bases that match are assigned to the corresponding single tag;
d. identifying the length of the experimental ditag and its mismatches with each reference ditag; and
e. calculating the probability of an experimental ditag/tag (wob) mapping in the reference database by using the formula:
p-score =
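Steps a, c, and d of claim 2 can be sketched in Python for the simple case of an experimental ditag compared against reference ditags of the same length. The p-score formula is not reproduced above, so the sketch passes p values in rather than computing them. It also omits the end-anchored comparison used for ditags longer than 31 bp. Reading "one-to-one correspondence" as exactly one reference ditag within two mismatches is an assumption, and all function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAPPED_P_CUTOFF = 1.0e-5   # claim 2a: cutoff for mapped ditags
TROUBLE_P_CUTOFF = 1.0e-3  # claim 2a: trouble-mapped vs. unmapped cutoff


def ungapped_mismatches(exp: str, ref: str) -> int:
    """Total positional mismatches between equal-length sequences (no gaps)."""
    if len(exp) != len(ref):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(exp, ref) if a != b)


def hits_within(exp: str, refs: Dict[str, str], max_mm: int = 2) -> List[Tuple[str, int]]:
    """Same-length reference ditags with at most max_mm mismatches, best first."""
    hits = [(rid, ungapped_mismatches(exp, seq))
            for rid, seq in refs.items() if len(seq) == len(exp)]
    return sorted((h for h in hits if h[1] <= max_mm), key=lambda h: (h[1], h[0]))


@dataclass
class TagHit:
    p: float         # single-tag mapping p value, supplied by the caller
    mismatches: int  # mismatches against the best reference tag


def classify(ditag_hits: List[Tuple[str, int]], ditag_p: float,
             tag5: TagHit, tag3: TagHit, combined_p: float) -> str:
    """Assign a ditag to one of the three groups of claim 2, step a."""
    # Assumption: one-to-one means exactly one reference hit within two mismatches.
    if len(ditag_hits) == 1 and ditag_hits[0][1] <= 2 and ditag_p > MAPPED_P_CUTOFF:
        return "mapped"
    # Single-tag route: p above the cutoff with at most one mismatch.
    single_ok = any(t.p > TROUBLE_P_CUTOFF and t.mismatches <= 1 for t in (tag5, tag3))
    if combined_p > TROUBLE_P_CUTOFF or single_ok:
        return "trouble-mapped"
    # Everything else (p values at or below 1.0e-3) is treated as unmapped;
    # the claim does not say how a p value exactly equal to the cutoff is handled.
    return "unmapped"
```

Checking the mapped group first mirrors the order of step a. A ditag that meets the stricter criterion is never demoted to trouble-mapped.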
4. A method for producing and collecting ditag sequence information, comprising the following steps:
a) obtaining a genomic DNA sample;
b) fragmenting the genomic DNA sample by restriction enzyme digestion;
c) cloning the DNA fragments generated in step b) into plasmid vectors to generate a genomic DNA library;
d) digesting the library using the restriction enzyme MmeI such that two short tags, one on each side of the cloned DNA fragment, are retained in the same plasmid vector in a tag-vector-tag orientation;
e) religating the tag-vector-tag fragments to form a ditag;
f) releasing the ditags formed in step e) from the vectors by digestion with a restriction enzyme;
g) concatemerizing the individual ditags into concatemers of a length suitable for sequencing;
h) sequencing the concatemerized ditags using a 454 sequencing system;
i) extracting the ditags from the sequences based on the identification of their restriction sites;
j) mapping the ditags extracted in step i) to a reference ditag database in which restriction fragments of known reference genome sequences are stored;
k) determining whether each ditag has a counterpart in the reference ditag database, and identifying the ditags that have counterpart sequences as mapped ditags; and
l) identifying the ditags that do not have a counterpart in the reference ditag database as trouble-mapped ditags.
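Steps i to l of claim 4, together with the reference ditags of claim 2, step b, might be sketched in Python as follows. The default tag length of 32 bp follows claim 2, step b. Several other details are assumptions rather than statements from the claims:

- the release enzyme's recognition site is a parameter; the example uses the EcoRI site GAATTC purely for illustration;
- site bases are dropped both when splitting reads and when cutting the reference;
- virtual fragments shorter than two tags are skipped;
- matching is exact.

All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def extract_ditags(read: str, site: str, min_len: int, max_len: int) -> List[str]:
    """Step i: split a concatemer read at each site and keep ditag-sized pieces."""
    return [p for p in read.split(site) if min_len <= len(p) <= max_len]


def virtual_fragments(genome: str, site: str) -> List[Tuple[int, str]]:
    """Cut the reference at every site occurrence (site bases removed) and
    return (start, fragment) pairs for the non-empty pieces."""
    frags: List[Tuple[int, str]] = []
    start = 0
    while True:
        hit = genome.find(site, start)
        end = len(genome) if hit == -1 else hit
        if end > start:
            frags.append((start, genome[start:end]))
        if hit == -1:
            return frags
        start = hit + len(site)


def build_reference_ditags(genome: str, site: str, tag_len: int = 32) -> Dict[str, List[int]]:
    """Reference ditag = 5' tag + 3' tag of each virtual fragment, indexed by
    sequence; values are fragment start positions in the reference."""
    db: Dict[str, List[int]] = {}
    for start, frag in virtual_fragments(genome, site):
        if len(frag) < 2 * tag_len:
            continue  # assumption: too short to yield two distinct tags
        db.setdefault(frag[:tag_len] + frag[-tag_len:], []).append(start)
    return db


def map_ditags(ditags: List[str], db: Dict[str, List[int]]) -> Tuple[Dict[str, List[int]], List[str]]:
    """Steps j to l: ditags with a counterpart are mapped; the rest are
    reported as trouble-mapped."""
    mapped = {d: db[d] for d in ditags if d in db}
    trouble = [d for d in ditags if d not in db]
    return mapped, trouble
```

Exact dictionary lookup keeps the sketch short. The mismatch-tolerant, p-value-based search of claim 2 would replace `map_ditags` in a fuller version.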
Specification