Computerized method of identifying and locating resonating, self-hybridizing nucleic acid elements
Abstract
The present invention is a computerized method of identifying self-hybridizing sequences in nucleic acid strands. Once the sequences are identified, genetic information frequently residing in or near the sequences can be more easily identified. A computer program is used to automatically and rapidly conduct the steps of the method. Under the method, a practical minimum possible length of a stem sequence is first determined and entered into a program. A maximum loop size is then determined and entered. Subsequently, a mismatch factor is determined as well as whether to include G-T base pairs in total energy calculations. The calculations are then made by identifying a potential upstream stem sequence and iterating through possible downstream stem sequences. Once all possible downstream stem sequences have been compared to the upstream sequence, the upstream sequence is incremented by one base location, and once again all possible downstream sequences are compared. The total number of bonds is calculated using a look-up matrix for every possible combination of downstream and upstream stem sequences. If a sufficient number of possible base pairs exist, together creating sufficient energy for an R-structure to form, the potential R-structure is stored to a file. The R-structure is then analyzed to determine its maximum length. The total located R-structures are then printed out. The located R-structures may also be examined to locate all possible wing structure sequences within each R-structure using a similar iterative process.
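The iteration described in the abstract can be illustrated with a minimal sketch. It is not the patented program: the function name `find_r_structures`, the `Hit` record, and the rule that a window qualifies when its pair count is at least the stem length minus the allowed mismatches are all assumptions made for illustration. The sketch uses fixed-length windows and a simple set of allowed pairs, and it omits the energy calculation, the extension of a located structure to its maximum length, and the wing search.

```python
from typing import List, NamedTuple

# Watson-Crick pairs; the optional G-T pair mirrors the abstract's G-T switch.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
GT_PAIRS = {("G", "T"), ("T", "G")}


class Hit(NamedTuple):
    """Hypothetical record for one candidate R-structure."""
    upstream_start: int    # 0-based index of the first upstream stem base
    downstream_start: int  # 0-based index of the first downstream stem base
    pairs: int             # base pairs counted between the two windows


def find_r_structures(seq: str, min_stem: int, max_loop: int,
                      max_mismatches: int = 0,
                      allow_gt: bool = False) -> List[Hit]:
    """Scan seq for window pairs that could form a stem (assumed scoring)."""
    seq = seq.upper()
    allowed = WATSON_CRICK | GT_PAIRS if allow_gt else WATSON_CRICK
    hits = []
    # Outer loop: advance the upstream window one base location at a time.
    for up in range(len(seq) - 2 * min_stem + 1):
        # Inner loop: try every downstream window up to the maximum loop size.
        for loop in range(max_loop + 1):
            down = up + min_stem + loop
            if down + min_stem > len(seq):
                break
            # The strands pair antiparallel: upstream base k faces the
            # downstream base counted from the far end of its window.
            pairs = sum(
                (seq[up + k], seq[down + min_stem - 1 - k]) in allowed
                for k in range(min_stem)
            )
            if pairs >= min_stem - max_mismatches:
                hits.append(Hit(up, down, pairs))
    return hits
```

For example, `find_r_structures("GCGCAAAAGCGC", min_stem=4, max_loop=4)` reports one candidate, with the upstream window starting at index 0 and the downstream window at index 8. The two loops visit on the order of sequence length times maximum loop size window pairs, which is why the maximum loop size is fixed before the scan begins.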
17 Citations
15 Claims
-
1. A method for identifying R-structures in a nucleic acid sequence, the method comprising:
a. obtaining a nucleic acid sequence comprising a plurality of nucleotides;
b. selecting a minimum number of base pairs in an R-structure to be identified;
c. selecting a maximum loop size of an R-structure to be identified;
d. identifying a first potential stem sequence within the nucleic acid sequence;
e. identifying a second potential stem sequence within the nucleic acid sequence; and
f. comparing the nucleotides in the first and second potential stem sequences to determine whether the minimum number of base pairs exists within the first and second potential stem sequences.
-
2. A method as recited in claim 1, wherein at least the step of comparing the nucleotides in the first and second potential stem sequences is conducted with a computer.
-
3. A method as recited in claim 2, wherein at least the step of comparing the nucleotides in the first and second potential stem sequences is conducted with the use of computer code residing in a memory structure and operating upon a CPU of the computer.
-
4. A method as recited in claim 1, wherein the existence of potential base pairs in the first and second potential stem sequences is determined with the use of a matrix of potential bond values.
-
5. A method as recited in claim 1, further comprising incrementing the second potential stem sequence to omit a first nucleotide therein and to include an adjacent nucleotide in the nucleic acid sequence, and comparing the first potential stem sequence to the incremented second potential stem sequence.
-
6. A method as recited in claim 5, further comprising conducting the steps of incrementing the second potential stem sequence and comparing the first potential stem sequence to the incremented second potential stem sequence a number of times substantially equal to the number of nucleotides in the maximum loop size.
-
7. A method as recited in claim 6, further comprising, after conducting the steps of incrementing the second potential stem sequence and comparing the first potential stem sequence to the incremented second potential stem sequence a number of times substantially equal to the number of nucleotides in the maximum loop size, incrementing the first potential stem sequence, and repeating the process until substantially the entire contents of the nucleic acid sequence has been included in the comparison step.
-
8. A method as recited in claim 6, further comprising comparing the nucleotides in a located R-structure for potential wing structures within the R-structure.
-
9. A memory device comprising thereon computer instructions for operating within the CPU of a computer to conduct a process comprising:
a. obtaining a nucleic acid sequence;
b. defining a minimum number of base pairs in an R-structure to be identified;
c. establishing a maximum loop size of the R-structure to be identified;
d. identifying a first potential stem sequence within the nucleic acid sequence, said first potential stem sequence comprising a plurality of nucleotides;
e. identifying a second potential stem sequence within the nucleic acid sequence, said second potential stem sequence comprising a plurality of nucleotides;
f. comparing the nucleotides in the first and second potential stem sequences to determine a number of base pairs that can be formed between the first and second potential stem sequences; and
g. determining whether the number of base pairs that can be formed is greater than or equal to the minimum number of base pairs.
-
10. A memory device as recited in claim 9, wherein the number of base pairs that can be formed is determined with the use of a matrix of potential bond values.
-
11. A memory device as recited in claim 9, further comprising incrementing the second potential stem sequence to omit a first nucleotide therein and to include an adjacent nucleotide in the nucleic acid sequence, and comparing the first potential stem sequence to the incremented second potential stem sequence.
-
12. A memory device as recited in claim 11, further comprising incrementing the second potential stem sequence and comparing the first potential stem sequence to the incremented second potential stem sequence a number of times substantially equal to the number of nucleotides in the maximum loop size.
-
13. A memory device as recited in claim 12, further comprising, after conducting the steps of incrementing the second potential stem sequence and comparing the first potential stem sequence to the incremented second potential stem sequence a number of times substantially equal to the number of nucleotides in the maximum loop size, incrementing the first potential stem sequence, and repeating the process until substantially the entire contents of the nucleic acid sequence has been included in the comparison step.
-
14. A memory device as recited in claim 12, further comprising comparing the nucleotides in a located R-structure for potential wing structures within the R-structure.
-
15. A memory device comprising thereon computer program instructions for creating data structures and data operator modules comprising:
a. an interface module for interfacing with a human operator;
b. a data loading module for loading selected nucleic acid sequences;
c. a hairpin locator module for locating R-structures within a selected nucleic acid sequence, the hairpin locator module comprising therein:
i. a base pair identifier module, and
ii. a potential bond counter to keep track of potential identified bonds;
d. a wing locator module for locating wings within R-structures; and
e. an R-structure organizer for organizing located R-structures.
-
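Claims 4 and 10 refer to a matrix of potential bond values, and claim 15 to a base pair identifier and a potential bond counter. The sketch below shows one way such a look-up could work. The names `BOND_MATRIX` and `count_potential_bonds` are hypothetical, and reading "bond values" as hydrogen-bond counts (two for A-T, three for G-C) is an assumption. The claims do not state the matrix contents.

```python
from typing import List, Tuple

BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}

# Rows and columns follow BASES order (A, C, G, T).
# ASSUMED values: hydrogen bonds per Watson-Crick pair, 0 for no pair.
BOND_MATRIX: List[List[int]] = [
    # A  C  G  T
    [0, 0, 0, 2],  # A
    [0, 0, 3, 0],  # C
    [0, 3, 0, 0],  # G
    [2, 0, 0, 0],  # T
]


def count_potential_bonds(first_stem: str, second_stem: str,
                          matrix: List[List[int]] = BOND_MATRIX
                          ) -> Tuple[int, int]:
    """Return (base pairs, total bonds) for two equal-length stem windows."""
    if len(first_stem) != len(second_stem):
        raise ValueError("stem windows must have the same length")
    pairs = 0
    bonds = 0
    # The second window is read in reverse because the strands of a
    # hairpin stem pair antiparallel.
    for a, b in zip(first_stem.upper(), reversed(second_stem.upper())):
        value = matrix[INDEX[a]][INDEX[b]]  # one look-up per position
        if value:
            pairs += 1
            bonds += value
    return pairs, bonds
```

With this matrix, `count_potential_bonds("GCGC", "GCGC")` returns `(4, 12)`. The determination in step g of claim 9 then reduces to comparing the first returned value against the chosen minimum number of base pairs. Allowing G-T pairs would mean setting the G/T and T/G entries to a nonzero value in a copy of the matrix.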
Specification