Methods and systems for identifying disease-induced mutations
First Claim
1. A method of identifying cancer-induced genetic mutations, comprising using a processor coupled to a non-transitory computer-readable medium to perform:
- obtaining a first nucleic acid sequence corresponding to a nucleic acid in a non-cancerous sample from a subject;
- identifying differences between the first nucleic acid sequence and a selected reference sequence;
- representing, in the non-transitory computer-readable storage medium, the identified differences between the first nucleic acid sequence and the selected reference sequence as two or more alternative paths in a first reference directed acyclic graph (DAG) comprising nodes, wherein each alternative path is placed at a position in the first reference DAG where there is a difference between the first nucleic acid sequence and the reference sequence;
- aligning one or more sequence reads from a second sequence corresponding to a cancerous sample from the subject to the first reference DAG, wherein the aligning considers two or more alternative paths by looking backward to any prior nodes on the first reference DAG to find a maximum score for the one or more sequence reads; and
- identifying, based on the aligned one or more sequence reads to the first reference DAG, differences between the second sequence and the first reference DAG as new mutations correlated with the cancer.
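As an illustration of the representing and aligning steps of claim 1, the following Python sketch builds a small reference DAG and scores a read against it. It is not the patented implementation. The one-base-per-node layout, the restriction to single-base substitutions, the Smith-Waterman-style local scoring, the score values, and the names `Node`, `build_dag`, and `align` are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    base: str                                   # one nucleotide per node (assumed layout)
    preds: list = field(default_factory=list)   # indices of predecessor nodes
    label: str = "ref"                          # "ref" or "alt" path


def build_dag(reference, snvs):
    """Build a DAG from a reference string and {position: alternate base}.

    Hypothetical helper: only substitutions are modelled. Nodes are
    appended in topological order, so every predecessor index is smaller
    than the index of the node that refers to it.
    """
    nodes = []
    prev = []  # nodes at the previous position; all of them feed the next one
    for pos, base in enumerate(reference):
        here = []
        nodes.append(Node(base, list(prev), "ref"))
        here.append(len(nodes) - 1)
        if pos in snvs:
            # Alternative path placed where the sample differs from the reference.
            nodes.append(Node(snvs[pos], list(prev), "alt"))
            here.append(len(nodes) - 1)
        prev = here
    return nodes


def align(nodes, read, match=2, mismatch=-2, gap=-3):
    """Local alignment of a read to the DAG.

    For each node the recurrence looks backward to every predecessor
    node and keeps the maximum score. Returns (best score, index of the
    node where the alignment ends, number of read bases consumed).
    """
    m = len(read)
    H = [[0] * (m + 1) for _ in nodes]
    best = (0, None, 0)
    for i, node in enumerate(nodes):
        for j in range(1, m + 1):
            s = match if node.base == read[j - 1] else mismatch
            # Match or mismatch: extend the best alignment ending at any predecessor.
            diag = max((H[p][j - 1] for p in node.preds), default=0) + s
            # Gap in the read: consume this node without consuming a read base.
            dele = max((H[p][j] for p in node.preds), default=0) + gap
            # Gap in the graph: consume a read base while staying on this node.
            ins = H[i][j - 1] + gap
            H[i][j] = max(0, diag, dele, ins)
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


# Toy data (invented for the example): the non-cancerous sample carries T>C at position 3.
dag = build_dag("ACGTACGT", {3: "C"})
linear = build_dag("ACGTACGT", {})
print(align(dag, "CGCAC")[0], align(linear, "CGCAC")[0])
```

On this toy input the read follows the alternative path at position 3 and scores 10 against the DAG, whereas against the linear reference alone it scores 6 because of the mismatch at that position.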
Abstract
The invention includes methods and systems for identifying disease-induced mutations by producing multi-dimensional reference sequence constructs that account for variations between individuals, different diseases, and different stages of those diseases. Once constructed, these reference sequence constructs can be used to align sequence reads corresponding to genetic samples from patients suspected of having a disease, or who have had the disease and are in suspected remission. The reference sequence constructs also provide insight into the genetic progression of the disease.
20 Claims
1. Independent claim, set out in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19, 20.
14. A method of identifying mutations due to an advanced stage of cancer in a subject, comprising:
- obtaining a first sequence corresponding to a non-cancerous sample from the subject and a second sequence corresponding to a cancerous sample from the subject organism;
- identifying differences between the first sequence and the second sequence;
- representing the identified differences between the first sequence and the second sequence as two or more alternative paths in a reference directed acyclic graph (DAG) comprising nodes, wherein each alternative path is placed at a position in the reference DAG where there is a difference between the first sequence and the second sequence;
- aligning a sequence read corresponding to an advanced cancerous sample from the subject to the reference DAG, wherein the aligning considers the two or more alternative paths by looking backward to any prior nodes on the reference DAG to find a maximum score for the sequence read; and
- identifying, based on the aligned sequence read to the reference DAG, differences between the sequence read and the reference DAG as new mutations correlated with an advanced stage of the cancer.
Dependent claims: 15, 16, 17.
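The staged comparison in claim 14 can be pictured with a deliberately simplified Python sketch. It is not the claimed method. It assumes the two earlier sequences are already aligned to equal length and differ only by substitutions, so the alternative paths collapse to a set of allowed bases per position, and it assumes the placement (`offset`) of the advanced-stage read is already known rather than found by graph alignment. The names `build_allele_paths` and `new_mutations` and the toy sequences are hypothetical.

```python
def build_allele_paths(first, second):
    """Return one set of bases per position.

    A position holds a single base where the non-cancerous and cancerous
    sequences agree, and two alternative bases where they differ.
    Assumption: the sequences are pre-aligned and of equal length.
    """
    if len(first) != len(second):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [{a, b} for a, b in zip(first, second)]


def new_mutations(paths, read, offset):
    """List (position, base) pairs where the read matches no existing path.

    Assumption: the read starts at reference position `offset` and lies
    entirely within the reference construct.
    """
    return [
        (offset + k, base)
        for k, base in enumerate(read)
        if base not in paths[offset + k]
    ]


# Toy data invented for the example.
normal = "ACGTACGT"   # non-cancerous sample
tumor = "ACGCACGT"    # cancerous sample, T>C at position 3
paths = build_allele_paths(normal, tumor)

# Read from an advanced-stage sample: it carries the known T>C change
# plus a G>T change at position 6 that is on no existing path.
print(new_mutations(paths, "GCACT", offset=2))
```

In this example the T>C change at position 3 is already an alternative path and is not reported, while the base T at position 6 is reported as new. A fuller treatment would replace the fixed offset with alignment to the graph and would handle insertions and deletions.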
Specification