Systems and methods for visualizing structural variation and phasing information
First Claim
1. A system for providing structural variation and phasing information over a network connection to a remote client computer, the system comprising one or more microprocessors, a persistent memory and a non-persistent memory, the persistent memory and the non-persistent memory collectively storing one or more nucleic acid sequencing datasets, wherein each respective nucleic acid sequencing dataset in the one or more nucleic acid sequencing datasets corresponds to at least one target nucleic acid in a respective sample in a plurality of samples, wherein the respective sample is associated with a genome of at least one species, the respective nucleic acid sequencing dataset comprises (i) a header, (ii) a synopsis, and (iii) a data section, the data section comprises a plurality of sequencing reads, each respective sequencing read in the plurality of sequencing reads comprises a nucleic acid sequence comprising a first portion that corresponds to a subset of at least one target nucleic acid in the respective sample and a second portion that encodes a respective identifier for the respective sequencing read in a plurality of identifiers, each respective identifier is independent of the sequence of the at least one target nucleic acid, and the plurality of sequencing reads collectively include the plurality of identifiers, and wherein the persistent memory and the non-persistent memory further collectively store one or more programs that use the one or more microprocessors to:
provide a visualization tool for installation on the remote client computer;
obtain a request, received from the remote client computer from a user, over a network connection, for structural variation and phasing information using a first dataset in the one or more datasets; and
responsive to obtaining the request, automatically parse the request by;
(i) loading the header and the synopsis of the first dataset into the non-persistent memory if not already loaded into the non-persistent memory while retaining the data section in persistent memory,
(ii) comparing the request to the synopsis of the first dataset thereby identifying one or more portions of the data section of the first dataset,
(iii) loading the one or more identified portions of the data section into non-persistent memory, wherein the loading loads less than the entirety of the data section,
(iv) formatting structural variation and phasing information for display on the client computer using the first dataset, and
(v) transmitting the formatted structural variation and phasing information over the network connection to the remote client computer for display on the remote client computer.
Abstract
A system for providing structural variation or phasing information is provided. The system accesses a nucleic acid sequence dataset corresponding to a target nucleic acid in a sample. The dataset comprises a header, synopsis, and data section. The data section comprises a plurality of sequencing reads. Each sequencing read comprises a first portion corresponding to a subset of the target nucleic acid and a second portion that encodes an identifier for the sequencing read from a plurality of identifiers. One or more programs in the memory of the system use a microprocessor of the system to provide a haplotype visualization tool that receives a request for structural variation or phasing information from the dataset. The request is evaluated against the synopsis thereby identifying portions of the data section. Structural variation or phasing information is formatted for display in the haplotype visualization tool using the identified portions of the data section.
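The read layout described in the abstract, a first portion taken from the target nucleic acid and a second portion encoding an identifier, can be illustrated with a minimal Python sketch. The abstract does not say where the identifier sits in the read or how long it is, so the fixed-length prefix, the `BARCODE_LEN` value, and the names `Read`, `split_read`, and `group_by_identifier` are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

# Assumption: the identifier occupies a fixed-length prefix of each read.
# The source only says a "second portion" encodes the identifier.
BARCODE_LEN = 16


class Read(NamedTuple):
    identifier: str  # second portion: independent of the target sequence
    insert: str      # first portion: corresponds to part of the target


def split_read(sequence: str, barcode_len: int = BARCODE_LEN) -> Read:
    """Split a raw read into its identifier and its target-derived portion."""
    if len(sequence) <= barcode_len:
        raise ValueError("read is too short to hold an identifier and an insert")
    return Read(sequence[:barcode_len], sequence[barcode_len:])


def group_by_identifier(
    sequences: Iterable[str], barcode_len: int = BARCODE_LEN
) -> Dict[str, List[str]]:
    """Collect the target-derived portions of all reads sharing an identifier."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sequence in sequences:
        read = split_read(sequence, barcode_len)
        groups[read.identifier].append(read.insert)
    return dict(groups)


if __name__ == "__main__":
    # Toy reads with a 4-base identifier, purely for demonstration.
    toy_reads = ["AAAAGATTACA", "AAAACCGGTT", "CCCCTTTT"]
    print(group_by_identifier(toy_reads, barcode_len=4))
```

Grouping reads by identifier in this way is one plausible starting point for phasing, since reads that share an identifier can be treated as a set; how the described system actually uses the identifiers is not stated in this section.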
298 Citations
22 Claims
1. A system for providing structural variation and phasing information over a network connection to a remote client computer, the system comprising one or more microprocessors, a persistent memory and a non-persistent memory, the persistent memory and the non-persistent memory collectively storing one or more nucleic acid sequencing datasets, wherein
each respective nucleic acid sequencing dataset in the one or more nucleic acid sequencing datasets corresponds to at least one target nucleic acid in a respective sample in a plurality of samples, wherein the respective sample is associated with a genome of at least one species, the respective nucleic acid sequencing dataset comprises (i) a header, (ii) a synopsis, and (iii) a data section, the data section comprises a plurality of sequencing reads, each respective sequencing read in the plurality of sequencing reads comprises a nucleic acid sequence comprising a first portion that corresponds to a subset of at least one target nucleic acid in the respective sample and a second portion that encodes a respective identifier for the respective sequencing read in a plurality of identifiers, each respective identifier is independent of the sequence of the at least one target nucleic acid, and the plurality of sequencing reads collectively include the plurality of identifiers, and wherein the persistent memory and the non-persistent memory further collectively store one or more programs that use the one or more microprocessors to:
provide a visualization tool for installation on the remote client computer; obtain a request, received from the remote client computer from a user, over a network connection, for structural variation and phasing information using a first dataset in the one or more datasets; and responsive to obtaining the request, automatically parse the request by; (i) loading the header and the synopsis of the first dataset into the non-persistent memory if not already loaded into the non-persistent memory while retaining the data section in persistent memory, (ii) comparing the request to the synopsis of the first dataset thereby identifying one or more portions of the data section of the first dataset, (iii) loading the one or more identified portions of the data section into non-persistent memory, wherein the loading loads less than the entirety of the data section, (iv) formatting structural variation and phasing information for display on the client computer using the first dataset, and (v) transmitting the formatted structural variation and phasing information over the network connection to the remote client computer for display on the remote client computer.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
2. A system for providing structural variation and phasing information, the system comprising one or more microprocessors, a persistent memory and a non-persistent memory, the persistent memory and the non-persistent memory collectively storing one or more nucleic acid sequencing datasets, wherein
each respective nucleic acid sequencing dataset in the one or more nucleic acid sequencing datasets corresponds to at least one target nucleic acid in a respective sample in a plurality of samples, wherein the respective sample is associated with a genome of at least one species, the respective nucleic acid sequencing dataset comprises (i) a header, (ii) a synopsis, and (iii) a data section, the data section comprises a plurality of sequencing reads, each respective sequencing read in the plurality of sequencing reads comprises a nucleic sequence comprising a first portion that corresponds to a subset of at least one target nucleic acid in the respective sample and a second portion that encodes a respective identifier for the respective sequencing read in a plurality of identifiers, each respective identifier is independent of the sequence of the at least one target nucleic acid, and the plurality of sequencing reads collectively include the plurality of identifiers, and wherein the persistent memory and the non-persistent memory further collectively store one or more programs that use the one or more microprocessors to:
provide a visualization tool; obtain a request from a user, through the visualization tool, for structural variation and phasing information using a first dataset in the one or more datasets, responsive to obtaining the request, automatically parse the request by; (i) loading the header and the synopsis of the first dataset into the non-persistent memory if not already loaded into the non-persistent memory while retaining the data section in persistent memory, (ii) comparing the request for sequence information to the synopsis of the first dataset thereby identifying one or more portions of the data section of the first dataset, (iii) loading the one or more identified portions of the data section into non-persistent memory, wherein the loading loads less than the entirety of the data section, (iv) formatting structural variation and phasing information for display in the visualization tool using the first dataset, and (v) displaying the formatted structural variation and phasing information in the visualization tool.
3. A system for obtaining structural variation and phasing information over a network connection from a remote computer, wherein the system comprises one or more microprocessors, and a memory that stores one or more programs, wherein the one or more programs use the one or more microprocessors to execute a method comprising:
(A) invoking a visualization tool;
(B) obtaining, through the visualization tool from a user, a request for structural variation and phasing information in a first nucleic acid sequencing dataset from among one or more nucleic acid sequencing datasets stored on the remote computer, wherein each respective nucleic acid sequencing dataset in the one or more nucleic acid sequencing datasets corresponds to at least one target nucleic acid in a respective sample in a plurality of samples, wherein the respective sample is associated with a genome of at least one species, the respective nucleic acid sequencing dataset comprises (i) a header, (ii) a synopsis, and (iii) a data section, the data section comprises a plurality of sequencing reads, each respective sequencing read in the plurality of sequencing reads comprises a nucleic acid sequence comprising a first portion that corresponds to a subset of at least one target nucleic acid in the respective sample and a second portion that encodes a respective identifier for the respective sequencing read in a plurality of identifiers, each respective identifier is independent of the sequence of the at least one target nucleic acid, and the plurality of sequencing reads collectively include the plurality of identifiers;
(C) sending the request to the remote computer over the network connection, wherein the remote computer has persistent memory and non-persistent memory, thereby causing the remote computer to execute a method comprising; (i) loading the header and the synopsis of the first dataset into the non-persistent memory if not already loaded into the non-persistent memory of the remote computer while retaining the data section in persistent memory, (ii) comparing the request for sequence information to the synopsis of the first dataset thereby identifying one or more portions of the data section of the first dataset, (iii) loading the one or more identified portions of the data section into non-persistent memory, wherein the loading loads less than the entirety of the data section, and (iv) formatting structural variation and phasing information; and
(D) receiving the formatted structural variation and phasing information over the network connection from the remote computer for display in the visualization tool.
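Steps (i) through (iii) recited in the claims above (load the header and synopsis, compare the request to the synopsis, then load only the identified portions of the data section) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the on-disk layout (a 4-byte magic value plus a synopsis length, a JSON synopsis of genomic intervals with byte offsets, then raw data blocks), the `SVPH` magic value, and the names `write_dataset`, `Dataset`, and `query` are all hypothetical.

```python
import json
import struct
import tempfile
from pathlib import Path

# Hypothetical layout: fixed header, JSON synopsis, then the data section.
MAGIC = b"SVPH"
HEADER = struct.Struct("<4sI")  # magic value, synopsis length in bytes


def write_dataset(path, blocks):
    """Write blocks given as (chrom, start, end, payload_bytes) tuples."""
    synopsis, offset = [], 0
    for chrom, start, end, payload in blocks:
        synopsis.append({"chrom": chrom, "start": start, "end": end,
                         "offset": offset, "length": len(payload)})
        offset += len(payload)
    synopsis_bytes = json.dumps(synopsis).encode("utf-8")
    with open(path, "wb") as fh:
        fh.write(HEADER.pack(MAGIC, len(synopsis_bytes)))
        fh.write(synopsis_bytes)
        for _chrom, _start, _end, payload in blocks:
            fh.write(payload)


class Dataset:
    def __init__(self, path):
        # Step (i): read only the header and synopsis into memory.
        self.path = path
        with open(path, "rb") as fh:
            magic, synopsis_len = HEADER.unpack(fh.read(HEADER.size))
            if magic != MAGIC:
                raise ValueError("unrecognized dataset file")
            self.synopsis = json.loads(fh.read(synopsis_len))
        self.data_start = HEADER.size + synopsis_len
        self.bytes_loaded = 0  # data-section bytes read so far

    def query(self, chrom, start, end):
        """Return payloads of blocks overlapping the half-open range [start, end)."""
        # Step (ii): compare the request to the synopsis.
        hits = [b for b in self.synopsis
                if b["chrom"] == chrom and b["start"] < end and start < b["end"]]
        # Step (iii): seek to and read only the identified portions.
        payloads = []
        with open(self.path, "rb") as fh:
            for block in hits:
                fh.seek(self.data_start + block["offset"])
                payloads.append(fh.read(block["length"]))
                self.bytes_loaded += block["length"]
        return payloads


def demo():
    blocks = [("chr1", 0, 1000, b"reads-A"),
              ("chr1", 1000, 2000, b"reads-B"),
              ("chr2", 0, 1000, b"reads-C")]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.bin"
        write_dataset(path, blocks)
        dataset = Dataset(path)
        return dataset.query("chr1", 900, 1100), dataset.bytes_loaded


if __name__ == "__main__":
    print(demo())
```

In this toy example the request for `chr1` positions 900 to 1100 touches two of the three blocks, so 14 of the 21 data-section bytes are read and the `chr2` block stays on disk. A JSON synopsis is used here only for readability; a real format would more likely use a compact binary index.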
Specification