System, method, device, and computer program product for extraction, gathering, manipulation, and analysis of peak data from an automated sequencer

US 8,239,142 B2
Filed: 07/09/2009
Issued: 08/07/2012
Est. Priority Date: 07/01/2002
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A system, method, device, and computer program product to extract and gather peak information from an automated sequencer of bioinformatics into a peak database, and to manipulate and analyze the peak information within the database.

9 Citations

24 Claims

1. A method for high throughput analysis of data sets generally described by sets of peaks, each set of peaks having been extracted from an electrophoregram profile $j$ of a biological sample $k$ which has been amplified for a particular sequence of nucleotides, and in each set of peaks, the $i$-th peak is characterized by a nucleotide length $L_{i,j,k}$ and an area $A_{i,j,k}$, the method comprising using bioinformatics tools comprising a computer to extract and smooth peak data sets according to parameter files and store them in data files, wherein smoothing comprises the steps of:
- for each peak of a set of peaks, calculating the Euclidean division of $L_{i,j,k} - \lambda_j$ by the integer 3, with the remainder being assigned to an element of $\{-1, 0, 1\}$, wherein $\lambda_j$ is a theoretical length of the amplified sequence of nucleotides; and
- if the mean of the remainders is greater than a first predefined threshold, shifting all peaks of the set of peaks by −1 nucleotide length, and if the mean of the remainders is less than a second predefined threshold, shifting all peaks of the set of peaks by +1 nucleotide length.
- Dependent claims (2-16):
  - 2. The method according to claim 1, comprising a step of creating particular profiles representing peaks to be analyzed.
  - 3. The method according to claim 1, comprising a step of building a peak database.
  - 4. The method according to claim 1, comprising a step of building a peak database using statistical tools.
  - 5. The method according to claim 1, comprising a step of using analysis of the peak database to determine prognostic or diagnostic criteria.
  - 6. The method according to claim 1, comprising a step of using the prognostic and diagnostic criteria in the field of physiopathology, such as immunotherapy, cancer treatment, HIV, infectious disease, or autoimmune disease.
  - 7. The method according to claim 1 that is a high throughput method for analysis of immune repertoires.
  - 8. The method according to claim 7, comprising the steps of starting with biological samples that contain DNA or RNA fragments and purifying the DNA or RNA fragments.
  - 9. The method according to claim 8, further comprising the steps of:
    - providing purified DNA or synthesizing cDNA from purified RNA,
    - amplifying the purified DNA or cDNA by a PCR or SDA method using oligonucleotides specific for antigen-specific receptor genes (e.g., immunoglobulin and T-cell receptor), variable (V), junctional (J), and constant (C) regions,
    - labeling the amplified DNA for detection, e.g., by performing a run-off extension step with a J- or C-specific oligonucleotide labeled with a fluorescent drug,
    - electrophoretically separating the labeled amplified DNA using an automatic sequencer for each electrophoregram, and
    - identifying peaks that characterize the separated, labeled, and amplified DNA by determining their nucleotide length and area, which correspond to the labeled amplified DNA.
  - 10. The method according to claim 8, comprising reading the labeled amplified DNA to analyze it.
  - 11. The method according to claim 1, wherein the first predefined threshold is 0.5.
  - 12. The method according to claim 1, wherein the second predefined threshold is −0.5.
  - 13. The method according to claim 1, wherein smoothing comprises a step of removing background noise and peaks below a defined cut-off from the set of peaks.
  - 14. The method according to claim 1, wherein smoothing comprises a step of summing peaks having an identical nucleotide length.
  - 15. The method according to claim 1, wherein smoothing comprises the steps of:
    - detecting, in the set of peaks, adjacent peaks for which $L_{i+1,j,k} - L_{i,j,k} = 1$,
    - determining, for the adjacent peaks, whether $L_{i,j,k} - \lambda_j$ or $L_{i+1,j,k} - \lambda_j$ is a multiple of 3, and
    - shifting the $(i+1)$-th peak by −1 nucleotide length when $L_{i,j,k} - \lambda_j$ is a multiple of 3, or shifting the $i$-th peak by +1 nucleotide length when $L_{i+1,j,k} - \lambda_j$ is a multiple of 3.
  - 16. The method according to claim 15, wherein smoothing comprises a step of summing adjacent peaks.
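
The frame-correction step of claim 1 can be sketched in Python, using the 0.5 and −0.5 thresholds of claims 11 and 12. This sketch illustrates the claim language only and is not the patentee's implementation. Peaks are assumed to be `(length, area)` tuples, and the names `symmetric_remainder_mod3` and `frame_correct` are hypothetical. The remainder of $L_{i,j,k} - \lambda_j$ modulo 3 is mapped from $\{0, 1, 2\}$ to $\{0, 1, -1\}$. A positive mean therefore suggests the whole set sits about one nucleotide too long, and a negative mean suggests it sits about one nucleotide too short.

```python
from statistics import mean
from typing import List, Tuple

Peak = Tuple[int, float]  # (nucleotide length L, area A); an assumed representation


def symmetric_remainder_mod3(delta: int) -> int:
    """Remainder of delta modulo 3, expressed in {-1, 0, 1} instead of {0, 1, 2}."""
    r = delta % 3  # Python gives 0, 1 or 2 for a positive divisor, even when delta < 0
    return -1 if r == 2 else r


def frame_correct(peaks: List[Peak], theoretical_length: int,
                  first_threshold: float = 0.5,
                  second_threshold: float = -0.5) -> Tuple[List[Peak], int]:
    """Shift every peak of one set by -1 or +1 nucleotide when the mean remainder
    of L - lambda_j crosses a threshold; return the shifted peaks and the shift."""
    if not peaks:
        return [], 0
    m = mean(symmetric_remainder_mod3(L - theoretical_length) for L, _ in peaks)
    if m > first_threshold:
        shift = -1  # set sits about one nucleotide too long
    elif m < second_threshold:
        shift = +1  # set sits about one nucleotide too short
    else:
        shift = 0
    return [(L + shift, A) for L, A in peaks], shift
```

For example, take $\lambda_j = 30$ and peaks at lengths 31, 34, 37, and 33. The remainders are 1, 1, 1, and 0, and their mean is 0.75. Because 0.75 exceeds 0.5, every peak is shifted by −1 while its area is left unchanged.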

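The clean-up steps of claims 13 to 16 can be sketched the same way. Several assumptions here do not come from the claims. The cut-off is taken to be an absolute area threshold, and background-noise removal is reduced to that threshold. The steps are also assumed to run in a fixed order: cut-off first, then summing of identical lengths, then adjacent-peak merging. Every out-of-frame peak is compared with both of its immediate neighbours rather than walking the pairs greedily, so a peak lying one nucleotide on either side of an in-frame peak is folded into it. All function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Peak = Tuple[int, float]  # (nucleotide length L, area A); an assumed representation


def remove_small_peaks(peaks: List[Peak], cutoff: float) -> List[Peak]:
    """Claim 13, simplified: drop peaks whose area is below a hypothetical absolute cut-off."""
    return [(L, A) for L, A in peaks if A >= cutoff]


def sum_identical_lengths(peaks: List[Peak]) -> List[Peak]:
    """Claim 14: merge peaks that share a nucleotide length by summing their areas."""
    totals = defaultdict(float)
    for L, A in peaks:
        totals[L] += A
    return sorted(totals.items())


def merge_adjacent(peaks: List[Peak], theoretical_length: int) -> List[Peak]:
    """Claims 15-16: fold an out-of-frame peak onto an in-frame neighbour one nucleotide away."""
    lengths = {L for L, _ in peaks}

    def in_frame(L: int) -> bool:
        # In frame when L - lambda_j is a multiple of 3 (Python's % also handles negatives).
        return (L - theoretical_length) % 3 == 0

    shifted = []
    for L, A in peaks:
        if (L - 1) in lengths and in_frame(L - 1):
            L -= 1  # peak i+1 shifted by -1 because L_i - lambda_j is a multiple of 3
        elif (L + 1) in lengths and in_frame(L + 1):
            L += 1  # peak i shifted by +1 because L_{i+1} - lambda_j is a multiple of 3
        shifted.append((L, A))
    return sum_identical_lengths(shifted)  # claim 16: sum the now-coincident peaks


def smooth(peaks: List[Peak], theoretical_length: int, cutoff: float) -> List[Peak]:
    """Chain the three steps in an assumed order; the claims do not fix one."""
    kept = remove_small_peaks(peaks, cutoff)
    return merge_adjacent(sum_identical_lengths(kept), theoretical_length)
```

As an example, take $\lambda_j = 30$ and a cut-off of 5, with peaks at (29, 10), (30, 100), (31, 20), (33, 50), and (35, 2). The peak at 35 is dropped by the cut-off. The peaks at 29 and 31 are folded into the in-frame peak at 30. The result is an area of 130 at length 30 and 50 at length 33.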
17. A method for high throughput analysis of data sets generally described by sets of peaks, each set of peaks having been extracted from an electrophoregram profile $j$ of a DNA sample $k$ which has been amplified for a particular sequence of nucleotides, and in each set of peaks, the $i$-th peak is characterized by a nucleotide length $L_{i,j,k}$ and an area $A_{i,j,k}$, wherein said method comprises using bioinformatics tools comprising a computer to extract and smooth peak data sets according to parameter files and store them in data files, wherein extracting comprises the steps of:
- for a plurality of data files (PICT files), each data file storing one set of peaks, generating an associated parameter file (CGEL parameter file) storing, for each data file, an order parameter (mNewOrder),
- reading the data files successively, following the order parameters (mNewOrder) stored in the parameter file (CGEL parameter file),
- for each data file being read, extracting the nucleotide length $L_{i,j,k}$ and area $A_{i,j,k}$ of the peaks of the set of peaks stored in the data file, and
- generating a raw data file (data0) gathering all sets of peaks ordered according to the order parameters (mNewOrder).
- Dependent claims (18-21):
  - 18. A method according to claim 17, wherein the parameter file (CGEL parameter file) also stores, for each data file, a consideration parameter (misConsidered) indicating whether the parameter file must be read or not.
  - 19. A method according to claim 17, wherein the parameter file (CGEL parameter file) also stores, for each data file, a description parameter (mDescription), which is a string of characters that depicts the particular sequence of nucleotides which has been amplified.
  - 20. A method according to claim 17, wherein the parameter file (CGEL parameter file) also stores, for each data file, a length parameter (mLength), which is a value of a theoretical length of the amplified sequence of nucleotides.
  - 21. A method according to claim 17, comprising a step of displaying extracted and smoothed peak data sets.
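
In outline, the extraction of claims 17 to 20 reads each data file in the order given by `mNewOrder` and concatenates the peaks into one raw table. The sketch keeps the claimed parameter names, but everything else in it is hypothetical. This includes the `ParameterEntry` record, the `read_peaks` callback that stands in for a PICT-file reader (parsing PICT images is not shown), and the tab-separated layout chosen for the raw data file. The sketch also interprets `misConsidered` as deciding whether the corresponding data file is read.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Peak = Tuple[float, float]  # (nucleotide length L, area A); an assumed representation


@dataclass
class ParameterEntry:
    """One hypothetical record of a CGEL parameter file; field names follow claims 17-20."""
    data_file: str               # data file (e.g. a PICT file) holding one set of peaks
    mNewOrder: int               # position of this set of peaks in the raw data file
    misConsidered: bool = True   # whether the data file is read at all
    mDescription: str = ""       # amplified sequence, e.g. a V-region name
    mLength: int = 0             # theoretical length lambda_j of the amplified sequence


def build_raw_data(entries: List[ParameterEntry],
                   read_peaks: Callable[[str], List[Peak]]) -> List[str]:
    """Read the considered data files in mNewOrder order and gather all their
    peaks into tab-separated rows (a hypothetical raw-data layout)."""
    rows = ["order\tdescription\ttheoretical_length\tlength\tarea"]
    for entry in sorted(entries, key=lambda e: e.mNewOrder):
        if not entry.misConsidered:
            continue  # skip sets flagged as not considered
        for length, area in read_peaks(entry.data_file):
            rows.append(f"{entry.mNewOrder}\t{entry.mDescription}\t"
                        f"{entry.mLength}\t{length}\t{area}")
    return rows
```

In a quick test, `read_peaks` can simply be a dictionary lookup such as `files.__getitem__`. A callback keeps the ordering and filtering logic independent of whatever file format the sequencer actually exports.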

22. A method for high throughput analysis of data sets characterized by sets of peaks extracted from an electrophoregram profile comprising:
- extracting and smoothing peak data sets using a computer according to parameter files and storing them in data files, wherein smoothing comprises:

  for each peak of a set of peaks, calculating the Euclidean division of $L_{i,j,k} - \lambda_j$ by the integer 3, with the remainder being assigned to an element of $\{-1, 0, 1\}$, wherein $\lambda_j$ is a theoretical length of the amplified sequence of nucleotides, and if the mean of the remainders is greater than a first predefined threshold, shifting all peaks of the set of peaks by −1 nucleotide length, and if the mean of the remainders is less than a second predefined threshold, shifting all peaks of the set of peaks by +1 nucleotide length;

  wherein said data sets are characterized by sets of peaks extracted from an electrophoregram profile $j$ of a biological sample $k$ which has been amplified for a particular sequence of nucleotides, and in each set of peaks, the $i$-th peak is characterized by a nucleotide length $L_{i,j,k}$ and an area $A_{i,j,k}$.
- Dependent claims (23-24):
  - 23. The method of claim 22, further comprising extracting peak data sets from an electrophoregram.
  - 24. The method of claim 23, further comprising amplifying a polynucleotide in biological sample k and producing an electropherogram j from said amplified polynucleotide.

Current Assignee
University Pierre et Marie Curie
Original Assignee
Centre National De La Recherche Scientifique, University Pierre et Marie Curie
Inventors
Collette, Alexis; Six, Adrien; Pied, Sylviane Bernadette; Cazenave, Pierre-Andre
Primary Examiner(s)
NEGIN, RUSSELL SCOTT

Application Number

US12/500,497
Publication Number

US 20100130371A1
Time in Patent Office

1,125 Days
Field of Search

None
US Class Current

702/20
CPC Class Codes

G16B 30/00   ICT specially adapted for s...

G16B 30/20   Sequence assembly

G16B 50/00   ICT programming tools or da...