Methods for representing sequence-dependent contextual information present in polymer sequences and uses thereof

US 20030101003A1
Filed: 06/21/2002
Published: 05/29/2003
Est. Priority Date: 06/21/2001
Status: Abandoned Application

0 Assignments

0 Petitions

Abstract

The invention includes methods of representing polymer sequences in a way that reveals important position-specific contextual information. The representations can be used to determine a number of properties of polymers, such as protein and nucleic acid sequences, including the identification of secondary domain structures, folding rate constants, and the effects of altering (e.g., mutating) monomers. In addition, the representations can be used to compare polymers and thereby identify important structural and functional characteristics of polymers.

28 Claims

1. A method of representing a polymer sequence, the method comprising:
- obtaining a position vector descriptor (PVD) for one or more positions in the polymer; and
  
  replacing the monomer(s) with the corresponding PVD(s) in the representation of the polymer.
- Dependent claims: 2, 3, 4, 5, 6
- - 2. The method of claim 1, wherein obtaining a PVD comprises:
    - calculating functional descriptors (FD_Ps) for each position in the polymer, wherein the FD_Ps are calculated with respect to a specific pre-selected monomer, P; and
      
      combining the calculated FD_Ps into a single vector having m elements, where m is equal to the number of different types of monomers in the polymer.
  - 3. The method of claim 2, wherein the FD_Ps are calculated using the formula:
    - FD_P=I*D*F, if the associated monomer is at a position other than P; and
      
      FD_P=I*F, if the associated monomer is at position P, wherein I is an impulse function, D is a distance function, and F is either a function describing a physical parameter of each monomer in the polymer or F=1.
  - 4. The method of claim 1, wherein the PVD(s) is/are simplified to include only a subset of elements.
  - 5. The method of claim 4, wherein the PVD(s) is/are simplified to include only a single element, the context leading monomer (CLM).
  - 6. The method of claim 1, wherein the polymer is a protein.
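
Claims 1 to 5 describe the PVD only functionally, so the sketch below is one possible reading of claims 2 and 3 under assumptions that do not come from the patent: the impulse function I is a rectangular window of half-width W (claim 15 mentions impulse-function widths W but not their shape), the distance function D is 1/|i - P|, F defaults to 1 (which claim 3 permits), the FD_P values are combined by summing them per monomer type, the context leading monomer (CLM) is taken to be the largest PVD element, and simplification keeps the X largest elements. All function names are hypothetical.

```python
from typing import Callable


def impulse(i: int, p: int, width: int) -> float:
    """Hypothetical impulse function I: 1 inside a window of half-width `width` around p, else 0."""
    return 1.0 if abs(i - p) <= width else 0.0


def distance(i: int, p: int) -> float:
    """Hypothetical distance function D: weight decays as 1/|i - p| (only used for i != p)."""
    return 1.0 / abs(i - p)


def pvd(seq: str, p: int, alphabet: str, width: int = 3,
        f: Callable[[str], float] = lambda m: 1.0) -> dict[str, float]:
    """Sketch of a PVD for position p: one element per monomer type in `alphabet` (m elements)."""
    vec = {m: 0.0 for m in alphabet}
    for i, monomer in enumerate(seq):
        if i == p:
            fd = impulse(i, p, width) * f(monomer)  # FD_P = I*F at position P
        else:
            fd = impulse(i, p, width) * distance(i, p) * f(monomer)  # FD_P = I*D*F elsewhere
        vec[monomer] += fd  # assumption: FD_P values are summed per monomer type
    return vec


def represent(seq: str, alphabet: str, width: int = 3) -> list[dict[str, float]]:
    """Claim 1 reading: replace every monomer with the PVD of its position."""
    return [pvd(seq, p, alphabet, width) for p in range(len(seq))]


def clm(vec: dict[str, float]) -> str:
    """Assumed CLM: the monomer type with the largest PVD element (ties go to the earlier type)."""
    return max(vec, key=vec.get)


def simplify(vec: dict[str, float], x: int) -> list[str]:
    """Assumed CL_X simplification: the X monomer types with the largest elements."""
    return sorted(vec, key=vec.get, reverse=True)[:x]
```

For example, `represent("ACDA", "ACD")` returns four dictionaries, one PVD per position, each with three elements.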

7. A method of predicting the effects of a change in the sequence of a protein, the method comprising:
- obtaining a mathematical relationship that predicts the effects of a change in the sequence of a protein, wherein the input variable for the mathematical relationship is the difference between the value of a PVD element corresponding to the changed monomer and the value of a PVD element corresponding to the original monomer, and wherein the two PVD elements are from the same PVD and the PVD represents the position at which the change is located in the protein;
  
  obtaining a PVD representing a position of interest in the protein; and
  
  using (i) the difference between elements of the PVD representing the position of interest in the protein and (ii) the mathematical relationship to calculate the predicted effects of a change in sequence of the protein.
- Dependent claims: 8
- - 8. The method of claim 7, wherein the effect being predicted is protein stability.
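
Claim 7 leaves the "mathematical relationship" unspecified. As an illustration only, the sketch below assumes a straight line fitted by ordinary least squares to hypothetical calibration pairs (PVD-element difference, measured effect such as a stability change). The input variable is, as the claim states, the difference between the PVD elements for the changed and the original monomer at the position being changed. Function names are hypothetical.

```python
def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept (assumed form of the relationship)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pvd_delta(pvd: dict[str, float], original: str, changed: str) -> float:
    """Input variable of claim 7: PVD element of the changed monomer minus that of the original."""
    return pvd[changed] - pvd[original]


def predict_effect(pvd: dict[str, float], original: str, changed: str,
                   model: tuple[float, float]) -> float:
    """Apply the fitted relationship to the PVD of the position being changed."""
    slope, intercept = model
    return slope * pvd_delta(pvd, original, changed) + intercept
```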

9. A method of predicting secondary structure boundaries in a protein sequence, the method comprising:
- obtaining PVDs for some or all amino acid positions in the protein sequence;
  
  constructing a leading monomer distribution map (LMDM) for the protein; and
  
  dividing the LMDM into segments representing predicted units of secondary structure.
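
Claim 9 does not say how the LMDM is built or divided. Purely as an assumption, the sketch treats the LMDM as the sequence of context leading monomers (one per position) and cuts it into maximal runs that share the same leading monomer, discarding runs shorter than a hypothetical `min_len`. This is a stand-in segmentation rule, not the method of the application.

```python
from itertools import groupby


def lmdm(clms: list[str]) -> list[tuple[int, str]]:
    """Hypothetical LMDM: (position, context leading monomer) pairs along the sequence."""
    return list(enumerate(clms))


def segment(lmdm_map: list[tuple[int, str]], min_len: int = 3) -> list[tuple[int, int, str]]:
    """Split the map into runs of identical leading monomers; return (start, end, monomer)."""
    segments = []
    for monomer, run in groupby(lmdm_map, key=lambda pair: pair[1]):
        positions = [p for p, _ in run]
        if len(positions) >= min_len:  # assumed minimum length of a secondary-structure unit
            segments.append((positions[0], positions[-1], monomer))
    return segments
```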

11. A method for identifying structural homologs of a protein, the method comprising:
- obtaining PVDs for some or all amino acid positions in the protein sequence;
  
  determining the effective primary sequence of the protein; and
  
  searching a protein database for sequences homologous to the effective primary sequence of the protein.
- Dependent claims: 12
- - 12. The method of claim 11, wherein the sequences present in the protein database are effective primary sequences.
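
Claims 11 and 12 do not say how the effective primary sequence is formed or which search tool is used. The sketch below assumes the effective primary sequence is the string of context leading monomers (the largest element of each position's PVD) and, as a stand-in for a real homology search, ranks database entries with Python's `difflib.SequenceMatcher.ratio()`. Function names and the toy database in the test are hypothetical.

```python
from difflib import SequenceMatcher


def effective_sequence(pvds: list[dict[str, float]]) -> str:
    """Assumed effective primary sequence: the leading monomer of each position's PVD."""
    return "".join(max(vec, key=vec.get) for vec in pvds)


def search(query: str, database: dict[str, str]) -> list[tuple[str, float]]:
    """Rank database sequences by SequenceMatcher similarity to the query, best first."""
    scores = {name: SequenceMatcher(None, query, seq).ratio() for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under claim 12 the database entries would themselves be effective primary sequences, so the same comparison applies on both sides.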

13. A method of identifying positions of contextual similarity in a pair of polymers, the method comprising:
- a) obtaining a first set of PVDs describing one or more positions in the first polymer and a second set of PVDs describing one or more positions in the second polymer;
  
  b) calculating a difference matrix for the first set of PVDs with respect to the second set of PVDs;
  
  c) identifying the elements in the resulting difference matrix that are within a pre-selected range; and
  
  d) optionally, graphing the identified elements.
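
For claim 13, one concrete choice, assumed here rather than taken from the claim, is to store each PVD as a list of floats in a fixed monomer order and to use the Euclidean distance between two PVDs as the difference-matrix entry. Pairs whose distance falls inside a chosen range `[low, high]` are then reported as contextually similar. `math.dist` requires Python 3.8 or later; function names are hypothetical.

```python
import math


def difference_matrix(first: list[list[float]], second: list[list[float]]) -> list[list[float]]:
    """Entry (i, j) is the Euclidean distance between PVD i of the first polymer and PVD j of the second."""
    return [[math.dist(a, b) for b in second] for a in first]


def within_range(matrix: list[list[float]], low: float, high: float) -> list[tuple[int, int]]:
    """(i, j) index pairs whose difference lies in the pre-selected range [low, high]."""
    return [(i, j) for i, row in enumerate(matrix)
            for j, d in enumerate(row) if low <= d <= high]
```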

14. A method of identifying positions of contextual similarity in a polymer, the method comprising:
- a) obtaining a set of PVDs describing one or more positions in the polymer, wherein the set of PVDs has been simplified to include a reduced number of elements, X;
  
  b) performing pair-wise comparisons of each PVD (CL_XPVD) from the set of PVDs, wherein two PVDs that have a threshold number, t, of CLMs in common are identified as representing monomer positions that are contextually similar; and
  
  c) optionally, generating a matrix (E-MAAP™) representing the results of step (b).
- Dependent claims: 15
- - 15. The method of claim 14, further comprising the steps:
    - d) repeating steps (a), (b), and (c) using PVDs constructed for multiple impulse function widths, W; and
      
      e) summing the matrices resulting from step (d) to produce a global matrix (E-MAAP™).
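
Claims 14 and 15, and their two-polymer counterparts in claims 19 and 20, can be read as the following sketch. It assumes each simplified PVD (CL_X PVD) is given as the list of its X context leading monomers, that a matrix entry is 1 when two positions share at least `t` CLMs and 0 otherwise, and that the global matrix is the element-wise sum over the impulse-function widths W. Passing the same set twice gives the single-polymer case of claim 14. Function names are hypothetical.

```python
def e_maap(clx_a: list[list[str]], clx_b: list[list[str]], t: int) -> list[list[int]]:
    """Entry (i, j) = 1 if CL_X PVDs i and j share at least t context leading monomers, else 0."""
    return [[1 if len(set(a) & set(b)) >= t else 0 for b in clx_b] for a in clx_a]


def global_e_maap(matrices: list[list[list[int]]]) -> list[list[int]]:
    """Element-wise sum of E-MAAPs computed for several impulse-function widths W."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(cols)] for i in range(rows)]
```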

16. A method of identifying proteins that have similar structural folds, the method comprising:
- obtaining a first scaled E-MAAP™, wherein the E-MAAP™ is scaled using amino acid cohesion energies;
  
  obtaining a second scaled E-MAAP™, wherein the E-MAAP™ is scaled using amino acid cohesion energies, and wherein the polymer sequence of the second scaled E-MAAP™ is different from the polymer sequence of the first scaled E-MAAP™; and
  
  determining the similarity of the second scaled E-MAAP™ with respect to the first scaled E-MAAP™.
- Dependent claims: 10, 17
- - 10. The method of claim 16, wherein a fixed number of context centers on the LMDM define each segment of secondary structure.
  - 17. The method of claim 16, comprising:
    - repeating the method with the same first scaled E-MAAP™ but different second scaled E-MAAP™s from the database, and optionally, ranking the E-MAAP™s of the database with respect to their similarity to the first scaled E-MAAP™.
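
Claims 16 and 17 leave the similarity measure open, and the two E-MAAPs may come from sequences of different lengths. As an assumed, size-independent stand-in, the sketch bins the values of each already-scaled matrix into a normalized histogram over fixed bin `edges` and scores similarity as the histogram intersection (1.0 for identical value distributions); the ranking of claim 17 then sorts database entries by that score. The measure and function names are hypothetical.

```python
def histogram(matrix: list[list[float]], edges: list[float]) -> list[float]:
    """Normalized histogram of matrix values over half-open bins [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for row in matrix:
        for v in row:
            for k in range(len(edges) - 1):
                if edges[k] <= v < edges[k + 1]:
                    counts[k] += 1
                    break
    total = sum(counts) or 1  # avoid division by zero when no value falls in any bin
    return [c / total for c in counts]


def similarity(m1: list[list[float]], m2: list[list[float]], edges: list[float]) -> float:
    """Histogram intersection: 1.0 for identical value distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(histogram(m1, edges), histogram(m2, edges)))


def rank(query: list[list[float]], database: dict[str, list[list[float]]],
         edges: list[float]) -> list[tuple[str, float]]:
    """Claim 17 reading: score every database E-MAAP against the query, best first."""
    scores = {name: similarity(query, m, edges) for name, m in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```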

18. A method of estimating the folding rate of a protein, the method comprising:
- obtaining a scaled E-MAAP™, wherein the E-MAAP™ is scaled using the Richardson hydrophobicity scale;
  
  making a three-dimensional representation of the scaled E-MAAP™;
  
  integrating the positive volume of the three-dimensional representation; and
  
  using the value resulting from the integration to estimate the folding rate of the protein.
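
For claim 18, a discrete reading of "integrating the positive volume" is a Riemann sum: each matrix entry is treated as the height of a grid cell of area dx*dy, and only positive heights are added. The claim gives no formula linking that volume to the folding rate, so the sketch assumes a calibration of the form ln(k) = a*V + b, with hypothetical coefficients `a` and `b` that would have to be fitted to proteins with measured rates. The Richardson-scaled matrix is assumed to be supplied by the caller.

```python
import math


def positive_volume(matrix: list[list[float]], dx: float = 1.0, dy: float = 1.0) -> float:
    """Riemann-sum estimate of the volume under the positive part of the surface z = matrix[i][j]."""
    return sum(v for row in matrix for v in row if v > 0) * dx * dy


def estimate_folding_rate(volume: float, a: float, b: float) -> float:
    """Hypothetical calibration: ln(k) = a * volume + b, so k = exp(a * volume + b)."""
    return math.exp(a * volume + b)
```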

19. A method of identifying positions of contextual similarity in a pair of polymers, the method comprising:
- a) obtaining a first set of PVDs describing one or more positions in the first polymer and a second set of PVDs describing one or more positions in the second polymer, wherein the PVDs of the first and second set of PVDs have been simplified to include a limited number of elements, X;
  
  b) performing pairwise comparisons of each PVD (CL_XPVD) from the first set of PVDs with each PVD (CL_XPVD) from the second set of PVDs, wherein two PVDs that have a threshold number, t, of CLMs in common are identified as representing monomer positions that are contextually similar; and
  
  c) optionally, generating a matrix (E-MAAP™) representing the results of step (b).
- Dependent claims: 20, 21
- - 20. The method of claim 19, further comprising the steps:
    - d) repeating steps (a), (b), and (c) using PVDs constructed for multiple impulse function widths, W; and
      
      e) summing the matrices resulting from step (d) to produce a global matrix (E-MAAP™).
  - 21. A method of predicting an interaction between two polymers, the method comprising:
    - scaling the values of the matrix produced by the method of claim 20 using amino acid cohesion energies; and
      
      identifying positive peaks in the values of the matrix.
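
Claim 21 scales the global matrix of claim 20 with amino acid cohesion energies and then looks for positive peaks. The sketch assumes that entry (i, j) is multiplied by the sum of the per-residue energies of residue i in the first sequence and residue j in the second, and that a peak is a positive entry no smaller than any of its up-to-eight neighbours. The energy values in the test are placeholders, not published cohesion energies; function names are hypothetical.

```python
def scale(matrix: list[list[float]], seq_a: str, seq_b: str,
          energy: dict[str, float]) -> list[list[float]]:
    """Multiply entry (i, j) by energy[seq_a[i]] + energy[seq_b[j]] (assumed scaling rule)."""
    return [[matrix[i][j] * (energy[seq_a[i]] + energy[seq_b[j]])
             for j in range(len(seq_b))] for i in range(len(seq_a))]


def positive_peaks(matrix: list[list[float]]) -> list[tuple[int, int]]:
    """Positions of positive entries that are >= every neighbour in the 3x3 window around them."""
    n, m = len(matrix), len(matrix[0])
    peaks = []
    for i in range(n):
        for j in range(m):
            v = matrix[i][j]
            if v <= 0:
                continue
            neighbours = [matrix[a][b]
                          for a in range(max(0, i - 1), min(n, i + 2))
                          for b in range(max(0, j - 1), min(m, j + 2))
                          if (a, b) != (i, j)]
            if all(v >= u for u in neighbours):
                peaks.append((i, j))
    return peaks
```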

22. A method of representing a polymer sequence, the method comprising:
- obtaining a PVD representing a position in the polymer sequence; and
  
  using the elements of the PVD to construct a Context Functional Surface (CFS) for one or more positions in the polymer sequence.
- Dependent claims: 23
- - 23. The method of claim 22, wherein the set of CFSs corresponding to some or all of the monomer positions in the polymer are combined to generate a CFS having an additional dimension.
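
Claims 22 and 23 do not define the Context Functional Surface (CFS) in this excerpt. One hypothetical construction, consistent with the FD_P formulas of claim 3 but not confirmed by the source, keeps the individual FD_P terms of a PVD un-summed: the surface for position P is a grid whose entry (k, i) is the contribution of position i to PVD element k, so each row sums to the corresponding PVD element. Stacking the grids for every position then adds the extra dimension of claim 23. The impulse shape, distance weighting, and names are assumptions.

```python
def cfs(seq: str, p: int, alphabet: str, width: int = 3) -> list[list[float]]:
    """Hypothetical CFS for position p: rows = monomer types, columns = sequence positions."""
    surface = [[0.0] * len(seq) for _ in alphabet]
    for i, monomer in enumerate(seq):
        if abs(i - p) > width:  # assumed rectangular impulse function of half-width W
            continue
        weight = 1.0 if i == p else 1.0 / abs(i - p)  # I*F at P, I*D*F elsewhere, with F = 1
        surface[alphabet.index(monomer)][i] = weight
    return surface


def stacked_cfs(seq: str, alphabet: str, width: int = 3) -> list[list[list[float]]]:
    """Claim 23 reading: one CFS per position, stacked along a new (position) axis."""
    return [cfs(seq, p, alphabet, width) for p in range(len(seq))]
```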

24. A method of characterizing secondary structure segments in a protein, the method comprising:
- a) obtaining a PVD representing a particular monomer position, R, in the protein;
  
  b) using the PVD of step a) to generate a CFS for some or all monomer positions in the polymer;
  
  c) plotting the positive values of the CFSs of step b) on a single graph to produce a G-profile; and
  
  d) analyzing the G-profile.

25. A method of characterizing the contextual similarity of different positions in a polymer, the method comprising:
- a) obtaining a PVD representing a particular monomer position, R, in the polymer;
  
  b) using the PVD to generate a set of CFSs for some or all positions in the polymer;
  
  c) calculating a correlation matrix, r_R, for the set of CFSs generated in step b);
  
  d) repeating steps a) through c) for some or all positions, R, in the polymer; and
  
  e) using the correlation matrices of step d) to generate a GCD for the polymer.
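
Claim 25 can be sketched as follows, with two assumptions the excerpt does not settle: r_R is taken to be the Pearson correlation between every pair of flattened CFSs generated from the PVD of position R, and the GCD is formed as the element-wise mean of the r_R matrices over all R. Constant surfaces, for which Pearson's r is undefined, are given a correlation of 0.0 here. Function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant (r undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0


def flatten(surface: list[list[float]]) -> list[float]:
    """Concatenate the rows of a 2-D surface into one vector."""
    return [v for row in surface for v in row]


def correlation_matrix(surfaces: list[list[list[float]]]) -> list[list[float]]:
    """r_R: pairwise correlations between the CFSs generated from one reference position R."""
    flat = [flatten(s) for s in surfaces]
    return [[pearson(a, b) for b in flat] for a in flat]


def build_gcd(matrices: list[list[list[float]]]) -> list[list[float]]:
    """Assumed GCD: element-wise mean of the r_R matrices over all reference positions R."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)] for i in range(n)]
```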

26. A method of identifying contextually unique positions in a polymer, the method comprising:
- obtaining a GCD for the polymer;
  
  identifying elements in the GCD that are greater than or equal to a predetermined threshold value; and
  
  identifying correlated islands in the set of GCD elements identified as exceeding the threshold value.
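
For claim 26, "correlated islands" can be read, as an assumption, as connected groups of GCD elements at or above the threshold. The sketch finds them with a breadth-first flood fill over 4-connected neighbours (up, down, left, right); the connectivity rule and the function name are hypothetical.

```python
from collections import deque


def islands(gcd: list[list[float]], threshold: float) -> list[list[tuple[int, int]]]:
    """Group elements >= threshold into 4-connected islands, each a sorted list of (i, j)."""
    n, m = len(gcd), len(gcd[0])
    seen = set()
    result = []
    for i in range(n):
        for j in range(m):
            if gcd[i][j] < threshold or (i, j) in seen:
                continue
            island, queue = [], deque([(i, j)])
            seen.add((i, j))
            while queue:  # breadth-first flood fill from the seed element
                a, b = queue.popleft()
                island.append((a, b))
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = a + da, b + db
                    if 0 <= x < n and 0 <= y < m and (x, y) not in seen and gcd[x][y] >= threshold:
                        seen.add((x, y))
                        queue.append((x, y))
            result.append(sorted(island))
    return result
```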

27. A method of predicting the effects of mutations on the structure of a protein, the method comprising:
- a) obtaining a GCD for the protein;
  
  b) identifying a position P in the GCD;
  
  c) identifying a position R in the GCD;
  
  d) plotting the row vector of the GCD at position P and the column vector of the GCD at position R on the same graph; and
  
  e) identifying peaks in the graph, thereby identifying positions in the protein that are predicted to disrupt the structural stability of the protein when mutated.
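
Claim 27 overlays row P and column R of the GCD and reads off peaks. The sketch below extracts both profiles and, as an assumed peak rule, reports interior positions whose value exceeds both neighbours and a hypothetical `min_height`. Function names are hypothetical.

```python
def profiles(gcd: list[list[float]], p: int, r: int) -> tuple[list[float], list[float]]:
    """Row vector of the GCD at position P and column vector of the GCD at position R."""
    return gcd[p], [row[r] for row in gcd]


def peaks(profile: list[float], min_height: float = 0.0) -> list[int]:
    """Interior local maxima above min_height (assumed definition of a peak)."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] > min_height
            and profile[i] > profile[i - 1] and profile[i] > profile[i + 1]]
```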

28. A method of identifying positions in a nucleic acid sequence, the method comprising:
- a) obtaining a GCD for a protein encoded by the nucleic acid sequence;
  
  b) identifying a position P in the GCD;
  
  c) identifying a position R in the GCD;
  
  d) plotting the row vector of the GCD at position P and the column vector of the GCD at position R on the same graph;
  
  e) identifying positions in the graph corresponding to positions in the protein that are predicted to influence the structural stability of the protein; and
  
  f) identifying regions of the nucleic acid sequence that encode the amino acids identified in step e), thereby identifying positions in the nucleic acid sequence that are likely to contain SNPs.
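
The last step of claim 28 maps flagged amino acid positions back to the nucleic acid sequence. Assuming a contiguous coding sequence without introns whose first codon begins at index `cds_start`, amino acid k (0-based) is encoded by nucleotides cds_start + 3k through cds_start + 3k + 2. The helper names are hypothetical.

```python
def codon_spans(aa_positions: list[int], cds_start: int = 0) -> list[tuple[int, int]]:
    """Half-open nucleotide ranges [start, end) encoding each 0-based amino acid position."""
    return [(cds_start + 3 * k, cds_start + 3 * k + 3) for k in aa_positions]


def codons(cds: str, aa_positions: list[int], cds_start: int = 0) -> list[str]:
    """The codons (candidate SNP-containing regions) for the flagged amino acid positions."""
    return [cds[a:b] for a, b in codon_spans(aa_positions, cds_start)]
```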

Current Assignee
Albert S. Benight, Anton J. Hopfinger, Peter V. Riccelli, Petr Pancoska
Original Assignee
Albert S. Benight, Anton J. Hopfinger, Peter V. Riccelli, Petr Pancoska
Inventors
Benight, Albert S., Pancoska, Petr, Riccelli, Peter V., Hopfinger, Anton J.

Application Number

US10/178,070
Publication Number

US 20030101003A1
US Class Current

702/22
CPC Class Codes

A61P 25/00   Drugs for disorders of the ...

G16B 15/00   ICT specially adapted for a...

G16B 15/10   Nucleic acid folding

G16B 15/20   Protein or domain folding

G16B 30/00   ICT specially adapted for s...

G16B 30/10   Sequence alignment; Homolog...
