Methods of populating data structures for use in evolutionary simulations

US 6,961,664 B2
Filed: 02/01/2000
Issued: 11/01/2005
Est. Priority Date: 01/19/1999
Status: Expired due to Term

Abstract

This invention provides novel methods of populating data structures for use in evolutionary modeling. In particular, this invention provides methods of populating a data structure with a plurality of character strings. The methods involve encoding two or more biological molecules into character strings to provide a collection of two or more different initial character strings; selecting at least two substrings from the pool of character strings; concatenating the substrings to form one or more product strings about the same length as one or more of the initial character strings; adding the product strings to a collection of strings; and optionally repeating this process using one or more of the product strings as an initial string in the collection of initial character strings.
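
The steps summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `recombine` and `populate`, the single random crossover point, and the use of plain Python lists as the "data structure" are all assumptions made for illustration.

```python
import random

def recombine(parents, n_products, rng=None):
    """Build product strings by concatenating substrings of two parents.

    Assumption: one random crossover point per product; the source does not
    prescribe how substrings are chosen in the general case.
    """
    rng = rng or random.Random()
    length = min(len(p) for p in parents)
    products = []
    for _ in range(n_products):
        a, b = rng.sample(parents, 2)       # two entries of the pool
        cut = rng.randrange(1, length)      # crossover position in 1..length-1
        # Head of one string plus tail of the other: the product has the
        # same length as the shortest string in the pool.
        products.append(a[:cut] + b[cut:length])
    return products

def populate(initial, n_products, rounds=1, rng=None):
    """Populate a collection of product strings, optionally over several
    rounds in which earlier products are reused as initial strings."""
    pool = list(initial)
    collection = []
    for _ in range(rounds):
        products = recombine(pool, n_products, rng)
        collection.extend(products)
        pool.extend(products)               # products become initial strings
    return collection
```

For example, `populate(["ATGGCTAAAGGT", "ATGGCAAAGGGA"], 5, rounds=2)` returns ten product strings, each 12 characters long.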

157 Claims

1. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
- i) encoding two or more biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
  
  ii) selecting at least two substrings from said initial character strings;
  
  iii) concatenating said substrings to form one or more product strings about the same length as one or more of the initial character strings;
  
  iv) adding the product strings to a data structure to populate a data structure of product strings;
  
  v) determining sequence identities of at least one of the product strings relative to at least one initial character string; and
  
  vi) selecting one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
- - 2. The method of claim 1, wherein said encoding comprises encoding two or more nucleic acid sequences into said character strings.
  - 3. The method of claim 2, wherein said two or more nucleic acid sequences comprise a nucleic acid sequence encoding a naturally occurring protein.
  - 4. The method of claim 1, wherein said encoding comprises encoding two or more amino acid sequences into said character strings.
  - 5. The method of claim 4, wherein said two or more amino acid sequences comprise an amino acid sequence encoding a naturally occurring protein.
  - 6. The method of claim 1, wherein said initial character strings have at least 30% sequence identity with each other.
  - 7. The method of claim 1, wherein said selecting in (ii) comprises selecting at least one substring from an initial character string such that the ends of said substring occur in string regions of about 3 to about 20 characters in the initial character string that have higher sequence identity with the corresponding region of another of said initial character strings than the overall sequence identity between the two initial character strings.
  - 8. The method of claim 1, wherein said selecting in (ii) comprises selecting substrings such that the ends of said substrings occur in predefined motifs of about 4 to about 8 characters.
  - 9. The method of claim 1, wherein said selecting in (ii) comprises aligning two or more of said initial character strings to maximize pairwise identity between two or more substrings of the initial character strings, and selecting a character that is a member of an aligned pair for the end of one of the two or more substrings.
  - 10. The method of claim 1, wherein said method further comprises randomly altering one or more characters of said initial or product character strings.
  - 11. The method of claim 10, wherein said method further comprises randomly selecting and altering one or more occurrences of a particular preselected character in said initial or product character strings.
  - 12. The method of claim 1, wherein said encoding, selecting, or concatenating is performed on an internet site.
  - 13. The method of claim 1, wherein said encoding, selecting, or concatenating is performed on a server.
  - 14. The method of claim 1, wherein said encoding, selecting, or concatenating is performed on a client linked to a network.
  - 15. The method of claim 1, wherein the initial character strings of (i) are related in that they encode the same gene or protein family but differ in sequence.
  - 16. The method of claim 1, further comprising determining a computationally predicted property for molecules represented by the product strings.
  - 17. The method of claim 1, wherein the molecules represented by the product strings are made in parallel in an array of vessels.
  - 18. The method of claim 1, wherein the molecules represented by the product strings are made by assembly of oligonucleotides.
  - 19. The method of claim 1, wherein the one or more product strings of (vi) have greater than 50% sequence identity with the at least one initial character string.
  - 20. The method of claim 1, wherein the one or more product strings of (vi) have greater than 75% sequence identity with the at least one initial character string.
  - 21. The method of claim 1, wherein the one or more product strings of (vi) have greater than 85% sequence identity with the at least one initial character string.
  - 22. The method of claim 1, wherein the one or more product strings of (vi) have greater than 90% sequence identity with the at least one initial character string.
  - 23. The method of claim 1, wherein the one or more product strings of (vi) have greater than 95% sequence identity with the at least one initial character string.
  - 24. The method of claim 1, wherein adding the product strings to a data structure comprises adding more than one product string to the data structure.
  - 25. The method of claim 1, wherein selecting at least two substrings from said initial character strings comprises random substring selection.
  - 26. The method of claim 1, wherein selecting at least two substrings from said initial character strings comprises uniform substring selection.
  - 27. The method of claim 1, wherein selecting at least two substrings from said initial character strings comprises motif-based selection.
  - 28. The method of claim 1, wherein selecting at least two substrings from said initial character strings comprises alignment-based selection.
  - 29. The method of claim 1, wherein selecting at least two substrings from said initial character strings comprises frequency-biased selection.
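
Steps (v) and (vi) of claim 1, together with the higher thresholds of claims 19 to 23, amount to an identity filter over the product strings. The sketch below is one hedged reading: the claims do not say how sequence identity is computed, so position-by-position comparison over the longer string is an assumption, as are the names `percent_identity` and `select_for_production`.

```python
def percent_identity(a, b):
    """Percentage of positions holding the same character.

    Assumption: strings are compared position by position without gaps,
    and the longer length is used as the denominator.
    """
    if not a and not b:
        return 100.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))

def select_for_production(products, initial, threshold=30.0):
    """Keep products whose identity to at least one initial string is
    strictly greater than the threshold (30% in claim 1)."""
    return [p for p in products
            if any(percent_identity(p, s) > threshold for s in initial)]
```

Raising `threshold` to 50, 75, 85, 90, or 95 gives the narrower selections of claims 19 to 23. Because the comparison is strict, a product at exactly 75% identity passes the 30% filter but not the 75% one.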

30. A computer program product on a computer readable media comprising computer code that:
- i) encodes two or more biological molecules into initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about ten subunits;
  
  ii) selects at least two substrings from said initial character strings;
  
  iii) concatenates said substrings to form one or more product strings about the same length as one or more of the initial character strings;
  
  iv) adds the product strings to a data structure to populate a data structure of product strings;
  
  v) determines sequence identities of at least one of the product strings relative to at least one initial character string; and
  
  vi) selects one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
- View Dependent Claims (31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54)
- - 31. The computer program product of claim 30, wherein said two or more biological molecules are nucleic acid sequences.
  - 32. The computer program product of claim 30, wherein said two or more biological molecules are nucleic acid sequences encoding naturally occurring proteins.
  - 33. The computer program product of claim 30, wherein said two or more biological molecules are amino acid sequences.
  - 34. The computer program product of claim 30, wherein said initial character strings have at least 30% sequence identity with each other.
  - 35. The computer program product of claim 30, wherein said computer code selects in (ii) at least one substring from an initial character string such that the ends of said substring occur in string regions of about three to about twenty characters in the initial character string that have higher sequence identity with a corresponding region of another of said initial character strings than the overall sequence identity between the two initial character substrings.
  - 36. The computer program product of claim 30, wherein said computer code selects substrings such that the ends of said substrings occur in predefined motifs of about 4 to about 8 characters.
  - 37. The computer program product of claim 30, wherein the computer code selects substrings by aligning two or more of said initial character strings to maximize pairwise identity between two or more substrings of the character strings, and selecting a character that is a member of an aligned pair for the end of one substring.
  - 38. The computer program product of claim 30, wherein said computer code additionally randomly alters one or more characters of said initial or product character strings.
  - 39. The computer program product of claim 38, wherein said computer code additionally randomly selects and alters one or more occurrences of a particular preselected character in said initial or product character strings.
  - 40. The computer program product of claim 30, wherein said computer code is stored on media selected from the group consisting of magnetic media, optical media, and optomagnetic media.
  - 41. The computer program product of claim 30, wherein said computer code is in dynamic or static memory of a computer.
  - 42. The computer program product of claim 30, wherein the initial character strings of (i) are related in that they encode the same gene or protein family but differ in sequence.
  - 43. The computer program product of claim 30, wherein the code instructs physical screening of the molecule(s) represented by the product strings for one or more desired properties.
  - 44. The computer program product of claim 30, wherein the code instructs determination of a computationally predicted property for molecules represented by the product strings.
  - 45. The computer program product of claim 30, wherein the code tests members of the data structure of product strings for a particular property and determines sequence differences responsible for differences in the particular property using multi-variate analysis.
  - 46. The computer program product of claim 30, wherein the one or more product strings of (vi) having greater than 50% sequence identity with the at least one initial character string.
  - 47. The computer program product of claim 30, wherein the one or more product strings of (vi) having greater than 75% sequence identity with the at least one initial character string.
  - 48. The computer program product of claim 30, wherein the one or more product strings of (vi) having greater than 95% sequence identity with the at least one initial character string.
  - 49. The computer program product of claim 30, wherein the computer code adds the product strings to a data structure by adding more than one product string to the data structure.
  - 50. The computer program product of claim 30, wherein the computer code selects at least two substrings from said initial character strings by a random substring selection.
  - 51. The computer program product of claim 30, wherein the computer code selects at least two substrings from said initial character strings by a uniform substring selection.
  - 52. The computer program product of claim 30, wherein the computer code selects at least two substrings from said initial character strings by a motif-based selection.
  - 53. The computer program product of claim 30, wherein the computer code selects at least two substrings from said initial character strings by an alignment-based selection.
  - 54. The computer program product of claim 30, wherein the computer code selects at least two substrings from said initial character strings by a frequency-biased selection.
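
Claim 35 places substring ends in short regions whose local identity is higher than the overall identity of the two strings. A minimal sketch of finding such regions follows. It assumes the two strings are already aligned to equal length and uses a fixed window width; the function name `crossover_windows` and its default width are illustrative, not taken from the patent.

```python
def crossover_windows(a, b, width=5):
    """Start indices of windows whose local identity exceeds the overall
    identity of the two strings.

    Assumption: a and b are pre-aligned, equal-length strings, and width
    falls in the 'about 3 to about 20 characters' range of the claim.
    """
    if len(a) != len(b):
        raise ValueError("strings must be aligned to equal length")
    if not a:
        return []
    same = [x == y for x, y in zip(a, b)]
    overall = sum(same) / len(same)
    return [i for i in range(len(a) - width + 1)
            if sum(same[i:i + width]) / width > overall]
```

Any index inside a returned window could then serve as an end point for a substring, so that crossovers fall where the parents agree most.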

55. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
- i) encoding two or more related biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
  
  ii) selecting at least two substrings from said initial character strings;
  
  iii) concatenating said substrings to form one or more product strings;
  
  iv) adding the product strings to a data structure to populate a data structure of product strings; and
  
  v) determining whether at least one of the product strings have at least a predefined measure of similarity with at least one initial character string; and
  
  vi) selecting one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings determined to have greater than the predefined value of sequence identity with at least one initial string.

56. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
- i) encoding two or more biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
  
  ii) selecting at least two substrings from said initial character strings;
  
  iii) concatenating said substrings to form one or more product strings about the same length as one or more of the initial character strings;
  
  iv) adding the product strings to a data structure to populate a data structure of product strings;
  
  v) providing an alignment of at least one of the product strings relative to at least one initial character string; and
  
  vi) selecting one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
- View Dependent Claims (57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73)
- - 57. The method of claim 56, wherein said encoding comprises encoding two or more amino acid sequences into said character strings, and wherein said two or more amino acid sequences comprise an amino acid sequence encoding a naturally occurring protein.
  - 58. The method of claim 56, wherein said initial character strings have at least 30% sequence identity with each other.
  - 59. The method of claim 56, wherein said selecting in (ii) comprises selecting at least one substring from an initial character string such that the ends of said substring occur in string regions of about 3 to about 20 characters in the initial character string that have higher sequence identity with the corresponding region of another of said initial character strings than the overall sequence identity between the two initial character strings.
  - 60. The method of claim 56, wherein said selecting in (ii) comprises selecting substrings such that the ends of said substrings occur in predefined motifs of about 4 to about 8 characters.
  - 61. The method of claim 56, wherein said selecting in (ii) comprises aligning two or more of said initial character strings to maximize pairwise identity between two or more substrings of the initial character strings, and selecting a character that is a member of an aligned pair for the end of one of the two or more substrings.
  - 62. The method of claim 56, wherein said method further comprises randomly altering one or more characters of said initial or product character strings.
  - 63. The method of claim 56, wherein the one or more product strings of (vi) having greater than 50% sequence identity with the at least one initial character string.
  - 64. The method of claim 56, wherein the one or more product strings of (vi) having greater than 75% sequence identity with the at least one initial character string.
  - 65. The method of claim 56, wherein the one or more product strings of (vi) having greater than 85% sequence identity with the at least one initial character string.
  - 66. The method of claim 56, wherein the one or more product strings of (vi) having greater than 90% sequence identity with the at least one initial character string.
  - 67. The method of claim 56, wherein the one or more product strings of (vi) having greater than 95% sequence identity with the at least one initial character string.
  - 68. The method of claim 56, wherein adding the product strings to a data structure comprises adding more than one product string to the data structure.
  - 69. The method of claim 56, wherein selecting at least two substrings from said initial character strings comprises random substring selection.
  - 70. The method of claim 56, wherein selecting at least two substrings from said initial character strings comprises uniform substring selection.
  - 71. The method of claim 56, wherein selecting at least two substrings from said initial character strings comprises motif-based selection.
  - 72. The method of claim 56, wherein selecting at least two substrings from said initial character strings comprises alignment-based selection.
  - 73. The method of claim 56, wherein selecting at least two substrings from said initial character strings comprises frequency-biased selection.
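
Step (v) of claim 56 calls for an alignment of a product string relative to an initial string. The claim does not name an alignment algorithm, so the sketch below swaps in a textbook Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -1) and derives a percent identity from the aligned columns. The names `global_align` and `aligned_identity` are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def aligned_identity(a, b):
    """Percent of alignment columns holding identical, non-gap characters."""
    x, y = global_align(a, b)
    if not x:
        return 100.0
    matches = sum(p == q and p != "-" for p, q in zip(x, y))
    return 100.0 * matches / len(x)
```

Under these assumed scores, aligning `"ACGT"` with `"AGT"` yields `("ACGT", "A-GT")`, which corresponds to 75% identity over the four alignment columns.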

74. A computer program product on a computer readable media comprising computer code that:
- i) encodes two or more biological molecules into initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about ten subunits;
  
  ii) selects at least two substrings from said initial character strings;
  
  iii) concatenates said substrings to form one or more product strings about the same length as one or more of the initial character strings;
  
  iv) adds the product strings to a data structure to populate a data structure of product strings;
  
  v) provides an alignment of at least one of the product strings relative to at least one initial character string; and
  
  vi) selects one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
- View Dependent Claims (75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91)
- - 75. The computer program product of claim 74, wherein said computer code encodes two or more amino acid sequences into said character strings, and wherein said two or more amino acid sequences comprise an amino acid sequence encoding a naturally occurring protein.
  - 76. The computer program product of claim 74, wherein said initial character strings have at least 30% sequence identity with each other.
  - 77. The computer program product of claim 74, wherein said computer code selects in (ii) at least one substring from an initial character string such that the ends of said substring occur in string regions of about three to about twenty characters in the initial character string that have higher sequence identity with a corresponding region of another of said initial character strings than the overall sequence identity between the two initial character substrings.
  - 78. The computer program product of claim 74, wherein said computer code selects in (ii) by selecting substrings such that the ends of said substrings occur in predefined motifs of about 4 to about 8 characters.
  - 79. The computer program product of claim 74, wherein said computer code selects in (ii) by aligning two or more of said initial character strings to maximize pairwise identity between two or more substrings of the initial character strings, and selecting a character that is a member of an aligned pair for the end of one of the two or more substrings.
  - 80. The computer program product of claim 74, wherein said computer code further randomly alters one or more characters of said initial or product character strings.
  - 81. The computer program product of claim 74, wherein the one or more product strings of (vi) having greater than 50% sequence identity with the at least one initial character string.
  - 82. The computer program product of claim 74, wherein the one or more product strings of (vi) having greater than 75% sequence identity with the at least one initial character string.
  - 83. The computer program product of claim 74, wherein the one or more product strings of (vi) having greater than 85% sequence identity with the at least one initial character string.
  - 84. The computer program product of claim 74, wherein the one or more product strings of (vi) having greater than 90% sequence identity with the at least one initial character string.
  - 85. The computer program product of claim 74, wherein the one or more product strings of (vi) having greater than 95% sequence identity with the at least one initial character string.
  - 86. The computer program product of claim 74, wherein the computer code adds the product strings to a data structure by adding more than one product string to the data structure.
  - 87. The computer program product of claim 74, wherein the computer code selects at least two substrings from said initial character strings by a random substring selection.
  - 88. The computer program product of claim 74, wherein the computer code selects at least two substrings from said initial character strings by a uniform substring selection.
  - 89. The computer program product of claim 74, wherein the computer code selects at least two substrings from said initial character strings by a motif-based selection.
  - 90. The computer program product of claim 74, wherein the computer code selects at least two substrings from said initial character strings by an alignment-based selection.
  - 91. The computer program product of claim 74, wherein the computer code selects at least two substrings from said initial character strings by a frequency-biased selection.
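
The motif-based selection of claims 78 and 89 can be sketched as follows. This is only one possible reading: it assumes the predefined motif must occur at the same index in both strings and places the cut at the midpoint of the motif. The helper names `motif_cut_points` and `motif_crossover` and the example motif are hypothetical.

```python
def motif_cut_points(a, b, motifs):
    """Cut positions inside motifs that occur at the same index in a and b.

    Assumption: motifs are short predefined strings (about 4 to 8
    characters) and the cut is placed at the midpoint of each occurrence.
    """
    cuts = set()
    for motif in motifs:
        start = a.find(motif)
        while start != -1:
            if b.startswith(motif, start):   # same motif, same position in b
                cuts.add(start + len(motif) // 2)
            start = a.find(motif, start + 1)
    return sorted(cuts)

def motif_crossover(a, b, motifs):
    """One product per shared motif: head of a joined to tail of b."""
    return [a[:c] + b[c:] for c in motif_cut_points(a, b, motifs)]
```

With `a = "ACGTGGATCCAAAA"`, `b = "TTTTGGATCCCCCC"`, and the motif `"GGATCC"`, the only cut point is 7 and the single product is `"ACGTGGATCCCCCC"`, which keeps the motif intact across the junction.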

92. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
- i) encoding two or more naturally occurring biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
  
  ii) selecting at least two substrings from said initial character strings;
  
  iii) concatenating said substrings to form one or more product strings about the same length as one or more of the initial character strings;
  
  iv) adding the product strings to a data structure to populate a data structure of product strings; and
  
  v) selecting one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
- View Dependent Claims (93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110)
- - 93. The method of claim 92, wherein said encoding comprises encoding two or more nucleic acid sequences into said character strings.
  - 94. The method of claim 92, wherein said encoding comprises encoding two or more amino acid sequences into said character strings, and wherein said two or more amino acid sequences comprise an amino acid sequence encoding a naturally occurring protein.
  - 95. The method of claim 92, wherein said initial character strings have at least 30% sequence identity with each other.
  - 96. The method of claim 92, wherein said selecting in (ii) comprises selecting at least one substring from an initial character string such that the ends of said substring occur in string regions of about 3 to about 20 characters in the initial character string that have higher sequence identity with the corresponding region of another of said initial character strings than the overall sequence identity between the two initial character strings.
  - 97. The method of claim 92, wherein said selecting in (ii) comprises selecting substrings such that the ends of said substrings occur in predefined motifs of about 4 to about 8 characters.
  - 98. The method of claim 92, wherein said selecting in (ii) comprises aligning two or more of said initial character strings to maximize pairwise identity between two or more substrings of the initial character strings, and selecting a character that is a member of an aligned pair for the end of one of the two or more substrings.
  - 99. The method of claim 92, wherein said method further comprises randomly altering one or more characters of said initial or product character strings.
  - 100. The method of claim 92, wherein the one or more product strings of (v) having greater than 50% sequence identity with the at least one initial character string.
  - 101. The method of claim 92, wherein the one or more product strings of (v) having greater than 75% sequence identity with the at least one initial character string.
  - 102. The method of claim 92, wherein the one or more product strings of (v) having greater than 85% sequence identity with the at least one initial character string.
  - 103. The method of claim 92, wherein the one or more product strings of (v) having greater than 90% sequence identity with the at least one initial character string.
  - 104. The method of claim 92, wherein the one or more product strings of (v) having greater than 95% sequence identity with the at least one initial character string.
  - 105. The method of claim 92, wherein adding the product strings to a data structure comprises adding more than one product string to the data structure.
  - 106. The method of claim 92, wherein selecting at least two substrings from said initial character strings comprises random substring selection.
  - 107. The method of claim 92, wherein selecting at least two substrings from said initial character strings comprises uniform substring selection.
  - 108. The method of claim 92, wherein selecting at least two substrings from said initial character strings comprises motif-based selection.
  - 109. The method of claim 92, wherein selecting at least two substrings from said initial character strings comprises alignment-based selection.
  - 110. The method of claim 92, wherein selecting at least two substrings from said initial character strings comprises frequency-biased selection.
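
Claim 99 adds random alteration of characters in the initial or product strings. A minimal sketch, assuming a nucleotide alphabet and a per-character alteration probability, is given below. The `target` parameter is an assumed way to restrict changes to occurrences of one preselected character; the function name `mutate` and all parameter names are illustrative.

```python
import random

def mutate(s, rate, alphabet="ACGT", target=None, rng=None):
    """Randomly alter characters of s.

    Each eligible character is replaced, with probability `rate`, by a
    different character drawn from `alphabet`. If `target` is given, only
    occurrences of that character are eligible. All parameters are
    assumptions for illustration.
    """
    rng = rng or random.Random()
    out = []
    for ch in s:
        eligible = target is None or ch == target
        if eligible and rng.random() < rate:
            choices = [c for c in alphabet if c != ch]
            out.append(rng.choice(choices) if choices else ch)
        else:
            out.append(ch)
    return "".join(out)
```

A rate of 0 leaves the string unchanged, while a rate of 1 with `target="A"` alters every `A` and nothing else. The result always has the same length as the input.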

111. A computer program product on a computer readable media comprising computer code that:
- i) encodes two or more naturally occurring biological molecules into initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about ten subunits;
  
  ii) selects at least two substrings from said initial character strings;
  
  iii) concatenates said substrings to form one or more product strings about the same length as one or more of the initial character strings;
  
  iv) adds the product strings to a data structure to populate a data structure of product strings; and
  
  v) selects one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
- View Dependent Claims (112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129)
- - 112. The computer program product of claim 111, wherein said computer code encodes by encoding two or more nucleic acid sequences into said character strings.
  - 113. The computer program product of claim 111, wherein said computer code encodes two or more amino acid sequences into said character strings, and wherein said two or more amino acid sequences comprise an amino acid sequence encoding a naturally occurring protein.
  - 114. The computer program product of claim 111, wherein said initial character strings have at least 30% sequence identity with each other.
  - 115. The computer program product of claim 111, wherein said computer code selects in (ii) at least one substring from an initial character string such that the ends of said substring occur in string regions of about three to about twenty characters in the initial character string that have higher sequence identity with a corresponding region of another of said initial character strings than the overall sequence identity between the two initial character substrings.
  - 116. The computer program product of claim 111, wherein said computer code selects in (ii) by selecting substrings such that the ends of said substrings occur in predefined motifs of about 4 to about 8 characters.
  - 117. The computer program product of claim 111, wherein said computer code selects in (ii) by aligning two or more of said initial character strings to maximize pairwise identity between two or more substrings of the initial character strings, and selecting a character that is a member of an aligned pair for the end of one of the two or more substrings.
  - 118. The computer program product of claim 111, wherein said computer code further randomly alters one or more characters of said initial or product character strings.
  - 119. The computer program product of claim 111, wherein the one or more product strings of (v) having greater than 50% sequence identity with the at least one initial character string.
  - 120. The computer program product of claim 111, wherein the one or more product strings of (v) having greater than 75% sequence identity with the at least one initial character string.
  - 121. The computer program product of claim 111, wherein the one or more product strings of (v) having greater than 85% sequence identity with the at least one initial character string.
  - 122. The computer program product of claim 111, wherein the one or more product strings of (v) having greater than 90% sequence identity with the at least one initial character string.
  - 123. The computer program product of claim 111, wherein the one or more product strings of (v) having greater than 95% sequence identity with the at least one initial character string.
  - 124. The computer program product of claim 111, wherein the computer code adds the product strings to a data structure by adding more than one product string to the data structure.
  - 125. The computer program product of claim 111, wherein the computer code selects at least two substrings from said initial character strings by a random substring selection.
  - 126. The computer program product of claim 111, wherein the computer code selects at least two substrings from said initial character strings by a uniform substring selection.
  - 127. The computer program product of claim 111, wherein the computer code selects at least two substrings from said initial character strings by a motif-based selection.
  - 128. The computer program product of claim 111, wherein the computer code selects at least two substrings from said initial character strings by an alignment-based selection.
  - 129. The computer program product of claim 111, wherein the computer code selects at least two substrings from said initial character strings by a frequency-biased selection.
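
The alignment-based selection of claim 117 and the identity thresholds of claims 119 to 123 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the parent strings are assumed to be pre-aligned and of equal length, sequence identity is taken as the fraction of matching positions, and the names `percent_identity`, `aligned_crossover_points`, `recombine`, and `populate` are hypothetical.

```python
from __future__ import annotations

import random


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length strings match."""
    if len(a) != len(b):
        raise ValueError("strings must be pre-aligned to equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def aligned_crossover_points(a: str, b: str) -> list[int]:
    """Indices where the aligned parents carry the same character."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x == y]


def recombine(a: str, b: str, rng: random.Random) -> str:
    """Join a prefix of `a` to a suffix of `b` at a matching position."""
    points = aligned_crossover_points(a, b)
    if not points:
        return a  # assumption: with no shared position, return a parent unchanged
    cut = rng.choice(points)
    return a[:cut] + b[cut:]


def populate(parents: list[str], n: int, threshold: float, seed: int = 0) -> list[str]:
    """Build n product strings; keep those above the identity threshold
    relative to at least one initial string."""
    rng = random.Random(seed)
    products = []
    for _ in range(n):
        a, b = rng.sample(parents, 2)
        child = recombine(a, b, rng)
        if any(percent_identity(child, p) > threshold for p in parents):
            products.append(child)
    return products
```

Calling `populate(["ACGTACGTAC", "ACGAACGTTC"], 20, 50.0)` returns product strings of length 10, each retained because it exceeds 50% identity with at least one parent. Raising `threshold` to 75, 85, 90, or 95 mirrors the narrower dependent claims.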

130. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
- i) encoding two or more biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
  
  ii) selecting at least two substrings from said initial character strings;
  
  iii) concatenating said substrings to form one or more product strings about the same length as one or more of the initial character strings;
  
  iv) adding the product strings to a data structure to populate a data structure of product strings;
  
  v) obtaining one or more computationally predicted properties for at least one of the product strings in the data structure; and
  
  vi) selecting one or more product biological molecules for production on the basis of the one or more computationally predicted properties.
  - 131. The method of claim 130, wherein the computationally predicted properties comprise one or more of a maximum or minimum molecular weight, a maximum or minimum free energy, a maximum or minimum contact surface with a target molecule or surface, a specified net charge, a predicted pK, a predicted pI, a binding avidity, secondary form, and tertiary form.
  - 132. The method of claim 130, wherein said encoding comprises encoding two or more amino acid sequences into said character strings.
  - 133. The method of claim 130, wherein said selecting in (ii) comprises aligning two or more of said initial character strings to maximize pairwise identity between two or more substrings of the initial character strings, and selecting a character that is a member of an aligned pair for the end of one of the two or more substrings.
  - 134. The method of claim 130, wherein said method further comprises randomly altering one or more characters of said initial or product character strings.
  - 135. The method of claim 130, wherein the one or more product biological molecules of (vi) having greater than 50% sequence identity with the at least one initial character string.
  - 136. The method of claim 130, wherein the one or more product biological molecules of (vi) having greater than 75% sequence identity with the at least one initial character string.
  - 137. The method of claim 130, wherein the one or more product biological molecules of (vi) having greater than 90% sequence identity with the at least one initial character string.
  - 138. The method of claim 130, wherein adding the product strings to a data structure comprises adding more than one product string to the data structure.
  - 139. The method of claim 130, wherein selecting at least two substrings from said initial character strings comprises random substring selection.
  - 140. The method of claim 130, wherein selecting at least two substrings from said initial character strings comprises uniform substring selection.
  - 141. The method of claim 130, wherein selecting at least two substrings from said initial character strings comprises motif-based selection.
  - 142. The method of claim 130, wherein selecting at least two substrings from said initial character strings comprises alignment-based selection.
  - 143. The method of claim 130, wherein selecting at least two substrings from said initial character strings comprises frequency-biased selection.
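
Steps (ii) through (vi) of claim 130 can be illustrated with a small sketch that selects product strings by a computationally predicted property. The property used here, a crude net charge that counts K and R as +1 and D and E as -1, is an assumption chosen for brevity and stands in for the "specified net charge" of claim 131; it ignores pH, histidine, and terminal groups. The function names and the random single-point recombination are likewise hypothetical.

```python
import random

POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def crude_net_charge(seq):
    """Rough charge estimate: +1 per K/R, -1 per D/E (illustrative only)."""
    return sum(c in POSITIVE for c in seq) - sum(c in NEGATIVE for c in seq)


def make_product(parents, rng):
    """Steps (ii)-(iii): take a prefix of one parent and a suffix of another."""
    a, b = rng.sample(parents, 2)
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def select_for_production(parents, n_products, target_charge, seed=0):
    rng = random.Random(seed)
    # Step (iv): populate the data structure of product strings.
    products = [make_product(parents, rng) for _ in range(n_products)]
    # Step (v): obtain a predicted property for each product string.
    scored = [(p, crude_net_charge(p)) for p in products]
    # Step (vi): keep the products whose predicted property meets the target.
    return [p for p, charge in scored if charge == target_charge]


parents = ["MKKRDELLAGKV", "MKDRDEALAGEV"]  # each at least about 10 subunits
chosen = select_for_production(parents, n_products=50, target_charge=1)
```

Any other predictor (molecular weight, pI, and so on) could replace `crude_net_charge` without changing the surrounding pipeline.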

144. A computer program product on a computer readable media comprising computer code that:
- i) encodes two or more biological molecules into initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about ten subunits;
  
  ii) selects at least two substrings from said initial character strings;
  
  iii) concatenates said substrings to form one or more product strings about the same length as one or more of the initial character strings;
  
  iv) adds the product strings to a data structure to populate a data structure of product strings;
  
  v) obtains one or more computationally predicted properties for at least one of the product strings in the data structure; and
  
  vi) selects one or more product biological molecules for production on the basis of the one or more computationally predicted properties.
  - 145. The computer program product of claim 144, wherein the computationally predicted properties comprise one or more of a maximum or minimum molecular weight, a maximum or minimum free energy, a maximum or minimum contact surface with a target molecule or surface, a specified net charge, a predicted pK, a predicted pI, a binding avidity, secondary form, and tertiary form.
  - 146. The computer program product of claim 144, wherein the computer code encodes in (i) by encoding two or more amino acid sequences into said character strings.
  - 147. The computer program product of claim 144, wherein the computer code selects in (ii) by aligning two or more of said initial character strings to maximize pairwise identity between two or more substrings of the initial character strings, and selecting a character that is a member of an aligned pair for the end of one of the two or more substrings.
  - 148. The computer program product of claim 144, wherein the computer code further randomly alters one or more characters of said initial or product character strings.
  - 149. The computer program product of claim 144, wherein the one or more product biological molecules of (vi) having greater than 50% sequence identity with the at least one initial character string.
  - 150. The computer program product of claim 144, wherein the one or more product biological molecules of (vi) having greater than 75% sequence identity with the at least one initial character string.
  - 151. The computer program product of claim 144, wherein the one or more product biological molecules of (vi) having greater than 90% sequence identity with the at least one initial character string.
  - 152. The computer program product of claim 144, wherein the computer code adds the product strings to a data structure by adding more than one product string to the data structure.
  - 153. The computer program product of claim 144, wherein the computer code selects at least two substrings from said initial character strings by a random substring selection.
  - 154. The computer program product of claim 144, wherein the computer code selects at least two substrings from said initial character strings by a uniform substring selection.
  - 155. The computer program product of claim 144, wherein the computer code selects at least two substrings from said initial character strings by a motif-based selection.
  - 156. The computer program product of claim 144, wherein the computer code selects at least two substrings from said initial character strings by an alignment-based selection.
  - 157. The computer program product of claim 144, wherein the computer code selects at least two substrings from said initial character strings by a frequency-biased selection.
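
The motif-based selection of claim 155 and the random alteration of claim 148 might look something like the sketch below. It assumes a single literal motif supplied by the caller, cuts each parent at the start of a motif occurrence, and mutates characters independently at a fixed rate. All names (`motif_breakpoints`, `motif_crossover`, `point_mutate`) are hypothetical and the claims do not prescribe these details.

```python
from __future__ import annotations

import random


def motif_breakpoints(seq: str, motif: str) -> list[int]:
    """Start indices of every (possibly overlapping) occurrence of `motif`."""
    points = []
    start = seq.find(motif)
    while start != -1:
        points.append(start)
        start = seq.find(motif, start + 1)
    return points


def motif_crossover(a: str, b: str, motif: str, rng: random.Random) -> str | None:
    """Join `a` up to a motif occurrence with `b` from a motif occurrence.

    Returns None when either parent lacks the motif. The product length
    can differ from the parents when the chosen occurrences are offset.
    """
    pa, pb = motif_breakpoints(a, motif), motif_breakpoints(b, motif)
    if not pa or not pb:
        return None
    i, j = rng.choice(pa), rng.choice(pb)
    return a[:i] + b[j:]


def point_mutate(seq: str, alphabet: str, rate: float, rng: random.Random) -> str:
    """Replace each character with a random alphabet symbol at probability `rate`."""
    return "".join(rng.choice(alphabet) if rng.random() < rate else c for c in seq)


rng = random.Random(0)
child = motif_crossover("AAGATCTT", "CCGATCGG", "GATC", rng)
mutant = point_mutate(child, "ACGT", 0.1, rng)
```

Because each parent above contains the motif `GATC` exactly once, the crossover is deterministic and the motif is preserved intact at the junction.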

Current Assignee: Codexis, Inc.
Original Assignee: Maxygen Inc.
Inventors: Selifonov, Sergey A.; Stemmer, Willem P. C.
Primary Examiner(s): Horlick, Kenneth R.
Assistant Examiner(s): Kim, Young J.

Application Number: US09/495,668
Publication Number: US 20030032010A1
Time in Patent Office: 2,100 Days
Field of Search: 435/6, 536/23.1, 702/19, 702/27, 706/47, 712/200
US Class Current: 702/19

CPC Class Codes:
- A61K 39/00   Medicinal preparations cont...
- C07K 14/005   from viruses
- C07K 14/505   Erythropoietin [EPO]
- C07K 14/535   Granulocyte CSF; Granulocyt...
- C12N 15/1027   by DNA shuffling, e.g. RSR,...
- C12N 15/11   DNA or RNA fragments; Modif...
- C12N 2740/16122   New viral proteins or indiv...
- C12N 9/16   acting on ester bonds (3.1)

197 Citations

157 Claims
