Methods of populating data structures for use in evolutionary simulations
Abstract
This invention provides novel methods of populating data structures for use in evolutionary modeling. In particular, it provides methods of populating a data structure with a plurality of character strings. The methods involve encoding two or more biological molecules into character strings to provide a collection of two or more different initial character strings; selecting at least two substrings from the pool of character strings; concatenating the substrings to form one or more product strings about the same length as one or more of the initial character strings; adding the product strings to a collection of strings; and optionally repeating this process using one or more of the product strings as an initial string in the collection of initial character strings.
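The population loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes all initial strings are already aligned and of equal length, uses positional crossover with uniformly random cut points, and the names `recombine` and `populate` are hypothetical. The "optionally repeating" step is modeled by feeding each round's products back into the parent pool.

```python
import random


def recombine(pool, rng, n_fragments=2):
    """Build one product string from substrings of randomly chosen parents.

    Assumption: every string in `pool` has the same length, so cutting at
    shared positions and concatenating keeps the product at that length.
    """
    length = len(pool[0])
    # Distinct interior cut points split [0, length) into n_fragments pieces.
    cuts = sorted(rng.sample(range(1, length), n_fragments - 1))
    bounds = [0] + cuts + [length]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        parent = rng.choice(pool)  # each substring may come from a different parent
        pieces.append(parent[start:end])
    return "".join(pieces)


def populate(initial, rounds=3, products_per_round=5, seed=0):
    """Populate a list (the 'data structure') with product strings."""
    rng = random.Random(seed)
    pool = list(initial)  # collection of initial character strings
    products = []         # data structure of product strings
    for _ in range(rounds):
        new = [recombine(pool, rng) for _ in range(products_per_round)]
        products.extend(new)
        pool.extend(new)  # optional repeat: products become initial strings
    return products


if __name__ == "__main__":
    for p in populate(["ACDEFGHIKL", "MNPQRSTVWY"], rounds=2, products_per_round=3):
        print(p)
```

Because cuts are shared across parents, each position of a product always carries a character that some initial string had at that same position; a product may occasionally reproduce a parent exactly when every piece is drawn from the same parent.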
197 Citations
157 Claims
1. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
i) encoding two or more biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
ii) selecting at least two substrings from said initial character strings;
iii) concatenating said substrings to form one or more product strings about the same length as one or more of the initial character strings;
iv) adding the product strings to a data structure to populate a data structure of product strings;
v) determining sequence identities of at least one of the product strings relative to at least one initial character string; and
vi) selecting one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
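Steps ii) through vi) of claim 1 can be illustrated with a short sketch. It assumes equal-length initial strings, single-point crossover between two parents as the substring-selection rule, and ungapped positional identity as the "sequence identity" measure; none of these choices is specified by the claim, and `percent_identity`, `crossover`, and `select_for_production` are hypothetical names.

```python
import random


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity: percentage of positions where equal-length strings match."""
    if len(a) != len(b) or not a:
        raise ValueError("expects non-empty strings of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def crossover(p1: str, p2: str, rng: random.Random) -> str:
    """Single-point crossover: a prefix of p1 joined to the suffix of p2."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]  # same length as the parents


def select_for_production(initial, n_products=20, threshold=30.0, seed=0):
    rng = random.Random(seed)
    products = []  # data structure of product strings (step iv)
    for _ in range(n_products):
        p1, p2 = rng.sample(initial, 2)  # step ii: substrings from two parents
        products.append(crossover(p1, p2, rng))  # step iii
    # Steps v and vi: keep products exceeding the identity threshold
    # against at least one initial string.
    return [p for p in products
            if any(percent_identity(p, s) > threshold for s in initial)]
```

With single-point crossover of two equal-length parents, each product matches one of its parents at no fewer than half of its positions, so every product clears a 30% threshold; other recombination rules would make the filter more selective.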
30. A computer program product on a computer-readable medium comprising computer code that:
i) encodes two or more biological molecules into initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about ten subunits;
ii) selects at least two substrings from said initial character strings;
iii) concatenates said substrings to form one or more product strings about the same length as one or more of the initial character strings;
iv) adds the product strings to a data structure to populate a data structure of product strings;
v) determines sequence identities of at least one of the product strings relative to at least one initial character string; and
vi) selects one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
55. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
i) encoding two or more related biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
ii) selecting at least two substrings from said initial character strings;
iii) concatenating said substrings to form one or more product strings;
iv) adding the product strings to a data structure to populate a data structure of product strings;
v) determining whether at least one of the product strings has at least a predefined measure of similarity with at least one initial character string; and
vi) selecting one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings determined to have greater than the predefined value of sequence identity with at least one initial string.
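Claim 55 drops the length requirement on product strings and uses a predefined similarity threshold instead of a fixed 30%. The sketch below assumes a stand-in measure, Python's `difflib.SequenceMatcher.ratio()` (a value in [0, 1] that tolerates unequal lengths), which is not sequence identity in the bioinformatic sense; the recombination rule and the names `similarity`, `random_product`, and `screen` are hypothetical, and the "related" relationship between parents is not modeled.

```python
import difflib
import random


def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure in [0, 1]; works for strings of different lengths."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def random_product(initial, rng: random.Random) -> str:
    """Concatenate a prefix of one parent with a suffix of another.

    Cut points are chosen independently, so the product length is NOT fixed.
    Assumption: every initial string has at least two characters.
    """
    a, b = rng.sample(initial, 2)
    i = rng.randrange(1, len(a))      # non-empty prefix of a
    j = rng.randrange(0, len(b) - 1)  # non-empty suffix of b
    return a[:i] + b[j:]


def screen(initial, threshold, n_products=50, seed=0):
    """Keep products whose similarity to some initial string exceeds `threshold`."""
    rng = random.Random(seed)
    products = [random_product(initial, rng) for _ in range(n_products)]
    return [p for p in products
            if any(similarity(p, s) > threshold for s in initial)]
```

Passing the measure and threshold as parameters keeps the "predefined" value a caller decision; swapping in an alignment-based identity would bring the check closer to the claim's wording.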
56. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
i) encoding two or more biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
ii) selecting at least two substrings from said initial character strings;
iii) concatenating said substrings to form one or more product strings about the same length as one or more of the initial character strings;
iv) adding the product strings to a data structure to populate a data structure of product strings;
v) providing an alignment of at least one of the product strings relative to at least one initial character string; and
vi) selecting one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
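Claim 56 replaces the identity determination with providing an alignment. One common way to produce such an alignment is Needleman-Wunsch global alignment, sketched below with an assumed scoring scheme (match +1, mismatch -1, linear gap -1); identity is then taken as matching columns over alignment length. The claim does not specify an algorithm or scoring, and `global_align` and `identity_from_alignment` are hypothetical names.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns two gapped strings ('-' = gap)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_from_alignment(aln_a: str, aln_b: str) -> float:
    """Percentage of alignment columns holding the same non-gap character."""
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)
```

A product `p` would then pass the 30% criterion against an initial string `s` when `identity_from_alignment(*global_align(p, s)) > 30.0`. Other identity denominators (shorter sequence length, aligned non-gap columns) give different percentages, so the convention should be fixed in advance.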
74. A computer program product on a computer-readable medium comprising computer code that:
i) encodes two or more biological molecules into initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about ten subunits;
ii) selects at least two substrings from said initial character strings;
iii) concatenates said substrings to form one or more product strings about the same length as one or more of the initial character strings;
iv) adds the product strings to a data structure to populate a data structure of product strings;
v) provides an alignment of at least one of the product strings relative to at least one initial character string; and
vi) selects one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with the at least one initial character string.
92. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
i) encoding two or more naturally occurring biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
ii) selecting at least two substrings from said initial character strings;
iii) concatenating said substrings to form one or more product strings about the same length as one or more of the initial character strings;
iv) adding the product strings to a data structure to populate a data structure of product strings; and
v) selecting one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with at least one initial character string.
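The encoding step i), applied here to naturally occurring molecules, can be sketched for proteins by mapping each residue's standard three-letter name to its one-letter code. The sketch assumes molecules arrive as lists of three-letter residue names, treats "at least about 10 subunits" as a hard minimum of 10, and covers only the 20 standard amino acids; `encode_protein` and `encode_collection` are hypothetical names.

```python
# Standard three-letter to one-letter amino acid codes (20 standard residues only).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def encode_protein(residues, min_subunits=10):
    """Encode one protein (list of three-letter names) as a character string."""
    if len(residues) < min_subunits:
        raise ValueError(f"molecule has fewer than {min_subunits} subunits")
    # capitalize() normalizes e.g. "ALA" or "ala" to "Ala" before lookup.
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def encode_collection(molecules, min_subunits=10):
    """Encode two or more molecules into a list of initial character strings."""
    strings = [encode_protein(m, min_subunits) for m in molecules]
    if len(set(strings)) < 2:
        raise ValueError("need at least two different initial strings")
    return strings
```

Nucleic acids could be handled the same way with a four-letter alphabet; an unknown residue name raises `KeyError` rather than being silently skipped.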
111. A computer program product on a computer-readable medium comprising computer code that:
i) encodes two or more naturally occurring biological molecules into initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about ten subunits;
ii) selects at least two substrings from said initial character strings;
iii) concatenates said substrings to form one or more product strings about the same length as one or more of the initial character strings;
iv) adds the product strings to a data structure to populate a data structure of product strings; and
v) selects one or more product biological molecules for production, wherein the one or more product biological molecules correspond to one or more of the product strings having greater than 30% sequence identity with at least one initial character string.
130. A method of identifying molecules for production, wherein the molecules are represented by concatenated strings, said method comprising:
i) encoding two or more biological molecules into a data structure of initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about 10 subunits;
ii) selecting at least two substrings from said initial character strings;
iii) concatenating said substrings to form one or more product strings about the same length as one or more of the initial character strings;
iv) adding the product strings to a data structure to populate a data structure of product strings;
v) obtaining one or more computationally predicted properties for at least one of the product strings in the data structure; and
vi) selecting one or more product biological molecules for production on the basis of the one or more computationally predicted properties.
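Claim 130 selects products on the basis of computationally predicted properties without naming one. As a stand-in, the sketch below predicts hydropathy with the Kyte-Doolittle scale averaged over the sequence (the GRAVY score) and keeps products at or below a cutoff, as one might when favoring more hydrophilic variants. The property, the cutoff of 0.0, and the names `gravy` and `select_by_property` are all assumptions; any predictor returning a number can be passed in.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over residues."""
    return sum(KYTE_DOOLITTLE[c] for c in seq) / len(seq)


def select_by_property(products, predictor=gravy, max_value=0.0, top_k=None):
    """Rank product strings by a predicted property and keep those <= max_value.

    Results are ordered from lowest to highest predicted value; `top_k`
    optionally caps how many are selected for production.
    """
    scored = sorted((predictor(p), p) for p in products)
    chosen = [p for value, p in scored if value <= max_value]
    return chosen[:top_k] if top_k is not None else chosen
```

Keeping the predictor as a parameter reflects the claim's generality: a structure- or stability-based score could replace `gravy` without changing the selection logic.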
144. A computer program product on a computer-readable medium comprising computer code that:
i) encodes two or more biological molecules into initial character strings to provide a collection of two or more different initial character strings wherein each of said biological molecules comprises at least about ten subunits;
ii) selects at least two substrings from said initial character strings;
iii) concatenates said substrings to form one or more product strings about the same length as one or more of the initial character strings;
iv) adds the product strings to a data structure to populate a data structure of product strings;
v) obtains one or more computationally predicted properties for at least one of the product strings in the data structure; and
vi) selects one or more product biological molecules for production on the basis of the one or more computationally predicted properties.
Specification