Systems and methods for biopolymer engineering
First Claim
1. A method for constructing a variant set for modifying a biopolymer of interest, the method comprising:
a) identifying a plurality of positions in said biopolymer of interest and, for each respective position in said plurality of positions, one or more substitutions for the respective position, wherein the plurality of positions and the one or more substitutions for each respective position in the plurality of positions collectively define a biopolymer sequence space;
b) selecting a first plurality of variants of the biopolymer of interest thereby forming a variant set, wherein said variant set comprises a subset of said biopolymer sequence space;
c) measuring a property of all or a portion of the variants in the variant set; and
d) modeling, using a suitably programmed computer, a sequence-activity relationship between (i) one or more substitutions at one or more positions of the biopolymer of interest represented by the variant set and (ii) the property measured for all or the portion of the variants in the variant set, wherein the sequence-activity relationship has the form
$Y = f(w_1 x_1, w_2 x_2, \ldots, w_i x_i)$
wherein,
Y is a quantitative measure of the property;
$x_i$ is a descriptor of a substitution, a combination of substitutions, or a component of one or more substitutions, at one or more positions in the plurality of positions;
$w_i$ is a weight applied to the descriptor $x_i$; and
f( ) is a mathematical function,
and wherein the modeling comprises:
i) optimizing, using a suitably programmed computer, the sequence-activity relationship by adjusting individual weights $w_i$ for each said descriptor $x_i$ using a refinement algorithm that minimizes the difference between the predicted values and the real values of Y from partial data, wherein the partial data is the first plurality of variants with either (1) individual sequences left out on a random basis or (2) individual substitutions at positions in the plurality of positions left out on a random basis, and
ii) repeating the optimizing i) a plurality of times thereby obtaining, for each respective substitution or combination of substitutions $x_i$, (a) an average value for the weight $w_i$ describing a relative or absolute contribution of the respective substitution or combination of substitutions $x_i$ to Y, and (b) a standard deviation, variance or other measure of confidence in the weight $w_i$ describing the relative or absolute contribution of the respective substitution or combination of substitutions $x_i$ to Y.
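Steps d) i) and ii) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that f( ) is a plain sum of the weighted descriptors, that each descriptor is a 0/1 flag for one substitution, that the refinement algorithm is gradient-descent least squares, and that the partial data is formed by option (1), leaving out whole sequences at random. The function names and the demo measurements are hypothetical.

```python
import random
import statistics


def fit_weights(X, y, steps=1000, lr=0.05):
    """Refine weights w for an ASSUMED linear model Y = sum_i w_i * x_i.

    Gradient descent on the mean squared difference between predicted
    and measured Y (one possible "refinement algorithm").
    """
    n_feat = len(X[0])
    w = [0.0] * n_feat
    for _ in range(steps):
        grad = [0.0] * n_feat
        for row, target in zip(X, y):
            err = sum(wi * xi for wi, xi in zip(w, row)) - target
            for j, xj in enumerate(row):
                grad[j] += 2.0 * err * xj / len(X)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def resampled_weights(X, y, n_repeats=20, leave_out=1, seed=0):
    """Repeat the fit on partial data with random sequences left out.

    Returns (mean, standard deviation) of each weight across repeats.
    n_repeats must be at least 2 for a standard deviation to exist.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    fits = []
    for _ in range(n_repeats):
        dropped = set(rng.sample(idx, leave_out))
        kept = [i for i in idx if i not in dropped]
        fits.append(fit_weights([X[i] for i in kept], [y[i] for i in kept]))
    n_feat = len(X[0])
    means = [statistics.mean(f[j] for f in fits) for j in range(n_feat)]
    sds = [statistics.stdev(f[j] for f in fits) for j in range(n_feat)]
    return means, sds


# Hypothetical demo: 8 variants over 3 candidate substitutions (1 = present),
# with made-up property measurements.
demo_X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
demo_y = [0.1, 2.0, -0.6, 1.4, 1.1, 3.1, 0.4, 2.6]

demo_means, demo_sds = resampled_weights(demo_X, demo_y)
for j, (m, s) in enumerate(zip(demo_means, demo_sds)):
    print(f"substitution {j}: mean weight {m:.3f}, std dev {s:.3f}")
```

A weight whose standard deviation is small relative to its mean is one the data supports consistently across the left-out subsets. A weight with a large spread depends heavily on which sequences happened to be included.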
Abstract
Methods, computer systems, and computer program products for biopolymer engineering. A variant set for a biopolymer of interest is constructed by identifying, using a plurality of rules, a plurality of positions in the biopolymer of interest and, for each respective position in the plurality of positions, substitutions for the respective position. The plurality of positions and the substitutions for each respective position in the plurality of positions collectively define a biopolymer sequence space. A variant set comprising a plurality of variants of the biopolymer of interest is selected. A property of all or a portion of the variants in the variant set is measured. A sequence-activity relationship is modeled between (i) one or more substitutions at one or more positions of the biopolymer of interest represented by the variant set and (ii) the property measured for all or the portion of the variants in the variant set. The variant set is redefined to comprise variants that include substitutions in the plurality of positions that are selected based on a function of the sequence-activity relationship.
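The notions of a sequence space, a variant set, and a redefined variant set described in the abstract can be made concrete with a short Python sketch. This is an illustration under stated assumptions, not the method of the patent: the variant set is drawn by uniform random sampling, and the redefinition keeps only substitutions whose modeled weight exceeds a threshold. The function names, the threshold rule, and the example sequence are hypothetical.

```python
import itertools
import random


def sequence_space(parent, substitutions):
    """Enumerate every variant defined by {position: [allowed residues]}.

    The parent residue is always kept as one option at each position.
    """
    positions = sorted(substitutions)
    choices = [
        [parent[p]] + [r for r in substitutions[p] if r != parent[p]]
        for p in positions
    ]
    for combo in itertools.product(*choices):
        seq = list(parent)
        for p, residue in zip(positions, combo):
            seq[p] = residue
        yield "".join(seq)


def select_variant_set(parent, substitutions, size, seed=0):
    """Pick a subset of the sequence space (ASSUMED: uniform random choice)."""
    rng = random.Random(seed)
    space = list(sequence_space(parent, substitutions))
    return rng.sample(space, min(size, len(space)))


def redefine_substitutions(substitutions, weights, threshold=0.0):
    """Keep substitutions whose modeled weight exceeds a threshold.

    weights maps (position, residue) to a fitted weight; the threshold
    rule is one hypothetical "function of the sequence-activity relationship".
    """
    return {
        p: [r for r in residues if weights.get((p, r), 0.0) > threshold]
        for p, residues in substitutions.items()
    }


# Hypothetical example: a 5-residue parent with two variable positions.
parent = "MKTAY"
subs = {1: ["R", "Q"], 3: ["G"]}
variants = select_variant_set(parent, subs, size=4)
next_round = redefine_substitutions(
    subs, {(1, "R"): 0.8, (1, "Q"): -0.3, (3, "G"): 0.1}
)
print(variants)
print(next_round)
```

Enumerating the full space is only practical for small examples, since its size is the product of the number of options at each position. That growth is the reason a subset is measured and modeled instead.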
29 Citations
44 Claims
Specification