T-cell epitope prediction
Abstract
Epitope prediction models are described herein. By way of example, a system for predicting epitope information relating to an epitope can include a classification model (e.g., logistic regression model). The trained classification model can illustratively operatively execute one or more logistic functions on received protein data, and incorporate one or more of hidden binary variables and shift variables that when processed represent the identification (e.g., prediction) of one or more desired epitopes. The classification model can be configured to predict the epitope information by processing data including various features of an epitope, MHC, MHC supertype, and Boolean combinations thereof.
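
As an illustration of the kind of model the abstract describes, the following is a minimal sketch of a logistic regression classifier over binary features built from an MHC allele, its supertype, the residue at each epitope position, and pairwise Boolean conjunctions of those features. It is not the patented implementation: the function names, the feature encoding, the choice of pairwise AND as the Boolean combination, the training hyperparameters, and the toy peptides and labels are all assumptions made for the example, and the toy data are not binding measurements.

```python
import math
from itertools import combinations


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standard_features(allele, supertype, peptide):
    """Binary features: allele identity, supertype identity, residue per position."""
    feats = {f"allele={allele}", f"supertype={supertype}"}
    feats.update(f"pos{i}={aa}" for i, aa in enumerate(peptide, start=1))
    return feats


def with_conjunctions(feats):
    """Add pairwise Boolean ANDs of the features (an assumed form of combination)."""
    pairs = {f"{a}&{b}" for a, b in combinations(sorted(feats), 2)}
    return feats | pairs


def predict(weights, bias, feats):
    """Probability that the example is an epitope under the current weights."""
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))


def train(examples, epochs=200, lr=0.05, l2=0.01):
    """Stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in examples:
            err = label - predict(weights, bias, feats)
            bias += lr * err
            for f in feats:
                w = weights.get(f, 0.0)
                weights[f] = w + lr * (err - l2 * w)
    return weights, bias


# Toy, made-up examples: (allele, supertype, 9-mer peptide, label).
raw = [
    ("A*02:01", "A2", "LLFGYPVYV", 1),
    ("A*02:01", "A2", "KKDKEKDKE", 0),
    ("B*07:02", "B7", "APRTLVYLL", 1),
    ("B*07:02", "B7", "GGSGGSGGS", 0),
]
examples = [
    (with_conjunctions(standard_features(a, s, p)), y) for a, s, p, y in raw
]
weights, bias = train(examples)
probabilities = [predict(weights, bias, feats) for feats, _ in examples]
```

Conjunction features let the model weight a residue differently depending on the allele or supertype it is paired with, which a purely additive model over the standard features cannot express.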
20 Claims
1. A computer implemented method that facilitates epitope prediction, comprising:
training, using a processing unit, a logistic regression (LR) model for epitope prediction using information from a plurality of sources representative of standard and special features of a desired epitope, wherein the standard features comprise, alone or in conjunction, data representative of an identity and/or supertype of a major histocompatibility complex (MHC) allele, and data representative of the identity and/or a chemical property of an amino acid at a certain position of an epitope, wherein the special features comprise, alone or Boolean combinations of, data representative of the standard features, the identity of an amino acid and/or the chemical property of the amino acid at a given position along either region that flanks an epitope and data representative of an amino acid and/or the chemical property of the amino acid at a given position along a MHC molecule;
employing hidden variables that represent an absence of supertypes among MHC molecules;
employing a shift variable that represents a position of a peptide within a groove of the MHC molecule; and
performing a multi-factor cross validation to confirm the epitope prediction.
Dependent claims: 2
3. A computer readable storage medium having computer readable instructions to instruct a computer to perform a method comprising:
training a logistic regression (LR) model for epitope prediction using information from a plurality of sources representative of standard and special features of a desired epitope, wherein the standard features comprise, alone or in conjunction, data representative of an identity and/or supertype of a major histocompatibility complex (MHC) allele, and data representative of the identity and/or a chemical property of an amino acid at a certain position of an epitope, wherein the special features comprise, alone or Boolean combinations of, data representative of the standard features, the identity of an amino acid and/or the chemical property of the amino acid at a given position along either region that flanks an epitope and data representative of an amino acid and/or the chemical property of the amino acid at a given position along a MHC molecule;
using hidden variables that represent a presence of supertypes among MHC molecules;
employing a shift variable that represents a position of a peptide within a groove of the MHC molecule; and
performing a multi-factor cross validation to confirm the epitope prediction.
4. A computer implemented method that facilitates epitope prediction, comprising:
training, using a processing unit, a logistic regression (LR) model for epitope prediction using one or more hidden variables representative of one or more characteristics of a major histocompatibility complex (MHC) molecule;
using special features that include an identity of an amino acid and/or a chemical property of the amino acid at a given position along either region that flanks an epitope and the identity of the amino acid and/or the chemical property of the amino acid at a given position along the MHC molecule; and
performing a multi-factor cross validation to confirm the epitope prediction.
Dependent claims: 5, 6, 7, 8
9. A computer implemented method that facilitates epitope prediction, comprising:
training, using a processing unit, a logistic regression (LR) model for epitope prediction using one or more shift variables representative of a position of a peptide within a groove of a major histocompatibility complex (MHC) molecule;
using data representative of an identity of a major histocompatibility complex (MHC) allele and data representative of the identity of an amino acid at a certain position of an epitope; and
performing a multi-factor cross validation to confirm the epitope prediction.
Dependent claims: 10, 11, 12, 13
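
One way to picture the shift variable recited in claim 9 is as a hidden offset that selects which residues of a longer peptide sit in the MHC groove. The sketch below scores every possible offset with position-specific weights and then averages over offsets under a uniform prior. This is only one plausible reading: the 9-residue core length, the uniform prior, the averaging rule, the helper names, and the toy weights are assumptions for illustration and are not taken from the claims.

```python
import math

CORE_LEN = 9  # assumed number of residues held in the groove


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shift_probabilities(peptide, weights, bias):
    """Binding probability for each shift (start offset of the core)."""
    n_shifts = len(peptide) - CORE_LEN + 1
    probs = []
    for shift in range(n_shifts):
        core = peptide[shift:shift + CORE_LEN]
        # Weights are keyed by (position within the core, residue).
        z = bias + sum(weights.get((i, aa), 0.0) for i, aa in enumerate(core))
        probs.append(sigmoid(z))
    return probs


def marginal_probability(peptide, weights, bias):
    """Marginalize the hidden shift out, assuming a uniform prior over shifts."""
    probs = shift_probabilities(peptide, weights, bias)
    return sum(probs) / len(probs)


# Toy weights (hypothetical): reward L at core position 0 and V at core position 8.
toy_weights = {(0, "L"): 2.0, (8, "V"): 2.0}
toy_bias = -3.0
peptide = "AAL" + "A" * 7 + "V" + "AA"  # 13 residues; L at index 2, V at index 10

per_shift = shift_probabilities(peptide, toy_weights, toy_bias)
best_shift = max(range(len(per_shift)), key=per_shift.__getitem__)
overall = marginal_probability(peptide, toy_weights, toy_bias)
```

In this toy case only the offset that aligns both anchor residues with their weighted core positions scores above 0.5, so the per-shift scores also indicate where in the groove the peptide is predicted to sit.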
14. A system that facilitates predicting an epitope, the system stored on computer-readable storage media, the system comprising:
a prediction component configured to predict epitope information by employing information from a plurality of sources including human leukocyte antigen (HLA) alleles and HLA supertypes to be processed by a classification model;
a classification model engine executing a selected trained classification model employing information using standard and special features of the epitope, wherein the trained classification model is trained to include one or more hidden binary variables that represent a presence or an absence of the HLA supertypes, wherein the trained classification model is trained to include one or more shift variables to generate the epitope prediction, wherein the trained classification model is trained within the HLA alleles and within the HLA supertypes to generate the epitope prediction, and the classification model engine further executing the selected trained classification model to perform an optimization by determining a global maximum; and
the classification model engine further executing the selected trained classification model to perform a multi-factor cross validation to confirm the epitope prediction.
Dependent claims: 15, 16, 17, 18, 19, 20
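
Each independent claim ends with a cross-validation step. The claims listed here do not define "multi-factor cross validation", so the sketch below shows ordinary k-fold cross-validation as a stand-in, on the assumption that the step holds out part of the data to check predictions made by a model trained on the rest. The function names and the `fit` and `predict` callables are hypothetical.

```python
def k_fold_indices(n_examples, k):
    """Return k (train, test) index splits; fold sizes differ by at most one."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    folds = [list(range(i, n_examples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validated_accuracy(examples, k, fit, predict):
    """Train on k-1 folds, score on the held-out fold, and pool the results.

    examples: list of (features, label) pairs
    fit:      callable taking a list of examples and returning a model
    predict:  callable taking (model, features) and returning a label
    """
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        model = fit([examples[i] for i in train_idx])
        correct += sum(
            predict(model, examples[i][0]) == examples[i][1] for i in test_idx
        )
    return correct / len(examples)
```

Any trainer can be passed in as `fit`, including a logistic regression trainer, provided `predict` returns labels in the same form as the labels stored in `examples`.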
Specification