An epitope, or antigenic determinant, is the part of a macromolecule that is recognized by the immune system, and epitopes are classified according to the receptor that targets them. T cell epitopes are antigenic peptides that are presented by major histocompatibility complex (MHC) molecules and recognized by T cell receptors. MHCⅠ molecules present endogenous antigens, while MHCⅡ molecules present exogenous antigens. The MHCⅠ molecule binds a peptide of approximately 9 amino acids within a closed groove. In contrast, because its antigen-binding groove is open at both ends, the MHCⅡ molecule can present much longer peptides, generally 12 to 25 amino acids in length, nine of which occupy the binding groove. This difference between MHCⅠ and MHCⅡ matters because it has required distinct prediction algorithms for the two classes. B cell epitopes, which are recognized by antibodies or B cells, are divided into continuous (linear) epitopes and discontinuous (conformational) epitopes. Conformational epitopes, which make up the majority of B cell epitopes, are defined by the three-dimensional (3D) surface features they form on the antigen, whereas linear epitopes are defined by the amino acid sequence rather than by 3D shape.
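The length difference described above can be made concrete with a minimal sketch. Assuming a peptide is given as a one-letter amino acid string, the function below enumerates every contiguous 9-residue window that could occupy the groove. The name `candidate_cores` and the example sequences are hypothetical and chosen only for illustration; real predictors additionally score each window, which this sketch does not attempt.

```python
from typing import List, Tuple

CORE_LENGTH = 9  # residues that occupy the binding groove, as stated in the text


def candidate_cores(peptide: str, core_length: int = CORE_LENGTH) -> List[Tuple[int, str]]:
    """Return (start_index, core) for every contiguous window of core_length residues."""
    if core_length < 1:
        raise ValueError("core_length must be positive")
    last_start = len(peptide) - core_length
    # A peptide shorter than the core yields an empty range, hence no windows.
    return [
        (start, peptide[start:start + core_length])
        for start in range(last_start + 1)
    ]


if __name__ == "__main__":
    # Arbitrary illustrative sequences, not taken from any database.
    for peptide in ("ACDEFGHIK", "ACDEFGHIKLMNP"):
        cores = candidate_cores(peptide)
        print(f"{peptide} ({len(peptide)} residues): {len(cores)} candidate core(s)")
```

A 9-residue peptide has a single possible register, whereas a 13-residue peptide already has five, which is one reason MHCⅡ prediction has to deal with an unknown core position in addition to binding strength.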
Given the impact of viruses on global health, much effort has been devoted to elucidating the mechanisms that viruses use to evade the host immune system and to developing prophylactic and therapeutic vaccines. Traditional vaccines are based on inactivated or attenuated pathogens. Although they play an important role in protection against infectious diseases, these vaccines carry a biohazard risk because they may infect the recipient. Compared with traditional vaccine design, epitope-based vaccines offer a safer approach: they consist of rationally designed protective epitopes that can stimulate effective immune responses while avoiding potentially hazardous side effects. Nevertheless, identifying and selecting suitable epitopes is time-consuming and expensive and requires careful experimental screening. Hence, a quicker and cheaper strategy is needed to address this bottleneck. One option is computational prediction, for which many bioinformatics tools are available. Several publicly accessible epitope databases, such as the IEDB (Immune Epitope Database and Analysis Resource; http://www.immuneepitope.org), provide further sources of information. The IEDB is a repository of curated experimental epitope data comprising both positive and negative subsets, which can be used as a benchmark. Since computational tools of this kind were incorporated into the vaccine screening pipeline, epitope-based vaccine design has become significantly more efficient. Although the effectiveness of these prediction tools is limited by their accuracy, the growing volume of experimental data, such as genome sequences and protein structures, together with the development of more advanced algorithms, is likely to yield more powerful and efficient virus epitope prediction tools in the near future.
The underlying assumption of these methods is that an evolutionary relationship exists among groups of viruses and that sequence analysis can reveal conserved patterns within these groups. How these patterns are identified depends on the class of epitope and on the particular software. Prediction of both T cell and B cell epitopes is essential for epitope-based (epitope-driven) vaccine design, and a variety of prediction algorithms have been developed based on the antigen's primary amino acid sequence, its 3D structure, or other protein characteristics such as hydrophilicity, accessibility and flexibility. Several epitope prediction tools are currently available (Table 1).
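As a rough illustration of how a sequence-derived property such as hydrophilicity can be turned into a per-residue score, the sketch below averages a residue scale over a sliding window. It is not the method of any specific tool in Table 1. The choice of the Kyte-Doolittle hydropathy scale (where strongly negative values indicate hydrophilic residues), the window of 7 residues and the threshold of -1.0 are all assumptions made for this example, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy values; negative means hydrophilic.
# Using this particular scale is an assumption of the sketch.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(sequence: str, scale: Dict[str, float],
                  window: int = 7) -> List[Tuple[int, float]]:
    """Return (center_index, mean scale value) for each full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    scores = []
    for center in range(half, len(sequence) - half):
        segment = sequence[center - half:center + half + 1]
        mean = sum(scale[residue] for residue in segment) / window
        scores.append((center, mean))
    return scores


def hydrophilic_positions(sequence: str, threshold: float = -1.0,
                          window: int = 7) -> List[int]:
    """Return window centers whose mean hydropathy falls below threshold."""
    return [center
            for center, mean in window_scores(sequence, KYTE_DOOLITTLE, window)
            if mean < threshold]


if __name__ == "__main__":
    # Artificial sequence: a lysine-rich stretch followed by isoleucines.
    print(hydrophilic_positions("KKKKKKKIIIIIII"))
```

Positions flagged this way are only candidates; accessibility, flexibility or structural information would be needed before treating them as likely epitopes.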
Table 1. Currently available epitope prediction software
The tools listed above differ in accuracy, specificity and sensitivity. Each prediction falls into one of four classes: true positive (TP), false positive (FP), true negative (TN) or false negative (FN). Accuracy, the fraction of all predictions that are correct, (TP+TN)/(TP+FP+TN+FN), is generally the major criterion for assessing the quality of a predictive model, and a model with higher accuracy is usually preferred. Sensitivity, the fraction of true epitopes that are correctly predicted, TP/(TP+FN), and specificity, the fraction of non-epitopes that are correctly rejected, TN/(TN+FP), are also important criteria. Accuracy rises when sensitivity and specificity increase together, or when either increases while the other is held constant. In theory, an optimal predictor reaches 100% sensitivity and 100% specificity, and therefore 100% accuracy; in practice, such perfect prediction is not achieved, and there is usually a trade-off between the two measures. Machine-learning-based algorithms usually achieve higher accuracy than motif-based algorithms in T cell epitope prediction and linear B cell epitope prediction. For example, SVMHC is more accurate than SYFPEITHI (T cell epitope prediction), and ABCpred is significantly more accurate than Bcepred (linear B cell epitope prediction). Conformational B cell epitope prediction algorithms are based on 3D structure, and their performance also varies; for instance, CEP predicts more accurately than DiscoTope. However, no algorithm is perfect, and every tool has its strengths and weaknesses. In practice, several tools should generally be combined for epitope prediction.
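The three measures defined above can be computed directly from the four outcome counts. The following is a minimal sketch, assuming predictions and experimentally determined labels are available as parallel lists of booleans (True meaning "epitope"); the names `ConfusionCounts` and `count_outcomes` and the example labels are hypothetical and do not come from any benchmark.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # predicted epitope, is an epitope
    fp: int  # predicted epitope, is not an epitope
    tn: int  # predicted non-epitope, is not an epitope
    fn: int  # predicted non-epitope, is an epitope


def count_outcomes(predicted: Sequence[bool], actual: Sequence[bool]) -> ConfusionCounts:
    """Tally TP, FP, TN and FN from parallel prediction and label sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    return ConfusionCounts(
        tp=sum(1 for p, a in pairs if p and a),
        fp=sum(1 for p, a in pairs if p and not a),
        tn=sum(1 for p, a in pairs if not p and not a),
        fn=sum(1 for p, a in pairs if not p and a),
    )


def _ratio(numerator: int, denominator: int) -> float:
    if denominator == 0:
        raise ValueError("metric is undefined when its denominator is zero")
    return numerator / denominator


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return _ratio(c.tn, c.tn + c.fp)


if __name__ == "__main__":
    # Made-up labels for illustration only.
    predicted = [True, True, False, False, True]
    actual = [True, False, False, True, True]
    counts = count_outcomes(predicted, actual)
    print(counts, accuracy(counts), sensitivity(counts), specificity(counts))
```

Because accuracy equals sensitivity and specificity averaged with weights given by the fractions of true epitopes and non-epitopes in the test set, a benchmark dominated by negatives can show high accuracy even when sensitivity is poor, which is why all three values are worth reporting together.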