Citation: Zheng Zhang, Sifan Ye, Aiping Wu, Taijiao Jiang, Yousong Peng. Prediction of the Receptorome for the Human-Infecting Virome. Virologica Sinica, 2021, 36(1): 133-140. http://dx.doi.org/10.1007/s12250-020-00259-6

Prediction of the Receptorome for the Human-Infecting Virome

  • Corresponding author: Yousong Peng, pys2013@hnu.edu.cn, ORCID: 0000-0002-5482-9506
  • Electronic supplementary material The online version of this article (https://doi.org/10.1007/s12250-020-00259-6) contains supplementary material, which is available to authorized users.
  • Received Date: 13 April 2020
    Accepted Date: 01 June 2020
    Published Date: 28 July 2020
    Available online: 01 February 2021
  • Abstract: Virus receptors are key to the viral infection of host cells, yet their identification remains challenging. Our previous study showed that human virus receptor proteins have distinctive features, including a high N-glycosylation level, a high number of interaction partners and a high expression level. Here, a random-forest model was built to identify the human virus receptorome from human cell membrane proteins with acceptable accuracy, based on a combination of these features and protein sequences. A total of 1424 human cell membrane proteins were predicted to constitute the receptorome of the human-infecting virome. In addition, combining the random-forest model with previously predicted protein-protein interactions between humans and viruses enabled the prediction of receptors for 693 human-infecting viruses, such as enterovirus, norovirus and West Nile virus. Finally, candidate alternative receptors of SARS-CoV-2 were also predicted. To our knowledge, this study is the first attempt to predict the receptorome of the human-infecting virome, and it should facilitate the identification of virus receptors.


    1. Baranowski E, Ruiz-Jarabo CM, Domingo E (2001) Evolution of cell recognition by viruses. Science 292:1102-1105
        doi: 10.1126/science.1058613

    2. Casasnovas JM (2013) Virus-receptor interactions and receptor-mediated virus entry into host cells. In: Mateu MG (ed) Structure and physics of viruses, vol 68. Springer, Netherlands, pp 441-466

    3. Chen C, Liaw A, Breiman L (2004) Using random forest to learn imbalanced data, vol 110. University of California, Berkeley, p 24

    4. Csardi G, Nepusz T (2006) The igraph software package for complex network research. Int J Complex Syst 1695:1-9

    5. Dimitrov DS (2004) Virus entry: molecular mechanisms and biomedical applications. Nat Rev Microbiol 2:109-122
        doi: 10.1038/nrmicro817

    6. Free RB, Hazelwood LA, Sibley DR (2009) Identifying novel protein-protein interactions using co-immunoprecipitation and mass spectroscopy. Current Protoc Neurosci 46:5.28.1-5.28.14

    7. Fu L, Niu B, Zhu Z, Wu S, Li W (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150-3152
        doi: 10.1093/bioinformatics/bts565

    8. Gupta R, Jung E, Brunak S (2004) Prediction of N-glycosylation sites in human proteins. http://www.cbs.dtu.dk/services/NetNGlyc

    9. Hoffmann M, Kleine-Weber H, Krueger N, Mueller MA, Drosten C, Pöhlmann S (2020) The novel coronavirus 2019 (2019-nCoV) uses the SARS-coronavirus receptor ACE2 and the cellular protease TMPRSS2 for entry into target cells. BioRxiv
        doi: 10.1101/2020.01.31.929042

    10. Lasso G, Mayer SV, Winkelmann ER, Chu T, Elliot O, Patino-Galindo JA, Park K, Rabadan R, Honig B, Shapira SD (2019) A structure-informed Atlas of human-virus interactions. Cell 178:1526-1541.e16

    11. Li F (2015) Receptor recognition mechanisms of coronaviruses: a decade of structural studies. J Virol 89:1954-1964
        doi: 10.1128/JVI.02615-14

    12. Masson P, Hulo C, De Castro E, Bitter H, Gruenbaum L, Essioux L, Bougueleret L, Xenarios I, Le Mercier P (2012) ViralZone: recent updates to the virus knowledge resource. Nucleic Acids Res 41:D579-D583
        doi: 10.1093/nar/gks1220

    13. Minor P, Pipkin P, Hockley D, Schild G, Almond J (1984) Monoclonal antibodies which block cellular receptors of poliovirus. Virus Res 1:203-212
        doi: 10.1016/0168-1702(84)90039-X

    14. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825-2830

    15. Petryszak R, Keays M, Tang YA, Fonseca NA, Barrera E, Burdett T, Füllgrabe A, Fuentes AM-P, Jupp S, Koskinen S (2016) Expression Atlas update-an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res 44:D746-D752
        doi: 10.1093/nar/gkv1045

    16. Qi F, Qian S, Zhang S, Zhang Z (2020) Single cell RNA sequencing of 13 human tissues identify cell types and receptors of human coronaviruses. Biochem Biophys Res Commun 526:135-140
        doi: 10.1016/j.bbrc.2020.03.044

    17. Ryu W-S (2016) Molecular virology of human pathogenic viruses. Academic Press, Amsterdam, pp 247-260

    18. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP (2015) STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43:D447-D452
        doi: 10.1093/nar/gku1003

    19. Wang J-h (2002) Protein recognition by cell surface receptors: physiological receptors versus virus interactions. Trends Biochem Sci 27:122-126
        doi: 10.1016/S0968-0004(01)02038-2

    20. Yan C, Duan G, Wu F-X, Wang J (2019) IILLS: predicting virus-receptor interactions based on similarity and semi-supervised learning. BMC Bioinform 20:651
        doi: 10.1186/s12859-019-3278-3

    21. Zhang Z, Zhu Z, Chen W, Cai Z, Xu B, Tan Z, Wu A, Ge X, Guo X, Tan Z, Xia Z, Zhu H, Jiang T, Peng Y (2019) Cell membrane proteins with high n-glycosylation, high expression and multiple interaction partners are preferred by mammalian viruses as receptors. Bioinformatics 35:723-728
        doi: 10.1093/bioinformatics/bty694

    22. Zhang H, Kang Z, Gong H, Xu D, Wang J, Li Z, Cui X, Xiao J, Meng T, Zhou W (2020) The digestive system is a potential route of 2019-nCov infection: a bioinformatics analysis based on single-cell transcriptomes. BioRxiv
        doi: 10.1101/2020.01.30.927806

    23. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, Si H-R, Zhu Y, Li B, Huang C-L (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579:270-273
        doi: 10.1038/s41586-020-2012-7

    • 1. Bioinformatics Center of College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
    • 2. Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
    • 3. Suzhou Institute of Systems Medicine, Suzhou 215123, China
    • 4. Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510005, China


    • Receptor binding is the first step in the viral infection of host cells. Proteins are considered better suited as virus receptors than carbohydrates and lipids because of their higher binding affinity and specificity (Baranowski et al. 2001; Casasnovas 2013; Dimitrov 2004; Li 2015; Wang 2002). Many human virus receptors have been identified. Moreover, viruses do not choose proteins as their receptors at random. Previous studies have shown that proteins that are abundant on the surface of host cells, or that have relatively low affinity for their natural ligands, are the preferred receptors for viruses (Dimitrov 2004; Wang 2002). Based on a collection of 119 mammalian virus receptors, our recent study further revealed that human virus receptor proteins have a higher level of N-glycosylation, a higher number of interaction partners in the human protein-protein interaction (PPI) network and a higher expression level in 32 common human tissues than other cell membrane proteins (Zhang et al. 2019). These findings could facilitate the identification of human virus receptors.

      Identification of virus receptors in host cells is challenging. Several experimental methods have been developed for this purpose. The first approach selects candidate membrane proteins that bind to the virus receptor-binding proteins (RBPs) by affinity purification and mass spectroscopy (Free et al. 2009; Ryu 2016). The second approach first identifies monoclonal antibodies that block virus entry, and then takes the membrane proteins to which these antibodies bind as candidate receptor proteins (Minor et al. 1984; Ryu 2016). The third approach identifies virus receptors by functional cloning selection based on a cDNA expression library (Ryu 2016): proteins that enable viral infection of non-susceptible cells after transfection are considered receptor candidates. However, these methods remain time-consuming and difficult, and their limited scalability has hampered the large-scale identification of viral receptors. Computational methods for identifying human virus receptors are therefore urgently needed.

      Previous studies have developed several computational models for predicting PPIs between viruses and hosts, which can help identify virus receptors (Lasso et al. 2019; Yan et al. 2019). For example, Lasso et al. (2019) developed an in silico framework (P-HIPSTer) that uses structural information to predict more than 280,000 PPIs between 1001 human-infecting viruses and humans, and made a series of new findings about human-virus interactions. The predicted PPIs between viral RBPs and human cell membrane proteins can be used to identify virus receptors. Here, a computational model was developed to predict the receptorome of the human-infecting virome based on the features of human virus receptors and protein sequences. This model was then combined with the PPIs predicted in Lasso's work to predict the receptors for 693 human-infecting viruses. The results of this study should facilitate the identification of human virus receptors.

    • A total of 90 human virus protein receptors were obtained from the viralReceptor database (available at http://www.computationalbiology.cn:5000/viralReceptor), which was developed in our previous study (Zhang et al. 2019). Human cell membrane proteins and human membrane proteins were obtained from the UniProtKB/Swiss-Prot database on February 21, 2020. Human proteins with the words "cell membrane" and "cell surface" in the "Subcellular location" field were considered human cell membrane proteins. Human proteins with the words "membrane" and "cell surface" in the "Subcellular location" field were considered human membrane proteins. A total of 3642 human cell membrane proteins and 7663 human membrane proteins were obtained.

    • The N-glycosylation sites of the human proteins mentioned above were obtained from the UniProtKB/Swiss-Prot database. For proteins without N-glycosylation annotation in UniProtKB/Swiss-Prot, the sites were predicted with NetNGlyc 1.0 (available at http://www.cbs.dtu.dk/services/NetNGlyc/) (Gupta et al. 2004). The N-glycosylation level of a protein was defined as the number of N-glycosylation sites per 100 amino acids.
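
      The definition of the N-glycosylation level can be written as a one-line calculation. The sketch below is only an illustration: the function name and the example protein (6 sites in 800 residues) are hypothetical and not taken from the study.

```python
def n_glycosylation_level(num_sites: int, protein_length: int) -> float:
    """Number of N-glycosylation sites per 100 amino acids."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return 100.0 * num_sites / protein_length


# Hypothetical protein: 6 N-glycosylation sites in an 800-residue sequence.
level = n_glycosylation_level(6, 800)
print(level)  # 0.75 sites per 100 amino acids
```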

      To calculate the node degree of the human proteins in the human PPI network, the human PPIs with combined scores greater than 400 were first extracted from the STRING database (version 10.5) (Szklarczyk et al. 2015) and used to form the human PPI network. The node degree was then calculated with the degree function in the R package igraph (version 1.2.4.2) (Csardi and Nepusz 2006).
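
      The study computed node degree with igraph in R. As a rough Python illustration of the same idea, the sketch below filters an edge list by combined score and counts the distinct interaction partners of each protein. The edge list, protein names and function name are made up for the example, and counting distinct partners (ignoring self-loops and duplicate edges) is an assumption about how the network was treated.

```python
from collections import defaultdict


def node_degrees(edges, min_score=400):
    """Count distinct partners per protein, keeping edges with score > min_score."""
    neighbours = defaultdict(set)
    for protein_a, protein_b, score in edges:
        if score > min_score and protein_a != protein_b:
            neighbours[protein_a].add(protein_b)
            neighbours[protein_b].add(protein_a)
    return {protein: len(partners) for protein, partners in neighbours.items()}


# Toy edge list: (protein, protein, combined score). All values are hypothetical.
edges = [("A", "B", 900), ("A", "C", 450), ("B", "C", 300), ("C", "D", 401)]
degrees = node_degrees(edges)
print(degrees)  # the B-C edge is dropped because its score is not above 400
```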

      The expression levels of human genes in 32 common human tissues were obtained from the Expression Atlas database (Petryszak et al. 2016) on February 6, 2018. Since gene expression levels were strongly correlated across tissues, principal component analysis (PCA) was applied using the PCA class in the scikit-learn package (version 0.21.3) (Pedregosa et al. 2011) in Python (version 3.6.7). Only the first principal component, which explained 95% of the total variance, was used to measure the expression level of human genes.

      The amino acid composition (AAC) and the frequencies of k-mers with two amino acids were calculated for each human protein with a Python script.
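
      The authors' script is not shown in the article. A minimal sketch of how the two sequence feature sets could be computed is given below, assuming the 20 standard amino acids, overlapping two-residue windows, and that non-standard residues are skipped. The function names and the example sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 features)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return {aa: (n / total if total else 0.0) for aa, n in counts.items()}


def two_mer_frequencies(sequence):
    """Frequency of each overlapping two-amino-acid k-mer (400 features)."""
    sequence = sequence.upper()
    counts = dict.fromkeys(("".join(p) for p in product(AMINO_ACIDS, repeat=2)), 0)
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return {pair: (n / total if total else 0.0) for pair, n in counts.items()}


# Hypothetical 8-residue peptide used only to exercise the functions.
aac = amino_acid_composition("MKVLAAGK")
kmers = two_mer_frequencies("MKVLAAGK")
print(aac["K"], kmers["AA"])  # K occurs 2 times in 8 residues; AA is 1 of 7 pairs
```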

    • To distinguish human virus receptors from other human cell membrane proteins using machine learning models, the known human virus receptors were chosen as positive samples. Since the human cell membrane proteins may contain virus receptors that have not yet been identified, the human membrane proteins remaining after excluding the human cell membrane proteins were taken as negative samples.

      Because not all human proteins were present in the human PPI network or had expression data available, only the human proteins with all three protein features, i.e., the N-glycosylation level, node degree and expression in common human tissues, were used in the modeling. In addition, sequence redundancy in both the human virus receptor proteins and the human membrane proteins was removed using CD-HIT (version 4.8.1) (Fu et al. 2012) at the 70% identity level. Finally, a total of 88 human virus receptors and 1743 human membrane proteins were used in the machine learning modeling (Supplementary Table S1).

      The random-forest (RF) model is an ensemble machine learning technique that uses multiple decision trees and can handle data with high variance and high bias, while averaging over multiple trees significantly reduces the risk of over-fitting. Therefore, the RF model was chosen to distinguish human virus receptors from other human cell membrane proteins. Since the number of positive samples (human viral receptors) was much smaller than that of negative samples (human membrane proteins), the BalancedRandomForestClassifier class in the imbalanced-learn package (version 0.5.0) (Chen et al. 2004) in Python was used to handle the class imbalance, with the n_estimators parameter set to 100.
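
      The balancing idea described by Chen et al. (2004) is that each tree is trained on a bootstrap sample of the minority class plus an equally sized sample drawn with replacement from the majority class. The sketch below illustrates only this sampling step in plain Python; it does not call imbalanced-learn and is not the library's internal implementation. The sample identifiers, the function name and the random seed are hypothetical, while the class sizes (88 and 1743) and the number of trees (100) follow the text.

```python
import random


def balanced_bootstrap(positives, negatives, rng):
    """Draw one balanced training sample for a single tree."""
    n = min(len(positives), len(negatives))
    # Bootstrap the minority class, then draw the same number from the majority.
    pos_sample = [rng.choice(positives) for _ in range(n)]
    neg_sample = [rng.choice(negatives) for _ in range(n)]
    return pos_sample, neg_sample


rng = random.Random(0)  # fixed seed so the sketch is reproducible
positives = [f"receptor_{i}" for i in range(88)]    # placeholder identifiers
negatives = [f"membrane_{i}" for i in range(1743)]  # placeholder identifiers

# One balanced sample per tree, mirroring n_estimators = 100.
samples = [balanced_bootstrap(positives, negatives, rng) for _ in range(100)]
print(len(samples), len(samples[0][0]), len(samples[0][1]))
```

      Each tree therefore sees as many negative as positive examples, which is why the approach is described as under-sampling the majority class.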

      When building the RF model based on AAC or the two-amino-acid k-mers, the features were ranked by feature importance, as provided by the "feature_importances_" attribute of the fitted model in the scikit-learn package in Python. Then, the top N features (N = 1-20 for AAC and N = 1-400 for two-amino-acid k-mers) were used in the RF models to investigate the influence of the number of features on model performance.

      Five repeats of five-fold cross-validation were conducted to evaluate the predictive performance of the RF model, using the StratifiedKFold class in the scikit-learn package in Python. Performance was measured by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity and specificity.
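
      For readers unfamiliar with these four measures, the sketch below computes them from scratch for a single toy set of predictions. It is an illustration only: the labels and scores are invented, the 0.5 decision threshold is an assumption, and the AUC is computed by pairwise comparison of positives and negatives rather than with scikit-learn.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = receptor)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and model scores for eight proteins.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1]
y_pred = [1 if s > 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
print(metrics, auc(y_true, scores))
```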

    • The predicted PPIs between human-infecting viruses and humans were obtained from the P-HIPSTer database (available at http://phipster.org/) (Lasso et al. 2019) on November 1, 2019. A total of 9395 interactions between viral RBPs and human cell membrane proteins with a likelihood ratio (LR) ≥ 100 were extracted for further analysis; these involved 718 viral RBPs and 314 human cell membrane proteins. The RBPs of human-infecting viruses were compiled from three sources: the ViralZone database (Masson et al. 2012), the UniProtKB database, in which viral proteins were annotated with the GO terms "viral entry into host cell" or "virion attachment to host cell", and the literature on viral RBPs. Viruses belonging to the same viral family were assumed to use the same RBPs. For example, all coronaviruses were assumed to use the spike protein as their RBP.

      To evaluate the ability of the RF model to identify virus receptors, 25 experimentally validated interactions between viral RBPs and receptors, together with the predicted PPIs between these viral RBPs and human cell membrane proteins, were extracted from the P-HIPSTer database. For each viral RBP, the predicted RBP-interacting human cell membrane proteins were ranked either by the LR provided in Lasso's work or by the score predicted by the RF model. The ranks of the real receptors were then analyzed, and the rank percentage of each real receptor was calculated by dividing its rank by the number of RBP-interacting human cell membrane proteins.
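
      The ranking step can be sketched as follows. The candidate identifiers and scores are hypothetical placeholders, and the function name is an assumption; only the definition of rank percentage (rank divided by the number of candidates) comes from the text.

```python
def rank_percentage(candidate_scores, real_receptor):
    """Rank candidates by descending score; return (rank, rank / n_candidates)."""
    ordered = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
    rank = ordered.index(real_receptor) + 1  # ranks start at 1
    return rank, rank / len(ordered)


# Hypothetical scores for five membrane proteins predicted to bind one viral RBP.
candidate_scores = {"P1": 0.91, "P2": 0.40, "P3": 0.77, "P4": 0.65, "P5": 0.12}
rank, pct = rank_percentage(candidate_scores, "P4")
print(rank, pct)  # P4 is third of five candidates, so the rank percentage is 0.6
```

      The same function applies to either ranking criterion: passing LR values instead of RF scores gives the rank by LR.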

      When ranking the RBP-interacting proteins with the RF model, the performance of the model may be over-estimated because of sequence similarity between the RBP-interacting proteins and the human proteins used in the modeling. To reduce this effect, for each pair of viral RBP and receptor, the predicted RBP-interacting human cell membrane proteins were clustered with the human proteins used in the modeling using CD-HIT at the 50% identity level. All proteins that clustered with RBP-interacting proteins were excluded from the modeling.

    • All data used in this study were obtained from public databases as mentioned above and are available in the Supplementary Tables.

    • Our previous study showed that human virus protein receptors have distinctive features, including a high N-glycosylation level, a high number of interaction partners in the human PPI network and a high expression level in 32 common human tissues (Zhang et al. 2019). To identify the potential receptors of the human-infecting virome, an RF model was first built to distinguish the human virus receptor proteins from other human membrane proteins based on these features. RF models built on individual protein features achieved AUCs ranging from 0.51 to 0.61 in five-fold cross-validations (Table 1). Combining all three features improved the RF model, giving an AUC of 0.70 and a prediction accuracy of 0.72 (Table 1).

      | Model with different sets of features   | Feature number | Acc  | Sen  | Spe  | AUC  |
      |-----------------------------------------|----------------|------|------|------|------|
      | N-gly                                   | 1              | 0.59 | 0.58 | 0.59 | 0.59 |
      | PPI                                     | 1              | 0.62 | 0.60 | 0.62 | 0.61 |
      | Expression                              | 1              | 0.50 | 0.51 | 0.50 | 0.51 |
      | N-gly + PPI + Expression                | 3              | 0.72 | 0.68 | 0.72 | 0.70 |
      | AAC (top 10)                            | 10             | 0.70 | 0.73 | 0.70 | 0.71 |
      | N-gly + PPI + Expression + AAC (top 10) | 13             | 0.76 | 0.75 | 0.76 | 0.76 |

      N-gly: N-glycosylation; PPI: node degree in the human PPI network; Expression: expression in 32 human tissues; AAC: amino acid composition; Acc: accuracy; Sen: sensitivity; Spe: specificity; AUC: area under the receiver operating characteristic curve.

      Table 1.  The predictive performances of random-forest models using different sets of features.

      For comparison, we also developed RF models to distinguish the human virus receptors from other human membrane proteins based on protein sequences. The amino acid composition (AAC) of the protein sequences was first used as features in the modeling. The AUC of the RF models increased as the number of most important AAC features (N) increased from 1 to 10 (Fig. 1A), and began to decrease when N was greater than 10. The RF model based on the top ten AAC features had an AUC of 0.71 and a prediction accuracy of 0.70, similar to those of the model based on the combination of protein features described above. The RF model based on the frequencies of two-amino-acid k-mers did not improve much on the model based on AAC (Fig. 1B). Therefore, only the top ten AAC features were used in the sequence-based modeling to reduce the complexity of the model.

      Figure 1.  The AUC of the random-forest model based on top N (N = 1-20 for AAC, N = 1-400 for two-amino-acid k-mers) features of AAC (A) or two-amino-acid k-mers of protein sequences (B).

      To further improve the model for predicting the receptorome of the human-infecting virome, the three protein features and the top ten AAC features were combined in the modeling. The resulting RF model achieved an AUC of 0.76, with a prediction accuracy, sensitivity and specificity of 0.76, 0.75 and 0.76, respectively (Table 1). This combined model was used for further analysis.

    • Based on the RF model, the receptorome was predicted from human cell membrane proteins. Each human cell membrane protein was assigned a score ranging from 0 to 1; proteins with higher scores are more likely to be virus receptors. A total of 1424 proteins with scores greater than 0.5 were considered to constitute the receptorome of the human-infecting virome. Table 2 lists the top 20 human cell membrane proteins and their scores (for all human cell membrane proteins, see Supplementary Table S2).

      | Gene name | Protein name                                          | RF score |
      |-----------|-------------------------------------------------------|----------|
      | ITGAV     | Integrin alpha-V                                      | 0.959    |
      | SCARB1    | Scavenger receptor class B member 1                   | 0.948    |
      | NCAM1     | Neural cell adhesion molecule 1                       | 0.943    |
      | ITGB1     | Integrin beta-1                                       | 0.940    |
      | IGF2R     | Cation-independent mannose-6-phosphate receptor       | 0.928    |
      | ITGA6     | Integrin alpha-6                                      | 0.927    |
      | HLA-DRA   | HLA class II histocompatibility antigen, DR alpha chain | 0.926  |
      | ITGA3     | Integrin alpha-3                                      | 0.914    |
      | CR2       | Complement receptor type 2                            | 0.911    |
      | LDLR      | Low-density lipoprotein receptor                      | 0.911    |
      | PTPRJ     | Receptor-type tyrosine-protein phosphatase eta        | 0.903    |
      | KDR       | Vascular endothelial growth factor receptor 2         | 0.903    |
      | IL6ST     | Interleukin-6 receptor subunit beta                   | 0.900    |
      | SELP      | P-selectin                                            | 0.898    |
      | HSPA8     | Heat shock cognate 71 kDa protein                     | 0.895    |
      | EGFR      | Epidermal growth factor receptor                      | 0.895    |
      | TNFRSF14  | Tumor necrosis factor receptor superfamily member 14  | 0.895    |
      | IL7R      | Interleukin-7 receptor subunit alpha                  | 0.892    |
      | KIT       | Mast/stem cell growth factor receptor Kit             | 0.891    |
      | SLAMF1    | Signaling lymphocytic activation molecule             | 0.891    |

      Table 2.  Top 20 human cell membrane proteins and their scores assigned by the random-forest model.

    • Next, the prediction of virus-receptor interactions was investigated. Lasso et al. (2019) previously predicted 282,528 PPIs between humans and 1001 human-infecting viruses. From that study, 9395 PPIs between 718 viral RBPs from 693 human-infecting viruses and 314 human cell membrane proteins were extracted for further analysis (see Supplementary Table S3). Each viral RBP was predicted to interact with 1-65 human cell membrane proteins, with a median of 10. For each viral RBP, the RBP-interacting cell membrane proteins were ranked by the RF score to select the most likely receptor (Supplementary Table S3).

      To validate the accuracy of the ranking by the RF model, 25 experimentally validated interactions between viral RBPs and receptors were extracted. For each pair of viral RBP and receptor, the rank of the real receptor among the predicted RBP-interacting proteins was obtained, and the corresponding rank percentage was calculated (Materials and Methods). Eight real receptors were ranked first by the RF model (Table 3), and nearly 70% (17/25) of the real receptors were ranked in the top three. The real receptors had a median rank percentage of 0.20 among all the RBP-interacting human cell membrane proteins (Table 3), suggesting that a real receptor would typically be ranked in the top 20% of all candidates by the RF model.

      | Virus name             | RBP | Real viral receptor | Num of RBP-interacting proteins | Rank by LR | Rank by RF score |
      |------------------------|-----|---------------------|---------------------------------|------------|------------------|
      | SARS-CoV               | S   | ACE2                | 31                              | *          | 22               |
      | MERS-CoV               | S   | DPP4                | 8                               |            | 2                |
      | Echovirus E6           | VP1 | CD55                | 13                              | 5          | 2                |
      | Echovirus E11          | VP1 | CD55                | 9                               | 4          | 2                |
      | Echovirus E7           | VP1 | CD55                | 7                               |            | 3                |
      | Echovirus E13          | VP1 | CD55                | 11                              | 4          | 1                |
      | Echovirus E20          | VP1 | CD55                | 12                              | 5          | 1                |
      | Echovirus E29          | VP1 | CD55                | 13                              | 6          | 2                |
      | Echovirus E33          | VP1 | CD55                | 13                              | 6          | 1                |
      | Enterovirus C          | VP1 | PVR                 | 5                               |            | 1                |
      | Hepacivirus C          | E1  | EGFR                | 17                              | 10         | 2                |
      | MACV                   | GPC | TFRC                | 2                               |            | 1                |
      | Measles virus          | H   | NECTIN4             | 18                              |            | 18               |
      | Measles virus          | H   | SLAMF1              | 18                              | 2          | 2                |
      | Hendra virus           | G   | EFNB2               | 5                               |            | 1                |
      | Nipah virus            | G   | EFNB2               | 5                               |            | 1                |
      | HAdV-A                 | L5  | CXADR               | 25                              |            | 16               |
      | HAdV-C                 | L5  | CXADR               | 5                               | 4          | 5                |
      | HAdV-D                 | L5  | CXADR               | 28                              | 4          | 15               |
      | HAdV-E                 | L5  | CXADR               | 33                              | 3          | 24               |
      | HSV-1                  | US6 | TNFRSF14            | 28                              |            | 3                |
      | HSV-1                  | US6 | NECTIN1             | 28                              |            | 11               |
      | HSV-2                  | US6 | NECTIN1             | 34                              |            | 14               |
      | HSV-2                  | US6 | TNFRSF14            | 34                              | 23         | 3                |
      | HIV-1                  | env | CD4                 | 21                              |            | 1                |
      | Top 1                  |     |                     |                                 | 0          | 8 (3)#           |
      | Top 3                  |     |                     |                                 | 2          | 17 (9)#          |
      | Top 5                  |     |                     |                                 | 8          | 18 (10)#         |
      | Median rank percentage |     |                     |                                 | 0.43       | 0.20 (0.14)#     |

      The median rank percentage of real virus receptors among RBP-interacting human cell membrane proteins, and the number of real virus receptors among the top one, three and five ranks, are summarized at the bottom.
      MACV: machupo mammarenavirus; HAdV-A: human mastadenovirus A; HAdV-C: human mastadenovirus C; HAdV-D: human mastadenovirus D; HAdV-E: human mastadenovirus E; HSV-1: human alphaherpesvirus 1; HSV-2: human alphaherpesvirus 2; HIV-1: human immunodeficiency virus 1.
      *No LR was provided in Lasso's work since there were resolved complex structures between the RBP and the receptor.
      #The numbers in brackets refer to the values obtained when considering only the 12 viral RBP-receptor pairs with LRs available from Lasso's work.

      Table 3.  The ranks of real virus receptors among the RBP-interacting human cell membrane proteins by likelihood ratio (LR) and random-forest (RF) score.

      The LR provided in Lasso's work can also be used to rank the RBP-interacting proteins. Twelve of the 25 experimentally validated viral RBP-receptor interactions had LRs available from Lasso's work. For comparison, the viral RBP-interacting human cell membrane proteins for these pairs were ranked by LR. No real receptor was ranked first, and only two real receptors were ranked in the top three. The median rank percentage of the real receptors was 0.43 when ranking by LR, compared with 0.14 when ranking by the RF model (Table 3).
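
      The summary rows of Table 3 can be re-derived from its per-virus rows. The sketch below transcribes the table as tuples of (number of RBP-interacting proteins, rank by LR, rank by RF score), with None where no LR rank is listed, and recomputes the top-k counts and median rank percentages. The variable and function names are hypothetical, and rounding to two decimals is an assumption made to match the table.

```python
from statistics import median

# (candidates, rank by LR or None, rank by RF score), in the row order of Table 3.
TABLE3 = [
    (31, None, 22), (8, None, 2), (13, 5, 2), (9, 4, 2), (7, None, 3),
    (11, 4, 1), (12, 5, 1), (13, 6, 2), (13, 6, 1), (5, None, 1),
    (17, 10, 2), (2, None, 1), (18, None, 18), (18, 2, 2), (5, None, 1),
    (5, None, 1), (25, None, 16), (5, 4, 5), (28, 4, 15), (33, 3, 24),
    (28, None, 3), (28, None, 11), (34, None, 14), (34, 23, 3), (21, None, 1),
]


def summarize(pairs):
    """pairs: list of (rank, number of candidates) for the real receptors."""
    ranks = [rank for rank, _ in pairs]
    return {
        "top1": sum(r <= 1 for r in ranks),
        "top3": sum(r <= 3 for r in ranks),
        "top5": sum(r <= 5 for r in ranks),
        "median_rank_pct": round(median(r / n for r, n in pairs), 2),
    }


rf_all = summarize([(rf, n) for n, lr, rf in TABLE3])
rf_subset = summarize([(rf, n) for n, lr, rf in TABLE3 if lr is not None])
lr_subset = summarize([(lr, n) for n, lr, rf in TABLE3 if lr is not None])
print(rf_all)
print(rf_subset)
print(lr_subset)
```

      Run on the transcribed rows, this reproduces the summary values in Table 3: 8, 17 and 18 receptors in the top one, three and five by RF score with a median rank percentage of 0.20, against 0, 2 and 8 with a median of 0.43 by LR for the 12 pairs with LRs available.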

    • Previous studies have shown that the ACE2 protein, the receptor of SARS-CoV-2 (Hoffmann et al. 2020; Zhou et al. 2020), has a low expression level in the lung and the upper respiratory tract (Qi et al. 2020; Zhang et al. 2020). This suggests that SARS-CoV-2 may have alternative receptors, so we investigated their prediction. Lasso's study predicted PPIs between 28 human cell membrane proteins, all members of the receptorome of human-infecting viruses, and the spike proteins of two coronaviruses, Severe Acute Respiratory Syndrome-CoV and Middle East Respiratory Syndrome-CoV. We hypothesized that SARS-CoV-2 is likely to use these spike-interacting proteins as alternative receptors. These spike-interacting proteins were ranked by the scores provided by the RF model (Fig. 2), and their expression levels in 32 common human tissues are also shown in Fig. 2. Most of them, such as APP, EZR and CD4, had higher expression levels than ACE2 in the lung.

      Figure 2.  The predicted alternative receptors of SARS-CoV-2 (left) and their expression in 32 human tissues (bottom). The predicted alternative receptors are ranked by RF score. The expression level was measured in transcripts per million (TPM) and is colored according to the legend at the top right. White indicates that no data were available. The lung is highlighted by an arrow, and ACE2 is marked by an asterisk.

    • The identification of receptors for human-infecting viruses is critical for understanding the interactions between viruses and humans. Our previous study showed that human virus receptor proteins have distinctive features compared with other cell membrane proteins, including a high N-glycosylation level, a high number of interaction partners and a high expression level. This study further built an RF model for identifying human virus receptors from human cell membrane proteins with acceptable accuracy. Based on the RF model, the receptorome of the human-infecting virome was predicted, comprising a total of 1424 human cell membrane proteins. The results could facilitate the identification of human virus receptors.

      Lasso et al. (2019) previously developed a computational model for predicting PPIs between human-infecting viruses and humans, in which a variable number of human cell membrane proteins were predicted to interact with viral RBPs. To further select the potential receptors for viruses, both the LR and the RF model were used to rank the viral RBP-interacting human cell membrane proteins. The RF model ranked the real receptors better than the LR in a small validation dataset, suggesting that the RF model may be superior to the LR in selecting the real receptors from the predicted RBP-interacting human cell membrane proteins. Combining the RF model with the RBP-interacting human cell membrane proteins predicted in Lasso's work enabled the prediction of receptors for 693 human-infecting viruses (Supplementary Table S3). Nevertheless, these candidate receptors need to be validated in future studies.

      This study has some limitations. Firstly, the number of human virus receptor proteins was much smaller than that of human membrane proteins in the modeling, which may hinder accurate modeling; an under-sampling method was therefore used to deal with the imbalance. Secondly, the performance of the RF model in discriminating human virus receptor proteins from human membrane proteins was modest, and more effort is needed to improve the model. Thirdly, although the RF model can be used to predict the receptorome of the human-infecting virome, it cannot by itself identify the receptors for a specific human-infecting virus. Combining the RF model with PPI prediction models such as that of Lasso et al. can help identify virus-receptor interactions.

      In conclusion, this study built, for the first time, a computational model for predicting the receptorome of the human-infecting virome. The results can facilitate the identification of human virus receptors in both computational and experimental studies.

    • This work was supported by the National Key Plan for Scientific Research and Development of China (2016YFD0500300), Hunan Provincial Natural Science Foundation of China (2018JJ3039), the Chinese Academy of Medical Sciences (2016-I2M-1-005), and the special project for COVID-19 of Guangzhou Regenerative Medicine and Health Guangdong Laboratory (2020GZR110406001).

    • YP and ZZ designed the study. ZZ and SY performed the analysis. SY and ZZ drafted the manuscript. AW, TJ and YP revised the manuscript. YP finalized the manuscript. All authors approved the final version of the manuscript.

    • The authors declare no conflicts of interest.

    • This article does not contain any studies with human or animal subjects.
