Citation: Huiting Chen, Zhaozhong Zhu, Ye Qiu, Xingyi Ge, Heping Zheng, Yousong Peng. Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms. Virologica Sinica, 2022, 37(3): 437-444. http://dx.doi.org/10.1016/j.virs.2022.04.006

Prediction of coronavirus 3C-like protease cleavage sites using machine-learning algorithms

  • Corresponding author: Yousong Peng, pys2013@hnu.edu.cn
  • Received Date: 01 November 2021
    Accepted Date: 02 April 2022
    Available online: 02 May 2022
  • Abstract: The coronavirus 3C-like (3CL) protease, a cysteine protease, plays an important role in viral infection and immune escape. However, effective tools for determining the cleavage sites of the 3CL protease are still lacking. This study systematically investigated the diversity of 3CL protease cleavage sites on the viral polyprotein and found that the cleavage motifs were highly conserved for viruses in the genera Alphacoronavirus, Betacoronavirus and Gammacoronavirus. Strong residue preferences were observed at the positions neighboring the cleavage sites. A random forest (RF) model was built to predict the cleavage sites of the coronavirus 3CL protease, with residues in the cleavage motifs represented by amino acid indexes; the model achieved an AUC of 0.96 in cross-validation. The RF model was further tested on an independent test dataset composed of cleavage sites on 99 proteins from multiple coronavirus hosts, where it achieved an AUC of 0.95 and correctly predicted 80% of the cleavage sites. The RF model then predicted 1,352 human proteins to be cleaved by the 3CL protease. These proteins were enriched in several GO terms related to the cytoskeleton, such as microtubule, actin and tubulin. Finally, a webserver named 3CLP was built to predict the cleavage sites of the coronavirus 3CL protease based on the RF model. Overall, the study provides an effective tool for identifying cleavage sites of the 3CL protease and offers insights into the molecular mechanism underlying the pathogenicity of coronaviruses.
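The abstract describes two steps that can be illustrated in code: representing the residues around a candidate cleavage site by an amino acid index, and scoring a predictor by AUC. The following is a minimal sketch under stated assumptions, not the authors' implementation. The window of four residues on each side of the cut, the choice of the Kyte-Doolittle hydropathy scale as the index, and the function names `candidate_windows`, `encode_window` and `roc_auc` are all illustrative; the abstract does not state which indexes or window size the study used.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Kyte-Doolittle hydropathy scale, used here only as an example amino acid
# index (assumption: the indexes used in the study are not given in the abstract).
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_windows(
    sequence: str, left: int = 4, right: int = 4
) -> Iterator[Tuple[int, str]]:
    """Yield (cut, window) for every cut position with full flanking context.

    `cut` is the number of residues before the scissile bond, so the window
    is sequence[cut - left : cut + right]. The default sizes are hypothetical.
    """
    for cut in range(left, len(sequence) - right + 1):
        yield cut, sequence[cut - left:cut + right]


def encode_window(window: str, index: Dict[str, float] = HYDROPATHY) -> List[float]:
    """Map each residue to its index value, giving one feature per position.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return [index[residue] for residue in window]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the fraction of (positive, negative) pairs in which the
    positive example receives the higher score, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

In such a setup, the vectors returned by `encode_window` would be the input features of a classifier (for example a random forest), and `roc_auc` would be applied to the classifier's scores on held-out windows. Concatenating several indexes per residue is a straightforward extension of the same encoding.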


  • Affiliation: Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082, China
