Citation: Xiao Ding, Luyao Qin, Jing Meng, Yousong Peng, Aiping Wu, Taijiao Jiang. Progress and Challenge in Computational Identification of Influenza Virus Reassortment [J]. VIROLOGICA SINICA, 2021, 36(6): 1273-1283. http://dx.doi.org/10.1007/s12250-021-00392-w

Progress and Challenge in Computational Identification of Influenza Virus Reassortment

  • Corresponding author: Taijiao Jiang, taijiao@ibms.pumc.edu.cn, ORCID: 0000-0002-6071-0122
  • Received Date: 03 September 2020
    Accepted Date: 29 March 2021
    Published Date: 26 May 2021
    Available online: 01 December 2021
  • Genomic reassortment is an important evolutionary mechanism for influenza viruses. In this process, novel viruses acquire new characteristics through the exchange of intact gene segments among multiple influenza virus genomes, which may cause endemic and epidemic influenza within or even across hosts. Because experimental studies of influenza virus reassortment are limited by safety and ethical concerns, and because influenza virus genomic data have grown explosively, numerous computational studies of reassortment have been carried out. Many computational methods and bioinformatics databases have been developed to facilitate the identification of influenza virus reassortments. In this review, we summarize the progress and challenges of bioinformatics research on influenza virus reassortment, which can guide researchers in investigating reassortment events and provide insight for developing related computational identification tools.


    1. Ahasan MS, Subramaniam K, Sayler KA, Loeb JC, Popov VL, Lednicky JA, Wisely SM, Campos Krauer JM, Waltzek TB (2019) Molecular characterization of a novel reassortment Mammalian orthoreovirus type 2 isolated from a Florida white-tailed deer fawn. Virus Res 270: 197642
        doi: 10.1016/j.virusres.2019.197642

    2. Arenas M, Posada D (2010) The effect of recombination on the reconstruction of ancestral sequences. Genetics 184: 1133-1139
        doi: 10.1534/genetics.109.113423

    3. Bi Y, Chen Q, Wang Q, Chen J, Jin T, Wong G, Quan C, Liu J, Wu J, Yin R, Zhao L, Li M, Ding Z, Zou R, Xu W, Li H, Wang H, Tian K, Fu G, Huang Y, Shestopalov A, Li S, Xu B, Yu H, Luo T, Lu L, Xu X, Luo Y, Liu Y, Shi W, Liu D, Gao GF (2016) Genesis, evolution and prevalence of H5N6 avian influenza viruses in China. Cell Host Microbe 20: 810-821
        doi: 10.1016/j.chom.2016.10.022

    4. Blitvich BJ, Saiyasombat R, Dorman KS, Garcia-Rejon JE, Farfan-Ale JA, Loroño-Pino MA (2012) Sequence and phylogenetic data indicate that an orthobunyavirus recently detected in the Yucatan Peninsula of Mexico is a novel reassortant of Potosi and Cache Valley viruses. Arch Virol 157: 1199-1204
        doi: 10.1007/s00705-012-1279-x

    5. Boni MF, de Jong MD, van Doorn HR, Holmes EC (2010) Guidelines for identifying homologous recombination events in influenza A virus. PLoS ONE 5: e10434
        doi: 10.1371/journal.pone.0010434

    6. Butler D (2011) Fears grow over lab-bred flu. Nature 480: 421-422
        doi: 10.1038/480421a

    7. Chan JM, Carlsson G, Rabadan R (2013) Topology of viral evolution. Proc Natl Acad Sci USA 110: 18566-18571
        doi: 10.1073/pnas.1313480110

    8. de Silva UC, Tanaka H, Nakamura S, Goto N, Yasunaga T (2012) A comprehensive analysis of reassortment in influenza A virus. Biol Open 1: 385-390
        doi: 10.1242/bio.2012281

    9. Ding X, Yuan X, Mao L, Wu A, Jiang T (2020) FluReassort: a database for the study of genomic reassortments among influenza viruses. Brief Bioinform 21: 2126-2132
        doi: 10.1093/bib/bbz128

    10. Dong C, Ying L, Yuan D (2011) Detecting transmission and reassortment events for influenza A viruses with genotype profile method. Virol J 8: 395
        doi: 10.1186/1743-422X-8-395

    11. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17: 368-376
        doi: 10.1007/BF01734359

    12. Fouchier RA (2015) Studies on influenza virus transmission between ferrets: the public health risks revisited. mBio 6: e02560-14

    13. Gao R, Cao B, Hu Y, Feng Z, Wang D, Hu W, Chen J, Jie Z, Qiu H, Xu K, Xu X, Lu H, Zhu W, Gao Z, Xiang N, Shen Y, He Z, Gu Y, Zhang Z, Yang Y, Zhao X, Zhou L, Li X, Zou S, Zhang Y, Li X, Yang L, Guo J, Dong J, Li Q, Dong L, Zhu Y, Bai T, Wang S, Hao P, Yang W, Zhang Y, Han J, Yu H, Li D, Gao GF, Wu G, Wang Y, Yuan Z, Shu Y (2013) Human infection with a novel avian-origin influenza A (H7N9) virus. N Engl J Med 368: 1888-1897
        doi: 10.1056/NEJMoa1304459

    14. Garten RJ, Davis CT, Russell CA, Shu B, Lindstrom S, Balish A, Sessions WM, Xu X, Skepner E, Deyde V, Okomo-Adhiambo M, Gubareva L, Barnes J, Smith CB, Emery SL, Hillman MJ, Rivailler P, Smagala J, de Graaf M, Burke DF, Fouchier RA, Pappas C, Alpuche-Aranda CM, Lopez-Gatell H, Olivera H, Lopez I, Myers CA, Faix D, Blair PJ, Yu C, Keene KM, Dotson PD Jr, Boxrud D, Sambol AR, Abid SH, St George K, Bannerman T, Moore AL, Stringer DJ, Blevins P, Demmler-Harrison GJ, Ginsberg M, Kriner P, Waterman S, Smole S, Guevara HF, Belongia EA, Clark PA, Beatrice ST, Donis R, Katz J, Finelli L, Bridges CB, Shaw M, Jernigan DB, Uyeki TM, Smith DJ, Klimov AI, Cox NJ (2009) Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science 325: 197-201
        doi: 10.1126/science.1176225

    15. Goloboff PA, Wilkinson M (2018) On defining a unique phylogenetic tree with homoplastic characters. Mol Phylogenet Evol 122: 95-101
        doi: 10.1016/j.ympev.2018.01.020

    16. Graybeal A (1998) Is it better to add taxa or characters to a difficult phylogenetic problem? Syst Biol 47: 9-17
        doi: 10.1080/106351598260996

    17. Karasin AI, Carman S, Olsen CW (2006) Identification of human H1N2 and human-swine reassortant H1N2 and H1N1 influenza A viruses among pigs in Ontario, Canada (2003 to 2005). J Clin Microbiol 44: 1123-1126
        doi: 10.1128/JCM.44.3.1123-1126.2006

    18. Karasin AI, Landgraf J, Swenson S, Erickson G, Goyal S, Woodruff M, Scherba G, Anderson G, Olsen CW (2002) Genetic characterization of H1N2 influenza A viruses isolated from pigs throughout the United States. J Clin Microbiol 40: 1073-1079
        doi: 10.1128/JCM.40.3.1073-1079.2002

    19. Karasin AI, Schutten MM, Cooper LA, Smith CB, Subbarao K, Anderson GA, Carman S, Olsen CW (2000) Genetic characterization of H3N2 influenza viruses isolated from pigs in North America, 1977-1999: evidence for wholly human and reassortant virus genotypes. Virus Res 68: 71-85
        doi: 10.1016/S0168-1702(00)00154-4

    20. Kawaoka Y, Krauss S, Webster RG (1989) Avian-to-human transmission of the PB1 gene of influenza A viruses in the 1957 and 1968 pandemics. J Virol 63: 4603-4608
        doi: 10.1128/jvi.63.11.4603-4608.1989

    21. Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO (2006) Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 6: 29
        doi: 10.1186/1471-2148-6-29

    22. Khiabanian H, Trifonov V, Rabadan R (2009) Reassortment patterns in Swine influenza viruses. PLoS ONE 4: e7366
        doi: 10.1371/journal.pone.0007366

    23. Kilbourne ED (2006) Influenza pandemics of the 20th century. Emerg Infect Dis 12: 9-14
        doi: 10.3201/eid1201.051254

    24. Kingsford C, Nagarajan N, Salzberg SL (2009) 2009 Swine-origin influenza A (H1N1) resembles previous influenza isolates. PLoS ONE 4: e6402
        doi: 10.1371/journal.pone.0006402

    25. Lam TT, Wang J, Shen Y, Zhou B, Duan L, Cheung CL, Ma C, Lycett SJ, Leung CY, Chen X, Li L, Hong W, Chai Y, Zhou L, Liang H, Ou Z, Liu Y, Farooqui A, Kelvin DJ, Poon LL, Smith DK, Pybus OG, Leung GM, Shu Y, Webster RG, Webby RJ, Peiris JS, Rambaut A, Zhu H, Guan Y (2013) The genesis and source of the H7N9 influenza viruses causing human infections in China. Nature 502: 241-244
        doi: 10.1038/nature12515

    26. Levin S, Holmes EC, Ghedin E, Miller N, Taylor J, Bao Y, St George K, Grenfell BT, Salzberg SL, Fraser CM, Lipman DJ, Taubenberger JK (2005) Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol 3: e300
        doi: 10.1371/journal.pbio.0030300

    27. Li YW, Yu L, Zhang YP (2007) "Long-branch Attraction" artifact in phylogenetic reconstruction. Yi Chuan 29: 659-667
        doi: 10.1360/yc-007-0659

    28. Lole KS, Bollinger RC, Paranjape RS, Gadkari D, Kulkarni SS, Novak NG, Ingersoll R, Sheppard HW, Ray SC (1999) Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J Virol 73: 152-160
        doi: 10.1128/JVI.73.1.152-160.1999

    29. Lu G, Rowley T, Garten R, Donis RO (2007) FluGenome: a web tool for genotyping influenza A virus. Nucleic Acids Res 35: W275-279
        doi: 10.1093/nar/gkm365

    30. Lun AT, Wong JW, Downard KM (2012) FluShuffle and FluResort: new algorithms to identify reassorted strains of the influenza virus by mass spectrometry. BMC Bioinformatics 13: 208
        doi: 10.1186/1471-2105-13-208

    31. Martin D, Rybicki E (2000) RDP: detection of recombination amongst aligned sequences. Bioinformatics 16: 562-563
        doi: 10.1093/bioinformatics/16.6.562

    32. McGuire G, Wright F, Prentice MJ (1997) A graphical method for detecting recombination in phylogenetic data sets. Mol Biol Evol 14: 1125-1131
        doi: 10.1093/oxfordjournals.molbev.a025722

    33. Mena I, Nelson MI, Quezada-Monroy F, Dutta J, Cortes-Fernández R, Lara-Puente JH, Castro-Peralta F, Cunha LF, Trovão NS, Lozano-Dubernard B, Rambaut A, van Bakel H, García-Sastre A (2016) Origins of the 2009 H1N1 influenza pandemic in swine in Mexico. Elife 5: e16777
        doi: 10.7554/eLife.16777

    34. Nagarajan N, Kingsford C (2008) Uncovering genomic reassortments among influenza strains by enumerating maximal bicliques. Paper presented at the 2008 IEEE international conference on bioinformatics and biomedicine. https://doi.org/10.1109/BIBM.2008.78

    35. Nagarajan N, Kingsford C (2011) GiRaF: robust, computational identification of influenza reassortments via graph mining. Nucleic Acids Res 39: e34-e34
        doi: 10.1093/nar/gkq1232

    36. Nakajima K, Nobusawa E, Nagy A, Nakajima S (2005) Accumulation of amino acid substitutions promotes irreversible structural changes in the hemagglutinin of human influenza AH3 virus during evolution. J Virol 79: 6472-6477
        doi: 10.1128/JVI.79.10.6472-6477.2005

    37. Olsen CW, Karasin AI, Carman S, Li Y, Bastien N, Ojkic D, Alves D, Charbonneau G, Henning BM, Low DE, Burton L, Broukhanski G (2006) Triple reassortant H3N2 influenza A viruses, Canada, 2005. Emerg Infect Dis 12: 1132-1135
        doi: 10.3201/eid1207.060268

    38. Prosperi MC, Ciccozzi M, Fanti I, Saladini F, Pecorari M, Borghi V, Di Giambenedetto S, Bruzzone B, Capetti A, Vivarelli A, Rusconi S, Re MC, Gismondo MR, Sighinolfi L, Gray RR, Salemi M, Zazzi M, De Luca A (2011) A novel methodology for large-scale phylogeny partition. Nat Commun 2: 321
        doi: 10.1038/ncomms1325

    39. Rabadan R, Levine AJ, Krasnitz M (2008) Non-random reassortment in human influenza A viruses. Influenza Other Respir Viruses 2: 9-22
        doi: 10.1111/j.1750-2659.2007.00030.x

    40. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406-425
        doi: 10.1093/oxfordjournals.molbev.a040454

    41. Salzberg SL, Kingsford C, Cattoli G, Spiro DJ, Janies DA, Aly MM, Brown IH, Couacy-Hymann E, De Mia GM, Dung do H, Guercio A, Joannis T, Maken Ali AS, Osmani A, Padalino I, Saad MD, Savic V, Sengamalay NA, Yingst S, Zaborsky J, Zorman-Rojs O, Ghedin E, Capua I (2007) Genome analysis linking recent European and African influenza (H5N1) viruses. Emerg Infect Dis 13: 713-718
        doi: 10.3201/eid1305.070013

    42. Sawyer S (1989) Statistical tests for detecting gene conversion. Mol Biol Evol 6: 526-538

    43. Schäfer JR, Kawaoka Y, Bean WJ, Süss J, Senne D, Webster RG (1993) Origin of the pandemic 1957 H2 influenza A virus and the persistence of its possible progenitors in the avian reservoir. Virology 194: 781-788
        doi: 10.1006/viro.1993.1319

    44. Smith GJ, Donis RO (2015) Nomenclature updates resulting from the evolution of avian influenza A(H5) virus clades 2.1.3.2a, 2.2.1, and 2.3.4 during 2013-2014. Influenza Other Respir Viruses 9: 271-276
        doi: 10.1111/irv.12324

    45. Smith GJ, Vijaykrishna D, Bahl J, Lycett SJ, Worobey M, Pybus OG, Ma SK, Cheung CL, Raghwani J, Bhatt S, Peiris JS, Guan Y, Rambaut A (2009a) Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459: 1122-1125
        doi: 10.1038/nature08182

    46. Smith GJD, Bahl J, Vijaykrishna D, Zhang J, Poon LLM, Chen H, Webster RG, Peiris JSM, Guan Y (2009b) Dating the emergence of pandemic influenza viruses. Proc Natl Acad Sci USA 106: 11709-11712
        doi: 10.1073/pnas.0904991106

    47. Smith JM (1992) Analyzing the mosaic structure of genes. J Mol Evol 34: 126-129
        doi: 10.1007/BF00182389

    48. Sourdis J, Nei M (1988) Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree. Mol Biol Evol 5: 298-311

    49. Su S, Fu X, Li G, Kerlin F, Veit M (2017) Novel Influenza D virus: epidemiology, pathology, evolution and biological characteristics. Virulence 8: 1580-1591
        doi: 10.1080/21505594.2017.1365216

    50. Suzuki Y (2010) A phylogenetic approach to detecting reassortments in viruses with segmented genomes. Gene 464: 11-16
        doi: 10.1016/j.gene.2010.05.002

    51. Svinti V, Cotton JA, McInerney JO (2013) New approaches for unravelling reassortment pathways. BMC Evol Biol 13: 1
        doi: 10.1186/1471-2148-13-1

    52. Takezaki N, Rzhetsky A, Nei M (1995) Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol 12: 823-833

    53. van Ravenzwaaij D, Cassey P, Brown SD (2018) A simple introduction to Markov Chain Monte-Carlo sampling. Psychon Bull Rev 25: 143-154
        doi: 10.3758/s13423-016-1015-8

    54. Vijaykrishna D, Poon LL, Zhu HC, Ma SK, Li OT, Cheung CL, Smith GJ, Peiris JS, Guan Y (2010) Reassortment of pandemic H1N1/2009 influenza A virus in swine. Science 328: 1529
        doi: 10.1126/science.1189132

    55. Villa M, Lassig M (2017) Fitness cost of reassortment in human influenza. PLoS Pathog 13: e1006685
        doi: 10.1371/journal.ppat.1006685

    56. Virk RK, Jayakumar J, Mendenhall IH, Moorthy M, Lam P, Linster M, Lim J, Lin C, Oon LLE, Lee HK, Koay ESC, Vijaykrishna D, Smith GJD, Su YCF (2020) Divergent evolutionary trajectories of influenza B viruses underlie their contemporaneous epidemic activity. Proc Natl Acad Sci USA 117: 619-628
        doi: 10.1073/pnas.1916585116

    57. Wan XF, Wu X, Lin G, Holton SB, Desmone RA, Shyu CR, Guan Y, Emch ME (2007a) Computational identification of reassortments in avian influenza viruses. Avian Dis 51: 434-439
        doi: 10.1637/7625-042706R1.1

    58. Wan XF, Chen G, Luo F, Emch M, Donis R (2007b) A quantitative genotype algorithm reflecting H5N1 Avian influenza niches. Bioinformatics 23: 2368-2375
        doi: 10.1093/bioinformatics/btm354

    59. Wan XF, Ozden M, Lin G (2008) Ubiquitous reassortments in influenza A viruses. J Bioinform Comput Biol 6: 981-999
        doi: 10.1142/S0219720008003813

    60. WHO/OIE/FAO H5N1 Evolution Working Group (2008) Toward a unified nomenclature system for highly pathogenic avian influenza virus (H5N1). Emerg Infect Dis 14: e1

    61. WHO/OIE/FAO H5N1 Evolution Working Group (2009) Continuing progress towards a unified nomenclature for the highly pathogenic H5N1 avian influenza viruses: divergence of clade 2.2 viruses. Influenza Other Respir Viruses 3: 59-62
        doi: 10.1111/j.1750-2659.2009.00078.x

    62. WHO/OIE/FAO H5N1 Evolution Working Group (2012) Continued evolution of highly pathogenic avian influenza A (H5N1): updated nomenclature. Influenza Other Respir Viruses 6: 1-5
        doi: 10.1111/j.1750-2659.2011.00298.x

    63. Wu A, Su C, Wang D, Peng Y, Liu M, Hua S, Li T, Gao GF, Tang H, Chen J, Liu X, Shu Y, Peng D, Jiang T (2013) Sequential reassortments underlie diverse influenza H7N9 genotypes in China. Cell Host Microbe 14: 446-452
        doi: 10.1016/j.chom.2013.09.001

    64. Xing G, Gu J, Yan L, Lei J, Lai A, Su S, Zhou J (2016) Human infections by avian influenza virus H5N6: Increasing risk by dynamic reassortment? Infect Genet Evol 42: 46-48
        doi: 10.1016/j.meegid.2016.04.009

    65. Yin R, Zhou X, Rashid S, Kwoh CK (2020) HopPER: an adaptive model for probability estimation of influenza reassortment through host prediction. BMC Med Genomics 13: 9
        doi: 10.1186/s12920-019-0656-7

    66. Yurovsky A, Moret BME (2011) FluReF, an automated flu virus reassortment finder based on phylogenetic trees. BMC Genomics 12: S3
        doi: 10.1186/1471-2164-12-S2-S3


      Corresponding author: Taijiao Jiang, taijiao@ibms.pumc.edu.cn
    • 1. Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
    • 2. College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
    • 3. Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
    • 4. Suzhou Institute of Systems Medicine, Suzhou, Jiangsu 215123, China


    • Influenza virus is a negative-sense, single-stranded RNA virus that belongs to the family Orthomyxoviridae. According to the ICTV (https://talk.ictvonline.org/taxonomy/), there are four types of influenza viruses: A, B, C and D. Humans can be infected only by influenza A, B and C viruses, while influenza D virus primarily affects cattle (Su et al. 2017). Influenza A and B viruses cause seasonal epidemics, and influenza A viruses have also caused worldwide pandemics, resulting in tremendous loss of human life and economic damage. The World Health Organization (WHO) estimates that seasonal influenza epidemics cause approximately 3 to 5 million cases of severe illness and about 290,000 to 650,000 respiratory deaths worldwide each year (https://www.who.int/en/news-room/fact-sheets/detail/influenza-(seasonal)).

    • As influenza A and B viruses are the primary threats to humans, they have been studied extensively. The genomes of influenza A and B viruses both consist of eight negative-sense RNA segments, comprising six internal genes (PB2, PB1, PA, NP, M and NS) and two external genes (HA and NA), which together encode more than ten proteins. Because the genome is segmented, different viruses can exchange intact gene segments when they co-infect the same host cell. This important evolutionary mechanism is called reassortment (Fig. 1). Reassortment can create novel viruses with new characteristics, which may have great pandemic potential in humans. In fact, all viruses with segmented genomes, such as bunyaviruses and reoviruses, may reassort; however, most reports and studies of genomic reassortment concern influenza virus (Blitvich et al. 2012; Ahasan et al. 2019). Of the four previous pandemics, three have been shown to be associated with reassortment. The 1957 Asian flu pandemic was triggered by the new H2N2 subtype, a reassortant between an avian influenza virus and the H1N1 virus from the earlier 1918 pandemic (Kawaoka et al. 1989; Schäfer et al. 1993; Kilbourne 2006; Smith et al. 2009b). Later, the 1968 Hong Kong flu pandemic was caused by an H3N2 reassortant that combined six genes (PB2, PA, NA, NP, NS and M) from the H2N2 virus of the 1957 pandemic with two genes (PB1 and HA) from a novel avian influenza virus (Kawaoka et al. 1989; Kilbourne 2006; Smith et al. 2009b). More recently, the novel H1N1 virus of the 2009 flu pandemic was a triple reassortant with genes from an avian influenza virus source (PB2 and PA), the human seasonal H3N2 influenza virus (PB1) and swine influenza viruses (HA, NP, NS, NA and M) (Garten et al. 2009; Smith et al. 2009a; Mena et al. 2016). In addition, influenza virus reassortants from some avian sources may spread to humans by overcoming a series of transmission barriers (Gao et al. 2013; Xing et al. 2016). For example, the novel H7N9 virus that caused an outbreak in China in 2013 was created by the genomic combination of the original H7N9 virus and the H9N2 virus (Lam et al. 2013; Wu et al. 2013).

      Figure 1.  Genomic structure of influenza viruses and the principle of influenza virus reassortment. A The genome of influenza viruses contains eight RNA negative-sense segments including six internal genes (PB2, PB1, PA, NP, M and NS) and two external genes (HA and NA). More than ten proteins correlated with viral structure and function are encoded by these eight segments. B The principle of influenza virus reassortment. The reassortants can be produced by exchanging the gene segments when different influenza viruses co-infect the same host cell, which may cause influenza pandemics in different animals.
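
The combinatorics behind panel B can be made concrete with a small sketch. It assumes, purely for illustration, that each of the eight segments is drawn independently from one of two co-infecting parent viruses (labelled "A" and "B" here, which are hypothetical names); real co-infections are subject to packaging and fitness constraints that this toy enumeration ignores.

```python
from itertools import product

# Segment names as listed in the text; the order is illustrative only.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def progeny_genotypes(parents=("A", "B")):
    """Yield every assignment of a parental origin to each segment."""
    for origins in product(parents, repeat=len(SEGMENTS)):
        yield dict(zip(SEGMENTS, origins))


def is_reassortant(genotype):
    """A genotype is a reassortant if its segments come from more than one parent."""
    return len(set(genotype.values())) > 1


genotypes = list(progeny_genotypes())
reassortants = [g for g in genotypes if is_reassortant(g)]
print(len(genotypes), len(reassortants))  # 256 254
```

With two parents there are 2^8 = 256 possible segment combinations, of which all but the two parental genotypes are reassortants.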

    • Because reassortment underlies important evolutionary changes in influenza viruses, increasing experimental efforts have been made to study it. However, such research is challenging due to safety and ethical issues (Butler 2011; Fouchier 2015). With the advent of the big data era, more and more influenza virus genomic data are being generated. In addition, advances in fields such as mathematics, physics and biology have improved bioinformatics, making computational methods indispensable tools for studying influenza virus reassortment. As shown in Fig. 2, the key step in data-driven computational identification of reassortments is to recognize the heterogeneity of multiple gene segments based on the genomic data; integrative analyses with related epidemiologic information are then made to infer the reassortment events. Many tools originally designed to detect viral genomic recombination can also identify reassortment, such as Simplot (Lole et al. 1999), Recombination Detection Program (RDP) (Martin and Rybicki 2000), GENECONV (Sawyer 1989), DSS (Difference of Sums of Squares) (McGuire et al. 1997) and MAXCHI (Maximum Chi-Square method) (Smith 1992). Currently, the bioinformatics methods specific to reassortment identification fall into two main types, as summarized in Table 1: phylogenetic tree-based methods and phylogenetic tree-independent methods. This section gives a comprehensive review of influenza virus reassortment identification. By summarizing the existing computational approaches, we believe this work can serve as a reasonable guide for identifying and inferring reassortments.

      Figure 2.  The framework of reassortment identification with bioinformatics methods. Reassortants can be inferred by first recognizing the heterogeneity of multiple segments based on the genomic data, as shown in the left panel. The heterogeneity is then combined with related epidemiologic information, such as sampled region, host and sampling date, to further identify the reassortment events.
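
The two-step framework in Figure 2 can be sketched as follows. This is a minimal illustration, not any published tool: the strain names, lineage labels ("L1", "L2") and metadata values are all hypothetical, and in practice the per-segment lineage labels would come from phylogenetic clades or a distance-based clustering of each segment.

```python
from collections import Counter

# Hypothetical input: a lineage label per segment plus epidemiologic metadata.
strains = {
    "strain_1": {
        "lineages": {"PB2": "L1", "PB1": "L1", "PA": "L1", "HA": "L1",
                     "NP": "L1", "NA": "L1", "M": "L1", "NS": "L1"},
        "host": "avian", "region": "region_X", "date": "2012-01",
    },
    "strain_2": {
        "lineages": {"PB2": "L1", "PB1": "L1", "PA": "L1", "HA": "L2",
                     "NP": "L1", "NA": "L2", "M": "L1", "NS": "L1"},
        "host": "avian", "region": "region_Y", "date": "2012-03",
    },
}


def segment_heterogeneity(lineages):
    """Step 1: return the majority lineage and the segments that deviate from it."""
    major, _ = Counter(lineages.values()).most_common(1)[0]
    minority = sorted(seg for seg, lin in lineages.items() if lin != major)
    return major, minority


def candidate_reassortants(records):
    """Step 2: attach epidemiologic metadata to strains with mixed lineages."""
    report = []
    for name, rec in records.items():
        major, minority = segment_heterogeneity(rec["lineages"])
        if minority:
            report.append({"strain": name, "major_lineage": major,
                           "minority_segments": minority, "host": rec["host"],
                           "region": rec["region"], "date": rec["date"]})
    return report


candidates = candidate_reassortants(strains)
for row in candidates:
    print(row)
```

A flagged strain is only a candidate: the host, region and date fields are carried along so that the event can be checked against plausible co-circulation of the parental lineages.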

      | Classification | Method | Principle | Availability | Implementation environment | Reported performance | Required data | Limitation | References |
      |---|---|---|---|---|---|---|---|---|
      | Phylogenetic tree-based | FluReF | Bottom-up search | Source code | Written in C++ on Linux | The test dataset containing 1050 strains took 10 s in total | Complete genome of influenza virus | High computational complexity | Yurovsky and Moret (2011) |
      | Phylogenetic tree-based | Villa et al. | Core mutations | Source code | Written in both C++ and Python on Linux | The mock test dataset containing 7477 strains with a 290 bp simulated genomic sequence took more than 5 days in total | Complete HA and NA sequences | Limited to reassortment identification of the HA and NA segments | Villa and Lassig (2017) |
      | Phylogenetic tree-based | FluResort | Identities of predicted proteins | Web link invalid | Written in ANSI/ISO standard C++ on both Windows and Linux | Not available | Viral protein sequences and mass spectral data of these proteins | Limited to the HA, NA, NP and M1 proteins; high-resolution mass spectral data are required | Lun et al. (2012) |
      | Phylogenetic tree-based | Nagarajan et al. | Enumerating maximal bicliques | Not supported | Not supported | Not available | Genomic segments of influenza virus | High computational complexity | Nagarajan and Kingsford (2008) |
      | Phylogenetic tree-based | GiRaF | Graph theory | Source code | Written in C++ on Linux, Mac or Windows | The test dataset containing 35 strains took about 5 s in total | Complete genome of influenza virus | High computational complexity | Nagarajan and Kingsford (2011) |
      | Phylogenetic tree-based | Suzuki et al. | Topologies of quartet trees | Not supported | Not supported | Not available | Complete genome of influenza virus | High computational complexity | Suzuki (2010) |
      | Phylogenetic tree-based | Dong et al. | Genotype profile | IVEE software | Written in both C++ and Python on Windows | About 3 s per complete genome | Complete genome and the genotype information | Limited when inferring intra-subtype reassortments within the same host | Dong et al. (2011) |
      | Phylogenetic tree-independent | Wan et al. | Network module; MST | Not supported | Not supported | Not available | Genomic segments of influenza virus | Not suitable for short sequences | Wan et al. (2007a, 2007b, 2008) |
      | Phylogenetic tree-independent | Rabadan et al. | Hamming distance | Not supported | Not supported | Not available | Genomic segments of influenza virus | The assumption of equal mutation rates among segments may not always hold | Rabadan et al. (2008) |
      | Phylogenetic tree-independent | Silva et al. | Genetic distance | Not supported | Written as a Ruby script using BioRuby on a Debian Linux server | Not available | Genomic segments of influenza virus | Performance is significantly influenced by sample bias | de Silva et al. (2012) |
      | Phylogenetic tree-independent | HopPER | Host tropism | Not supported | Not supported | Not available | Full-length amino acid sequences of all genomic segments | Difficult to identify reassortments between different hosts | Yin et al. (2020) |

      Table 1.  A brief review of the reassortment identification methods for the influenza viruses.

    • In most previous studies, reassortment events were inferred manually by identifying phylogenetic incongruence among the eight gene segments of influenza viruses (Arenas and Posada 2010; Boni et al. 2010). In this process, phylogenetic trees for each gene segment are first constructed with methods such as neighbor-joining (NJ) (Saitou and Nei 1987), maximum likelihood (ML) (Felsenstein 1981) and maximum parsimony (MP) (Sourdis and Nei 1988), or based on molecular clock analysis (Takezaki et al. 1995). The reconstructed trees are then partitioned manually into multiple clades according to criteria such as the bootstrap support of the ancestral node and the divergence time of the different clades. Finally, reassortment events are recognized by integrating the topological incongruence with related epidemiologic information. This manual approach has yielded many findings for influenza viruses of different subtypes. For example, our previous work combined molecular clock analysis with epidemiological data to demonstrate that the novel H7N9 viruses underwent at least two sequential reassortments with distinct H9N2 viruses. The computational results indicated that the first reassortment likely occurred in wild birds, while the second occurred in domestic birds in east China in early 2012 (Wu et al. 2013). More recently, Bi et al. (2016) used this method to explore the genetic origin and evolution of H5N6 viruses. They performed a comprehensive ML phylogenetic analysis of the eight gene segments coupled with epidemiological data, which revealed that H5N6 arose from the reassortment of H5 and H6N6 viruses and that the internal genes were constantly reassorted among low-pathogenic avian influenza viruses. In addition, the reassortment events of the pandemic H1N1/2009 virus were successfully identified with similar methods by Vijaykrishna et al. (2010) and Smith et al. (2009a).
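
The idea of topological incongruence between segment trees can be illustrated with a toy check. This sketch is not the procedure of any cited study: it assumes rooted trees written as nested tuples, hypothetical strain names, and a deliberately simple rule in which a strain is a candidate reassortant if pruning it makes two otherwise conflicting segment trees agree.

```python
def clade_sets(tree):
    """Return (leaves, set of clades) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clade_sets(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def pruned_clades(tree, strain):
    """Clades of the tree after removing `strain`, ignoring single-leaf clades."""
    _, found = clade_sets(tree)
    pruned = {clade - {strain} for clade in found}
    return {clade for clade in pruned if len(clade) > 1}


def reconciling_strains(tree_a, tree_b):
    """Strains whose removal makes two incongruent trees share the same clades."""
    leaves, clades_a = clade_sets(tree_a)
    _, clades_b = clade_sets(tree_b)
    if clades_a == clades_b:
        return []  # the trees already agree, so there is nothing to explain
    return sorted(s for s in leaves
                  if pruned_clades(tree_a, s) == pruned_clades(tree_b, s))


# Hypothetical trees: strain "r" groups with lineage x in HA but with lineage y in PB2.
ha_tree = ((("x1", "r"), "x2"), ("y1", "y2"))
pb2_tree = (("x1", "x2"), (("y1", "r"), "y2"))
print(reconciling_strains(ha_tree, pb2_tree))  # ['r']
```

Real analyses must also weigh branch support and divergence times, as described above, because weakly supported clades can produce incongruence without any reassortment.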

      Despite the significant achievements, the feasibility and validity of this identification method was still limited by the manual operation. Particularly, enormous amounts of genomic data on influenza viruses made the manual reas-sortment identification not available. Hence, automatic comparison among phylogenetic trees based on different gene segments was developed to improve the algorithm feasibility. FluReF is a fully automated reassortment finder which was proposed by Yurovsky and Moret (2011). The reassortment events can be identified by a bottom-up search for candidate clades on both the whole genome-based and segment-based phylogenetic trees, which sepa-rates the phylogenetic clades containing the reassortants from the other clades. As demonstrated in this work, FluReF could find reassortments effectively even for geo-graphically and temporarily expanded datasets. Recently, Villa et al. successfully inferred the reassortment events with the self-defined core mutations in genealogical trees in the investigation of the fitness cost in human influenza virus reassortment (Villa and Lassig 2017). Apart from the epidemiologic and mutational information, the biophysical data were also used. Lun et al. proposed a set of automatic reassortment identification algorithms, FluShuffle and FluResort (Lun et al. 2012). In FluShuffle, PepGen was first employed to generate theoretical peptide monoisotopic masses based on the influenza viral protein sequences. Then a Bayesian Markov Chain Monte Carlo (MCMC) approach (van Ravenzwaaij et al. 2018) was implemented to assign a combination of protein accessions to a single mass spectrum. Next, a Gibbs sampling algorithm was employed to estimate the marginal posterior probability for each known protein accession. Finally, accessions that match more peaks or match uniquely to a peak were selected with a higher probability at each step in the Gibbs sampler. 
The different combinations of influenza viral protein identities established through FluShuffle were then mapped onto the phylogenetic trees using FluResort. A statistical model was developed in FluResort to calculate the likelihood of reassortments, which was quantified using a Z-score, a standardized value of the weighted mean patristic distance of each identity across different trees. This set of algorithms was evaluated with both experimental and simulated mass spectral data obtained from whole virus digests. For the experimental data, the algorithms were first tested with mass spectral data obtained from the digestion of an H1N1 strain derived from the reassortment of a 2009 H1N1 pandemic strain (A/California/07/2009) and a lab-modified H1N1 strain (A/Puerto Rico/08/1934). The seasonal influenza A and B viruses were also analyzed with these two algorithms. In addition, the FluShuffle and FluResort algorithms were tested with the simulated mass spectral data. As indicated in that paper, the two algorithms accurately identified the natural reassortment of the H1N1 vaccine strain along with the identity of each viral protein. Additionally, no reassortment events were recognized in the seasonal strain analyses. Although this set of algorithms can identify reassortments accurately and rapidly, high-resolution mass spectral data are required.

      Additionally, graph theory was employed in the many efforts made on the automatic comparison of phylogenetic trees. A framework based on enumerating maximal bicliques was first proposed to detect reassortment events by Nagarajan and Kingsford (2008). Then, a fully automatic reassortment identification algorithm, GiRaF (Graph-incompatibility based Reassortment Finder) (Nagarajan and Kingsford 2011), was developed on the basis of this framework. In GiRaF, large groups of Markov chain Monte Carlo (MCMC)-sampled trees are searched for incompatible splits by a fast biclique enumeration algorithm coupled with several statistical tests to identify the differences in phylogenetic topology. Then, the reassortment events are recognized by combining the differences found across multiple gene segments. GiRaF was evaluated on three influenza virus datasets, namely 156 human influenza A (H3N2) isolates (Levin et al. 2005), 35 avian influenza A (H5N1) isolates (Salzberg et al. 2007) and 140 swine influenza isolates (Kingsford et al. 2009), all of which had been analyzed in previous studies relying on the manual reassortment identification method. Not only were the known reassortment events in these three influenza virus populations detected accurately, but several unreported reassortments in the H5N1 and swine influenza isolates were also identified. In addition, GiRaF can identify reassortment events with high sensitivity as well as high precision in simulated reassortment datasets. Recently, the reassortment events within the Victoria and Yamagata lineages were recognized by GiRaF when researchers explored the evolutionary trajectories of influenza B viruses (Virk et al. 2020). A method based on quartet trees was proposed by Suzuki to detect reassortments, which can be used even when the constructed phylogenetic trees are unreliable (Suzuki 2010).
In this method, one quartet of strains was examined at a time, and the corresponding phylogenetic tree was constructed for each gene segment. Then, the topologies of all quartet trees supported with statistical significance were compared among segments. The reassortment events could be recognized according to the pattern of topological differences among segments. Notably, although the reassortment events can be identified accurately, the computational complexity of the graph theory-based algorithms is very high, because traversing phylogenetic trees built from subsets of strains consumes substantial computing resources and time.
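
      The quartet comparison described above can be illustrated with a minimal sketch. It is not Suzuki's implementation: instead of building statistically supported quartet trees, it picks the quartet topology with the four-point condition on a pairwise distance matrix, which is a simplifying assumption made here. The function names, the helper `symmetric` and the toy distances are all hypothetical.

```python
from itertools import combinations


def quartet_topology(dist, a, b, c, d):
    """Pick the unrooted split of {a, b, c, d} with the smallest pair-sum.

    Four-point condition: for a tree-like metric, the true split is the
    pairing whose two within-pair distances have the smallest sum.
    """
    candidates = [
        (dist[a][b] + dist[c][d], ((a, b), (c, d))),
        (dist[a][c] + dist[b][d], ((a, c), (b, d))),
        (dist[a][d] + dist[b][c], ((a, d), (b, c))),
    ]
    return min(candidates, key=lambda item: item[0])[1]


def discordant_quartets(segment_dists, strains):
    """Return quartets whose topology differs between gene segments."""
    discordant = []
    for quartet in combinations(sorted(strains), 4):
        topologies = {
            quartet_topology(dist, *quartet) for dist in segment_dists.values()
        }
        if len(topologies) > 1:
            discordant.append(quartet)
    return discordant


def symmetric(pairs):
    """Expand {(x, y): value} into a symmetric dict-of-dicts (toy helper)."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist


# Toy data (invented): in HA, A groups with B; in NA, A groups with C.
far = 0.10
example = {
    "HA": symmetric({("A", "B"): 0.01, ("C", "D"): 0.01, ("A", "C"): far,
                     ("A", "D"): far, ("B", "C"): far, ("B", "D"): far}),
    "NA": symmetric({("A", "C"): 0.01, ("B", "D"): 0.01, ("A", "B"): far,
                     ("A", "D"): far, ("B", "C"): far, ("C", "D"): far}),
}
print(discordant_quartets(example, ["A", "B", "C", "D"]))
```

      The loop over `combinations(..., 4)` also makes the cost visible: the number of quartets grows with the fourth power of the number of strains, which is consistent with the computational burden noted above.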

      The validity of the reassortment events identified with the phylogenetic tree-based methods depends on the reliability of the constructed trees. However, false phylogenetic incongruence can be caused by the inaccurate construction of phylogenetic trees, for example through inappropriate selection of the evolutionary model (Keane et al. 2006), a high level of homoplasy (Goloboff and Wilkinson 2018), long branch attraction (Li et al. 2007), insufficient sampling (Graybeal 1998), unreasonable data partition (Prosperi et al. 2011) and so on. To solve this problem, Svinti et al. developed two robust approaches to detect reassortments, namely MLreassort and Breassort, which can distinguish the topological inconsistency caused by reassortment from that caused by phylogenetic errors (Svinti et al. 2013). MLreassort is based on a maximum likelihood framework, while Breassort is a Bayesian approach. High precision and sensitivity were achieved when these two approaches detected reassortment events on both a small real dataset of influenza A sequences and simulated data. However, the performance of these two approaches was not satisfactory on large datasets.

      In conclusion, phylogenetic tree-dependent methods rely on the assumption that reassortants are distributed among different clades of the phylogenetic trees. These approaches are generally feasible for identifying reassortment events between subtypes of the influenza virus. They are accurate and sensitive in identifying reassortment events even if the reassortant has a complicated evolutionary history. However, the reliability of phylogenetic tree construction is usually unsatisfactory when the data are extremely incomplete, and low bootstrap probabilities and poor topology may lead to ambiguous evidence for reassortment. Although some efforts were made to solve this problem, the feasibility of these methods is still limited by the computational cost of large-scale data.

    • The lack of a clear quantitative benchmark for partitioning phylogenetic clades and the heavy dependence on phylogenetic reconstruction have prompted more effort on identifying reassortment events without phylogenetic trees.

      The sequence distance between strains was commonly used in phylogenetic tree-independent methods. The Complete Composition Vector (CCV) was first employed to recognize reassortants by Wan et al. (Wan et al. 2007a, 2007b, 2008). In these algorithms, the CCVs calculated among different virus strains are the core parameters, which are then used to assign genotypes to related strains by different clustering methods. In their first algorithm (Wan et al. 2007a), the reassortment events are identified from the genotypes assigned using network modules coupled with the CCVs. As demonstrated in that study, this algorithm could infer reassortment events accurately and rapidly from a large number of sequences. After that, the clustering method was improved by employing the minimum spanning tree (MST) and Hierarchical Bayesian Modeling instead of the networks (Wan et al. 2008). As indicated in the evaluations, the CCV-based algorithms could successfully identify the reassortment events of the NP and PB2 genes for the H5N1 avian influenza virus. Two other algorithms were also developed with the sequence distance. Rabadan et al. constructed a statistical framework to estimate the likelihood of reassortments with the Hamming distance at the third codon position for all sequences (Rabadan et al. 2008). The reassortment events of H3N2 strains detected with this algorithm were similar to those of the previous study. A reassortment identification algorithm was developed by de Silva et al. based on the r-neighbourhood, which is determined only by the genetic distances among sequences (de Silva et al. 2012). For each sequence, the set of the r closest strains is defined as the r-neighbourhood for that sequence. A total of 35 high-quality candidate reassortants were found by the algorithm in large datasets of influenza virus whole genome nucleotide sequences. In addition, Chan et al. proposed that the pervasive reassortment in influenza virus can be detected with persistent homology (Chan et al. 2013).
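
      Two of the distance-based ideas above are easy to illustrate with a minimal sketch. It is not the published code of Rabadan et al. or de Silva et al., and neither of their statistical frameworks is reproduced. It only shows a Hamming distance restricted to third codon positions and the r closest strains per segment, with a small neighbourhood overlap between two segments treated as a hint of reassortment. Combining the two in one example, the function names and the toy sequences are all assumptions made here; the sequences are assumed to be aligned, in-frame coding sequences.

```python
def third_codon_hamming(seq_a, seq_b):
    """Count mismatches at third codon positions (0-based index 2, 5, 8, ...)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for i in range(2, len(seq_a), 3) if seq_a[i] != seq_b[i])


def r_neighbourhood(strain, segment_seqs, r):
    """Return the set of the r strains closest to `strain` for one segment.

    Ties are broken alphabetically by strain name (an arbitrary choice).
    """
    others = [name for name in segment_seqs if name != strain]
    others.sort(
        key=lambda name: (
            third_codon_hamming(segment_seqs[strain], segment_seqs[name]),
            name,
        )
    )
    return set(others[:r])


def neighbourhood_overlap(strain, segment_a, segment_b, r):
    """Number of strains shared by the r-neighbourhoods in two segments."""
    return len(
        r_neighbourhood(strain, segment_a, r) & r_neighbourhood(strain, segment_b, r)
    )


# Toy alignment (invented): strain X is closest to P in one segment
# and closest to Q in the other.
segment_1 = {"X": "AAAAAAAAA", "P": "AAAAAAAAA", "Q": "AAGAAGAAG", "R": "AAGAAGAAA"}
segment_2 = {"X": "AAAAAAAAA", "P": "AACAACAAC", "Q": "AAAAAAAAA", "R": "AACAACAAA"}
print(r_neighbourhood("X", segment_1, 1), r_neighbourhood("X", segment_2, 1))
print(neighbourhood_overlap("X", segment_1, segment_2, 1))
```

      Because only pairwise distances are needed, no tree has to be built, which is why such methods scale to large datasets more easily than the tree-based ones.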

      Apart from the sequence distance, other features were also employed to identify reassortment events without a phylogenetic tree. Recently, a novel computational algorithm, HopPER, was proposed by Yin et al. (2020), which inferred the reassortment events with a random forest based on the prediction of host tropism. A total of 147 features generated from seven physicochemical properties of amino acids (i.e. polarity, net charge, hydrophobicity, normalized van der Waals volume, solvent accessibility, polarizability and secondary structure) were used to infer the host tropism. For the full-length and non-redundant amino acid sequences of different proteins, 280 out of 318 candidate reassortants were successfully identified regardless of the completeness of the genomes. In addition, HopPER was more robust than the alternative reassortment identification algorithms (Karasin et al. 2000, 2002, 2006; Olsen et al. 2006; Khiabanian et al. 2009; Kingsford et al. 2009; Nagarajan and Kingsford 2011; de Silva et al. 2012).

    • In addition to the efforts devoted to the computational identification of influenza virus reassortments, several database tools were also developed to facilitate the related computational analysis of influenza virus reassortments.

      Because the diversity of influenza viruses can reflect possible reassortment events, appropriately assigned genotypes are essential to identify and describe the reassortments of influenza viruses. FluGenome (http://www.flugenome.org/) was constructed by Lu et al., which enabled users to recognize reassortment events through the genotype nomenclature they developed (Lu et al. 2007). The available sequences for the eight gene segments were first retrieved from the NCBI Influenza Virus Resource; then the downloaded sequences were clustered into several lineages according to defined criteria, in order to assign the strains to the nomenclatural genotypes. FluGenome provided three levels of information: the segments (assigned lineage, strain name, segment, serotype, host, country, year, GenBank accession number, nucleotide sequence and sequence length), the genomes (assigned genotype and accession numbers of individual gene segments) and the genotypes (all genotypes and the genomes assigned to each genotype). With the analysis of more than 2000 complete viral genomes, 156 unique genotypes were revealed in FluGenome. Based on the developed genotypes, reassortment events can be further detected by combining the epidemiologic information of the corresponding strains in the database. Unfortunately, FluGenome is no longer supported, which is a considerable loss to the study of influenza virus reassortments.

      As an increasing number of studies have attempted to identify reassortment computationally, a systematic, comprehensive online repository of reassortment events for influenza viruses is urgently needed. Our previous work developed FluReassort (https://www.jianglab.tech/FluReassort) (Ding et al. 2020), the first database that included all reported and published reassortment events. To facilitate the investigation of reassortment preference with respect to the gene segment or the subtype of viruses, FluReassort also supported the reconstruction of reassortment networks for different subtypes of influenza viruses, based on the reassortment events retrieved from the extensive literature. In total, 3513 research papers published before July 2018 were retrieved from the PubMed database with a keyword combination of "subtype and (reassortment or reassortant or evolution or origin)", where "subtype" denotes a specific subtype of influenza virus such as H1N1. To provide high-quality reassortment events comprehensively, the reassortment events compiled manually from the retrieved literature were included in FluReassort only if they had both a phylogenetic analysis and clearly identified reassortant and reassortment donor strains. As a result, 204 reassortment events were compiled based on 535 strains of 56 subtypes isolated from 37 different countries, with metadata about the reassortant strain and reassortment donor strain, the inferred date, geographic region and host for reassortment, the phylogenetic analysis methods and the PubMed IDs (PMIDs) of the corresponding references. FluReassort offered the most comprehensive information about reassortment events for influenza viruses in a structured way. The retrieval and display of the compiled reassortment events are implemented on the 'Home' page, while the 'Phylogenetic Analysis' and 'Reassortment Network' pages are designed to analyze the reassortment events.
FluReassort provides the first thorough compilation of the reassortment events for influenza viruses. The information provided by FluReassort can serve as a guide to future research and facilitate data-driven exploration of the reassortments.

    • Both influenza virus genomic data and influenza virus-related data have exploded in the current big data era. Therefore, one of the challenges for researchers is how to process the influenza virus-related big data to improve the reliability of influenza virus reassortment identification. Reasonable data preprocessing brings several advantages to computational reassortment identification. First, removing redundant information decreases the volume of the data, which reduces the computation time and the hardware requirements. For example, as indicated above, most phylogenetic tree-based methods cannot handle an enormous number of viruses. To address this situation, a set of rational criteria that eliminates redundant strains sharing a close phylogenetic relationship is urgently needed. One approach is to process the data by integrating the epidemiologic information and the pairwise sequence distance between strains. Second, non-redundant data decrease the noise in the subsequent analysis of the identified reassortment events. For example, multiple influenza virus reassortants may be generated in a single reassortment event but assigned different names or IDs in a dataset. Without preprocessing, this reassortment event might be regarded as several different events in the computation, which then influences the subsequent analysis, such as the reassortment network construction. Lastly, the currently available influenza virus-related data are generated by high-throughput sequencing technologies, which have systematic sequencing errors. Elimination of these errors will improve the accuracy of reassortment event identification.
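
      The idea of combining epidemiologic information with pairwise sequence distance can be sketched as a greedy filter. This is only an illustration under stated assumptions, not a criterion proposed in the literature reviewed here: the record fields (`host`, `country`, `year`, `seq`), the p-distance measure and the `max_distance` threshold of 0.01 are all hypothetical choices, and the sequences are assumed to be aligned.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def remove_redundant(strains, max_distance=0.01):
    """Greedily keep strains that are not near-duplicates of a kept strain.

    A strain is treated as redundant when an already kept strain shares its
    host, country and year AND lies within `max_distance` of it.
    """
    kept = []
    for strain in strains:
        key = (strain["host"], strain["country"], strain["year"])
        redundant = any(
            key == (k["host"], k["country"], k["year"])
            and p_distance(strain["seq"], k["seq"]) <= max_distance
            for k in kept
        )
        if not redundant:
            kept.append(strain)
    return kept


# Toy records (invented): s2 duplicates s1; s3 has the same sequence
# but was isolated in a different year, so it is retained.
records = [
    {"name": "s1", "host": "swine", "country": "CN", "year": 2010, "seq": "ACGTACGTAC"},
    {"name": "s2", "host": "swine", "country": "CN", "year": 2010, "seq": "ACGTACGTAC"},
    {"name": "s3", "host": "swine", "country": "CN", "year": 2012, "seq": "ACGTACGTAC"},
]
print([s["name"] for s in remove_redundant(records)])
```

      The result of a greedy filter depends on the input order, so in practice the strains would need a deterministic ordering (for example by collection date) before filtering.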

      The integration of various types of influenza virus data is crucial to improve the reliability of the inferred reassortment events. As reviewed above, an effort was made by Yin et al. (2020), who identified reassortments with features generated from seven physicochemical properties of amino acids. Although this algorithm has several limitations, it provides insight into the data-driven computational identification of influenza virus reassortments. For instance, previous studies have indicated that protein structure can be influenced by the evolution of viruses (Nakajima et al. 2005). Thus, we infer that combining protein structure information with the evolutionary profiles of influenza viruses can identify reassortment events more sensitively and accurately. For example, similar to the work of Villa and Lassig (2017), one approach is to evaluate the mutations generated in the evolutionary pathways based on the function of different regions of viral proteins, which can be further employed to identify the reassortants from the phylogenetic trees. In summary, reasonably processed data support accurate identification of reassortment events.

    • As shown in Table 1, updates and even downloads are no longer supported for most of the algorithms, mainly because these algorithms have few users. In our opinion, a reassortment identification algorithm should be developed for researchers from different fields. Use by researchers without a computer science background is limited by the low practicality and validity of the algorithms. For example, the GiRaF algorithm not only needs to be compiled from the source code, but also requires the installation of a series of dependent libraries. Although the software developed by Dong et al. (2011) has a user-friendly interface, each identification process requires manual operation, which greatly reduces the efficiency of the software. Therefore, we suggest that future reassortment identification algorithms provide an easy-to-use pipeline and a user-friendly interface.

      The other key difficulty in improving the identification methods is to define rational thresholds for evaluating the heterogeneity of multiple gene segments between strains. The thresholds are hard to develop due to the complexity and diversity of the analyzed datasets. For instance, the sequence distance cutoff that is used to recognize the heterogeneity of intra-subtype influenza viruses should be distinct from that for inter-subtype influenza viruses. Automatic estimation of cutoffs based on the analyzed data is therefore worth trying, and the development of computational identification methods can focus on the self-adjusting estimation of the related thresholds. As reviewed above, most reassortment identification methods are recursive, which changes the data structure at each step. Therefore, these algorithms require the cutoff to be adjusted optimally as they proceed.
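
      As a rough illustration of data-driven cutoffs, the sketch below estimates separate distance thresholds for intra-subtype and inter-subtype strain pairs from the observed distances themselves. The percentile rule, the default `q` of 0.95, the function names and the toy values are assumptions made for illustration; none of the reviewed methods is known to use this exact rule.

```python
def percentile(values, q):
    """Simple rank-based percentile (no interpolation) for 0 <= q <= 1."""
    if not values:
        raise ValueError("no values to summarise")
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def estimate_cutoffs(pairs, q=0.95):
    """Estimate one distance cutoff per pair class from the data.

    `pairs` holds (subtype_a, subtype_b, distance) tuples. Pairs within one
    subtype and pairs between subtypes are summarised separately.
    """
    groups = {"intra": [], "inter": []}
    for subtype_a, subtype_b, distance in pairs:
        label = "intra" if subtype_a == subtype_b else "inter"
        groups[label].append(distance)
    return {label: percentile(vals, q) for label, vals in groups.items() if vals}


# Toy distances (invented values).
observed = [
    ("H1N1", "H1N1", 0.01), ("H1N1", "H1N1", 0.02), ("H3N2", "H3N2", 0.03),
    ("H1N1", "H3N2", 0.20), ("H1N1", "H3N2", 0.25), ("H3N2", "H5N1", 0.30),
]
print(estimate_cutoffs(observed))
```

      In a recursive method, `estimate_cutoffs` could be called again after each step on the strains that remain, so that the thresholds follow the changing data.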

    • Currently, there is a lack of a benchmarking dataset to evaluate the performance of the developed reassortment identification algorithms. The most appropriate way is to generate a gold-standard dataset that provides the reassortants, the corresponding parental strains and the reassorted genomic segments confirmed by experimental methods in the laboratory. However, this work is impeded by safety and ethical issues. An alternative computational way is to infer precise reassortment events that contain complete reassortment information by using reliable data and appropriate methods. For example, the reasonable and effective classification standard of the H5N1 viruses' HA segment can be used, which was proposed by the World Health Organization (WHO), the World Organization for Animal Health (OIE), and the Food and Agriculture Organization (FAO) (WHO 2008, 2009, 2012; Smith and Donis 2015). In addition, several previously identified reassortment events are credible and include complete information, and they have been used as benchmarks in studies (Lam et al. 2013; Wu et al. 2013). In short, a benchmarking dataset is urgently needed for the development of the computational identification of reassortments.

    • In this work, the computational identification of influenza virus reassortment, including the identification methods and the related database tools, was summarized comprehensively. In addition, the challenges and future prospects in the computational identification of influenza virus reassortments were also discussed. The reassortment identification methods were generally divided into two categories according to their dependence on phylogenetic trees. The phylogenetic tree-based methods recognize reassortment events by investigating the structural incongruences among the phylogenetic trees of the eight gene segments. Among these methods, manual comparison of the topologies of phylogenetic trees coupled with epidemiologic information was the most commonly used. As the manual identifications were empirical and subjective, some automatic reassortment inference methods based on phylogenetic trees were developed, which employed graph theory, statistics and so on. Although these approaches could identify reassortment events accurately and sensitively, owing to their use of the evolutionary history of related influenza viruses, their feasibility is severely limited by the reliability, time and computational cost of phylogenetic tree construction. For this reason, several efforts were made to recognize the reassortants without phylogenetic trees. The phylogenetic tree-independent methods detected reassortment events primarily by using the significant differences of strain distances among multiple gene segments, which can be implemented on a large number of virus strains with low computational complexity. These distances were based on both the genomic sequences and the physicochemical properties of amino acids. However, the quality of the nucleotide and amino acid sequences could greatly influence the identification performance.
For the reviewed reassortment identification algorithms, the performance was regrettably not compared because most of the algorithms are unavailable and a benchmark dataset is lacking. Therefore, we only summarized our practical experience with the four software tools we obtained in Table 1. Based on the principles of the algorithms, we also give some suggestions on the application scenarios of these algorithms for both bioinformatics and virology researchers. Firstly, the software developed by Dong et al. is the only one with a friendly interface, which makes it more suitable for researchers with little bioinformatics background (Dong et al. 2011). In general, the phylogenetic tree-independent methods are more efficient at identifying reassortments from large-scale data than the phylogenetic tree-based methods, which are quite time-consuming when constructing the phylogenetic tree. However, for a small number of genomic sequences (usually about 100 sequences), the phylogenetic tree-based methods infer reassortments more accurately. In addition, the methods proposed by Nagarajan and Kingsford (2008, 2011), Wan et al. (Wan et al. 2007a, 2007b, 2008), Rabadan et al. (2008) and de Silva et al. (2012) can identify reassortment events based on partial genomic segment sequences of influenza viruses. The amino acid sequences of genomic segments can be processed by either FluResort or HopPER, and HopPER is more suitable for inferring reassortments within the same host.

      On the other hand, with the increasing number of studies on identifying influenza virus reassortments, two databases, i.e. FluGenome and FluReassort, were developed, providing valuable information related to influenza virus reassortments. In summary, a universal and valid computational method for reassortment identification does not currently exist. The most appropriate scheme should be designed according to all available information on the analyzed data, such as the number of strains, the diversity of strains and so on. We hope this review can serve as a guide to reasonably identify the reassortments in diverse influenza virus datasets.

    • This work was supported by the National Natural Science Foundation of China (31801101 to X.D., 31671371, 32070678 to T.J.); the CAMS Initiative for Innovative Medicine (CAMS-I2M, 2016-I2M-1-005, 2020-I2M-2-003 to T.J.).

    • Conflict of interest: The authors declare that no competing interests exist.

    • This article does not contain any studies with human or animal subjects performed by any of the authors.
