Proteotyping: A New Approach Studying Influenza Virus Evolution at the Protein Level*

Wei-feng SHI; Zhong ZHANG; Lei PENG; Yan-zhou ZHANG; Bin LIU; Chao-dong ZHU

October 2007

Citation: Wei-feng SHI, Zhong ZHANG, Lei PENG, Yan-zhou ZHANG, Bin LIU, Chao-dong ZHU. Proteotyping: A New Approach Studying Influenza Virus Evolution at the Protein Level*. VIROLOGICA SINICA, 2007, 22(5): 405-411.

Wei-feng SHI ¹, Zhong ZHANG ², Lei PENG ³, Yan-zhou ZHANG ⁴, Bin LIU ⁵, Chao-dong ZHU ⁴

1. Institute of Life Sciences, Taishan Medical College, Shandong Tai'an, 271000 China
2. Department of Basic Medicine, Taishan Medical College, Shandong Tai'an, 271000 China
3. College of Information and Engineering, Taishan Medical College, Shandong Tai'an, 271000 China
4. Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101 China
5. Department of Biological Sciences, Taishan Medical College, Shandong Tai'an, 271000 China

Corresponding author: Chao-dong ZHU, zhucd@ioz.ac.cn
Received Date: 11 May 2007
Accepted Date: 03 July 2007
Available online: 01 October 2007

Fund Project: National Nature Science Funds 30670242; National Nature Science Funds 30500056

Abstract

Phylogenetic methods have been widely used to study the evolution of influenza viruses. However, previous phylogenetic studies of influenza viruses did not make full use of the genetic information at the protein level and therefore could not distinguish subtle differences among viral genes. Proteotyping is a new approach to studying influenza virus evolution. It aims to mine the genetic information of viral genes at the protein level by visualizing unique amino acid signatures (proteotypes). Neuraminidase (NA) gene fragments of some H5N1 avian influenza viruses were used as an example to illustrate how the proteotyping method works. Bayesian analysis confirmed that the NA gene tree was mainly divided into three lineages. The NA proteotype analysis further suggested that there might be multiple proteotypes within these three lineages and even within single genotypes. At the same time, some proteotypes might span more than one genotype. In particular, the analysis also suggested that some amino acids of viruses of certain genotypes might co-reassort. These results indicate that this approach can provide additional information beyond that obtained from standard phylogenetic tree analysis.

Keywords: Proteotyping, Genotype, H5N1, Avian influenza virus, Neuraminidase

References
1. Choi Y K, Ozaki H, Webby R J, et al. 2004. Continuing Evolution of H9N2 Influenza Viruses in Southern China. J Virol, 78: 8609-8614.
  doi: 10.1128/JVI.78.16.8609-8614.2004
2. Cummings J L. 2003. Toward a molecular neuropsychiatry of neurodegenerative diseases. Ann Neurol, 54 (2): 147-154.
  doi: 10.1002/(ISSN)1531-8249
3. Cummings J L. 2004. Dementia with Lewy Bodies: Molecular Pathogenesis and Implications for Classification. J Geriatr Psychiatry Neurol, 17 (3): 112-119.
  doi: 10.1177/0891988704267473
4. Guan Y, Peiris J S M, Lipatov A S, et al. 2002. Emergence of multiple genotypes of H5N1 avian influenza viruses in Hong Kong SAR. Proc Natl Acad Sci USA, 99: 8950-8955.
  doi: 10.1073/pnas.132268999
5. Guan Y, Poon L L M, Cheung C Y, et al. 2004. H5N1 influenza: A protean pandemic threat. Proc Natl Acad Sci USA, 101: 8156-8161.
  doi: 10.1073/pnas.0402443101
6. Holmes E C, Ghedin E, Miller N, et al. 2005. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and Reassortment among Recent H3N2 Viruses. PLoS Biology, 3: 1579-1589.
7. Hatta M, Gao P, Halfmann P, et al. 2001. Molecular basis for high virulence of Hong Kong H5N1 influenza A viruses. Science, 293: 1840-1842.
  doi: 10.1126/science.1062882
8. http://www.flu.org.cn/upfile/attachment/200693093153272.pdf
9. Huang K, Fan X H. 2005. Molecular Epidemiological Studies on H5N1 Influenza Viruses from Poultry in Nanning (Master's thesis). Guangxi Medical University, Guangxi, China. (in Chinese)
10. Iwatsuki-Horimoto K, Kanazawa R, Sugii S, et al. 2004. The index influenza A virus subtype H5N1 isolated from a human in 1997 differs in its receptor-binding properties from a virulent avian influenza virus. J Gen Virol, 85: 1001-1005.
  doi: 10.1099/vir.0.19519-0
11. Kou Z, Lei F M, Yu J, et al. 2005. New genotype of avian influenza H5N1 viruses isolated from tree sparrows in China. J Virol, 79: 15460-15466.
  doi: 10.1128/JVI.79.24.15460-15466.2005
12. Kumar S, Tamura K, Nei M. 2004. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform, 5: 150-163.
  doi: 10.1093/bib/5.2.150
13. Li K S, Guan Y, Wang J, et al. 2004. Genesis of a highly pathogenic and potentially pandemic H5N1 influenza virus in eastern Asia. Nature, 430: 209-213.
  doi: 10.1038/nature02746
14. Lutskiy M I, Rosen F S, Remold-O'Donnell E. 2005. Genotype-Proteotype Linkage in the Wiskott-Aldrich Syndrome. J Immunol, 175: 1329-1336.
  doi: 10.4049/jimmunol.175.2.1329
15. Matrosovich M N, Krauss S, Webster R G. 2001. H9N2 influenza A viruses from poultry in Asia have human virus-like receptor specificity. Virology, 281: 156-162.
  doi: 10.1006/viro.2000.0799
16. Matrosovich M, Zhou N N, Kawaoka Y, et al. 1999. The surface glycoproteins of H5 influenza viruses isolated from humans, chickens, and wild aquatic birds have distinguishable properties. J Virol, 73: 1146-1155.
17. Obenauer J C, Denson J, Mehta P K, et al. 2006. Large-scale sequence analysis of avian influenza isolates. Science, 311 (5767): 1576-1580.
  doi: 10.1126/science.1121586
18. Rodriguez C, Quero C, Dominguez A, et al. 2006. Proteotyping of human haptoglobin by MALDI-TOF profiling: Phenotype distribution in a population of toxic oil syndrome patients. Proteomics, 6 (Suppl 1): S272-S281.
19. Ronquist F, Huelsenbeck J P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19: 1572-1574.
  doi: 10.1093/bioinformatics/btg180
20. Roth M J, Forbes A J, Boyne Ⅱ M T, et al. 2005. Precise and Parallel Characterization of Coding Polymorphisms, Alternative Splicing and Modifications in Human Proteins by Mass Spectrometry. Mol Cell Proteomics, 4 (7): 1002-1008.
  doi: 10.1074/mcp.M500064-MCP200
21. Shillingford J M, Miyoshi K, Robinson G W, et al. 2003. Proteotyping of Mammary Tissue from Transgenic and Gene Knockout Mice with Immunohistochemical Markers: a Tool To Define Developmental Lesions. J Histochem Cytochem, 51 (5): 555-565.
  doi: 10.1177/002215540305100501
22. Simmons M P, Ochoterena H. 2000. Gaps as Characters in Sequence-Based Phylogenetic Analyses. Syst Biol, 49 (2): 369-381.
  doi: 10.1093/sysbio/49.2.369
23. Thompson J D, Gibson T J, Plewniak F, et al. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl Acids Res, 25: 4876-4882.
  doi: 10.1093/nar/25.24.4876
24. Wang J, Li K S. 2004. Genotype Evolution of the H5N1 Influenza Viruses in Aquatic Birds in Southern China (Master's thesis). Shantou University, Guangdong, China. (in Chinese)
25. Webster R G, Bean W J, Gorman O T, et al. 1992. Evolution and ecology of influenza A viruses. Microbiol Rev, 56: 152-179.
26. Zhuang Z P, Huang S, Kowalak J A, et al. 2006. From tissue phenotype to proteotype: Sensitive protein identification in microdissected tumor tissue. Int J Oncol, 28 (1): 103-110.

Tracing the evolution of influenza viruses has been the subject of much research, and frequent reassortment events in both avian and human influenza viruses have been detected using phylogenetic methods (6, 13). For instance, several genotypes of H5N1 avian influenza viruses have been detected in the past five years, and these have been designated A, B, C, D, E, G, V, W, X, Y, Z, Z⁺ and so on (4, 5, 11, 13). Viruses of genotypes A, B, C, D, E and F have also been identified for the H9N2 subtype (1). These studies undeniably revealed differences among viruses. However, all of the differences were principally at the DNA level, and genetic information at the protein level was not fully utilized. Therefore, phylogenetic trees sometimes could not resolve subtle differences among viral genes.

To make better use of the information at the protein level, molecular characterization analyses and reverse genetics techniques have been used to find the key sites relevant to the pathogenicity, virulence and even host selection of influenza viruses (15, 16). To date, some positions that play important roles in viral genomes have been found, such as the connecting peptide sites in HA and Lys-627 in PB2 (7, 10). Thus, molecular characterization analysis does have advantages in finding single amino acid and short peptide mutations. However, it is difficult for this approach to integrate all of this genetic information as a whole in order to find genes that co-reassort or proteins that display compensatory mutations. To this end, Obenauer et al. introduced a proteotyping method to visualize unique amino acid signatures (proteotypes) (17). This method was able to identify co-reassorting genes, 50+ protein-protein pairs, virus "families" that share specific combinations of genes, and proteins exhibiting compensatory mutations (8).

Neuraminidase (NA) is a surface protein that cleaves sialic acid from virus and host cell glycoconjugates at the end of the virus life cycle to allow mature virions to be released (25). Phylogenetic studies have revealed that the H5N1 avian influenza viruses of China fall into three lineages according to the NA gene tree: one lineage (Ⅰ) possessing a 19-aa deletion in the stalk of NA, one lineage (Ⅱ) without a deletion, and one lineage (Ⅲ) with a 20-aa deletion (9, 24). Viruses of genotypes A, G, X, Y, Z and ShanTou3-like (ST3-like) belonged to lineage Ⅲ, while B, C, D, E, W, Z⁺, ST1-like and ST2-like isolates belonged to lineage Ⅱ and HK/156/97 was placed in lineage Ⅰ.

In this paper, we use NA gene fragments of some H5N1 influenza viruses isolated from mainland China, the Hong Kong Special Administrative Region (SAR) and Southeast Asia as an example to illustrate how the proteotyping method works.

DATASET AND METHODS

Our dataset included the NA gene segments of typical H5N1 avian influenza viruses of the known genotypes isolated from mainland China, Hong Kong SAR and Southeast Asia. In addition, some isolates from humans were also included to assist our analysis. Parrot/Ulster/73 was designated as the outgroup to root the tree. All nucleotide sequences were obtained directly from GenBank.

The first step in proteotyping was similar to that of a normal phylogenetic analysis. Multiple sequence alignment was performed with ClustalX 1.81 (23) using the default alignment parameters. To estimate the tree accurately, MrBayes version 3.0b4 was used to construct the NA gene tree (19). Four Markov chains were run for two million generations and sampled every 100 generations to yield a posterior probability distribution of 20 000 trees. After eliminating the first 5 000 trees as burn-in, a 50% majority-rule consensus tree was constructed. Bayesian posterior probability (BPP) was used to assess the support for the recovered clades, given the aligned sequence data. A six-parameter substitution model (General Time Reversible) was used with a gamma rate parameter allowing for rate variation among sites. It should be noted that tree-search methods other than Bayesian inference can also be used.
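The sampling figures quoted above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the function name `retained_samples` is hypothetical, and it assumes one tree is stored per sampling interval, ignoring the starting state of the chain.

```python
def retained_samples(generations: int, sample_every: int, burn_in: int) -> int:
    """Number of sampled trees left after discarding the burn-in.

    Assumes one tree is saved every `sample_every` generations and that the
    initial state of the chain is not counted (a simplification).
    """
    sampled = generations // sample_every
    if burn_in > sampled:
        raise ValueError("burn-in exceeds the number of sampled trees")
    return sampled - burn_in


# Settings quoted in the text: 2 000 000 generations, sampled every 100,
# with the first 5 000 trees discarded as burn-in.
sampled_trees = 2_000_000 // 100
kept_trees = retained_samples(2_000_000, 100, 5_000)
print(sampled_trees, kept_trees)
```

Under these assumptions, 20 000 trees are sampled and 15 000 remain to build the majority-rule consensus tree.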

In the second step, the DNA data were translated into the corresponding protein sequences using MEGA 3 (12). Alternatively, protein sequences could be downloaded directly from GenBank. The protein sequences were then aligned using ClustalW as included in MEGA 3. The protein alignment was re-sorted according to the sequence order displayed by the tree, producing a so-called "clade-guided" sequence alignment, and a unique color was assigned to each kind of amino acid. Leading and trailing gaps are generally artifacts of aligning sequences with different 5' and 3' termini (22) and were set to white. The remaining gaps were set to black in order to highlight the real amino acid deletions.
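The re-sorting and gap handling described in this step can be illustrated with a short sketch. It is not the software used in the paper: the function names, the `-` gap symbol, the label strings and the toy sequence names are all assumptions made for illustration.

```python
def reorder_by_tree(alignment: dict[str, str], tip_order: list[str]) -> list[tuple[str, str]]:
    """Return aligned sequences in the top-to-bottom order of the tree tips."""
    return [(name, alignment[name]) for name in tip_order]


def classify_gaps(seq: str, gap: str = "-") -> list[str]:
    """Label each column of one aligned sequence.

    Leading and trailing gaps are 'terminal_gap' (drawn white in the text),
    gaps between residues are 'internal_gap' (drawn black), and everything
    else is 'residue'.
    """
    first_residue = len(seq) - len(seq.lstrip(gap))  # index of first non-gap
    after_last_residue = len(seq.rstrip(gap))        # index just past last non-gap
    labels = []
    for i, ch in enumerate(seq):
        if ch != gap:
            labels.append("residue")
        elif i < first_residue or i >= after_last_residue:
            labels.append("terminal_gap")
        else:
            labels.append("internal_gap")
    return labels


# Toy data with hypothetical sequence names.
toy_alignment = {"virus_b": "MK-T-", "virus_a": "--MKT"}
ordered = reorder_by_tree(toy_alignment, ["virus_a", "virus_b"])
labels = classify_gaps("--MK-T-")
```

Here `ordered` lists `virus_a` before `virus_b`, and in `"--MK-T-"` only the gap at index 4 is labelled as an internal gap, that is, a candidate real deletion.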

Thirdly, a consensus sequence was calculated for the alignment. All the consensus amino acids were set to white to match the background color so that only non-consensus sites were visible. Obenauer et al. defined the consensus as the residue that occurs more often than any other residue in a column (17). In our method, however, all the residues are displayed if no residue occurs in more than 50% of the sequences in the column. The remaining residues were used to define the proteotypes according to the number of variable amino acids among proteins.
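The 50% rule can be sketched as follows. This is an illustrative reading of the step rather than the authors' implementation: the function names and the `.` placeholder for hidden (white) residues are assumptions, gaps are counted like any other symbol when computing frequencies, and gaps are never hidden so that deletions stay visible.

```python
from collections import Counter
from typing import Optional


def majority_consensus(column: str) -> Optional[str]:
    """Return the symbol present in more than 50% of the column, else None."""
    residue, count = Counter(column).most_common(1)[0]
    return residue if count * 2 > len(column) else None


def mask_consensus(seqs: list[str], blank: str = ".", gap: str = "-") -> list[str]:
    """Replace consensus residues with `blank`; keep everything else visible.

    Columns without a >50% residue are shown in full. Gaps are never blanked
    (an assumption of this sketch).
    """
    consensus = [majority_consensus("".join(col)) for col in zip(*seqs)]
    return [
        "".join(blank if (c == cons and c != gap) else c for c, cons in zip(seq, consensus))
        for seq in seqs
    ]


toy = ["MKT", "MRT", "MKT", "MQA"]
masked = mask_consensus(toy)
print(masked)
```

In the toy alignment, the second column (K, R, K, Q) has no residue above 50%, so every residue in it stays visible. A most-frequent-residue rule would instead treat K as the consensus and hide it.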

Finally, the proteotypes of the NA proteins of the representative H5N1 viruses were determined mainly on the basis of the amino acid differences among protein sequences and the positions of the sequences on the tree. Once the proteotypes were determined, serial numbers were assigned to them from the top of the alignment downwards, and these numbers were summarized in Table 1. At the same time, unique amino acids were sought in the NA proteotypes.
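Top-down numbering can be sketched as below. The sketch makes a strong simplifying assumption that is not in the text: two sequences share a proteotype only when their non-consensus signatures are identical, whereas the text determines proteotypes by judging the number of differing amino acids together with tree position. The function name and virus names are hypothetical.

```python
def number_proteotypes(ordered: list[tuple[str, str]]) -> dict[str, int]:
    """Assign serial numbers from the top of the clade-guided alignment down.

    `ordered` holds (name, signature) pairs in tree order. Identical
    signatures share a number (a simplifying assumption of this sketch).
    """
    number_of: dict[str, int] = {}  # signature -> serial number
    assigned: dict[str, int] = {}   # sequence name -> serial number
    for name, signature in ordered:
        if signature not in number_of:
            number_of[signature] = len(number_of) + 1
        assigned[name] = number_of[signature]
    return assigned


toy_order = [("virus_a", ".K."), ("virus_b", ".R."), ("virus_c", ".K."), ("virus_d", ".QA")]
proteotypes = number_proteotypes(toy_order)
print(proteotypes)
```

Because the numbers follow the order of first appearance, adding sequences or changing the tree changes the numbering, so serial numbers are only comparable within one analysis.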

DISCUSSION

Proteotyping is a recently introduced method akin to genotyping at the DNA level, but it additionally captures the variability of proteins as they occur in populations and change over time (20). It has been used to help find proteins related to diseases and to study changes in these proteins in both healthy and diseased states (2, 3, 26). It has also been used as a tool to study developmental lesions (21). Some researchers have even used it to link the genotype and phenotype of certain diseases (14). In these studies, proteotyping was often carried out with the assistance of mass spectrometry (MS) (18, 20).

The method has also been used to study influenza virus evolution at the protein level (17). In this study, it has been modified and has some particular characteristics. First, proteotyping analysis here is principally sequence-based, and therefore it can be completed without MS data. Secondly, as mentioned in the Methods section, the protein sequence alignment is clade-guided rather than a normal multiple sequence alignment. Thirdly, for influenza viruses, a genotype is determined only by the whole genome rather than by a single gene or a few genes. In contrast, a proteotype can be determined both for each gene of the virus and for the whole genome. In fact, by integrating the eight proteotypes determined for the individual gene segments, one can ascertain the proteotype of the whole genome, much as the genotype of an influenza virus is defined. Fourthly, the serial numbers assigned to the proteotypes of the same viruses may differ between analyses, because the serial numbers depend both on the sample size of the analysis and on the positions of the viruses in the gene tree. Finally, the proteotype can also be linked to the genotype; indeed, information at the genotype level is helpful in defining the proteotype.
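One way to picture a whole-genome proteotype is as the ordered combination of the eight per-segment proteotype numbers. The sketch below is purely illustrative: the tuple representation, the function name and the example numbers are assumptions, and only the NA proteotypes were actually determined in this study.

```python
# The eight genome segments of influenza A virus, in conventional order.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def genome_proteotype(per_segment: dict[str, int]) -> tuple[int, ...]:
    """Combine per-segment proteotype numbers into one whole-genome label.

    Raises ValueError if any of the eight segments is missing.
    """
    missing = [s for s in SEGMENTS if s not in per_segment]
    if missing:
        raise ValueError(f"missing segments: {missing}")
    return tuple(per_segment[s] for s in SEGMENTS)


# Hypothetical per-segment serial numbers for one virus.
example = {"PB2": 1, "PB1": 2, "PA": 1, "HA": 3, "NP": 1, "NA": 4, "M": 1, "NS": 2}
label = genome_proteotype(example)
print(label)
```

Two viruses would then share a whole-genome proteotype only if all eight per-segment numbers agree within the same analysis.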

Bayesian analysis in this paper confirmed the previously constructed topology (9, 24). However, differences between the results of the phylogenetic and proteotyping analyses indicated that the proteotyping method had a higher resolution and was able to reveal more subtle differences among viruses. The specific amino acids found by the proteotyping method could be further analyzed by other bioinformatics techniques or by reverse genetics techniques to study their potential biological functions. However, only the proteotypes of the NA proteins were determined here (Table 1). If the proteotypes of all eight proteins of the influenza viruses were identified, proteins that co-reassort or show compensatory mutations could be detected (17).

Unlike the consensus definition proposed by Obenauer et al. (17), we suggest that all the residues should be displayed if no residue occurs in more than 50% of the sequences in a column. Such a column may indicate a highly variable site. Although a highly variable site suggests weak selective pressure and the absence of biological function, neglecting these variable sites would make it difficult to find sites that might co-reassort, because they would be hidden by a subjective choice. It is likely that these co-reassorting sites are related to the function of the protein. Therefore, hiding a residue that accounts for less than 50% of a column might lose potentially important information.

It should also be mentioned that there is no general criterion available to guide the definition of a proteotype. If it is defined arbitrarily, potentially useful information might be hidden by noise. The number of different amino acids among proteins and the positions of the viruses in the gene tree may help to distinguish different proteotypes, but in some cases this may not be sufficient. Additional factors should also be taken into account, such as serotype, subtype, genotype, host, collection time and natural selection pressure. In particular, sample size is an important factor that should not be ignored. In this paper, however, we mainly introduce the proteotyping method; therefore only a small sample was used, and the proteotypes were designated mostly by the numbers of different amino acids in the NA proteins. Consequently, the proteotypes designated here are not strict.

To sum up, the proteotyping method is a useful tool for studying virus evolution at the protein level. It can also be applied to other viruses, especially viruses with segmented genomes.
