Analysis of Synonymous Codon Usage Bias in 09H1N1

Zhen-peng LI; De-quan YING; Peng LI; Fei LI; Xiao-chen BO; Sheng-qi WANG

doi:10.1007/s12250-010-3123-3

October 2010

Citation: Zhen-peng LI, De-quan YING, Peng LI, Fei LI, Xiao-chen BO, Sheng-qi WANG. Analysis of Synonymous Codon Usage Bias in 09H1N1 .VIROLOGICA SINICA, 2010, 25(5) : 329-340. http://dx.doi.org/10.1007/s12250-010-3123-3

Beijing Institute of Radiation Medicine, Beijing 100850, China

Corresponding author: Xiao-chen BO, boxc@bmi.ac.cn
Sheng-qi WANG, sqwang@bmi.ac.cn
These authors contributed equally to this work.
Received Date: 05 January 2010
Accepted Date: 30 April 2010
Available online: 01 October 2010

Abstract

A novel subtype of influenza A virus, 09H1N1, has rapidly spread across the world. Evolutionary analyses of this virus have revealed that 09H1N1 is a triple reassortant of segments from swine, avian and human influenza viruses. In this study, we investigated the factors shaping the codon usage bias of 09H1N1 and carried out a cluster analysis of 60 strains of influenza A virus from different subtypes based on their codon usage bias. We found that the more preferentially used codons of 09H1N1 are A-ended or U-ended, and that the intra-genomic codon usage bias of 09H1N1 is quite low. Base composition constraint, dinucleotide biases and translational selection are the main factors influencing the codon usage bias of 09H1N1. At the genome level, we found that the codon usage bias of 09H1N1 is similar to that of H1N1 (A/swine/Kansas/77778/2007(H1N1)), H9N2 from Asia, H1N2 from Asia and North America, and H3N2 from North America. Our results provide insight into the processes governing the evolution of 09H1N1 and the regulation of its gene expression.
- 09H1N1
- Correspondence analysis
- Codon usage bias

References
1. Bao Y, Bolotov P, Dernovoy D, et al. 2008. The Influenza Virus Resource at the National Center for Biotechnology Information. J Virol, 82 (2): 596-601.
  doi: 10.1128/JVI.02005-07
2. Basak S, Banerjee T, Gupta S K. 2004. Investigation on the causes of codon and amino acid usages variation between thermophilic Aquifex aeolicus and mesophilic Bacillus subtilis. J Biomol Struct Dyn, 22 (2): 205-214.
  doi: 10.1080/07391102.2004.10506996
3. Charif D, Lobry J. 2007. SeqinR 1.0-2: A Contributed Package to the R Project for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis. In: Structural Approaches to Sequence Evolution (Bastolla U, Porto M, Roman E, Vendruscolo M, eds.), Berlin Heidelberg: Springer, p207-232.
4. Dray S, Dufour A B. 2007. The ade4 package: implementing the duality diagram for ecologists. J Stat Softw, 22 (4): 1-20.
5. Garten R J, Davis C T, Russell C A. 2009. Antigenic and Genetic Characteristics of Swine-Origin 2009 A (H1N1) influenza Viruses Circulating in Humans. Science, 325 (5937): 197-201.
  doi: 10.1126/science.1176225
6. Gu W, Zhou T, Ma J. 2004. The relationship between synonymous codon usage and protein structure in Escherichia coli and Homo sapiens. BioSystems, 73 (2): 89-97.
  doi: 10.1016/j.biosystems.2003.10.001
7. Gupta S K, Ghosh T C. 2001. Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa. Gene, 273 (1): 63-70.
  doi: 10.1016/S0378-1119(01)00576-5
8. Ihaka R, Gentleman R. 1996. R: A language for data analysis and graphics. J Comp Graph Stat, 5 (3): 299-314.
9. Jenkins G M, Holmes E C. 2003. The extent of codon usage biases in human RNA viruses and its evolutionary origin. Virus Res, 92 (1): 1-7.
  doi: 10.1016/S0168-1702(02)00309-X
10. Karlin S, Doerfler W, Cardon L R. 1994. Why is CpG suppressed in the genomes of virtually all small eukaryotic viruses but not in those of large eukaryotic viruses? J Virol, 68 (5): 2889-2897.
11. Kyte J, Doolittle R F. 1982. A simple method for displaying the hydropathic character of a protein. J Mol Biol, 157 (1): 105-32.
  doi: 10.1016/0022-2836(82)90515-0
12. Lobry J R, Gautier C. 1994. Hydrophobicity, expressivity and aromaticity are the major trends of amino acid usage in 999 Escherichia coli chromosome encoded genes. Nucl Acids Res, 22 (15): 3174-3180.
  doi: 10.1093/nar/22.15.3174
13. Marais G, Duret L. 2001. Synonymous codon usage, accuracy of translation, and gene length in Caenorhabditis elegans. J Mol Evol, 52 (3): 275-280.
  doi: 10.1007/s002390010155
14. McInerney J O. 1998. Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc Natl Acad Sci USA, 95 (18): 10698-10703.
  doi: 10.1073/pnas.95.18.10698
15. Mooers A, Holmes E C. 2000. The evolution of base composition and phylogenetic inference. Trends Ecol Evol (Amst.), 15 (9): 365-369.
  doi: 10.1016/S0169-5347(00)01934-0
16. Perriere G, Thioulouse J. 2002. Use and misuse of correspondence analysis in codon usage studies. Nucl Acids Res, 30 (20): 4548-4555.
  doi: 10.1093/nar/gkf565
17. Sharp P M, Tuohy T M, Mosurski K R, et al. 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucl Acids Res, 14 (13): 5125-5143.
  doi: 10.1093/nar/14.13.5125
18. Suzuki H, Brown C J, Forney L J. 2008. Comparison of correspondence analysis methods for synonymous codon usage in bacteria. DNA Res, 15 (6): 357-365.
  doi: 10.1093/dnares/dsn028
19. Tao P, Dai L, Luo M. 2009. Analysis of synonymous codon usage in classical swine fever virus. Virus Genes, 38 (1): 104-112.
  doi: 10.1007/s11262-008-0296-z
20. Trifonov V, Khiabanian H, Greenbaum B, et al. 2009. The origin of the recent swine influenza A (H1N1) virus infecting humans. Euro Surveill, 14 (17): pii=19193. Available online: http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=19193.
21. Trifonov V, Khiabanian H, Rabadan R. 2009. Geographic Dependence, Surveillance, and Origins of the 2009 Influenza A (H1N1) Virus. N Engl J Med, 361 (2): 115-119.
  doi: 10.1056/NEJMp0904572
22. Wright F. 1990. The 'effective number of codons' used in a gene. Gene, 87 (1): 23-29.
  doi: 10.1016/0378-1119(90)90491-9
23. Zhou T, Gu W, Ma J, et al. 2005. Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses. BioSystems, 81 (1): 77-86.
  doi: 10.1016/j.biosystems.2005.03.002

09H1N1, which is a combination of gene segments from both North American and Eurasian swine lineages^{[5, 20, 21]}, has crossed the species barrier to humans and has rapidly spread across the world. Several reports have traced the origin of this virus and shown that all segments of 09H1N1 are directly related to swine influenza viruses, including not only H1N1 but also other subtypes of influenza A virus, mainly from America and Eurasia. These results also revealed that 09H1N1 is a triple reassortant of segments from swine, avian and human influenza viruses.

It has been well established that synonymous codon usage varies both among genomes and within genomes. Several factors that can influence codon usage have been reported, such as mutational bias^[9], translational selection^[7], replicational and transcriptional selection ^[14], secondary structure of proteins ^[6], gene function^[23], gene length^[13] and environmental factors ^[2]. Codon usage biases of some organisms, such as bacteria, yeast, Drosophila and mammals, have been examined in earlier research^[15]. More recently, the codon usage of RNA viruses has also been studied ^{[9, 19, 23]}; these reports show that the intra-genomic synonymous codon usage bias (referred to as "codon usage bias" for brevity hereafter) of most RNA viruses is quite low.

It is well known that a detailed knowledge of codon usage biases in RNA viruses can lead to a better understanding of the processes governing their evolution, particularly the role played by mutation pressure^[9]. Such information can also provide clues to the mechanisms involved in the regulation of viral gene expression and the evolution of viruses.

DISCUSSION

In this study, we investigated the codon usage bias of 09H1N1. Through codon usage analysis, we found that the most preferentially used codons of 09H1N1 are A-ended and U-ended codons and that the codon usage bias of 09H1N1 is quite low. After coexisting with a host for a long time, a virus may adapt its codon usage pattern to that of its host. The codon usage pattern of the host is therefore believed to act as an obstacle that hinders the virus from being transmitted to another species whose codon usage pattern is quite different from that of its natural host. The low codon usage bias suggests a more uniform selection of synonymous codons in 09H1N1, which may give 09H1N1 an advantage in crossing species barriers.
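Statements about preferred codons rest on the relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally. The following is a minimal Python sketch of that calculation, assuming the standard genetic code and an in-frame coding sequence. The function name `rscu` and the toy sequence are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (assumed input)."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue  # Met and Trp have no synonyms; skip unobserved amino acids
        for c in family:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence (hypothetical): three Lys codons, two AAA and one AAG.
toy = rscu("AAAAAAAAG")
```

An RSCU above 1 marks a codon used more often than expected under equal usage, so in the toy sequence AAA is the preferred Lys codon.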

Codon bias is likely the product of various mutational and selective forces. We investigated several factors shaping the codon bias of 09H1N1; however, other factors that we did not detect may also influence it. Using the Nc-plot and the correlation coefficients between the positions of genes along the first two axes of WCA and indices related to base composition, we found that base composition constraint is a key factor driving the codon usage bias of 09H1N1. The low correlation coefficient between GC3s and the positions of genes along the first two axes suggests that the effect of GC3s mutational bias on the codon usage bias of 09H1N1 is small and uneven, which is consistent with the Nc-plot and with the extent of correlation between GC12s and GC3s. As mutational bias has been reported to be the main factor determining the codon usage bias of influenza A virus ^[23], its small and uneven effect on 09H1N1 may lend indirect support to the complex genomic origin of this virus. Meanwhile, the correlation between Nc and Axis 1 derived from WCA suggests a close relationship between translational selection and codon usage bias. Other factors, such as gene length, protein hydrophobicity and amino acid aromaticity, show no significant correlation with the codon usage bias of 09H1N1. As there is no clear boundary between structural and nonstructural proteins in Fig. 1, gene function is likely entangled with other factors, and it is hard to ascertain the exact correlation between codon usage bias and gene function.
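An Nc-plot compares the observed effective number of codons of each gene with the value expected if GC3s alone determined codon usage. The sketch below computes GC3s and the expected curve from Wright's approximation ^[22], Nc = 2 + s + 29 / (s² + (1 - s)²), where s is GC3s. The function names and the toy sequence are illustrative assumptions, not the authors' code.

```python
# Codons without a synonymous alternative (Met, Trp) and stop codons are
# excluded when computing GC3s.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds):
    """Fraction of G or C at the third position of synonymous codons."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    thirds = [c[2] for c in codons
              if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not thirds:
        raise ValueError("no synonymous codons found")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_nc(s):
    """Expected Nc when codon usage is shaped only by GC3s (Wright 1990)."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


# Toy sequence (hypothetical): AAA, AAG, ATG (skipped), GCC.
s = gc3s("AAAAAGATGGCC")
curve_point = expected_nc(s)
```

A gene that falls well below the expected curve has a stronger codon bias than its GC3s alone would explain, which is usually read as a sign of factors other than compositional constraint.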

The correlations between the 16 dinucleotide frequencies and the first three axes derived from WCA suggest that dinucleotide biases, which are independent of the overall base composition, can also affect the codon usage bias of 09H1N1. The relationship between dinucleotide frequencies and codon usage bias is evident in some cases. For example, the AA dinucleotide has the highest mean frequency in Table 5. Six codons contain AA, and they encode four amino acids, i.e. Gln, Asn, Lys and Glu; the most preferentially used codons of all four of these amino acids contain AA. The situation is similar for GA, which has the second highest mean frequency. In contrast to AA and GA, CG has the lowest mean frequency. The eight codons containing CG, which encode five amino acids, have a mean RSCU value of only 0.38, and the least preferentially used codons of these five amino acids all contain CG. Significant CG deficiency is a common phenomenon in small eukaryotic viruses. It could thus be a strategy by which viruses resist host defense, as CpGs may be recognized by the host's innate immune system as a pathogen signature ^[19].
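The codon counts quoted in this paragraph can be checked by enumerating the sense codons that contain a given dinucleotide. The sketch below does this under the standard genetic code and also shows one simple way to tabulate dinucleotide frequencies from a sequence. The function names are illustrative assumptions, and the toy sequence is not data from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def sense_codons_containing(dinuc):
    """Map each sense codon that contains `dinuc` to its amino acid."""
    dinuc = dinuc.upper().replace("U", "T")
    return {c: aa for c, aa in CODON_TABLE.items()
            if aa != "*" and dinuc in c}


def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in `seq`."""
    seq = seq.upper().replace("U", "T")
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}


cg = sense_codons_containing("CG")  # codons with CG, e.g. CGA, ACG
aa = sense_codons_containing("AA")  # stop codon TAA is excluded
toy_freqs = dinucleotide_frequencies("AACG")  # hypothetical toy sequence
```

Under these assumptions the enumeration yields eight CG-containing codons for five amino acids (Arg, Ser, Pro, Thr, Ala) and six AA-containing codons for four amino acids (Lys, Asn, Gln, Glu), in agreement with the counts in the text.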

In addition to the analyses described above, we plotted a cluster map of the genomes of 60 different strains. 09H1N1 and swine05 (A/swine/Kansas/77778/2007(H1N1)) are closely clustered in this map, which suggests that they have similar codon usage bias. Earlier analyses of the evolutionary origin of 09H1N1 revealed that six of its eight segments are highly similar to those of swine H1N2 influenza A viruses isolated in North America and Asia ^{[20, 21]}. Given that H1N2 is a descendant of the triple-reassortant swine H3N2 isolated in North America^[21], it is understandable that H1N2 from North America and Asia and H3N2 from North America cluster in the same large branch as 09H1N1. It is worth noting that H9N2 from Asia is also in the same large branch as 09H1N1, which was not noted in the earlier origin analyses ^{[20, 21]}. As codon usage bias may be related to gene function, expression level and protein structure, cluster analysis based on codon usage bias may provide additional information compared with sequence analysis. Further experiments or data are needed to verify whether a biological relationship exists between 09H1N1 and H9N2 from Asia.
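Clustering by codon usage starts from a distance between the codon usage profiles of two genomes. The distance measure used for the cluster map is not given in this section, so the sketch below assumes a plain Euclidean distance between RSCU vectors and only reports each strain's nearest neighbour. The strain labels and RSCU values are hypothetical toy data, not results from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two {codon: RSCU} profiles."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def nearest_neighbours(profiles):
    """For each strain, return the strain with the closest profile."""
    result = {}
    for name, vec in profiles.items():
        candidates = [(euclidean(vec, other_vec), other)
                      for other, other_vec in profiles.items()
                      if other != name]
        result[name] = min(candidates)[1]
    return result


# Hypothetical toy profiles restricted to the two Lys codons.
toy_profiles = {
    "strain_a": {"AAA": 1.4, "AAG": 0.6},
    "strain_b": {"AAA": 1.3, "AAG": 0.7},
    "strain_c": {"AAA": 0.5, "AAG": 1.5},
}
neighbours = nearest_neighbours(toy_profiles)
```

A full cluster map would feed the complete pairwise distance matrix into a hierarchical clustering routine; the nearest-neighbour step here is only the simplest way to see which profiles are most alike.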

Our results complement phylogenetic studies of 09H1N1. Furthermore, better knowledge of codon usage biases in RNA viruses provides information that is useful for understanding the processes governing their evolution, such as mutation pressure. Finally, such information can provide clues to the regulation of viral gene expression and to the evolutionary origin of the different genes of 09H1N1.

Materials

RSCU, RF ^[18]

GC3s, GC, T3s, C3s, A3s and G3s

GRAVY ^[11] and Aromo ^[12]

Effective number of codons (Nc) ^[22]

Distance measure and cluster analysis

Correspondence analysis (CA) and within correspondence analysis (WCA)

Software and statistic method

Synonymous codon usage of 09H1N1

The causes of codon usage bias of 09H1N1