-
In the standard genetic code, all amino acids (aa) except Met and Trp are encoded by two to six synonymous codons. However, synonymous codons are not used at equal frequencies within an organism, a phenomenon called codon usage bias[17, 21, 50]. Codon usage bias among synonymous codons has been described for many genes in various species[6, 10, 20, 21, 26, 28, 39, 53]. Studies of synonymous codon usage can reveal information about the molecular evolution of individual genes. Synonymous codon usage bias has been reported to be related to various biological factors, such as GC composition, gene length, mutation frequency and patterns, gene expression level, tRNA abundance, gene translation initiation signals and protein structure[4, 14, 19, 27, 37]. Further analyses found that synonymous codon usage patterns vary at different sites along a coding sequence[24] and are linked to the balance of strong versus weak base-pair bonding[5, 22], the maintenance of DNA and RNA secondary structure[52], and translational efficiency and fidelity[26].
Aujeszky's disease, caused by pseudorabies virus (PRV, also known as suid herpesvirus 1, SuHV-1), is a frequently fatal disease with a global distribution that primarily affects swine and incidentally affects other domestic and wild animals[34, 35, 43, 46, 48]. Most previous research has focused on the epidemiology and prevention of this disease[7, 32, 42, 43, 55]. However, the molecular characteristics of the PRV genome are still not well understood. The PRV US1 gene, a 1050-base-pair sequence, encodes a putative polypeptide of 349 aa residues designated PICP22. The functions of US1 gene products that are homologs of PICP22, such as herpes simplex virus 1 (HSV-1) ICP22[3, 8, 16, 47] and varicella-zoster virus (VZV) ORF63[2, 11, 12, 41], have been extensively studied in the herpesvirus life cycle; however, the exact functional characteristics of the PRV US1 gene, as well as its codon usage bias, are poorly understood. Given this background, it is important to analyze the codon preference of the PRV US1 gene. In this study, we analyzed the synonymous codon usage of the PRV US1 gene and compared it with that of the US1-like genes of 20 reference alphaherpesviruses. We then investigated how other factors may affect codon usage variation in the PRV US1 gene and its reference species. Finally, we compared the codon usage preference of the PRV US1 gene with those of E. coli, yeast, and human.
-
The nucleotide sequences (Table 1) of the PRV Becker strain US1 gene (GenBank accession no. JF797219) and the US1-like genes of 20 reference alphaherpesviruses were obtained from GenBank (Bethesda, Maryland, USA; http://www.ncbi.nlm.nih.gov/).
Table 1. Nucleotide sequences of the PRV Becker strain US1 gene and the US1-like genes of 20 reference alphaherpesviruses from different species
-
To compare PICP22 with the ICP22-like proteins of the reference alphaherpesviruses for which nucleotide sequences are available in GenBank (listed in Table 1), the nucleotide sequences of the PRV US1 gene and the US1-like genes of the reference alphaherpesviruses were translated into aa sequences. Multiple sequence alignment and phylogenetic analysis (rooted tree) were then performed using the Clustal V method in the MegAlign program of DNAStar (version 7.0, DNAStar, Inc.)[9].
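The translation step can be illustrated with a minimal Python sketch. It is not the DNAStar workflow used here: the function name `translate` and the toy sequence are hypothetical, and only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence into a one-letter aa string.

    A single trailing stop codon, if present, is dropped.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    protein = [CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3)]
    if protein and protein[-1] == "*":
        protein.pop()
    return "".join(protein)


# Hypothetical 1050-bp toy sequence: start codon, 348 Ala codons, stop codon.
toy_cds = "ATG" + "GCC" * 348 + "TAA"
toy_protein = translate(toy_cds)
```

With this toy input, a 1050-bp open reading frame yields 349 residues once the stop codon is removed, which matches the length relationship stated for the US1 gene and PICP22.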
-
For each gene, codon usage was assessed using the program CodonW 1.4 (http://codonw.sourceforge.net/). Several indices of codon usage bias were calculated, including CAI (codon adaptation index), ENc (effective number of codons), GC3s (G+C content at the third positions of codons) and RSCU (relative synonymous codon usage). CAI uses a reference set of highly expressed genes from a species to estimate the relative merit of each codon (a full gene list is available at http://helixweb.nih.gov/emboss/html/cai.htm), and the score for a gene is calculated from the frequency of use of all codons in that gene. The index assesses the extent to which selection has been effective in shaping codon usage[51]. ENc is the effective number of codons used in a gene. It quantifies how far the codon usage of a gene deviates from equal usage of synonymous codons, without depending on sequence length or on prior knowledge of preferred codons, although it is affected by base composition[13, 45, 56]. ENc values range from 20 (when only one codon is used per aa) to 61 (when all synonymous codons are used with equal frequency). ENc is therefore a useful measure of general codon usage bias: the lower the ENc, the stronger the codon bias. GC3s is a useful measure of the degree of base composition bias and represents the frequency of G+C at the synonymous third position of codons, excluding Met, Trp and the stop codons. RSCU was used to investigate the overall synonymous codon usage variation among the genes without the confounding influence of the aa composition of different gene samples. It is defined as the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that aa were used equally. An RSCU value greater than 1.0 indicates that the corresponding codon is used more frequently than expected, whereas an RSCU value less than 1.0 indicates the reverse[51].
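The RSCU and GC3s definitions above can be made concrete with a minimal Python sketch. It is not the CodonW implementation: the function names `count_codons`, `rscu` and `gc3s` are hypothetical, the input sequence is a toy example, and only the standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(cds):
    """Count in-frame codons; an incomplete trailing codon is ignored."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(cds):
    """RSCU = observed count / count expected under equal synonymous usage."""
    counts = count_codons(cds)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if len(codons) == 1 or total == 0:
            continue  # Met, Trp, or an aa absent from the sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def gc3s(cds):
    """G+C fraction at the third position of synonymous codons."""
    counts = count_codons(cds)
    synonymous = [c for c, aa in CODON_TABLE.items() if aa not in "MW*"]
    total = sum(counts[c] for c in synonymous)
    gc = sum(counts[c] for c in synonymous if c[2] in "GC")
    return gc / total if total else float("nan")


# Toy sequence: Ala codons GCC, GCC, GCG, GCT followed by Lys codons AAG, AAG.
toy = "GCCGCCGCGGCTAAGAAG"
toy_rscu = rscu(toy)
toy_gc3s = gc3s(toy)
```

In the toy sequence, GCC accounts for two of the four Ala codons in a four-codon family, so its RSCU is 2.0, while the unused GCA has an RSCU of 0. Five of the six synonymous codons end in G or C, giving a GC3s of 5/6.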
A heat map representing the clustering of RSCU values was constructed with the CIMMiner software tool (http://discover.nci.nih.gov/cimminer)[54], with each column representing a specific codon and each row representing a different species (in the same order as in Table 1). Clustering was performed using Euclidean distance and the average linkage method. The codon usage pattern across different genes was also analyzed with the ENc-plot, a plot of ENc versus GC3s, together with plots of ENc versus gene length and GC3s versus gene length. Curves were generated using logarithmic distribution curves, where
y = -34.757 ln(x) + 31.407,
y = -24.909 ln(x) + 214.24 and
y = 0.4553 ln(x) - 2.3871
were used to calculate the points for ENc-GC3s, ENc-Length and GC3s-Length, respectively.
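The clustering step described above (Euclidean distance with average linkage) can be sketched in plain Python. This is only an illustration of the technique and not the CIMMiner implementation. The function names and the three short RSCU-like profiles are hypothetical and do not come from Table 4.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles maps a name to a list of values (for example RSCU per codon).
    Returns the merges in order as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of member names.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                pairwise = [
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                dist = sum(pairwise) / len(pairwise)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical three-codon profiles, for illustration only.
profiles = {
    "virus_A": [2.0, 0.0, 1.0],
    "virus_B": [1.9, 0.1, 1.0],
    "virus_C": [0.5, 1.5, 1.0],
}
merges = average_linkage(profiles)
```

In this toy case the two similar profiles merge first, and the third joins at a much larger distance, which is the pattern a heat-map dendrogram displays.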
-
To test whether distinct species follow a similar codon usage rule, we compared the codon preferences among the PRV US1 gene with those of E. coli, yeast and human. The codon usage analysis of these species was carried out by using the codon usage database (http://www.kazusa.or.jp/codon) and the CUSP program in the EMBOSS software suite (The European Molecular Biology Open Software Suite, http://bioinfo.pbi.nrc.ca:8090/EMBOSS/)[38].
-
The correlations between codon usage variations among the PRV US1 gene and 20 reference alphaherpesviruses and four indicators (CAI, ENc, GC3s and gene length) were estimated by using the SPSS 12.0 software package.
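The type of correlation coefficient is not stated here. Assuming a Pearson coefficient, the calculation can be sketched as follows. The function name `pearson` and the toy values are hypothetical and are not taken from Table 2.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical toy values: ENc falling linearly as GC3s rises.
toy_gc3s = [0.35, 0.55, 0.75, 0.95]
toy_enc = [60.0, 50.0, 40.0, 30.0]
r = pearson(toy_gc3s, toy_enc)
```

A coefficient near -1 or +1 indicates a strong linear relationship between two indices, while a value near 0 indicates little linear association.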
Virus species and gene sequences
Molecular phylogenetic tree of ICP22-like proteins of the 21 reference alphaherpesviruses
Codon usage analysis of the PRV Becker strain US1 gene and other 20 reference alphaherpesviruses
Comparison of codon preferences of PRV Becker strain US1 gene with those of E. coli, yeast and human
Statistical analysis
-
A phylogenetic tree based on the deduced PICP22 and the ICP22-like proteins of the reference alphaherpesviruses (Table 1) is shown in Fig. 1. The general branching pattern is consistent with previously published phylogenetic analyses[43, 46], and the ICP22-like proteins within the same genus or from the same virus species cluster together. The PICP22 of the PRV Becker strain first clusters with those of the Bartha and Kaplan strains in a monophyletic clade, which then clusters with other members of the genus Varicellovirus of the alphaherpesviruses, such as bovine herpesvirus 1 (BoHV-1), BoHV-5, felid herpesvirus 1 (FeHV-1), equid herpesvirus 1 (EHV-1), EHV-4, EHV-9, human herpesvirus 3 (HHV-3, VZV) and cercopithecine herpesvirus 9 (CeHV-9). Therefore, we conclude from the phylogenetic tree and the high aa sequence homology that PRV PICP22 has a close evolutionary relationship with the members of the genus Varicellovirus, although certain differences exist.
Figure 1. Evolutionary relationship of the PRV Becker strain ICP22 protein with the ICP22-like proteins of 20 reference alphaherpesviruses from different species (Table 1). The phylogenetic tree of these proteins was generated using the MEGALIGN (DNAStar) program with the Clustal V multiple alignment method, and the sequence distance indicated by the scale was calculated using the PAM250 matrix in LASERGENE.
-
The results of the CodonW analysis of the 21 alphaherpesvirus species are shown in Table 2. Codon usage in the PRV US1 gene and its homologous genes is highly nonrandom, and the overall base composition of the US1 gene and its homologous genes in these species shows similar variation. However, there are some distinct patterns in the codon usage bias parameters of the US1 gene among the PRV Becker, Kaplan and Bartha strains. As shown in Table 2, the CAI values of the different alphaherpesviruses vary from 0.182 to 0.493, with a mean of 0.387 and a standard deviation (SD) of 0.084, and their ENc values range from 28.4 to 61.0, with a mean of 44.2 and an SD of 12.1. Since most ENc values of the 21 alphaherpesviruses are below 40, and thus lower than the mean, the codon usage bias in the US1-like genes of the 21 alphaherpesviruses is accordingly somewhat strong. Similarly, the GC3s content of each US1-like gene also confirms the homogeneity of synonymous codon usage among the different alphaherpesviruses; the values vary from 34.44% to 95.68%, with a mean of 71.68% and an SD of 19.88%.
Table 2. Summary analysis of the PRV Becker strain US1 gene and the US1-like genes of 20 reference alphaherpesviruses from different species
A plot of ENc against GC3s is an effective way of examining the heterogeneity of codon usage among a set of homologous genes[56]. If the codon usage of a gene is shaped only by G+C compositional constraint, the gene will lie on a continuous curve that represents random codon usage[29]. Conversely, if a gene is subject to selection for translationally optimal codons, it will lie considerably below the expected curve. The ENc values of each US1-like gene in the 21 reference alphaherpesviruses are plotted against their corresponding GC3s in Fig. 2A. Although only a few genes lie on the expected curve, a large number of points lie near the solid curve, suggesting that these genes are subject to GC compositional constraints.
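For orientation, a commonly used closed form for the expected curve is ENc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s expressed as a fraction. The Python sketch below evaluates it. That this expression corresponds to the solid curve in Fig. 2A is an assumption, since the Methods describe fitted logarithmic curves. The function names and the example point are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENc when codon usage depends only on GC3s (a fraction, 0 to 1).

    Uses the widely cited closed form 2 + s + 29 / (s^2 + (1 - s)^2).
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2.0 + s + 29.0 / (s * s + (1.0 - s) ** 2)


def enc_deviation(observed_enc, gc3s):
    """Observed minus expected ENc; negative values fall below the curve."""
    return observed_enc - expected_enc(gc3s)


# Hypothetical point: a gene with GC3s = 0.90 and an observed ENc of 30.0.
example_deviation = enc_deviation(30.0, 0.90)
```

Under this expression the curve peaks at 60.5 for s = 0.5 and falls toward about 31 to 32 at the extremes, so a gene far below the curve at its GC3s value is a candidate for influences beyond base composition.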
Figure 2. Relationship between ENc, GC3s and gene length of the PRV Becker strain US1 gene and the US1-like genes of 20 reference alphaherpesviruses. A: Plot of ENc versus GC3s for the PRV Becker strain US1 gene and the US1-like genes of 20 reference alphaherpesviruses. ENc denotes the effective number of codons of each gene, and GC3s denotes the G+C content at the third synonymous codon position of each gene. The solid curve shows the expected position of genes whose codon usage is only determined by the variation in GC3s. B: Plot of ENc versus gene length for the PRV Becker strain US1 gene and the US1-like genes of 20 reference alphaherpesviruses. C: Plot of GC3s versus gene length (bp) for the PRV Becker strain US1 gene and the US1-like genes of 20 reference alphaherpesviruses. Red point represents the PRV Becker strain, yellow point represents the PRV Bartha strain and green point represents the PRV Kaplan strain.
The relationship between gene length and synonymous codon usage bias has been described for Drosophila melanogaster, E. coli, Saccharomyces cerevisiae, Pseudomonas aeruginosa and Yersinia pestis[23, 25, 40]. Here, the plot of gene length against ENc (Fig. 2B) or against GC3s (Fig. 2C) shows the distribution for each gene. It appears that in the US1-like genes of the 21 reference alphaherpesviruses, longer genes have a much wider variance in ENc values and GC3s, suggesting that gene length may also play a role in shaping the codon usage bias of the 21 alphaherpesviruses.
-
While CAI, ENc and related measures indicate the overall codon bias of the PRV US1 gene, it is also important to examine the pattern of codon bias more closely. Table 3 shows the overall codon preference of the US1 gene in the PRV Becker strain. The RSCU values show that, among the amino acids other than Met and Trp (and excluding the termination codons), Arg, Leu, Ser, Ala, Gly, Pro, Thr and Val have a high level of diversity in codon usage bias because they have six-fold or four-fold coding degeneracy. Moreover, Cys, Asp, His, Lys, Asn, Gln and Tyr also have a high level of diversity in codon usage bias, even though they have only two-fold or three-fold coding degeneracy. Altogether, although the most and least frequently used codons differ for each aa, the PRV Becker strain US1 gene shows a significant preference for one or more particular codons for each aa. However, a similar bias also exists at the first codon position, indicating that the actual situation is more complex.
Table 3. The result of codon preference analysis in PRV Becker strain US1 gene analyzed with the CUSP program
-
To provide a visual representation of the variation in codon bias[15, 36, 44], we performed a cluster analysis of the codon usage pattern of the PRV Becker strain US1 gene and the US1-like genes of the 20 reference alphaherpesviruses according to their RSCU values (Table 4 and Fig. 3). The PRV Becker, Kaplan and Bartha strains appear distinct from the other alphaherpesviruses. They first cluster together and form a separate branch, then cluster with members of the genus Varicellovirus of the alphaherpesviruses, such as BoHV-1, BoHV-5, EHV-1, EHV-4 and EHV-9, and subsequently cluster with the other genera of the alphaherpesviruses. This result reflects the relationship between the codon usage patterns of PRV and other alphaherpesviruses and suggests that the codon usage pattern of PRV differs from those of other alphaherpesviruses: the more distant the genetic relationship, the greater the expected variation in codon usage bias. Accordingly, we conclude that the codon usage pattern of PRV is fairly close to that of the members of the genus Varicellovirus and differs most from those of the other genera of the alphaherpesviruses.
Figure 3. Heat map of RSCU values for the 21 reference alphaherpesvirus species (clustered by the RSCU values, Table 4). See main text for details.
Table 4. RSCU values of the US1 genes of PRV Becker strain and 20 reference alphaherpesviruses from different species
-
Generally, the codon usage bias of a gene remains conserved to a certain degree across species. Here, the codon usage of the PRV Becker strain US1 gene was compared with those of E. coli, yeast and human to see which would be the most suitable host for optimal expression. As shown in Table 5, 50 codons have a PRV-to-yeast ratio higher than 2 or lower than 0.50, 49 codons have a PRV-to-human ratio higher than 2 or lower than 0.50, and 48 codons have a PRV-to-E. coli ratio higher than 2 or lower than 0.50, indicating that large differences in codon preference exist for all three hosts. Although there were slightly fewer differences in codon usage between E. coli and PRV, the difference is unlikely to be statistically significant, and experimental studies would be necessary to establish the most suitable expression system for this virus.
Table 5. Comparison of codon preferences between PRV Becker strain US1 gene and E. coli, yeast and human
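The ratio criterion used for Table 5 (a virus-to-host ratio higher than 2 or lower than 0.50) can be sketched in Python as follows. The function name `divergent_codons` and the frequencies are hypothetical. Frequencies are assumed to be expressed per thousand codons, and a codon that the host never uses is treated as divergent, which is a choice made for this sketch only.

```python
def divergent_codons(virus_freq, host_freq, high=2.0, low=0.5):
    """Return codons whose virus-to-host frequency ratio is > high or < low.

    Both arguments map codons to frequencies on the same scale
    (assumed here to be per thousand codons).
    """
    divergent = []
    for codon, virus_value in virus_freq.items():
        host_value = host_freq.get(codon, 0.0)
        if host_value == 0.0:
            # Ratio is undefined; count the codon only if the virus uses it.
            if virus_value > 0.0:
                divergent.append(codon)
            continue
        ratio = virus_value / host_value
        if ratio > high or ratio < low:
            divergent.append(codon)
    return sorted(divergent)


# Hypothetical per-thousand frequencies, for illustration only.
virus = {"GCC": 60.0, "GCA": 5.0, "AAG": 30.0}
host = {"GCC": 25.0, "GCA": 20.0, "AAG": 20.0}
flagged = divergent_codons(virus, host)
```

Here GCC (ratio 2.4) and GCA (ratio 0.25) are flagged, whereas AAG (ratio 1.5) is not. Counting the flagged codons for each host gives totals of the kind compared in Table 5.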