A defining feature of HIV-1 is its ability to adapt to the immune systems of its hosts by evolving new variant viruses. Some of the evolved variations have become practically fixed in certain geographical regions and have led to the emergence of characteristic regional variants, for example the East Asian subtype B' (Graf M, et al., 1998; Li Z, et al., 2012a), likely an offspring of the pandemic subtype B. Variants such as B' constitute novel genomic backgrounds of the retrovirus on which natural selection and neutral evolution then operate anew. The effect of the different backgrounds on the further course of retroviral evolution is unknown. It is conceivable that the evolution of resistance to anti-retroviral drugs differs between genomic backgrounds, or that immune escape paths for the same HLA type depend on the genomic background. A case in point is the observation of novel mutation patterns associated with drug resistance in East Asian clade B' (Deng X, et al., 2008).
In the present study we test the hypothesis that selection pressure depends on the retroviral genomic background. Similar studies have demonstrated that this is the case for the env gene and the reverse transcriptase in various subtypes and host populations (Choisy M, et al., 2004; Travers S A, et al., 2005; Pond S L, et al., 2006). Here, we specifically analyze selection pressure on the viral envelope protein gp120 of pandemic subtype B and the related East Asian subtype B'. We have chosen gp120 because this protein is exposed to particularly strong selection pressure by the host immune system, which might also lead to strong differences in selection pressure between B and B'. We have chosen B and B' because these clades can, on the one hand, be clearly distinguished (Wang Y, et al., 2013b), while on the other hand they are closely related, so that differences can be traced back to specific genomic patterns.
One way to quantify selection pressure is to evaluate, for the codons of a protein coding gene, the ratio of non-synonymous nucleotide mutations (leading to a different amino acid) to synonymous mutations (not leading to a different amino acid), often termed dn/ds or ka/ks (Nei M, et al., 1986; Li W H, 1993). We distinguish directional selection, which pushes a population away from an established state, from stabilizing selection, which ties a population to an established state. When no selection pressure is present, we observe neutral evolution (Kimura M, 1968). Compared to neutral evolution, the ka/ks ratio should take a higher value under directional selection (amino acid mutations favored) and a lower value under stabilizing selection (synonymous mutations favored). Methods for estimating ka/ks have often been applied on a per gene basis, which implies averaging over the codons of the studied gene. However, a gene may contain some positions under directional selection and other positions under stabilizing selection, so that averaging over all codons may cancel contributions from different codons and hence lead to an underestimation of selection pressure. Therefore, other methods have been developed that allow estimation of selection pressure on a per codon basis. Often these methods employ complex probabilistic models to explain the observed mutations in a gene on the background of a phylogenetic tree (Nielsen R, et al., 1998; Huelsenbeck J P, et al., 2006; Murrell B, et al., 2012). While these methods promise accurate results, they usually require relatively costly calculations. A fast and simple approximation was developed by Chen L (2004). This method counts synonymous and non-synonymous mutations with respect to a reference sequence, and corrects ka/ks ratios by a null model derived from the data. The use of a single reference corresponds to the assumption of a star-like phylogeny, as discussed by Chen L (2006).
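The counting step of such a reference-based approach can be illustrated with a minimal sketch. It is not the implementation of Chen L (2004): the function names (`split_codons`, `count_changes`, `raw_ratio`), the pseudocount, and the handling of gaps are assumptions made for illustration only. The sketch also omits the null-model correction and the significance test, and it does not normalize by the number of synonymous and non-synonymous sites. It assumes codon-aligned, in-frame nucleotide sequences and the standard genetic code.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(seq):
    """Split an in-frame nucleotide sequence into codons (trailing bases dropped)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def count_changes(reference, sequences):
    """Count codon changes relative to a single reference sequence.

    Returns one (non_synonymous, synonymous) tuple per reference codon.
    Codons containing gaps or ambiguous bases are skipped (an assumption
    of this sketch, not a rule taken from the source).
    """
    ref_codons = split_codons(reference)
    counts = [[0, 0] for _ in ref_codons]
    for seq in sequences:
        for pos, (ref, obs) in enumerate(zip(ref_codons, split_codons(seq))):
            if ref == obs or ref not in CODON_TABLE or obs not in CODON_TABLE:
                continue
            if CODON_TABLE[ref] == CODON_TABLE[obs]:
                counts[pos][1] += 1  # same amino acid: synonymous
            else:
                counts[pos][0] += 1  # different amino acid: non-synonymous
    return [tuple(c) for c in counts]


def raw_ratio(non_syn, syn, pseudocount=0.5):
    """Uncorrected count ratio; the pseudocount (hypothetical) avoids division by zero."""
    return (non_syn + pseudocount) / (syn + pseudocount)


if __name__ == "__main__":
    reference = "ATGAAAGGG"  # toy example: Met-Lys-Gly
    sample = ["ATGAAGGGG", "ATGAGAGGG", "ATGAAAGGA", "ATGCAAGGG"]
    for pos, (n, s) in enumerate(count_changes(reference, sample)):
        print(pos, n, s, round(raw_ratio(n, s), 2))
```

Comparing every sequence to one reference, rather than to reconstructed ancestors on a tree, is what makes the counting fast; it is also where the star-like phylogeny assumption enters.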
In the present work we first compare results for selection pressure in the HIV-1 gp120 of the East Asian subtype B' from the MEME method (Murrell B, et al., 2012) with results from the much simpler approach by Chen L (2004). MEME uses a complex mixed-effect model that can account for variation of selection over time and between branches of a phylogenetic tree. If the results agree, this would justify using the simpler method, especially for larger data sets where the computational demands of the complex model may exceed the available resources.
The original method by Chen L (2004) not only estimated directional and stabilizing selection pressure in a codon-wise fashion, but also included a significance criterion for directional selection. Here we slightly extend the method to also allow significance assessment for stabilizing selection. The extended method is then applied to gp120 of subtype B and of the East Asian variant B', and the results are discussed in terms of the structure of gp120. Finally, we analyze codons that have increased values of ka/ks in B' compared to B. These codons show an interesting distribution in the structure of gp120, and we find a statistical association of glycosylation sites with these codons.
Differential selection in HIV-1 gp120 between subtype B and East Asian variant B'
- Received Date: 06 October 2013
- Accepted Date: 20 November 2013
- Published Date: 16 January 2014
Abstract: HIV-1 evolves rapidly and undergoes geographic differentiation as it spreads in diverse host populations around the world. For instance, distinct genomic backgrounds can be observed between the pandemic subtype B, prevalent in Europe and North America, and its offspring clade B' in East Asia. Here we ask whether this differentiation affects the selection pressure experienced by the virus. To answer this question we evaluate selection pressure on the HIV-1 envelope protein gp120 at the level of individual codons using a simple and fast estimation method based on the ratio ka/ks of amino acid changes to synonymous changes. To validate the approach we compare its results to those from a state-of-the-art mixed-effect method. The agreement is acceptable, but the analysis also demonstrates some limitations of the simpler approach. Further, we find similar distributions of codons under stabilizing and directional selection pressure in gp120 for subtypes B and B', with more directional selection pressure in the variable loops and more stabilizing selection in the constant regions. Focusing on codons with increased ka/ks values in B', we show that these codons are scattered over the whole of gp120, with remarkable clusters of higher density in regions flanking the variable loops. We identify a significant statistical association of glycosylation sites and codons with increased ka/ks values.