Differential selection in HIV-1 gp120 between subtype B and East Asian variant B'

  • Stefan Dang,

    Affiliation Research Group Bioinformatics, Center of Medical Biotechnology and Faculty of Biology, University of Duisburg-Essen, Essen 45117, Germany

  • Yan Wang,

    Affiliation AIDS and HIV Research Group, State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, China

  • Bettina Budeus,

    Affiliation Research Group Bioinformatics, Center of Medical Biotechnology and Faculty of Biology, University of Duisburg-Essen, Essen 45117, Germany

  • Jens Verheyen,

    Affiliation Institute of Virology, University of Duisburg-Essen, Essen 45117, Germany

  • Rongge Yang,

    Affiliation AIDS and HIV Research Group, State Key Laboratory of Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, China

  • Daniel Hoffmann


    Affiliation Research Group Bioinformatics, Center of Medical Biotechnology and Faculty of Biology, University of Duisburg-Essen, Essen 45117, Germany

Differential selection in HIV-1 gp120 between subtype B and East Asian variant B'

  • Stefan Dang, 
  • Yan Wang, 
  • Bettina Budeus, 
  • Jens Verheyen, 
  • Rongge Yang, 
  • Daniel Hoffmann


HIV-1 evolves strongly and undergoes geographic differentiation as it spreads in diverse host populations around the world. For instance, distinct genomic backgrounds can be observed between the pandemic subtype B, prevalent in Europe and North-America, and its offspring clade B' in East Asia. Here we ask whether this differentiation affects the selection pressure experienced by the virus. To answer this question we evaluate selection pressure on the HIV-1 envelope protein gp120 at the level of individual codons using a simple and fast estimation method based on the ratio ka/ks of amino acid changes to synonymous changes. To validate the approach we compare results to those from a state-of-the-art mixed-effect method. The agreement is acceptable, but the analysis also demonstrates some limitations of the simpler approach. Further, we find similar distributions of codons under stabilizing and directional selection pressure in gp120 for subtypes B and B' with more directional selection pressure in variable loops and more stabilizing selection in the constant regions. Focusing on codons with increased ka/ks values in B', we show that these codons are scattered over the whole of gp120, with remarkable clusters of higher density in regions flanking the variable loops. We identify a significant statistical association of glycosylation sites and codons with increased ka/ks values.

A defining feature of HIV-1 is its ability to adapt to the immune systems of its hosts by evolving new variant viruses. Some of the evolved variations became practically fixed in certain geographical regions and led to the emergence of characteristic regional variants, as e.g. in the case of the East Asian subtype B' (Graf M, et al., 1998; Li Z, et al., 2012a), likely an offspring of the pandemic subtype B. Variants such as B' constitute novel genomic backgrounds of the retrovirus on which natural selection and neutral evolution will then operate anew. The effect of the different backgrounds on the further course of retroviral evolution is unknown. It can be envisioned that for different genomic backgrounds the evolution of resistance to anti-retroviral drugs is different, or that for the same HLA type immune escape paths depend on genomic background. A case in point is the observation of novel mutation patterns associated with drug resistance in East Asian clade B' (Deng X, et al., 2008).

In the present study we test the hypothesis that selection pressure depends on the retroviral genomic background. Similar studies have demonstrated that this is the case for the env gene and the reverse transcriptase in various subtypes and host populations (Choisy M, et al., 2004; Travers S A, et al., 2005; Pond S L, et al., 2006). Here, we specifically analyze selection pressure on the viral envelope protein gp120 of pandemic subtype B and the related East Asian subtype B'. We have chosen gp120 as this protein is exposed to a particularly strong selection pressure by the host immune system, and this might lead also to strong differences in selection pressure between B and B'. Further, we analyzed B and B' as these clades can, on one hand, be distinguished clearly (Wang Y, et al., 2013b), while on the other hand they are quite closely related, so that differences could be traced back to specific genomic patterns.

One way to quantify selection pressure is to evaluate for the codons of a protein coding gene the ratio of non-synonymous nucleotide mutations (leading to a different amino acid) to synonymous mutations of codons (not leading to a different amino acid), often termed dn/ds or ka/ks (Nei M, et al., 1986; Li W H, 1993). We distinguish directional selection, pushing a population away from an established state, and stabilizing selection, tying a population to an established state. When no selection pressure is present, we observe neutral evolution (Kimura M, 1968). For directional selection, the ka/ks ratio should take a higher value (amino acid mutations favored), for stabilizing selection a lower value (synonymous mutations favored). Often the methods for estimating ka/ks were applied on a per gene basis, implying averaging over the codons of the studied gene. However, it is possible that within a gene there are positions experiencing directional selection and other positions under stabilizing selection, so that averaging over all codons of a gene may lead to canceling of contributions from different codons, and hence to an underestimation of selection pressure. Therefore, other methods have been developed that allow estimation of selection pressure on a per codon basis. Often these methods employ complex probabilistic models to explain the observed mutations in a gene on the background of a phylogenetic tree (Nielsen R, et al., 1998; Huelsenbeck J P, et al., 2006; Murrell B, et al., 2012). While these methods promise accurate results, they usually require relatively costly calculations. A fast and simple approximation was developed by Chen L (2004). This method counts synonymous and non-synonymous mutations with respect to a reference sequence, and corrects ka/ks ratios by a null model derived from the data. The use of a single reference corresponds to the assumption of a star-like phylogeny, as discussed by Chen L (2006).

In the present work we first compare results for selection pressure in the HIV-1 gp120 of the East Asian subtype B' from the MEME method (Murrell B, et al., 2012) with results from the much simpler approach by Chen L (2004). MEME uses a complex mixed-effect model that can account for variation of selection over time and between branches of a phylogenetic tree. If results agree, this would justify using the simpler method, especially for larger data sets where computational demands for the complex model may exceed the resources.

The original method by Chen L (2004) not only estimated directional and stabilizing selection pressure in a codon-wise fashion, but also included a significance criterion for directional selection. Here we slightly extend the method to also allow for significance assessment for stabilizing selection. The extended method is then applied to gp120 of subtype B and of the East Asian variant B', and results discussed in terms of the structure of gp120. Finally, we analyze codons that have a increased values of ka/ks in B' compared to B. These patterns show an interesting distribution in the structure of gp120, and we find a statistical association of glycosylation sites with these codons.


Sequence data

All gp120 sequences were retrieved from the HIV database (http://hiv.lanl.gov) at Los Alamos National Laboratory (LANL). We included only one sequence per patient and excluded sequences that were flagged as "problematic". A total of 2212 sequences of subtype B (including B') were retrieved; we call this sequence set Btotal. Following Wang Y (2013b), sequences with an L/W-pattern in V3 were assigned to clade B'. With this criterion, the Btotal set of 2212 sequence was split into 120 B' sequences and 2092 B sequences (excluding B'). The latter set of 2092 sequences is used as representative for subtype B if not explicitly stated otherwise, especially in our comparisons with B'.

For the computation of ka/ks values, we used two reference sequences, namely HXB2 (GenBank K03455) (Ratner L, et al., 1985), a commonly used reference sequence for subtype B, and RL42 (GenBank U71182.1) (Graf M, et al., 1998), a widely used reference sequence for the East Asian variant B'. Throughout the paper, all codon numbers refer to gp120 in HXB2.

For the comparison between the two methods for the identification of codons under selection, the smaller B' set of sequences was first translated into amino acids with transeq (Rice P, et al., 2000), the sequences aligned with MAFFT (Katoh K, et al., 2002), and reversely transcribed with revtrans (Wernersson R, et al., 2003). In this way we obtained a codon-wise alignment for 118 B' sequences (two of the 120 original sequences could not be processed).

Prior to the analysis of differential selection pressure between B and B', the Btotal set of sequences was aligned with MAFFT (Katoh K, et al., 2002) using default parameters.

Computation of selection pressure

The reference method MEME was accessed via the server offered by the MEME authors (http://www.datamonkey.org, Delport W (2010)). The codon-wise alignment of B' sequences described above was submitted to the MEME method using default codon substitution bias model and significance level (last access October 5, 2013).

We used the kaksCodon method described in Chen L (2004) and implemented in R-package CorMut (Li Z, et al., 2012b), with one extension. Chen L (2004) use a log-odds ratio (LOD) as indicator of whether a value ka/ks > 1 is significant evidence for directional selection:

$ LO{D_{dir}} = {\log _{10}}P\left( {i \ge {N_Y}\left| {N,q,{{\left( {\frac{{{k_a}}}{{{k_s}}}} \right)}_{corr}}} \right.} \right) = 1 $ (1)

$ = {\log _{10}}\sum\nolimits_{i = {N_Y}}^N {\left( {\begin{array}{*{20}{c}} N\\ i \end{array}} \right){q^i}{{\left( {1 - q} \right)}^{N - i}}} $ (2)

with P the probability of having at the respective codon the observed number NY or more mutations to amino acid residue Y, N the total number of observed mutations at this site, q the a priori probability of a mutation to Y for a given null model including average transition and transversion probabilities estimated from the data, and (ka/ks)corr the ka/ks ratio, corrected by division by the the value of ka/ks expected under the null model. The null model is detailed in Chen L (2004).

Computing the LODdir according to Eq (1) corresponds to application of a right-tailed binomial test, e.g. with LODdir = 3 corresponding to a p-value of 10-3. This makes only sense as a significance criterion for directional selection, where the number of observed mutations is greater than expected under the null model.

We extended this significance indicator by an analogous quantity LODstabil for stabilizing selection:

$ LO{D_{stabil}} = {\log _{10}}P\left( {i \le {N_Y}\left| {N,q,{{\left( {\frac{{{k_a}}}{{{k_s}}}} \right)}_{corr}}} \right.} \right) = 1 $ (3)

$ = {\log _{10}}\sum\nolimits_{i = 0}^{N_Y} {\left( {\begin{array}{*{20}{c}} N\\ i \end{array}} \right){q^i}{{\left( {1 - q} \right)}^{N - i}}} $ (4)

corresponding to the application of a left-tailed binomial test. In practice, we obtained a combined LOD for both stabilizing and directional selection as LOD=-log10p with the p-value resulting from a two-tailed binomial test for given N, NY, q.

In the following we drop the index corr of ka/ks, i.e. ka/ks is meant as symbol for a ratio of the naï ve ratio of ka and ks divided by the corresponding ratio under the null model described in Chen L (2004). This is also the output of function kaksCodon in R-package CorMut (Li Z, et al., 2012b), implementing the method proposed by Chen L (2004). We used version 1.2.0 of CorMut, downloaded from http://www.bioconductor.org.

In this paper, codons with ka/ks > 1 (or log (ka/ks) > 0) and LOD > 3 are considered to be under significant directional selection, codons with ka/ks < 1 (or log (ka/ks) < 0) and LOD > 3 are considered to be under significant stabilizing selection.

All statistical analyses were conducted with the R software (R Development Core Team, 2006), version 3.


Comparison of selection pressure estimates for gp120

We first compared the codon-wise selection pressures estimated by the MEME method (Murrell B, et al., 2012) with its more complex and powerful model, with results obtained by using the faster and simpler method of Chen L (2004) (in the following called "kaksCodon"), as implemented in the kaksCodon function of R-package CorMut (Li Z, et al., 2012b).

In the B' alignment covering 629 codons, we identified with kaksCodon 38 codons under directional selection with log-odds ratio LOD > 3, compared to 113 codons under significant directional selection p ≤ 0.05 found with MEME. 25 codons were identified by both methods. The difference between 113 codons from MEME and 38 from kaksCodon is likely due to the more complex statistical model of MEME, which is more sensitive than the simple counting method in kaksCodon. The latter lacks power when applied to the comparatively small number of B' sequences. But as long as we find a reasonable number of positions, this high false negative rate may be tolerated if we aim at finding out whether there is differential selection at all. It is unclear whether the 13 codons predicted by kaksCodon but not by MEME are false positives. Irrespective of this question, the predictions for directional selection by both methods are highly significantly associated (a Fisher exact test yields p = 1.6×10-11). Moreover, the power of the simple counting method should increase with the number of sequences, while the computational effort remains small. Thus we consider kaksCodon an acceptable method for the fast screening for positions under selection.

Selection pressure and structure of gp120

Apart from checking agreement of the kaksCodon method with a reference method such as MEME, it is also important to test the results of kaksCodon against independent information. For instance it is plausible that parts of gp120 with the greatest exposure to the immune system, e.g. to antibodies, will experience strong directional selection pressure, while parts of gp120 responsible for the interior architecture of the protein will be subject to stabilizing selection pressure. We expect to find this pattern in the data, irrespective of whether we evaluate sequences of subtype B or of East Asian variant B'. Indeed, Figure 1 and 2 show agreement between structural elements and type of selection pressure. The N-terminal signal peptide and the variable loops V1 to V5 are predominantly under directional selection, and it is also there where the maximum values of ka/ks are reached. Conversely, the constant regions C1 to C5 are predominantly under stabilizing selection, and it is there where the positions under strongest stabilizing selection lie. However, the picture is not black-and-white: e.g. some positions in V2 or V3 are under strong stabilizing selection, while in C3 there are relatively many positions under directional selection.

Fig 1. kaksCodon analysis along gp120 of subtype B. The plot gives the value of log (ka/ks) for all codons (black crosses). Codons under significant directional selection are additionally marked by a red circle, codons under significant stabilizing selection by a blue triangle. Vertical dashed lines indicate boundaries of gp120 structural elements.

Fig 2. kaksCodon analysis along gp120 of East Asian subtype B'. For explanations see caption of Figure 1.

This overvall picture is the same for the subtype B data (Figure 1) and for the B' data (Figure 2), although the smaller size of the B' data leads to less statistical power and thus predicts significant selection for fewer codons.

Do gp120 B and B' experience overall different selection?

While Figure 1 and 2 show that the selection pressure is distributed similarly over the substructures in B and B', there may still be a trend towards overall higher or lower selection pressure in B'. For instance, if B' is not yet well adapted to its host population, there could be overall a higher selection pressure in B', compared to, say, B in Europe or North-America.

If we have a global deviation between selection pressure on corresponding codons i in B and in B', this should show up in a plot of (ka/ks)B, i against (ka/ks)B′, i, or the logarithms thereof. Figure 3 indicates that there is no such systematic global deviation, as explained in the following. All points in the Figure are roughly normally distributed around the bisecting line (slope 1), which means that there is no strong global difference between ka/ks in B and B'. However, when we fit a linear model to the points we find a small but significant deviation from slope 1. For instance, if we use subtype B reference sequence HXB2 (GenBank K03455) as reference in kaksCodon (black circles), the least-square fit linear model (dashed black line) has a slope of 0.87±0.05 (± standard error). A na ve interpretation of this result is that the selection pressure in B' is more extreme, leading to a larger variation of the ka/ks-values in B', and thus to a slope < 1 in Figure 3. But we have to keep in mind that kaksCodon uses a simplified estimation of ka/ks that assumes a star-like phylogeny rooted in a reference sequence. Accordingly, if we use the usual B' reference sequence RL42 as reference in ka/ks Codon, we have less variation along the (ka/ks)B′ axis, and more along the (ka/ks)B axis, with a slope of the linear model of 1.11±0.06. Hence, we can explain global deviations between selection pressures in B and B' obtained with kaksCodon as an artificial "reference sequence bias" due to the use of reference sequences in the computation of ka/ks. Thus we could not find evidence for globally different selection pressure between B and B'.

Fig 3. Relation between overall selection pressures on B and B'. For all codons common to the two reference sequences HXB2 and RL42 the log (ka/ks) values for these codons in B' (x-axis) and B (y-axis) are plotted. Black symbols: log (ka/ks) values with reference sequence HXB2 in kaksCodon method; red symbols: log (ka/ks) values for reference sequence RL42. The black dashed and red dotted lines are the least-square fits to the red and black symbols, respectively. The solid black line corresponds to log (ka/ks)B = log (ka/ks)B′.

Codons under increased directional selection in East Asian B'

The lack of evidence for global differences of selection pressure between B and B' does not preclude differences between the two clades at specific codons. We therefore proceeded to a codon-wise analysis to identify codons with differences in selection pressure between B and B'.

Our initial hypothesis was that the encounter of HIV-1 with a new host population that has a different composition of HLA-types etc. may exert differential selection pressure at specific sites of gp120. We therefore focused in our analysis on a subset of codons i satisfying the following three conditions: (1) Positions i had to have non-gap counterparts in both reference sequences HXB2 and RL42; this allowed us to check for reference sequence bias. (2) ka/ks for the respective codon i for the B' sequences had to be higher than the estimate for the corresponding codon i in the B sequences:

$ {\Delta _i} = {\left( {\frac{{{k_a}}}{{{k_s}}}} \right)_{B',i}} - {\left( {\frac{{{k_a}}}{{{k_s}}}} \right)_{B,i}} > 0 $ (5)

We required Eq (5) to hold irrespective of whether reference sequence HXB2 or RL42 was used in the kaksCodon calculation. (3) Only positions were considered with LOD > 3 for the subtype B ensemble of sequences; in this way we made sure that the ka/ks-shift Δi was operating on a position under significant selection pressure.

Figure 4 shows the distribution of the 83 codons i fulfilling all three conditions (see Appendix for complete list). At each position, two values of Δi are given computed with kaksCodon, one for reference sequence HXB2 (crosses), another for reference sequence RL42 (circles). The distance between cross and circle is a measure of the precision of Δi. In most cases, this distance is small, i.e. the precision is good. Interestingly, most of the variable loops do not carry many positions with Δi > 0, with the exception of V2. However, high-Δi positions seem to cluster at the boundaries of most variable loops, and in some C-regions between variable loops. This observation and Figure 1 and 2 are in agreement with the layered structural model of gp120 proposed by Pancera M (2010). In this model, gp120 is organized in several layers, from a conserved layer around the core interface to the gp41 fusion machinery, over an adapter layer, to a highly variable outer layer facing the host immune system. According to Figure 1 and 2, codons under directional selection pressure cluster most strongly in the variable loops. The changes in these regions may be buffered by the adapter layer, probably including the regions flanking the variable loops. This buffering could necessitate a softening of this adapter layer, leading to a decreased stabilizing selection that may even turn into a directional selection.

Fig 4. Distribution of codons i with Δi > 0 along gp120 sequence. At each codon i, cross and circle gives Δi obtained from kaksCodon with reference sequences HXB2 and RL42, respectively. Blue vertical lines indicate boundaries of variable loops V1 to V5.

Another remarkable feature of Figure 4 is that Δi is small for many of these codons. This is consistent with the observation that many of these codons lie in C-regions where stabilizing selection dominates (ka/ks < 1), see also Figure 1. Since our codon subset contains only positions with Δi > 0 this means that for positions under stabilizing selection in subtype B, this stabilizing selection could be weakened in B', so that ka/ks is shifted towards the neutral value of 1. We discuss this point in the following.

For a codon i fulfilling Eq (5) there are in general three possibilities: (a) We can have stabilizing selection in B and a weaker stabilizing selection in B' if (ka/ks)B′, I = (ka/ks)B′ii < 1; (b) Δi can shift selection from stabilizing to directional; (c) Δi can turn directional selection in B into a stronger directional selection in B'. Figure 5 shows which of these possibilities are realized amongst the 83 positions. The Figure confirms that most of these codons are under stabilizing selection in subtype B (log (ka/ks)B′i < 0), and that therefore most codons fall into cases (a) or (b), i.e. the stabilizing selection is weakened or turned into directional selection. However, one has to be cautious with the interpretation of the differences, since for the smaller B' set the power of kaksCodon is much lower than for the larger B set, and hence for some of the codons Δi > 0 may due to this smaller power. Still, some of the codons have log (ka/ks)B′I > 0 and thus fall into category (c) (upper right corner in Figure 5).

Fig 5. Selection pressure (ka/ks)B, i in subtype B vs. change Δi of selection pressure at the same 83 codons i in East Asian B', both quantities scaled logarithmically

A detailed, codon-by-codon discussion of these 83 codons lies beyond the scope of this work. However, a preliminary analysis gave first insights into possible functions of these codons. When comparing the 83 posi-tions with a high Δi with functionally annotated positions of the HXB2 reference sequence as available from the HIV database at LANL (http://hiv.lanl.gov/), we discov-ered that a conspicuously high number of the codons were situated close to one of the 23 annotated glycosylation sites in gp120.9 of these sites were a maximum of one codon away from one of the 83 positions. A Fisher exact test yields a p-value of 0.006 for an association between glycosylation and membership in the set of the 83 codons with Δi > 0, i.e. glycosylation sites are significantly over-represented in this set. This could mean that tuning of glycosylation sites could be an adaptation mechanism in the transition from subtype B to its East Asian variant B'.


The importance of glycosylation is underlined by a very recent systematic study by Wang W (2013a), showing firstly, a significant effect of specific Env glycosylations on infectivity, and secondly, that this effect depends strongly on HIV-1 subtype. Our findings are also consistent with an earlier study (Choisy M, et al., 2004) that had found with a maximum-likelihood method a strong association of N-glycosylation sites with sites under directional selection in env in various subtypes of HIV-1 and HIV-2. However, at the moment we cannot exclude that there are confounding factors, such as surface exposure. Glycosylation sites are naturally exposed to the solvent, and adapter function may also be easier to accommodate at the protein surface. We hope that the ongoing efforts to determine the structure and thus surface exposure of the functional Env spike (Liu J, et al., 2008; Mao Y, et al., 2012; Tran E E, et al., 2012) will hopefully allow to decide this question. Moreover, we expect that in the future the increasing availability of sequences will allow for higher statistical power in the analysis of selection pressure, and also for more detailed analyses of the evolutionary dynamics of HIV-1 and its changing selection patterns over time.


The authors gratefully acknowledge funding by Deutsche Forschungsgemeinschaft (http://www.dfg.de), grant TRR60/A6; the University of Duisburg-Essen (http://www.uni-due.de); and the Chinese Key National Science and Technology Program in the 12th Five-Year Period, grant 2012ZX10001006-002.


Designed research: DH; contributed data: YW, RY; performed research: SD, DH; interpreted data: SD, YW, BB, JV, RY, DH; wrote the paper: DH.


This is the list of the 83 codons fulfilling the criteria 1-3 described in the text (numbering refers to gp120 in reference genome HXB2 as defined in the HIV database at LANL, http://hiv.lanl.gov):

25, 36, 37, 38, 43, 47, 48, 51, 59, 62, 65, 68, 70, 72, 76, 78, 79, 82 83, 91, 93, 94, 102, 109, 110, 111, 115, 116, 117, 120, 121, 122, 125, 130, 155, 156 159, 160, 168, 171, 174, 176, 178, 183, 193, 197, 198, 200, 202, 208, 210, 211, 213, 216 221, 224, 225, 239, 242, 250, 255, 258, 268, 269, 278, 279, 281, 284, 286, 288, 292, 294 296, 298, 299, 300, 301, 302, 307, 321, 325, 327, 330, 332, 334, 339, 340, 342, 344, 349 369, 372, 375, 376, 377, 379, 380, 381, 385, 386, 389, 419, 421, 428, 432, 442, 444, 448 450, 453, 456, 466, 469, 470, 471, 478, 482, 484, 485, 492, 499, 504, 508, 509, 511


  1. . Chen L, Lee C. 2006. Distinguishing hiv-1 drug resistance, accessory, and viral fitness mutations using conditional selection pressure analysis of treated versus untreated patient samples. Biol Direct, 1: 14.
  2. . Chen L, Perlina A, Lee C J. 2004. Positive selection detection in 40, 000 human immunodeficiency virus (hiv) type 1 sequences automatically identifies drug resistance and positive fitness mutations in hiv protease and reverse transcriptase. J Virol, 78(7): 3722-3732.
  3. . Choisy M, Woelk C H, Guéan J F, Robertson D L. 2004. Comparative study of adaptive molecular evolution in different human immunodeficiency virus groups and subtypes. J Virol, 78(4): 1962-1970.
  4. . Delport W, Poon A F, Frost S D, KosakovskyPond S L. 2010. Datamonkey 2010:a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics, 26(19): 2455-2457.
  5. . Deng X, Liu H, Shao Y, Rayner S, Yang R. 2008. The epidemic origin and molecular properties of b' a founder strain of the hiv-1 transmission in asia. AIDS, 22(14): 1851-1858.
  6. . Graf M, Shao Y, Zhao Q, Seidl T, K?stler J, Wolf H, Wagner R. 1998. Cloning and characterization of a virtually full-length hiv type 1 genome from a subtype b'-thai strain representing the most prevalent b-clade isolate in china. AIDS Res Hum Retroviruses, 14(3): 285-288.
  7. . Huelsenbeck J P, Jain S, Frost S W, Pond S L. 2006. A dirichlet process model for detecting positive selection in protein-coding dna sequences. Proc Natl Acad Sci U S A, 103(16): 6263-6268.
  8. . Katoh K, Misawa K, Kuma K, Miyata T. 2002. Mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Res, 30(14): 3059-3066.
  9. . Kimura M. 1968. Evolutionary rate at the molecular level. Nature, 217(5129): 624-626.
  10. . Li W H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol, 36(1): 96-99.
  11. . Li Z, He X, Li F, Yang Y, Wang Q, Wang Z, Xing H, Takebe Y, Shao Y. 2012a. Tracing the origin and history of hiv-1 subtype b' epidemic in china by near full-length genome analyses. AIDS, 26(7): 877-884.
  12. . Li Z, Huang Y, Ouyang Y, Shao Y, and Ma L. 2012b. CorMut: Detect the correlated mutations based on selection pressure. R package version 1. 2. 0.
  13. . Liu J, Bartesaghi A, Borgnia M J, Sapiro G, Subramaniam S. 2008. Molecular architecture of native hiv-1 gp120 trimers. Nature, 455(7209): 109-113.
  14. . Mao Y, Wang L, Gu C, Herschhorn A, Xiang S H, Haim H, Yang X, Sodroski J. 2012. Subunit organization of the membrane-bound hiv-1 envelope glycoprotein trimer. Nat Struct Mol Biol, 19(9): 893-899.
  15. . Murrell B, Wertheim J O, Moola S, Weighill T, Scheffler K, Kosakovsky Pond S L. 2012. Detecting individual sites subject to episodic diversifying selection. PLoS Genet, 8(7): e1002764.
  16. . Nei M, Gojobori T. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol, 3(5): 418-426.
  17. . Nielsen R, Yang Z. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the hiv-1 envelope gene. Genetics, 148(3): 929-936.
  18. . Pancera M, Majeed S, Ban Y E, Chen L, Huang C C, Kong L, Kwon Y D, Stuckey J, Zhou T, Robinson J E, Schief W R, Sodroski J, Wyatt R, Kwong P D. 2010. Structure of hiv-1 gp120 with gp41-interactive region reveals layered envelope architecture and basis of conformational mobility. Proc Natl Acad Sci U S A, 107: 1166-1171.
  19. . Pond S L, Frost S D, Grossman Z, Gravenor M B, Richman D D, Brown A J. 2006. Adaptation to different human populations by hiv-1 revealed by codon-based analyses. PLoS Comput Biol, 2(6): e62.
  20. . R Development Core Team. 2006. R: A Language and Environment for Statistical Computing. R version 3. 0. 0; http://www.R-project.org.
  21. . Ratner L, Haseltine W, Patarca R, Livak K J, Starcich B, Josephs S F, Doran E R, Rafalski J A, Whitehorn E A, Baumeister K. 1985. Complete nucleotide sequence of the aids virus, htlv-iii. Nature, 313(6000): 277-284.
  22. . Rice P, Longden I, Bleasby A. 2000. Emboss: the european molecular biology open software suite. Trends Genet, 16(6): 276-277.
  23. . Tran E E, Borgnia M J, Kuybeda O, Schauder D M, Bartesaghi A, Frank G A, Sapiro G, Milne J L, Subramaniam S. 2012. Structural mechanism of trimeric hiv-1 envelope glycoprotein activation. PLoS Pathog, 8(7): e1002797.
  24. . Travers S A, O'Connell M J, McCormack G P, McInerney J O. 2005. Evidence for heterogeneous selective pressures in the evolution of the env gene in different human immunodeficiency virus type 1 subtypes. J Virol, 79(3): 1836-1841.
  25. . Wang W, Nie J, Prochnow C, Truong C, Jia Z, Wang S, Chen X S, Wang Y. 2013a. A systematic study of the n-glycosylation sites of hiv-1 envelope protein on infectivity and antibody-mediated neutralization. Retrovirology, 10: 14.
  26. . Wang Y, Rawi R, Wilms C, Heider D, Yang R, Hoffmann D. 2013b. A small set of succinct signature patterns distinguishes chinese and non-chinese hiv-1 genomes. PLoS One, 8(3): e58804.
  27. . Wernersson R, Pedersen A G. 2003. Revtrans: Multiple alignment of coding dna from aligned amino acid sequences. Nucleic Acids Res, 31(13): 3537-3539.