-
All Nairovirus sequences were downloaded from GenBank. Sixty-two full-length S segment sequences were selected and aligned using ClustalX v2.0 [32], and gaps were removed to give a final alignment of 1,461 nt. Trees were estimated with the MEGA5 software package [30] using the Neighbor-Joining method, with distances computed by the Maximum Composite Likelihood method (Tamura-Nei model) and uniform rates among sites. Sequence accession numbers and background information are listed in Table 1.
Table 1. Background information of sequences used in phylogenetic analysis and mutation analysis.
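The gap-removal step can be illustrated with a minimal sketch. It assumes that "gaps were removed" means complete deletion, i.e. dropping every alignment column in which at least one sequence carries a gap; the text does not state this explicitly. The function name `strip_gap_columns` and the toy sequences are hypothetical.

```python
def strip_gap_columns(seqs, gap="-"):
    """Drop every alignment column that contains a gap in any sequence.

    Assumes all sequences are aligned, i.e. have the same length.
    """
    keep = [i for i, col in enumerate(zip(*seqs)) if gap not in col]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment (hypothetical, not taken from the CCHFV data set).
aligned = ["ATG-CA", "A-GTCA", "ATGTCA"]
ungapped = strip_gap_columns(aligned)
print(ungapped)
```

Under this assumption every sequence keeps the same length after filtering, so site numbering stays consistent across the whole alignment.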
To identify sites containing mutations that reflected amino acid changes with significantly different properties, eight entries were selected from the Amino Acid Index (AAI) Database [17] that reflected changes in charge, volume, pKa and hydrophobicity. The selected entries were FAUJ880112 (negative charge) [11], FAUJ880113 (positive charge) [11], FAUJ880114 (pK-a value) [11], GOLD730102 (residue volume) [14], TSAJ990101 (packing density) [34], KRIW790103 (side chain volume) [18], EISD840101 (consensus normalized hydrophobicity scale) [8] and ROSM880105 (hydropathies of amino acid side chains) [27]. The alignment was translated to amino acids and analyzed with each AAI entry in turn. Each sequence was compared to the consensus for the entire set of CCHFV sequences. For the charge entries, any mutation that changed the charge state of a site (neutral, positive or negative) was considered significant. For the other entries, the change in a parameter brought about by a mutation at a site was considered significant if
where $\Delta_{ab} I$ is the change in amino acid index $I$ when amino acid $a$ mutates to amino acid $b$.
The alignment was analyzed using a custom Java program available from the authors on request.
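The charge comparison against the consensus described above can be sketched as follows. This is an illustration only and not the authors' Java program. The `CHARGE` table is an assumed simplification (D and E negative, K and R positive, everything else neutral) rather than values copied from the AAI entries, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

# Assumed side-chain charges; residues not listed are treated as neutral.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def consensus(seqs):
    """Most common residue in each column of an amino acid alignment."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def charge_changes(seqs):
    """List (sequence index, mutation label, charge change) versus the consensus."""
    cons = consensus(seqs)
    hits = []
    for idx, seq in enumerate(seqs):
        for pos, (a, b) in enumerate(zip(cons, seq), start=1):
            delta = CHARGE.get(b, 0) - CHARGE.get(a, 0)
            if delta != 0:
                hits.append((idx, f"{a}{pos}{b}", delta))
    return hits


# Toy protein alignment (hypothetical).
toy = ["AKDN", "AKDN", "ANDN", "AKND"]
hits = charge_changes(toy)
print(consensus(toy), hits)
```

With this sign convention, losing a negative residue (for example D to N) gives a change of +1 and losing a positive residue (for example K to N) gives a change of -1.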
-
The predicted tree for the S segment is shown in Fig. 1. The tree exhibits clear geographic subdivision, identifying seven clades that were named Asia 1, Asia 2, Europe 1, Europe 2, Africa 1, Africa 2 and Africa 3, and which are consistent with results from previous studies [2, 4, 5, 13, 15, 16, 22, 28, 29].
-
We next used entries from the Amino Acid Index (AAI) database [17] to analyze the alignment and investigate whether specific mutations were more probable than others, or whether there were regions where mutations were more likely to occur. We investigated mutations that produced changes in charge, hydrophobicity and volume and mapped these mutations to the Asia 1, Asia 2, Europe 1, Europe 2, Africa 1, Africa 2 and Africa 3 clades identified in the previous section. The mutations are listed in Tables 2, 3 and 4. The results for residue volume and packing density were identical, so only residue volume is shown.
Table 2. Amino acid mutations leading to charge change in the alignment as classified by clades defined in Fig. 1.
Table 3. Amino acid mutations leading to significant changes in pKa and hydropathy values in the alignment as classified by clades defined in Fig. 1. The shaded mutations correspond to mutations that occurred outside the main European (Europe 1) and African (Africa 3) clades.
Table 4. Amino acid mutations leading to significant changes in residue volume in the alignment as classified by clades defined in Fig. 1. The shaded mutations correspond to mutations that occurred outside the main European (Europe 1) and African (Africa 3) clades.
The most notable result is that in every category the Asia 2 clade appears to contain many more mutations than any of the other clades. Although this clade contains twice as many sequences as the other clades, this does not appear to account for many of the observed differences. For negative charge mutations the numbers of changes were: Africa 3, 9 sequences / 3 mutations; Asia 1, 8 sequences / 1 mutation; Asia 2, 20 sequences / 19 mutations; Europe 1, 10 sequences / 1 mutation. Similarly, for the pKa index: Africa 3, 9 sequences / 2 mutations; Asia 1, 8 sequences / 2 mutations; Asia 2, 20 sequences / 22 mutations; Europe 1, 10 sequences / 11 mutations. For the hydrophobicity index: Africa 3, 9 sequences / 3 mutations; Asia 1, 8 sequences / 1 mutation; Asia 2, 20 sequences / 32 mutations; Europe 1, 10 sequences / 4 mutations. For the volume index: Africa 3, 9 sequences / 4 mutations; Asia 1, 8 sequences / 2 mutations; Asia 2, 20 sequences / 13 mutations; Europe 1, 10 sequences / 4 mutations. Even when the Europe 1 and Europe 2 clades and the Africa 1, Africa 2 and Africa 3 clades were consolidated into single European and African clades respectively, they still contained fewer mutations despite their greater genetic diversity (these additional mutations are highlighted in grey in Tables 2, 3 and 4).
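One way to check whether clade size alone explains these differences is to normalize the counts to mutations per sequence. The sketch below does this with the figures quoted in the paragraph above; the variable and function names are hypothetical and the calculation is not part of the original analysis.

```python
# Sequences per clade and mutation counts per index, as quoted in the text.
SEQS = {"Africa 3": 9, "Asia 1": 8, "Asia 2": 20, "Europe 1": 10}
MUTS = {
    "negative charge": {"Africa 3": 3, "Asia 1": 1, "Asia 2": 19, "Europe 1": 1},
    "pKa":             {"Africa 3": 2, "Asia 1": 2, "Asia 2": 22, "Europe 1": 11},
    "hydrophobicity":  {"Africa 3": 3, "Asia 1": 1, "Asia 2": 32, "Europe 1": 4},
    "volume":          {"Africa 3": 4, "Asia 1": 2, "Asia 2": 13, "Europe 1": 4},
}


def per_sequence_rates(muts, seqs):
    """Mutations per sequence for every (index, clade) combination."""
    return {
        index: {clade: n / seqs[clade] for clade, n in counts.items()}
        for index, counts in muts.items()
    }


RATES = per_sequence_rates(MUTS, SEQS)
for index, by_clade in RATES.items():
    row = ", ".join(f"{clade}: {rate:.2f}" for clade, rate in by_clade.items())
    print(f"{index:16s} {row}")
```

A simple ratio like this ignores shared ancestry between sequences, so it is only a rough guide to whether one clade is enriched for a given type of change.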
There is no solved structure for the N protein, so it is difficult to determine the significance of these changes, particularly for parameters such as side chain volume. To identify mutations of possible interest, we next mapped all these changes onto a graphical representation of the alignment. These are shown in Fig. 2 (charge change) and Fig. 3 (pKa, hydropathy and volume changes). Again, the greater number of mutations in the Asia 2 clade is clear, but additional features are also apparent. First, there is a pair of charge mutations that appear together in several of the Asia 2 sequences (D127N and N266D). Since the first mutation produces a charge change of +1 and the second produces a change of -1, these two sites may be compensatory. Second, the Africa 1 sequences contain two adjacent mutations, K262N (charge change -1) and E263G (charge change +1). A similar pair of mutations is also present in the Europe 2 sequence, K262N (charge change -1) and D263G (charge change +1), suggesting that these sites also play an important functional or structural role. The schematic for pKa, hydropathy and volume changes is more difficult to interpret because these types of changes can occur without producing the same impact as charge changes, but it is clear that the Asia 2 clade once again contains more mutations than the other clades. Another interesting feature occurs around AA263, where the European and African clades contain a number of mutations that produce significant changes in all three indices.
Figure 2. Graphical representation of positive (green) and negative (red) changes in charge from the consensus sequence. A disproportionate number of mutations that produce a negative change in charge occur in the Asia 2 clade, with multiple sequences containing the change at two sites (D127N and N266D), suggesting that these may represent compensatory mutations.
Figure 3. Graphical representation of mutations that produced significant changes in pKa (orange), hydropathy (blue) and side chain volume (red boxes) from the consensus sequence. Sites which contain mutations that change both pKa and hydropathy are shown as hatched yellow/blue, with a red border if there was also a significant change in volume. Consistent with Fig. 2, a disproportionate number of mutations occur in the Asia 2 clade, with multiple changes also mapped to the two sites 127 and 266.
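The candidate compensatory pairs described above (for example D127N with N266D, or K262N with E263G) can be picked out by looking for two charge-changing mutations in the same sequence whose changes cancel. The following is a minimal sketch of that idea, assuming a list of (sequence, mutation, charge change) records as input. The sequence labels `seqA` to `seqC` and the function name are hypothetical, and cancelling charges alone do not demonstrate a compensatory relationship.

```python
from collections import defaultdict
from itertools import combinations


def compensatory_pairs(hits):
    """Return (sequence, mutation 1, mutation 2) where the charge changes cancel.

    hits: iterable of (sequence_id, mutation_label, charge_delta).
    """
    by_seq = defaultdict(list)
    for seq_id, label, delta in hits:
        by_seq[seq_id].append((label, delta))
    pairs = []
    for seq_id, muts in by_seq.items():
        for (l1, d1), (l2, d2) in combinations(muts, 2):
            if d1 != 0 and d1 + d2 == 0:
                pairs.append((seq_id, l1, l2))
    return pairs


# Mutations are those named in the text; the sequence labels are placeholders.
example_hits = [
    ("seqA", "D127N", 1), ("seqA", "N266D", -1),
    ("seqB", "K262N", -1), ("seqB", "E263G", 1),
    ("seqC", "D127N", 1),
]
pairs = compensatory_pairs(example_hits)
print(pairs)
```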
Finally, we mapped all the identified changes onto the predicted tree. These are shown in Fig. 4 (charge change) and Fig. 5 (pKa, hydropathy and volume changes). The plots identify site mutations that occur in multiple sequences (vertical lines) and sites that contain mutations that have significant changes in multiple indices (horizontal lines). Sites which occur in multiple sequences and have changes in multiple indices are marked with both horizontal and vertical lines. In Fig. 4, only the Asia 2 clade contains multiple sequences with shared mutations. The compensatory positive/negative mutations in the Senegal sequences (Africa 1: DQ211639 and DQ211640) are also apparent. In Fig. 5, the Uganda (DQ076413) and Congo (DQ211650) sequences in the Africa 2 clade have two mutations at two different sites that modify all three indices. Other mutations in this Africa 2 clade also modify the same sites in the Africa 1 sequences (Senegal DQ211639 and Senegal DQ211640). The box in the bottom right of the figure that spans the Africa 1, Africa 2 and Europe 2 clades delimits the cluster of mutations that are apparent in Fig. 3 around site AA262, which are shared across the clades and which modify three indices.
Figure 4. Positive and negative charge mutations in Fig. 2 mapped onto the predicted tree in Fig. 1. Mutations that occur at the same site in multiple sequences are connected by a vertical line. Sequences that have a mutation that changes both indices, i.e. a mutation that changes the site from positive to negative (or vice versa), are connected by a horizontal line (e.g. sequence GQ337053 contains three mutations at three different sites that produce positive to negative changes in charge). Only the Asia 2 clade contains multiple sequences with shared mutations. The Senegal sequences (Africa 1: DQ211639 and DQ211640) share compensatory positive/negative mutations.
Figure 5. Significant pKa (yellow), hydropathy (blue) and volume (red) mutations mapped onto the predicted tree in Fig. 1. Mutations that occur at the same site in multiple sequences are connected by a vertical line. Sequences that have a mutation that changes more than one index at the same site are connected by a horizontal line. For example, the Uganda (DQ076413) and Congo (DQ211650) sequences in the Africa 2 clade have two mutations at two different sites (AA 181 and AA 311) that modify all three indices. Other mutations in this Africa 2 clade also modify the same sites in the Africa 1 sequences (Senegal DQ211639 and Senegal DQ211640). The box in the bottom right of the figure that spans the Africa 1, Africa 2 and Europe 2 clades delimits a set of mutations and sites that are shared across the clades and which modify three indices.