Bayesian phylogenetic trees were constructed from the SP309 and NSP309 datasets separately (data not shown). The two trees had identical topologies, which were similar to that of the tree built from full-length open reading frame (ORF) sequences as described (Volk et al., 2010). We therefore used the tree based on the SP309 dataset for the subsequent analyses (Figure 1). As shown in Figure 1, the epidemic CHIKV strains shared an ancestor about 1600 years ago and can now be classified into three major lineages, West Africa (WA), East Central South Africa (ECSA), and Asia, which are closely associated with their geographic distributions and epidemic times. The WA lineage included all viruses from West Africa detected before 1993. Based on the Bayesian tree, the time to the most recent common ancestor (tMRCA) of the WA lineage was dated to 1953, when the first CHIKV strain was isolated in Tanzania (Lumsden, 1955). The ECSA and Asia lineages evolved separately from WA. They shared a common ancestor until 1874 and then diverged from each other. The history of epidemics caused by strains of the Asia lineage could be traced back to 1952, earlier than the Thailand outbreak in 1958, when the first CHIKV epidemic in Asia was confirmed (Volk et al., 2010). The Asia lineage was then divided into two sub-lineages with the tMRCA dated to 1996. The Asia.old sub-lineage mainly contained strains identified in Southeast Asia during 1958–1996, and the Asia.reemerge sub-lineage comprised the strains responsible for epidemics in the Americas and the Caribbean islands during 2014–2015, indicating that CHIKV was introduced into North/South America and the Caribbean islands from Southeast Asia. According to the spatial and temporal distribution of CHIKV strains, the ECSA lineage diverged into two sub-lineages, ECSA.Africa (old genotype) and ECSA.IOL (new genotype).
ECSA.Africa comprised strains from epidemics in Eastern/Central/Southern Africa, mainly from 1953 to 1984, together with four isolates identified in Southern Africa from 2014 to 2015 (Figure 1), indicating that the old genotype still existed in this area after virus activity had not been detected for more than twenty years. The ECSA.IOL sub-lineage mainly consisted of viruses responsible for the outbreaks on the Indian Ocean islands and in the Indian subcontinent from 2005 to 2009, and for outbreaks in Southeast Asia from 2007 to 2013. In addition, we noted a few strains whose lineage assignments were inconsistent with their spatial and temporal distributions. Isolates from Kenya occupied a basal position in the ECSA.IOL sub-lineage (Figure 1), which supports the speculation that the 2004 epidemic in Kenya promoted the evolution of IOL strains and their subsequent migration to the Indian Ocean islands and the Indian subcontinent (Kariuki Njenga et al., 2008). One strain isolated in India in 1986 fell within the ECSA.Africa sub-lineage, suggesting that CHIKV had spread from Africa to India nearly 20 years before the 2004 epidemic (Figure 1). These discrepancies therefore suggest migrations and genetic exchanges among different lineages.
Figure 1. Evolution of 309 CHIKV strains based on sequences coding for structural proteins. The MCC Bayesian tree was built under a strict clock model with a coalescent constant-population tree prior. The three major lineages are highlighted with different branch colors, and the sub-lineages are distinguished by translucent colored rectangles. The estimated times of the most recent common ancestors are labeled beside the nodes. Strains marked with red solid dots are CHIKV isolates from Pakistan. The strain marked with a green solid dot represents the presumed origin of the IOL sub-lineage. The four strains marked with black solid dots are recent CHIKV isolates (2014 or 2015) belonging to the ECSA.Africa sub-lineage. The strain marked with a blue solid dot was isolated from the Indian Ocean islands but belongs to the ECSA.Africa sub-lineage.
This is the first report of CHIKV isolates from Pakistan. Pairwise comparison of the nucleotide sequences showed that the Pakistani isolates were highly similar to each other (99%), except that strains Pakistan-03 and Pakistan-04 shared 98% similarity. The eight Pakistani strains were also included in the phylogenetic tree, where they clustered together within the ECSA.IOL sub-lineage. Phylogenetic analysis further showed that the Pakistani strains were most closely related to strains identified in India in 2016 (Figure 1).
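The pairwise comparison described above can be illustrated with a minimal sketch. It assumes the sequences have already been aligned to equal length and that identity is counted over columns where neither sequence has a gap; the text does not state which software or gap convention was actually used, and the function names and toy sequences below are hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned columns with identical bases, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


def identity_matrix(aligned: dict) -> dict:
    """Identity for every unordered pair of sequence names (names sorted)."""
    return {
        (x, y): pairwise_identity(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }


# Hypothetical toy alignment, not real CHIKV sequence data.
toy_alignment = {
    "strain_A": "ATGGCTGCGT",
    "strain_B": "ATGGCTGCGT",
    "strain_C": "ATGGCTGAGT",
}
toy_identities = identity_matrix(toy_alignment)
```

Multiplying each value by 100 gives percentages comparable to the similarities quoted in the text.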
The CHIKV genome contains two ORFs, one encoding a non-structural polyprotein precursor and the other encoding a structural polyprotein precursor. The non-structural precursor is cleaved into four NSPs (nsP1, nsP2, nsP3 and nsP4), which play roles in viral RNA replication and translation. The structural precursor is cleaved into five SPs, the core protein (C), the 6K/transframe protein (6K), and the envelope proteins (E1, E2, and E3), and has been suggested to be related to the host specificity of CHIKV. We investigated lineage-specific amino acid variations in both structural and non-structural proteins based on the 309 CHIKV sequences (Figure 2, Table 1). In total, amino acid variations were found at 32 sites in the Asia.reemerge sub-lineage (13 in NSPs and 19 in SPs), 22 sites in the Asia.old sub-lineage (9 in NSPs and 13 in SPs), and 14 sites in the ECSA.IOL sub-lineage (8 in NSPs and 6 in SPs). Variations were found at far fewer sites in ECSA.Africa (3 in NSPs and 2 in SPs) and in the WA lineage (2 in NSPs and 1 in SPs). We further calculated the evolutionary rate of each lineage from the SP309 dataset according to the phylogenetic analyses (Table 2). The ECSA.IOL sub-lineage had the highest substitution rate. Overall, the ECSA.IOL, Asia.reemerge and Asia.old lineages had higher evolutionary rates than the ECSA.Africa and WA lineages, which might explain why these faster-evolving lineages showed variations at more sites. Across all lineages, the E2 protein showed variations at 11 sites and was the most variable viral protein, whereas NS3 was the most stable, with only a 4 aa deletion in the Asia.reemerge sub-lineage. The eight strains from Pakistan showed specific variations at a few sites compared with other IOL strains. Most IOL strains carried L539 in NS2, R82 in NS4, E211 and A226 in E1, and K252 in E2. In contrast, all eight Pakistani strains carried S539 in NS2, S82 in NS4, K211 and V226 in E1, and Q252 in E2. These differences in the Pakistani strains might indicate further evolution of the strains during the epidemics. To better understand the evolutionary influences on CHIKV genotypes, we then characterized in detail the variations in viral proteins of the two newly derived sub-lineages.
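One way to screen an alignment for candidate lineage-specific sites is sketched below. This is an illustration under stated assumptions rather than the procedure used in the study: it assumes aligned protein sequences of equal length and a known lineage label per strain, and it flags only sites where the residues seen in one lineage never occur in any other lineage. All names and toy data are hypothetical.

```python
from collections import defaultdict


def lineage_specific_sites(alignment: dict, lineage_of: dict) -> dict:
    """Return {lineage: [(position, residues)]} for sites whose residues are
    exclusive to that lineage. Positions are 1-based alignment columns."""
    by_lineage = defaultdict(list)
    for name, seq in alignment.items():
        by_lineage[lineage_of[name]].append(seq)

    length = len(next(iter(alignment.values())))
    result = defaultdict(list)
    for pos in range(length):
        # Residues observed at this column, grouped by lineage.
        seen = {lin: {s[pos] for s in seqs} for lin, seqs in by_lineage.items()}
        for lin, residues in seen.items():
            others = set()
            for other_lin, other_residues in seen.items():
                if other_lin != lin:
                    others |= other_residues
            if residues.isdisjoint(others):
                result[lin].append((pos + 1, "".join(sorted(residues))))
    return dict(result)


# Hypothetical toy data: two lineages, four short peptides.
toy_alignment = {"s1": "MKVA", "s2": "MKVA", "s3": "MRVA", "s4": "MRVT"}
toy_lineages = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y"}
toy_sites = lineage_specific_sites(toy_alignment, toy_lineages)
```

In the toy data only column 2 separates the two lineages. Column 4 is polymorphic within lineage Y but shares residue A with lineage X, so it is not reported by this strict criterion.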
Figure 2. Lineage-specific variations in non-structural and structural proteins. Amino acid variations at each site are represented by rectangles in different colors. Amino acid deletions are shown as grey boxes. The height of each rectangle is proportional to the number of strains in each lineage. Amino acid positions in each viral protein are indicated by the corresponding numbers, and those related to lineage-specific variations are labeled in red.
| Lineage | Non-structural proteins | Structural proteins |
| --- | --- | --- |
| Asia.reemerge | NS1: P/S3, K/M253, S/G454; NS2: P/L16, Q/L273, K/M338, N/S768; NS3: Del:380-382; NS4: M/T58, T/V101, Q/R235, E/D280, V/A582 | C: P/S23, Q/K37, A/V55, Q/R78, A/V93; E3: Q/R19, K/S44, S/R60; E2: T/I2, N/H3, V/A157, G/S194, G/D205, N/S207, V/A368; 6K: T/M45, A/T47; E1: A/T98, P/S304 |
| Asia.old | NS1: P/S3, K/M253, S/G454; NS2: P/L16, K/M338; NS4: M/T58, T/V101, E/D280, V/A582 | C: Q/K37, A/V55, Q/R78, A/V93; E3: K/S44, S/R60; E2: T/I2, V/A157, G/S194, G/D205; 6K: A/T47; E1: A/T98, P/S304 |
| ECSA.IOL | NS1: T/K128; NS2: S/N54, H/Y374, L/S539; NS4: L/A43, R/S82, T/A254, A/T366, Q/L500 | E2: K/Q252, V/A386; E1: S/A72, E/K211, A/V226, D/E284 |
| ECSA.Africa | H/Y374, L/A43, A/T366 | A/T164, S/N72 |
| WA | T/A128, G/N226 | A/T164 |

Table 1. Lineage-specific mutations of CHIKV. Mutations at sites where the CHIKV strains from Pakistan differ from most other IOL strains were indicated in bold characters.
| Lineage | Evolutionary rate (subs/nt/year) | Transmission pattern |
| --- | --- | --- |
| Asia.old | 4.078E-4 | epidemic |
| Asia.reemerge | 2.596E-4 | epidemic |
| ECSA.Africa | 2.27E-4 | enzootic |
| ECSA.IOL | 7.43E-4 | epidemic |
| West Africa | 2.203E-4 | enzootic |
| Total | 2.978E-4 | -- |

Table 2. Evolutionary rate of each lineage
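The rates in Table 2 can be put side by side with a short calculation. The sketch below copies the per-lineage rates from Table 2 and relies on the strict-clock relation that expected substitutions per site equal rate multiplied by elapsed time. The 10-year window in the usage note is an arbitrary illustration, not a value from the study, and the function names are hypothetical.

```python
# Rates copied from Table 2 (substitutions per nucleotide per year).
RATES = {
    "Asia.old": 4.078e-4,
    "Asia.reemerge": 2.596e-4,
    "ECSA.Africa": 2.27e-4,
    "ECSA.IOL": 7.43e-4,
    "West Africa": 2.203e-4,
}


def expected_subs_per_site(rate: float, years: float) -> float:
    """Strict-clock expectation: substitutions per site = rate x elapsed time."""
    return rate * years


def fold_over_slowest(rates: dict) -> dict:
    """Express every rate as a multiple of the slowest lineage's rate."""
    slowest = min(rates.values())
    return {name: rate / slowest for name, rate in rates.items()}


# Lineages ordered from fastest to slowest.
ranked = sorted(RATES, key=RATES.get, reverse=True)
folds = fold_over_slowest(RATES)
```

With these numbers, `ranked` starts with ECSA.IOL and ends with West Africa, and `folds["ECSA.IOL"]` is roughly 3.4, meaning the ECSA.IOL rate is a little over three times the West Africa rate. Calling `expected_subs_per_site(RATES["ECSA.IOL"], 10)` gives the expected substitutions per site over an illustrative 10-year period.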
The ECSA.IOL-specific variations. All strains in the ECSA.IOL lineage carried A386 in the E2 protein, which clearly distinguished them from the other lineages (Figure 2, Figure 3A, Table 1). However, this extra-lineage specific substitution has not been described previously, and its influence remains to be investigated. Moreover, while all strains of the ECSA.Africa lineage carried A226 in the E1 protein, strains of the ECSA.IOL lineage carried either A or V at position 226 (Figure 2, Figure 3A, Table 1). V226 in the E1 protein has been suggested to make the ECSA.IOL strains better adapted to A. albopictus, which is more widely distributed worldwide than other related mosquito vectors. Some IOL strains with the V226 mutation in E1 further acquired a K-to-Q change at position 252 in the E2 protein (Figure 2, Figure 3A, Table 1). This change was reported to enhance infection efficiency in A. albopictus, which was experimentally confirmed using a reverse genetics system of CHIKV (Tsetsarkin et al., 2014). As for the strains from Pakistan, none of these IOL-specific adaptive substitutions had been acquired, as A226 in the E1 protein and K252 in the E2 protein were retained (Table 1). The IOL-specific variations in non-structural proteins included extra-lineage variations at three sites that distinguished the IOL strains from the others (N54 in NS2, and A254 and L500 in NS4) and variations at three sites within the IOL lineage (T/K128 in NS1, L/S539 in NS2 and R/S82 in NS4) (Figure 2, Table 1).
The Asia.reemerge lineage-specific substitutions. In the Asia.reemerge sub-lineage, we found a 4 aa deletion spanning positions 380 to 383 in the NS3 protein among 107 isolates (Figure 3B). In addition, two strains from Malaysia and one from New Caledonia had a second deletion spanning positions 385 to 387 in NS3 (Figure 3B). The strains with the double deletion were identified in 2006 and 2011 respectively, earlier than the strains with a single deletion. Because they occupy a basal position within the Asia.reemerge sub-lineage, we presume that CHIKV strains of this sub-lineage carried the double deletion at the beginning of their emergence and subsequently underwent recombination with other strains lacking the second deletion in NS3, thereby acquiring 4 amino acids at these positions.
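Deletions such as those described above can be located in an alignment by scanning for runs of gap characters relative to a reference sequence. The following is a minimal sketch that assumes a pairwise alignment in which `-` marks a gap and positions are numbered along the ungapped reference. The function name and toy peptides are hypothetical and do not represent real NS3 sequence.

```python
def deletion_runs(ref_aln: str, query_aln: str) -> list:
    """Return (start, end) reference positions (1-based, inclusive) where the
    query has a gap opposite a reference residue."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    ref_pos = 0
    start = None
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # insertion in the query: no reference position consumed
        ref_pos += 1
        if q == "-":
            if start is None:
                start = ref_pos  # a new deletion run opens here
        elif start is not None:
            runs.append((start, ref_pos - 1))  # run closed at previous residue
            start = None
    if start is not None:
        runs.append((start, ref_pos))  # deletion extends to the end
    return runs


# Hypothetical toy peptides with one and two deleted stretches.
toy_ref = "MKTAYIAKQR"
single = deletion_runs(toy_ref, "MKT----KQR")
double = deletion_runs(toy_ref, "MK--YI-KQR")
```

Strains can then be grouped by the runs they return, for example to separate sequences with one deleted stretch from those with two.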
In addition, the Asia.reemerge sub-lineage had a lineage-specific substitution at position 45 in the 6K protein, which clearly distinguished it from the other lineages (Figure 2, Table 1). Within the sub-lineage, variations at three sites in the E2 protein (N/H3, N/S207, V/A368), one in the E3 protein (Q/R19), and one in the capsid protein (P/S23) were specific to the Asia.reemerge strains. Among the non-structural proteins, a unique lineage-specific substitution at position 273 in the NS2 protein (L273) distinguished Asia.reemerge from the others (Figure 2, Table 1). Nearly ten years after the epidemics of the Asia lineage in Southeast Asia, the Asia.reemerge lineage carrying these unique substitutions emerged again in Southeast Asia and, notably, in new territories including the Americas. We therefore presume that the emerging lineage might have acquired survival and adaptive advantages through genetic evolution, resulting in a broader range of epidemics in the Americas. This relationship needs further investigation, as few studies have so far focused on the variations of the Asia.reemerge sub-lineage.
According to the phylogenetic analysis, the CHIKV strains could be divided into three major lineages originating from Africa and Asia. The 309 strains from 41 countries were grouped into 12 regions and displayed in the MCC tree with branches of corresponding colors (Figure 4A). The appearance of different colors under a major branch suggested potential movement events among the regions indicated by those colors. No migration events were observed in the West Africa lineage (Figure 4A), which was consistent with the results of the phylogenetic analysis. To better understand the global movements of CHIKV strains, we mapped the spreading pathways among the 12 regions on a world map (Figure 4B). In general, CHIKV spread mainly occurred in the Asia.reemerge and ECSA.IOL lineages. Asia.reemerge was transmitted from Southeast Asia (mainly Malaysia, Indonesia and the Philippines) to India and the Americas (including North America, the Caribbean islands and Brazil) as well as the Pacific islands. Notably, the French Caribbean island of Saint Martin was suggested to be the origin of transmission for the epidemics in the Americas, which is consistent with the fact that the first CHIKV in the Americas was identified in Saint Martin in 2013 (Cauchemez, 2014). The spread of ECSA.IOL mainly originated from Kenya. The viruses first moved to the Indian Ocean islands and the Indian subcontinent, and then spread to Southeast Asia. The transmission events from India to Europe (Italy and France) in 2006 were confirmed to be related to infected travelers (Parola, 2006). Finally, the newly emerging epidemics in Pakistan were suggested to have been imported from the Indian subcontinent (Figure 4B).
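The idea that a color change along a branch marks a potential movement event can be expressed as a simple tally. The sketch below assumes each branch of a region-annotated tree has been reduced to a (parent region, child region) pair; how those labels are inferred is outside its scope. The branch list is hypothetical toy data that only echoes region names used in the text, and is not read from Figure 4.

```python
from collections import Counter


def count_movements(branches) -> Counter:
    """Tally branches whose parent and child nodes carry different region labels."""
    return Counter(
        (parent, child) for parent, child in branches if parent != child
    )


# Hypothetical (parent region, child region) pairs for illustration only.
toy_branches = [
    ("Kenya", "Kenya"),
    ("Kenya", "IOL"),
    ("IOL", "IOL"),
    ("IOL", "SEA"),
    ("IOL", "Pakistan_2016"),
    ("IOL", "SEA"),
]
movements = count_movements(toy_branches)
```

A lineage in which every branch keeps the same label, as reported for West Africa, would yield an empty tally.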
| Primer pair | Sequence | Length of PCR product |
| --- | --- | --- |
| CHIKV-F-1 / CHIKV-R-1347 | ATGGCTGCGTGAGACACACGTAG / TTGAATGCCCATAGACARCAGC | 1374 bp |
| CHIKV-F-1308 / CHIKV-R-2866 | GAGAAAGAACACTGACCTGCTG / TTGTCTAACTGCGTAAACTCCT | 1559 bp |
| CHIKV-F-2705 / CHIKV-R-4151 | GTGGACACTACAGGCTCAACAA / GAGGGTTAGCGGCGTTGACT | 1447 bp |
| CHIKV-F-3751 / CHIKV-R-5302 | TCGCATACACCATTAYCAACAG / GTTTCTCCCTCGCCTTCTTC | 1552 bp |
| CHIKV-F-5281 / CHIKV-R-6813 | CAGAAGAAGGCGAGGGAGAAAC / AGTGAATCATCTTGGCTCTTATC | 1533 bp |
| CHIKV-F-6786 / CHIKV-R-8191 | CCTTTGATAAGAGCCAAGATGA / TGCCAGCACCTGTAGGGATG | 1406 bp |
| CHIKV-F-7920 / CHIKV-R-9270 | GCACGAAGGTAAGGTAACAGGT / CGGGACCAGAGGGGAGTTAT | 1350 bp |
| CHIKV-F-9032 / CHIKV-R-10430 | CCGAAGAGATAGAGGTACACAT / GCAGTTACAGTGATGTTATTTCC | 1399 bp |
| CHIKV-F-10286 / CHIKV-R-11812 | CTGAAAACACGCAGTTGAGC / GAAATATTAAAAACAAAATAACATCTCC | 1527 bp |

Table S1. Primers used for CHIKV genome assembly
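Two primers in Table S1 contain IUPAC degenerate bases: R (A or G) in CHIKV-R-1347 and Y (C or T) in CHIKV-F-3751. A degenerate primer stands for a small pool of concrete oligonucleotides, and the sketch below enumerates that pool using the standard IUPAC nucleotide codes. The function name is hypothetical; the primer sequences are copied from Table S1.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primers copied from Table S1.
r_1347_variants = expand_degenerate("TTGAATGCCCATAGACARCAGC")
f_3751_variants = expand_degenerate("TCGCATACACCATTAYCAACAG")
```

Each of these two primers expands to two sequences, since each carries a single two-fold degenerate position. Primers without degenerate bases expand to themselves.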
| Geographic division | Country (No. of sequences) | Total |
| --- | --- | --- |
| WA (West Africa) | Nigeria (2), Senegal (9), Côte d'Ivoire (2), Cameroon (1) | 14 |
| SEA (South east Asia) | Indonesia (4), Thailand (28), Philippines (7), Malaysia (11), China (17), Myanmar (5), Bangladesh (1), Cambodia (7), Singapore (12) | 92 |
| Oceania | New Caledonia (1) | 1 |
| CA (Central America) | Nicaragua (50) | 50 |
| NA (North America) | USA (45), Mexico (1) | 46 |
| CI (Caribbean Island) | Dominican Republic (9), Saint Martin (1) | 10 |
| SA (South America) | Brazil (6), French Guiana (1) | 7 |
| ESCA (East/South/Central Africa) | Congo (2), South Africa (3), Central African Republic (4), Tanzania (1), Gabon (1), Uganda (1), Angola (1), Yemen (1), Kenya (2) | 16 |
| IOL (India and Indian Ocean island) | India (30), Sri Lanka (20), Reunion (2), Mauritius (2), Madagascar (1), Mayotte (1), Comoros (2) | 58 |
| Pakistan_2016 | Pakistan (8) | 8 |
| Europe | Italy (5), France (1) | 6 |

Table S2. Geographic classification of CHIKV
Extensive evolution analysis of the global chikungunya virus strains revealed the origination of CHIKV epidemics in Pakistan in 2016
- Received Date: 04 September 2017
- Accepted Date: 28 November 2017
- Published Date: 11 December 2017
Abstract: Chikungunya virus (CHIKV) is a mosquito-borne virus that causes epidemics worldwide, especially in tropical and subtropical regions. Phylogenetic analyses have shown that CHIKV lineages are associated with spatial and temporal distributions, which are related to the adaptation of the virus to the major mosquito species and their distributions. In this study, we report the complete genome sequences of eight CHIKV isolates from the 2016 outbreak in Pakistan. We then reviewed the evolutionary history using extensive phylogenetic analysis, analyzed lineage-specific substitutions in viral proteins, and characterized the spreading pathways of CHIKV strains, including the Pakistani strains. The results showed that the Pakistani strains belonged to the ECSA.IOL sub-lineage and were derived from India. The genetic properties of the Pakistani strains, including the adaptive substitutions to vectors, were further characterized, and the potential risks arising from the occurrence of CHIKV infection in Pakistan were discussed. These results provide a better understanding of CHIKV evolution and transmission worldwide and reveal the possible origin of the CHIKV outbreak and epidemic in Pakistan, which should promote disease prevention and control in countries and territories with a history of CHIKV infection as well as in new regions at potential risk of CHIKV outbreaks.