An Assessment of Amplicon-Sequencing Based Method for Viral Intrahost Analysis

Ming Ni; Chen Chen; Di Liu

doi:10.1007/s12250-018-0052-z

December 2018

Ming Ni, Chen Chen and Di Liu. An Assessment of Amplicon-Sequencing Based Method for Viral Intrahost Analysis[J]. Virologica Sinica, 2018, 33(6): 557-560. doi: 10.1007/s12250-018-0052-z

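Abstract

RNA viruses have high mutation rates during replication and therefore often exist within their hosts as diverse populations. Recently, several research groups have studied viral intrahost diversity to interpret intrahost microevolution and virus–host interactions. One commonly used approach for studying viral intrahost diversity is amplicon sequencing, in which the virus in a sample is first specifically amplified and the amplicons are then deep-sequenced. However, amplicon sequencing may introduce errors during viral amplification and sequencing, so its sensitivity and specificity need to be evaluated. In this study, we constructed model samples and quantitatively evaluated the false positives and false negatives of amplicon sequencing in detecting intrahost single nucleotide variations (iSNVs), the accuracy of the mutated allele frequency (MuAF) of iSNVs, and the bias in MuAF between different amplicons. In our tests on the model samples, when the viral load was high (Ct value of 25, or about 1 × 10⁵ copies/μL), a threshold of MuAF > 0.3% gave a zero false negative rate and a false positive rate below 0.001 for iSNV detection. For samples with a very low viral load (Ct value of 35, or about 100 copies/μL), a higher threshold of MuAF > 0.7% achieved similar accuracy. Overall, amplicon sequencing provides sufficiently high accuracy and sensitivity for studying viral intrahost diversity in clinical and field samples and is an effective and feasible technical approach.

Keywords:
- viral intrahost diversity analysis
- deep sequencing
- viral genomics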

Ming Ni¹, Chen Chen², Di Liu³

1. Beijing Institute of Radiation Medicine, Beijing 100850, China
2. Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
3. Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, China

Corresponding author: Di Liu, liud@wh.iov.cn
ORCID: http://orcid.org/0000-0003-3693-2726
Received Date: 07 May 2018
Accepted Date: 13 September 2018
Published Date: 05 November 2018

References
1. Chen C, Jiang D, Ni M, Li J, Chen Z, Liu J, Ye H, Wong G, Li W, Zhang Y et al (2018) Phylogenomic analysis unravels evolution of yellow fever virus within hosts. PLoS Negl Trop Dis 12:e0006738
  doi: 10.1371/journal.pntd.0006738
2. Debbink K, McCrone JT, Petrie JG, Truscon R, Johnson E, Mantlo EK, Monto AS, Lauring AS (2017) Vaccination has minimal impact on the intrahost diversity of H3N2 influenza viruses. PLoS Pathog 13:e1006194
  doi: 10.1371/journal.ppat.1006194
3. Domingo E, Sheldon J, Perales C (2012) Viral quasispecies evolution. Microbiol Mol Biol Rev 76:159-216
  doi: 10.1128/MMBR.05023-11
4. Gire SK, Goba A, Andersen KG, Sealfon RS, Park DJ, Kanneh L, Jalloh S, Momoh M, Fullah M, Dudas G et al (2014) Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345:1369
  doi: 10.1126/science.1259657
5. Holmes EC, Dudas G, Rambaut A, Andersen KG (2016) The evolution of Ebola virus: insights from the 2013-2016 epidemic. Nature 538:193-200
  doi: 10.1038/nature19790
6. Lakdawala SS, Jayaraman A, Halpin RA, Lamirande EW, Shih AR, Stockwell TB, Lin X, Simenauer A, Hanson CT, Vogel L et al (2015) The soft palate is an important site of adaptation for transmissible influenza viruses. Nature 526:122-125
  doi: 10.1038/nature15379
7. Matranga CB, Andersen KG, Winnicki S, Busby M, Gladden AD, Tewhey R, Stremlau M, Berlin A, Gire SK, England E et al (2014) Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples. Genome Biol 15:519
  doi: 10.1186/s13059-014-0519-7
8. Metsky HC, Matranga CB, Wohl S, Schaffner SF, Freije CA, Winnicki SM, West K, Qu J, Baniecki ML, Gladden-Young A et al (2017) Zika virus evolution and spread in the Americas. Nature 546:411-415
  doi: 10.1038/nature22402
9. Ni M, Chen C, Qian J, Xiao HX, Shi WF, Luo Y, Wang HY, Li Z, Wu J, Xu PS et al (2016) Intra-host dynamics of Ebola virus during 2014. Nat Microbiol 1:16151
  doi: 10.1038/nmicrobiol.2016.151
10. Park DJ, Dudas G, Wohl S, Goba A, Whitmer SL, Andersen KG, Sealfon RS, Ladner JT, Kugelman JR et al (2015) Ebola virus epidemiology, transmission, and evolution during seven months in Sierra Leone. Cell 161:1516-1526
  doi: 10.1016/j.cell.2015.06.007
11. Poon LL, Song T, Rosenfeld R, Lin X, Rogers MB, Zhou B, Sebra R, Halpin RA, Guan Y, Twaddle A et al (2016) Quantifying influenza virus diversity and transmission in humans. Nat Genet 48:195-200
  doi: 10.1038/ng.3479
12. Yang X, Charlebois P, Macalalad A, Henn MR, Zody MC (2013) V-Phaser 2: variant inference for viral populations. BMC Genom 14:674
  doi: 10.1186/1471-2164-14-674
13. Joshi NA, Fass JN (2011) Sickle: a sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [Software]. Available at https://github.com/najoshi/sickle. Accessed 5 June 2016
14. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357-359
  doi: 10.1038/nmeth.1923
15. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079
16. Nikolenko SI, Korobeynikov AI, Alekseyev MA (2013) BayesHammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genomics 14(Suppl 1):S7
  doi: 10.1186/1471-2164-14-S1-S7
17. Schirmer M, Ijaz UZ, D'Amore R, Hall N, Sloan WT, Quince C (2015) Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res 43:e37
  doi: 10.1093/nar/gku1341

Dear Editor,

RNA viruses have high per-generation mutation rates, resulting in highly diverse populations that determine viral phenotypes and fitness (Domingo et al. 2012). Advances in deep sequencing and bioinformatics have made it possible to study viral population genetics at the intrahost level and have provided new insights into viral evolution. Lakdawala et al. (2015) applied viral intrahost analysis to monitor the rapid selection of influenza viruses carrying functional mutations. Using a similar method on seasonal influenza virus, Debbink and colleagues examined viral intrahost heterogeneity and concluded that vaccine-induced immunity had little impact on antigenic drift (Debbink et al. 2017). In response to emerging outbreaks in particular, intrahost analyses have also been used to investigate the evolution of infectious disease pathogens, including Ebola virus (EBOV) in West Africa (Ni et al. 2016), Zika virus (ZIKV) transmission in the Americas (Metsky et al. 2017), and imported yellow fever virus (YFV) in China (Chen et al. 2018).

Two approaches are currently used to enrich viral genomes from clinical samples: (1) viral RNA capture following depletion of host RNA (Matranga et al. 2014; Metsky et al. 2017) and (2) direct virus-specific genomic amplification (Ni et al. 2016; Metsky et al. 2017; Chen et al. 2018). Bioinformatic methods have been developed accordingly (Yang et al. 2013; Ni et al. 2016). Compared with capture, virus-specific genomic amplification yields higher sequencing depth across the viral genome and may therefore identify more intrahost single nucleotide variations (iSNVs), even in clinical samples with ultra-low viral content (Ct > 30) (Ni et al. 2016; Metsky et al. 2017). However, because virus-specific genomic amplification had not been systematically evaluated, the use of iSNVs identified in this way to investigate viral population genetics has been questioned on the grounds of possible bias (Metsky et al. 2017). The concerns center on (1) distinguishing intrahost variations from amplification and/or sequencing errors, (2) the sensitivity and specificity of iSNV detection, (3) the accuracy of variant allele frequencies, and (4) amplification bias among different amplicons. Here, we address these concerns.

First, we constructed two clonal plasmids harboring a 2.5-kbp wild-type and a 2.5-kbp mutant EBOV genomic segment, respectively (positions 1,100–3,600 of the EBOV C15 genome, GenBank accession no. KJ660346.2; Fig. 1A and Supplementary Table S1). The mutant sequence contained 71 substitutions based on the iSNVs identified in EBOV strains from Sierra Leone in 2014 (sequences in the NCBI SRA database under BioProject no. PRJNA298842; Supplementary Table S2). We then mixed the two clonal plasmids at four ratios (mutant:wild-type = 1:2, 1:4, 1:8, and 1:32) to mimic distinct viral populations (Fig. 1B). To determine the actual mutant:wild-type ratios precisely, all mixtures were subjected to next-generation sequencing (NGS) without gene-specific amplification. The mutant fractions calculated from these data (36.7% ± 0.7%, 16.1% ± 0.3%, 8.65% ± 0.27%, and 1.71% ± 0.11%) were used as the reference. We also diluted each of the four mixtures to a higher titer (~1 × 10⁵ copies/μL, equivalent to a Ct value of ~25) and a lower titer (~100 copies/μL, equivalent to a Ct value of ~35) to represent regular and challenging clinical samples, respectively (Fig. 1B). To each sample, 500 ng of cDNA from the human A549 cell line was added as host background. We then used two primer pairs, whose products covered a region of ~2.2 kbp with a 215-bp overlap, to simulate viral genome amplification (Fig. 1A). All amplicons were subjected to NGS and bioinformatic analysis. The amplified region contained 62 of the 71 substitutions; after excluding one substitution near a primer region, the remaining 61 were used for the assessment (detailed methods are described in the Supplementary Methods).
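The gap between the nominal mixing ratios and the directly sequenced reference fractions can be made explicit with a short calculation: a mutant:wild-type ratio of 1:k corresponds to an expected mutant fraction of 1/(1 + k). The Python sketch below, whose helper names are illustrative and not part of the authors' pipeline, lists the expected fractions next to the measured ones quoted above.

```python
# Expected mutant fraction for each nominal mutant:wild-type ratio, listed
# next to the fraction measured by direct sequencing (values from the text).
# Helper names are illustrative, not part of the authors' pipeline.


def expected_mutant_fraction(mutant: int, wild_type: int) -> float:
    """Mutant fraction implied by a mutant:wild-type mixing ratio."""
    return mutant / (mutant + wild_type)


# label -> ((mutant parts, wild-type parts), measured mutant fraction in %)
MIXTURES = {
    "1:2": ((1, 2), 36.7),
    "1:4": ((1, 4), 16.1),
    "1:8": ((1, 8), 8.65),
    "1:32": ((1, 32), 1.71),
}


def compare_mixtures() -> dict[str, tuple[float, float]]:
    """Return {label: (expected %, measured %)} for each mixture."""
    return {
        label: (100 * expected_mutant_fraction(mt, wt), measured)
        for label, ((mt, wt), measured) in MIXTURES.items()
    }


if __name__ == "__main__":
    for label, (expected, measured) in compare_mixtures().items():
        print(f"{label}: expected {expected:.2f}%, measured {measured:.2f}%")
```

For the 1:2 mixture the expected fraction is about 33.3% against a measured 36.7%, and for 1:32 it is about 3.0% against 1.71%, which illustrates why the directly sequenced fractions, rather than the nominal ratios, serve as the reference.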

Figure 1. A The designed iSNV positions and amplicons. The two EBOV-specific amplicons are shown aligned with the genome (EBOV C15, GenBank accession no. KJ660346.2). Short vertical lines indicate the substituted sites. B Flow chart of the assay. DNA extracted from the wild-type (wt) and mutant (mt) clonal plasmids was mixed at four ratios and diluted to two DNA concentrations (denoted by red and blue) for EBOV-specific singleplex amplification. The PCR products of each sample were pooled for NGS. The four mixtures at a concentration of 10⁵ copies/μL were also sequenced directly. C iSNVs identified by amplicon-seq at the four mixing ratios. Dashed lines indicate the mean MuAF from direct sequencing, with the mutant percentages shown. Only iSNVs with MuAF ≥ 1% are shown; the false-positive iSNV is marked by a red arrow. D Box plots of the MuAFs of iSNVs based on direct sequencing (white) and on sequencing of the amplicons. Δ is the difference between the amplicon-based median and the direct-sequencing-based median. Blue boxes, 10⁵ copies/μL; red boxes, 10² copies/μL (the same applies to E). E Box plots of the MuAFs of iSNVs located in the regions of amplicon 1 and amplicon 2. Δ is the difference between the median MuAFs. F False positive rate (FPR) of iSNV calling as a function of the MuAF threshold at the four mixing ratios. Vertical dashed lines mark FPR = 0.001, and the corresponding MuAF thresholds are shown at the top. In D and E, boxes represent the interquartile range (IQR) between the first and third quartiles, horizontal lines inside the boxes indicate the median, and whiskers extend to values within 1.5 times the IQR.

For the eight samples listed in Fig. 1B, we obtained an average of 51.4 Mbp (SD = 3.2 Mbp) of data per sample, and 94.4%–98.4% of reads aligned to the wild-type/mutant reference sequences (Supplementary Table S3). The on-target depth was > 17,000× at all sites. We first identified iSNVs using a MuAF threshold of ≥ 1%, the lowest of the criteria used in previous studies (e.g., MuAF ≥ 5%, Ni et al. 2016; ≥ 3%, Lakdawala et al. 2015; Poon et al. 2016; ≥ 1%, Debbink et al. 2017; Metsky et al. 2017). At this threshold, all designed substitutions were identified at every mixing ratio, at both the regular and the challenging viral content (Fig. 1C). One false-positive G-to-A substitution was found in the 1:8 mixture at the lower titer. Thus, the approach achieved a zero false negative rate (FNR) and a zero false positive rate (FPR) for regular samples, and a zero FNR and an FPR < 0.02 for challenging samples.
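Threshold-based iSNV calling and its error rates can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: per-site allele counts are taken as given, MuAF is the mutant read count divided by total depth, and FNR and FPR use the conventional definitions FN/(TP + FN) and FP/(FP + TN) over the evaluated sites. The paper does not spell out the FPR denominator, so that definition is an assumption. The `Site` class and all function names are hypothetical.

```python
# Minimal sketch of threshold-based iSNV calling and error-rate evaluation.
# Assumptions (not from the paper): per-site counts are already available,
# MuAF = mutant reads / total depth, FNR = FN / (TP + FN), and
# FPR = FP / (FP + TN) over all evaluated sites.
from dataclasses import dataclass


@dataclass
class Site:
    position: int      # genome coordinate
    depth: int         # total reads covering the site
    mutant_count: int  # reads carrying the non-reference allele


def muaf(site: Site) -> float:
    """Mutated allele frequency of one site (0 if uncovered)."""
    return site.mutant_count / site.depth if site.depth else 0.0


def call_isnvs(sites: list[Site], threshold: float = 0.01) -> set[int]:
    """Positions whose MuAF is at or above the threshold (default 1%)."""
    return {s.position for s in sites if muaf(s) >= threshold}


def error_rates(sites: list[Site], called: set[int],
                designed: set[int]) -> tuple[float, float]:
    """Return (FNR, FPR) given the designed (true) substitution positions."""
    evaluated = {s.position for s in sites}
    positives = designed & evaluated
    negatives = evaluated - designed
    fn = len(positives - called)
    fp = len(called & negatives)
    fnr = fn / len(positives) if positives else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    return fnr, fpr


if __name__ == "__main__":
    # Toy data (not from the study): sites 1 and 4 are designed substitutions,
    # sites 2 and 3 are background; site 3 carries 1.25% noise.
    toy = [Site(1, 20000, 3400), Site(2, 20000, 10),
           Site(3, 20000, 250), Site(4, 20000, 3000)]
    called = call_isnvs(toy, threshold=0.01)
    print(sorted(called), error_rates(toy, called, designed={1, 4}))
```

In the toy run, the noisy background site 3 passes the 1% cutoff, so it is counted as a false positive while both designed sites are recovered.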

We then calculated the MuAFs of iSNVs for each sample and compared them with the reference values (Fig. 1C, D). For regular samples, the median MuAFs deviated from the references by 0.015, 0.011, 0.003, and 0.0002 for the 1:2, 1:4, 1:8, and 1:32 mixtures, respectively (Fig. 1D), corresponding to a relative offset of 3.9% ± 2.3% of the real MuAF (e.g., 0.015/0.367 ≈ 4.09% for the 1:2 mixture). For the challenging samples, the offset of the median MuAF was about 0.023 ± 0.0068, and the relative offsets were 6.54%, 7.45%, 31.21%, and 175.44%. Notably, the relative offset increased dramatically as the mutant content decreased. Comparing the MuAF offsets between regular and challenging samples at each mixing ratio showed that, as expected, both the median and the standard deviation of the offset were larger for the challenging samples. In summary, for samples with higher viral content the relative MuAF offset of iSNVs was around 5%, whereas for samples with lower viral content it depended on the proportion of mutants: at mutant:wild-type ratios of 1:4 or higher, the relative offset was about 10% or less.
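The relative offsets for the regular samples follow directly from the numbers above: each median offset is divided by the corresponding reference MuAF. The Python sketch below reproduces this arithmetic. Summarizing the four values as mean ± sample standard deviation is an assumption, although it does recover the reported 3.9% ± 2.3%; the constant and function names are illustrative.

```python
# Relative MuAF offsets for the regular (~1e5 copies/uL) samples, using the
# values quoted in the text. Relative offset = median offset / reference MuAF.
# Summarizing with mean +/- sample SD is an assumption.
from statistics import mean, stdev

# Reference mutant fractions from direct sequencing.
REFERENCE = {"1:2": 0.367, "1:4": 0.161, "1:8": 0.0865, "1:32": 0.0171}
# Offsets of the amplicon-based median MuAF from the reference.
OFFSET_REGULAR = {"1:2": 0.015, "1:4": 0.011, "1:8": 0.003, "1:32": 0.0002}


def relative_offsets(offsets: dict[str, float],
                     reference: dict[str, float]) -> dict[str, float]:
    """Offset of the median MuAF divided by the reference MuAF, per mixture."""
    return {k: offsets[k] / reference[k] for k in offsets}


if __name__ == "__main__":
    rel = relative_offsets(OFFSET_REGULAR, REFERENCE)
    for label, value in rel.items():
        print(f"{label}: {100 * value:.2f}%")
    values = list(rel.values())
    print(f"mean {100 * mean(values):.1f}% +/- {100 * stdev(values):.1f}%")
```

The per-mixture relative offsets come out at about 4.09%, 6.83%, 3.47%, and 1.17%.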

We next examined the bias between the two amplicons covering the same genomic segment. A bias between amplicons was present in both the higher and the lower viral-content samples (Fig. 1E). In the higher viral-content samples, the bias, measured as the difference in median MuAF between the two amplicons, ranged from 0.001 to 0.009, and the relative bias was 4.6% ± 1.38% (e.g., 0.009/0.367 ≈ 2.45% for the 1:2 mixture). The biases in the lower viral-content samples were much higher (0.037–0.059), with relative biases of 15.0%, 23.6%, 68.2%, and 216.4% for the 1:2, 1:4, 1:8, and 1:32 mixtures, respectively. These results indicate that the bias between amplicons strongly affects the MuAF estimates of iSNVs in lower viral-content samples.
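The between-amplicon bias can be expressed as the absolute difference between the median MuAFs of iSNVs in the two amplicon regions, optionally divided by the reference MuAF. The Python sketch below illustrates this with hypothetical input values (not data from this study) and also checks the worked example quoted above; `amplicon_bias` is an illustrative name.

```python
# Between-amplicon bias: |median MuAF (amplicon 1) - median MuAF (amplicon 2)|,
# optionally relative to the reference MuAF. Input lists are hypothetical.
from statistics import median


def amplicon_bias(amp1: list[float], amp2: list[float],
                  reference: float) -> tuple[float, float]:
    """Return (absolute bias, relative bias) between two amplicon regions."""
    bias = abs(median(amp1) - median(amp2))
    return bias, bias / reference


if __name__ == "__main__":
    toy_amp1 = [0.36, 0.37, 0.38]    # hypothetical MuAFs of iSNVs in amplicon 1
    toy_amp2 = [0.355, 0.36, 0.365]  # hypothetical MuAFs of iSNVs in amplicon 2
    bias, rel = amplicon_bias(toy_amp1, toy_amp2, reference=0.367)
    print(f"toy bias {bias:.3f}, relative {100 * rel:.2f}%")
    # Worked example from the text: 0.009 / 0.367 for the 1:2 mixture.
    print(f"worked example: {100 * 0.009 / 0.367:.2f}%")
```

Using medians rather than means keeps a few outlying sites in one amplicon from dominating the estimate, which matches the median-based comparison shown in Fig. 1E.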

We further explored the detection limit for iSNV identification, that is, the lowest MuAF threshold above which iSNVs can be distinguished from noise. Specifically, we calculated the FPR of iSNV calling while increasing the MuAF threshold from > 0.1% to > 1% (Fig. 1F). For the higher viral-content samples, the threshold could be set as low as 0.3% while keeping the FPR below 0.001, whereas for the lower viral-content samples a threshold of 0.7% or above was required.
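The threshold scan can be sketched as a loop over MuAF cutoffs from 0.1% to 1% in 0.1% steps, returning the lowest cutoff whose FPR falls below 0.001. In this Python sketch, FPR is assumed to be the fraction of non-designed (background) sites whose MuAF exceeds the cutoff, and the function names and background values are hypothetical.

```python
# Minimal sketch of a MuAF threshold scan (hypothetical helper names).
# Assumption: FPR = fraction of background (non-designed) sites whose MuAF
# is strictly above the threshold.
from typing import Optional


def fpr_at(background: list[float], threshold: float) -> float:
    """Fraction of background sites with MuAF > threshold."""
    return sum(m > threshold for m in background) / len(background)


def lowest_threshold(background: list[float],
                     max_fpr: float = 0.001) -> Optional[float]:
    """Smallest cutoff in 0.001, 0.002, ..., 0.010 with FPR < max_fpr."""
    for k in range(1, 11):
        threshold = k / 1000  # 0.1% steps
        if fpr_at(background, threshold) < max_fpr:
            return threshold
    return None  # no cutoff up to 1% meets the target


if __name__ == "__main__":
    # Synthetic background of 2,000 sites: mostly low-level noise with a few
    # higher-frequency error sites.
    background = [0.001] * 1990 + [0.0025] * 6 + [0.0045] * 3 + [0.008]
    for k in range(1, 11):
        t = k / 1000
        print(f"> {100 * t:.1f}%: FPR = {fpr_at(background, t):.4f}")
    print("lowest threshold:", lowest_threshold(background))
```

With this synthetic background the FPR first drops below 0.001 at a cutoff of 0.5%; with real data the same scan would be run separately for each titer and mixing ratio, as in Fig. 1F.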

In general, examining within-host viral dynamics through iSNVs can provide critical information on transmission patterns and on the extent to which natural selection shapes viral diversity (Holmes et al. 2016). For the 2014 EBOV outbreak in West Africa, iSNV analyses helped unravel transmission chains, reveal diverse selective pressures on the viral genome, and identify regulatory sites (Gire et al. 2014; Park et al. 2015; Ni et al. 2016). These studies used a cutoff of MuAF ≥ 5%, and the number of observed iSNVs increased sharply as the cutoff was lowered (Gire et al. 2014; Park et al. 2015; Ni et al. 2016). Determining the optimal threshold is therefore a high priority for iSNV calling and downstream analyses. In this study, we carried out a preliminary assessment of the amplicon-based approach for iSNV identification. Our results suggest that in high viral-content samples, iSNVs with MuAF > 0.3% can be identified with an FPR < 0.001, the offset from the real MuAF fluctuates within 7%, and the bias between amplicons is below 6%. In low viral-content samples, a small number of false-positive iSNVs may appear, and the MuAF threshold for iSNV identification should not be lower than 0.7% to keep the FPR below 0.001. The relative MuAF offset and the bias between amplicons were much higher than in high viral-content samples, especially for samples with a low fraction of mutants. In conclusion, iSNVs can be detected at a MuAF of 1% or lower in samples with viral content > 100 copies/μL. However, iSNV allele frequencies should be interpreted with caution, especially in low viral-content samples.

Acknowledgements

This work was supported in part by grants from the National Natural Science Foundation of China (91631110, 81571956 and U1435222). M.N. was supported by the Beijing Nova Program (Z181100006218114).

Compliance with Ethics Standards

Conflict of interest

The authors declare that they have no conflict of interest.

Animal and Human Rights Statement

This article does not contain any studies with human or animal subjects performed by any of the authors.
