Rapid Acquisition of High-Quality SARS-CoV-2 Genome via Amplicon-Oxford Nanopore Sequencing

Yi Yan; Ke Wu; Jun Chen; Haizhou Liu; Yi Huang; Yong Zhang; Jin Xiong; Weipeng Quan; Xin Wu; Yu Liang; Kunlun He; Zhilong Jia; Depeng Wang; Di Liu; Hongping Wei; Jianjun Chen

doi:10.1007/s12250-021-00378-8

October 2021

Citation: Yi Yan, Ke Wu, Jun Chen, Haizhou Liu, Yi Huang, Yong Zhang, Jin Xiong, Weipeng Quan, Xin Wu, Yu Liang, Kunlun He, Zhilong Jia, Depeng Wang, Di Liu, Hongping Wei, Jianjun Chen. Rapid Acquisition of High-Quality SARS-CoV-2 Genome via Amplicon-Oxford Nanopore Sequencing. Virologica Sinica, 2021, 36(5): 901-912. http://dx.doi.org/10.1007/s12250-021-00378-8

Yi Yan ^1,2,3,4,
Ke Wu ^1,2,3,4,
Jun Chen ^5,
Haizhou Liu ^1,2,3,
Yi Huang ^6,
Yong Zhang ^1,
Jin Xiong ^1,
Weipeng Quan ^7,
Xin Wu ^8,
Yu Liang ^9,
Kunlun He ^10,11,
Zhilong Jia ^10,11,
Depeng Wang ^8,
Di Liu ^1,2,3,4,12,
Hongping Wei ^1,
Jianjun Chen ^1,2

1. CAS Key Laboratory of Special Pathogens and Biosafety, Wuhan Institute of Virology, Center for Biosafety MegaScience, Chinese Academy of Sciences, Wuhan 430071, China
2. National Virus Resource Center, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, China
3. Computational Virology Group, Center for Bacteria and Viruses Resources and Bioinformation, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan 430071, China
4. University of Chinese Academy of Sciences, Beijing 101409, China
5. Wuhan Pulmonary Hospital, Wuhan Tuberculosis Prevention and Treatment Institute, Wuhan 430030, China
6. National Biosafety Laboratory, Chinese Academy of Sciences, Wuhan 430071, China
7. GrandOmics Biosciences, Wuhan 430000, China
8. GrandOmics Biosciences, Beijing 102200, China
9. GrandOmics Diagnostics, Wuhan 430000, China
10. Key Laboratory of Biomedical Engineering and Translational Medicine, Ministry of Industry and Information Technology, Chinese PLA General Hospital, Beijing 100039, China
11. Beijing Key Laboratory for Precision Medicine of Chronic Heart Failure, Chinese PLA General Hospital, Beijing 100039, China
12. First Affiliated Hospital of Xinjiang Medical University, Urumqi 830054, China

Corresponding author: Di Liu, liud@wh.iov.cn, ORCID: 0000-0003-0723-1701
Hongping Wei, hpwei@wh.iov.cn, ORCID: 0000-0002-9948-8880
Jianjun Chen, chenjj@wh.iov.cn, ORCID: 0000-0002-2966-2388
Yi Yan, Ke Wu, and Jun Chen contributed equally to this work.
Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/s12250-021-00378-8.
Received Date: 28 October 2020
Accepted Date: 18 February 2021
Published Date: 13 April 2021
Available online: 01 October 2021

Abstract

Genome sequencing showed strong capabilities in the initial stages of the COVID-19 pandemic, for example in pathogen identification and preliminary virus tracing. However, the rapid acquisition of SARS-CoV-2 genomes from clinical specimens is limited by their low nucleic acid load and the complexity of the nucleic acid background. To address this issue, we modified and evaluated an approach that combines SARS-CoV-2-specific amplicon amplification with the Oxford Nanopore PromethION platform. The workflow starts with a throat swab from a COVID-19 patient and combines reverse transcription PCR and multiplex amplification in one step to shorten the experiment time, so that a high-quality SARS-CoV-2 genome can be obtained quickly and reliably within 24 h. The method was comprehensively evaluated in 42 samples: sequencing quality correlated well with the viral load of the samples; high-quality SARS-CoV-2 genomes could be obtained stably from samples with Ct values up to 39.14; data yield for different Ct values was assessed, and the recommended sequencing time was 8 h for samples with Ct values below 20; variation analysis indicated that the method can detect both existing and emerging genomic mutations; and Illumina sequencing verified that ultra-deep sequencing can largely compensate for the single-read error rate of Nanopore sequencing, bringing the error rate down to as low as 0.4 per 10,000 bp. In summary, high-quality SARS-CoV-2 genomes can be acquired using amplicon amplification, and this is an effective method for accelerating the acquisition of genetic resources and tracking the genome diversity of SARS-CoV-2.
Keywords: SARS-CoV-2, Genome, Amplicon, Nanopore sequencing

Electronic Supplementary Material

10.1007s12250-021-00378-8-ESM.pdf
References
1. Baker DJ, Kay GL, Aydin A, Le-Viet T, Rudder S, Tedim AP, Kolyva A, Diaz M, De Oliveira Martins L, Alikhan N, Meadows L, Bell A, Gutierrez AV, Trotter AJ, Thomson NM, Gilroy R, Griffith L, Adriaenssens EM, Stanley R, Charles IG, Elumogo N, Wain J, Prakash R, Meader E, Mather AE, Webber MA, Dervisevic S, Page AJ, O'Grady J (2020) CoronaHiT: large scale multiplexing of SARS-CoV-2 genomes using Nanopore sequencing. bioRxiv. doi: https://doi.org/10.1101/2020.06.24.162156
2. Bangash MN, Patel J, Parekh D (2020) COVID-19 and the liver: little cause for concern. Lancet Gastroenterol Hepatol 1253: 20-21
  doi: 10.1016/S2468-1253(20)30084-4
3. Chen C, Jiang D, Ni M, Li J, Chen Z, Liu J, Ye H, Wong G, Li W, Zhang Y, Wang B, Bi Y, Chen D, Zhang P, Zhao X, Kong Y, Shi W, Du P, Xiao G, Ma J, Gao GF, Cui J, Zhang F, Liu W, Bo X, Li A, Zeng H, Liu D (2018) Phylogenomic analysis unravels evolution of yellow fever virus within hosts. PLoS Negl Trop Dis 12: 1-15
  doi: 10.1371/journal.pntd.0006738
4. Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, Qiu Y, Wang J, Liu Y, Wei Y, Xia J, Yu T, Zhang X, Zhang L (2020) Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet 395: 507-513
  doi: 10.1016/S0140-6736(20)30211-7
5. De Wit E, Van Doremalen N, Falzarano D, Munster VJ (2016) SARS and MERS: recent insights into emerging coronaviruses. Nat Rev Microbiol 14: 523-534
  doi: 10.1038/nrmicro.2016.81
6. Diao B, Feng Z, Wang C, Wang H, Liu L, Wang C, Wang R, Liu Y, Liu Y, Wang G, Yuan Z, Wu Y, Chen Y (2020) Human kidney is a target for novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Infection. medRxiv. doi: https://doi.org/10.1101/2020.03.04.20031120
7. Dudas G, Carvalho LM, Bedford T, Tatem AJ, Baele G, Faria NR (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 176: 139-148
  doi: 10.17863/CAM.24183
8. Faria NR, Kraemer MUG, Hill SC, Goes de Jesus J, Aguiar RS, Iani FCM, Xavier J, Quick J, du Plessis L, Dellicour S et al (2018) Genomic and epidemiological monitoring of yellow fever virus transmission potential. Science 361: 894-899
  doi: 10.1126/science.aat7115
9. Freed NE, Vlková M, Faisal MB, Silander OK (2020) Rapid and inexpensive whole-genome sequencing of SARS-CoV-2 using 1200 bp tiled amplicons and Oxford Nanopore Rapid Barcoding. Biol Methods Protoc 5: 1-7
  doi: 10.1093/biomethods/bpaa014
10. Guan WJ, Ni ZY, Hu Y, Liang WH, Ou CQ, He JX, Liu L, Shan H, Lei CL, Hui DSC, Du B, Li LJ, Zeng G, Yuen KY, Chen RC, Tang CL, Wang T, Chen PY, Xiang J, Li SY, Wang JL, Liang ZJ, Peng YX, Wei L, Liu Y, Hu YH, Peng P, Wang JM, Liu JY, Chen Z, Li G, Zheng ZJ, Qiu SQ, Luo J, Ye CJ, Zhu SY, Zhong NS, China Medical Treatment Expert Group for Covid-19 (2020) Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med 382: 1708-1720
  doi: 10.1056/NEJMoa2002032
11. Harel N, Meir M, Gophna U, Stern A (2019) Direct sequencing of RNA with MinION nanopore: detecting mutations based on associations. Nucleic Acids Res 47: e148
  doi: 10.1093/nar/gkz907
12. Kim D, Lee JY, Yang JS, Kim JW, Kim VN, Chang H (2020) The architecture of SARS-CoV-2 transcriptome. Cell 181: 1-8
  doi: 10.1016/j.cell.2020.03.025
13. Jain M, Olsen HE, Paten B, Akeson M (2016) The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol 17: 1-11
  doi: 10.1186/s13059-015-0866-z
14. James P, Stoddart D, Harrington ED, Beaulaurier J, Ly L, Reid S, Turner DJ, Juul S (2020) LamPORE: rapid, accurate and highly scalable molecular screening for SARS-CoV-2 infection, based on nanopore sequencing. medRxiv. doi: https://doi.org/10.1101/2020.08.07.20161737
15. Jia L, Jiang M, Wu K, Hu J, Wang Y, Quan W, Hao M, Liu H, Wei H, Fan W, Liu W, Hu R, Wang D, Li J, Chen J, Liu D (2020) Nanopore sequencing of African swine fever virus. Sci China Life Sci 63: 160-164
  doi: 10.1007/s11427-019-9828-1
16. Kafetzopoulou LE, Pullan ST, Lemey P, Suchard MA, Ehichioya DU, Pahlmann M, Thielebein A, Hinzmann J, Oestereich L, Wozniak DM et al (2019) Metagenomic sequencing at the epicenter of the Nigeria 2018 Lassa fever outbreak. Science 363: 74-77
  doi: 10.1126/science.aau9343
17. Katoh K, Rozewicki J, Yamada KD (2018) MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform 20: 1160-1166
  doi: 10.1093/bib/bbx108
18. Lam TT, Jia N, Zhang YW, Shum MH, Jiang JF, Zhu HC, Tong YG, Shi YX, Ni XB, Liao YS, Li WJ, Jiang BG, Wei W, Yuan TT, Zheng K, Cui XM, Li J, Pei GQ, Qiang X, Cheung WY, Li LF, Sun FF, Qin S, Huang JC, Leung GM, Holmes EC, Hu YL, Guan Y, Cao WC (2020) Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins. Nature 583: 282-285
  doi: 10.1038/s41586-020-2169-0
19. Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34: 3094-3100
  doi: 10.1093/bioinformatics/bty191
20. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078-2079
  doi: 10.1093/bioinformatics/btp352
21. Lu J, du Plessis L, Liu Z, Hill V, Kang M, Lin H, Sun J, François S, Kraemer MUG, Faria NR, McCrone JT, Peng J, Xiong Q, Yuan R, Zeng L, Zhou P, Liang C, Yi L, Liu J, Xiao J, Hu J, Liu T, Ma W, Li W, Su J, Zheng H, Peng B, Fang S, Su W, Li K, Sun R, Bai R, Tang X, Liang M, Quick J, Song T, Rambaut A, Loman N, Raghwani J, Pybus OG, Ke C (2020) Genomic epidemiology of SARS-CoV-2 in Guangdong Province, China. Cell 181: 997-1003
  doi: 10.1016/j.cell.2020.04.023
22. Lu X, Zhang L, Du H, Zhang J, Li YY, Qu J, Zhang W, Wang Y, Bao S, Li Y, Wu C, Liu H, Liu D, Shao J, Peng X, Yang Y, Liu Z, Xiang Y, Zhang F, Silva RM, Pinkerton KE, Shen K, Xiao H, Xu S, Wong GWK, Chinese Pediatric Novel Coronavirus Study Team (2020) SARS-CoV-2 infection in children. N Engl J Med 382: 1663-1665
  doi: 10.1056/NEJMc2005073
23. Ma L, Xie W, Li D, Shi L, Mao Y, Xiong Y, Zhang Y, Zhang M (2020) Effect of SARS-CoV-2 infection upon male gonadal function: a single center-based study. medRxiv. doi: https://doi.org/10.1101/2020.03.21.20037267
24. Ni M, Chen C, Qian J, Xiao HX, Shi WF, Luo Y, Wang HY, Li Z, Wu J, Xu PS, Chen SH, Wong G, Bi Y, Xia ZP, Li W, Lu H, Ma J, Tong YG, Zeng H, Wang SQ, Gao GF, Bo XC, Liu D (2016) Intra-host dynamics of Ebola virus during 2014. Nat Microbiol 1: 16151
  doi: 10.1038/nmicrobiol.2016.151
25. Park WB, Kwon NJ, Choi SJ, Kang CK, Choe PG, Kim JY, Yun J, Lee GW, Seong MW, Kim NJ, Seo JS, Oh MD (2020) Virus isolation from the first patient with SARS-CoV-2 in Korea. J Korean Med Sci 35: 10-14
  doi: 10.3346/jkms.2020.35.e10
26. Quick J, Grubaugh ND, Pullan ST, Claro IM, Smith AD, Gangavarapu K, Oliveira G, Robles-Sikisaka R, Rogers TF, Beutler NA, Burton DR, Lewis-Ximenez LL, De Jesus JG, Giovanetti M, Hill SC, Black A, Bedford T, Carroll MW, Nunes M, Alcantara LC, Sabino EC, Baylis SA, Faria NR, Loose M, Simpson JT, Pybus OG, Andersen KG, Loman NJ (2017) Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat Protoc 12: 1261-1266
  doi: 10.1038/nprot.2017.066
27. Quick J, Loman NJ, Duraffour S, Simpson JT, Severi E, Cowley L, Bore JA, Koundouno R, Dudas G, Mikhail A, Ouédraogo N, Afrough B, Bah A, Carrol MW (2016) Real-time, portable genome sequencing for Ebola surveillance. Nature 530: 228-232
  doi: 10.1038/nature16996
28. Shen W, Le S, Li Y, Hu F (2016) SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS ONE 11: e0163962
  doi: 10.1371/journal.pone.0163962
29. Wood DE, Salzberg SL (2014) Kraken: Ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15: R46
  doi: 10.1186/gb-2014-15-3-r46
30. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ (2020) A new coronavirus associated with human respiratory disease in China. Nature 579: 265-269
  doi: 10.1038/s41586-020-2008-3
31. Zeng JH, Liu YX, Yuan J, Wang FX, Wu WB, Li JX, Wang LF, Gao H, Wang Y, Dong CF, Li YJ, Xie XJ, Feng C, Liu L (2020) First case of COVID-19 infection with fulminant myocarditis complication: case report and insights. Infection 48: 773-777
  doi: 10.1007/s15010-020-01424-5
32. Zhang T, Wu Q, Zhang Z (2020) Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak. Curr Biol 30: 1-6
  doi: 10.1016/j.cub.2019.10.048
33. Zhao B, Ni C, Gao R, Wang Y, Yang L, Wei J, Lv T, Liang J, Zhang Q, Xu W, Xie Y, Wang X, Yuan Z, Liang J, Zhang R, Lin X (2020) Recapitulation of SARS-CoV-2 infection and cholangiocyte damage with human liver ductal organoids. Protein Cell 11: 771-775
  doi: 10.1007/s13238-020-00718-6
34. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, Si H-R, Zhu Y, Li B, Huang CL, Chen HD, Chen J, Luo Y, Guo H, Jiang RD, Liu MQ, Chen Y, Shen XR, Wang X, Zheng XS, Zhao K, Chen QJ, Deng F, Liu LL, Yan B, Zhan FX, Wang YY, Xiao GF, Shi ZL (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579: 270-273
  doi: 10.1038/s41586-020-2012-7

Introduction

Since the first case of coronavirus disease 2019 (COVID-19) was reported as "pneumonia of unknown etiology" in early December 2019, the epidemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread around the world. As of Feb 24, 2021, 223 countries had been affected, with 111,762,965 confirmed cases and 2,479,678 deaths (https://www.who.int/emergencies/diseases/novel-coronavirus-2019). The virus can infect people of almost all ages (Chen et al. 2020; Guan et al. 2020; Lu X et al. 2020). Although the mortality rate of SARS-CoV-2 is lower than that of severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) (2.2% vs 10% and 36%; De Wit et al. 2016), COVID-19 is often accompanied by damage to multiple organs, including the myocardium (Zeng et al. 2020), kidney (Diao et al. 2020), liver (Bangash et al. 2020; Zhao et al. 2020), male gonads (Ma et al. 2020), and even the central nervous system, which may have lifelong effects after recovery from infection. Many issues remain to be addressed. For instance, the origin of the outbreak remains unclear, although some related viruses have been discovered in bats and pangolins (Lam et al. 2020; Zhang et al. 2020; Zhou et al. 2020). The global transmission pattern and genomic diversity are also essential for elucidating the dynamics of the pandemic.

Genome sequencing is an effective way to characterize a virus and thus to uncover its evolution. For example, in the Ebola outbreak in West Africa from 2013 to 2016, it was possible to reconstruct the spread, proliferation and decline of Ebola virus by analyzing the genome sequences of more than 5% of known cases (Dudas et al. 2017). By comparison, more SARS-CoV-2 genome sequences are still required, as genomes for only about 0.54% of cases have been deposited in public databases (including GISAID, GenBank, and NGDC). Therefore, an approach for rapid acquisition of SARS-CoV-2 genomes is needed. Previous studies have shown that Oxford Nanopore sequencing has unique advantages for rapid access to pathogen genomes during outbreaks of infectious diseases such as Lassa fever (Kafetzopoulou et al. 2019), Zika (Quick et al. 2017), Ebola (Quick et al. 2016), and African swine fever (Jia et al. 2020). Meanwhile, target amplification helps to obtain high-quality virus genomes from clinical samples (Ni et al. 2016; Chen et al. 2018; Faria et al. 2018). In this study, we further modified an approach that integrates SARS-CoV-2 target amplification with Oxford Nanopore sequencing, and evaluated it on a set of clinical samples.

Materials and Methods

Sample Collection and Preparation

Respiratory specimens (swabs) collected from patients admitted to various Wuhan health care facilities were immediately placed into sterile tubes containing 3 mL of viral transport media (VTM). The swabs were inactivated by heating at 56 ℃ for 30 min in a biosafety level 2 (BSL-2) laboratory at the Wuhan Institute of Virology, Zhengdian Park, by staff wearing personal protective equipment appropriate for a biosafety level 3 laboratory, following the guidelines for detecting COVID-19 nucleic acid in clinical samples. Samples that were not used immediately were stored at 4 ℃.

Nucleic Acid Extraction and Viral Nucleic Acid Detection

Total nucleic acids were extracted using the QIAamp® 96 Virus QIAcube HT Kit on a QIAxtractor automated extraction instrument (Qiagen, Hilden, Germany) following the manufacturer's instructions. A commercial SARS-CoV-2 nucleic acid detection kit (Jienuo Company, Shanghai, China) was used. The kit is a one-step RT-qPCR kit designed to target the open reading frame 1ab (ORF1ab) and nucleocapsid protein (N) genes of the coronavirus. The N gene probe was labeled with a FAM reporter dye, while the ORF1ab gene probe was labeled with a Texas Red reporter dye. The 20 μL reaction mixture consisted of 18 μL of freshly prepared mix and 2 μL RNA template. The one-step RT-qPCR protocol was run on a Bio-Rad CFX 96 instrument under the following conditions: 42 ℃ for 5 min, 95 ℃ for 10 s, followed by 45 cycles of 95 ℃ for 10 s and 60 ℃ for 45 s with fluorescence reading. Only positive or suspected positive samples were sent for sequencing.

Amplicon Nanopore Sequencing of SARS-CoV-2

The nucleic acids were denatured at 65 ℃ for 5 min in two 23 μL reactions, each containing 5 μL template RNA, 3.6 μL of primer pool 1 or pool 2, and 13.65 μL nuclease-free water.

Then a one-step reverse transcription and amplicon amplification procedure was performed in a 50 μL total volume reaction for each pool, containing 2× reaction mix (Thermo Fisher Scientific Inc., Massachusetts, USA), 2 μL SuperScript™ Ⅲ RT/Platinum™ Taq Mix and the 23 μL reaction solution from the previous step. The PCR program was as follows: reverse transcription at 50 ℃ for 45 min, thermal activation at 98 ℃ for 30 s, followed by 35 cycles of denaturation at 98 ℃ for 15 s and annealing at 65 ℃ for 5 min.
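As a rough check on turnaround time, the hold times of the cycling program described above can be summed. The following is a minimal Python sketch, assuming no ramp or hold time between steps (real instruments add some); the function and parameter names are illustrative and not part of the protocol.

```python
def program_minutes(rt_min, activation_s, cycles, denature_s, anneal_min):
    """Sum the nominal hold times of a one-step RT-PCR program, in minutes.

    Ramp times between temperatures are ignored (an assumption).
    """
    per_cycle_min = denature_s / 60 + anneal_min
    return rt_min + activation_s / 60 + cycles * per_cycle_min


# Values taken from the program described in the text.
total = program_minutes(rt_min=45, activation_s=30, cycles=35,
                        denature_s=15, anneal_min=5)
print(f"Nominal program time: {total:.2f} min ({total / 60:.1f} h)")
```

Under these assumptions the amplification step takes a little under four hours of nominal hold time, dominated by the 35 five-minute annealing steps.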

The pool 1 and pool 2 amplification products of each sample were mixed, purified with Agencourt AMPure XP beads at a 1:1 ratio, and eluted in 30 μL EB buffer. A 1 μL aliquot of the purified DNA amplicons was used for quantification by Qubit (Qubit™ dsDNA HS Assay Kit). The amplicons of each sample were then diluted to 1 ng/μL, and 5 ng was used for Nanopore library construction. Sequencing library preparation consisted of two steps: native barcode ligation and sequencing adapter ligation. For native barcoding, end preparation was performed in a 15 μL volume reaction (5 μL DNA amplicons, 7.5 μL nuclease-free water, 1.75 μL Ultra Ⅱ End Prep Reaction Buffer and 0.75 μL Ultra Ⅱ End Prep Enzyme Mix) for 10 min at room temperature, 5 min at 65 ℃ and 1 min on ice, followed by barcode ligation in a 50.5 μL total volume reaction (2.5 μL NBXX barcode, 17.5 μL Ultra Ⅱ Ligation Master Mix, 0.5 μL Ligation Enhancer and the previous 15 μL reaction solution) for 15 min at room temperature, 10 min at 70 ℃ and a final 1 min on ice. All barcoded amplicons were then pooled into one tube and quantified using Qubit. The sequencing adapter was ligated in a 50 μL volume reaction containing 30 μL of the barcoded amplicon pool, 5× NEBNext Quick Ligation Reaction Buffer, 5 μL AMⅡ adapter mix, and 5 μL Quick T4 DNA Ligase. The ligation reaction was performed at room temperature for 15 min. The library was purified using AMPure XP beads and quantified using Qubit.

Sequencing was performed on the PromethION platform; the final library was loaded onto the flow cell according to the manufacturer's instructions. ONT MinKNOW software was used to collect raw sequencing data, and Guppy was used for local basecalling of the raw data after sequencing runs were completed. Only reads with a mean quality score greater than seven were retained for subsequent analyses.

Raw Data Processing

First, sequencing data from the PromethION were filtered to remove low-quality reads with a mean quality score below seven. Second, the remaining reads (mean quality score greater than seven) were filtered with SeqKit (Shen et al. 2016) to remove reads shorter than 400 bp or longer than 600 bp. Third, demultiplexing and adapter trimming were performed with qcat. Data that passed these three steps were treated as high-quality data for subsequent analysis.
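The quality and length filters described above can be illustrated with a small pure-Python sketch. This is not the SeqKit or Guppy implementation: the function names are hypothetical, the input is assumed to be well-formed four-line FASTQ with Phred+33 qualities, and the mean quality is assumed to be averaged in error-probability space (one common convention for Nanopore reads).

```python
import math


def mean_quality(qual, offset=33):
    """Mean Phred quality of a read, averaged in error-probability space."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def keep_read(seq, qual, min_q=7.0, min_len=400, max_len=600):
    """Keep reads of 400-600 bp whose mean quality exceeds 7."""
    return min_len <= len(seq) <= max_len and mean_quality(qual) > min_q


def filter_fastq(lines):
    """Yield (header, seq, qual) for 4-line FASTQ records that pass."""
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        seq, qual = seq.strip(), qual.strip()
        if keep_read(seq, qual):
            yield header.strip(), seq, qual


# Synthetic records for illustration only (not study data).
records = [
    "@read_ok", "A" * 500, "+", "I" * 500,     # Q40, 500 bp: kept
    "@read_lowq", "A" * 500, "+", "#" * 500,   # Q2, 500 bp: dropped
    "@read_short", "A" * 300, "+", "I" * 300,  # Q40, 300 bp: dropped
]
kept = [header for header, _, _ in filter_fastq(records)]
print(kept)
```

In practice the same records would be read from a FASTQ file handle; the list of strings here only keeps the example self-contained.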

Sequencing Data Quality Assessment

The quality of the high-quality sequencing data for each sample was assessed using the following indicators: total bases, total read number, read length distribution, mapped read number, read mapping rate, coverage, mean sequencing depth, and median depth relative to the reference genome. Total bases, total read number, and read length were calculated with Perl commands. Reads were mapped using Minimap2 (Li 2018) (with the parameter -x map-ont) against the genome sequence of IVDC-HB-01 (GISAID accession number: EPI_ISL_402119), and the alignments were filtered with SAMtools (Li et al. 2009) (with the parameters -F 3840 -q 60). The number of mapped reads, the coverage length, and the depth at every site were then obtained using SAMtools (Li et al. 2009). The mapping rate was calculated from the total and mapped read numbers, the coverage rate of each sample was calculated from the coverage length, and the average and median depths were calculated from the per-site depth using R.
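The summary statistics described above can be sketched in Python (the study itself used Perl and R). The sketch assumes the per-site depths are already available as a list of integers, one per reference position, and treats a site as covered when its depth is at least 1; that threshold and all names are assumptions. The SAM flag value 3840 is the sum of the secondary (256), QC-failed (512), duplicate (1024) and supplementary (2048) flag bits.

```python
from statistics import mean, median

# -F 3840 excludes alignments with any of these SAM flag bits set.
EXCLUDED_FLAGS = 0x100 | 0x200 | 0x400 | 0x800  # secondary, QC fail, dup, supplementary


def summarize_depth(depths, min_depth=1):
    """Coverage fraction, mean depth and median depth from per-site depths."""
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "coverage": covered / len(depths),
        "mean_depth": mean(depths),
        "median_depth": median(depths),
    }


def mapping_rate(mapped_reads, total_reads):
    """Fraction of reads that mapped to the reference."""
    return mapped_reads / total_reads if total_reads else 0.0


# Toy per-site depths for a 10-site "genome" (illustrative values only).
toy_depths = [0, 0, 120, 150, 200, 90, 0, 300, 110, 130]
stats = summarize_depth(toy_depths)
print(stats, mapping_rate(950, 1000), EXCLUDED_FLAGS)
```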

Correlation Test Between Ct Value and Different Data Quality Indexes

The correlation between Ct values and the data quality indicators (total bases, mapping rate of reads, coverage, average depth, median depth) was examined using SPSS. First, each data set was tested for normality, judged by skewness, kurtosis, the significance value of the Shapiro–Wilk test, and the Q-Q plot. Second, the Pearson correlation test was carried out for pairs of data sets that both followed a normal distribution, and the Spearman correlation test was performed for data that did not follow a normal distribution.
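The choice between the two coefficients can be sketched as follows. This is a pure-Python illustration rather than the SPSS procedure: the normality decision is passed in as a flag (the Shapiro–Wilk test is not reimplemented here), Spearman's coefficient is computed as Pearson's coefficient on average ranks, and the function names and toy numbers are assumptions.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlate(x, y, both_normal):
    """Use Pearson when both samples look normal, otherwise Spearman."""
    if both_normal:
        return "pearson", pearson(x, y)
    return "spearman", spearman(x, y)


# Toy values (not study data): y decreases monotonically as x increases.
method, rho = correlate([1, 2, 3, 4, 5], [10, 9, 7, 4, 0], both_normal=False)
print(method, rho)
```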

Acquisition of SARS-CoV-2 Genome

SNPs were called from the mapping results using Medaka (with the filter criteria ref_prob ≤ 0.01, QUAL ≥ 28, DP ≥ 15, AF ≥ 0.6, or ref_prob ≤ 0.06, QUAL ≥ 17, DP ≥ 30, AF ≥ 0.8) and the command bcftools mpileup (with filter criteria of 10× depth and a frequency of 0.6), followed by manual verification. Consensus sequences were then generated using the script margin_cons.py (Quick et al. 2017) (https://github.com/artic-network/fieldbioinformatics/blob/master/artic/margin_cons.py). The resulting consensus sequences were used as the genome sequences.
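The two sets of thresholds can be written as simple predicates. The sketch below is one reading of the criteria, assuming that the four conditions within each Medaka set must all hold and that the two sets are alternatives; it takes plain numbers rather than parsing a VCF, and the function names are hypothetical.

```python
def passes_medaka_filter(ref_prob, qual, dp, af):
    """True if a call meets either of the two threshold sets in the text."""
    set_one = ref_prob <= 0.01 and qual >= 28 and dp >= 15 and af >= 0.6
    set_two = ref_prob <= 0.06 and qual >= 17 and dp >= 30 and af >= 0.8
    return set_one or set_two


def passes_pileup_filter(dp, af, min_dp=10, min_af=0.6):
    """True if a pileup-based call has at least 10x depth and frequency 0.6."""
    return dp >= min_dp and af >= min_af


# Hypothetical calls: (ref_prob, QUAL, DP, AF)
candidates = [
    (0.005, 30, 20, 0.70),  # passes the first set
    (0.050, 20, 40, 0.90),  # passes the second set
    (0.050, 20, 20, 0.90),  # fails both (depth too low for the second set)
]
decisions = [passes_medaka_filter(*c) for c in candidates]
print(decisions)
```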

Nucleotide and Amino Acid Substitution Recognition

The reference genome sequence of IVDC-HB-01 (GISAID accession number: EPI_ISL_402119) and the 38 genome sequences provided in the present study were aligned using MAFFT (Katoh et al. 2018). The alignment was then used as input to a Perl script (available at https://github.com/zer0liu/bioutils/blob/master/snp/), which was developed to identify site variations relative to the reference and, using the annotation of coding regions, to classify each mutation as synonymous or non-synonymous.
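The idea behind this comparison can be sketched in Python. This is not the Perl script referenced above: it assumes two already aligned, gap-free sequences of equal length, a single coding region given by hypothetical 0-based coordinates, the standard genetic code, and that each substitution is evaluated on its own against the reference codon.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def find_substitutions(ref, query, cds_start, cds_end):
    """List (1-based position, ref base, alt base, effect) for each difference.

    cds_start and cds_end are 0-based, end-exclusive coordinates of one
    coding region whose first codon starts at cds_start (an assumption).
    """
    ref, query = ref.upper(), query.upper()
    calls = []
    for pos, (r, q) in enumerate(zip(ref, query)):
        if r == q or r not in BASES or q not in BASES:
            continue  # identical site, gap, or ambiguous base
        effect = "non-coding"
        if cds_start <= pos < cds_end:
            offset = (pos - cds_start) % 3
            start = pos - offset
            ref_codon = ref[start:start + 3]
            alt_codon = ref_codon[:offset] + q + ref_codon[offset + 1:]
            ref_aa = CODON_TABLE.get(ref_codon)
            alt_aa = CODON_TABLE.get(alt_codon)
            if ref_aa is None or alt_aa is None:
                effect = "unknown"
            elif ref_aa == alt_aa:
                effect = "synonymous"
            else:
                codon_number = (pos - cds_start) // 3 + 1
                effect = f"non-synonymous ({ref_aa}{codon_number}{alt_aa})"
        calls.append((pos + 1, r, q, effect))
    return calls


# Toy sequences: 2 nt upstream, CDS ATG CTT GAT TAA, 2 nt downstream.
toy_ref = "CCATGCTTGATTAAGG"
toy_query = "TCATGCTCAATTAAGG"
calls = find_substitutions(toy_ref, toy_query, cds_start=2, cds_end=14)
for call in calls:
    print(call)
```

A real genome has many coding regions (and, for ORF1ab, a ribosomal frameshift), so a production script needs a full annotation rather than a single coordinate pair.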

Validation of Variation Sites and Mutated Allele Frequency between Oxford-Nanopore Sequencing and MiSeq Sequencing

Amplicons of six selected samples (those with more SNPs than the others) were directly ligated to Illumina sequencing adapters using the VAHTS™ Universal DNA Library Prep Kit for Illumina V3 (Vazyme Cat. ND607-01), and sequencing was then conducted on the MiSeq platform. The sequencing data were mapped to the reference genome of USA-CA1 (GenBank accession number: MN994467.1), which has more variant sites compared with the early Chinese isolate, and SNP calling was performed using previously developed methods (Ni et al. 2016) (using standards of minor freq < 0.2 and major allele positive stripe to [0.0 to 1]). The mutation sites and frequencies obtained by the two sequencing methods were then compared for evaluation.

Accession Codes

All the genome sequences of SARS-CoV-2 sequenced in the present study have been deposited in NGDC (accession no. GWHALPE01000000-GWHALPT01000000 and GWHALRI01000000-GWHALSH01000000) and GISAID (accession no. EPI_ISL_493149-EPI_ISL_493190).

Discussion

When dealing with emerging infectious diseases, acquiring the genome of the causative pathogen is a top priority in the early epidemic response, as virus genomics is one of the most direct ways to understand the etiology of an emerging infectious disease. Moreover, viral genome information helps researchers carry out pathogen identification, analysis of important genes, origin tracing, dynamic tracking of transmission, and epidemic forecasting. For the COVID-19 epidemic caused by SARS-CoV-2, sequencing clinical samples directly is the fastest way to obtain the viral genome, because steps such as cell culture (Kim et al. 2020) and virus isolation (Park et al. 2020) can be omitted. The most commonly used method has been metagenomic next-generation sequencing (mNGS) (Zhou et al. 2020), which has unique advantages in screening for unknown pathogens. For known pathogens, however, mNGS needs more data than amplicon sequencing to recover a genome, because clinical samples such as those from COVID-19 patients (mostly oropharyngeal swabs) often have a low viral nucleic acid load and a complex background, which increases sequencing and analysis costs to some extent. In addition, because complete genome coverage cannot be guaranteed, Sanger sequencing is often used to fill in the gaps, which increases workload and time cost.

With the continuous development of sequencing technology, Oxford Nanopore sequencing has become a powerful means for the rapid detection of pathogens. Its speed, portability, and real-time output have given it an important role in outbreaks of Lassa fever (Kafetzopoulou et al. 2019), Zika (Quick et al. 2017), Ebola (Quick et al. 2016), and other infectious diseases. In the current COVID-19 pandemic, several research teams have improved SARS-CoV-2 whole-genome sequencing (WGS) methods in other ways, such as increasing the length of amplicons to reduce costs (Freed et al. 2020), combining multi-target amplification with rapid barcoded library preparation to save time (James et al. 2020), and using transposase-mediated addition of adapters and PCR-based addition of symmetric barcodes to increase throughput (Baker et al. 2020). In the present study, we merged reverse transcription (RT) PCR and amplicon amplification into one step to shorten the experiment time, and comprehensively evaluated this type of amplicon-Nanopore sequencing technology.

First of all, the results demonstrated that a high-quality SARS-CoV-2 genome covering all ORF regions could be obtained from clinical samples within 24 h (Figs. 1, 3). Because the coverage and depth of the viral genome were evaluated as a function of sequencing time, recommended sequencing times could be given for samples in different Ct value ranges (Fig. 3C–3G), which can effectively guide sequencing scheduling. Moreover, the throughput of Nanopore sequencers can meet the needs of large-scale sequencing: a single Nanopore PromethION 48 sequencer can process more than 1000 samples per day. According to our rough estimate, the total cost of nucleic acid detection and sequencing for a sample is less than $170, which is comparable to generally accepted low-cost mNGS sequencing. In addition, there is still room for further improvement in genome recovery, as we used the V1 primers published by the ARTIC network, and the ARTIC team is constantly optimizing the primer pool. With the current version alone, the Ct values of the clinical pharyngeal swab samples we evaluated ranged from 18.74 to 39.14 (Supplementary Table S1). Compared with other studies using Nanopore sequencing (Baker et al. 2020; Lu J et al. 2020), the 42 samples sequenced in this study showed better results: 38 samples (90.5%) covered more than about 90% of the genome at a sequencing depth of more than 100×, and the other 4 samples (9.5%) covered more than 70% of the genome at a sequencing depth over 100×. Furthermore, Illumina sequencing verified that the high single-read error rate of Nanopore sequencing (Jain et al. 2016) could be reduced or even completely eliminated via ultra-deep sequencing on the Nanopore platform: in the six verification samples, the overall error rate was less than 0.4 per 10,000 bp, and 2/3 of the samples were 100% accurate (Fig. 5). Backtracking through the experimental process suggested that the false positives or false negatives in the remaining 1/3 of the samples may have been caused by low sample quality. Since current SARS-CoV-2 genome variations are mostly random mutations (Supplementary Fig. S1), systematic errors in genome sequences obtained by Nanopore sequencing are negligible in large-scale genomic analysis.
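One way to see why depth helps is a simple majority-vote model, sketched below in Python. It is an idealization and not the analysis performed in this study: it assumes each read is wrong at a given site independently with the same probability, that all wrong reads agree on the same wrong base, and that the consensus is wrong only when more than half of the reads are wrong. The per-read error rate of 0.1 is a hypothetical value chosen for illustration.

```python
from math import comb


def consensus_error(depth, p):
    """P(more than half of `depth` reads are wrong at a site), binomial model.

    With an even depth, an exact tie is counted as a correct consensus.
    """
    threshold = depth // 2 + 1
    return sum(comb(depth, k) * p ** k * (1 - p) ** (depth - k)
               for k in range(threshold, depth + 1))


# Hypothetical per-read error rate of 10% at increasing depths.
for depth in (1, 11, 101):
    print(depth, consensus_error(depth, 0.1))
```

Under this model the consensus error falls off very quickly with depth. Errors that recur in the same sequence context break the independence assumption, so the model gives an optimistic bound rather than a prediction.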

However, the frequency of major mutant alleles in Nanopore sequencing is significantly lower than that in Illumina sequencing (Fig. 5, Supplementary Table S3); that is, Nanopore sequencing still produces many minor mutations caused by technical errors, which is consistent with previous studies (Harel et al. 2019). Other auxiliary methods are therefore needed in the analysis of intra-host mutations (e.g., iSNV).

Acknowledgements

The study was supported by grants from the Foundation for National Mega Project on Major Infectious Disease Prevention (grant number 2017ZX10103005-005), the National Key Research and Development Program of China (2020YFC0845800 and 2020YFC0845600), and the National Natural Science Foundation of China (31970548 and 91631110). We thank the ARTIC network for publishing their amplicon primers. We also thank Lei Zhang, Ding Gao, Juan Min, Anna Du, and Dongbo Nie of the core facility and technical support at Wuhan Institute of Virology, as well as Tao Du of the National Biosafety Laboratory, Wuhan, Chinese Academy of Sciences, for assistance with the experimental platform and maintenance of the experimental environment.

Author contributions

This project was designed by Jianjun Chen, DL, HW. Samples were collected and prepared by Jianjun Chen, Jun Chen, YH, YZ, JX. Experiments were conducted by Jianjun Chen, Jun Chen, YY, KW, WQ, YL. The methods were developed by Jianjun Chen, DL, HW, YY, HL, XW, KH, ZJ, DW. The data analysis was performed by YY, KW, HL, XW. The manuscript was prepared by Jianjun Chen, DL, HW, YY. All authors read and commented on the paper.

Compliance with Ethical Standards

Conflict of interest

All authors declare that they have no conflict of interest.

Animal and Human Rights Statement

The study and use of all samples were approved by the Ethics Committee of Wuhan Pulmonary Hospital (No. 2020-LS-001). Patient consent was waived by the Ethics Committee.
