Sequence trimming
A total of 56,865,504 raw reads were obtained from the Se301 cells and 54,569,296 from the P8-Se301-C1 cells. After stringent filtering (quality, adaptor, and rRNA trimming), 55,033,958 and 52,784,058 clean reads remained for the Se301 and P8-Se301-C1 cells, respectively (Supplementary Table S2).
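The clean ratios reported in Supplementary Table S2 can be reproduced from the read counts above. The following is a minimal sketch, assuming the clean ratio is simply clean reads divided by raw reads; the function and variable names are illustrative and are not taken from the study's pipeline.

```python
def clean_ratio(raw_reads: int, clean_reads: int) -> float:
    """Return the percentage of raw reads that remain after filtering."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return 100.0 * clean_reads / raw_reads


# (raw reads, clean reads) as reported in the text and in Table S2.
samples = {
    "P8-Se301-C1": (54_569_296, 52_784_058),
    "Se301": (56_865_504, 55_033_958),
}

# Round to one decimal place, matching the precision used in Table S2.
ratios = {
    name: round(clean_ratio(raw, clean), 1)
    for name, (raw, clean) in samples.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}%")
```

Under this assumption, the computed values agree with the 96.7% and 96.8% listed in Table S2.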
De novo assembly
After primary assembly, we obtained 116,048 primary unigenes with a mean length of 1,173 nt and an N50 of 2,137 nt from the P8-Se301-C1 cells and 104,600 primary unigenes with a mean length of 1,169 nt and an N50 of 2,145 nt from the Se301 cells (Supplementary Table S3). After further assembly, we obtained 112,565 final unigenes with a mean length of 1,093 nt and an N50 of 1,824 nt from the P8-Se301-C1 cells and 102,996 final unigenes with a mean length of 1,082 nt and an N50 of 1,803 nt from the Se301 cells (Supplementary Table S4). The final set of unigenes from the P8-Se301-C1 cells comprised 24,731 unigenes (24.33%) that were ≥ 1,000 nt long and 10,229 unigenes (10.06%) that were > 2,000 nt long. The final set of unigenes from the Se301 cells comprised 29,849 unigenes (28.98%) that were ≥ 1,000 nt long and 13,949 unigenes (13.54%) that were > 2,000 nt long. The length distributions of the final unigenes from the P8-Se301-C1 and Se301 cells are presented in Supplementary Figure S1.
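For readers unfamiliar with the N50 statistic quoted above, the sketch below shows how it is conventionally computed: sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the total assembled length. The toy lengths are hypothetical and are not data from this study; the assembler used here may report the statistic with minor variations.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (not data from the study).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
toy_mean = sum(toy_lengths) / len(toy_lengths)

print(toy_n50, toy_mean)
```

In the toy example the N50 (400) exceeds the mean length (300), the same pattern seen in Supplementary Tables S3 and S4, because N50 weights long sequences more heavily than the arithmetic mean does.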
Functional annotation
Of the 112,565 final unigenes from the P8-Se301-C1 cells, 26,553 (23.6%) were annotated, and of the 102,996 final unigenes from the Se301 cells, 24,906 (24.2%) were annotated. The distributions of the BLASTX hits (searched against the UniProt database) for the P8-Se301-C1 and Se301 samples are shown in Figure 1. For both cell lines, a large number of unigenes had hits to species in the order Lepidoptera.
Figure 1. Distribution of the most frequent BLASTX hits (associated with 20 species) for the unigenes from the P8-Se301-C1 (A) and Se301 (B) cells. For both cell lines, a large number of unigenes had hits to species in the order Lepidoptera.
The main GO categories were cellular components, biological processes, and molecular functions. The terms associated with the P8-Se301-C1 sample were all from these three main GO categories and comprised 53 functional subcategories (Supplementary Figure S2). In the cellular components category, the largest proportions of unigenes were assigned to the following subcategories: cell (20.52%), cell part (20.52%), membrane (14.28%), and membrane part (11.48%). The majority of the unigenes in the biological process category were assigned to the metabolic process subcategory (26.57%) and the cellular process subcategory (23.75%). Most of the unigenes in the molecular function category were related to binding (45.90%) and catalytic activity (38.61%); these included genes encoding kinases, transferases, and hydrolases, many of which are likely to be involved in DNA replication, transcription, and translation.
To assist with the functional classification of the final set of unigenes, information on the functional classification of their homologs in the COG database was explored (Liu et al., 2016). A total of 19,825 unigenes from the P8-Se301-C1 cells were clustered into 25 COGs (Figure 2). Among them, the signal transduction mechanisms cluster was the largest (8.71%), followed by the general function prediction only cluster (6.75%). The other large clusters were transcription (5.03%), RNA processing and modification (4.48%), posttranslational modification, protein turnover and chaperones (4.40%), cytoskeleton (3.53%), and intracellular trafficking, secretion and vesicular transport (3.01%).
To identify the biological pathways that are active in the S. exigua cell lines, we mapped the 26,553 annotated sequences from P8-Se301-C1 cells to those associated with the canonical reference pathways in the KEGG database. In total, 26,553 unigenes were assigned to 290 known metabolic or signaling KEGG pathways. The top 11 KEGG pathways were metabolic pathways (2,041 unigenes), biosynthesis of antibiotics (839), ribosomes (697), biosynthesis of secondary metabolites (687), protein processing in the endoplasmic reticulum (540), microbial metabolism in diverse environments (324), purine metabolism (308), HTLV-I infection (304), the PI3K-Akt signaling pathway (283), spliceosomes (279), and RNA transport (263).
Identification of SeMNPV gene transcripts in P8-Se301-C1 cells by RNA-Seq
After the clean reads from RNA-Seq were aligned with the SeMNPV genome, ten SeMNPV gene transcripts were detected in the P8-Se301-C1 cells but not in the Se301 samples: se5 (unknown function), se7 (me53), se8 (envelope fusion protein), se12 (lef2), se43 (unknown function), se45 (ribonucleotide reductase small subunit), se89 (unknown function), se90 (unknown function), se124 and se126 (DNA binding protein).
RT-PCR was carried out to confirm the presence of the ten SeMNPV gene transcripts in the P8-Se301-C1 cells. The RT-PCR products were resolved by agarose gel electrophoresis, and fragments of the expected sizes were detected for the P8-Se301-C1 samples but not for the Se301 samples (Figure 3). The RT-PCR products were further verified by sequencing and by a BLAST search of the NCBI Nucleotide database. All the RNA samples prepared in this study were also used directly as PCR templates (in place of cDNA) with the same primers, and the results indicated that there was no DNA contamination.
Figure 3. Ten SeMNPV genes (from the RNA-Seq analysis) verified by RT-PCR. Total RNA was extracted from the P8-Se301-C1 and Se301 cells and cDNAs were constructed. Subsequently, the ten SeMNPV genes were amplified by PCR and visualized on a 1% agarose gel (the name of each gene is shown above its lane). PCRs were also performed with the RNA samples in place of the cDNA to exclude the possibility of DNA contamination.
3′/5′ RACE analyses of the SeMNPV gene transcripts
To further understand the full-length transcripts of the ten SeMNPV genes from the P8-Se301-C1 cells, both 3' and 5' RACE analyses were performed using the total RNA extracts. No gene products were detected in the first round of amplification. However, after the second round of nested PCR using the RACE amplification products, a range of fragment products were observed (Figure 4).
Figure 4. RACE analyses of the 3′ and 5′ end sequences of the SeMNPV gene transcripts in P8-Se301-C1 cells. (A) 3′ end RACE analysis. (B) 5′ end RACE analysis. Total RNA was extracted from the P8-Se301-C1 cells and 5′/3′ RACE analyses were performed to determine the nucleotide sequences of the 5′ and 3′ ends of the SeMNPV gene transcripts. The PCR products were visualized on a 1% agarose gel. No gene products were detected in the first round of amplification. However, after a second round of nested PCR on the RACE amplification products, a range of fragment products was observed. H2O was used as a control.
The fragments of the 3′ and 5′ ends of the gene transcripts were recovered and sequenced, and the sequences were then assembled (together with the sequences from the previous RNA-Seq) into full-length transcripts. The full-length sequences of the transcripts were used to carry out a BLAST search of the NCBI Nucleotide database, and six additional SeMNPV gene transcripts from the P8-Se301-C1 cells were identified: se11 (orf4 PE), se42 (unknown function), se44 (unknown function), se88 (iap-2), se91 (lef-3), and se127 (lef-6) (Figure 5A and Supplementary Figure S3). Thus, in total, sixteen viral gene transcripts that mapped to the SeMNPV genome were identified in the P8-Se301-C1 cells by a combination of RNA-Seq and RACE (Figure 5A), and the genes were found to be dispersed across the SeMNPV genome (Figure 5B).
Figure 5. (A) Chimeric SeMNPV gene transcripts from the P8-Se301-C1 cells. The fragments of the 3' and 5' ends of the ten SeMNPV gene transcripts detected by RNA-Seq (shown on the left) were sequenced and assembled into full-length cDNA sequences. Six more SeMNPV gene transcripts were detected in RACE analyses. The positions of all sixteen detected SeMNPV genes (in green) are shown aligned to the SeMNPV genome. The SeMNPV genes integrate into the host genome and the 3′ or 5′ ends of the SeMNPV gene transcripts (in green) are aligned to the host genome (in yellow). The bar at the top of the figure is marked to show the size of each of the transcripts. (B) Overview of the SeMNPV transcripts from the P8-Se301-C1 cells. The sixteen SeMNPV genes (in green) detected by RNA-Seq and RACE analyses map to the SeMNPV genome (in dark green). The circular map has been modified according to the map of the SeMNPV genome (IJkel et al., 1999).
The full-length sequences of the transcripts showed that the SeMNPV genes are incorporated into the host cell genome. The genes consequently form chimeric fusion transcripts in P8-Se301-C1 cells, and either the 3' or 5' end of each transcript is aligned with the host genome (Figure 5A and Supplementary Figure S3).
The organization of each fusion transcript is as follows (Table 1).
(1) The full-length transcript containing se5 maps to nt 6190-7713 of the SeMNPV genome. The 5' end sequence (nt 3-48) aligns to the Bombyx mori akh2 mRNA (nt 26-71), which encodes adipokinetic hormone-2, and the 3' end sequence cannot be aligned to any known sequence in the NCBI Nucleotide database.
(2) The full-length transcript containing se7 maps to nt 9261-10433 of the SeMNPV genome. The 5' end sequence cannot be aligned to any known sequence, and the 3' end sequence (nt 1373-1448) aligns to the B. mori dh40 mRNA (nt 3116-3192), which encodes diuretic hormone 40.
(3) The full-length transcript containing se8 maps to nt 12498-14495 of the SeMNPV genome, and the 5' end sequence (nt 12-60) aligns to the B. mori dh40 mRNA (nt 1035-998).
(4) se12 (which maps to nt 16064-16693 of the SeMNPV genome) and partial sequences of se11 (which map to nt 15817-15852 and nt 15923-16101 of the SeMNPV genome) are transcribed together in one full-length transcript. The 5' end sequence (nt 5-48) aligns to the B. mori akh2 mRNA (nt 27-71), and the 3' end sequence cannot be aligned to any known sequence.
(5) se43 (which maps to nt 42696-43856 of the SeMNPV genome) and a partial sequence of se42 (which maps to nt 42392-42606 of the SeMNPV genome) are transcribed together in one full-length transcript. Neither the 5' end nor the 3' end can be aligned to any known sequence.
(6) se45 (which maps to nt 44408-45394 of the SeMNPV genome) and a partial sequence of se44 (which maps to nt 44305-44312 of the SeMNPV genome) are transcribed together in one full-length transcript. The 5' end sequence cannot be aligned to any known sequence.
(7) se89 (which maps to nt 85499-86398 of the SeMNPV genome) and a partial sequence of se88 (which maps to nt 85088-85799 of the SeMNPV genome) are transcribed together in one full-length transcript. The 5' end sequence cannot be aligned to any known sequence.
(8) se124 (which maps to nt 118809-119391 of the SeMNPV genome) and partial sequences of se90 (which maps to nt 86521-86789 of the SeMNPV genome) and se91 (which maps to nt 86788-86860 of the SeMNPV genome) are transcribed together in one full-length transcript. The 5' end sequence (nt 3-50) aligns to the B. mori dh40 mRNA (nt 25-71).
(9) se126 (which maps to nt 120802-121788 of the SeMNPV genome) and a partial sequence of se127 (which maps to nt 121816-122307 of the SeMNPV genome) are transcribed together in one full-length transcript. The 5' end sequence cannot be aligned to any known sequence, and the 3' end sequence (nt 1360-1413) aligns to the B. mori sifa mRNA (nt 462-564), which encodes SIFamide.
| Transcript size (bp) | Harbored SeMNPV genes: position in the transcript | Harbored SeMNPV genes: SeMNPV gene (position in the SeMNPV genome) | 5' end sequence: position in the transcript | 5' end sequence: aligned host gene mRNA (position in the host gene) | 3' end sequence: position in the transcript | 3' end sequence: aligned host gene mRNA (position in the gene) |
|---|---|---|---|---|---|---|
| 1847 | nt 106-1634 | se5 (nt 6190-7713) | nt 3-48 | B. mori akh2 (nt 26-71) | nt 1707-1847 | No hit a |
| 1530 | nt 138-1310 | se7 (nt 9261-10433) | nt 1-130 | No hit | nt 1373-1448 | B. mori dh40 (nt 3116-3192) |
| 2136 | nt 104-2103 | se8 (nt 12498-14495) | nt 12-60 | B. mori dh40 (nt 1035-998) | - b | - |
| 978 | nt 300-933 | se12 (nt 16064-16693) | nt 5-48 | B. mori akh2 (nt 27-71) | nt 946-978 | No hit |
| | nt 55-91 | se11 (nt 15817-15852) | | | | |
| | nt 162-229 | se11 (nt 15923-16101) | | | | |
| 1691 | nt 348-1511 | se43 (nt 42696-43856) | nt 1-46 | No hit | nt 1520-1691 | No hit |
| | nt 46-261 | se42 (nt 42392-42606) | | | | |
| 1118 | nt 161-1107 | se45 (nt 44408-45394) | nt 1-61 | No hit | - | - |
| | nt 62-70 | se44 (nt 44305-44312) | | | | |
| 1370 | nt 488-671 | se89 (nt 85499-86398) | nt 1-35 | No hit | - | - |
| | nt 35-489 | se88 (nt 85088-85799) | | | | |
| 1035 | nt 407-991 | se124 (nt 118809-119391) | nt 3-50 | B. mori adh40 (nt 25-71) | nt 1007-1035 | - |
| | nt 57-323 | se90 (nt 86521-86789) | | | | |
| | nt 322-396 | se91 (nt 86788-86860) | | | | |
| 1421 | nt 102-1122 | se126 (nt 120802-121788) | nt 1-62 | No hit | nt 1360-1413 | B. mori sifa (nt 462-564) |
| | nt 1164-1354 | se127 (nt 121816-122307) | | | | |

Note: a: The end sequence cannot be aligned to any known sequence in the NCBI's Nucleotide Database.
b: The end sequence is matched to SeMNPV gene.
Table 1. The SeMNPV gene-containing full-length transcripts in P8-Se301-C1 cells
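The organization summarized in Table 1 can be thought of as a list of coordinate segments along each fusion transcript. The sketch below models the se7-containing transcript in this way and checks that the viral segment in the transcript has the same length as the corresponding span in the SeMNPV genome. The `Segment` class, the `span_length` helper, and the origin labels are hypothetical names introduced for illustration; only the coordinates come from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One region of a fusion transcript (1-based, inclusive coordinates)."""
    start: int
    end: int
    origin: str  # "virus", "host", or "no hit"
    label: str

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def span_length(a: int, b: int) -> int:
    """Length of an inclusive span, regardless of orientation."""
    return abs(b - a) + 1


# The se7-containing transcript (1530 nt), using the coordinates in Table 1.
se7_transcript_length = 1530
se7_segments = [
    Segment(1, 130, "no hit", "5' end"),
    Segment(138, 1310, "virus", "se7"),
    Segment(1373, 1448, "host", "B. mori dh40"),
]

viral_nt = sum(s.length for s in se7_segments if s.origin == "virus")
genome_nt = span_length(9261, 10433)  # se7 position in the SeMNPV genome
viral_fraction = viral_nt / se7_transcript_length

print(viral_nt, genome_nt, round(viral_fraction, 3))
```

For se7 the two lengths agree (1,173 nt each). The same check can be applied to the other rows, where a difference between the two lengths would point to a gap or an overlap in the alignment.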
Sequence trimming
De novo assembly
Functional annotation
Identification of SeMNPV gene transcripts in P8-Se301-C1 cells by RNA-Seq
3′/5′ RACE analyses of the SeMNPV gene transcripts
-
| Primer | Sequence (5'-3') | Genome site (nt) | Product size (bp) |
|---|---|---|---|
| RT-PCR | | | |
| se5-F | GCCTCTGCTATCGTTGCT | 6832-6849 | 858 |
| se5-R | CTGATCGGTGGTTTCTCC | 7689-7672 | |
| se7-F | GAGGAGATACGAGGTGATG | 9674-9692 | 753 |
| se7-R | TTTCCAAACTTTAGTGCC | 10426-10409 | |
| se8-F | CGCCAAAGACATAGTCCA | 12557-12574 | 1736 |
| se8-R | GCGTCAACATTGCCATTA | 14292-14275 | |
| se12-F | TATAGCGTTCTGTTTAGCG | 16124-16142 | 557 |
| se12-R | ATTGGATTGGTGCCTTTG | 16680-16663 | |
| se43-F | TCAGCGTCAATAGACTCAT | 43066-43084 | 651 |
| se43-R | CGAAGCGATTCATAAAGTA | 43716-43698 | |
| se45-F | ACGACGACTTTACCCAGAA | 44523-44541 | 656 |
| se45-R | ATCGGCGACAAACTCAAT | 45178-45161 | |
| se89-F | ACCAACGCCGATTGTCTG | 86155-86138 | 566 |
| se89-R | GTGCGGTGGGCATCTTCA | 85590-85607 | |
| se90-F | AGGGACCGTGTCGAAGTA | 86765-86748 | 301 |
| se90-R | CTGCCACCGTCAATAGGA | 86465-86482 | |
| se124-F | GGTTGGGTGACGTGATAC | 118937-118954 | 413 |
| se124-R | GTCGCTACATTCGTAGTTGT | 119349-119330 | |
| se126-F | TGGGATACTCAAGCCTAAA | 121203-121221 | 533 |
| se126-R | TCTCGCTCACCTTCTTATT | 121737-121756 | |
| RACE-PCR | | - | - |
| GSP-se5-F | CAAGAGGAGCCCTGGAAC | - | - |
| GSP-se5-R | GTGGAGGTAGAATACGGC | - | - |
| GSP-se7-F | GAGGAGATACGAGGTGATG | - | - |
| GSP-se7-R | CCGTGATTTCAAACCTTT | - | - |
| GSP-se8-F | CGCCAAAGACATAGTCCA | - | - |
| GSP-se8-R | ATTACCGTTACAACTGCG | - | - |
| GSP-se12-F | AGCGACATACCGTTGCAAGT | - | - |
| GSP-se12-R | GGCGATGTACGCGTTGAAAA | - | - |
| GSP-se43-F | CACCTTCGCCCTCAACAGAT | - | - |
| GSP-se43-R | AGCGCGTACAGAATGCTCTT | - | - |
| GSP-se45-F | TGTCTGCCGCACGAAAAGTA | - | - |
| GSP-se45-R | CTAAACGTCAGTCCGGGCAT | - | - |
| GSP-se89-F | CCGGCCAGTTTGCCAAATAC | - | - |
| GSP-se89-R | TCGTTTCCGCTAACGTCGAA | - | - |
| GSP-se124-F | CCAAAATCCCGACGACAACG | - | - |
| GSP-se124-R | GGTCGCGCATCATCATCAAC | - | - |
| GSP-se126-F | CGTTTCTGCGCGAATCTCTG | - | - |
| GSP-se126-R | TTGATCGCGAGCGAATACGA | - | - |
| NGSP-se5-F | TGATTTCGATGGCCTACCCG | - | - |
| NGSP-se5-R | TTTCCGGTCTGTCATCAGGC | - | - |
| NGSP-se7-F | AGCGCAGAACCGTATGTCAA | - | - |
| NGSP-se7-R | TCTCGTCTCGGTGACCGTAT | - | - |
| NGSP-se8-F | GTGCCGCATGAGCGATAAAG | - | - |
| NGSP-se8-R | TCGCTTTCGAACCTACCCAC | - | - |
| NGSP-se12-F | GAAGCGTTGACGCCGAAAAA | - | - |
| NGSP-se12-R | GCGCGGACGAACTTGAAAAT | - | - |
| NGSP-se43-F | GATTGCAGCCGTTCAAGAGC | - | - |
| NGSP-se43-R | CCCAAGGTGTACGTGTCGAT | - | - |
| NGSP-se45-F | CAGGCGTTGGATTGCATGTG | - | - |
| NGSP-se45-R | TTGACTATACTGTCGGCGGC | - | - |
| NGSP-se89-F | GGCGTCACCTTACGAGACAA | - | - |
| NGSP-se89-R | TGTCGAAGCAGCCGTACATT | - | - |
| NGSP-se124-F | CCGCCGTTAAACAACCATCG | - | - |
| NGSP-se124-R | TCTCCAAGACGACACTCCCA | - | - |
| NGSP-se126-F | GAGCATTCGTTGGTCGAAGC | - | - |
| NGSP-se126-R | AGAACTTGCGCACAAACGTC | - | - |

Table S1. Names and sequences of the primers used for the RT-PCR and 5'/3' RACE-PCR
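The product sizes listed for the RT-PCR primers in Table S1 can be cross-checked against the genome sites of each primer pair. The sketch below assumes that the expected amplicon spans the outermost coordinates of the forward and reverse primers, inclusive; the function name and the dictionary layout are illustrative only, and three primer pairs from the table are used as examples.

```python
def product_size(forward_site, reverse_site):
    """Expected amplicon length from two primer sites.

    Each site is a (start, end) tuple of genome coordinates; the order of
    the two numbers reflects primer orientation and does not matter here.
    """
    coords = [*forward_site, *reverse_site]
    return max(coords) - min(coords) + 1


# Genome sites copied from Table S1: (forward site, reverse site).
primer_sites = {
    "se5": ((6832, 6849), (7689, 7672)),
    "se8": ((12557, 12574), (14292, 14275)),
    "se89": ((86155, 86138), (85590, 85607)),
}

sizes = {gene: product_size(fwd, rev) for gene, (fwd, rev) in primer_sites.items()}

for gene, size in sizes.items():
    print(f"{gene}: {size} bp")
```

For these three pairs the computed sizes (858, 1736, and 566 bp) match the product sizes given in Table S1.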
| Cell line | Raw reads | Quality trimmed | Adaptor trimmed | rRNA trimmed | Clean ratio |
|---|---|---|---|---|---|
| P8-Se301-C1 | 54,569,296 | 54,292,372 | 53,666,292 | 52,784,058 | 96.7% |
| Se301 | 56,865,504 | 56,639,360 | 55,941,000 | 55,033,958 | 96.8% |

Table S2. Statistical results of the stringent filtering processes

| Cell line | Counts | Total length (nt) | N50 (nt) | Mean length | N% | GC% |
|---|---|---|---|---|---|---|
| P8-Se301-C1 | 116,048 | 136,121,302 | 2,137 | 1,173 | 0.0 | 37.5 |
| Se301 | 104,600 | 122,303,958 | 2,145 | 1,169 | 0.0 | 37.6 |

Table S3. Statistical results concerning the primary unigenes from P8-Se301-C1 and Se301 cells

| Cell line | Counts | Total length (nt) | N50 (nt) | Mean length | N% | GC% |
|---|---|---|---|---|---|---|
| P8-Se301-C1 | 112,565 | 121,964,401 | 1,824 | 1,093 | 0.0 | 37.3 |
| Se301 | 102,996 | 109,124,144 | 1,803 | 1,082 | 0.0 | 37.4 |

Table S4. Statistical results concerning the final unigenes from P8-Se301-C1 and Se301 cells
Figure S1. (A) The distribution of the lengths of the final unigenes from P8-Se301-C1 cells. (B) The distribution of the lengths of the final unigenes from Se301 cells.