We assembled the HTS reads of IME-SA1/2/118/119 using Newbler(version 2.9, Roche). The genome size of IME-SA1 is 140, 218 bp and the reads mapping percentage was > 93%(Table 1); the genome size of IME-SA2 is 140, 906 bp and the read match was > 97%; the genome size of IME-SA118 is 139, 750 bp and the read match was > 64%(other reads belonged to S. aureus); and the genome size of IME-SA119 is 141, 026 bp and the read match was > 97%.
Bacteriophage Length (bp) # Matched Reads # Total Reads Match Percentage Min Coverage Max Coverage IME-SA1 140, 218 2, 251, 761 2, 405, 205 93.62% 676 5, 516 IME-SA2 140, 906 296, 039 288, 122 97.32% 194 1, 762 IME-SA118 139, 750 91, 643 142, 123 64.48% 36 529 IME-SA119 141, 026 2, 571, 317 2, 642, 206 97.32% 107 88, 881
Table 1. Summary of IME-SA1/2/118/1192 genome sequencing and assembly.
All the HTS reads from IME-SA1/2/118/119 were analyzed and ranked in descending order of HTS reads occurrence frequencies. As Figure 1 shows, IME-SA1/2/118/119 HTS read data all have one significant high-occurrence frequency read beginning with GGAATTCTTTTACCTCTCTC. The HTS data for the four phages share similar sequence occurrence frequency curves. More than 99% of reads have occurrence frequencies less than 299(IME-SA1), 134(IME-SA2), 21(IME-SA118) and 5183(IME-SA119)(Figure 2). Based on the termini analysis theory(Zhang et al., 2015), the reads with the highest occurrence frequencies represent the termini of IME-SA1/2/118/119. The ratios of the occurrence frequency of the terminal reads to general reads are 184(IME-SA1), 650(IME-SA2), 85(IME-SA118) and 3699(IME-SA119)(Table 2). Based on the phenomenon that IME-SA1/2/118/119 had only one significant high-occurrence frequency read, we hypothesize that Twortlikevirus S. aureus phages have conserved 5' terminus while the 3' terminus is variable.
Figure 1. Read occurrence distribution. Circles indicate the initial 20 bases of HTS reads of IME-SA1 (A), IME-SA2 (B), IME-SA118 (C) and IME-SA119 samples (D). The rank in the x-axis refers to the relative frequency of each read.
Figure 2. Occurrence rates of numbers of reads. Bars indicate the initial 20 bases of HTS reads of IME-SA1 (A), IME-SA2 (B), IME-SA118 (C) and IME-SA119 samples (D). The number on the x-axis represents the frequency from the last number to the next number (e.g. 593 represent the frequency between 299 and 593).
Bacteriophage Strand Ave. Freq. Ter. Freq. Occurrence Frequency Ratio Terminal Sequence IME-SA1 Positive 8 1473 184 GGAATTCTTTTACCTCTCTC IME-SA2 Positive 1 650 650 GGAATTCTTTTACCTCTCTC IME-SA118 Positive 1 85 85 GGAATTCTTTTACCTCTCTC IME-SA119 Positive 7 25894 3699 GGAATTCTTTTACCTCTCTC Note: Ave. Freq. – Average Frequency; Ter. Freq. – Terminal Frequency.
Table 2. Occurrence frequency of IME-SA1/2/118/119 terminal sequences.
Using the complete genomes of IME-SA1/2/118/119, a phylogenetic tree was constructed with these four phages and 19 other similar phages, using MEGA6 software for Windows(Figure 3). IME-SA1/2/118/119 grouped with Twortlikevirus S. phage G1 species. According to the International Committee on Taxonomy of Viruses(ICTV)Virus Taxonomy 2014 release(ICTV, 2014), the genus Twortlikevirus includes two species, Staphylococcus phage G1 and Staphylococcus phage K. The two species share similar terminal features. Figure 4 shows that all phages of the two species have conserved termini.
Figure 3. Neighbor-joining tree analysis based on the alignment of single nucleotide polymorphisms (SNPs) from Staphylococcus phage sequences available at NCBI. The numbers at nodes indicate bootstrap values
Figure 4. Comparison of the variable region in Twortlikevirus S. phages. The phages of species G1 are highlighted using the red dashed rectangle, while the K species are phages No. 17 and 18. The green bars represent the identical sequences in all sequences, while black bars show the identical sequence in some of the sequences. Numbers on the black bars illustrate the length of consensus sequence.
IME-SA1/2/118/119 genome analysis illustrated that the four phages have similar complete genome lengths(Table 1), all belong to G1 species(Figure 3), and that they share identical terminal sequences(Table 2) and similar terminal features(Figure 4). Thus IME-SA1/ 2/118/119 have similar characteristics. We conducted transmission electron microscopy of IME-SA2 to reveal their morphology. IME-SA2 had an isometric head(diameter about 60 nm) and a non-contractile tail(length about 200 nm)(Figure 5). According to ICTV guide-lines, IME-SA1/2/118/119 were classified as members of the Siphoviridae family, order Caudovirales.
From sequence analysis, we identified a variable region with a specific pattern just before the conserved terminal sequence of Twortlikevirus S. G1 species. The pattern is shown in formula(1).
*: repeats multiple times.
The details of the variable regions are shown in Supplementary Figure S1. As Supplementary Figure S1 shows, variable regions are found in all Twortlikevirus S. G1 species phages, but in neither of the Twortlikevirus S. K species. The TAC subunit repeats occur in variable numbers.
Subsequently, we cloned the phages IME-SA1 and IME-SA2, based on a previously described protocol(Green and Sambrook, 2012). Then, we chose different clones of IME-SA1 and IME-SA2 to perform terminal sequence PCR. Among the 30 sequenced clones of IME-SA1, four(IME-SA1-15, IME-SA1-18, IME-SA1-23, and IME-SA1-26) did not have the consensus IME-SA1 standard terminal sequence(Figure 6A). Similarly, among the 10 sequenced clones of IME-SA2, three(IME-SA2-6, IME-SA2-8, and IME-SA2-9) varied from the IME-SA2 standard sequence(Figure 6B).
Figure 6. PCR results of variable terminal regions in different phage clones. (A, B) Different terminal length of phage clones IME-SA1, IME-SA2. (C–E) Different terminal length of parents and offspring phage clones IME-SA1-23, IME-SA1-26, and IME-SA1-15 during successive generations. "M" represents the marker.
To further analyze the IME-SA1 clones that differed from the consensus IME-SA1 sequence, we cultured the respective IME-SA1 clones with the longest, average, and shortest variable sequence and then amplified their termini from the respective genomic DNA isolated from various generations using PCR. The clone with the longest terminus(IME-SA1-23) had variable terminal lengths in successive generations(Figure 6C). However, the clone with the shortest terminus(IME-SA1-26) and that with the average length terminus(IME-SA1-15) maintained the same terminal length as themselves during successive generations(Figure 6D, 6E).
Sequencing the variable region of four IME-SA1 and three IME-SA2 clones showed that subunit TAAGTA CCTTTGTTATG((TAC)*TAT)*(TAC)* is repeated three times in IME-SA1-15, six times in IME-SA1-18, eight times in IME-SA1-23, and twice in IME-SA1-26, with the TAC sub-subunit repeated variously(Supplementary Figure S2). Similarly, IME-SA2-6 has five repeats, IME-SA2-8 has six repeats, and IME-SA2-9 has three repeats of the subunit TAAGTACCTTTGTTATG((TAC)*TAT)*(TAC)*(Supplementary Figure S2).
Considering that IME-SA1-26 has the shortest variable region, we analyzed its HTS sequence. Among the HTS reads from IME-SA1-26, the upstream and downstream sequences of the variable region were consistent with each other while the variable regions were different from each other(51 HTS reads of IME-SA1-26 are listed in Supplementary Figure S3).
A comparison of the variable regions in IME-SA1 clones after infecting S. aureus, S. epidermidis, and S. haemolyticus shows no change occurred in the variable region(Data not shown).