Banana streak virus (BSV), a member of the genus Badnavirus, is a worldwide pathogen that causes streaked leaves on cultivated banana and plantain (Musa spp.). The occurrence of BSV has been confirmed in banana production areas of Asia, Africa, Australia, the Americas and Europe. BSV infection in Musa spp. reduces fruit yield and quality. BSV can be naturally transmitted by the citrus mealybug (Planococcus citri Risso) and possibly by other mealybug species [11, 13]. However, the inadvertent use of infected planting material for vegetative propagation, which is the main means of propagating banana, poses a greater risk of disseminating BSV than mealybug transmission does. The risk of BSV infection has therefore become a hindrance to the international exchange of Musa germplasm. Moreover, the genomes of some BSV species may integrate into the genome of the M. balbisiana host (B genome) [5, 9, 16]. In addition, tissue culture carries a further risk of episomal expression of the integrated sequences and induction of BSV infection, and other adverse environmental factors may also be involved in this virogenesis process. In view of the risks associated with latent BSV infection, there is a great need to detect BSV in order to obtain virus-free banana plantlets. However, the extensive genetic divergence among BSV isolates can cause problems for PCR-based detection of BSV.
To investigate the genetic characteristics of BSV infecting banana plants in Yunnan province, China, we sequenced the entire genome of an isolate from a plantation in Yunnan, which we named BSAcYNV, and performed sequence and phylogenetic analyses to determine its relationship to other isolates.
The Banana streak virus isolate investigated in this study was obtained from M. acuminata in Honghe, Yunnan province, China. Virus purification gave a moderate yield of bacilliform particles measuring 23±2 nm × 110±10 nm (Fig. 1A), which were used to raise an antibody against BSV (data not shown).
Figure 1. A: Transmission electron micrograph of BSV showing bacilliform particles measuring 23±2 nm × 110±10 nm. B: Genome structure of badnaviruses. VAP: virion-associated protein; the polyprotein contains the MP (movement protein), CP (coat protein), PR (aspartic protease), RT (reverse transcriptase) and RH (RNase H) domains.
The complete genome of BSAcYNV (Banana Streak Acuminata Yunnan Virus) consists of 7722 base pairs (bp). A linearized full-length badnavirus genome is shown in Fig. 1B. Consistent with other badnaviruses and caulimoviruses, the numbering of the genome sequence begins at the putative 5' terminus of the tRNAiMet-binding site. The genome of badnaviruses usually consists of a single molecule of dsDNA about 7.2-7.8 kb long that forms an open circle interrupted by site-specific discontinuities and that could contain an intergenic poly(A) region [8, 9]. The virus genome usually contains three ORFs, whose functions remain to be fully elucidated. The polyprotein product of the largest ORF, ORF 3, consists of the movement protein (MP), the virus coat protein (CP), the aspartic protease (PR), the reverse transcriptase (RT) and the ribonuclease H (RH) domains. The CP domain of badnaviral ORF 3 displays at its C-terminus a large region rich in zinc finger (CCHC) motifs [8, 9]. Additionally, some badnaviruses have a dUTPase domain upstream or downstream of the CP domain within ORF 3, similar to that of several retroviruses of the family Retroviridae. The function of the ORF 2 product remains elusive, but the ORF 1-encoded protein is a virion-associated protein. The intergenic sequence is 1094 bp long in the BSAcYNV genome, and in silico analysis revealed a potential TATA box (sequence CTCTATAAGAGGA, at positions 7538 to 7550). The sequence TCACGCACGATGAC (7347-7363 bp), located upstream of the TATA box, is similar to the as-1 element in the 35S promoter of Cauliflower mosaic virus (CaMV). The CCA(N)nTGG motif and a repeated AGAAG motif (at positions 7244 to 7255) are also present in the BSAcYNV promoter and may confer vascular tissue-specific promoter activity, as observed for the RTBV (Rice tungro bacilliform virus) and BSCvV (Banana Streak Cv. Virus) (from Cv. Williams, Australia) promoters.
A putative transcription start site and a polyadenylation signal were predicted at positions 7569 and 7625, respectively (Fig. 2). Hence, a terminally redundant RNA is predicted to be transcribed from the genomic DNA. This large RNA, which is longer than one genome length, is termed the pregenomic RNA (pgRNA). The pgRNA begins with a long leader containing a short ORF (sORF) that precedes the first long viral ORF, and it serves as a polycistronic mRNA containing three large ORFs.
Figure 2. Comparison of the structure of complete and defective genomic DNA in BSAcYNV. The intergenic region is depicted as a line and open reading frames (ORFs) as boxes; numbers are positions in the BSAcYNV genome; the start codons of the large ORFs are in italics; the arrowhead adjacent to a vertical line shows the primer-binding site (PBS) for reverse transcription; a small black rectangle denotes the putative TATA box; the white triangle indicates a transcriptional start site; the black triangle indicates a putative poly(A) signal; arrows under the intergenic region denote complementary sequences that form the base of the large stem-loop structure. A: Primary structure of the + strand of complete genomic DNA. B: Primary structure of the + strand of defective genomic DNA. This defective genome lacks the whole RNA leader region and most of the ORF1 coding region. C: DNAs extracted from the BSV preparation appear as bands of different sizes on an agarose gel. Lane 1: BSV genomic DNAs; lane 2: λDNA/HindIII marker.
In contrast to the complete genome, the defective genomic DNA sequence lacks the intergenic region downstream of the transcriptional start site and most of the coding region of ORF1. However, it still contains a TATA box and a transcriptional start site, so it can putatively be transcribed into RNA. The transcriptional start site (at position 7569) is joined directly to the 3'-terminal coding sequence of ORF1 (at position 1045). The truncated genomic DNA is therefore 6525 bp long, 1197 bp shorter than the complete genome of 7722 bp. This may be consistent with the several different size classes of extracted BSV genomic DNA observed by agarose gel electrophoresis (Fig. 2C).
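The size of the deletion can be checked from the two junction coordinates alone. The following is a minimal sketch, assuming 1-based coordinates on a circular genome and assuming that positions 7569 and 1045 are both retained in the defective DNA; the function name is hypothetical and is used only for illustration.

```python
GENOME_LENGTH = 7722   # complete BSAcYNV genome (bp), as reported in the text
LEFT_JUNCTION = 7569   # last retained nucleotide (transcriptional start site)
RIGHT_JUNCTION = 1045  # first retained nucleotide (3'-terminal part of ORF1)


def circular_deletion_length(left, right, genome_length):
    """Count the nucleotides strictly between `left` and `right`,
    walking forward around a circular genome (1-based coordinates)."""
    return (right - left - 1) % genome_length


deleted = circular_deletion_length(LEFT_JUNCTION, RIGHT_JUNCTION, GENOME_LENGTH)
remaining = GENOME_LENGTH - deleted
print(deleted, remaining)
```

Under these assumptions the deleted span covers positions 7570 to 7722 (153 bp) plus positions 1 to 1044 (1044 bp), which reproduces the 1197 bp deletion and the 6525 bp defective genome stated above.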
Folding of the BSAcYNV pgRNA leader predicted an extended stem-loop structure located just in front of ORF1, and only an in-frame CUG triplet (a non-AUG start codon) in a favorable context was found downstream of the AU-rich sequence (Fig. 3); this codon could be used as a shunt acceptor. In addition, ORF1 overlaps ORF2 at the motif AUGA, with AUG acting as the start codon of ORF2 and UGA as the stop codon of ORF1; ORF2 overlaps ORF3 at the motif UAAUG, with AUG acting as the start codon of ORF3 and UAA as the stop codon of ORF2. On the full-length mRNA, a fraction of the shunting ribosomes could reach the AUG codon of ORF3 by leaky scanning. We therefore deduce that pgRNA translation in BSAcYNV follows a combined shunting and leaky-scanning model similar to that of other badnaviruses.
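The two overlap motifs described above can be inspected with a few lines of code. This is an illustrative sketch only: the helper name is hypothetical, and it simply locates the first stop codon and the first AUG inside a short junction motif written in RNA notation.

```python
START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def junction_codons(motif):
    """Return (stop_offset, start_offset) for the first stop codon and the
    first AUG found in a short junction motif, or None if either is absent.
    Offsets are 0-based positions within the motif."""
    stop_offset = next(
        (i for i in range(len(motif) - 2) if motif[i:i + 3] in STOPS), None
    )
    start_offset = motif.find(START)
    if stop_offset is None or start_offset == -1:
        return None
    return stop_offset, start_offset


for motif in ("AUGA", "UAAUG"):
    stop_offset, start_offset = junction_codons(motif)
    # A difference that is not a multiple of 3 means the downstream ORF
    # is read in a different frame from the upstream ORF.
    print(motif, stop_offset, start_offset, (start_offset - stop_offset) % 3)
```

In AUGA the start codon begins one nucleotide before the stop codon, and in UAAUG it begins two nucleotides after the stop codon begins, so in both cases the downstream ORF lies in a different reading frame from the upstream one.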
The positive strand contains three large ORFs, named ORF 1, ORF 2 and ORF 3. ORF 1 of BSAcYNV potentially encodes a protein of 176 amino acids. Note that ORF1 of BSAcYNV has a non-AUG start codon (UCUCUGG) in a suboptimal initiation context (the C of the CUG codon is designated +1; there is a pyrimidine instead of a purine at position -3 and a G at position +4). ORF1 of BSAcVNV and its counterpart in BSMyV likewise do not begin with a conventional AUG start codon, but both are in an optimal initiation context (ACUCUGG and AUUCUGG for BSAcVNV and BSMyV, respectively), with a purine at position -3 and a G at position +4. A non-AUG start codon in a favourable context, in accordance with the Kozak consensus sequence, could be partially recognized by shunting ribosomes, as has been shown for Cauliflower mosaic virus. In contrast, ORF 1 of BSOlV has a conventional start codon in a suboptimal initiation context, and the AUG start codon of BSGfV ORF 1 lies in a very weak context (lacking both a purine at position -3 and a G at position +4), which is in agreement with the scanning model. The different sequence contexts of the ORF1 start codons of these BSV isolates might give rise to variable efficiencies of ribosome recognition, as reported previously [6, 20]. This suggests that the translational level of ORF1 may differ among these BSV isolates.
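The context rule used in this comparison (a purine at position -3 and a G at position +4) can be written as a small classifier. This is a minimal sketch with a hypothetical function name; the three labels follow the wording of the text above, and treating a window that meets only one of the two criteria as "suboptimal" is an assumption based on how BSAcYNV is described.

```python
PURINES = {"A", "G"}


def initiation_context(window):
    """Classify a 7-nt window spanning positions -3 to +4,
    where the start codon occupies positions +1 to +3."""
    if len(window) != 7:
        raise ValueError("expected a 7-nt window covering positions -3 to +4")
    window = window.upper().replace("T", "U")
    has_purine_minus3 = window[0] in PURINES  # position -3
    has_g_plus4 = window[6] == "G"            # position +4
    if has_purine_minus3 and has_g_plus4:
        return "optimal"
    if has_purine_minus3 or has_g_plus4:
        return "suboptimal"
    return "very weak"


# Windows quoted in the text for the ORF1 start codons of three isolates.
for isolate, window in [
    ("BSAcYNV", "UCUCUGG"),
    ("BSAcVNV", "ACUCUGG"),
    ("BSMyV", "AUUCUGG"),
]:
    print(isolate, window[3:6], initiation_context(window))
```

Applied to the three windows quoted in the text, the sketch labels BSAcYNV as suboptimal and BSAcVNV and BSMyV as optimal, matching the description above.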
ORF2 encodes a virion-associated protein (VAP) containing a coiled-coil domain, which mediates not only homomerization but also interaction with heterologous proteins and which is well conserved in the plant pararetroviruses (Fig. 4). Alignment with ORF 2 of other badnaviruses revealed the same conserved coiled-coil domains. The coiled-coil domains of VAP can assemble into a parallel tetramer, a stable form in infected plants. The VAP could act as the "arm" of the virus particle, attaching to the capsid shell by its C terminus, and might participate in cell-to-cell and long-distance movement via interaction with other proteins (e.g. viral MP), as shown for Cauliflower mosaic virus.
Figure 4. Amino acid sequence alignment of the ORF2 proteins of BSAcYNV, BSAcVNV, BSOEV and BSMyV, and of Commelina yellow mottle virus (ComYMV), Dioscorea bacilliform virus (DBV) and Kalanchoe top-spotting virus (KTSV). Asterisks denote exact matches, and double or single dots denote positions of conserved or semiconserved amino acid residues, respectively. Boxes indicate the conserved coiled-coil domains in badnaviruses.
ORF 3 encodes a 217.7 kDa polyprotein, which contains the viral movement protein (MP), the coat protein (CP), a putative viral aspartic protease, reverse transcriptase (RT) and RNase H [2, 10]. MP and CP are characterized by a coiled-coil motif [YVIPDIMMT IRDFYRHIQI; the hydrophobic amino acids (in bold) form the helix interface, while the others are hydrophilic polar residues] and a cysteine-rich, zinc finger-like RNA-binding motif (CXCX2CX4HX4C). Additionally, this polyprotein contains a second zinc-binding domain, CXHX11CX2CX11CX2CX4CX2C (Fig. 5), located downstream of the first zinc finger motif. Although the precise function of this second motif remains unclear, we speculate that it might be involved in the packaging of virions owing to its potential nucleotide-binding activity. The candidate nuclear localization signals (NLSs) of BSAcYNV present in the putative CP region are PRRPRK (positions 487-492 of ORF3) and KRK (positions 761-763 of ORF3), so the multifunctional CP is also thought to be associated with transport of the viral genome into the nucleus via an importin α-dependent pathway, as described for RTBV.
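The shorthand used for the two cysteine-histidine motifs, in which X stands for any residue and a number after X gives a repeat count, maps directly onto a regular expression. The sketch below is illustrative only: the function name is hypothetical, and the test peptide is a made-up string, not a BSAcYNV sequence.

```python
import re


def motif_to_regex(motif):
    """Translate shorthand such as 'CXCX2CX4HX4C' into a regular expression.
    'X' means any residue; a number after a letter is a repeat count."""
    parts = []
    for residue, count in re.findall(r"([A-Z])(\d*)", motif):
        if residue == "X":
            parts.append(".{%s}" % count if count else ".")
        else:
            parts.append(residue * int(count or 1))
    return "".join(parts)


first_motif = motif_to_regex("CXCX2CX4HX4C")
second_motif = motif_to_regex("CXHX11CX2CX11CX2CX4CX2C")
print(first_motif)
print(second_motif)

# Made-up peptide used only to show how a match would be located.
toy_peptide = "MMCACAACAAAAHAAAACMM"
match = re.search(first_motif, toy_peptide)
print(match.span() if match else None)
```

The same patterns could be applied to a translated ORF3 sequence to locate candidate motifs, with each hit then checked against the alignment in Fig. 5.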
Figure 5. Alignment of the two cysteine-histidine motifs (A, B) of badnaviruses and Rice tungro bacilliform virus (RTBV). The badnaviruses are Banana streak virus isolates BSOEV, BSMyV, BSGfV, BSAcVNV and BSAcYNV, Commelina yellow mottle virus (ComYMV), Cacao swollen shoot virus (CSSV), Citrus yellow mosaic virus (CYMV), Dioscorea bacilliform virus (DBV), Kalanchoe top-spotting virus (KTSV), Sugarcane bacilliform virus (SCBV) and Taro bacilliform virus (TBV). Asterisks denote exact matches, and double or single dots denote positions of conserved or semiconserved amino acid residues, respectively.
NCBI BLAST analysis showed that the BSAcYNV DNA sequence is 88% identical to that of BSAcVNV, and their polyprotein amino acid (aa) sequences share 95.1% homology. Overall, however, the amino acid sequences show low identity to those of other BSV isolates (Table 1). Phylogenetic analysis of the zinc finger motifs showed that BSAcYNV was most closely related to BSAcVNV (Fig. 6), consistent with the neighbour-joining phylogenetic tree based on the RT and RNase H domains (data not shown). BSAcYNV and the other four BSV isolates clustered together in the phylogenetic tree and also formed a single cluster with Kalanchoe top-spotting virus (KTSV). This supports the idea that these banana streak badnaviruses from hosts with divergent genetic backgrounds can be regarded as distinct virus species.
Table 1. Amino acid sequence homologies among BSAcYNV, other banana streak badnaviruses and Kalanchoe top-spotting virus (KTSV)
Figure 6. Phylogenetic tree constructed with the MrBayes program depicting the relationships among the badnaviruses, caulimoviruses and Rice tungro bacilliform virus (RTBV), based on the alignment of the reverse transcriptase and ribonuclease H domains. The badnaviruses are Banana streak virus isolates (BSOlV, BSMyV, BSGfV, BSAcVNV and BSAcYNV), Commelina yellow mottle virus (ComYMV), Kalanchoe top-spotting virus (KTSV) and Sugarcane bacilliform virus (SCBV).