When enterophage T7 infects enterobacteria, the success of the phage relies on the amazing catalytic efficiency of its RNA polymerase (T7 RNAP), which adds up to 250 nucleotides per second to an RNA transcript. In a matter of minutes after infection, protein expression is dominated by viral proteins mainly because T7 RNAP so efficiently transcribes RNA. This efficiency is harnessed in biotechnology and basic research. Without T7 RNAP, for example, structural biology would not be the same. Countless proteins studied by X-ray crystallography and NMR have been expressed using T7 RNAP-based expression systems in Escherichia coli (Studier F W, et al., 1986). Likewise, large scale preparations of RNA used in structural studies rely on T7 RNAP-catalyzed in vitro synthesis (Milligan J F, et al., 1987). Moreover, our understanding of transcription itself has been advanced enormously by using the T7 RNAP as a model enzyme. Even though T7 RNAP is smaller than most other RNA polymerases, and is composed of a single subunit (different from the multi-subunit RNAPs of prokaryotes and eukaryotes), it nevertheless displays most of the complexity of its larger "cousins". As a model system, T7 RNA polymerase has been thoroughly characterized in terms of both its structure and its mechanism; however, some big questions remain. This review focuses on structural and functional transitions during the synthesis of the first dozen or so nucleotides early in transcription.
Transcription occurs in two phases, initiation and elongation. While initiation involves drastic conformational changes in both protein and nucleic acid and is relatively slow, elongation involves only subtle conformational changes and is highly efficient. To initiate transcription, RNA polymerases bind sequence-specifically to promoter DNA and open a bubble in the DNA, exposing a single-stranded portion of the template strand in preparation for RNA synthesis. During transcription initiation, a growing DNA – RNA hybrid forms, with the 5'-end of RNA maintaining its interaction with the template strand (Fig. 1). In this phase of transcription, frequent loss of RNA from the initiation complex (IC) is observed. Because the enzyme remains bound to the promoter in these events, transcription can reinitiate quickly, leading to a process called abortive cycling that only ends when the RNA exceeds a certain length. Once the RNA reaches that length (for T7 RNAP, about 9-12 nucleotides), the promoter is released, the bubble collapses, and the 5'-end of the RNA dissociates from the template strand, resulting in a smaller size of the bubble (for T7 RNAP, a final size of about 8 nucleotides). In this elongation complex (EC), the size of the hybrid and the bubble is maintained as the polymerase translocates on the template strand. The events leading from transcription initiation to transcription elongation (reviewed for T7 RNAP in Martin C T et al., 2005 and Steitz T A, 2009) are summarized in Table 1.
Figure 1. Structure of T7 RNAP during early initiation (PDB ID 1QLN). This structure is also representative of the conformation of the enzyme in the absence of DNA. Color coding used in this cartoon and in Fig. 2, 3, and 4: nucleic acids are shown in blue (template strand), green (non-template strand) and red (RNA); the C-terminal domain of RNAP is shown in gray, black (thumb helix) and orange (specificity loop); the N-terminal domain is shown in magenta (promoter binding domain), green (residues that will form sub domain H in the elongation conformation) and yellow (residues 1-70). Selected secondary structure elements are labeled, and the side chain of proline 266 as well as the nucleotides forming the DNA-RNA hybrid are highlighted in all-atom representation.
| Feature | IC3 (early initiation) | EC (elongation) | IC7 |
|---|---|---|---|
| Biochemical features | | | |
| Promoter | bound | released | like IC3 |
| Bubble | expanding on downstream side | closing upstream, opening downstream | like IC3 |
| DNA-RNA hybrid | 3 bp | 7-8 bp | 7 bp |
| 5'-end of RNA | part of hybrid | single stranded | like IC3 |
| RNA release | frequent | none | like IC3 |
| Structural features | | | |
| Structural representatives | 1QLN | 1MSW, 1H38 | 3E2E |
| Promoter binding site | intact | destroyed | like IC3 |
| RNA exit channel | blocked | formed | like IC3 |
| Preventing upstream bubble collapse | protein | RNA | like IC3 |
| Interactions between helix C and C-terminal domain | present | present | present |
| Thumb helix | bound to PBD | bound to hybrid | like EC |
Table 1. Biochemical and structural features of different stages in transcription
While the catalytic role of RNAP is the same for initiation and elongation, its role in interacting with the nucleic acids changes dramatically. In RNAPs of prokaryotes and eukaryotes, this change in function is achieved by multi-subunit polymerases that shed promoter-binding subunits (such as the E. coli sigma factor) when transitioning from initiation to elongation. As a single-subunit polymerase able to initiate without additional protein factors, T7 RNAP achieves this change of function by undergoing a dramatic set of conformational changes, as crystal structures of the initiation and elongation complex show (Cheetham G M, et al., 1999; Yin Y W, et al., 2002; Tahirov T H, et al., 2002) (Fig. 2A and C). Similar to a gear box in a car, where a change in the juxtaposition of gears results in a different gear ratio, conformational changes in T7 RNAP result in different binding sites and protein-nucleic acid interactions when comparing the initiation complex to the elongation complex. The dramatic changes occur predominantly in the N-terminal part of the protein. In the initiation complex, the promoter-binding domain (PBD) interacts with the specificity loop to form the binding interface with the promoter (Fig. 1). In the elongation complex, these interactions are lost as the PBD moves away from its original position. Instead, the specificity loop now interacts with a sub domain newly formed from helices H1 and H2, forming a channel through which the 5'-end of the RNA exits. It has been argued that the high processivity of the elongation complex is due in some measure to the RNA bound tightly on the 3' side by the active site and on the 5' side by the RNA exit channel, topologically locking the template strand to the polymerase (Liu X, et al., 2009).
Figure 2. Structural changes from early initiation (IC3, PDB ID 1QLN) to late initiation (IC7, PDB ID 3E2E) to elongation (EC, PDB ID 1MSW). The C-terminal domains of the three structures were aligned to show them in a common orientation, which is a view from the top relative to Fig. 1. Residues 568-689 of the palm domain were omitted in the figures to afford a view of the growing hybrid. The IC3 structure does not resolve the downstream DNA, and the figure shows it modeled based on the EC structure (sky blue and light green), consistent with biophysical data (Turingan R S, et al., 2007). The thumb helix, which loses interactions with the promoter binding domain in the transition from IC3 to IC7 and interacts with the minor groove of the hybrid in both IC7 and EC, is shown as a ribbon with side chains as sticks. Steric clashes that potentially act as triggers for subsequent conformational change are indicated by orange "lightning rods" for IC3 and IC7.
Knowing the initial and final states in a conformational change sometimes suggests a plausible path to achieve this transition. In this case, however, there is a substantial conformational change accompanied by the loss and gain of many protein-protein and protein-nucleic acid interactions, which calls for more structural data on intermediate states. In 2008, structures of initiation complexes of the P266L variant of T7 RNAP bound to 7 nt (IC7) and 8 nt RNA (IC8) were determined (Durniak K J, et al., 2008), confirming that the transition from initiation to elongation is not a two-state transition, but instead goes through multiple structural intermediates (Bandwar R P, et al., 2007).
Crystallography is a slow technique, and special measures have to be taken to capture intermediates that are short-lived. It is possible to stall RNA polymerase at any position by leaving out nucleotides, or by offering a 3'-deoxy nucleotide. Another approach is to pre-assemble the RNA-DNA complex (using strategically placed mismatches between template and non-template strands to favor formation of the RNA-DNA hybrid and the transcription bubble) and allow the enzyme to bind to it (Daube S S, et al., 1992). The latter approach was used in crystallizing the elongation complex (Tahirov T H, et al., 2002; Yin Y W, et al., 2002). For the IC7 and IC8 intermediates, RNA dissociation presents a hurdle to either approach. The P266L variant of T7 RNAP was discovered in a genetic screen set up to select for mutations that show less abortive cycling (Guillerez J, et al., 2005). Transcription assays show that the P266L variant aborts less throughout, and most markedly at positions +5 through +9. Using P266L, which is less prone to abortive cycling, allowed Durniak K J et al. (2008) to assemble late initiation complexes sufficiently stable for crystallization.
As the P266L mutation does not interfere with any aspect of transcription, these crystal structures represent on-pathway intermediates of transcription. How representative are they of transcription in the wild type enzyme? Guillerez J et al. (2005) argued that weaker promoter binding of P266L might allow an earlier transition to the elongation conformation, explaining the reduction in abortive products. Using a multipronged approach, Ramírez-Tapia L E and Martin (2012) compared the timing of the transition (by testing for promoter loss using a fluorescent technique and by testing for conformational change in the enzyme using a proteolytic cleavage susceptibility assay) at different stages in transcription. Halted at positions up to +8, no transition to elongation was detected (in the 60 sec time frame of the assay) in either mutant or wild type. However, the two enzymes showed marked differences at position +9 in both assays, with a large fraction of wild type enzyme already in the elongation state while the P266L mutant remained in the promoter-bound initiation state. Overall, the data show that promoter loss and the large conformational change of the enzyme are delayed in P266L compared to the wild type enzyme. It is not clear if more subtle events leading up to promoter loss are also delayed in P266L, i.e. the IC7 structure of P266L might be representative of the wild type enzyme in its IC6 state in some aspects. For the interpretation of the IC8 structure, there are multiple concerns. First, the structure was determined at very low resolution (7 Å) using molecular replacement to obtain phases. While the structure clearly shows that the promoter is still bound, and this is important information in itself, any more detailed interpretation of the coordinates should be done with extreme caution as one expects severe model bias at this low resolution. 
Second, it is unclear whether the structure is representative of the IC8 state of the wild type, or whether the wild type would already have undergone changes leading to promoter loss. Given that the IC7 and IC8 structures are very similar and the IC7 structure was determined at much higher resolution, this review focuses on the IC7 structure, which contains a wealth of information about the transition of T7 RNAP from initiation to elongation.
In the structural transition from IC3 to IC7, the promoter, the promoter binding domain and the specificity loop move as a rigid body away from the C-terminal domain, presumably pushed by the hybrid (Durniak K J, et al., 2008). Consequently, enzyme-promoter interactions are virtually unchanged, and space opens up to accommodate more template DNA and newly synthesized RNA within the enzyme. To allow the rigid body movement, which is a 40° rotation about an axis near the -4 region of the promoter, several elements in the structure undergo hinge or shear motions. Hinge motions include the base of the specificity loop, the single-stranded DNA connecting promoter and RNA-DNA hybrid, and the loop including residue 150 connecting helices G and H. The most dramatic change is the shear motion of helices C2 and D, which are connected by a loop that is disordered in both the IC3 and the IC7 structures (Fig. 3). In both structures, the two helices are within contact distance and share a hydrophobic interface, but one helix moves relative to the other by more than 12 Å, reorganizing the contacts between hydrophobic residues Phe 51 and Phe 55 of helix C2 and Ile 74, Thr 75 and Leu 78 of helix D. There is a precedent for flat hydrophobic interfaces that allow multiple relative orientations of interaction partners (Ritacco C J, et al., 2013), suggesting that as RNA is extended stepwise from 3 to 7 nt, T7 RNAP might provide space stepwise by successively rotating around this axis.
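To give a feel for what a 40° rigid-body rotation means geometrically, the following minimal sketch applies Rodrigues' rotation formula to a single point. The axis position, the axis direction and the 20 Å distance are hypothetical placeholders chosen for illustration; they are not taken from the 1QLN or 3E2E coordinates.

```python
import math

def rotate_about_axis(point, axis_point, axis_dir, angle_deg):
    """Rotate `point` by `angle_deg` about the line through `axis_point`
    with direction `axis_dir` (Rodrigues' rotation formula)."""
    theta = math.radians(angle_deg)
    norm = math.sqrt(sum(c * c for c in axis_dir))
    k = [c / norm for c in axis_dir]                  # unit vector along the axis
    v = [p - a for p, a in zip(point, axis_point)]    # point relative to the axis
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = [
        vi * cos_t + ci * sin_t + ki * k_dot_v * (1.0 - cos_t)
        for vi, ci, ki in zip(v, k_cross_v, k)
    ]
    return [r + a for r, a in zip(rotated, axis_point)]

def displacement(point, axis_point, axis_dir, angle_deg):
    """Distance travelled (straight line) by a point under the rotation."""
    moved = rotate_about_axis(point, axis_point, axis_dir, angle_deg)
    return math.dist(point, moved)

# Hypothetical geometry: axis along z through the origin, atom 20 Å away from it
AXIS_POINT = [0.0, 0.0, 0.0]
AXIS_DIR = [0.0, 0.0, 1.0]
shift = displacement([20.0, 0.0, 0.0], AXIS_POINT, AXIS_DIR, 40.0)
print(f"Displacement of an atom 20 Å from the axis: {shift:.1f} Å")
```

For a pure rotation, the displacement of an atom at distance r from the axis is 2·r·sin(θ/2), so atoms far from the axis move much more than atoms near the hinge; in this toy geometry, an atom 20 Å from the axis moves by about 13.7 Å.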
Figure 3. Changes in the conformation of the N-terminal domain of T7 RNAP in the transition from IC3 to IC7. View similar to that of Fig. 1. The promoter binding domain (PBD, magenta) rotates together with the promoter and the specificity loop relative to the remainder of the N-terminal domain (yellow and green) about an axis that is vertical in this view. For this figure, the PBDs of IC3 and IC7 were superposed. Hydrophobic residues Phe 51 and Phe 55 of helix C2 on the one hand, and residues of the N-terminal domain Thr 75 and Ile 74 on the other hand, remain in contact but switch position during the transition (lower panel).
T7 RNAP, like other RNA polymerases, releases short RNA fragments early in transcription in a process called abortive cycling. The mechanism of abortive cycling comes down to binding interactions and dissociation kinetics of RNA. A stably bound RNA will not dissociate, but even a weakly bound RNA will stay bound if the dissociation rate is slow compared to the rate of adding the next nucleotide. From first principles and setting aside protein-nucleic acid interactions for the moment, one would expect that as the RNA-DNA hybrid gets longer, binding strength would increase while dissociation rates would decrease, simply because the number of base pairing and stacking interactions increases with the length of the hybrid. If the transcription rate is independent of RNA length, rates of RNA fall-off should decrease as transcription proceeds. However, transcription assays show that some RNA lengths are more prone to fall off and some less, varying in a non-systematic way. This points to discontinuous events during transcription, such as the DNA-RNA hybrid running out of space in the binding cavity of the enzyme, or the enzyme slowing down as conformational change becomes necessary to provide space. Vahia and Martin (2011) have probed the energetic basis of abortive cycling by systematically increasing or reducing different kinds of possible "stress" (destabilizing interactions) proposed in models of abortive cycling and measuring the ratio of abortive products (with a length of 2 to 6 nt) to longer products. None of the manipulations (changing the size of the template strand between promoter and transcription start site, increasing the size of RNA by adding bulk to the 5'-end, changing the energetics of bubble opening or collapse by introducing mismatches between template and non-template strands) show systematic changes in the amount of abortive products, suggesting that steric clashes might not directly influence binding strength, RNA dissociation kinetics and transcription rate in a way that explains the observed length distribution of abortive products.
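The first-principles expectation described above can be written as a simple kinetic competition: at each RNA length, the complex either releases the RNA (rate k_off) or adds the next nucleotide (rate k_ext). The sketch below is a minimal illustration of that partitioning; all rate constants are hypothetical, and the 1/length scaling of k_off is an arbitrary stand-in for "longer hybrids dissociate more slowly", not a measured relationship.

```python
def abortive_fraction(k_off, k_ext):
    """Probability that RNA dissociates before the next nucleotide is added."""
    return k_off / (k_off + k_ext)

def product_distribution(k_off_by_length, k_ext):
    """Fraction of complexes releasing RNA at each length, plus the
    fraction that survives past the longest length considered."""
    remaining = 1.0
    released = {}
    for length, k_off in sorted(k_off_by_length.items()):
        p_abort = abortive_fraction(k_off, k_ext)
        released[length] = remaining * p_abort
        remaining *= 1.0 - p_abort
    return released, remaining

# Hypothetical rate constants (per second), for illustration only
K_EXT = 10.0                                        # length-independent extension rate
K_OFF = {n: 8.0 / n for n in range(2, 9)}           # dissociation slows as the hybrid grows

released, escaped = product_distribution(K_OFF, K_EXT)
for length, fraction in released.items():
    print(f"{length} nt abortive product: {fraction:.3f}")
print(f"extended beyond 8 nt: {escaped:.3f}")
```

Under these assumptions the abortive fractions fall monotonically with RNA length. A non-systematic pattern, as seen in transcription assays, requires k_off or k_ext to change discontinuously at particular lengths, which is the kind of event the text attributes to conformational changes of the enzyme.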
Comparing the binding interactions of RNA with the enzyme in the IC3, IC7 and EC structures suggests that the set of interactions observed in the EC structure does not continuously become available as the RNA product increases in length, but appears in discontinuous jumps as the enzyme undergoes conformational change to expose or create binding interfaces (see Table 2). For example, while many of the C-terminal interaction partners seen in the EC structure are already in place in the IC7 structure, the exit channel lined by N-terminal residues is not. Once the protein undergoes the conformational change creating the exit channel and the 5' end of RNA binds to it, RNA binding affinity will increase markedly. Likewise, in the transition from IC3 to IC7, residues of the thumb helix which eventually interact with the phosphate backbone and the minor groove of the hybrid (such as Arg 389) are initially engaged in interactions with the promoter binding domain. Only when the conformational change seen between the IC3 and IC7 structures exposes this binding interface for contacts with the hybrid will those interactions be able to form, again stabilizing the enzyme-hybrid interactions. The timing of protein conformational change making these nucleic acid binding interactions available might explain the observed non-systematic pattern of abortive products, but exploring this idea experimentally would require a probe sensitive to the protein conformational changes occurring from IC3 to IC7.
| Residue involved in RNA binding in the EC* | Nucleotide interaction partner** | Role in IC7 | Role in IC3 |
|---|---|---|---|
| K441 | (-1, -2) | RNA interaction | RNA interaction |
| R425 | -2 | RNA interaction | RNA interaction |
| R394 | -4 | RNA interaction | RNA interaction |
| K389 | -5 | RNA interaction | bound to PBD |
| R386 | (-5, -6) | RNA interaction | RNA interaction |
| K172 | (-5, -6) | distant from RNA (part of sub domain H) | distant from RNA (part of sub domain H) |
| R756*** | ~ -9 | Promoter binding | Promoter binding |
| R746*** | ~ -9 | Promoter binding | Promoter binding |

*interactions of positively charged residues according to Tahirov et al. (2002); **numbering relative to the active site, not the transcription start site; ***based on 1MSW coordinates (Yin and Steitz, 2002).
Table 2. Protein RNA interactions in transcription complexes
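The stepwise appearance of RNA contacts can be made explicit by tallying, for each state, which of the EC RNA-binding residues in Table 2 are already engaged with RNA. The dictionary below simply transcribes the table (with the sub domain H annotation shortened); the function and variable names are illustrative.

```python
# Table 2 transcribed: residue -> (role in IC3, role in IC7).
# All listed residues contact RNA in the EC.
TABLE2 = {
    "K441": ("RNA interaction", "RNA interaction"),
    "R425": ("RNA interaction", "RNA interaction"),
    "R394": ("RNA interaction", "RNA interaction"),
    "K389": ("bound to PBD", "RNA interaction"),
    "R386": ("RNA interaction", "RNA interaction"),
    "K172": ("distant from RNA", "distant from RNA"),
    "R756": ("Promoter binding", "Promoter binding"),
    "R746": ("Promoter binding", "Promoter binding"),
}

def rna_contacts(state):
    """Residues from Table 2 that interact with RNA in the given state."""
    if state == "EC":
        return sorted(TABLE2)
    column = {"IC3": 0, "IC7": 1}[state]
    return sorted(
        residue for residue, roles in TABLE2.items()
        if roles[column] == "RNA interaction"
    )

gained_ic3_to_ic7 = set(rna_contacts("IC7")) - set(rna_contacts("IC3"))
gained_ic7_to_ec = set(rna_contacts("EC")) - set(rna_contacts("IC7"))

for state in ("IC3", "IC7", "EC"):
    print(state, len(rna_contacts(state)), rna_contacts(state))
print("gained IC3 -> IC7:", sorted(gained_ic3_to_ic7))
print("gained IC7 -> EC:", sorted(gained_ic7_to_ec))
```

Counted this way, one contact (K389) is gained between IC3 and IC7, and three (K172, R746, R756) only upon the transition to the EC.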
The IC7 structure shows how T7 RNAP makes space for the growing hybrid while remaining bound to the promoter. Proceeding from IC7 to EC, the promoter is released from the enzyme, allowing the initially melted bubble to collapse, driving displacement of the 5' end of the RNA (Gong P, et al., 2004). Supporting these changes in nucleic acid structure and binding, the N-terminal domain of the enzyme undergoes substantial reorganization, including dissociation of the specificity loop from the promoter binding domain, an extensive rigid body movement of the latter, a refolding of sub domain H and fusing of helices C1 and C2 (see Fig. 1 and 2). As a result of the reorganization, the RNA exit channel forms, lined by helix C, specificity loop, sub domain H, and a loop from 292 to 301. Because structural data on the protein conformation between the IC8 and EC state are lacking, it is not clear in which order and with which timing these changes might occur. It has been suggested that the interactions between helix C1 and the C-terminal domain, which are observed to persist without change in IC3, IC7 and EC structures, might serve to organize the conformational change by staying intact throughout (Theis K, et al., 2004). Overall, the EC structure is better defined than the IC7 structure (the loop between helix C2 and PBD and the entire sub domain H are resolved in the EC structure, and it was determined at higher resolution), presumably because of multiple additional protein-protein contacts such as the one between sub domain H and the specificity loop, and a gain in secondary structure upon transition to the EC conformation. Curiously, the mitochondrial RNA polymerase, which shows high homology to T7 RNAP in the C-terminal domain and has an N-terminal element resembling the fold of the PBD in T7 RNAP, only undergoes subtle conformational change when comparing apo-structure and elongation complex (Schwinghammer K, et al., 2013). 
A comprehensive comparison between these two enzymes will be possible only when an initiation complex of mitochondrial RNAP including its requisite initiation factors becomes available.
The IC7 and IC8 structures show how T7 RNAP remains bound to the promoter while the enzyme undergoes conformational change to make room for the growing hybrid. As RNA synthesis extends past 8 nt, further changes are necessary to avoid a potential clash of the upstream end of the hybrid and the PBD (Fig. 2B). To avoid this clash, either the PBD or the hybrid (or both) have to make room. If the hybrid never exceeds 8 bp in length because the template strand starts to dissociate from RNA once it becomes longer than 8 nt, an IC7/IC8-like protein conformation could be maintained as the growing RNA fills the cavity in the protein (model A). In this model, the trigger for promoter loss would be the loss of the specificity loop from the promoter binding site, "pushed away" by single stranded 5' RNA. Conversely, the hybrid might grow to lengths longer than 8 bp (8 bp is the length in the IC7 structure as well as the EC structure) if the PBD continues to move away from the active site (model B). The trigger for promoter loss would be the growing hybrid pushing on the promoter binding domain until the binding site falls apart, perhaps because the covalent connections of the PBD with the C-terminal domain of the enzyme become "overstretched". From the structural data available at this point, however, it is not clear how the enzyme would support promoter binding while harboring a hybrid up to 12 bp in length. Merely continuing the rigid body motion observed from IC3 to IC7 to arrive at a putative IC12 structure would lead to clashes with the downstream DNA and fail to clear the path for the hybrid, which would need an additional ~17 Å of space to accommodate another 5 bp compared to the IC7 structure (Fig. 2B). On the other hand, biochemical evidence suggests that at least some fraction of the enzyme remains promoter bound up to an RNA length of 12 nt (Tang G Q, et al., 2009; Ramírez-Tapia L E, et al., 2012). This stage of the transition clearly warrants further research.
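The space requirement quoted for model B is simple arithmetic on hybrid length. The sketch below assumes an axial rise of about 3.4 Å per base pair; this value is an assumption inferred from the ~17 Å per 5 bp figure in the text, and the actual rise in the enzyme-bound hybrid may differ.

```python
# Assumption: axial rise per base pair, inferred from "~17 Å for 5 bp" in the text
RISE_PER_BP_ANGSTROM = 3.4

def extra_hybrid_length(current_bp, target_bp, rise=RISE_PER_BP_ANGSTROM):
    """Additional length (Å) needed along the helix axis to grow the hybrid."""
    return (target_bp - current_bp) * rise

# Space needed relative to the 7 bp hybrid of the IC7 structure
for target_bp in range(8, 13):
    extra = extra_hybrid_length(7, target_bp)
    print(f"{target_bp} bp hybrid: +{extra:.1f} Å relative to IC7")
```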
In some sense, the P266L mutant is a better enzyme than the wild type. Is it possible to rationalize how the mutation affects the amount of abortive products and the timing of initiation to elongation transition from the structural data? The chemical environment of Pro 266 changes both from IC3 to IC7 and from IC7 to EC structures (Fig. 4). Pro 266 is part of a loop that connects the most C-terminal helix of the promoter binding domain with the C-terminal domain of the protein, which does not change conformation in the IC to EC transition. In the IC3 and EC structures, the entire loop is well-defined, but in the intermediate IC7 structure, it is partially disordered (no coordinates for residues) and Pro 266 lies on the edge of the disordered segment, so its conformation might not be as well defined as in the other two states. In the early initiation conformation, Pro 266 is part of a hydrophobic patch that includes Phe 400 (part of the thumb helix), Met 431 and Phe 432. As the polymerase transitions, Pro 266 moves away from these residues. For example, the distance between the C-alpha atoms of residues 266 and 400 is 7.1 Å in the IC3 structure, increases to 10.0 Å in the IC7 structure, and increases further to 15.2 Å in the EC structure. At the same time, Pro 266 approaches residues at the junction between helix C1 and C2. For example, the distance between the C-alpha atoms of residues 266 and Tyr 44 decreases from 11.9 Å in the IC3 structure to 5.5 Å in the IC7 structure to 4.4 Å in the EC structure. In the latter structure, helix C1 fuses with helix C2 to form a longer continuous helix that defines one side of the RNA exit channel. When the transition to elongation is complete, Pro 266 stacks on the aromatic side chain of Tyr 44, again as part of a hydrophobic patch, and the entire loop containing Pro 266 is resolved. 
Given that Pro 266 switches chemical environment at least twice (losing hydrophobic interactions with the thumb helix, and later gaining hydrophobic interactions with helix C1/C2), one can rationalize that the P266L mutation influences both early events (fewer 2-5 nt abortive products) and late events (transitioning to EC later than the wild type enzyme). If the mutation disrupts the hydrophobic packing of residue 266 in both the IC3 and the EC conformation, one would expect that it becomes easier or faster to transition from IC3 to IC7, leading to fewer abortive products. Likewise, one would expect that it becomes more difficult or slower to transition from IC7 to EC, explaining the observed later transition to elongation in P266L. Intriguingly, other mutations in this region, including the Phe55Pro mutation, result in higher levels of abortive products (Bandwar R P, et al., 2007) rather than the lower levels observed with P266L.
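C-alpha distances such as those quoted above can be measured directly from coordinate files. The following is a minimal sketch that reads ATOM records in the fixed-column PDB format and returns the distance between two C-alpha atoms. The helper `make_ca_record` and the coordinates it produces are synthetic and serve only to make the example self-contained; they are not taken from 1QLN, 3E2E or 1MSW, and the chain identifier "A" is likewise an assumption.

```python
import math

def ca_coordinates(pdb_lines, chain, resseq):
    """Return (x, y, z) of the C-alpha atom of a residue from PDB ATOM records."""
    for line in pdb_lines:
        if (line.startswith("ATOM")
                and line[12:16].strip() == "CA"     # atom name, columns 13-16
                and line[21] == chain               # chain ID, column 22
                and int(line[22:26]) == resseq):    # residue number, columns 23-26
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise KeyError(f"No CA atom found for chain {chain}, residue {resseq}")

def ca_distance(pdb_lines, chain, resseq_a, resseq_b):
    """Distance in Å between the C-alpha atoms of two residues in one chain."""
    return math.dist(ca_coordinates(pdb_lines, chain, resseq_a),
                     ca_coordinates(pdb_lines, chain, resseq_b))

def make_ca_record(serial, resname, chain, resseq, x, y, z):
    """Hypothetical helper: build a synthetic fixed-column CA ATOM record."""
    return (f"ATOM  {serial:5d}  CA  {resname:3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Synthetic records with made-up coordinates (not from any deposited structure)
records = [
    make_ca_record(1, "PRO", "A", 266, 0.0, 0.0, 0.0),
    make_ca_record(2, "PHE", "A", 400, 3.0, 4.0, 0.0),
]
print(f"CA-CA distance 266-400: {ca_distance(records, 'A', 266, 400):.1f} Å")
```

To apply this to a deposited structure, the lines of the downloaded coordinate file would be passed in place of `records`, with the chain identifier of the polymerase in that entry. Alternate locations and insertion codes are ignored in this sketch.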
Figure 4. Residue 266 has three distinct chemical environments in the IC3 (Pro), IC7 (Leu) and EC (Pro) structures. While it interacts with the thumb helix in IC3 and with the C1-2 helix in the EC as part of hydrophobic regions, it is exposed in IC7. In contrast to these changing interactions, the binding interface between helix C1 and the thumb helix remains in place throughout.