Title: Protein Splicing of Inteins and Hedgehog Autoproteolysis: Structure, Function, and Evolution
Abstract: Protein splicing is a posttranslational editing process that removes an internal protein fragment (intein) from a precursor and ligates the external protein fragments (exteins) to form the mature extein protein. Inteins, once considered an oddity, are now known to be widely distributed. The excised intein is also a stable protein that can have homing endonuclease activity (reviewed in Belfort M, Roberts R.J. Nucleic Acids Res. 1997, 25: 3379-3388). Homing endonucleases are site-specific enzymes that make double-strand breaks in intronless or inteinless alleles, initiating a gene conversion process that results in insertion of the mobile intron or intein gene. They are grouped on the basis of signature motifs such as the LAGLIDADG (dodecapeptide) or HNH motifs. The intein plus the first downstream extein residue mediates both splicing and DNA cleavage. When inteins were first described, we thought it remarkable that one protein could direct two proteolytic cleavages, protein ligation, and DNA cleavage. We now know that protein cleavage and ligation are mediated by a protein splicing element of ∼150 amino acids (aa) and that the core endonuclease is a separate structural domain. The mystery of how a single protein can accomplish these disparate tasks was solved by a combination of approaches that culminated in the definition of the splicing mechanism and the 3-D structures of the Saccharomyces cerevisiae VMA intein, the Drosophila Hedgehog protein autoprocessing domain, and the Mycobacterium xenopi gyrase subunit A intein.
Inteins are named with a genus/species designation followed by the extein gene name. Over 50 putative inteins have been identified to date on the basis of sequence similarity to known inteins and disruption of previously described proteins or open reading frames (Dalgaard J.Z, Moser M.J, Hughey R, Mian I.S. J. Comput. Biol. 1997, 4: 193-214; Perler F.B, Olsen G.J, Adam E. Nucleic Acids Res. 1997a, 25: 1087-1093; Pietrokovski S. Protein Sci. 1998, 7, in press). For an updated list of inteins and their characteristics, see the Intein Registry Web site at http://www.neb.com/neb/inteins.html. Inteins have been found in 22 species of archaea, eubacteria, and single-cell eucarya in 24 different proteins, ∼50% of which are involved in replication, DNA repair, transcription, or translation. This bias may be partially due to the types of genes present on vectors (phage, episomes, etc.) that are likely to transmit mobile inteins. Ten conserved intein motifs were identified, although 4 (blocks C–E and H) are now known to be in the endonuclease (Figure 1) (Perler F.B, Olsen G.J, Adam E. Nucleic Acids Res. 1997a, 25: 1087-1093; Pietrokovski S. Protein Sci. 1998, 7, in press). Only 2 residues are absolutely conserved in all inteins: a His in block B and the intein C-terminal Asn. Ser/Thr/Cys are found on the C-terminal side of both splice sites. Conflicting phylogenetic analyses have been published based on intein motifs (Perler F.B, Olsen G.J, Adam E. Nucleic Acids Res. 1997a, 25: 1087-1093) or a Hidden Markov Model (Dalgaard J.Z, Moser M.J, Hughey R, Mian I.S. J. Comput. Biol. 1997, 4: 193-214). These discrepancies may be due to different data sets or an absence of statistically significant branches.
Splicing is extremely rapid, and to date precursors have not been identified in native systems. The mechanism of protein splicing (Figure 2) has been extensively reviewed (Perler F.B, Xu M.-Q, Paulus H. Curr. Opin. Chem. Biol. 1997b, 1: 292-299; Shao Y, Kent S.B.H. Chem. Biol. 1997, 4: 187-194). The process begins when the side chain hydroxyl or thiol of the conserved intein N-terminal Ser1 or Cys1 attacks the carbonyl (C=O) of the preceding amino acid, resulting in an ester or thioester bond at the N-terminal splice site (step 1, acyl rearrangement). This carbonyl is then attacked by the hydroxyl/thiol group of the Ser/Thr/Cys at the beginning of the C extein (the +1 residue), resulting in N-terminal cleavage and formation of a branched intermediate (step 2, transesterification). The branched intermediate is resolved by cleavage of the peptide bond at the C-terminal splice site due to cyclization of the intein C-terminal Asn (step 3, Asn cyclization). Finally, a spontaneous O-N or S-N acyl rearrangement establishes a peptide bond between the exteins (step 4, acyl rearrangement). Although protein splicing was once thought to follow a unique pathway, the initial acyl rearrangement has now been linked to activation of autocatalytic reactions in diverse biological processes including autoproteolysis, protein targeting, and addition of prosthetic groups (reviewed in Beachy P.A, Cooper M.K, Young K.E, von Kessler D.P, Park W, Hall T.M.T, Leahy D.J, Porter J.A. Cold Spring Harbor Symp. Quant. Biol. 1997, in press; Perler et al., 1997b). Each step requires assistance from a proton donor and acceptor to facilitate the nucleophilic displacement. Mutagenesis data suggest that the conserved His in block B assists in the initial ester/thioester formation (Kawasaki M, Nogami S, Satow Y, Ohya Y, Anraku Y. J. Biol. Chem. 1997, 272: 15668-15674) and that the intein penultimate His assists in Asn cyclization (Xu M.-Q, Perler F.B. EMBO J. 1996, 15: 5146-5153).
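The residue requirements of the four-step pathway described above can be illustrated at the sequence level. The following Python sketch is a toy model under stated assumptions, not a simulation of the chemistry: it only checks that the intein begins with Ser or Cys, ends with Asn, and is followed by a Ser/Thr/Cys +1 residue, then returns the excised intein and the ligated exteins. The function name `splice`, the 0-based half-open indices, and the example sequence are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# One-letter codes for the residues named in the splicing pathway.
N_TERMINAL_NUCLEOPHILES = {"S", "C"}     # intein residue 1 (Ser1 or Cys1), step 1
PLUS_ONE_NUCLEOPHILES = {"S", "T", "C"}  # first C-extein residue (+1), step 2
C_TERMINAL_RESIDUE = "N"                 # intein C-terminal Asn, step 3


@dataclass
class SplicingResult:
    intein: str            # excised intein sequence
    ligated_exteins: str   # N extein joined directly to C extein (step 4)


def splice(precursor: str, intein_start: int, intein_end: int) -> SplicingResult:
    """Excise precursor[intein_start:intein_end] and join the flanking exteins.

    Indices are 0-based and half-open (an assumption of this sketch).
    Raises ValueError if a residue required by the pathway is missing.
    """
    if not (0 < intein_start < intein_end < len(precursor)):
        raise ValueError("intein needs at least one extein residue on each side")

    intein = precursor[intein_start:intein_end]
    plus_one = precursor[intein_end]

    if intein[0] not in N_TERMINAL_NUCLEOPHILES:
        raise ValueError("intein must begin with Ser or Cys")
    if intein[-1] != C_TERMINAL_RESIDUE:
        raise ValueError("intein must end with Asn")
    if plus_one not in PLUS_ONE_NUCLEOPHILES:
        raise ValueError("+1 residue must be Ser, Thr, or Cys")

    ligated = precursor[:intein_start] + precursor[intein_end:]
    return SplicingResult(intein=intein, ligated_exteins=ligated)


# Hypothetical precursor: N extein "MKA", intein "CLSHN", C extein "TGG".
result = splice("MKACLSHNTGG", 3, 8)
print(result.intein, result.ligated_exteins)  # CLSHN MKATGG
```

The sketch deliberately does not require a penultimate His, because the text reports that this residue assists in Asn cyclization but is not present in every intein.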
Four inteins (Ceu ClpP, Mja PEP, Mja Rpol A′, and Mja KlbA) do not have a penultimate His (Dalgaard J.Z, Moser M.J, Hughey R, Mian I.S. J. Comput. Biol. 1997, 4: 193-214; Perler F.B, Olsen G.J, Adam E. Nucleic Acids Res. 1997a, 25: 1087-1093; Pietrokovski S. Protein Sci. 1998, 7, in press). Another residue may assist in Asn cyclization in these inteins, or they may require cofactors. The Chlamydomonas eugametos ClpP intein failed to splice in Escherichia coli unless the intein penultimate Gly was mutated to His (Wang S, Liu X.Q. J. Biol. Chem. 1997, 272: 11869-11873). However, failure to splice in E. coli does not always indicate a defective intein, since several active inteins are unable to splice in E. coli, possibly because of misfolding, an inhibitory intracellular pH or redox potential, or other factors.
Inteins have most likely coevolved with their exteins to optimize the coordination of the four nucleophilic displacements. Protein splicing is less efficient when an intein is expressed within a foreign protein, often leading to dead-end cleavage reactions. Proximal foreign extein residues can disturb the intein active site, for example by steric hindrance. Anraku and coworkers have suggested that the N extein interacts with intein residues to align the splice sites (Nogami S, Satow Y, Ohya Y, Anraku Y. Genetics. 1997, 147: 73-85). Determining the role of exteins in intein folding awaits solution of the structure of a precursor. Future experiments are also needed to define suitable locations for intein insertion if inteins are to be useful in protein engineering.
Several lines of evidence suggest that endonuclease and splicing functions are encoded by separate intein sequences.
First, mutagenesis data indicate that the endonuclease and splicing active sites are distinct (Perler F.B, Xu M.-Q, Paulus H. Curr. Opin. Chem. Biol. 1997b, 1: 292-299). Second, five mini-inteins lack endonuclease motifs (Dalgaard J.Z, Moser M.J, Hughey R, Mian I.S. J. Comput. Biol. 1997, 4: 193-214; Perler F.B, Olsen G.J, Adam E. Nucleic Acids Res. 1997a, 25: 1087-1093; Pietrokovski S. Protein Sci. 1998, 7, in press). Splicing has been observed with the Mxe GyrA mini-intein (Telenti A, Southworth M, Alcaide F, Daugelat S, Jacobs W.R, Perler F.B. J. Bact. 1997, 179: 6378-6382) and with the Sce VMA and Mycobacterium tuberculosis RecA inteins after deletion of the central region between blocks B and F (Chong S, Xu M.Q. J. Biol. Chem. 1997, 272: 15587-15590; Derbyshire V, Wood D.W, Wu W, Dansereau J.T, Dalgaard J.Z, Belfort M. Proc. Natl. Acad. Sci. USA. 1997, 94: 11466-11471). Third, a member of a second family of homing endonucleases, an HNH homing endonuclease, is present in the Synechocystis gyrase subunit B intein (Dalgaard J.Z, Moser M.J, Hughey R, Mian I.S. J. Comput. Biol. 1997, 4: 193-214; Pietrokovski S. Protein Sci. 1998, 7, in press). On the basis of sequence analysis, Dalgaard et al. (J. Comput. Biol. 1997, 4: 193-214) and Pietrokovski (Protein Sci. 1998, 7, in press) proposed that the splicing element is composed of N-terminal and C-terminal regions connected by a linker or an endonuclease.
This prediction is supported by the crystal structures of the Sce VMA and Mxe GyrA inteins, with the exception of the inclusion of an endonuclease DNA recognition region (DRR) in the N-terminal splicing region of the Sce VMA intein (Figure 1).
The Sce VMA intein has two structural domains (Duan X, Gimble F.S, Quiocho F.A. Cell. 1997, 89: 555-564). Domain II is the core endonuclease. Domain I (the first 182 aa and the last 44 aa) is a bifunctional domain composed of the DRR and the splicing element and is almost entirely β strands. The N and C termini are 2.9 Å apart, and His79 in block B is near Cys1, as predicted by mutagenesis. Our understanding of the structure of the splicing domain was fine-tuned by comparison to the structure of the Drosophila hedgehog protein autoprocessing domain (Hh-C), including a sequence alignment of inteins and Hh-C. The Hh-C domain is composed of Hh-C17, which directs thioester formation, followed by the 63 aa sterol recognition region required for cholesterol transfer. Because of the similarity of the architecture of Hh-C17 to the β-strand core of intein splicing domains (Figure 3), this new protein fold has been termed the Hint module (Hedgehog, intein) (Hall T.M.T, Porter J.A, Young K.E, Koonin E.V, Beachy P.A, Leahy D.J. Cell. 1997, 91: 85-97).
Hedgehog proteins are essential signaling molecules for embryonic development (reviewed in Beachy P.A, Cooper M.K, Young K.E, von Kessler D.P, Park W, Hall T.M.T, Leahy D.J, Porter J.A. Cold Spring Harbor Symp. Quant. Biol. 1997, in press). They are synthesized as inactive precursors with an N-terminal signaling region linked to a C-terminal Hh-C autoprocessing region. Hh-C begins with a Cys that undergoes an acyl rearrangement analogous to step 1 of the protein splicing pathway.
The hydroxyl group of cholesterol is the nucleophile that cleaves this thioester bond, and the reaction results in attachment of cholesterol to the C terminus of the signaling domain. Cholesterol anchors the signaling domain to the cell surface. Hh-C17 has an all β-strand structure with a flattened disk shape. Two superimposable structural subdomains are related by a pseudo-twofold axis of symmetry with a single hydrophobic core. Thr326 and His329 (corresponding to like residues in intein block B) are within hydrogen-bonding distance of the α-amino group of Cys258 (equivalent to the intein N terminus), and mutagenesis data show that they are required for thioester formation. Active-site residue Asp303 is not needed for thioester formation but is required for cholesterol transfer.
The 198 aa Mxe GyrA mini-intein was crystallized with a one-residue (Ala) N extein and a Cys1 to Ser1 substitution in an attempt to capture a "pre-splicing" state (Klabunde T, Sharma S, Telenti A, Jacobs W.R, Sacchettini J.C. Nature Struct. Biol. 1998, in press). The Mxe GyrA mini-intein has a compact β-structure, 100 aa of which are superimposable onto the Hint module fold (Figure 3). A 50 aa linker replaces the core endonuclease domain present in the 420 aa M. leprae GyrA intein allele. The DRR is absent, suggesting that it is not required for splicing. The intein termini are on adjacent antiparallel β strands in the central cleft of the horseshoe-shaped β core. The Mxe GyrA mini-intein structure has confirmed the role of the three nucleophiles and revealed proton donors and acceptors in the splicing pathway. The scissile peptide bond between Ala (the N extein) and Ser1 is in a destabilized, energetically unfavorable cis conformation and is held in this position by Ser1 and Thr72.
The hydroxyl group of Ser1 is oriented toward the scissile peptide bond, ready to attack (step 1), but no base is present to deprotonate the hydroxyl (as would be required of all inteins naturally beginning with Ser). The thiol group of the native Cys1 is deprotonated due to its lower pKa. Thr72, Asn74, and His75 (block B) would assist in thioester formation. When Thr+1 (the N terminus of the C extein) is modeled in the structure, its hydroxyl group is in position to initiate transesterification (step 2). His197 can donate a proton to facilitate Asn cyclization, but no residue in this precursor structure is positioned to deprotonate the Asn side chain nucleophile (step 3). Ser53, Ser179, and Ser196 also assist in these reactions.
The Mxe GyrA mini-intein represents a splicing element without a DRR or core endonuclease domain. Larger inteins, like the Sce VMA intein, contain homing endonucleases. The structural similarity among splicing elements and Hh-C17 can be seen in the common positioning of β strands and the large number of residues with superimposable Cα atoms (the main chain carbon atom to which the side chain is attached) (Figure 3). However, there is little sequence identity among the three proteins except for the residues involved in catalysis (Figure 1 and Figure 3).
Because of the similarity of their core structures, Hall et al. (Cell. 1997, 91: 85-97) proposed that inteins and Hh-C17 evolved from a common precursor (Figure 4). The Hint module mediates ester/thioester formation, activating the linkage between the element and a second protein domain. Other systems, such as the Ntn hydrolase family, employ an acyl rearrangement to activate catalysis but do not have structural similarity to the Hint module (Hall T.M.T, Porter J.A, Young K.E, Koonin E.V, Beachy P.A, Leahy D.J. Cell. 1997, 91: 85-97). Inteins and Hh-C subsequently evolved separate methods of cleaving this bond. Inteins evolved the ability to ligate two exteins and acquired the DRR and core endonuclease. The endonuclease allowed inteins to spread by lateral transmission. Hh-C17 acquired a sterol recognition region (SRR) directing addition of cholesterol to the hedgehog protein signaling domain, anchoring the signaling domain to the cell surface. Alternatively, the Hint module could have invaded a preexisting signaling domain/SRR element. Several nematode Hh-C domains contain unrelated C-terminal extensions that may interact with molecules other than cholesterol and have been tentatively termed adduct recognition regions (Beachy P.A, Cooper M.K, Young K.E, von Kessler D.P, Park W, Hall T.M.T, Leahy D.J, Porter J.A. Cold Spring Harbor Symp. Quant. Biol. 1997, in press). The order of these events is speculative, including the order of module assembly. Each of the intein and hedgehog elements may have coevolved or associated after independent formation. Endonucleases may have also been lost from inteins. As our understanding of the mechanisms of protein splicing and Hh-C autoproteolysis improves, we will begin to be able to harness these elements to cleave or splice any target protein at will.