Title: The crystal structure of a novel glucosamine‐6‐phosphate deaminase from the hyperthermophilic archaeon Pyrococcus furiosus
Abstract: N-acetylglucosamine is an important building block for structural polysaccharides in several organisms. In the metabolism of amino sugars, the conversion between glucosamine 6-phosphate (GlcN6P) and fructose 6-phosphate (Fru6P) is a key step in both the anabolic and catabolic directions.1 The anabolic reaction is mediated by GlcN6P synthase (GlmS), which catalyzes the irreversible formation of GlcN6P. GlmS is a member of the glutamine-dependent amidotransferase family and consists of an N-terminal glutamine amide transferase domain and a C-terminal isomerase domain that utilizes ammonia for the amination of Fru6P and isomerizes it to give GlcN6P.2 It is known that GlmS cannot utilize free ammonia as a nitrogen donor. It has been reported that the C-terminal isomerase domain alone does not show ammonia-dependent glucosamine synthesis activity.3 On the other hand, the catabolic degradation of GlcN6P is mediated by GlcN6P deaminase (GlmD).4 GlmD catalyzes the deamination and isomerization of GlcN6P to Fru6P with the concomitant release of ammonia, and the enzyme can also catalyze the reverse reaction in the presence of a high concentration of ammonia. Although GlmS and GlmD catalyze similar reactions, there is no relationship between the primary structures of GlmS and GlmD. There have been many studies on GlmD from eukarya and bacteria.5-7 The corresponding enzyme from archaea, however, has not been reported. Archaeal genomes do not harbor any genes homologous to the known GlmDs. Recently, a gene cluster responsible for chitin degradation was cloned from the hyperthermophilic archaeon Thermococcus kodakaraensis, and a gene responsible for GlcN6P deamination was found within the cluster.8 This archaeal GlmD represents a novel type of GlmD distinct from previously known enzymes. 
It functions as a dimer and has a sequence that is similar to the sugar isomerase domain of GlmS rather than to other GlmDs, which have a hydrogenase-like, six-stranded fold and act as hexamers. Although this archaeal GlmD is similar to the isomerase domain of GlmS, it kinetically prefers the deamination of GlcN6P to the reverse ammonia-dependent amination of Fru6P.8 The complete genome of another hyperthermophilic archaeon, Pyrococcus furiosus DSM 3638, has been sequenced, and a conserved gene cluster for chitin degradation and an archaeal GlmD gene homologue have been found.9 Therefore, this type of GlmD would be common in archaea, and its molecular structure could be helpful in understanding the functional differences between archaeal GlmDs and those from eukarya and bacteria. Here we present the crystal structure of GlmD from P. furiosus determined by single-wavelength anomalous dispersion (SAD) and show its putative active site.

The gene encoding GlmD (glmD) from P. furiosus was amplified by polymerase chain reaction (PCR) from genomic DNA using a primer pair encoding the predicted 5′- and 3′-ends of glmD. The amplified fragments were digested with NdeI and XhoI and ligated into the expression vector pET30a. The DNA sequence of the inserted fragment in the recombinant plasmid (pET30a:glmD) was confirmed, and the plasmid was then introduced into the methionine auxotroph E. coli strain B834 (Novagen) for the over-expression of the SeMet-substituted protein. An overgrown culture of 20 mL was inoculated into 2 L of a minimal medium supplemented with 50 μg/mL of SeMet and 30 μg/mL of kanamycin. The cells were grown at 37°C. When the culture reached log phase (OD600 = 0.6), the expression of the GlmD protein was induced by the addition of isopropyl β-D-thiogalactoside to a final concentration of 1 mM. The cells were grown for an additional 16 h at 22°C. 
Harvested cells were resuspended in Buffer A (20 mM Tris-HCl, pH 8.0 containing 5 mM β-mercaptoethanol) and disrupted by sonication on ice. After centrifugation, the soluble supernatant was incubated for 10 min at 85°C to obtain a heat-stable protein solution. To purify the His-tagged recombinant protein, this solution was loaded onto a Ni-NTA agarose (QIAGEN) column and the recombinant protein was eluted with Buffer A containing 300 mM imidazole. For further purification, the eluted fraction was applied onto a HiTrap Q column, and size exclusion chromatography was then performed with a Superdex G-200 column pre-equilibrated with Buffer A. The purified SeMet-substituted protein contains a hexahistidine tag (LEHHHHHH) at the carboxy-terminus.

The recombinant GlmD protein was concentrated and subjected to initial crystallization screening at 293 K using the hanging-drop vapor diffusion method with commercially available crystal screening solution sets from Hampton Research and Emerald Biostructures. Crystals were obtained under two conditions. One was 20% PEG 1000 in 0.1M sodium/potassium phosphate pH 6.2 containing 0.2M sodium chloride, and the other was 35% (v/v) tert-butanol in 0.1M sodium citrate pH 5.6. After optimization, the best crystals were obtained from a drop containing 2 μL of 40 mg/mL protein and 2 μL of reservoir solution (24% PEG 1000, 0.2M sodium chloride in 0.1M sodium/potassium phosphate pH 6.25) using the sitting-drop vapor diffusion method. One of the best crystals was transferred to a cryo-protecting solution containing 10% glycerol in the reservoir solution and mounted on the beamline. The crystal was found to belong to space group C222₁ with unit cell parameters of a = 61.48, b = 93.89, and c = 200.64 Å, where the asymmetric unit contained two molecules and the value of the Matthews coefficient VM was 1.9 Å³/Da, corresponding to a solvent content of 35.3%. 
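The Matthews coefficient quoted above can be checked with a few lines of arithmetic. The following is only a sketch under stated assumptions: the molecular mass is taken as the 38 kDa SDS–PAGE estimate mentioned elsewhere in the text, the number of symmetry-equivalent positions for the C-centered orthorhombic cell is taken as 8, and the solvent fraction uses the conventional approximation 1 − 1.23/VM. The function name `matthews` is hypothetical.

```python
# Sketch: Matthews coefficient (VM) and solvent content for an orthorhombic cell.
# Assumptions (not stated explicitly in the crystallization paragraph):
#   - molecular mass of one chain ~38,000 Da (SDS-PAGE estimate)
#   - 8 symmetry-equivalent positions per unit cell
#   - solvent fraction approximated as 1 - 1.23 / VM

def matthews(a, b, c, z_sym, n_asu, mw_da):
    """Return (VM in A^3/Da, solvent fraction) for an orthorhombic unit cell."""
    volume = a * b * c                      # all cell angles are 90 degrees
    molecules_per_cell = z_sym * n_asu      # symmetry copies x molecules per asymmetric unit
    vm = volume / (molecules_per_cell * mw_da)
    solvent = 1.0 - 1.23 / vm
    return vm, solvent

vm, solvent = matthews(61.48, 93.89, 200.64, z_sym=8, n_asu=2, mw_da=38000.0)
print(f"VM = {vm:.2f} A^3/Da, solvent fraction = {solvent:.3f}")
```

With these inputs the result lands close to the reported VM of 1.9 Å³/Da and a solvent content of roughly 35%; small deviations are expected because the assumed mass is only an estimate.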
Diffraction data were collected at the peak wavelength (0.97930 Å) with 1° oscillation using an ADSC Quantum 210 CCD area detector at the 4A beamline (MXW) at Pohang Accelerator Laboratory (PAL, Pohang, Korea). The diffraction data were integrated and scaled using the HKL2000 suite.10 Although the completeness of the highest resolution shell (1.86–1.80 Å) is only 26%, because this shell falls in the corner area of the square detector, we used all data up to 1.8 Å resolution for structural refinement since the diffraction spots in this area were still useful (I/σ = 2.7 and Rsymm = 24.7%). For finding the selenium sites, however, we used data up to 2.3 Å resolution, with 95.3% completeness. The asymmetric unit contains two polypeptides with seven SeMet residues per chain. Twelve of the 14 Se atoms in the asymmetric unit were identified using the program SOLVE,11 and these sites were further analyzed with all data up to 1.8 Å resolution. The overall figure of merit (FOM) was 0.380. The density map was subjected to density modification using the RESOLVE program,12 which increased the FOM to 0.692 with 85% (567 of 666 amino acid residues) of the residues built. Further model building was performed manually using the program O,13 and refinement with isotropic displacement parameters was performed with CCP4 refmac5.14 Crystallographic data statistics are summarized in Table I. The final model has been deposited in the Protein Data Bank (PDB) under the PDB ID code 2CB0.

The crystal structure contains two polypeptide chains, 290 water molecules and one glycerol. The final Rwork and Rfree of the model were 17.0 and 21.9%, respectively. We found 320 residues (1–320) from one polypeptide chain (Chain A) and all residues (1–333) of the other one (Chain B), including the C-terminal hexahistidine tag. Both polypeptide chains were identical within experimental error (0.304 Å of RMSD between Cα atoms). The average isotropic temperature factors (B factors) of Chains A and B are 14.5 and 21.4 Å², respectively. 
The electron density for the C-terminal residues (321–333) of Chain A is not defined, even though Chain A has a lower overall B factor than Chain B, which contains a clearly identified C-terminal tail in the electron density map. This implies that the extreme C-terminal coil region, after D320, could be easily disordered. Except for the two Phe318 residues, all the residues from both chains excluding prolines and glycines are in the most favorable or additionally allowed regions of the Ramachandran plot. Phe318 from Chain A is in a generously allowed region (Phi and Psi angles are 72.8° and −17.3°, respectively) and the one from Chain B is in a disallowed region (Phi and Psi angles are 76.0° and −33.0°, respectively).

P. furiosus GlmD has a bilobal structure consisting of two topologically identical subdomains of similar size [Fig. 1(A)]. A short coil region links the N-terminal (1–154) and C-terminal (161–309) subdomains, and a long coil structure (310–325) follows the C-terminal subdomain. Each subdomain has an α/β structure and is dominated by a five-stranded parallel β sheet flanked on either side by α helices. The strand orders of the sheets are 5-4-3-1-2 (N-terminal subdomain) and 10-9-8-6-7 (C-terminal subdomain). This fold is of the five-stranded flavodoxin type and is similar to that of glucose 6-phosphate isomerase and the isomerase domain of GlmS.15

Figure 1. Structures of glucosamine 6-phosphate deaminase from P. furiosus. (A) A polypeptide chain consists of two similar five-stranded subdomains and a C-terminal tail. The N-terminal subdomain (red and yellow) and the C-terminal subdomain (blue and magenta) are linked by a short loop (gray) and followed by a long C-terminal coil region (gray). (B) Twofold symmetrical interaction of two polypeptide chains (green and blue). A helix (yellow) from one chain interacts with a cleft between the N- and C-terminal subdomains of the other chain (B_Nt and B_Ct), forming an active site (red circle). 
(C) A substrate, glucosamine 6-phosphate (yellow), from the crystal structure of the isomerase domain of E. coli GlmS was superimposed on the active site of GlmD, which is formed by both monomers (green and blue). (D) The C-terminal tail covers the substrate-binding site between the two monomers (blue and green). Glucosamine 6-phosphate is placed in the binding pocket. Residues from the monomer other than the one containing the sugar-binding site are indicated by asterisks. Residues D320, R324, and W325 from the C-terminal region interact with K230, R226, and W249 from the other monomer.

During the purification of P. furiosus GlmD, gel filtration chromatography indicated that the molecular mass of the eluted protein was about double that of the recombinant GlmD monomer, which was estimated to be 38 kDa by SDS–PAGE. This implies that the enzyme forms a dimer in solution. GlmD from T. kodakaraensis was also reported to be a homodimer.8 Other published GlmDs from eukarya and bacteria were mostly hexameric, and recently a monomeric form from Bacillus subtilis has been reported.16, 17 In the refined crystal structure of P. furiosus GlmD, there are two polypeptide chains in the asymmetric unit, and they bind to each other to form a globular shape [Fig. 1(B)]. A helix region, between strands 7 and 8 of the C-terminal subdomain, is placed into a groove between the two subdomains of the other monomer, which forms an active site. The corresponding region from the N-terminal subdomain also interacts in the same fashion on the opposite side. This noncrystallographic dimer in the asymmetric unit could represent the dimeric interaction of GlmD in solution. This dimerization is similar to that of the GlmS isomerase domains.18

The amino acid sequence of GlmD from P. furiosus was compared with those of GlmD from T. kodakaraensis and of the isomerase domains of GlmS from T. kodakaraensis and E. coli (Fig. 2). The GlmD sequence of P. furiosus has 62% amino acid sequence identity to that of T. 
kodakaraensis, while it has 21% and 25% identity to the sequences of the GlmS isomerase domains of T. kodakaraensis and E. coli, respectively. The crystal structure of E. coli GlmS with GlcN6P revealed the residues involved in the reaction and in the binding of GlcN6P.15 Those residues are well conserved in GlmDs. S347, S349, and T352, which bind GlcN6P in GlmS, are conserved as S87, S89, and T92 in GlmD. Amino acid residues important for enzymatic activity, including E488, H504, and K603 of E. coli GlmS19, are conserved as E211, H227, and K321, respectively. H227, which is known to be involved in the ring-opening reaction, comes from the other monomer of the dimer. The sequence similarity between GlmD and the isomerase domains of GlmS suggests that the archaeal deaminase may derive from the organism's own GlmS.

Sequence alignment and analysis suggest that the active site of GlmD would be similar to that of the isomerase domain of GlmS. We placed GlcN6P on the structure of P. furiosus GlmD based on the superimposition of the GlmD structure on the complex structure of the E. coli GlmS isomerase domain and GlcN6P [Fig. 1(C)]. GlmD has a suitable cavity for the binding of GlcN6P between the N-terminal subdomain of one monomer and the C-terminal subdomain of the other monomer, and it is covered by the C-terminal tail. The cavity is surrounded by conserved serine and threonine residues from the N-terminal subdomain and a conserved histidine residue on the helix between strands 7 and 8 of the C-terminal subdomain. The hydroxyl groups of S87, T92, and S42 are involved in binding the phosphate group of GlcN6P. The amino group of GlcN6P is hydrogen-bonded to the carbonyl oxygens of V133 from the N-terminal subdomain and of D320 from the C-terminal coil. Despite the twofold pseudosymmetry of the GlmD monomer, which contains two subdomains, it has only one active site since the C-terminal coil region is also required. 
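The residue correspondences listed above can be tabulated to make the conservation explicit. The sketch below uses only the residue pairs given in the text; the helper names (`parse`, `numbering_offsets`) are hypothetical, and the interpretation of the changing numbering offset as a sign of insertions or deletions between the two sequences is an inference from the numbers, not a statement from the alignment itself.

```python
# Sketch: tabulate the conserved residues named in the text.
# Each pair is (E. coli GlmS residue, P. furiosus GlmD residue).
PAIRS = [
    ("S347", "S87"), ("S349", "S89"), ("T352", "T92"),     # GlcN6P binding
    ("E488", "E211"), ("H504", "H227"), ("K603", "K321"),  # catalysis
]

def parse(label):
    """Split a label such as 'S347' into ('S', 347)."""
    return label[0], int(label[1:])

def numbering_offsets(pairs):
    """Return rows of (GlmS label, GlmD label, same residue type?, numbering offset)."""
    rows = []
    for glms, glmd in pairs:
        aa_s, num_s = parse(glms)
        aa_d, num_d = parse(glmd)
        rows.append((glms, glmd, aa_s == aa_d, num_s - num_d))
    return rows

for glms, glmd, same_type, offset in numbering_offsets(PAIRS):
    print(f"{glms:>5} -> {glmd:<5} same type: {same_type}  offset: {offset}")
```

Every pair keeps the same residue type, while the numbering offset is constant within the binding residues and shifts for the catalytic residues, which is what one would expect if the aligned sequences differ by a few insertions or deletions between these regions.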
Figure 2. Multiple sequence alignment of glucosamine 6-phosphate deaminases (GlmD) with glucosamine 6-phosphate synthases (GlmS). Amino acid sequences of GlmD (1–325) from P. furiosus (Pfu_glmD), GlmD (1–326) and the isomerase domain (252–602) of GlmS from T. kodakaraensis (Tko_GlmD and Tko_GlmS, respectively), and the isomerase domain (249–608) of GlmS from E. coli (Eco_GlmS) were aligned. The numbering shown is that of P. furiosus GlmD. Secondary structure elements of GlmD from P. furiosus are shown above the alignment. Residues involved in glucosamine 6-phosphate binding are marked with closed blue triangles. The multiple alignment was done using the T-Coffee software and visualized using the ESPript software, both located on the ExPASy Proteomics Server (http://au.expasy.org/).

In GlmS, the C-terminal tail contributes to the formation of the sugar-binding site, and it provides the catalytically essential lysine residue. The sequence of this region is well conserved among GlmSs. The consensus sequence, DXPXXLAK(C/S)VT, is considered to be a fingerprint of these enzymes.15 As compared in Figure 2, the sequence of the C-terminal tail region of GlmD, DNPRFLDKVVRW, is also very similar to the consensus sequence, including the catalytically important conserved K321. This indicates that this region of GlmD plays similar roles in sugar isomerization and nitrogen-transfer activity. There are, however, minor sequence differences between GlmD and the consensus sequence. Among them, D320 and W325 of GlmD are involved in the interaction of the C-terminal tail region over the active site [Fig. 1(D)]. They may confer the distinct enzymatic activity of GlmD, namely the kinetically favored deamination reaction. The F318 residues from both chains lie outside the favored regions of the Ramachandran plot. The Phi and Psi angles of F318 from molecule B (76.0° and −33.0°) are more extreme than those of molecule A. The values for molecule B may be due to the position of the C-terminal tail region. 
There is a strong interaction, including an ionic interaction between the carboxyl group of D320 and the amino group of K230 from the other monomer [Fig. 1(D)]. The carboxyl group is also hydrogen-bonded to the guanidine group of R226 from molecule A via a water molecule. This might push the C-terminal coil to turn tightly at F318 when covering the substrate-binding cleft between the two monomers. In molecule B, all C-terminal residues are clearly shown in the electron density map. The carbonyl group of R324 from chain B forms a hydrogen bond with the side chain of R226 from chain A. The indole side chains of the C-terminal W325 of molecule B and W249 of molecule A are stacked against each other. Such interactions help the C-terminal coil region to cover the substrate-binding pocket. Unlike in molecule B, the C-terminal residues of molecule A are invisible after D320, indicating that the disordered C-terminus leaves the active site open. Therefore, molecules A and B may represent open and closed active sites, respectively. In the dimeric form of GlmD, there is no open gate for the substrate to enter or the product to leave because the C-terminal tail region covers the active site. This substrate-binding site could be exposed to the solvent only by the separation of the two monomers or by the displacement of the C-terminal tail region. Since dimerization involves interactions over a large protein area, it could be easier for a GlmD dimer to open the active site by displacing the C-terminus that covers it, as seen in molecule A, than by dissociating into monomers.

Here, we determined the first crystal structure of an archaeal GlmD, from P. furiosus. The dimeric structure of this bilobal flavodoxin-like fold revealed structural differences between archaeal GlmD and other GlmDs from eukarya or bacteria, and similarities between archaeal GlmD and the isomerase domain of GlmS. 
In addition, the different C-terminal positions of the two monomers in the crystal structure suggest an opening mechanism for the active site in GlmD.
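The comparison between the C-terminal tail of GlmD and the GlmS fingerprint discussed earlier can be made explicit with a small position-by-position check. This is a sketch only: the consensus DXPXXLAK(C/S)VT and the tail DNPRFLDKVVRW are taken from the text, while the starting residue number 314 is an assumption chosen so that the conserved lysine falls on K321 as stated; the names `CONSENSUS`, `TAIL_START`, and `compare_to_consensus` are hypothetical.

```python
# Sketch: compare the GlmD C-terminal tail with the GlmS consensus fingerprint.
# 'X' accepts any residue; a multi-letter entry such as 'CS' accepts either residue.
CONSENSUS = ["D", "X", "P", "X", "X", "L", "A", "K", "CS", "V", "T"]
TAIL = "DNPRFLDKVVRW"   # C-terminal tail of P. furiosus GlmD, from the text
TAIL_START = 314        # assumption: numbering chosen so that the lysine is K321

def compare_to_consensus(consensus, tail, start):
    """Return rows of (residue number, residue, allowed residues, matches?)."""
    rows = []
    for i, allowed in enumerate(consensus):
        residue = tail[i]
        matches = allowed == "X" or residue in allowed
        rows.append((start + i, residue, allowed, matches))
    return rows

rows = compare_to_consensus(CONSENSUS, TAIL, TAIL_START)
mismatches = [f"{res}{num}" for num, res, allowed, ok in rows if not ok]
print("Positions differing from the consensus:", mismatches)
# Residues beyond the 11-position fingerprint have no consensus counterpart.
print("Residues past the fingerprint:", TAIL[len(CONSENSUS):])
```

Under this numbering, D320 is one of the positions that deviates from the fingerprint, and W325 lies just past its end, which fits the observation that both residues take part in the tail's interaction over the active site.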