Title: Whole-Genome Sequencing Analysis Reveals High Specificity of CRISPR/Cas9 and TALEN-Based Genome Editing in Human iPSCs
Abstract: Human iPSCs provide renewable cell sources for human biology and disease research and the potential for developing gene and cell therapy. Realization of this potential will rely in part on our ability to precisely edit or engineer the human genome in an efficient way. Recent developments in designer endonuclease technologies such as zinc finger nuclease (ZFN), transcription activator-like effector nuclease (TALEN), and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 endonuclease have provided ways to significantly improve genome editing efficiency in human iPSCs. These endonucleases make a double-stranded break (DSB) at a predetermined DNA sequence and trigger natural DNA repair processes such as nonhomologous end joining (NHEJ) or homologous recombination (HR) with a donor DNA template. Among these existing approaches, RNA-guided CRISPR/Cas9 is the most user-friendly and versatile system, and it has been applied in both animal models and cell lines (Cong et al., 2013; Hsu et al., 2014; Mali et al., 2013). The most commonly used system consists of a single polypeptide endonuclease Cas9 complexed with a single guide RNA (gRNA) that provides complementarity to a 20-nucleotide target DNA sequence. However, the specificity and efficiency of this approach in human iPSCs have not been studied in detail (Cong et al., 2013; Ding et al., 2013; Mali et al., 2013; Yang et al., 2013). Some analyses using cancer cell lines reported higher-than-expected levels of off-target mutagenesis by Cas9-gRNAs (Fu et al., 2013; Hsu et al., 2013), raising concerns about the practical applicability of this approach in therapeutic contexts. Some recent studies, including one on human adult stem cells, showed a minimal level of off-target effects by CRISPR/Cas9 (Schwank et al., 2013). However, these existing analyses of off-target effects and mutational load in gene-corrected stem cells have been restricted to checking predicted off-target sites and are therefore limited in scope. To assess the value of this type of gene editing approach for therapeutic applications, it is critical to rigorously examine whether it is possible to generate gene-edited cell lines with minimal mutational load.
To this end, we have conducted whole-genome sequencing of four iPSC clones successfully targeted at the AAVS1 locus, a “safe harbor” in the human genome that is used for stable transgene expression in a variety of contexts. To generate the lines, we used an integration-free human iPSC line, BC1, whose genomic integrity has been characterized in detail by next-generation sequencing (Cheng et al., 2012), and targeted a GFP expression cassette into the AAVS1 site with either a previously reported Cas9-gRNA combination or a pair of improved heterodimeric TALENs (Mali et al., 2013; Yan et al., 2013) (Table S1 and Supplemental Experimental Procedures available online). Twenty days after transfection of the donor plasmid and either the TALENs or Cas9-gRNA into BC1, we harvested four clones with confirmed targeted integration (hCas9-C4, hCas9-C16, TALEN-C3, and TALEN-C6; Table S1 and Supplemental Experimental Procedures) and the parental BC1 iPSCs for whole-genome sequencing. The sequencing reads, ranging from 83 Gbps to 100 Gbps from each targeted clone, were first aligned to the human hg19 reference genome to enable identification of single-nucleotide variants (SNVs) and small indels (Table S1). Our analysis identified ≥4.2 million SNVs and ≥500,000 indels in each genome (Table S1) in comparison to the hg19 reference genome, suggesting that it is a rigorous data set that covers the genome in sufficient depth to detect sequence variants.
The “germline” variants (present in BC1 parental iPSCs and different from hg19) were readily detectable in each targeted cell line (80%–88%), indicating that the sensitivity of variant detection in our analysis is high (Table S1). The variations from each targeted clone were then compared to the BC1 parental iPSCs to generate a list of potential variations arising during the gene editing process, which we then confirmed using genomic PCR and Sanger sequencing. We confirmed 62 out of 69 SNVs tested for an overall confirmation rate of 90%, and based on that we estimate that the total SNVs in the four iPSC clones range between 217 and 281 and that the total indels range between 7 and 12 (Table S1). Overall, the genomic variation levels in TALEN- and Cas9-targeted groups were comparable. One important consideration is how many of these detected SNVs and indels were the result of off-target mutagenesis by the engineered endonucleases. To address this question, we generated a list of 3,665 (Cas9) and 238 (TALEN) putative off-target positions by using the EMBOSS fuzznuc software package. Each candidate SNV and indel was compared to this list, and none of them fell within a potential off-target region (Table S1), consistent with previous analyses looking at predicted off-target sites. Our analysis also shows that each SNV and indel is unique and that none of them occurred in more than one cell line. The absence of recurring mutations and the fact that none of the mutations resides in any putative off-target site by bioinformatic prediction strongly suggest that these mutations were randomly accumulated during regular cell expansion and are not direct results of off-target activities by Cas9 or TALENs. Our results from whole-genome sequencing analysis of Cas9- and TALEN-targeted human iPSC clones demonstrate that these engineered endonucleases provide efficient genome-editing tools with high specificity.
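The comparison logic described above (recovery of parental variants, subtraction of parental calls, the check for recurring mutations, and the overlap test against putative off-target regions) can be sketched as follows. This is a minimal illustration only, not the pipeline used in the study: the `Variant` and `Region` types, all function names, and the toy coordinates are assumptions introduced here, and real variant calls would normally be read from VCF files.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position of the first reference base
    ref: str
    alt: str


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def germline_recovery(clone_calls, parental_calls):
    """Fraction of parental ("germline") variants that are also called in the clone."""
    return len(clone_calls & parental_calls) / len(parental_calls)


def candidate_variants(clone_calls, parental_calls):
    """Variants called in the clone but absent from the parental line."""
    return clone_calls - parental_calls


def recurring_variants(candidates_by_clone):
    """Candidate variants that appear in more than one clone."""
    seen, repeated = set(), set()
    for variants in candidates_by_clone.values():
        repeated |= seen & variants
        seen |= variants
    return repeated


def overlaps_off_target(variant, regions):
    """True if the reference span of the variant overlaps any listed region."""
    variant_end = variant.pos + len(variant.ref) - 1
    return any(
        region.chrom == variant.chrom
        and variant.pos <= region.end
        and variant_end >= region.start
        for region in regions
    )


# Toy data (hypothetical coordinates, for illustration only).
parental = {Variant("chr1", 100, "A", "G"), Variant("chr2", 250, "T", "C")}
clones = {
    "clone_A": {
        Variant("chr1", 100, "A", "G"),
        Variant("chr2", 250, "T", "C"),
        Variant("chr2", 500, "C", "T"),
    },
    "clone_B": {Variant("chr1", 100, "A", "G"), Variant("chr3", 900, "GA", "G")},
}
off_target_regions = [Region("chr5", 1000, 1022)]

recovery = {name: germline_recovery(calls, parental) for name, calls in clones.items()}
candidates = {name: candidate_variants(calls, parental) for name, calls in clones.items()}
repeated = recurring_variants(candidates)
off_target_hits = [
    variant
    for variants in candidates.values()
    for variant in variants
    if overlaps_off_target(variant, off_target_regions)
]
```

In this sketch, an empty `repeated` set and an empty `off_target_hits` list correspond to the two observations reported above: no recurring mutations and no candidate variant inside a putative off-target region.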
It remains to be clarified whether the higher off-target rates observed in cancer cell lines are due to the overexpression of gRNAs and Cas9 protein and/or due to exacerbated and faulty DNA repair in these cell types. The higher specificity observed in human iPSCs, combined with the rapid development of next-generation sequencing technology, makes it possible to characterize and isolate high-quality genome-edited stem cell clones with minimal mutational load. The guiding principle established with human iPSCs will likely be applicable to other types of stem cells and come with improvements in gene transfer and targeting efficiencies. Our current study of gene targeting in human iPSCs will help to establish better models for human biology and disease research and to provide proof-of-principle for future gene therapy.
The authors thank Dr. Prashant Mali for helpful discussions. This work was supported in part by grants from Maryland Stem Cell Research Funds (2011-MSCRFE-0087 and 2009-MSCRFII-0047) and by NIH (2R01-HL073781 and U01-HL107446). L.A.-A. received a fellowship from La Caixa Foundation (Spain). The WGS data can be accessed at the NCBI SRA database with the accession number SRP042612.
Document S1. Supplemental Experimental Procedures
Table S1. Whole-Genome Sequencing Analysis of Human iPSC Clones after Cas9-Mediated and TALEN-Mediated Homologous Recombination at the AAVS1 Locus. The complete WGS data can be accessed at the NCBI SRA database with the accession number SRP042612. The further-analyzed data are presented in the following sheets.
(1) Summary of WGS Results: Four targeted iPSC clones (hCas9-C4 and -C16 and TALEN-C3 and -C6), which were confirmed for targeted integration by both PCR and Southern blot, were selected for whole-genome sequencing in comparison to parental BC1 iPSCs.
The total SNVs and Indels in each clone were calculated based on both informatic analysis and the Sanger sequencing confirmation rates. None of the variants identified in the targeted clones reside in any potential off-target region. No recurring mutations were detected.
(2) Variant Details: A similar number of SNVs and Indels were identified in each cell line. The “germline” variants (those present in BC1 parental iPSCs and different from the hg19 reference genome) were readily detectable in each edited cell line, indicating that the sensitivity to detect a variation is high.
(3) Point Mutations: The list of the SNV variants (compared to BC1 parental iPSCs) identified in each targeted clone is presented.
(4) Indels: The list of the small insertions/deletions (compared to BC1 parental iPSCs) identified in each targeted clone is presented.
(5) Sanger Validation: A subset of candidate variations was randomly selected by generating a random number for each SNV/Indel in a Microsoft Excel file using the RAND function, and the selected variations were validated using Sanger sequencing to ensure mutation call accuracy. The verification rate for each cell line was used to calculate the estimated total SNVs (validation rate × candidate SNVs = calculated SNVs) shown in “Summary of WGS Results.”
(6) Gene Targeting Timeline: BC1 iPSCs were co-nucleofected with a donor plasmid AAV-CAGGS-EGFP and either TALENs or Cas9-gRNA. Nucleofected cells were cultured at a low density for 4 days before puromycin selection. Individual puromycin-resistant clones picked at day 11 were further expanded for an additional 9 days. Genomic DNAs isolated at day 20 were used for analyses of targeted integration events and for whole-genome sequencing.
(7) Targeting Diagram: Diagrams of the donor plasmid AAV-CAGGS-EGFP, the native AAVS1 (PPP1R12C) locus structure, and the genomic structure after targeted integration of the GFP expression cassette (GFP KI). Primers for PCR amplification of the targeted allele (red), untargeted allele (blue), and random integration (purple) are shown. DNA probes for Southern analysis of SphI (S) digested genomic DNA are also indicated.
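The Sanger validation step in sheet (5) combines a random draw of candidates with the scaling rule "validation rate × candidate SNVs = calculated SNVs". A minimal sketch of both steps follows. The function names and the candidate count of 300 are hypothetical and chosen only for illustration; the pooled figure of 62 confirmed out of 69 tested comes from the main text, whereas the study applied a separate verification rate to each cell line.

```python
import random


def pick_for_validation(candidates, n, seed=None):
    """Assign a random number to each candidate, sort by it, and keep the first n.

    This mirrors the spreadsheet approach of adding a RAND() column and sorting.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), candidate) for candidate in candidates]
    keyed.sort(key=lambda pair: pair[0])
    return [candidate for _, candidate in keyed[:n]]


def estimated_total(candidate_count, confirmed, tested):
    """Scale the number of candidate calls by the Sanger confirmation rate."""
    if tested == 0:
        raise ValueError("at least one candidate must be tested")
    return candidate_count * confirmed / tested


# Pooled confirmation rate reported in the main text: 62 of 69 SNVs.
pooled_rate_percent = round(100 * 62 / 69)

# Hypothetical example: 300 candidate SNVs in one clone (not a value from Table S1).
example_estimate = estimated_total(300, confirmed=62, tested=69)

# Hypothetical example: draw 5 of 50 candidate identifiers for Sanger sequencing.
example_subset = pick_for_validation(range(50), 5, seed=1)
```

Sorting by an independent random key gives every candidate the same chance of selection, which is what allows the confirmation rate in the tested subset to be used as an estimate for the full candidate list.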