Title: The mutational landscape of the SCAN-B real-world primary breast cancer transcriptome
Abstract: Report, 14 September 2020, Open Access, Transparent process

Christian Brueffer (1,2), Sergii Gladchuk (1,2), Christof Winter (1,2,8), Johan Vallon-Christersson (1,2,3), Cecilia Hegardt (1,2,3), Jari Häkkinen (1,2), Anthony M George (1,2), Yilun Chen (1,2), Anna Ehinger (1,2,4), Christer Larsson (2,5), Niklas Loman (1,2,6), Martin Malmberg (6), Lisa Rydén (1,2,7), Åke Borg (1,2,3) and Lao H Saal (*,1,2,3)

1 Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
2 Lund University Cancer Center, Lund, Sweden
3 CREATE Health Strategic Center for Translational Cancer Research, Lund University, Lund, Sweden
4 Department of Pathology, Skåne University Hospital, Lund, Sweden
5 Division of Molecular Pathology, Department of Laboratory Medicine, Lund University, Lund, Sweden
6 Department of Oncology, Skåne University Hospital, Lund, Sweden
7 Department of Surgery, Skåne University Hospital, Lund, Sweden
8 Present address: Institut für Klinische Chemie und Pathobiochemie, Klinikum rechts der Isar, Technische Universität München, München, Germany
* Corresponding author. Tel: +46 46 2220365; E-mail: [email protected]

ORCID: Christian Brueffer, orcid.org/0000-0002-3826-0989; Christof Winter, orcid.org/0000-0002-0253-9056; Jari Häkkinen, orcid.org/0000-0002-8466-9179; Lao H Saal, orcid.org/0000-0002-0815-1896

EMBO Mol Med (2020) 12: e12118. https://doi.org/10.15252/emmm.202012118

Abstract

Breast cancer is a disease of genomic alterations, of which the panorama of somatic mutations and how these relate to subtypes and therapy response is incompletely understood. Within SCAN-B (ClinicalTrials.gov: NCT02306096), a prospective study elucidating the transcriptomic profiles for thousands of breast cancers, we developed an RNA-seq pipeline for detection of SNVs/indels and profiled a real-world cohort of 3,217 breast tumors. We describe the mutational landscape of primary breast cancer viewed through the transcriptome of a large population-based cohort and relate it to patient survival. We demonstrate that RNA-seq can be used to call mutations in genes such as PIK3CA, TP53, and ERBB2, as well as the status of molecular pathways and mutational burden, and identify potentially druggable mutations in 86.8% of tumors. To make this rich dataset available for the research community, we developed an open source web application, the SCAN-B MutationExplorer (http://oncogenomics.bmc.lu.se/MutationExplorer).
These results add another dimension to the use of RNA-seq as a clinical tool, where both gene expression- and mutation-based biomarkers can be interrogated in real time within 1 week of tumor sampling.

Synopsis

A bioinformatics pipeline was developed for detection of single nucleotide variants and small insertions/deletions from RNA sequencing (RNA-seq) data. The mutational landscape of 3,217 primary breast cancer transcriptomes in relation to patient survival was made available through a public web portal.

An optimized pipeline for detection of single nucleotide variants and short insertions and deletions from RNA-seq data was developed and applied to 3,217 primary breast tumors.
The mutational portraits identified mutations in clinically important genes, including mutations in one or more potentially druggable genes in 86.8% of cases.
Mutational portraits revealed significant relationships to patient outcome within specific treatment groups, including treatment resistance mutations.
This rich dataset was made publicly available via our open source web-based application, the SCAN-B MutationExplorer, accessible at http://oncogenomics.bmc.lu.se/MutationExplorer.

The paper explained

Problem

Breast cancer is a disease of genomic alterations, of which the complete panorama of somatic mutations and how these relate to molecular subtypes, therapy response, and clinical outcomes is incompletely understood. RNA sequencing is a powerful technique for profiling tumor transcriptomes; however, using it for reliable detection of single nucleotide variants and small insertions and deletions is challenging.

Results

Within the Sweden Cancerome Analysis Network-Breast project (SCAN-B; ClinicalTrials.gov NCT02306096), we developed an optimized bioinformatics pipeline for detection of single nucleotide variants and small insertions and deletions from RNA-seq data.
From this, we describe the mutational landscape of 3,217 primary breast cancer transcriptomes and relate it to patient overall survival in a real-world setting (median follow-up 75 months, range 2–105 months). We demonstrate that RNA-seq can be used to call mutations in important breast cancer genes such as PIK3CA, TP53, ESR1, and ERBB2, as well as mutation status of key molecular pathways and tumor mutational burden. We identify mutations in one or more potentially druggable genes in 86.8% of cases and reveal significant relationships to patient outcome within specific treatment groups, such as occurrence of mutations inducing resistance to standard of care drugs in untreated patients. To make this rich and growing mutational portraiture of breast cancer available for the wider research community, we developed an open source interactive web application, SCAN-B MutationExplorer, publicly accessible at http://oncogenomics.bmc.lu.se/MutationExplorer.

Impact

These results add another dimension to the use of RNA-seq as a potential clinical tool, where both gene expression-based signatures and gene mutation-based biomarkers can be interrogated simultaneously and in real-time within 1 week of tumor sampling. Treatment resistance mutations can be detected in early disease and could inform clinical decision-making.

Introduction

Mutations in the cancer genome, including single nucleotide variants (SNVs) and small insertions and deletions (indels), can shed light on cancer biology, tumor evolution and susceptibility or resistance to therapeutic agents (The Cancer Genome Atlas, 2012; Bose et al, 2013; Robinson et al, 2013). Mutations can now even be used to track circulating tumor DNA in the blood of patients (Garcia-Murillas et al, 2015; Förnvik et al, 2019). In recent years, the characterization of the mutational landscape of breast cancer has been performed primarily on the DNA level (The Cancer Genome Atlas, 2012; Cheng et al, 2015; Ciriello et al, 2015).
Adoption of massively parallel RNA sequencing (RNA-seq) as a clinical tool has been slower, despite several complementary advantages over DNA-seq. In addition to gene and isoform expression profiling and detection of de novo transcripts such as fusion genes, RNA-seq can approximate classical DNA-seq capabilities in the detection of SNVs, indels, as well as structural variants (Ma et al, 2018) and coarse copy number (preprint: Talevich & Shain, 2018). This makes RNA-seq an excellent tool for biomarker development (Brueffer et al, 2018) and potential clinical deployment (Byron et al, 2016; Cieślik & Chinnaiyan, 2018). For these reasons, among others, in 2010, the Sweden Cancerome Analysis Network–Breast (SCAN-B) initiative (ClinicalTrials.gov ID NCT02306096) selected RNA-seq as the primary analytic tool (Saal et al, 2015; Rydén et al, 2018). SCAN-B is a prospective real-world and population-based multicenter study with the aim of developing, validating, and clinically implementing novel biomarkers. To this end, SCAN-B collects tumor tissue and blood samples from enrolled patients with a diagnosis of primary breast cancer (BC). To date, over 15,000 patients have been enrolled, and messenger RNA (mRNA) sequencing is performed on patient tumors within 1 week of surgery. All patients are treated uniformly according to the Swedish national standard of care regimen. Expression profiling is an excellent tool to develop gene signatures for established and novel biomarkers (Sotiriou et al, 2006; Roepman et al, 2009; Brueffer et al, 2018), and many such signatures can be applied to a single RNA-seq dataset. However, for the detection of SNVs and indels from RNA-seq data, there are several challenges. Unlike DNA-seq, where whole-genome or targeted sequencing reads are distributed approximately uniformly and in proportion to DNA copy number, the abundance of reads in RNA-seq is proportional to the expression of each gene or locus. 
Consequently, only variants in sufficiently expressed transcripts can be detected. In cancer, this means that variants in oncogenes can likely be detected, whereas those in tumor suppressor genes, e.g., TP53, BRCA1, or BRCA2, are more likely to be missed. For example, mutations inducing premature stop codons can lead to nonsense-mediated decay, causing loss of expression and subsequently false-negative calls. The transcriptome is also more complex and challenging than the genome. RNA-specific features, such as alternative splicing, add computational challenges to alignment, and RNA editing can contribute to false-positive variant calls. Finally, there is a lack of benchmark datasets for RNA-seq comparable to those available for DNA from the Genome in a Bottle consortium and others (Zook et al, 2016; Li et al, 2018).

The aim of this study was to optimize RNA-seq somatic mutation calling through comparison to matched targeted DNA-seq, to discern the mutational landscape of the early breast cancer transcriptome across a large cohort of 3,217 treatment-naïve SCAN-B cases with sufficient follow-up time, and to make the resulting vast dataset available for exploration by the wider research community. To demonstrate the power of the methodology and dataset, we assessed the impact of mutations in important breast cancer driver genes and pathways, as well as tumor mutational burden (TMB), on patient overall survival (OS).

Results

An outline of the study design, which comprised DNA sequencing and RNA sequencing of 275 samples from the ABiM cohort, and RNA sequencing of 3,217 samples from the SCAN-B cohort, is shown in Fig 1.

Figure 1. Study design. Study design flow diagram for DNA-seq-informed optimization of RNA-seq variant calling.
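As a rough illustration of the point made in the Introduction that only variants in sufficiently expressed transcripts can be detected, the sketch below models the number of variant-supporting reads at a site as a binomial draw. This is a simplification for illustration only, not the filtering used in the SCAN-B pipeline; the function name `detection_probability`, the allele fraction of 0.25, and the threshold of 3 supporting reads are hypothetical choices.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of seeing at least `min_alt_reads` variant reads.

    Assumes reads at the site are independent and each carries the variant
    with probability `vaf` (a binomial model; hypothetical simplification).
    """
    if depth < min_alt_reads:
        return 0.0
    # Probability of observing fewer than the required number of variant reads.
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Low coverage (weakly expressed gene) versus high coverage (highly expressed gene).
for depth in (5, 20, 100):
    print(depth, round(detection_probability(depth, vaf=0.25, min_alt_reads=3), 3))
```

Under these assumed parameters, the chance of detection rises steeply with read depth, which is consistent with the expectation that variants in weakly expressed genes are more likely to be missed.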
Variant filter performance

Mutation calling in the 275-sample ABiM cohort resulted in 3,478 somatic post-filter mutations from the matched tumor/normal targeted capture DNA, and 1,459 variants from tumor RNA-seq in the DNA capture regions (Table 1 and Fig EV1A). Comparing these DNA and RNA variants resulted in 1,132 mutations that were present both in DNA and RNA in the capture regions and whose frequencies were generally in line with previous studies such as The Cancer Genome Atlas (TCGA) (The Cancer Genome Atlas, 2012) (Fig EV1B). Of the 1,459 RNA-seq variants, 884 (60.6%) were identified as somatic in DNA, 248 (17.0%) as germline in DNA, and 327 (22.4%) as unique to RNA. These RNA-unique variants are a mix of somatic mutations missed in DNA-seq, e.g., due to regional higher sequencing coverage in RNA-seq or tumor heterogeneity, unfiltered RNA editing sites, or artifacts caused by PCR, sequencing, or alignment and variant calling.

Table 1. Number of mutations in the ABiM (DNA-seq and RNA-seq) and SCAN-B (RNA-seq) cohorts

| Cohort | Source | Coverage | Total mutations | SNVs | Insertions | Deletions | Samples with mutations | Mutations per sample |
| ABiM | DNA | Capture regions | 3,478 | 3,173 | 50 | 173 | 274 | 12.7 |
| ABiM | RNA | Capture regions | 1,459 | 1,304 | 57 | 98 | 265 | 5.5 |
| ABiM | RNA | Whole mRNA | 16,683 | 15,764 | 235 | 684 | 275 | 60.7 |
| SCAN-B | RNA | Whole mRNA | 144,593 | 141,095 | 1,112 | 2,386 | 3,217 | 44.9 |

Sample numbers differ from total cohort sizes due to filtering resulting in samples with no remaining post-filter mutations.

Figure EV1. Overview of frequently mutated genes in targeted DNA-seq and RNA-seq across 275 ABiM samples
A, B. Waterfall plot of the 20 most mutated genes (rows) across 275 ABiM samples (columns) in (A) targeted DNA-seq and (B) RNA-seq. Genes are ranked by variant frequency. Samples are sorted by histological subtype and alteration occurrence. Mutations are colored by predicted functional impact.
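The percentages quoted for the ABiM RNA-seq variants and the "Mutations per sample" column of Table 1 can be re-derived from the reported counts. The following is a minimal sketch of that arithmetic; the helper names `percentage_shares` and `mutations_per_sample` are hypothetical, and the assumption that the per-sample value is the total divided by the number of samples with mutations is inferred from the reported figures rather than stated in the text.

```python
# Origin of the 1,459 ABiM RNA-seq variants in the capture regions (counts from the text).
RNA_VARIANT_ORIGIN = {
    "somatic in DNA": 884,
    "germline in DNA": 248,
    "unique to RNA": 327,
}

# (total mutations, samples with mutations) for each row of Table 1.
TABLE_1 = {
    "ABiM DNA, capture regions": (3478, 274),
    "ABiM RNA, capture regions": (1459, 265),
    "ABiM RNA, whole mRNA": (16683, 275),
    "SCAN-B RNA, whole mRNA": (144593, 3217),
}


def percentage_shares(counts: dict[str, int]) -> dict[str, float]:
    """Share of each category in percent, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


def mutations_per_sample(total_mutations: int, samples_with_mutations: int) -> float:
    """Mean mutations per sample, assuming division by samples with mutations."""
    return round(total_mutations / samples_with_mutations, 1)


print(percentage_shares(RNA_VARIANT_ORIGIN))
for row, (total, samples) in TABLE_1.items():
    print(row, mutations_per_sample(total, samples))
```

With this assumption, the computed values agree with the percentages in the text and with the last column of Table 1.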
Landscape of somatic mutations in the breast cancer transcriptome

We applied the filters derived from the 275-sample set to the entire 3,217-sample SCAN-B RNA-seq set, resulting in 144,593 total variants comprising 141,095 SNVs, 1,112 insertions, and 2,386 deletions (Table 1). The number of mutations per sample was lower in the SCAN-B set than in the ABiM set, likely because the ABiM set was sequenced to a higher depth (Table EV1). The SNVs comprised 50,270 missense, 2,311 nonsense, 1,042 splicing, 68,819 affecting 3′/5′ untranslated regions (UTRs), 17,057 synonymous mutations, as well as 1,596 mutations predicted otherwise. The majority of indels were predicted to cause frameshifts or affect 3′/5′ UTRs (Table EV2). After removing synonymous mutations, the number of mutations was reduced to 127,536 variants in the SCAN-B set, i.e., an average of 40 mutations per tumor. We analyzed the contribution of the six nucleotide substitution types (C>A, C>G, C>T, T>A, T>C, and T>G) to SNVs in the ABiM and SCAN-B sets (Fig 2A). Compared to DNA, RNA-seq-based variant calls showed a relative under-representation of C>T substitutions and an over-representation of T>C substitutions.

Figure 2. Overview of non-synonymous mutations in terms of base substitution signatures, molecular subtype, and protein impact
A. Contribution of base change types to the overall SNV composition in the ABiM cohort for captured DNA regions and mRNA in the captured DNA regions, as well as SCAN-B whole mRNA.
B. Number of non-synonymous mutations per sample. Bars are colored by PAM50 subtypes Luminal A (dark blue), Luminal B (light blue), HER2-enriched (pink), basal-like (red), Normal-like (green) and Unclassified (gray).
C–F. Lollipop plots showing the location, abundance, and impact of SNVs in (C) TP53, (D) PIK3CA, (E) PTEN, and (F) ERBB2 on the respective encoded protein.
Protein change labels are shown for the most mutated amino acid positions, with residues ordered left to right by mutation frequency within each label.

In accordance with published studies of primary BC, the most frequently mutated genes were the known BC drivers PIK3CA (34% of samples), TP53 (23%), MAP3K1 (7%), CDH1 (7%), GATA3 (7%), and AKT1 (5%) (Fig 3). As reported before (Ciriello et al, 2015), disruptive alterations in CDH1 were a hallmark of lobular carcinomas (135/386 [35.0%] of samples), while alterations in TP53, MAP3K1, and GATA3 were more common in the ductal type. 86.8% of SCAN-B samples had at least one mutation in a gene targeted by an approved or experimental drug, based on the Database of Gene-drug Interactions (DGI).

Figure 3. Overview of frequently mutated genes across 3,217 SCAN-B samples. Waterfall plot of the 20 most frequently mutated genes (rows) across 3,217 SCAN-B samples (columns). Genes are ranked from top to bottom by mutation frequency. Samples are sorted by histological subtype and alteration occurrence. Mutations are colored by predicted functional impact.

Somatic mutations in important BC genes

We examined known driver BC genes more closely and found our RNA-seq-based mutation calls to recapitulate known mutation rates and hot spots, summarized in Table 2, Table EV2, and Fig 2C–F. Associations between mutated genes and clinical and molecular biomarkers are summarized in Table EV3, and several examples are highlighted below.

Table 2. The most frequent non-synonymous mutations in the genes PIK3CA, AKT1, SF3B1, GATA3, ERBB2, TP53, FOXA1, and CDH1 in 3,217 SCAN-B samples

| Gene | AA change | Number of mutations | Mut. samples (%) | Mut. in gene (%) |
| PIK3CA | H1047R | 483 | 15 | 41.5 |
| | E545K | 212 | 6.6 | 18.2 |
| | E542K | 142 | 4.4 | 12.2 |
| | H1047L | 77 | 2.4 | 6.6 |
| | N345K | 49 | 1.5 | 4.2 |
| | E726K | 26 | 0.8 | 2.2 |
| | C420R | 20 | 0.6 | 1.7 |
| | E453K | 13 | 0.4 | 1.1 |
| | G1049R | 11 | 0.3 | 0.9 |
| | E545A | 10 | 0.3 | 0.9 |
| | Q546K | 10 | 0.3 | 0.9 |
| | M1043I | 8 | 0.2 | 0.7 |
| | Other | 102 | 3.2 | 8.8 |
| AKT1 | E17K | 121 | 3.8 | 76.1 |
| | Other | 38 | 1.2 | 23.9 |
| SF3B1 | K700E | 60 | 1.9 | 74.1 |
| | Other | 21 | 0.7 | 25.9 |
| GATA3 | P409fs | 30 | 0.9 | 12.2 |
| | M294K | 14 | 0.4 | 5.7 |
| | D336fs | 10 | 0.3 | 4.1 |
| | D332fs | 10 | 0.3 | 4.1 |
| | Other | 182 | 5.7 | 74 |
| ERBB2 | L755S | 28 | 0.9 | 23.9 |
| | V777L | 24 | 0.7 | 20.5 |
| | D769Y | 9 | 0.3 | 7.7 |
| | Other | 56 | 1.7 | 47.9 |
| TP53 | R273C | 25 | 0.8 | 3.2 |
| | R248Q | 25 | 0.8 | 3.2 |
| | R175H | 24 | 0.7 | 3.5 |
| | R248W | 22 | 0.7 | 3.1 |
| | R273H | 19 | 0.6 | 2.4 |
| | Y220C | 17 | 0.5 | 2.2 |
| | F134L | 14 | 0.4 | 1.8 |
| | E285K | 13 | 0.4 | 1.7 |
| | R213* | 12 | 0.4 | 1.5 |
| | R282W | 12 | 0.4 | 1.5 |
| | R306* | 10 | 0.3 | 1.3 |
| | Y163C | 10 | 0.3 | 1.3 |
| | L194R | 9 | 0.3 | 1.2 |
| | R342* | 9 | 0.3 | 1.2 |
| | E286K | 8 | 0.2 | 1 |
| | G245S | 8 | 0.2 | 1 |
| | H179R | 8 | 0.2 | 1 |
| | Q331* | 8 | 0.2 | 1 |
| | Other | 529 | 16.4 | 65.1 |
| FOXA1 | S250F | 23 | 0.7 | 15.8 |
| | F266L | 11 | 0.3 | 7.5 |
| | Other | 112 | 3.5 | 76.7 |
| CDH1 | Q23* | 18 | 0.6 | 7.7 |
| | I650fs | 8 | 0.2 | 3.4 |
| | P127fs | 8 | 0.2 | 3.4 |
| | Other | 199 | 6.2 | 85.4 |

Shown are the total number of mutations, the frequency of the mutations in the SCAN-B cohort (Mut. samples), and the frequency of a particular mutation within all mutations in the gene (Mut. in gene).

PIK3CA was the most frequently mutated gene, with 1,163 non-synonymous mutations in 1,095 patient samples (34% of patients). As expected, and in line with previous studies (Saal et al, 2005; The Cancer Genome Atlas, 2012; Pereira et al, 2016), the majority of alterations were the known hot spot mutations H1047R/L, E545K, and E542K (Table 2, Fig 2D), which lead to constitutive signaling (Bader et al, 2006). All hot spot mutations and the vast majority of other PIK3CA alterations were missense mutations. Mutations were associated with lobular, ER+, PgR+, HER2−, and Luminal A (LumA) BC (Table EV3).

TP53 is frequently disrupted by somatic SNVs; however, few hot spot mutations exist (Giacomelli et al, 2018).
The mutation frequency in BC is estimated to be 35.4–37% (The Cancer Genome Atlas, 2012; Pereira et al, 2016), which we could confirm in our DNA-seq ABiM filter-definition cohort (37%). Likely due to nonsense-mediated decay (NMD), loss of heterozygosity, and/or decreased mRNA transcription, the frequency of TP53 mutations in the 3,217 SCAN-B cases was lower at 23% (782 mutations in 733 samples). Despite underdetection by RNA-seq, the identified hot spot residues were the same as reported in the IARC TP53 database (release R20) (Bouaoun et al, 2016). The most frequently mutated amino acids we observed were R273, R248, and R175 (50, 49, and 24 mutations, respectively; total 123/782 [15.7%]), followed by positions Y220 (21/782 [2.7%]), R280 (19/782 [2.4%]), and R342 (17/782 [2.2%]) (Table 2, Fig 2C). Most detected mutations are in the DNA binding domain, and 77.6% of overall mutations are missense mutations, likely leading to protein loss of function (LoF). As anticipated, TP53 mutations were associated with ductal, ER−, PgR−, HER2+, hormone receptor positive (HoR+)/HER2+ (HoR+ defined as ER+ and PgR+, HoR− otherwise), HoR−/HER2+, triple-negative BC (TNBC), and the basal-like and HER2-enriched PAM50 subtypes (Table EV3), as reported before (The Cancer Genome Atlas, 2012).

PTEN is a crucial tumor suppressor gene and regulator of PI3K activity, and loss of PTEN protein expression is associated with poor outcome (Saal et al, 2007). In our dataset, we found 124 non-synonymous mutations in 116/3,217 (3.6%) samples, including hot spot mutations in H303 and H266 of unknown significance (Fig 2E). Mutations were significantly associated with HER2− disease (Table EV3).

ERBB2 (HER2) mutations have emerged as a novel biomarker and occur predominantly in patients without ERBB2 amplification (Bose et al, 2013), but also in ERBB2-amplified cases (Cocco et al, 2018).
Evidence is mounting that recurrent ERBB2 mutations lead to increased activation of the HER2 receptor in tumors classified as HER2 normal (Bose et al, 2013; Wen et al, 2015; Pahuja et al, 2018). Activating ERBB2 mutations have been shown to confer therapy resistance against standard of care drugs such as trastuzumab and lapatinib (Cocco et al, 2018), but can be overcome using pan-HER tyrosine kinase inhibitors (TKIs) such as neratinib (Bose et al, 2013; Ben-Baruch et al, 2015; Ma et al, 2017; Cocco et al, 2018). ERBB2 mutations have also been shown to confer resistance to endocrine therapy in the metastatic setting (Nayar et al, 2018), where HER2-directed drugs are effective (Murray et al, 2018). We identified 117 non-synonymous ERBB2 mutations in 103 patients (3.2%), higher than the previously reported incidence rates of 1.6–2.4% (Bose et al, 2013; Wen et al, 2015; Ross et al, 2016), but lower than in metastatic BC, where rates as high as ~7% have been reported (Cocco et al, 2018). Two hot spots, L755S (28/117) and V777L (24/117), which cause constitutive HER2 signaling (Fig 2F) (Bose et al, 2013; Wen et al, 2015), accounted for 44.4% of total ERBB2 mutations. Co-occurrence of ERBB2 mutation and amplification has been reported before, however mainly in the metastatic setting (Cocco et al, 2018). In our untreated, early BC cohort, we observed ERBB2 mutation and amplification in 12 tumors, demonstrating that coincident ERBB2 mutation and amplification is rare but can occur in early, treatment-naïve BC. Mutation and amplification were not mutually exclusive (P = 0.88), and interestingly ERBB2 mutations occurred predominantly in tumors classified as PAM50 HER2-enriched subtype (P = 0.0001). Moreover, ERBB2 mutation was significantly associated with PgR− and lobular BC (Table EV3).

Loss of E-cadherin (CDH1) protein expression is a hallmark of the lobular BC phenotype (Ciriello et al, 2015).
With 12% of our cohort being of lobular type, we observed 137 of total 233 CDH1 mutations in lobular BCs (58.8%, P = 1.6E-72). The mutations consisted mostly of nonsense mutations (37.2%) and frameshift indels (35.4%), suggesting they contribute to CDH1 expression loss and drive the lobular phenotype. We observed one nonsense mutation hot spot (Q23*, n = 18), and this residue was also hit by a rare missense mutation (Q23K, n = 1). In addition to lobular BC, CDH1 mutations were associated with ER+, HER2−, and HoR+/HER2− status, and the LumA subtype (Table EV3).

Other notable mutated genes in our set were MAP3K1, AKT1, ESR1, GATA3, FOXA1, SF3B1, and CBFB. MAP3K1 is a regulator of signaling pathways and regularly implicated in various