Title: A database of vacancy formation enthalpies for materials discovery
Abstract: <strong>A database of vacancy formation enthalpies for materials discovery</strong>
Matthew Witman<sup>a</sup>, Anuj Goyal<sup>b</sup>, Tadashi Ogitsu<sup>c</sup>, Anthony McDaniel<sup>a</sup>, Stephan Lany<sup>b</sup>
<sup>a</sup> Sandia National Laboratories, <sup>b</sup> National Renewable Energy Laboratory, <sup>c</sup> Lawrence Livermore National Laboratory

<strong>Abstract</strong>
This dataset provides DFT calculations of cation and oxygen vacancy defects in oxides, which can be used to derive efficient data-driven models for vacancy formation enthalpy. DFT calculations were performed as described in <DOI: 10.26434/chemrxiv-2022-frcns>, where a graph neural network surrogate model was trained and used to screen the Materials Project for promising solar thermochemical water splitting materials. The data, models, scripts, and code needed to reproduce the results in <DOI: 10.26434/chemrxiv-2022-frcns> are described below.

<strong>Data & Models</strong>
1) data_01_03_22/* corresponds to oxide compounds used in model training
2) known_cmpds/* corresponds to known STCH compounds
3) screeningMP/* corresponds to the screening-related data
    screening_inelements/* stores only Materials Project oxides whose composition is a subset of the training elements, and contains all the vacancy defect predictions
    MP_O_PDs/* stores offline PDs from Materials Project so that oxide stability metrics can be adjusted fairly quickly
    MP_O_Compounds/* stores possible MP oxide compounds to screen

In general, the above folders contain:
DFT data/structures in sub-directories: poscars, magnetic moments, oxidation states, and csvs (containing the vacancy enthalpy for each unique site)
cgcnn/* contains the processed DFT data for use in the CGCNN code (see <strong>Scripts</strong> for how to prepare this)
    id_prop.csv.* contains [cif name, defect formation enthalpy] pairs
    Different id_prop.csv.* files correspond to different K-fold stratifications
    In the screening directory, defect formation enthalpy is omitted since it has not been computed with DFT
    model-(X1)k(X2)_(X3)_(X4) corresponds to different CV models for
        X1 different training set sizes (i.e., training with only 10%, 40%, or 100% of the data)
        X2 different k folds
        X3 = "struct" or "" for "structure-wise validation" or "defect-wise validation", respectively
        X4 for different encoding strategies
    structure X-Yz.cif indicates structure X, defect element Y, symmetry site z, where one instance of that site has been re-ordered to be the first atom in the cif file
    *.locals contains a one-hot encoding of oxidation states of all sites in that crystal
    *.locals_continuous contains a continuous encoding of oxidation state in that crystal
    *.globals contains global properties of the host structure

<strong>Scripts</strong>
scripts/*.sh scripts to rerun the screenings for different k-folds, encodings, etc.
scripts/*.ipynb to analyze results
scripts/prepare_cgcnn.py for translating the data in (poscars/*, csvs/*, oxstate/*, mags/*) to the ML input needed in cgcnn/*

<strong>Code</strong>
Install CGCNN and its defect modifications from https://github.com/mwitman1/cgcnndefect

<strong>Questions/Collaborations</strong>
Please contact [email protected]

<strong>Acknowledgements</strong>
This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Energy Efficiency and Renewable Energy (EERE), specifically the Hydrogen and Fuel Cell Technologies Office. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. Part of the work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344.
The National Renewable Energy Laboratory (NREL) is operated by the Alliance for Sustainable Energy, LLC, for the DOE under Contract No. DE-AC36-08GO28308. This work used High-Performance Computing resources at NREL, sponsored by DOE-EERE. The views expressed in this article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
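
<strong>Example: reading id_prop.csv.* entries</strong>
The file conventions described under Data & Models (id_prop.csv.* rows of [cif name, defect formation enthalpy], and cif names of the form X-Yz) can be illustrated with a minimal Python sketch. It is not part of the dataset's scripts. It assumes the files are headerless, comma-separated, that the site label z is numeric, and that the ".cif" suffix may or may not appear in the first column; none of this is confirmed by the description above. The names DefectEntry, parse_cif_name, and read_id_prop are hypothetical, and the rows in the example are made-up placeholders, not values from the dataset.

```python
import csv
import io
import re
from typing import NamedTuple, Optional

# Assumed form of "X-Yz": X = structure label, Y = element symbol,
# z = numeric symmetry-site label. This pattern is a guess, not the
# dataset's documented grammar.
CIF_NAME = re.compile(
    r"^(?P<structure>.+)-(?P<element>[A-Z][a-z]?)(?P<site>\d+)(?:\.cif)?$"
)


class DefectEntry(NamedTuple):
    structure: str
    element: str
    site: str
    enthalpy: Optional[float]  # None when no DFT value is listed


def parse_cif_name(name: str) -> tuple[str, str, str]:
    """Split a cif name into (structure, defect element, site label)."""
    match = CIF_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized cif name: {name!r}")
    return match["structure"], match["element"], match["site"]


def read_id_prop(handle) -> list[DefectEntry]:
    """Read [cif name, enthalpy] rows; the enthalpy column may be absent."""
    entries = []
    for row in csv.reader(handle):
        if not row or not row[0].strip():
            continue  # skip blank lines
        structure, element, site = parse_cif_name(row[0])
        value = row[1].strip() if len(row) > 1 else ""
        entries.append(
            DefectEntry(structure, element, site, float(value) if value else None)
        )
    return entries


# Placeholder rows for illustration only (not real data from the dataset).
example = io.StringIO("12-O1,3.25\n12-Fe2,\nmp-1234-O1.cif\n")
for entry in read_id_prop(example):
    print(entry)
```

Rows without an enthalpy, as in the screening directory, come back with enthalpy set to None, so the same reader can be used for both training and screening files under the stated assumptions.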