doped.analysis module
Code to analyse VASP defect calculations.
These functions are built from a combination of useful modules from pymatgen, alongside substantial modification, in an effort to make an efficient, user-friendly package for managing and analysing defect calculations, with publication-quality outputs.
- class doped.analysis.DefectParser(defect_entry: DefectEntry, defect_vr: Vasprun | None = None, bulk_vr: Vasprun | None = None, skip_corrections: bool = False, error_tolerance: float = 0.05, **kwargs)
Bases: object
Create a DefectParser object, which has methods for parsing the results of defect supercell calculations.
Direct initialisation with DefectParser() is typically not recommended. Rather, DefectParser.from_paths() or defect_entry_from_paths() are preferred, as shown in the doped parsing tutorials.
- Parameters:
defect_entry (DefectEntry) – doped DefectEntry
defect_vr (Vasprun) – pymatgen Vasprun object for the defect supercell calculation
bulk_vr (Vasprun) – pymatgen Vasprun object for the reference bulk supercell calculation
skip_corrections (bool) – Whether to skip calculation and application of finite-size charge corrections to the defect energy (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in the defect charge correction is greater than this value (in eV), then a warning is raised. (default: 0.05 eV)
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()) such as bulk_locpot_dict, bulk_site_potentials etc. Mainly used by DefectsParser to expedite parsing by avoiding reloading bulk data for each defect.
- classmethod from_paths(defect_path, bulk_path, dielectric: float | int | ndarray | list | None = None, charge_state: int | None = None, initial_defect_structure_path: str | None = None, skip_corrections: bool = False, error_tolerance: float = 0.05, bulk_bandgap_path: str | None = None, **kwargs)
Parse the defect calculation outputs in defect_path and return the DefectParser object. By default, the DefectParser.defect_entry.name attribute (later used to label defects in plots) is set to the defect_path folder name (if it is a recognised defect name), else it is set to the default doped name for that defect.
Note that the bulk and defect supercells should have the same definitions/basis sets (for site-matching and finite-size charge corrections to work appropriately).
- Parameters:
defect_path (str) – Path to defect supercell folder (containing at least vasprun.xml(.gz)).
bulk_path (str) – Path to bulk supercell folder (containing at least vasprun.xml(.gz)).
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Ionic + static contributions to the dielectric constant. If not provided, charge corrections cannot be computed and so skip_corrections will be set to True.
charge_state (int) – Charge state of defect. If not provided, will be automatically determined from the defect calculation outputs, or if that fails, using the defect folder name (must end in “_+X” or “_-X” where +/-X is the defect charge state).
initial_defect_structure_path (str) – Path to the initial/unrelaxed defect structure. Only recommended for use if structure matching with the relaxed defect structure(s) fails (rare). Default is None.
skip_corrections (bool) – Whether to skip the calculation and application of finite-size charge corrections to the defect energy (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in the defect charge correction is greater than this value (in eV), then a warning is raised. (default: 0.05 eV)
bulk_bandgap_path (str) – Path to bulk OUTCAR file for determining the band gap. If the VBM/CBM occur at reciprocal space points not included in the bulk supercell calculation, you should use this tag to point to a bulk bandstructure calculation instead. Alternatively, you can edit/add the “gap” and “vbm” entries in DefectParser.defect_entry.calculation_metadata to match the correct (eigen)values. If None, will calculate “gap”/“vbm” using the outputs at: DefectParser.defect_entry.calculation_metadata[“bulk_path”].
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()) such as bulk_locpot_dict, bulk_site_potentials etc. Mainly used by DefectsParser to expedite parsing by avoiding reloading bulk data for each defect.
- Returns:
DefectParser object.
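As a minimal sketch of the dielectric input forms listed above (a hypothetical helper, not doped's own code), a scalar, 3x1 or 3x3 input can be normalised to a full 3x3 tensor with numpy. Treating a 3x1 input as the diagonal (principal) components of the tensor is an assumption of this sketch:

```python
import numpy as np


def to_dielectric_tensor(dielectric):
    """Hypothetical helper: return a 3x3 dielectric tensor from a scalar, 3x1 or 3x3 input."""
    eps = np.asarray(dielectric, dtype=float)
    if eps.ndim == 0:
        # Isotropic material: scalar times the identity matrix
        return eps * np.eye(3)
    if eps.shape in ((3,), (3, 1)):
        # Assumption: three values are the diagonal components of the tensor
        return np.diag(eps.ravel())
    if eps.shape == (3, 3):
        # Already a full tensor
        return eps
    raise ValueError(f"Unsupported dielectric shape: {eps.shape}")


print(to_dielectric_tensor(10.2))
print(to_dielectric_tensor([5.0, 5.0, 7.0]))
```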
- load_FNV_data(bulk_locpot_dict=None)
Load metadata required for performing Freysoldt correction (i.e. LOCPOT planar-averaged potential dictionary).
Requires “bulk_path” and “defect_path” to be present in DefectEntry.calculation_metadata, and VASP LOCPOT files to be present in these directories. Can read compressed “LOCPOT.gz” files. The bulk_locpot_dict can be supplied if already parsed, for expedited parsing of multiple defects.
Saves the bulk_locpot_dict and defect_locpot_dict dictionaries (containing the planar-averaged electrostatic potentials along each axis direction) to the DefectEntry.calculation_metadata dict, for use with DefectEntry.get_freysoldt_correction().
- Parameters:
bulk_locpot_dict (dict) – Planar-averaged potential dictionary for bulk supercell, if already parsed. If None (default), will load from LOCPOT(.gz) file in defect_entry.calculation_metadata[“bulk_path”]
- Returns:
bulk_locpot_dict for reuse in parsing other defect entries
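A LOCPOT file stores the electrostatic potential on a 3D real-space grid, and the planar average along one lattice axis is the mean over the two other axes. The sketch below uses numpy on a synthetic grid (reading a real LOCPOT is omitted, and the dict keyed by axis index is a hypothetical layout, not necessarily doped's). It shows the kind of profiles described above and the defect-minus-bulk difference used for potential alignment in the Freysoldt scheme:

```python
import numpy as np


def planar_averages(potential_grid):
    """Planar-average a 3D potential grid along each lattice axis.

    Returns {axis_index: 1D profile}, where the profile along axis i is the
    mean over the two other axes (hypothetical layout).
    """
    grid = np.asarray(potential_grid, dtype=float)
    if grid.ndim != 3:
        raise ValueError("Expected a 3D grid of potential values")
    return {
        axis: grid.mean(axis=tuple(i for i in range(3) if i != axis))
        for axis in range(3)
    }


# Synthetic example: a "defect" grid equal to the bulk grid plus a uniform 0.1 eV shift
rng = np.random.default_rng(seed=0)
bulk_grid = rng.normal(size=(4, 5, 6))
defect_grid = bulk_grid + 0.1

bulk_avg = planar_averages(bulk_grid)
defect_avg = planar_averages(defect_grid)
# Defect - bulk planar-average difference along the third axis
print(defect_avg[2] - bulk_avg[2])
```

For real files, pymatgen's Locpot class offers a comparable get_average_along_axis method.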
- load_and_check_calculation_metadata()
Pull metadata about the defect supercell calculations from the outputs, and check if the defect and bulk supercell calculation settings are compatible.
- load_bulk_gap_data(bulk_bandgap_path=None, use_MP=False, mpid=None, api_key=None)
Get bulk band gap data from the bulk OUTCAR file (at bulk_bandgap_path if provided, otherwise in the bulk_path folder).
Alternatively, one can query the Materials Project (MP) database for the bulk gap data by setting use_MP = True, in which case the MP entry with the lowest number ID and a composition matching the bulk will be used, or the MP ID (mpid) of the bulk material can be specified directly. This is not recommended, as it will correspond to a severely underestimated GGA DFT band gap!
- Parameters:
bulk_bandgap_path (str) – Path to bulk OUTCAR file for determining the band gap. If the VBM/CBM occur at reciprocal space points not included in the bulk supercell calculation, you should use this tag to point to a bulk bandstructure calculation instead. If None, will use self.defect_entry.calculation_metadata[“bulk_path”].
use_MP (bool) – If True, will query the Materials Project database for the bulk gap data.
mpid (str) – If provided, will query the Materials Project database for the bulk gap data, using this Materials Project ID.
api_key (str) – Materials Project API key to access the database.
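The band edges come from the eigenvalues and occupations of the bulk calculation, which is why the k-point sampling matters for bulk_bandgap_path. A minimal numpy sketch (hypothetical helper and toy numbers; parsing the OUTCAR or vasprun.xml is omitted) shows how missing the k-point that hosts the VBM or CBM overestimates the gap:

```python
import numpy as np


def band_edges(eigenvalues, occupations, occ_tol=1e-8):
    """Return (vbm, cbm, gap) from arrays of shape (n_kpoints, n_bands).

    States with occupation above occ_tol count as occupied (valence band),
    the rest as unoccupied (conduction band). Only meaningful for gapped systems.
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    occupied = np.asarray(occupations, dtype=float) > occ_tol
    vbm = eigenvalues[occupied].max()
    cbm = eigenvalues[~occupied].min()
    return vbm, cbm, cbm - vbm


# Two k-points x two bands (eV); both band edges lie at the second k-point
eigs = [[-1.0, 1.5], [-0.6, 1.2]]
occs = [[1.0, 0.0], [1.0, 0.0]]
print(band_edges(eigs, occs))
# Keeping only the first k-point (as if the supercell k-mesh missed the second) overestimates the gap
print(band_edges(eigs[:1], occs[:1]))
```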
- load_eFNV_data(bulk_site_potentials=None)
Load metadata required for performing Kumagai correction (i.e. atomic site potentials from the OUTCAR files).
Requires “bulk_path” and “defect_path” to be present in DefectEntry.calculation_metadata, and VASP OUTCAR files to be present in these directories. Can read compressed “OUTCAR.gz” files. The bulk_site_potentials can be supplied if already parsed, for expedited parsing of multiple defects.
Saves the bulk_site_potentials and defect_site_potentials lists (containing the atomic site electrostatic potentials, from -1*np.array(Outcar.electrostatic_potential)) to DefectEntry.calculation_metadata, for use with DefectEntry.get_kumagai_correction().
- Parameters:
bulk_site_potentials (dict) – Atomic site potentials for the bulk supercell, if already parsed. If None (default), will load from OUTCAR(.gz) file in defect_entry.calculation_metadata[“bulk_path”]
- Returns:
bulk_site_potentials for reuse in parsing other defect entries
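The eFNV (Kumagai) correction compares atomic site potentials in the defect and bulk supercells. The sketch below covers only the sign flip quoted above and a per-site defect-minus-bulk difference; the site matching (the hypothetical defect_to_bulk list) is assumed to be given, and this is not doped's implementation:

```python
import numpy as np


def site_potential_differences(bulk_raw, defect_raw, defect_to_bulk):
    """Per-site (defect - bulk) potential differences for matched atomic sites.

    bulk_raw, defect_raw: raw per-atom values as parsed from the OUTCAR files.
    defect_to_bulk: for each defect-supercell site, the index of the matched
    bulk site, or None if unmatched (e.g. an interstitial).
    """
    # Sign convention quoted above: site potential = -1 * OUTCAR value
    bulk = -1 * np.asarray(bulk_raw, dtype=float)
    defect = -1 * np.asarray(defect_raw, dtype=float)
    return np.array(
        [defect[i] - bulk[j] for i, j in enumerate(defect_to_bulk) if j is not None]
    )


# Toy vacancy: bulk site 0 was removed, so defect sites 0 and 1 match bulk sites 1 and 2
bulk_raw = [-40.0, -40.0, -60.0]
defect_raw = [-39.8, -60.1]
print(site_potential_differences(bulk_raw, defect_raw, [1, 2]))
```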
- class doped.analysis.DefectsParser(output_path: str = '.', dielectric: float | int | ndarray | None = None, subfolder: str | None = None, bulk_path: str | None = None, skip_corrections: bool = False, error_tolerance: float = 0.05, bulk_bandgap_path: str | None = None, processes: int | None = None, json_filename: str | bool | None = None)
Bases: object
A class for rapidly parsing multiple VASP defect supercell calculations for a given host (bulk) material.
Loops over calculation directories in output_path (likely the same output_path used with DefectsSet for file generation in doped.vasp) and parses the defect calculations into a dictionary of {defect_name: DefectEntry}, where defect_name is set to the defect calculation folder name (if it is a recognised defect name), else it is set to the default doped name for that defect. By default, it searches for folders in output_path whose subfolder (see the subfolder parameter) contains vasprun.xml(.gz) files, and tries to parse them as DefectEntry objects.
By default, tries to use multiprocessing to speed up defect parsing, which can be controlled with the processes parameter.
Defect charge states are automatically determined from the defect calculation outputs if POTCARs are set up with pymatgen (see the Installation page of the docs), or if that fails, using the defect folder name (must end in “_+X” or “_-X” where +/-X is the defect charge state).
Uses the (single) DefectParser class to parse the individual defect calculations. Note that the bulk and defect supercells should have the same definitions/basis sets (for site-matching and finite-size charge corrections to work appropriately).
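The folder-name fallback for the charge state can be sketched with a regular expression. This is a hypothetical helper, not doped's code; accepting a trailing “_0” for neutral defects is an assumption, and the example folder names are illustrative only:

```python
import os
import re

# "_+X" or "_-X" (and, as an assumption here, "_0") at the end of the folder name
_CHARGE_SUFFIX = re.compile(r"_([+-]\d+|0)$")


def charge_from_folder_name(path):
    """Hypothetical helper: parse the defect charge state from a folder name, or return None."""
    folder_name = os.path.basename(os.path.normpath(path))
    match = _CHARGE_SUFFIX.search(folder_name)
    return int(match.group(1)) if match else None


for name in ["v_Cd_-2", "Te_i_Td_Cd2.83_+2", "v_Te_0", "CdTe_bulk"]:
    print(name, charge_from_folder_name(name))
```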
- Parameters:
output_path (str) – Path to the output directory containing the defect calculation folders (likely the same output_path used with DefectsSet for file generation in doped.vasp). Default = current directory.
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Ionic + static contributions to the dielectric constant. If not provided, charge corrections cannot be computed and so skip_corrections will be set to True.
subfolder (str) – Name of subfolder(s) within each defect calculation folder (in the output_path directory) containing the VASP calculation files to parse (e.g. vasp_ncl, vasp_std, vasp_gam etc.). If not specified, doped checks first for vasp_ncl, vasp_std, vasp_gam subfolders with calculation outputs (vasprun.xml(.gz) files) and uses the highest level VASP type (ncl > std > gam) found as subfolder, otherwise uses the defect calculation folder itself with no subfolder (set subfolder = “.” to enforce this).
bulk_path (str) – Path to bulk supercell reference calculation folder. If not specified, searches for folder with name “X_bulk” in the output_path directory (matching the default doped name for the bulk supercell reference folder).
skip_corrections (bool) – Whether to skip the calculation and application of finite-size charge corrections to the defect energies (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in any charge correction is greater than this value (in eV), then a warning is raised. (default: 0.05 eV)
bulk_bandgap_path (str) – Path to bulk OUTCAR file for determining the band gap. If the VBM/CBM occur at reciprocal space points not included in the bulk supercell calculation, you should use this tag to point to a bulk bandstructure calculation instead. Alternatively, you can edit/add the “gap” and “vbm” entries in DefectParser.defect_entry.calculation_metadata to match the correct (eigen)values. If None, will calculate “gap”/“vbm” using the outputs at: DefectParser.defect_entry.calculation_metadata[“bulk_path”].
processes (int) – Number of processes to use for multiprocessing for expedited parsing. If not set, defaults to one less than the number of CPUs available.
json_filename (str) – Filename to save the parsed defect entries dict (DefectsParser.defect_dict) to in output_path, to avoid having to re-parse defects when later analysing further and aiding calculation provenance. Can be reloaded using the loadfn function from monty.serialization as shown in the docs, or DefectPhaseDiagram.from_json(). If None (default), set as “{Chemical Formula}_defect_dict.json” where {Chemical Formula} is the chemical formula of the host material. If False, no json file is saved.
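The default subfolder choice described for the subfolder parameter (ncl > std > gam, else the defect folder itself) can be sketched with pathlib for a single defect folder. The helper and constant names are hypothetical, and doped may apply further checks across folders:

```python
from pathlib import Path

# Highest-level VASP calculation type first (ncl > std > gam)
SUBFOLDER_PRIORITY = ("vasp_ncl", "vasp_std", "vasp_gam")


def choose_subfolder(defect_folder):
    """Hypothetical helper: pick the subfolder to parse for one defect folder.

    Returns the first subfolder in SUBFOLDER_PRIORITY that contains a
    vasprun.xml or vasprun.xml.gz file, else "." (the defect folder itself).
    """
    defect_folder = Path(defect_folder)
    for name in SUBFOLDER_PRIORITY:
        subfolder = defect_folder / name
        if any((subfolder / fname).is_file() for fname in ("vasprun.xml", "vasprun.xml.gz")):
            return name
    return "."
```

An empty vasp_ncl folder does not win over a vasp_std folder with outputs, because only folders containing a vasprun file are considered.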
- doped.analysis.bold_print(string: str) → None
Does what it says on the tin.
Prints the input string in bold.
- doped.analysis.check_and_set_defect_entry_name(defect_entry: DefectEntry, possible_defect_name: str) → None
Check that possible_defect_name is in a format recognised by doped (i.e. “{defect_name}_{optional_site_info}_{charge_state}”).
If the DefectEntry.name attribute is not defined or does not end with the charge state, then the entry will be renamed with the doped default name.
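The renaming rule above can be sketched as a suffix check with a fallback. This hypothetical sketch checks only the charge-state suffix (not whether the rest of the name is a recognised defect name), and the “_+1”/“_-2”/“_0” suffix style is assumed from the folder-name convention described earlier:

```python
def charge_suffix(charge_state: int) -> str:
    """Format a charge state as a name suffix, e.g. "_+1", "_-2" or "_0"."""
    return f"_{charge_state:+d}" if charge_state != 0 else "_0"


def choose_entry_name(possible_name, charge_state, default_base_name):
    """Hypothetical sketch: keep possible_name if it ends with the charge state,
    otherwise fall back to a default name built from default_base_name."""
    suffix = charge_suffix(charge_state)
    if possible_name and possible_name.endswith(suffix):
        return possible_name
    return f"{default_base_name}{suffix}"


print(choose_entry_name("v_Cd_-2", -2, "v_Cd"))
print(choose_entry_name("my_vacancy_calc", -2, "v_Cd"))
```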
- doped.analysis.defect_entry_from_paths(defect_path: str, bulk_path: str, dielectric: float | int | ndarray | list | None = None, charge_state: int | None = None, initial_defect_structure_path: str | None = None, skip_corrections: bool = False, error_tolerance: float = 0.05, bulk_bandgap_path: str | None = None, **kwargs)
Parse the defect calculation outputs in defect_path and return the parsed DefectEntry object. By default, the DefectEntry.name attribute (later used to label the defects in plots) is set to the defect_path folder name (if it is a recognised defect name), else it is set to the default doped name for that defect.
Note that the bulk and defect supercells should have the same definitions/basis sets (for site-matching and finite-size charge corrections to work appropriately).
- Parameters:
defect_path (str) – Path to defect supercell folder (containing at least vasprun.xml(.gz)).
bulk_path (str) – Path to bulk supercell folder (containing at least vasprun.xml(.gz)).
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Ionic + static contributions to the dielectric constant. If not provided, charge corrections cannot be computed and so skip_corrections will be set to True.
charge_state (int) – Charge state of defect. If not provided, will be automatically determined from the defect calculation outputs.
initial_defect_structure_path (str) – Path to the initial/unrelaxed defect structure. Only recommended for use if structure matching with the relaxed defect structure(s) fails (rare). Default is None.
skip_corrections (bool) – Whether to skip the calculation and application of finite-size charge corrections to the defect energy (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in the defect charge correction is greater than this value (in eV), then a warning is raised. (default: 0.05 eV)
bulk_bandgap_path (str) – Path to bulk OUTCAR file for determining the band gap. If the VBM/CBM occur at reciprocal space points not included in the bulk supercell calculation, you should use this tag to point to a bulk bandstructure calculation instead. Alternatively, you can edit/add the “gap” and “vbm” entries in the calculation_metadata of the returned DefectEntry to match the correct (eigen)values. If None, will use DefectEntry.calculation_metadata[“bulk_path”].
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()) such as bulk_locpot_dict, bulk_site_potentials etc.
- Returns:
Parsed DefectEntry object.
- doped.analysis.defect_from_structures(bulk_supercell, defect_supercell, return_all_info=False, bulk_voronoi_node_dict=None)
Auto-determines the defect type and defect site from the supplied bulk and defect structures, and returns a corresponding Defect object.
If return_all_info is set to True, then also returns:
- the relaxed defect site in the defect supercell
- the defect site in the bulk supercell
- the defect site index in the defect supercell
- the bulk site index (index of the defect site in the bulk supercell)
- the guessed initial defect structure (before relaxation)
- the ‘unrelaxed defect structure’ (also before relaxation, but with interstitials at their final relaxed positions, and all bulk atoms at their unrelaxed positions)
- the bulk supercell Voronoi node dictionary (for further expedited site-matching).
- Parameters:
bulk_supercell (Structure) – Bulk supercell structure.
defect_supercell (Structure) – Defect structure to use for identifying the defect site and type.
return_all_info (bool) – If True, returns additional python objects related to the site-matching, listed above. (Default = False)
bulk_voronoi_node_dict (dict) – Dictionary of bulk supercell Voronoi node information, for expedited site-matching. If None, will be re-calculated.
- Returns:
doped Defect object. If return_all_info is True, then also:
- defect_site (Site):
pymatgen Site object of the relaxed defect site in the defect supercell.
- defect_site_in_bulk (Site):
pymatgen Site object of the defect site in the bulk supercell (i.e. unrelaxed vacancy/substitution site, or final relaxed interstitial site for interstitials).
- defect_site_index (int):
index of defect site in defect supercell (None for vacancies)
- bulk_site_index (int):
index of defect site in bulk supercell (None for interstitials)
- guessed_initial_defect_structure (Structure):
pymatgen Structure object of the guessed initial defect structure.
- unrelaxed_defect_structure (Structure):
pymatgen Structure object of the unrelaxed defect structure.
- bulk_voronoi_node_dict (dict):
Dictionary of bulk supercell Voronoi node information, for further expedited site-matching.
- Return type:
defect (Defect)
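Determining the defect type starts from how the bulk and defect supercells differ. The sketch below is a hypothetical, composition-only illustration: it compares element counts (e.g. from [site.specie.symbol for site in structure] for a pymatgen Structure) but, unlike defect_from_structures, cannot locate the defect site, and cannot distinguish a substitution from a vacancy-interstitial pair with the same net change:

```python
from collections import Counter


def classify_point_defect(bulk_species, defect_species):
    """Guess the point-defect type from the element lists of the two supercells.

    Returns ("vacancy", X), ("interstitial", X) or ("substitution", "A_B")
    for element A on a B site. Raises ValueError otherwise.
    """
    bulk, defect = Counter(bulk_species), Counter(defect_species)
    removed = bulk - defect  # atoms in bulk but missing from the defect cell (positive counts only)
    added = defect - bulk    # atoms in the defect cell but not in bulk
    n_removed, n_added = sum(removed.values()), sum(added.values())
    if (n_removed, n_added) == (1, 0):
        return "vacancy", next(iter(removed))
    if (n_removed, n_added) == (0, 1):
        return "interstitial", next(iter(added))
    if (n_removed, n_added) == (1, 1):
        return "substitution", f"{next(iter(added))}_{next(iter(removed))}"
    raise ValueError("Not a single vacancy, interstitial or substitution")


bulk = ["Cd"] * 4 + ["Te"] * 4
print(classify_point_defect(bulk, ["Cd"] * 3 + ["Te"] * 5))
```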
- doped.analysis.defect_name_from_structures(bulk_structure, defect_structure)
Get the doped/SnB defect name using the bulk and defect structures.
- Parameters:
bulk_structure (Structure) – Bulk (pristine) structure.
defect_structure (Structure) – Defect structure.
- Returns:
Defect name.
- Return type:
str
- doped.analysis.dpd_from_defect_dict(defect_dict: dict) → DefectPhaseDiagram
Generates a DefectPhaseDiagram object from a dictionary of parsed defect calculations in the format {“defect_name”: defect_entry}, likely created using defect_entry_from_paths() (or DefectParser), which can then be used to analyse and plot the defect thermodynamics (formation energies, transition levels, concentrations etc.). Note that the DefectEntry.name attributes (rather than the defect_name keys in defect_dict) are used to label the defects in plots.
- Parameters:
defect_dict (dict) – Dictionary of parsed defect calculations in the format {“defect_name”: defect_entry}, likely created using defect_entry_from_paths() (or DefectParser). Must have ‘vbm’ and ‘gap’ in defect_entry.calculation_metadata for at least one defect (from DefectParser.load_bulk_gap_data()).
- Returns:
doped DefectPhaseDiagram object (DefectPhaseDiagram)
- doped.analysis.dpd_transition_levels(defect_phase_diagram: DefectPhaseDiagram)
Iteratively prints the charge transition levels for the input DefectPhaseDiagram object (via the defect_phase_diagram.transition_level_map attribute).
- Parameters:
defect_phase_diagram (DefectPhaseDiagram) – DefectPhaseDiagram object (likely created from analysis.dpd_from_defect_dict)
- Returns:
None
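A charge transition level ε(q/q′) is the Fermi level at which the formation energies of charge states q and q′ are equal. Since E_f(q, E_F) = E_f(q, 0) + q·E_F (with E_F referenced to the VBM), equating two charge states gives ε(q/q′) = (E_f(q′, 0) − E_f(q, 0)) / (q − q′). This hypothetical sketch computes every pairwise crossing; the levels shown on a formation energy plot are only those between charge states that are each lowest in energy somewhere, which this sketch does not filter:

```python
from itertools import combinations


def transition_levels(formation_energies_at_vbm):
    """Pairwise crossing points eps(q1/q2) relative to the VBM.

    formation_energies_at_vbm: {charge: formation energy in eV at E_F = 0 (the VBM)}
    """
    levels = {}
    for q1, q2 in combinations(sorted(formation_energies_at_vbm, reverse=True), 2):
        e1, e2 = formation_energies_at_vbm[q1], formation_energies_at_vbm[q2]
        # E_f(q1, 0) + q1 * E_F = E_f(q2, 0) + q2 * E_F  =>  E_F = (e2 - e1) / (q1 - q2)
        levels[(q1, q2)] = (e2 - e1) / (q1 - q2)
    return levels


# Toy formation energies (eV) at the VBM for charge states +1, 0 and -1
print(transition_levels({1: 0.5, 0: 1.0, -1: 2.5}))
```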
- doped.analysis.formation_energy_table(defect_phase_diagram: DefectPhaseDiagram, chempots: Dict | None = None, el_refs: Dict | None = None, facets: List | None = None, fermi_level: float = 0)
Generates defect formation energy tables (DataFrames) for either a single chemical potential limit (i.e. phase diagram facet) or each facet in the phase diagram (chempots dict), depending on the chempots input supplied. This can either be a dictionary of chosen absolute/DFT chemical potentials: {Element: Energy} (giving a single formation energy table) or a dictionary including the key-value pair: {“facets”: [{‘facet’: [chempot_dict]}]}, following the doped format. In the latter case, a subset of facet(s) / chemical potential limit(s) can be chosen with the facets argument, or if not specified, will print formation energy tables for each facet in the phase diagram.
Returns the results as a pandas DataFrame or list of DataFrames.
Table Key (all energies in eV):
- ‘Defect’ -> Defect name.
- ‘q’ -> Defect charge state.
- ‘ΔEʳᵃʷ’ -> Raw DFT energy difference between the defect and host supercells (E_defect - E_host).
- ‘qE_VBM’ -> Defect charge times the VBM eigenvalue (to reference the Fermi level to the VBM).
- ‘qE_F’ -> Defect charge times the Fermi level (referenced to the VBM if qE_VBM is not 0, i.e. if “vbm” is in DefectEntry.calculation_metadata).
- ‘Σμ_ref’ -> Sum of the reference energies of the elemental phases in the chemical potentials sum.
- ‘Σμ_formal’ -> Sum of the formal atomic chemical potential terms (Σμ_DFT = Σμ_ref + Σμ_formal).
- ‘E_corr’ -> Finite-size supercell charge correction.
- ‘ΔEᶠᵒʳᵐ’ -> Defect formation energy, with the specified chemical potentials and Fermi level. Equals the sum of all other terms.
- Parameters:
defect_phase_diagram (DefectPhaseDiagram) – DefectPhaseDiagram for which to plot defect formation energies (typically created from analysis.dpd_from_defect_dict).
chempots (dict) – Dictionary of chemical potentials to use for calculating the defect formation energies. This can have the form of {“facets”: [{‘facet’: [chempot_dict]}]} (the format generated by doped’s chemical potential parsing functions (see tutorials)) and facet(s) (chemical potential limit(s)) to tabulate can be chosen using facets, or a dictionary of DFT/absolute chemical potentials (not formal chemical potentials!), in the format: {element symbol: chemical potential} - if manually specifying chemical potentials this way, you can set the el_refs option with the DFT reference energies of the elemental phases in order to show the formal (relative) chemical potentials as well. (Default: None)
facets (list, str) – A string or list of facet(s) (chemical potential limit(s)) for which to tabulate the defect formation energies, corresponding to ‘facet’ in {“facets”: [{‘facet’: [chempot_dict]}]} (the format generated by doped’s chemical potential parsing functions (see tutorials)). If not specified, will tabulate for each facet in chempots. (Default: None)
el_refs (dict) – Dictionary of elemental reference energies for the chemical potentials in the format: {element symbol: reference energy} (to determine the formal chemical potentials, when chempots has been manually specified as {element symbol: chemical potential}). Unnecessary if chempots is provided in format generated by doped (see tutorials). (Default: None)
fermi_level (float) – Value corresponding to the electron chemical potential. If “vbm” is supplied in DefectEntry.calculation_metadata, then fermi_level is referenced to the VBM. If “vbm” is NOT supplied in calculation_metadata, then fermi_level is referenced to the calculation’s absolute DFT potential (and should include the vbm value provided by a band structure calculation). Default = 0 (i.e. at the VBM)
- Returns:
pandas DataFrame or list of DataFrames
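The table columns combine into the usual defect formation energy expression. A minimal sketch of one row (hypothetical helper names and toy numbers, plain Python rather than pandas), assuming, as “equals the sum of all other terms” implies, that the Σμ columns already carry the minus sign of the chemical potential term, −Σᵢ nᵢμᵢ, with nᵢ the number of atoms of element i added to the supercell:

```python
def chemical_potential_term(n_added, mu):
    """Return -sum_i n_i * mu_i, with n_i > 0 for atoms added to the supercell
    and n_i < 0 for atoms removed from it."""
    return -sum(n * mu[element] for element, n in n_added.items())


def formation_energy_row(name, q, dE_raw, E_vbm, E_F, n_added, mu_ref, mu_formal, E_corr):
    """Hypothetical sketch of one formation-energy table row (all energies in eV).

    E_F is referenced to the VBM, so qE_VBM + qE_F = q * (E_VBM + E_F).
    """
    row = {
        "Defect": name,
        "q": q,
        "ΔEʳᵃʷ": dE_raw,
        "qE_VBM": q * E_vbm,
        "qE_F": q * E_F,
        "Σμ_ref": chemical_potential_term(n_added, mu_ref),
        "Σμ_formal": chemical_potential_term(n_added, mu_formal),
        "E_corr": E_corr,
    }
    # The formation energy is the sum of all the energy columns above
    row["ΔEᶠᵒʳᵐ"] = sum(value for key, value in row.items() if key not in ("Defect", "q"))
    return row


# Toy numbers for a Cd vacancy in the -2 charge state (one Cd atom removed)
row = formation_energy_row(
    "v_Cd_-2", q=-2, dE_raw=7.0, E_vbm=1.5, E_F=0.5,
    n_added={"Cd": -1}, mu_ref={"Cd": -0.9}, mu_formal={"Cd": -0.5}, E_corr=0.3,
)
print(row)
```

Lowering the formal Cd chemical potential (Cd-poor conditions) lowers the Σμ_formal term here, and hence the vacancy formation energy, as expected for a defect that removes Cd.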