doped.analysis module
Code to analyse VASP defect calculations.
These functions are built from a combination of useful modules from pymatgen,
alongside substantial modification, in an effort to make an efficient,
user-friendly package for managing and analysing defect calculations, with
publication-quality outputs.
- class doped.analysis.DefectParser(defect_entry: DefectEntry, defect_vr: Vasprun | None = None, bulk_vr: Vasprun | None = None, skip_corrections: bool = False, error_tolerance: float = 0.05, parse_projected_eigen: bool | None = None, **kwargs)[source]
Bases: object
Create a DefectParser object, which has methods for parsing the results of defect supercell calculations.
Direct initialisation with DefectParser() is typically not recommended. Rather, DefectParser.from_paths() or defect_entry_from_paths() are preferred, as shown in the doped parsing tutorials.
- Parameters:
defect_entry (DefectEntry) – doped DefectEntry
defect_vr (Vasprun) – pymatgen Vasprun object for the defect supercell calculation
bulk_vr (Vasprun) – pymatgen Vasprun object for the reference bulk supercell calculation
skip_corrections (bool) – Whether to skip calculation and application of finite-size charge corrections to the defect energy (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in the defect charge correction, based on the variance of the potential in the sampling region, is greater than this value (in eV), then a warning is raised. (default: 0.05 eV)
parse_projected_eigen (bool) – Whether to parse the projected eigenvalues & orbitals from the bulk and defect calculations (so DefectEntry.get_eigenvalue_analysis() can then be used with no further parsing). Will initially try to load orbital projections from vasprun.xml(.gz) files (slightly slower but more accurate), or failing that from PROCAR(.gz) files if present in the bulk/defect directories. Parsing this data can increase total parsing time by anywhere from ~5-25%, so set to False if parsing speed is crucial. Default is None, which will attempt to load this data but with no warning if it fails (otherwise if True a warning will be printed).
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()), point_symmetry_from_defect_entry() or defect_from_structures, including bulk_locpot_dict, bulk_site_potentials, use_MP, mpid, api_key, symprec or oxi_state. Primarily used by DefectsParser to expedite parsing by avoiding reloading bulk data for each defect.
- classmethod from_paths(defect_path: str, bulk_path: str | None = None, bulk_vr: Vasprun | None = None, bulk_procar: EasyunfoldProcar | Procar | None = None, dielectric: float | int | ndarray | list | None = None, charge_state: int | None = None, initial_defect_structure_path: str | None = None, skip_corrections: bool = False, error_tolerance: float = 0.05, bulk_band_gap_path: str | None = None, parse_projected_eigen: bool | None = None, **kwargs)[source]
Parse the defect calculation outputs in defect_path and return the DefectParser object. By default, the DefectParser.defect_entry.name attribute (later used to label defects in plots) is set to the defect_path folder name (if it is a recognised defect name), else it is set to the default doped name for that defect (using the estimated unrelaxed defect structure, for the point group and neighbour distances).
Note that the bulk and defect supercells should have the same definitions/basis sets (for site-matching and finite-size charge corrections to work appropriately).
- Parameters:
defect_path (str) – Path to defect supercell folder (containing at least vasprun.xml(.gz)).
bulk_path (str) – Path to bulk supercell folder (containing at least vasprun.xml(.gz)). Not required if bulk_vr is provided.
bulk_vr (Vasprun) – pymatgen Vasprun object for the reference bulk supercell calculation, if already loaded (can be supplied to expedite parsing). Default is None.
bulk_procar (Procar) – easyunfold/pymatgen Procar object, for the reference bulk supercell calculation if already loaded (can be supplied to expedite parsing). Default is None.
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Ionic + static contributions to the dielectric constant. If not provided, charge corrections cannot be computed and so skip_corrections will be set to True.
charge_state (int) – Charge state of defect. If not provided, will be automatically determined from defect calculation outputs, or if that fails, using the defect folder name (must end in “_+X” or “_-X” where +/-X is the defect charge state).
initial_defect_structure_path (str) – Path to the initial/unrelaxed defect structure. Only recommended for use if structure matching with the relaxed defect structure(s) fails (rare). Default is None.
skip_corrections (bool) – Whether to skip the calculation and application of finite-size charge corrections to the defect energy (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in the defect charge correction, based on the variance of the potential in the sampling region, is greater than this value (in eV), then a warning is raised. (default: 0.05 eV)
bulk_band_gap_path (str) – Path to bulk OUTCAR file for determining the band gap. If the VBM/CBM occur at reciprocal space points not included in the bulk supercell calculation, you should use this tag to point to a bulk band structure calculation instead. Alternatively, you can edit/add the “gap”/“vbm” entries in DefectParser.defect_entry.calculation_metadata to match the correct eigenvalues. If None, will calculate “gap”/“vbm” using the outputs at: DefectParser.defect_entry.calculation_metadata["bulk_path"]
parse_projected_eigen (bool) – Whether to parse the projected eigenvalues & orbitals from the bulk and defect calculations (so DefectEntry.get_eigenvalue_analysis() can then be used with no further parsing). Will initially try to load orbital projections from vasprun.xml(.gz) files (slightly slower but more accurate), or failing that from PROCAR(.gz) files if present in the bulk/defect directories. Parsing this data can increase total parsing time by anywhere from ~5-25%, so set to False if parsing speed is crucial. Default is None, which will attempt to load this data but with no warning if it fails (otherwise if True a warning will be printed).
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()), point_symmetry_from_defect_entry() or defect_from_structures, including bulk_locpot_dict, bulk_site_potentials, use_MP, mpid, api_key, symprec or oxi_state. Primarily used by DefectsParser to expedite parsing by avoiding reloading bulk data for each defect.
- Returns:
DefectParser object.
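The folder-name fallback for charge_state described above can be illustrated with a short sketch. This is not doped's own implementation: the helper name charge_from_folder_name and the example folder names are hypothetical, and accepting a trailing “_0” for neutral defects is an assumption beyond the “_+X”/“_-X” rule stated in the parameter description.

```python
import re


def charge_from_folder_name(folder_name):
    """Return the charge encoded as a trailing '_+X' or '_-X', else None.

    Hypothetical helper for illustration only. A bare trailing '_0' is also
    accepted here as the neutral state (an assumption, not a documented rule).
    """
    match = re.search(r"_([+-]\d+|0)$", folder_name)
    return int(match.group(1)) if match else None


# Example folder names (made up for this sketch)
for name in ("v_Cd_-2", "Te_i_+1", "v_Cd_0", "CdTe_bulk"):
    print(name, charge_from_folder_name(name))
```

Folder names without a recognisable suffix return None in this sketch, which corresponds to the case where the charge state would have to be supplied explicitly.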
- load_FNV_data(bulk_locpot_dict=None)[source]
Load metadata required for performing Freysoldt correction (i.e. LOCPOT planar-averaged potential dictionary).
Requires “bulk_path” and “defect_path” to be present in DefectEntry.calculation_metadata, and VASP LOCPOT files to be present in these directories. Can read compressed “LOCPOT.gz” files. The bulk_locpot_dict can be supplied if already parsed, for expedited parsing of multiple defects.
Saves the bulk_locpot_dict and defect_locpot_dict dictionaries (containing the planar-averaged electrostatic potentials along each axis direction) to the DefectEntry.calculation_metadata dict, for use with DefectEntry.get_freysoldt_correction().
- Parameters:
bulk_locpot_dict (dict) – Planar-averaged potential dictionary for bulk supercell, if already parsed. If None (default), will load from LOCPOT(.gz) file in defect_entry.calculation_metadata["bulk_path"]
- Returns:
bulk_locpot_dict for reuse in parsing other defect entries
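To make "planar-averaged potential along each axis direction" concrete, here is a minimal pure-Python sketch of the averaging itself. The function name planar_average, the nested-list grid and the dictionary keyed by axis index are assumptions for illustration; the actual layout of bulk_locpot_dict is not specified in this section.

```python
def planar_average(grid, axis):
    """Average a 3D grid (nested lists, shape nx x ny x nz) over the two
    axes other than `axis`, giving one value per plane along `axis`."""
    shape = (len(grid), len(grid[0]), len(grid[0][0]))
    sums = [0.0] * shape[axis]
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                # Accumulate each grid point into the plane it belongs to
                sums[(i, j, k)[axis]] += grid[i][j][k]
    points_per_plane = shape[0] * shape[1] * shape[2] // shape[axis]
    return [total / points_per_plane for total in sums]


# Toy 2 x 2 x 2 "potential" whose value equals the index along the first axis
toy_grid = [[[float(i) for _ in range(2)] for _ in range(2)] for i in range(2)]

# Hypothetical layout: one planar-averaged profile per axis index
toy_locpot_dict = {axis: planar_average(toy_grid, axis) for axis in range(3)}
print(toy_locpot_dict)
```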
- load_and_check_calculation_metadata()[source]
Pull metadata about the defect supercell calculations from the outputs, and check if the defect and bulk supercell calculation settings are compatible.
- load_bulk_gap_data(bulk_band_gap_path=None, use_MP=False, mpid=None, api_key=None)[source]
Get bulk band gap data from a bulk vasprun.xml(.gz) file located in/at bulk_band_gap_path.
Alternatively, one can query the Materials Project (MP) database for the bulk gap data, using use_MP = True, in which case the MP entry with the lowest number ID and composition matching the bulk will be used, or the MP ID (mpid) of the bulk material to use can be specified. This is not recommended as it will correspond to a severely-underestimated GGA DFT bandgap!
- Parameters:
bulk_band_gap_path (str) – Path to bulk vasprun.xml(.gz) file for determining the band gap. If the VBM/CBM occur at reciprocal space points not included in the bulk supercell calculation, you should use this tag to point to a bulk band-structure calculation instead. If None, will use self.defect_entry.calculation_metadata["bulk_path"].
use_MP (bool) – If True, will query the Materials Project database for the bulk gap data.
mpid (str) – If provided, will query the Materials Project database for the bulk gap data, using this Materials Project ID.
api_key (str) – Materials API key to access database.
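The "lowest number ID" rule mentioned for use_MP can be sketched as a numeric comparison on the ID suffix. This is only an illustration of the selection rule, not of how doped queries the database; the helper lowest_mp_id and the candidate IDs are hypothetical.

```python
def lowest_mp_id(mp_ids):
    """Pick the ID with the smallest numeric suffix (e.g. 'mp-406' before 'mp-1070')."""
    return min(mp_ids, key=lambda mp_id: int(mp_id.split("-")[-1]))


# Illustrative candidate IDs for entries sharing one composition
candidates = ["mp-1070", "mp-406", "mp-12779"]
chosen_id = lowest_mp_id(candidates)
print(chosen_id)
```

Comparing the suffixes as integers matters here, since a plain string comparison would rank "mp-1070" ahead of "mp-406".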
- load_eFNV_data(bulk_site_potentials=None)[source]
Load metadata required for performing Kumagai correction (i.e. atomic site potentials from the OUTCAR files).
Requires “bulk_path” and “defect_path” to be present in DefectEntry.calculation_metadata, and VASP OUTCAR files to be present in these directories. Can read compressed “OUTCAR.gz” files. The bulk_site_potentials can be supplied if already parsed, for expedited parsing of multiple defects.
Saves the bulk_site_potentials and defect_site_potentials lists (containing the atomic site electrostatic potentials, from -1*np.array(Outcar.electrostatic_potential)) to DefectEntry.calculation_metadata, for use with DefectEntry.get_kumagai_correction().
- Parameters:
bulk_site_potentials (dict) – Atomic site potentials for the bulk supercell, if already parsed. If None (default), will load from OUTCAR(.gz) file in defect_entry.calculation_metadata["bulk_path"]
- Returns:
bulk_site_potentials for reuse in parsing other defect entries
- class doped.analysis.DefectsParser(output_path: str = '.', dielectric: float | int | ndarray | list | None = None, subfolder: str | None = None, bulk_path: str | None = None, skip_corrections: bool = False, error_tolerance: float = 0.05, bulk_band_gap_path: str | None = None, processes: int | None = None, json_filename: str | bool | None = None, parse_projected_eigen: bool | None = None, **kwargs)[source]
Bases: object
A class for rapidly parsing multiple VASP defect supercell calculations for a given host (bulk) material.
Loops over calculation directories in output_path (likely the same output_path used with DefectsSet for file generation in doped.vasp) and parses the defect calculations into a dictionary of: {defect_name: DefectEntry}, where the defect_name is set to the defect calculation folder name (if it is a recognised defect name), else it is set to the default doped name for that defect (using the estimated unrelaxed defect structure, for the point group and neighbour distances). By default, searches for folders in output_path with subfolder containing vasprun.xml(.gz) files, and tries to parse them as DefectEntry objects.
By default, tries multiprocessing to speed up defect parsing, which can be controlled with processes. If parsing hangs, this may be due to memory issues, in which case you should reduce processes (e.g. 4 or fewer).
Defect charge states are automatically determined from the defect calculation outputs if POTCARs are set up with pymatgen (see docs Installation page), or if that fails, using the defect folder name (must end in “_+X” or “_-X” where +/-X is the defect charge state).
Uses the (single) DefectParser class to parse the individual defect calculations. Note that the bulk and defect supercells should have the same definitions/basis sets (for site-matching and finite-size charge corrections to work appropriately).
- Parameters:
output_path (str) – Path to the output directory containing the defect calculation folders (likely the same output_path used with DefectsSet for file generation in doped.vasp). Default = current directory.
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Ionic + static contributions to the dielectric constant, in the same xyz Cartesian basis as the supercell calculations. If not provided, charge corrections cannot be computed and so skip_corrections will be set to True.
subfolder (str) – Name of subfolder(s) within each defect calculation folder (in the output_path directory) containing the VASP calculation files to parse (e.g. vasp_ncl, vasp_std, vasp_gam etc.). If not specified, doped checks first for vasp_ncl, vasp_std, vasp_gam subfolders with calculation outputs (vasprun.xml(.gz) files) and uses the highest level VASP type (ncl > std > gam) found as subfolder, otherwise uses the defect calculation folder itself with no subfolder (set subfolder = "." to enforce this).
bulk_path (str) – Path to bulk supercell reference calculation folder. If not specified, searches for folder with name “X_bulk” in the output_path directory (matching the default doped name for the bulk supercell reference folder).
skip_corrections (bool) – Whether to skip the calculation & application of finite-size charge corrections to the defect energies (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in any charge correction, based on the variance of the potential in the sampling region, is greater than this value (in eV), then a warning is raised. (default: 0.05 eV)
bulk_band_gap_path (str) – Path to bulk OUTCAR file for determining the band gap. If the VBM/CBM occur at reciprocal space points not included in the bulk supercell calculation, you should use this tag to point to a bulk band structure calculation instead. Alternatively, you can edit/add the “gap” and “vbm” entries in DefectParser.defect_entry.calculation_metadata to match the correct eigenvalues. If None (default), will calculate “gap”/“vbm” using the outputs at: DefectParser.defect_entry.calculation_metadata["bulk_path"]
processes (int) – Number of processes to use for multiprocessing for expedited parsing. If not set, defaults to one less than the number of CPUs available.
json_filename (str) – Filename to save the parsed defect entries dict (DefectsParser.defect_dict) to in output_path, to avoid having to re-parse defects when later analysing further and aiding calculation provenance. Can be reloaded using the loadfn function from monty.serialization (and then input to DefectThermodynamics etc.). If None (default), set as {Host Chemical Formula}_defect_dict.json. If False, no json file is saved.
parse_projected_eigen (bool) – Whether to parse the projected eigenvalues & orbitals from the bulk and defect calculations (so DefectEntry.get_eigenvalue_analysis() can then be used with no further parsing). Will initially try to load orbital projections from vasprun.xml(.gz) files (slightly slower but more accurate), or failing that from PROCAR(.gz) files if present in the bulk/defect directories. Parsing this data can increase total parsing time by anywhere from ~5-25%, so set to False if parsing speed is crucial. Default is None, which will attempt to load this data but with no warning if it fails (otherwise if True a warning will be printed).
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()), point_symmetry_from_defect_entry() or defect_from_structures, including bulk_locpot_dict, bulk_site_potentials, use_MP, mpid, api_key, symprec or oxi_state. Primarily used by DefectsParser to expedite parsing by avoiding reloading bulk data for each defect.
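The subfolder fallback order described above (ncl > std > gam, then the defect folder itself) can be sketched for a single defect folder as follows. This is a simplified illustration under stated assumptions, not doped's code: the helpers has_vasprun and pick_subfolder are hypothetical, and the temporary directory with empty vasprun.xml files only stands in for real calculation outputs.

```python
import tempfile
from pathlib import Path

# Highest-level VASP type first, as in the subfolder description
PRIORITY = ("vasp_ncl", "vasp_std", "vasp_gam")


def has_vasprun(folder):
    """True if the folder contains vasprun.xml or vasprun.xml.gz."""
    return any((folder / name).is_file() for name in ("vasprun.xml", "vasprun.xml.gz"))


def pick_subfolder(defect_folder):
    """Return the highest-priority subfolder holding outputs, else '.'."""
    defect_folder = Path(defect_folder)
    for sub in PRIORITY:
        if has_vasprun(defect_folder / sub):
            return sub
    return "."


with tempfile.TemporaryDirectory() as tmp:
    defect = Path(tmp) / "v_Cd_-2"  # made-up defect folder name
    for sub in ("vasp_gam", "vasp_std"):
        (defect / sub).mkdir(parents=True)
        (defect / sub / "vasprun.xml").touch()
    chosen = pick_subfolder(defect)

print(chosen)
```

With both vasp_gam and vasp_std present, the sketch selects vasp_std; a folder with none of the three subfolders falls back to ".".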
- defect_dict
Dictionary of parsed defect calculations in the format: {"defect_name": DefectEntry} where the defect_name is set to the defect calculation folder name (if it is a recognised defect name), else it is set to the default doped name for that defect (using the estimated unrelaxed defect structure, for the point group and neighbour distances).
- Type:
dict
- get_defect_thermodynamics(chempots: dict | None = None, el_refs: dict | None = None, vbm: float | None = None, band_gap: float | None = None, dist_tol: float = 1.5, check_compatibility: bool = True) → DefectThermodynamics[source]
Generates a DefectThermodynamics object from the parsed DefectEntry objects in self.defect_dict, which can then be used to analyse and plot the defect thermodynamics (formation energies, transition levels, concentrations etc).
Note that the DefectEntry.name attributes (rather than the defect_name key in the defect_dict) are used to label the defects in plots.
- Parameters:
chempots (dict) – Dictionary of chemical potentials to use for calculating the defect formation energies. This can have the form of {"limits": [{'limit': [chempot_dict]}]} (the format generated by doped's chemical potential parsing functions (see tutorials)) which allows easy analysis over a range of chemical potentials - where limit(s) (chemical potential limit(s)) to analyse/plot can later be chosen using the limits argument.
Alternatively this can be a dictionary of chemical potentials for a single limit (limit), in the format: {element symbol: chemical potential}. If manually specifying chemical potentials this way, you can set the el_refs option with the DFT reference energies of the elemental phases in order to show the formal (relative) chemical potentials above the formation energy plot, in which case it is the formal chemical potentials (i.e. relative to the elemental references) that should be given here, otherwise the absolute (DFT) chemical potentials should be given.
If None (default), sets all chemical potentials to zero. Chemical potentials can also be supplied later in each analysis function. (Default: None)
el_refs (dict) – Dictionary of elemental reference energies for the chemical potentials in the format: {element symbol: reference energy} (to determine the formal chemical potentials, when chempots has been manually specified as {element symbol: chemical potential}). Unnecessary if chempots is provided in format generated by doped (see tutorials). (Default: None)
vbm (float) – VBM energy to use as Fermi level reference point for analysis. If None (default), will use “vbm” from the calculation_metadata dict attributes of the parsed DefectEntry objects.
band_gap (float) – Band gap of the host, to use for analysis. If None (default), will use “gap” from the calculation_metadata dict attributes of the parsed DefectEntry objects.
dist_tol (float) – Threshold for the closest distance (in Å) between equivalent defect sites, for different species of the same defect type, to be grouped together (for plotting and transition level analysis). If the minimum distance between equivalent defect sites is less than dist_tol, then they will be grouped together, otherwise treated as separate defects. (Default: 1.5)
check_compatibility (bool) – Whether to check the compatibility of the bulk entry for each defect entry (i.e. that all reference bulk energies are the same). (Default: True)
- Returns:
doped DefectThermodynamics object (DefectThermodynamics)
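The relationship between formal and absolute chemical potentials used in the chempots and el_refs descriptions can be written out as a short sketch: a formal chemical potential is given relative to the elemental reference energy, so the absolute value is their sum. The helper absolute_chempots and all numbers below are made up for illustration and are not results for any real material.

```python
def absolute_chempots(formal_chempots, el_refs):
    """Convert formal (relative) chemical potentials to absolute (DFT) values
    by adding each element's reference energy."""
    return {el: mu + el_refs[el] for el, mu in formal_chempots.items()}


# Single-limit format: {element symbol: chemical potential}; values are invented
formal = {"Cd": -1.25, "Te": 0.0}
# {element symbol: reference energy}; values are invented
el_refs = {"Cd": -1.0, "Te": -4.5}

absolute = absolute_chempots(formal, el_refs)
print(absolute)
```

In the terms of the parameter description, `formal` together with `el_refs` is what would be passed when el_refs is set, while `absolute` corresponds to what would be passed on its own otherwise.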
- doped.analysis.check_and_set_defect_entry_name(defect_entry: DefectEntry, possible_defect_name: str = '', bulk_symm_ops: list | None = None) → None[source]
Check that possible_defect_name is in a format recognised by doped (i.e. in the format “{defect_name}_{optional_site_info}_{charge_state}”).
If the DefectEntry.name attribute is not defined or does not end with the charge state, then the entry will be renamed with the doped default name for the unrelaxed defect (i.e. using the point symmetry of the defect site in the bulk cell).
- Parameters:
defect_entry (DefectEntry) – DefectEntry object.
possible_defect_name (str) – Possible defect name (usually the folder name) to check if recognised by doped, otherwise defect name is re-determined.
bulk_symm_ops (list) – List of symmetry operations of the defect_entry.bulk_supercell structure (used in determining the unrelaxed point symmetry), to avoid re-calculating. Default is None (recalculates).
- doped.analysis.defect_entry_from_paths(defect_path: str, bulk_path: str, dielectric: float | int | ndarray | list | None = None, charge_state: int | None = None, initial_defect_structure_path: str | None = None, skip_corrections: bool = False, error_tolerance: float = 0.05, bulk_band_gap_path: str | None = None, **kwargs)[source]
Parse the defect calculation outputs in defect_path and return the parsed DefectEntry object.
By default, the DefectEntry.name attribute (later used to label the defects in plots) is set to the defect_path folder name (if it is a recognised defect name), else it is set to the default doped name for that defect (using the estimated unrelaxed defect structure, for the point group and neighbour distances).
Note that the bulk and defect supercells should have the same definitions/basis sets (for site-matching and finite-size charge corrections to work appropriately).
- Parameters:
defect_path (str) – Path to defect supercell folder (containing at least vasprun.xml(.gz)).
bulk_path (str) – Path to bulk supercell folder (containing at least vasprun.xml(.gz)).
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Ionic + static contributions to the dielectric constant, in the same xyz Cartesian basis as the supercell calculations. If not provided, charge corrections cannot be computed and so skip_corrections will be set to True.
charge_state (int) – Charge state of defect. If not provided, will be automatically determined from the defect calculation outputs.
initial_defect_structure_path (str) – Path to the initial/unrelaxed defect structure. Only recommended for use if structure matching with the relaxed defect structure(s) fails (rare). Default is None.
skip_corrections (bool) – Whether to skip the calculation and application of finite-size charge corrections to the defect energy (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in the defect charge correction, based on the variance of the potential in the sampling region, is greater than this value (in eV), then a warning is raised. (default: 0.05 eV)
bulk_band_gap_path (str) – Path to bulk OUTCAR file for determining the band gap. If the VBM/CBM occur at reciprocal space points not included in the bulk supercell calculation, you should use this tag to point to a bulk band structure calculation instead. Alternatively, you can edit/add the “gap” and “vbm” entries in DefectEntry.calculation_metadata to match the correct (eigen)values. If None, will use DefectEntry.calculation_metadata["bulk_path"].
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()), point_symmetry_from_defect_entry() or defect_from_structures, including bulk_locpot_dict, bulk_site_potentials, use_MP, mpid, api_key, symprec or oxi_state.
- Returns:
Parsed DefectEntry object.
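The error_tolerance behaviour (a warning once the estimated correction error exceeds the threshold) can be sketched with the standard warnings module. How the error is estimated from the potential variance is not shown here; check_correction_error, its message text and the example values are hypothetical, with only the 0.05 eV default taken from the signature above.

```python
import warnings


def check_correction_error(defect_name, estimated_error, error_tolerance=0.05):
    """Warn and return True if the estimated error (eV) exceeds the tolerance."""
    if estimated_error > error_tolerance:
        warnings.warn(
            f"Estimated error in the charge correction for {defect_name} is "
            f"{estimated_error:.3f} eV, above the tolerance of {error_tolerance:.3f} eV."
        )
        return True
    return False


# Made-up error estimates for two hypothetical defects
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = check_correction_error("v_Cd_-2", 0.08)
    not_flagged = check_correction_error("Te_i_+1", 0.01)

print(flagged, not_flagged, len(caught))
```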
- doped.analysis.defect_from_structures(bulk_supercell, defect_supercell, return_all_info=False, bulk_voronoi_node_dict=None, oxi_state=None)[source]
Auto-determines the defect type and defect site from the supplied bulk and defect structures, and returns a corresponding Defect object.
If return_all_info is set to True, then also returns:
- relaxed defect site in the defect supercell
- the defect site in the bulk supercell
- defect site index in the defect supercell
- bulk site index (index of defect site in bulk supercell)
- guessed initial defect structure (before relaxation)
- ‘unrelaxed defect structure’ (also before relaxation, but with interstitials at their final relaxed positions, and all bulk atoms at their unrelaxed positions).
- Parameters:
bulk_supercell (Structure) – Bulk supercell structure.
defect_supercell (Structure) – Defect structure to use for identifying the defect site and type.
return_all_info (bool) – If True, returns additional python objects related to the site-matching, listed above. (Default = False)
bulk_voronoi_node_dict (dict) – Dictionary of bulk supercell Voronoi node information, for expedited site-matching. If None, will be re-calculated.
oxi_state (int, float, str) – Oxidation state of the defect site. If not provided, will be automatically determined from the defect structure.
- Returns:
doped Defect object.
If return_all_info is True, then also:
- defect_site (Site):
pymatgen Site object of the relaxed defect site in the defect supercell.
- defect_site_in_bulk (Site):
pymatgen Site object of the defect site in the bulk supercell (i.e. unrelaxed vacancy/substitution site, or final relaxed interstitial site for interstitials).
- defect_site_index (int):
index of defect site in defect supercell (None for vacancies)
- bulk_site_index (int):
index of defect site in bulk supercell (None for interstitials)
- guessed_initial_defect_structure (Structure):
pymatgen Structure object of the guessed initial defect structure.
- unrelaxed_defect_structure (Structure):
pymatgen Structure object of the unrelaxed defect structure.
- bulk_voronoi_node_dict (dict):
Dictionary of bulk supercell Voronoi node information, for further expedited site-matching.
- Return type:
defect (Defect)
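One simple way to see how a defect type can be inferred from a bulk/defect pair is to compare species counts. The sketch below is an assumption-laden illustration, not the site-matching performed by defect_from_structures: it works on plain lists of element symbols rather than pymatgen Structure objects, it does not locate the defect site, and the helper guess_defect_type is hypothetical.

```python
from collections import Counter


def guess_defect_type(bulk_species, defect_species):
    """Classify a point defect from the change in species counts alone."""
    bulk, defect = Counter(bulk_species), Counter(defect_species)
    n_removed = sum((bulk - defect).values())  # atoms missing from the defect cell
    n_added = sum((defect - bulk).values())    # extra atoms in the defect cell
    if (n_removed, n_added) == (1, 0):
        return "vacancy"
    if (n_removed, n_added) == (0, 1):
        return "interstitial"
    if (n_removed, n_added) == (1, 1):
        return "substitution"
    return None  # not a single point defect by this simple count


# Toy 8-atom "supercell" given as element symbols only
bulk = ["Cd"] * 4 + ["Te"] * 4
print(guess_defect_type(bulk, ["Cd"] * 3 + ["Te"] * 4))  # one Cd removed
print(guess_defect_type(bulk, ["Cd"] * 4 + ["Te"] * 5))  # one Te added
print(guess_defect_type(bulk, ["Cd"] * 3 + ["Te"] * 5))  # Cd replaced by Te
```

This also mirrors the index conventions listed above: a vacancy has no site in the defect supercell and an interstitial has no site in the bulk supercell, which is why the corresponding indices are None.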
- doped.analysis.defect_name_from_structures(bulk_structure, defect_structure)[source]
Get the doped/SnB defect name using the bulk and defect structures.
- Parameters:
bulk_structure (Structure) – Bulk (pristine) structure.
defect_structure (Structure) – Defect structure.
- Returns:
Defect name.
- Return type:
str