doped.analysis module

Code to analyse VASP defect calculations.

These functions are built from a combination of useful modules from pymatgen, with substantial modification, with the aim of providing an efficient, user-friendly package for managing and analysing defect calculations, with publication-quality outputs.

class doped.analysis.DefectParser(defect_entry: DefectEntry, defect_vr: Vasprun | None = None, bulk_vr: Vasprun | None = None, error_tolerance: float = 0.05, parse_projected_eigen: bool | None = None, **kwargs)[source]

Bases: object

Create a DefectParser object, which has methods for parsing the results of defect supercell calculations.

Direct initialisation with DefectParser() is typically not recommended; DefectParser.from_paths() or defect_entry_from_paths() are preferred instead.

Parameters:

defect_entry (DefectEntry) – doped DefectEntry
defect_vr (Vasprun) – pymatgen Vasprun object for the defect supercell calculation.
bulk_vr (Vasprun) – pymatgen Vasprun object for the reference bulk supercell calculation.
error_tolerance (float) – If the estimated error in the defect charge correction, based on the variance of the potential in the sampling region, is greater than this value (in eV), then a warning is raised. Default is 0.05 eV.
parse_projected_eigen (bool) – Whether to parse the projected eigenvalues & magnetization from the bulk and defect calculations (so DefectEntry.get_eigenvalue_analysis() can then be used with no further parsing, and magnetization values can be pulled for SOC / non-collinear magnetism calculations). Will initially try to load orbital projections from vasprun.xml(.gz) files (slightly slower but more accurate), or failing that from PROCAR(.gz) files if present in the bulk/defect directories. Parsing this data can increase total parsing time by anywhere from ~5-25%, so set to False if parsing speed is crucial. Default is None, which will attempt to load this data but with no warning if it fails (otherwise if True a warning will be printed).
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()), point_symmetry_from_defect_entry(), parse_symmetry_and_degeneracy_metadata or defect_and_info_from_structures, including bulk_locpot_dict, bulk_site_potentials, use_MP, mpid, api_key, oxi_state, multiplicity, angle_tolerance, attempt_periodicity_restoration, user_charges, initial_defect_structure_path etc (see their docstrings). Primarily used by DefectsParser to expedite parsing by avoiding reloading bulk data for each defect. Note that bulk_symprec can be supplied as the symprec value to use for determining equivalent sites (and thus defect multiplicities / unrelaxed site symmetries), while an input symprec value will be used for determining relaxed site symmetries.

apply_corrections()[source]: Get and apply defect corrections, and warn if likely to be inappropriate (based on error tolerances).

Parse the defect calculation outputs in defect_path and return the DefectParser object. By default, the DefectParser.defect_entry.name attribute (later used to label defects in plots) is set to the defect_path folder name (if it is a recognised defect name), else it is set to the default doped name for that defect (using the estimated unrelaxed defect structure, for the point group and neighbour distances).

Note that the bulk and defect supercells should have the same definitions/basis sets (for site-matching and finite-size charge corrections to work appropriately).

Parameters:

defect_path (PathLike) – Path to defect supercell folder (containing at least vasprun.xml(.gz)).
bulk_path (PathLike) – Path to bulk supercell folder (containing at least vasprun.xml(.gz)). Not required if bulk_vr is provided.
bulk_vr (Vasprun) – pymatgen Vasprun object for the reference bulk supercell calculation, if already loaded (can be supplied to expedite parsing). Default is None.
bulk_procar (Procar) – pymatgen Procar object, for the reference bulk supercell calculation if already loaded (can be supplied to expedite parsing). Default is None.
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Total dielectric constant (ionic + static contributions), in the same xyz Cartesian basis as the supercell calculations (likely but not necessarily the same as the raw output of a VASP dielectric calculation, if an oddly-defined primitive cell is used). If not provided, charge corrections cannot be computed and so skip_corrections will be set to True. See the Dielectric Constant tutorial section for information on calculating and converging the dielectric constant.
charge_state (int) – Charge state of defect. If not provided, will be automatically determined from defect calculation outputs, or if that fails, using the defect folder name (must end in “_+X” or “_-X” where +/-X is the defect charge state).
skip_corrections (bool) – Whether to skip the calculation and application of finite-size charge corrections to the defect energy (not recommended in most cases). Default = False.
error_tolerance (float) – If the estimated error in the defect charge correction, based on the variance of the potential in the sampling region, is greater than this value (in eV), then a warning is raised. Default is 0.05 eV.
bulk_band_gap_vr (PathLike or Vasprun) –
Path to a vasprun.xml(.gz) file, or a pymatgen Vasprun object, from which to determine the bulk band gap and band edge positions. If the VBM/CBM occur at k-points which are not included in the bulk supercell calculation, then this parameter should be used to provide the output of a bulk bandstructure calculation so that these are correctly determined. Alternatively, you can edit the "band_gap" and "vbm" entries in self.defect_entry.calculation_metadata to match the correct (eigen)values. If None, will use DefectEntry.calculation_metadata["bulk_path"] (i.e. the bulk supercell calculation output).

Note that the "band_gap" and "vbm" values should only affect the reference for the Fermi level values output by doped (as this VBM eigenvalue is used as the zero reference), thus affecting the position of the band edges in the defect formation energy plots and doping window / dopability limit functions, and the reference of the reported Fermi levels.
parse_projected_eigen (bool) – Whether to parse the projected eigenvalues & magnetization from the bulk and defect calculations (so DefectEntry.get_eigenvalue_analysis() can then be used with no further parsing, and magnetization values can be pulled for SOC / non-collinear magnetism calculations). Will initially try to load orbital projections from vasprun.xml(.gz) files (slightly slower but more accurate), or failing that from PROCAR(.gz) files if present in the bulk/defect directories. Parsing this data can increase total parsing time by anywhere from ~5-25%, so set to False if parsing speed is crucial. Default is None, which will attempt to load this data but with no warning if it fails (otherwise if True a warning will be printed).
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()), point_symmetry_from_defect_entry(), parse_symmetry_and_degeneracy_metadata or defect_and_info_from_structures, including bulk_locpot_dict, bulk_site_potentials, use_MP, mpid, api_key, oxi_state, multiplicity, angle_tolerance, attempt_periodicity_restoration, user_charges, initial_defect_structure_path etc (see their docstrings). Primarily used by DefectsParser to expedite parsing by avoiding reloading bulk data for each defect. Note that bulk_symprec can be supplied as the symprec value to use for determining equivalent sites (and thus defect multiplicities / unrelaxed site symmetries), while an input symprec value will be used for determining relaxed site symmetries.

Returns:

DefectParser object.

load_FNV_data(bulk_locpot_dict: dict | None = None)[source]

Load metadata required for performing Freysoldt correction (i.e. LOCPOT planar-averaged potential dictionary).

Requires “bulk_path” and “defect_path” to be present in DefectEntry.calculation_metadata, and VASP LOCPOT files to be present in these directories. Can read compressed “LOCPOT.gz” files. The bulk_locpot_dict can be supplied if already parsed, for expedited parsing of multiple defects.

Saves the bulk_locpot_dict and defect_locpot_dict dictionaries (containing the planar-averaged electrostatic potentials along each axis direction) to the DefectEntry.calculation_metadata dict, for use with DefectEntry.get_freysoldt_correction().

Parameters: bulk_locpot_dict (dict) – Planar-averaged potential dictionary for bulk supercell, if already parsed. If None (default), will try to load from the LOCPOT(.gz) file in defect_entry.calculation_metadata["bulk_path"].
Returns: bulk_locpot_dict for reuse in parsing other defect entries.
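Because load_FNV_data() returns the bulk potential dictionary, it can be parsed once and handed to each subsequent call. The following is a minimal sketch of that reuse pattern, assuming a collection of already-constructed DefectParser objects; the helper name load_fnv_for_all is hypothetical and relies only on the call signature and return value documented above.

```python
from typing import Iterable, Optional


def load_fnv_for_all(parsers: Iterable) -> Optional[dict]:
    """Call load_FNV_data() on each parser, reusing the bulk potential dict.

    `parsers` is assumed to be an iterable of objects exposing
    load_FNV_data(bulk_locpot_dict=...), such as DefectParser instances.
    """
    bulk_locpot_dict = None  # None on the first call, so the bulk LOCPOT(.gz) is read once
    for parser in parsers:
        # Each call returns the bulk dict, which is passed on to the next parser.
        bulk_locpot_dict = parser.load_FNV_data(bulk_locpot_dict=bulk_locpot_dict)
    return bulk_locpot_dict
```

The same pattern should apply to load_eFNV_data() and bulk_site_potentials, which is documented with an equivalent argument and return value.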

load_and_check_calculation_metadata()[source]: Pull metadata about the defect supercell calculations from the outputs, and check if the defect and bulk supercell calculations settings are compatible.

Load the "band_gap", "vbm" and "cbm" values for the parsed DefectEntry objects.

If bulk_band_gap_vr is provided, then these values are parsed from it, else taken from the parsed bulk supercell calculation.

"band_gap" and "vbm" are used by default when generating DefectThermodynamics objects, to be used in plotting & analysis.

Alternatively, one can query the Materials Project (MP) database for the bulk gap data by setting use_MP = True, in which case the MP entry with the lowest number ID and composition matching the bulk will be used, or by specifying the MP ID (mpid) of the bulk material to use. This is not recommended, as it will correspond to a severely underestimated GGA DFT band gap!

Parameters:

bulk_band_gap_vr (PathLike or Vasprun) –
Path to a vasprun.xml(.gz) file, or a pymatgen Vasprun object, from which to determine the bulk band gap and band edge positions. If the VBM/CBM occur at k-points which are not included in the bulk supercell calculation, then this parameter should be used to provide the output of a bulk bandstructure calculation so that these are correctly determined. Alternatively, you can edit the "band_gap" and "vbm" entries in self.defect_entry.calculation_metadata to match the correct (eigen)values. If None, will use DefectEntry.calculation_metadata["bulk_path"] (i.e. the bulk supercell calculation output).

Note that the "band_gap" and "vbm" values should only affect the reference for the Fermi level values output by doped (as this VBM eigenvalue is used as the zero reference), thus affecting the position of the band edges in the defect formation energy plots and doping window / dopability limit functions, and the reference of the reported Fermi levels.
use_MP (bool) – If True, will query the Materials Project database for the bulk gap data.
mpid (str) – If provided, will query the Materials Project database for the bulk gap data, using this Materials Project ID.
api_key (str) – Materials Project API key to access database.

load_eFNV_data(bulk_site_potentials: list | None = None)[source]

Load metadata required for performing Kumagai correction (i.e. atomic site potentials from the OUTCAR files).

Requires “bulk_path” and “defect_path” to be present in DefectEntry.calculation_metadata, and VASP OUTCAR files to be present in these directories. Can read compressed OUTCAR.gz files. The bulk_site_potentials can be supplied if already parsed, for expedited parsing of multiple defects.

Saves the bulk_site_potentials and defect_site_potentials lists (containing the atomic site electrostatic potentials, from -1*np.array(Outcar.electrostatic_potential)) to DefectEntry.calculation_metadata, for use with DefectEntry.get_kumagai_correction().

Parameters: bulk_site_potentials (list) – Atomic site potentials for the bulk supercell, if already parsed. If None (default), will load from OUTCAR(.gz) file in defect_entry.calculation_metadata["bulk_path"].
Returns: bulk_site_potentials to reuse in parsing other defect entries.

Bases: object

A class for rapidly parsing multiple VASP defect supercell calculations for a given host (bulk) material.

Loops over calculation directories in output_path (likely the same output_path used with DefectsSet for file generation in doped.vasp) and parses the defect calculations into a dictionary of: {defect_name: DefectEntry}, where the defect_name is set to the defect calculation folder name (if it is a recognised defect name), else it is set to the default doped name for that defect (using the estimated unrelaxed defect structure, for the point group and neighbour distances). By default, searches for folders in output_path with subfolder containing vasprun.xml(.gz) files, and tries to parse them as DefectEntrys.

By default, tries multiprocessing to speed up defect parsing, which can be controlled with processes. If parsing hangs, this may be due to memory issues, in which case you should manually reduce processes (e.g. <=4).

Defect charge states are automatically determined from the defect calculation outputs if POTCARs are set up with pymatgen (see docs Installation page), or if that fails, using the defect folder name (must end in “_+X” or “_-X” where +/-X is the defect charge state).
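The folder-name fallback described above can be illustrated with a small stand-alone sketch. This is not doped's own implementation: the function name and the example folder names are hypothetical, and the pattern follows the documented "_+X" / "_-X" suffix literally, so an unsigned suffix such as "_0" is deliberately not matched here.

```python
import re
from typing import Optional

# A trailing "_+X" or "_-X", where X is an integer (as documented above).
_CHARGE_SUFFIX = re.compile(r"_([+-]\d+)$")


def charge_from_folder_name(folder_name: str) -> Optional[int]:
    """Return the charge state encoded at the end of a folder name, or None."""
    match = _CHARGE_SUFFIX.search(folder_name.rstrip("/"))
    return int(match.group(1)) if match else None


# Hypothetical folder names, for illustration only.
print(charge_from_folder_name("v_Cd_-2"))
print(charge_from_folder_name("Te_i_+1"))
```

Returning None rather than raising leaves the caller free to decide how to handle folders that do not follow the naming convention.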

Uses the (single) DefectParser class to parse the individual defect calculations. Note that the bulk and defect supercells should have the same definitions/basis sets (for site-matching and finite-size charge corrections to work appropriately).

Parameters:

output_path (PathLike) – Path to the output directory containing the defect calculation folders (likely the same output_path used with DefectsSet for file generation in doped.vasp). Default is current directory.
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Total dielectric constant (ionic + static contributions), in the same xyz Cartesian basis as the supercell calculations (likely but not necessarily the same as the raw output of a VASP dielectric calculation, if an oddly-defined primitive cell is used). If not provided, charge corrections cannot be computed and so skip_corrections will be set to True. See the Dielectric Constant tutorial section for information on calculating and converging the dielectric constant.
subfolder (PathLike) – Name of subfolder(s) within each defect calculation folder (in the output_path directory) containing the VASP calculation files to parse (e.g. vasp_ncl, vasp_std, vasp_gam etc.). If not specified, doped checks, case-insensitively and in order, for "vasp_ncl", "singlepoint", "final", "relax", "vasp_std", "vasp_nkred_std", "vasp_gam" subfolders (following _SUBFOLDER_PRIORITY) with calculation outputs (vasprun.xml(.gz) files), and uses the first matching subfolder name as subfolder, otherwise uses the defect calculation folder itself with no subfolder (set subfolder = "." to enforce this).
bulk_path (PathLike) – Path to bulk supercell reference calculation folder. If not specified, searches for folder with name “X_bulk” in the output_path directory (matching the default doped name for the bulk supercell reference folder). Can be the full path, or the relative path from the output_path directory.
skip_corrections (bool) – Whether to skip the calculation & application of finite-size charge corrections to the defect energies (not recommended in most cases). Default is False.
error_tolerance (float) – If the estimated error in any charge correction, based on the variance of the potential in the sampling region, is greater than this value (in eV), then a warning is raised. Default is 0.05 eV. Note that this warning is skipped for defects which are predicted to not be stable for any Fermi level in the band gap (based on all parsed defects here), or are predicted to be shallow (perturbed host) states according to eigenvalue analysis and only be stable for Fermi levels within a small window to a band edge (taken as the smaller of error_tolerance or 10% of the band gap, by default, or can be set by a shallow_charge_stability_tolerance = X keyword argument).
bulk_band_gap_vr (PathLike or Vasprun) –
Path to a vasprun.xml(.gz) file, or a pymatgen Vasprun object, from which to determine the bulk band gap and band edge positions. If the VBM/CBM occur at k-points which are not included in the bulk supercell calculation, then this parameter should be used to provide the output of a bulk bandstructure calculation so that these are correctly determined. Alternatively, you can edit the "band_gap" and "vbm" entries in self.defect_entry.calculation_metadata to match the correct (eigen)values. If None, will use DefectEntry.calculation_metadata["bulk_path"] (i.e. the bulk supercell calculation output).

Note that the "band_gap" and "vbm" values should only affect the reference for the Fermi level values output by doped (as this VBM eigenvalue is used as the zero reference), thus affecting the position of the band edges in the defect formation energy plots and doping window / dopability limit functions, and the reference of the reported Fermi levels.
processes (int) – Number of processes to use for multiprocessing for expedited parsing. If not set, defaults to one less than the number of CPUs available. Set to 1 for no multiprocessing.
json_filename (PathLike) – Filename to save the parsed defect entries dict (DefectsParser.defect_dict) to in output_path, to avoid having to re-parse defects when later analysing further and aiding calculation provenance. Can be reloaded using the loadfn function from monty.serialization (and then input to DefectThermodynamics etc.). If None (default), set as {Host Chemical Formula}_defect_dict.json.gz. If False, no json file is saved.
parse_projected_eigen (bool) – Whether to parse the projected eigenvalues & magnetization from the bulk and defect calculations (so DefectEntry.get_eigenvalue_analysis() can then be used with no further parsing, and magnetization values can be pulled for SOC / non-collinear magnetism calculations). Will initially try to load orbital projections from vasprun.xml(.gz) files (slightly slower but more accurate), or failing that from PROCAR(.gz) files if present in the bulk/defect directories. Parsing this data can increase total parsing time by anywhere from ~5-25%, so set to False if parsing speed is crucial. Default is None, which will attempt to load this data but with no warning if it fails (otherwise if True a warning will be printed).
**kwargs – Keyword arguments to pass to DefectParser() methods (load_FNV_data(), load_eFNV_data(), load_bulk_gap_data()), point_symmetry_from_defect_entry(), parse_symmetry_and_degeneracy_metadata or defect_and_info_from_structures or get_dimer_bonds(), including bulk_locpot_dict, bulk_site_potentials, use_MP, mpid, api_key, oxi_state, multiplicity, angle_tolerance, attempt_periodicity_restoration, user_charges, initial_defect_structure_path, rtol etc. (see their docstrings); or for controlling shallow defect charge correction error warnings (see error_tolerance description) with shallow_charge_stability_tolerance. Note that bulk_symprec can be supplied as the symprec value to use for determining equivalent sites (and thus defect multiplicities / unrelaxed site symmetries), while an input symprec value will be used for determining relaxed site symmetries.
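The subfolder search order described for the subfolder parameter can be sketched as follows. This is an illustrative reimplementation for a single defect folder, not doped's own code: the function names are hypothetical, and only the priority list and the vasprun.xml(.gz) check are taken from the description above.

```python
from pathlib import Path
from typing import Optional

# Priority order quoted in the `subfolder` description above.
SUBFOLDER_PRIORITY = (
    "vasp_ncl", "singlepoint", "final", "relax",
    "vasp_std", "vasp_nkred_std", "vasp_gam",
)


def has_vasprun(folder: Path) -> bool:
    """True if the folder contains vasprun.xml or vasprun.xml.gz."""
    return any((folder / name).is_file() for name in ("vasprun.xml", "vasprun.xml.gz"))


def pick_subfolder(defect_folder: Path) -> Optional[str]:
    """Return the first priority subfolder holding outputs, "." if the outputs
    sit in the defect folder itself, or None if no outputs are found."""
    # Key by lower-cased name so that matching is case-insensitive.
    subdirs = {p.name.lower(): p for p in defect_folder.iterdir() if p.is_dir()}
    for candidate in SUBFOLDER_PRIORITY:
        path = subdirs.get(candidate)
        if path is not None and has_vasprun(path):
            return path.name
    return "." if has_vasprun(defect_folder) else None
```

Subfolders without calculation outputs are skipped, so an empty vasp_ncl directory does not shadow a completed vasp_std run in this sketch.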

defect_dict

Dictionary of parsed defect calculations in the format: {"defect_name": DefectEntry} where the defect_name is set to the defect calculation folder name (if it is a recognised defect name), else it is set to the default doped name for that defect (using the estimated unrelaxed defect structure, for the point group and neighbour distances).

Type: dict
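A saved defect_dict can be reloaded with loadfn from monty.serialization, as noted in the json_filename description. A minimal sketch, assuming the default file name pattern quoted there; both helper names are hypothetical, "CdTe" is only an example host, and the exact formula string doped writes into the file name is not specified here.

```python
def default_json_filename(host_formula: str) -> str:
    """File name pattern documented for json_filename=None."""
    return f"{host_formula}_defect_dict.json.gz"


def reload_defect_dict(json_path):
    """Reload a saved {defect_name: DefectEntry} dictionary."""
    # Imported lazily; requires monty, and doped so that the stored
    # DefectEntry objects can be decoded.
    from monty.serialization import loadfn

    return loadfn(json_path)


print(default_json_filename("CdTe"))
```

The reloaded dictionary can then be passed on to DefectThermodynamics, avoiding a second parse of the calculation outputs.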

get_defect_thermodynamics(chempots: dict | None = None, el_refs: dict | None = None, vbm: float | None = None, band_gap: float | None = None, dist_tol: float = 1.5, check_compatibility: bool = True, bulk_dos: FermiDos | None = None, skip_dos_check: bool = False, **kwargs) → DefectThermodynamics[source]

Generates a DefectThermodynamics object from the parsed DefectEntry objects in self.defect_dict, which can then be used to analyse and plot the defect thermodynamics (formation energies, transition levels, concentrations etc).

Note that the DefectEntry.name attributes (rather than the defect_name key in the defect_dict) are used to label the defects in plots.

See the DefectThermodynamics and accompanying methods docstrings in doped.thermodynamics for more.

Parameters:

chempots (dict) –
Dictionary of chemical potentials to use for calculating the defect formation energies. This can have the form of {"limits": [{'limit': [chempot_dict]}]} (the format generated by doped's chemical potential parsing functions (see tutorials)) which allows easy analysis over a range of chemical potentials – where limit(s) (chemical potential limit(s)) to analyse/plot can later be chosen using the limits argument.

Alternatively this can be a dictionary of chemical potentials for a single limit, in the format: {element symbol: chemical potential}. If manually specifying chemical potentials this way, you can set the el_refs option with the (QM/DFT) reference energies of the elemental phases in order to show the formal (relative) chemical potentials above the formation energy plot, in which case it is the formal chemical potentials (i.e. relative to the elemental references) that should be given here, otherwise the absolute (QM/DFT) chemical potentials should be given.

If None (default), sets all chemical potentials to zero. Chemical potentials can also be supplied later in each analysis function. (Default: None)
el_refs (dict) –
Dictionary of elemental reference energies for the chemical potentials in the format: {element symbol: reference energy} (to determine the formal chemical potentials, when chempots has been manually specified as {element symbol: chemical potential}). Unnecessary if chempots is provided in format generated by doped (see tutorials).

If None (default), sets all elemental reference energies to zero. Reference energies can also be supplied later in each analysis function, or set using DefectThermodynamics.el_refs = ... (with the same input options).
vbm (float) – VBM eigenvalue to use as Fermi level reference point for analysis. If None (default), will use "vbm" from the calculation_metadata dict attributes of the parsed DefectEntry objects, which by default is taken from the bulk supercell VBM (unless bulk_band_gap_vr is set during parsing). Note that vbm should only affect the reference for the Fermi level values output by doped (as this VBM eigenvalue is used as the zero reference), thus affecting the position of the band edges in the defect formation energy plots and doping window / dopability limit functions, and the reference of the reported Fermi levels.
band_gap (float) – Band gap of the host, to use for analysis. If None (default), will use “band_gap” from the calculation_metadata dict attributes of the parsed DefectEntry objects.
dist_tol (float) – Threshold for the closest distance (in Å) between equivalent defect sites, for different species of the same defect type, to be grouped together (for plotting, transition level analysis and defect concentration calculations). For the most part, if the minimum distance between equivalent defect sites is less than dist_tol, then they will be grouped together, otherwise treated as separate defects. See plot() and get_fermi_level_and_concentrations() docstrings for more information. (Default: 1.5)
check_compatibility (bool) – Whether to check the compatibility of the bulk entry for each defect entry (i.e. that all reference bulk energies are the same). (Default: True)
bulk_dos (FermiDos or Vasprun or PathLike) –
pymatgen FermiDos for the bulk electronic density of states (DOS), for calculating Fermi level positions and defect/carrier concentrations. Alternatively, can be a pymatgen Vasprun object or path to the vasprun.xml(.gz) output of a bulk DOS calculation in VASP. Can also be provided later when using get_equilibrium_fermi_level(), get_fermi_level_and_concentrations etc, or set using DefectThermodynamics.bulk_dos = ... (with the same input options).

Usually this is a static calculation with the primitive cell of the bulk material, with relatively dense k-point sampling (especially for materials with disperse band edges) to ensure an accurately-converged DOS and thus Fermi level. Using large NEDOS (>3000) and ISMEAR = -5 (tetrahedron smearing) are recommended for best convergence (wrt k-point sampling) in VASP. Consistent functional settings should be used for the bulk DOS and defect supercell calculations. See the Density of States (DOS) Calculations tips. (Default: None)
skip_dos_check (bool) – Whether to skip the warning about the DOS VBM differing from the defect entries VBM by >0.05 eV. Should only be used when the reason for this difference is known/acceptable. (Default: False)
**kwargs – Additional keyword arguments to pass to the DefectThermodynamics constructor.

Returns:

doped DefectThermodynamics object
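The relationship between absolute and formal chemical potentials in the single-limit chempots format can be sketched as below, together with a hedged usage outline. The helper names and all numerical values are hypothetical; only the DefectsParser arguments output_path and dielectric, and the get_defect_thermodynamics arguments chempots and el_refs, are taken from the descriptions above.

```python
from typing import Dict


def to_formal_chempots(absolute: Dict[str, float], el_refs: Dict[str, float]) -> Dict[str, float]:
    """Formal chemical potential = absolute (DFT) value minus the elemental reference energy."""
    return {element: mu - el_refs[element] for element, mu in absolute.items()}


def build_thermodynamics(output_path, dielectric, formal_chempots, el_refs):
    """Parse all defects under output_path and return a DefectThermodynamics object."""
    # Imported lazily so the helper above can be used without doped installed.
    from doped.analysis import DefectsParser

    parser = DefectsParser(output_path=output_path, dielectric=dielectric)
    # With el_refs supplied, chempots are given relative to the elemental references.
    return parser.get_defect_thermodynamics(chempots=formal_chempots, el_refs=el_refs)


# Hypothetical values in eV, for illustration only.
absolute = {"Cd": -1.75, "Te": -4.25}
el_refs = {"Cd": -1.0, "Te": -3.0}
formal = to_formal_chempots(absolute, el_refs)
print(formal)
```

If el_refs is not supplied, the description above indicates that the absolute values should be passed as chempots instead.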

doped.analysis.check_and_set_defect_entry_name(defect_entry: DefectEntry, possible_defect_name: str = '') → None[source]

Check that possible_defect_name is a recognised format by doped (i.e. in the format "{defect_name}_{optional_site_info}_{charge_state}").

If the DefectEntry.name attribute is not defined or does not end with a charge state, then the entry will be renamed with the doped default name for the unrelaxed defect (i.e. using the point symmetry of the defect site in the bulk cell).

Parameters:

defect_entry (DefectEntry) – DefectEntry object.
possible_defect_name (str) – Possible defect name (usually the folder name) to check if recognised by doped, otherwise defect name is re-determined.

doped.analysis.defect_and_info_from_structures(defect_supercell: Structure, bulk_supercell: Structure, skip_atom_mapping_check: bool = False, initial_defect_structure_path: str | Path | None = None, _parameter_order_warn: bool = True, **kwargs) → tuple[Defect, PeriodicSite, dict][source]

Generates a corresponding Defect object from the supplied bulk and defect supercells (using defect_from_structures), and returns the Defect object, the relaxed defect site in the defect supercell, and a dictionary of calculation metadata (including the defect site in the bulk supercell, defect site indices in the defect and bulk supercells, the guessed initial defect structure, and the unrelaxed defect structure).

Note that this assumes consistent cell definitions (lattice vectors and bases) for the input defect and bulk supercells, and does not perform any structural re-orientations.

Parameters:

defect_supercell (Structure) – Defect structure to use for identifying the defect site and type.
bulk_supercell (Structure) – Bulk supercell structure.
skip_atom_mapping_check (bool) – If True, skips the atom mapping check which ensures that the bulk and defect supercell lattice definitions are matched (important for accurate defect site determination and charge corrections). Can be used to speed up parsing when you are sure the cell definitions match (e.g. both supercells were generated with doped). Default is False.
initial_defect_structure_path (PathLike) – Path to the initial/unrelaxed defect structure. Only recommended for use if structure matching with the relaxed defect structure(s) fails (rare). Default is None.
**kwargs – Keyword arguments to pass to get_equiv_frac_coords_in_primitive (such as symprec, dist_tol_factor, fixed_symprec_and_dist_tol_factor, verbose) and/or Defect initialization (such as oxi_state, multiplicity, symprec, dist_tol_factor). Mainly intended for cases where fast site matching and Defect creation are desired (e.g. when analysing MD trajectories of defects), where providing these parameters can greatly speed up parsing. Setting oxi_state='N/A' and multiplicity=1 will skip their auto-determination and accelerate parsing, if these properties are not required.

Returns:

defect (Defect):

doped Defect object, defined in the primitive structure.

defect_site (PeriodicSite):

pymatgen PeriodicSite object of the relaxed defect site in the defect supercell.

defect_structure_metadata (dict):

Dictionary containing metadata about the defect structure, including:

guessed_initial_defect_structure: The guessed initial defect structure (before relaxation).
guessed_defect_displacement: Displacement from the guessed initial defect site to the final relaxed site (None for vacancies).
defect_site_index: Index of the defect site in the defect supercell (None for vacancies).
bulk_site_index: Index of the defect site in the bulk supercell (None for interstitials).
unrelaxed_defect_structure: The unrelaxed defect structure (similar to guessed_initial_defect_structure, but with interstitials at their final relaxed positions, and all bulk atoms at their unrelaxed positions).
bulk_site: The defect site in the bulk supercell (i.e. unrelaxed vacancy/substitution site, or final relaxed site for interstitials).

Return type:

tuple[Defect, PeriodicSite, dict]

doped.analysis.defect_from_structures(defect_supercell: Structure, bulk_supercell: Structure, return_all_info: bool = False, skip_atom_mapping_check: bool = False, _parameter_order_warn: bool = True, **kwargs) → Defect | tuple[Defect, PeriodicSite, PeriodicSite, int | None, int | None, Structure, Structure][source]

Auto-determines the defect type and defect site from the supplied bulk and defect structures, and returns a corresponding Defect object with the defect site in the primitive structure.

Note that this assumes consistent cell definitions (lattice vectors and bases) for the input defect and bulk supercells, and does not perform any structural re-orientations.

If return_all_info is set to true, then also returns:

relaxed defect site in the defect supercell
the defect site in the bulk supercell
defect site index in the defect supercell
bulk site index (index of defect site in bulk supercell)
guessed initial defect structure (before relaxation)
‘unrelaxed defect structure’ (also before relaxation, but with interstitials at their final relaxed positions, and all bulk atoms at their unrelaxed positions).

Parameters:

defect_supercell (Structure) – Defect structure to use for identifying the defect site and type.
bulk_supercell (Structure) – Bulk supercell structure.
return_all_info (bool) – If True, returns additional info related to the site-matching; see return signature. (Default: False)
skip_atom_mapping_check (bool) – If True, skips the atom mapping check which ensures that the bulk and defect supercell lattice definitions are matched (important for accurate defect site determination and charge corrections). Can be used to speed up parsing when you are sure the cell definitions match (e.g. both supercells were generated with doped). Default is False.
**kwargs – Keyword arguments to pass to get_equiv_frac_coords_in_primitive (such as symprec, dist_tol_factor, fixed_symprec_and_dist_tol_factor, verbose) and/or Defect initialization (such as oxi_state, multiplicity, symprec, dist_tol_factor). Mainly intended for cases where fast site matching and Defect creation are desired (e.g. when analysing MD trajectories of defects), where providing these parameters can greatly speed up parsing. Setting oxi_state='N/A' and multiplicity=1 will skip their auto-determination and accelerate parsing, if these properties are not required.

Returns:

doped Defect object, defined in the primitive structure.

If return_all_info is True, then also returns:

defect_site (PeriodicSite): pymatgen PeriodicSite object of the relaxed defect site in the defect supercell.
defect_site_in_bulk (PeriodicSite): pymatgen PeriodicSite object of the defect site in the bulk supercell (i.e. unrelaxed vacancy/substitution site, or final relaxed interstitial site for interstitials).
defect_site_index (int): Index of defect site in defect supercell (None for vacancies)
bulk_site_index (int): Index of defect site in bulk supercell (None for interstitials)
guessed_initial_defect_structure (Structure): pymatgen Structure object of the guessed initial defect structure.
unrelaxed_defect_structure (Structure): pymatgen Structure object of the unrelaxed defect structure.

Return type:

defect (Defect)

doped.analysis.defect_name_from_structures(defect_supercell: Structure, bulk_supercell: Structure, _parameter_order_warn: bool = True, **kwargs) → str[source]

Get the doped/SnB defect name using the bulk and defect structures.

Parameters:

defect_supercell (Structure) – Defect structure.
bulk_supercell (Structure) – Bulk (pristine) structure.
**kwargs – Keyword arguments to pass to defect_from_structures (such as oxi_state, multiplicity, symprec, dist_tol_factor, fixed_symprec_and_dist_tol_factor, verbose).

Returns:

Defect name.

Return type:

str

doped.analysis.defect_site_from_structures(defect_supercell: Structure, bulk_supercell: Structure, return_all_info: bool = False, _parameter_order_warn: bool = True) → PeriodicSite | tuple[PeriodicSite, str, PeriodicSite, int | None, int | None, Structure][source]

Auto-determines the defect site from the supplied bulk and defect structures, returning the corresponding PeriodicSite.

Note that this assumes consistent cell definitions (lattice vectors and bases) for the input defect and bulk supercells, and does not perform any structural re-orientations.

Parameters:

defect_supercell (Structure) – Defect structure to use for identifying the defect site.
bulk_supercell (Structure) – Bulk supercell structure.
return_all_info (bool) – If True, returns additional info related to the site-matching; see return signature. (Default: False)

Returns:

pymatgen PeriodicSite object for the relaxed defect site in the defect supercell.

If return_all_info is True, then also returns:

defect_type (str): The type of defect as a string (interstitial, vacancy or substitution).
defect_site_in_bulk (PeriodicSite): pymatgen PeriodicSite object of the defect site in the bulk supercell (i.e. unrelaxed vacancy/substitution site, or final relaxed interstitial site for interstitials).
defect_site_index (int): Index of defect site in defect supercell (None for vacancies).
bulk_site_index (int): Index of defect site in bulk supercell (None for interstitials).
unrelaxed_defect_structure (Structure): pymatgen Structure object of the unrelaxed defect structure.

Return type:

defect_site (PeriodicSite)
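
As a rough illustration of the site-matching idea (not doped's actual algorithm), the sketch below pairs defect-supercell sites with bulk-supercell sites of the same species and classifies the defect from whatever is left unmatched. The helper names (pbc_distance, classify_defect), the tolerance and the use of plain fractional coordinates in a unit cubic cell are all assumptions made for this example. The returned indices follow the convention above: None for the defect-supercell index of a vacancy, and None for the bulk index of an interstitial.

```python
def pbc_distance(frac_a, frac_b):
    """Minimum-image distance between two fractional coordinates (unit cubic cell assumed)."""
    total = 0.0
    for a, b in zip(frac_a, frac_b):
        d = (a - b) % 1.0
        d = min(d, 1.0 - d)  # wrap across the periodic boundary
        total += d * d
    return total ** 0.5


def classify_defect(bulk_sites, defect_sites, tol=0.1):
    """Toy classifier. Sites are (species, (x, y, z)) tuples in fractional coordinates.

    Returns (defect_type, bulk_site_index, defect_site_index).
    """
    unmatched_bulk = set(range(len(bulk_sites)))
    unmatched_defect = []
    for j, (d_species, d_coords) in enumerate(defect_sites):
        best = None  # (distance, bulk index) of the closest unmatched same-species site
        for i in unmatched_bulk:
            b_species, b_coords = bulk_sites[i]
            if b_species != d_species:
                continue
            dist = pbc_distance(b_coords, d_coords)
            if dist <= tol and (best is None or dist < best[0]):
                best = (dist, i)
        if best is None:
            unmatched_defect.append(j)
        else:
            unmatched_bulk.remove(best[1])

    if len(unmatched_bulk) == 1 and not unmatched_defect:
        return "vacancy", next(iter(unmatched_bulk)), None
    if not unmatched_bulk and len(unmatched_defect) == 1:
        return "interstitial", None, unmatched_defect[0]
    if len(unmatched_bulk) == 1 and len(unmatched_defect) == 1:
        return "substitution", next(iter(unmatched_bulk)), unmatched_defect[0]
    raise ValueError("Could not identify a single point defect")


# Hypothetical two-atom 'bulk' cell used only for illustration
bulk = [("Cd", (0.0, 0.0, 0.0)), ("Te", (0.5, 0.5, 0.5))]
vacancy = [("Te", (0.52, 0.5, 0.5))]
substitution = [("Cd", (0.98, 0.0, 0.0)), ("Se", (0.5, 0.5, 0.5))]
interstitial = bulk + [("Cd", (0.25, 0.25, 0.25))]
```

Like the function documented above, this sketch assumes that the bulk and defect cells share the same lattice definition and performs no re-orientation.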

doped.analysis.guess_defect_position(defect_supercell: Structure, bulk_supercell: Structure | None = None, soap_n_jobs: int = 1, soap_r_cut: float = 5.0, soap_n_max: int = 6, soap_l_max: int = 4) → ndarray[source]

Guess the position (in Cartesian coordinates) of a defect in an input defect supercell, optionally (but not necessarily) using a bulk/reference supercell.

This is achieved by computing cosine dissimilarities between site SOAP vectors and a reference, and then determining the centre of mass of the squared cosine dissimilarities.

If no bulk_supercell is provided (default), each site’s SOAP vector is compared to the mean SOAP vector of all sites of the same species in defect_supercell. If a bulk_supercell is provided, each defect supercell site’s SOAP vector is instead compared to the SOAP vector of its nearest site (by Cartesian distance, accounting for periodic boundary conditions) in the bulk supercell, which typically gives a stronger signal around the defect site. This assumes the defect and bulk supercells share the same lattice and are in the same origin frame (as is the case for supercells generated by doped).

For accurate defect site determination, the defect_from_structures function (or underlying code) is preferred. The coordinates returned here are unlikely to exactly match the defect position (especially in the presence of random noise), but should provide a good estimate in most cases. If the defect is an extrinsic interstitial / substitution, then this will identify the exact defect site.

Performance: Creating SOAP descriptors (via dscribe) is usually the bottleneck. You can: (1) set soap_n_jobs > 1 to parallelise over site-centres; (2) tune soap_l_max / soap_n_max / soap_r_cut as needed. Default hyperparameters are a compact real-species dscribe SOAP (n_max=6, l_max=4).

Parameters:

defect_supercell (Structure) – Defect supercell structure.
bulk_supercell (Structure | None) – Optional bulk (pristine) reference supercell. When provided, site cosine dissimilarities are computed relative to the nearest matching bulk-supercell site (rather than the per-species mean in the defect supercell). Assumes defect_supercell and bulk_supercell share the same lattice/origin alignment. Default is None.
soap_n_jobs (int) – n_jobs passed to dscribe’s create() (parallelise over site centres). Default is 1 (no parallelisation).
soap_r_cut (float) – SOAP cut-off radius in Å (for dscribe), default 5.0.
soap_n_max (int) – SOAP radial basis size (for dscribe), default 6.
soap_l_max (int) – SOAP maximum angular momentum (for dscribe), default 4.

Returns:

Guessed position of the defect in Cartesian coordinates.

Return type:

np.ndarray
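
The weighting step described above (centre of mass of the squared cosine dissimilarities) can be sketched in a few lines. This is a simplified illustration rather than doped's implementation: short toy vectors stand in for the dscribe SOAP descriptors, periodic boundary conditions are ignored, and the function names and inputs are hypothetical.

```python
import math


def cosine_dissimilarity(u, v):
    """1 - cos(angle between u and v); 0 for parallel vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def guess_position(cart_coords, descriptors, references):
    """Centroid of site positions, weighted by squared cosine dissimilarity.

    references[i] is the vector that site i is compared against (e.g. a
    per-species mean, or the descriptor of the nearest bulk site).
    NOTE: periodic boundaries are ignored in this sketch.
    """
    weights = [
        cosine_dissimilarity(d, r) ** 2 for d, r in zip(descriptors, references)
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("All sites match their references; no defect signal")
    return tuple(
        sum(w * c[k] for w, c in zip(weights, cart_coords)) / total
        for k in range(3)
    )


# Toy data: only the third site differs from its reference environment
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
descriptors = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
references = [(1.0, 0.0)] * 3
guessed = guess_position(coords, descriptors, references)
```

Squaring the dissimilarities suppresses the many small contributions from bulk-like sites, so the estimate is dominated by the few sites whose local environment changes most.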

doped.analysis.parse_symmetry_and_degeneracy_metadata(defect_entry: DefectEntry, **kwargs)[source]

Determine the unrelaxed (‘bulk’) and relaxed defect point symmetries for the input DefectEntry, whether there is any periodicity-breaking in the supercell, and the corresponding orientational degeneracy factor.

If the supercell is detected to break the crystal periodicity, and attempt_periodicity_restoration is True (default), then an attempt is made to restore periodicity by stenciling the relaxed defect geometry into a supercell which retains periodicity, and the point symmetry is then determined for that supercell.

Results are stored in the calculation_metadata and degeneracy_factors property dicts of the DefectEntry.

Parameters:

defect_entry (DefectEntry) – The DefectEntry object to parse the symmetry and degeneracy metadata for. Parsed results are stored in the calculation_metadata and degeneracy_factors property dicts of the DefectEntry.
**kwargs – Additional keyword arguments to pass to the point_symmetry_from_defect_entry function, such as symprec, dist_tol_factor, fixed_symprec_and_dist_tol_factor, verbose and bulk_symprec. Also includes attempt_periodicity_restoration, which if True (default), will attempt to restore periodicity for periodicity-breaking defect supercells (mostly an edge case) by attempting to stencil the relaxed defect geometry into a supercell which retains periodicity, and then getting the point symmetry for that.
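
For orientation, the orientational degeneracy factor is commonly taken as the ratio of the number of symmetry operations in the unrelaxed ('bulk') site point group to that in the relaxed defect point group. The sketch below illustrates that ratio only. The lookup table and function name are hypothetical, only a handful of point groups are included, and it does not reproduce how doped determines the point groups themselves.

```python
# Number of symmetry operations (group order) for a few common point groups
POINT_GROUP_ORDER = {
    "C1": 1,
    "Ci": 2,
    "Cs": 2,
    "C2": 2,
    "C2v": 4,
    "C3v": 6,
    "D2d": 8,
    "Td": 24,
    "Oh": 48,
}


def orientational_degeneracy(bulk_site_symm, relaxed_symm):
    """Ratio of bulk-site to relaxed-defect point group orders."""
    return POINT_GROUP_ORDER[bulk_site_symm] / POINT_GROUP_ORDER[relaxed_symm]


# e.g. a defect on a Td site that relaxes to C3v has 4 equivalent orientations
g_td_to_c3v = orientational_degeneracy("Td", "C3v")
```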

doped.analysis.shallow_dopant_binding_energy(eff_mass: float, dielectric: float | ndarray | list)[source]

Estimate the binding energy of a shallow dopant/defect in a semiconductor, using effective mass theory.

See the discussion in the Perturbed Host States (Shallow Defects) tips section.

For delocalised, shallow states (a.k.a. perturbed host states), the hydrogenic effective mass model typically gives quite a good estimate of the binding energy, at least for dispersive 3D semiconductors.

Note that this formula can also be used to estimate the binding energy of a delocalised (Wannier-Mott) exciton, in which case the reduced effective mass of the electron-hole pair should be used, as:

\[\mu_{\mathrm{reduced}} = \frac{m_e m_h}{m_e + m_h}\]

Parameters:

eff_mass (float) – Effective mass of the dopant.
dielectric (float or int or 3x1 matrix or 3x3 matrix) – Total (static) dielectric constant (ionic + electronic contributions) of the semiconductor host.

Returns:

Binding energy of the shallow dopant, in eV.

Return type:

float
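
As a minimal sketch of the hydrogenic effective mass model referred to above, the binding energy scales the hydrogen Rydberg energy (about 13.6 eV) by m*/ε². The code below is an illustration under stated assumptions, not the doped implementation: the effective mass is assumed to be given in units of the free electron mass, the dielectric constant is treated as a single scalar (no tensor handling), and the function names are hypothetical.

```python
RYDBERG_EV = 13.605693  # hydrogen Rydberg energy in eV


def hydrogenic_binding_energy(eff_mass, dielectric):
    """E_b = Ry * m* / eps^2, with m* in units of m_e and a scalar eps (assumed)."""
    return RYDBERG_EV * eff_mass / dielectric**2


def reduced_mass(m_e, m_h):
    """Reduced effective mass of an electron-hole pair (for exciton estimates)."""
    return (m_e * m_h) / (m_e + m_h)


# Hypothetical host with m* = 0.1 m_e and eps = 10
dopant_binding = hydrogenic_binding_energy(0.1, 10.0)

# Exciton estimate for the same host, with hypothetical m_e = 0.1 and m_h = 0.4
exciton_binding = hydrogenic_binding_energy(reduced_mass(0.1, 0.4), 10.0)
```

The inverse-square dependence on the dielectric constant is why strongly screening hosts tend to give binding energies of only a few to tens of meV.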