doped.utils package

Submodules

doped.utils.configurations module

Utility functions for generating and parsing configurational coordinate (CC) diagrams, for potential energy surfaces (PESs), Nudged Elastic Band (NEB), non-radiative recombination calculations etc.

doped.utils.configurations.apply_s2_to_s1_transformation(struct1: Structure, struct2: Structure, supercell_matrix: ndarray, trans_vector: ndarray, mapping: list[int | None], include_ignored_species: bool = True, ignored_species: list[str] | None = None, new_lattice: str | None = None) → Structure

Apply a transformation (e.g. as determined by get_transformation_from_s2_to_s1) from struct2 to struct1, with a given supercell matrix, translation vector, and site mapping.

This will give a fully symmetry-equivalent orientation (i.e. will not change the actual geometry) of struct2, except if struct1 and struct2 have different inequivalent lattices (e.g. different space groups) and new_lattice is explicitly set to "struct1". This function uses an accelerated version of the pymatgen StructureMatcher.get_s2_like_s1() method, extended to ensure correct atomic index matching and lattice vector definitions, and to allow for cases where mapping does not include all sites in struct2 (e.g. when using a subset of sites to do the matching and determine the transformation matrix and translation vector, as in the stenciling workflow, without needing the ordering of sites in the Structure objects to match).

Templated from the pymatgen StructureMatcher class, to allow direct usage without repeating the expensive get_transformation call (e.g. when applying the same transformation to the bulk and defect supercells in defect stenciling).

Parameters:

struct1 (Structure) – Reference structure.
struct2 (Structure) – Structure to transform to be as similar as possible to struct1.
supercell_matrix (np.ndarray) – Supercell matrix for the transformation.
trans_vector (np.ndarray) – Fractional translation vector for the transformation.
mapping (list[int | None]) – Mapping of the sites in struct2 to the sites in struct1. The first len(struct1) items of the mapping vector are the indices of struct1’s corresponding sites in struct2 (or None if there is no corresponding site), and the other items are the remaining site indices of struct2.
include_ignored_species (bool) – Whether to include ignored species / sites not in mapping in the output structure. Default: True
ignored_species (list[str] | None) – List of species to ignore in struct1 (for mapping), should match that used for get_transformation_from_s2_to_s1 (if used to generate the transformation mapping). Default: None
new_lattice (str | None) – If "struct1", then the lattice of struct1 is used for the re-oriented structure, if "struct2", then the lattice of struct2 is used, or if "s2_like_s1", then the output lattice of StructureMatcher.get_s2_like_s1 (a symmetry-equivalent version of struct2.lattice) is used. Default is None, where new_lattice is set to "struct1" if struct1 and struct2 have equivalent lattices (expected to be the case for defect NEBs/CC diagrams) and using the struct1 lattice returns a fully symmetry-equivalent structure, or "s2_like_s1" otherwise. If new_lattice is explicitly set to "struct1" and this causes an inequivalent structure to be returned, a warning will be raised.

Returns:

struct2 transformed to struct1 as closely as possible.

Return type:

Structure

doped.utils.configurations.get_dQ(struct1: Structure, struct2: Structure, ignored_species: list[str] | None = None, reorient: bool = False, **sm_kwargs) → float

Get the mass-weighted displacement (ΔQ in amu^(1/2)Å) between two structures, assuming matched atomic indices (unless reorient is set to True).

Parameters:

struct1 (Structure) – Initial structure.
struct2 (Structure) – Final structure.
ignored_species (list[str] | None) – List of species to ignore when computing ΔQ (and re-orienting, if relevant). Default: None
reorient (bool) – If True, first re-orient struct2 to match struct1 using orient_s2_like_s1() (with ignored_species forwarded), then compute ΔQ. Useful when input structures are symmetry-equivalent but have mismatched orientations / site indices. Default: False
**sm_kwargs – Additional keyword arguments to forward to orient_s2_like_s1() (and hence StructureMatcher / StructureMatcher_scan_stol()), if/when re-orientation is performed.

Returns:

The mass-weighted displacement (ΔQ in amu^(1/2)Å) between the two structures. Returns np.inf if the structures are not matching.

Return type:

float
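As a minimal sketch of the quantity described above, the mass-weighted displacement can be written as ΔQ = sqrt(Σ_i m_i |Δr_i|²) over index-matched sites. The helper name mass_weighted_displacement is hypothetical and is not part of doped; it assumes Cartesian coordinates in Å, masses in amu, already-matched atomic indices, and no periodic-boundary wrapping, all of which the real function may treat differently.

```python
import math


def mass_weighted_displacement(masses, coords1, coords2):
    """Return sqrt(sum_i m_i * |r2_i - r1_i|^2) for index-matched sites.

    Hypothetical helper (not the doped API). Assumes Cartesian coordinates
    in Angstrom, masses in amu, and no periodic-boundary wrapping.
    """
    if not (len(masses) == len(coords1) == len(coords2)):
        raise ValueError("masses and coordinate lists must have equal length")
    total = 0.0
    for mass, r1, r2 in zip(masses, coords1, coords2):
        # Squared Cartesian displacement of this site, weighted by its mass
        total += mass * sum((b - a) ** 2 for a, b in zip(r1, r2))
    return math.sqrt(total)


# One site of mass 4 amu displaced by (3, 4, 0) Angstrom: sqrt(4 * 25) = 10
dq = mass_weighted_displacement([4.0], [[0.0, 0.0, 0.0]], [[3.0, 4.0, 0.0]])
print(dq)  # 10.0
```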

doped.utils.configurations.get_path_structures(struct1: Structure, struct2: Structure, n_images: int | ndarray | list[float] = 7, displacements: ndarray | list[float] | None = None, displacements2: ndarray | list[float] | None = None, reorient: bool | None = None, verbose: bool = False, **sm_kwargs) → dict[str, Structure] | tuple[dict[str, Structure], dict[str, Structure]]

Generate a series of interpolated structures along the linear path between struct1 and struct2, typically for use in NEB calculations or configuration coordinate (CC) diagrams.

Structures are output as a dictionary with keys corresponding to either the index of the interpolated structure (0-indexed; 00, 01 etc as for VASP NEB calculations) or the fractional displacement along the interpolation path between structures, and values corresponding to the interpolated structure. If displacements is set (and thus two sets of structures are generated), a tuple of such dictionaries is returned.

Note that for NEB calculations, the lattice vectors and order of sites (atomic indices) must be consistent in both struct1 and struct2. This is also desirable for CC diagrams, as the atomic indices are assumed to match for many parsing and plotting functions (e.g. in nonrad and CarrierCapture.jl), but is not strictly necessary – though is typically required for appropriate structure interpolation. By default (reorient=None), this function uses orient_s2_like_s1() to (attempt to) re-orient struct2 to match the lattice vectors and site ordering of struct1 as closely as possible, and warns if this re-orientation was actually required (i.e. if the input structures did not already correspond to the shortest linear interpolation path between them). In the case of NEB for defect migration between symmetry-equivalent sites, this is not desired, and so re-orientation will be skipped if it results in a near-zero final mass-weighted displacement (ΔQ) between the structures (<0.1 amu^(1/2)Å), reorient is None (default), and displacements is None (i.e. assuming an NEB / PES calculation). Otherwise set reorient explicitly to True/False to control this behaviour. See the doped configuration coordinate / NEB path generation tutorial for further discussion.

If only n_images is set (and displacements is None, the default), then only one set of interpolated structures is generated (in other words, assuming a standard NEB/PES calculation is being performed). If displacements (and possibly displacements2) is set, then two sets of interpolated structures are generated (in other words, assuming a CC / non-radiative recombination calculation is being performed, where the two sets of structures are to be calculated in separate charge/spin etc states).
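The re-orientation rules described above can be sketched as a small decision function. This is only an illustration of the documented behaviour, not the doped implementation: the function name choose_struct2, its arguments and the warning strings are all hypothetical, and the ΔQ values are assumed to have been computed beforehand.

```python
def choose_struct2(reorient, dq_input, dq_reoriented, neb_mode, tol=0.1):
    """Return (use_reoriented, warning) for a given ``reorient`` setting.

    Hypothetical sketch. ``dq_input`` / ``dq_reoriented`` are the ΔQ values
    (amu^(1/2) Angstrom) before and after re-orienting struct2, and
    ``neb_mode`` is True when displacements is None.
    """
    if reorient is True:
        return True, None  # always re-orient, no warnings
    if reorient is False:
        return False, None  # use struct2 as provided, no warnings
    # reorient is None (default)
    if neb_mode and dq_reoriented < tol:
        # Near-zero ΔQ: assumed NEB between symmetry-equivalent sites
        return False, "re-orientation skipped (symmetry-equivalent endpoints?)"
    if dq_reoriented < dq_input:
        return True, "struct2 was re-oriented to shorten the interpolation path"
    return True, None  # re-orientation changed nothing, so no warning


print(choose_struct2(None, dq_input=5.0, dq_reoriented=0.01, neb_mode=True))
print(choose_struct2(None, dq_input=5.0, dq_reoriented=2.0, neb_mode=False))
```

Setting reorient explicitly to True or False bypasses both warnings, which is why the first two branches return early.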

Parameters:

struct1 (Structure) – Initial structure.
struct2 (Structure) – Final structure.
n_images (int | np.ndarray | list[float]) – Number of images to interpolate between struct1 and struct2, or a list of fractional interpolation values (displacements) to use. Note that n_images is ignored if displacements is set, in which case CC / non-radiative recombination calculations are assumed and two sets of interpolated structures are generated; otherwise a standard NEB / PES calculation is assumed and one set of structures is generated. Default: 7
displacements (np.ndarray or list) – Displacements to use for struct1 along the linear transformation path to struct2. If set, then CC / non-radiative recombination calculations are assumed, and two sets of interpolated structures will be generated. If set and displacements2 is not set, then the same set of displacements is used for both sets of interpolated structures. Default: None
displacements2 (np.ndarray or list) – Displacements to use for struct2 along the linear transformation path to struct1. If not set and displacements is not None, then the same set of displacements is used for both sets of interpolated structures. Default: None
reorient (bool | None) –
Controls whether to automatically re-orient struct2 to match struct1 (using orient_s2_like_s1()) before generating the interpolated path structures, which ensures matched lattice vectors / atomic indices and the shortest linear interpolation path between the endpoints. One of:
- True: always re-orient, no warnings.
- False: never re-orient (use struct2 as provided), no warnings.
- None (default): re-orient struct2 and warn if re-orientation was actually necessary (i.e. if the mass-weighted displacement ΔQ decreased as a result). In NEB mode (displacements=None), if re-orientation reduces ΔQ below 0.1 amu^(1/2)Å this is assumed to be an NEB between symmetry-equivalent sites (where re-orientation is not desired); thus re-orientation is skipped and a different warning is raised.
verbose (bool) – If True and re-orientation is performed, orient_s2_like_s1() prints information about the mass-weighted displacement (ΔQ in amu^(1/2)Å) between struct1 and struct2 (pre and post re-orientation). Default: False
**sm_kwargs – Additional keyword arguments to forward to orient_s2_like_s1() (and hence StructureMatcher / StructureMatcher_scan_stol()), when re-orientation is performed.

Returns:

Dictionary of structures (for NEB/PES calculations), or tuple of two dictionaries of structures (for CC / non-radiative calculations, when displacements is not None).

Return type:

dict[str, Structure] | tuple[dict[str, Structure], dict[str, Structure]]
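For illustration, linear interpolation at a set of fractional displacements can be sketched with plain coordinate lists. The function interpolate_path and its "delQ_<value>" key format are hypothetical choices for this example and may not match the keys returned by get_path_structures; the sketch also assumes index-matched Cartesian coordinates and ignores periodic boundaries.

```python
def interpolate_path(coords1, coords2, displacements):
    """Linearly interpolate index-matched coordinates.

    Hypothetical helper (not the doped API). A displacement of 0 gives
    coords1 and 1 gives coords2; values outside [0, 1] extrapolate along
    the same line, as is common for CC diagrams.
    """
    path = {}
    for d in displacements:
        path[f"delQ_{d}"] = [
            [a + d * (b - a) for a, b in zip(r1, r2)]
            for r1, r2 in zip(coords1, coords2)
        ]
    return path


path = interpolate_path([[0.0, 0.0, 0.0]], [[1.0, 2.0, 3.0]], [0.0, 0.5, 1.0])
print(path["delQ_0.5"])  # [[0.5, 1.0, 1.5]]
```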

doped.utils.configurations.get_s2_like_s1(struct1: Structure, struct2: Structure, new_lattice: str | None = None, verbose: bool = False, check_mapping: bool = True, **sm_kwargs) → Structure

Re-orient struct2 to match the orientation of struct1 as closely as possible, with matching atomic indices as needed for VASP NEB calculations and other structural transformation analyses (e.g. configuration coordinate (CC) diagrams via nonrad, CarrierCapture.jl etc.).

This will give a fully symmetry-equivalent orientation (i.e. will not change the actual geometry) of struct2, except if struct1 and struct2 have different inequivalent lattices (e.g. different space groups) and new_lattice is explicitly set to "struct1".

This corresponds to minimising the root-mean-square displacement for the shortest linear path from struct1 to a symmetry-equivalent definition of struct2, with matched atomic indices and lattices as required by VASP NEB and nonrad functions. This function uses an accelerated version of the pymatgen StructureMatcher.get_s2_like_s1() method, extended to ensure correct atomic index matching and lattice vector definitions.

If verbose=True, information about the mass-weighted displacement (ΔQ in amu^(1/2)Å) between the input and re-oriented structures is printed. This is the typical x-axis unit in configurational coordinate diagrams (see e.g. 10.1103/PhysRevB.90.075202).

Parameters:

struct1 (Structure) – Initial structure.
struct2 (Structure) – Final structure.
new_lattice (str | None) – If "struct1", then the lattice of struct1 is used for the re-oriented structure, if "struct2", then the lattice of struct2 is used, or if "s2_like_s1", then the output lattice of StructureMatcher.get_s2_like_s1 (a symmetry-equivalent version of struct2.lattice) is used. Default is None, where new_lattice is set to "struct1" if struct1 and struct2 have equivalent lattices (expected to be the case for defect NEBs/CC diagrams) and using the struct1 lattice returns a fully symmetry-equivalent structure, or "s2_like_s1" otherwise. If new_lattice is explicitly set to "struct1" and this causes an inequivalent structure to be returned, a warning will be raised.
verbose (bool) – Print information about the mass-weighted displacement (ΔQ in amu^(1/2)Å) between the input and re-oriented structures. Default: False
check_mapping (bool) – If True (default), check the atom mapping between struct1 and the re-oriented struct2 (using check_atom_mapping_far_from_defect from doped.utils.parsing), warning if a significant mismatch remains throughout the cell after re-orientation. This typically indicates a mismatch in the lattice definitions (e.g. different tiling of primitive cells within identical supercell lattice vectors) between the two input structures, which cannot be resolved by reorientation alone. Set to False to skip the check.
**sm_kwargs – Additional keyword arguments to pass to StructureMatcher() / StructureMatcher_scan_stol() (e.g. ignored_species, comparator, max_stol, min_stol etc).

Returns:

struct2 re-oriented to match struct1 as closely as possible.

Return type:

Structure

doped.utils.configurations.get_transformation_from_s2_to_s1(struct1: Structure, struct2: Structure, **sm_kwargs) → tuple[ndarray, ndarray, list[int | None]]

Get the supercell transformation, fractional translation vector, and a mapping to transform struct2 to be similar to struct1.

Copied over from the pymatgen StructureMatcher class, to allow usage with the fast StructureMatcher_scan_stol() function from doped, along with caching to reduce redundancy; e.g. when looping over multiple defects for stenciling etc.

Parameters:

struct1 (Structure) – Reference structure
struct2 (Structure) – Structure to transform to be as similar as possible to struct1.
**sm_kwargs – Additional keyword arguments to pass to StructureMatcher() (e.g. ignored_species, comparator etc).

Returns:

(supercell_matrix, trans_vector, mapping) — the supercell transformation, fractional translation, and site mapping produced by StructureMatcher.get_transformation (in that order).

supercell_matrix: shape (3, 3) — supercell matrix for the transformation.
trans_vector: shape (3,) — fractional translation vector for the transformation.
mapping: Mapping of the sites in struct2 to the sites in struct1. The first len(struct1) items of the mapping vector are the indices of struct1’s corresponding sites in struct2 (or None if there is no corresponding site), and the other items are the remaining site indices of struct2.

Return type:

tuple[np.ndarray, np.ndarray, list[int | None]]
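The layout of the mapping list described above can be illustrated with a short sketch that splits it into matched and remaining parts. The helper reorder_struct2_sites and the string site labels are hypothetical and only mirror the documented structure of mapping; real inputs would be pymatgen sites.

```python
def reorder_struct2_sites(sites2, mapping, n_struct1):
    """Split ``mapping`` into matched and remaining struct2 sites.

    Hypothetical helper (not the doped API). The first ``n_struct1``
    entries give, for each struct1 site, the index of its partner in
    struct2 (or None); the rest are the unmatched struct2 site indices.
    """
    matched = mapping[:n_struct1]
    remaining = mapping[n_struct1:]
    ordered = [None if idx is None else sites2[idx] for idx in matched]
    extras = [sites2[idx] for idx in remaining]
    return ordered, extras


# struct1 has 3 sites; struct2 has 3 sites, one of which is unmatched
sites2 = ["O_a", "Zn_b", "H_c"]
ordered, extras = reorder_struct2_sites(sites2, [1, None, 0, 2], n_struct1=3)
print(ordered)  # ['Zn_b', None, 'O_a']
print(extras)   # ['H_c']
```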

doped.utils.configurations.orient_s2_like_s1(struct1: Structure, struct2: Structure, new_lattice: str | None = None, verbose: bool = False, check_mapping: bool = True, **sm_kwargs) → Structure

Re-orient struct2 to match the orientation of struct1 as closely as possible, with matching atomic indices as needed for VASP NEB calculations and other structural transformation analyses (e.g. configuration coordinate (CC) diagrams via nonrad, CarrierCapture.jl etc.).

This will give a fully symmetry-equivalent orientation (i.e. will not change the actual geometry) of struct2, except if struct1 and struct2 have different inequivalent lattices (e.g. different space groups) and new_lattice is explicitly set to "struct1".

This corresponds to minimising the root-mean-square displacement for the shortest linear path from struct1 to a symmetry-equivalent definition of struct2, with matched atomic indices and lattices as required by VASP NEB and nonrad functions. This function uses an accelerated version of the pymatgen StructureMatcher.get_s2_like_s1() method, extended to ensure correct atomic index matching and lattice vector definitions.

If verbose=True, information about the mass-weighted displacement (ΔQ in amu^(1/2)Å) between the input and re-oriented structures is printed. This is the typical x-axis unit in configurational coordinate diagrams (see e.g. 10.1103/PhysRevB.90.075202).

Parameters:

struct1 (Structure) – Initial structure.
struct2 (Structure) – Final structure.
new_lattice (str | None) – If "struct1", then the lattice of struct1 is used for the re-oriented structure, if "struct2", then the lattice of struct2 is used, or if "s2_like_s1", then the output lattice of StructureMatcher.get_s2_like_s1 (a symmetry-equivalent version of struct2.lattice) is used. Default is None, where new_lattice is set to "struct1" if struct1 and struct2 have equivalent lattices (expected to be the case for defect NEBs/CC diagrams) and using the struct1 lattice returns a fully symmetry-equivalent structure, or "s2_like_s1" otherwise. If new_lattice is explicitly set to "struct1" and this causes an inequivalent structure to be returned, a warning will be raised.
verbose (bool) – Print information about the mass-weighted displacement (ΔQ in amu^(1/2)Å) between the input and re-oriented structures. Default: False
check_mapping (bool) – If True (default), check the atom mapping between struct1 and the re-oriented struct2 (using check_atom_mapping_far_from_defect from doped.utils.parsing), warning if a significant mismatch remains throughout the cell after re-orientation. This typically indicates a mismatch in the lattice definitions (e.g. different tiling of primitive cells within identical supercell lattice vectors) between the two input structures, which cannot be resolved by reorientation alone. Set to False to skip the check.
**sm_kwargs – Additional keyword arguments to pass to StructureMatcher() / StructureMatcher_scan_stol() (e.g. ignored_species, comparator, max_stol, min_stol etc).

Returns:

struct2 re-oriented to match struct1 as closely as possible.

Return type:

Structure
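The documented default handling of new_lattice can be summarised in a short sketch. The function resolve_new_lattice and its two boolean inputs are hypothetical stand-ins for checks the real function performs internally, and raising ValueError for an unrecognised option is an assumption rather than documented behaviour.

```python
import warnings


def resolve_new_lattice(new_lattice, lattices_equivalent, struct1_lattice_is_safe):
    """Pick the lattice option following the documented defaults.

    Hypothetical sketch (not the doped API). ``struct1_lattice_is_safe``
    means that using the struct1 lattice still yields a fully
    symmetry-equivalent structure.
    """
    if new_lattice is None:
        if lattices_equivalent and struct1_lattice_is_safe:
            return "struct1"
        return "s2_like_s1"
    if new_lattice not in ("struct1", "struct2", "s2_like_s1"):
        # Assumed error handling; the real behaviour may differ
        raise ValueError(f"Unrecognised new_lattice option: {new_lattice!r}")
    if new_lattice == "struct1" and not struct1_lattice_is_safe:
        warnings.warn("Using the struct1 lattice gives an inequivalent structure")
    return new_lattice


print(resolve_new_lattice(None, True, True))   # struct1
print(resolve_new_lattice(None, False, True))  # s2_like_s1
```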

Generate a series of interpolated structures along the linear path between struct1 and struct2, typically for use in NEB calculations or configuration coordinate (CC) diagrams, and write to folders.

Folder names are labelled by the index of the interpolated structure (0-indexed; 00, 01 etc as for VASP NEB calculations) or the fractional displacement along the interpolation path between structures (e.g. delQ_0.0, delQ_0.1, delQ_-0.1 etc), depending on the input n_images/displacements settings.

Note that for NEB calculations, the lattice vectors and order of sites (atomic indices) must be consistent in both struct1 and struct2. This is also desirable for CC diagrams, as the atomic indices are assumed to match for many parsing and plotting functions (e.g. in nonrad and CarrierCapture.jl), but is not strictly necessary – though is typically required for appropriate structure interpolation. By default (reorient=None), this function uses orient_s2_like_s1() to (attempt to) re-orient struct2 to match the lattice vectors and site ordering of struct1 as closely as possible, and warns if this re-orientation was actually required (i.e. if the input structures did not already correspond to the shortest linear interpolation path between them). In the case of NEB for defect migration between symmetry-equivalent sites, this is not desired, and so re-orientation will be skipped if it results in a near-zero final mass-weighted displacement (ΔQ) between the structures (<0.1 amu^(1/2)Å), reorient is None (default), and displacements is None (i.e. assuming an NEB / PES calculation). Otherwise set reorient explicitly to True/False to control this behaviour. See the doped configuration coordinate / NEB path generation tutorial for further discussion.

If only n_images is set (and displacements is None, the default), then only one set of interpolated structures is written (in other words, assuming a standard NEB/PES calculation is being performed). If displacements (and possibly displacements2) is set, then two sets of interpolated structures are written (in other words, assuming a CC / non-radiative recombination calculation is being performed, where the two sets of structures are to be calculated in separate charge/spin etc states).

Parameters:

struct1 (Structure) – Initial structure.
struct2 (Structure) – Final structure.
output_dir (PathLike) – Directory to write the interpolated structures to. Defaults to “Configuration_Coordinate” if displacements is set, otherwise “NEB”.
n_images (int | np.ndarray | list[float]) – Number of images to interpolate between struct1 and struct2, or a list of fractional interpolation values (displacements) to use. Note that n_images is ignored if displacements is set, in which case CC / non-radiative recombination calculations are assumed and two sets of interpolated structures are generated; otherwise a standard NEB / PES calculation is assumed and one set of structures is generated. Default: 7
displacements (np.ndarray or list) – Displacements to use for struct1 along the linear transformation path to struct2. If set, then CC / non-radiative recombination calculations are assumed, and two sets of interpolated structures will be written to file. If set and displacements2 is not set, then the same set of displacements is used for both sets of interpolated structures. Default: None
displacements2 (np.ndarray or list) – Displacements to use for struct2 along the linear transformation path to struct1. If not set and displacements is not None, then the same set of displacements is used for both sets of interpolated structures. Default: None
reorient (bool | None) –
Controls whether to automatically re-orient struct2 to match struct1 (using orient_s2_like_s1()) before generating the interpolated path structures, which ensures matched lattice vectors / atomic indices and the shortest linear interpolation path between the endpoints. One of:
- True: always re-orient, no warnings.
- False: never re-orient (use struct2 as provided), no warnings.
- None (default): re-orient struct2 and warn if re-orientation was actually necessary (i.e. if the mass-weighted displacement ΔQ decreased as a result). In NEB mode (displacements=None), if re-orientation reduces ΔQ below 0.1 amu^(1/2)Å this is assumed to be an NEB between symmetry-equivalent sites (where re-orientation is not desired); thus re-orientation is skipped and a different warning is raised.
verbose (bool) – If True and re-orientation is performed, orient_s2_like_s1() prints information about the mass-weighted displacement (ΔQ in amu^(1/2)Å) between struct1 and struct2 (pre and post re-orientation). Default: False
**sm_kwargs – Additional keyword arguments to forward to orient_s2_like_s1() (and hence StructureMatcher / StructureMatcher_scan_stol()), when re-orientation is performed.

Returns:

Dictionary of structures (for NEB/PES calculations), or tuple of two dictionaries of structures (for CC / non-radiative calculations, when displacements is not None).

Return type:

dict[str, Structure] | tuple[dict[str, Structure], dict[str, Structure]]

doped.utils.displacements module

Code to analyse site displacements around defects.

doped.utils.displacements.calc_displacements_ellipsoid(defect_entry: DefectEntry, quantile: float = 0.8, relaxed_distances: bool = False, return_extras: bool = False, tolerance: float = 0.0005) → tuple

Calculate displacements around a defect site and fit an ellipsoid to these displacements, returning a tuple of the ellipsoid’s center, radii, rotation matrix and dataframe of anisotropy information.

Parameters:

defect_entry (DefectEntry) – DefectEntry object.
quantile (float) – The quantile threshold for selecting significant displacements (between 0 and 1). Default is 0.8.
relaxed_distances (bool) – Whether to use the atomic positions in the relaxed defect supercell for 'Distance to defect', 'Vector to site from defect' and 'Displacement wrt defect' values (True), or the unrelaxed positions, i.e. the bulk structure positions (False). Defaults to False.
return_extras (bool) – Whether to also return the disp_df (output from calc_site_displacements(defect_entry, relative_to_defect=True)) and the points used to fit the ellipsoid, corresponding to the Cartesian coordinates of the sites with displacements above the threshold, where the structure has been shifted to place the defect at the cell midpoint ([0.5, 0.5, 0.5]) in fractional coordinates. Default is False.
tolerance (float) – Tolerance for the minimum volume ellipsoid fitting algorithm. Default is 5e-4. Smaller is more precise, but slower.

Returns:

(ellipsoid_center, ellipsoid_radii, ellipsoid_rotation, anisotropy_df): A tuple containing the ellipsoid’s center, radii, rotation matrix, and a dataframe of anisotropy information, or (None, None, None, None) if fitting was unsuccessful.
(disp_df and points): If return_extras=True, also returns disp_df and the points used to fit the ellipsoid, appended to the return tuple.

Return type:

tuple
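The role of the quantile parameter can be illustrated with a small sketch that keeps only the sites whose displacement exceeds the chosen quantile. Both helpers are hypothetical: the linear-interpolation quantile and the strict "greater than" comparison are assumptions, and the actual selection in doped may differ in detail.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of ``values`` for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def significant_sites(displacement_norms, q=0.8):
    """Return indices of sites displaced by more than the q-quantile.

    Hypothetical helper (not the doped API).
    """
    threshold = quantile(displacement_norms, q)
    return [i for i, d in enumerate(displacement_norms) if d > threshold]


norms = [0.0, 0.1, 0.2, 0.3, 1.0]  # displacement magnitudes in Angstrom
selected = significant_sites(norms)
print(selected)  # [4]
```

A higher quantile keeps fewer, more strongly displaced sites for the ellipsoid fit.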

doped.utils.displacements.calc_site_displacements(defect_entry: DefectEntry, relative_to_defect: bool = True, relaxed_distances: bool = False, vector_to_project_on: list | None = None, threshold: float = 2.0) → DataFrame

Calculates the site displacements in the defect supercell, relative to the bulk supercell, and returns a DataFrame of site displacement info.

The signed displacements are stored in the calculation_metadata of the DefectEntry object under the "site_displacements" key.

Parameters:

defect_entry (DefectEntry) – DefectEntry object.
relative_to_defect (bool) – Whether to calculate the signed displacements along the line from the (relaxed) defect site to that atom. Negative values indicate the atom moves towards the defect (compressive strain), positive values indicate the atom moves away from the defect (tensile strain). The relative displacements are stored in the 'Displacement wrt defect' column of the returned DataFrame. Defaults to True.
relaxed_distances (bool) – Whether to use the atomic positions in the relaxed defect supercell for 'Distance to defect', 'Vector to site from defect' and 'Displacement wrt defect' values (True), or the unrelaxed positions, i.e. the bulk structure positions (False). Defaults to False.
vector_to_project_on (list) – Direction to project the site displacements along (e.g. [0, 0, 1]). If given, also calculates (absolute) displacements perpendicular to the projection vector. Defaults to None (displacements are given as vectors in Cartesian space).
threshold (float) – If the distance between a pair of matched sites is larger than this, then a warning will be thrown. Default is 2.0 Å.

Returns:

pandas DataFrame with site displacements (compared to pristine supercell), and other displacement-related information.
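The two kinds of projected displacement described above can be sketched with plain vectors. These helpers are hypothetical (not the doped API) and assume Cartesian vectors with no periodic-boundary handling: the first projects a displacement onto the defect-to-site direction, and the second splits a displacement into components parallel and perpendicular to a chosen vector.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def signed_displacement(defect_pos, site_pos, displacement):
    """Project ``displacement`` onto the defect -> site direction.

    Negative values mean the atom moved towards the defect.
    """
    direction = [s - d for s, d in zip(site_pos, defect_pos)]
    norm = math.sqrt(_dot(direction, direction))
    if norm == 0:
        raise ValueError("site coincides with the defect position")
    return _dot(displacement, direction) / norm


def parallel_and_perpendicular(displacement, vector):
    """Return (signed parallel, absolute perpendicular) components."""
    parallel = _dot(displacement, vector) / math.sqrt(_dot(vector, vector))
    perp_sq = _dot(displacement, displacement) - parallel**2
    return parallel, math.sqrt(max(perp_sq, 0.0))


towards = signed_displacement([0, 0, 0], [2, 0, 0], [-0.1, 0, 0])
print(towards < 0)  # True: the atom moved towards the defect
print(parallel_and_perpendicular([3, 4, 0], [0, 0, 1]))  # (0.0, 5.0)
```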

doped.utils.displacements.plot_displacements_ellipsoid(defect_entry: DefectEntry, plot_ellipsoid: bool = True, plot_anisotropy: bool = False, quantile: float = 0.8, use_plotly: bool = False, show_supercell: bool = True, style_file: str | Path | None = None) → tuple

Plot the displacement ellipsoid and/or anisotropy around a relaxed defect.

Set use_plotly = True to get an interactive plotly plot, useful for analysis!

The supercell edges are also plotted if show_supercell = True (default).

Parameters:

defect_entry (DefectEntry) – DefectEntry object.
plot_ellipsoid (bool) – If True, plot the fitted ellipsoid in the crystal lattice.
plot_anisotropy (bool) – If True, plot the anisotropy of the ellipsoid radii.
quantile (float) – The quantile threshold for selecting significant displacements (between 0 and 1). Default is 0.8.
use_plotly (bool) – Whether to use plotly for plotting. Default is False. Set to True to get an interactive plot.
show_supercell (bool) – Whether to show the supercell edges in the plot. Default is True.
style_file (PathLike) – Path to a matplotlib style file. If not set, uses the doped default displacements style.

Returns:

Either a single plotly or matplotlib Figure, if only one of plot_ellipsoid or plot_anisotropy is True, or a tuple of plots if both are True.

doped.utils.displacements.plot_site_displacements(defect_entry: DefectEntry, relative_to_defect: bool = True, separated_by_direction: bool = False, relaxed_distances: bool = False, vector_to_project_on: list | None = None, use_plotly: bool = False, ax: Axes | Sequence[Axes] | None = None, fig: Figure | None = None, style_file: str | Path | None = None)

Plots site displacements around a defect.

Set use_plotly = True to get an interactive plotly plot, useful for analysis!

The plot mode depends on the combination of options:

relative_to_defect=True (default): Single-panel signed displacement along the defect -> atom direction (negative = towards defect).
relative_to_defect=False: Single-panel absolute displacement vs. distance to defect.
vector_to_project_on=[x,y,z]: 2-panel plot showing displacement parallel and (absolute displacement) perpendicular to the given vector.
separated_by_direction=True: 3-panel plot showing the x, y, z displacement components separately.

separated_by_direction and vector_to_project_on are mutually exclusive, and if either is set then relative_to_defect is set to False.

Parameters:

defect_entry (DefectEntry) – DefectEntry object.
relative_to_defect (bool) – Whether to plot the signed displacements along the line from the (relaxed) defect site to that atom. Negative values indicate the atom moves towards the defect (compressive strain), positive values indicate the atom moves away from the defect (tensile strain). Default is True.
separated_by_direction (bool) – Whether to plot site displacements separated into x, y, z components (3-panel figure). Default is False.
relaxed_distances (bool) – Whether to use the atomic positions in the relaxed defect supercell for 'Distance to defect', 'Vector to site from defect' and 'Displacement wrt defect' values (True), or the unrelaxed positions, i.e. the bulk structure positions (False). Defaults to False.
vector_to_project_on (list) – Direction to project the site displacements along (e.g. [0, 0, 1]). Produces a 2-panel figure showing displacement parallel and (absolute displacement) perpendicular to the given vector. Defaults to None (i.e. don’t project displacements).
use_plotly (bool) – Whether to use plotly for plotting. Default is False (i.e. use matplotlib for plotting). Set to True to get an interactive plot.
ax (matplotlib.axes.Axes or sequence of matplotlib.axes.Axes) – Optional matplotlib Axes to plot on. If None, a new figure and axes are created. For single-panel modes (default), provide a single Axes. For multi-panel modes, provide a matching sequence of Axes: 2 axes for vector_to_project_on, 3 axes for separated_by_direction. A ValueError is raised if the wrong number of axes is supplied. Only used with use_plotly=False. Default is None.
fig (plotly.graph_objects.Figure) – Optional plotly Figure to add traces to. If None, a new figure is created (including the required subplot layout and titles for multi-panel modes). When supplying an existing figure for multi-panel modes, it must already have the correct number of subplots configured (2 for vector_to_project_on, 3 for separated_by_direction). Only used with use_plotly=True. Default is None.
style_file (PathLike) – Path to a matplotlib style file. If not set, uses the doped default displacement plotting style.

Returns:

plotly or matplotlib Figure.
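The option combinations listed above map to a panel count, which also determines how many Axes or subplots must be supplied. The sketch below encodes that mapping; select_plot_mode is a hypothetical helper, and raising ValueError when both mutually exclusive options are set is an assumption about the error handling.

```python
def select_plot_mode(
    relative_to_defect=True, separated_by_direction=False, vector_to_project_on=None
):
    """Return the panel count and effective relative_to_defect setting.

    Hypothetical sketch (not the doped API) of the documented plot modes.
    """
    if separated_by_direction and vector_to_project_on is not None:
        # Assumed behaviour: the two options are documented as mutually exclusive
        raise ValueError(
            "separated_by_direction and vector_to_project_on are mutually exclusive"
        )
    if separated_by_direction:
        return {"panels": 3, "relative_to_defect": False}  # x, y, z components
    if vector_to_project_on is not None:
        return {"panels": 2, "relative_to_defect": False}  # parallel, perpendicular
    return {"panels": 1, "relative_to_defect": relative_to_defect}


print(select_plot_mode())                                # single signed panel
print(select_plot_mode(vector_to_project_on=[0, 0, 1]))  # two panels
print(select_plot_mode(separated_by_direction=True))     # three panels
```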

doped.utils.efficiency module

Utility functions to improve the efficiency of common functions/workflows/calculations in doped.

class doped.utils.efficiency.DopedTopographyAnalyzer(structure: Structure, image_tol: float = 0.0001, max_cell_range: int = 1, constrained_c_frac: float = 0.5, thickness: float = 0.5)

Bases: object

This is a modified version of TopographyAnalyzer, with slimmed-down input options and far more efficient initialisation (~2 orders of magnitude faster).

The original code was written by Danny Broberg and colleagues (10.1016/j.cpc.2018.01.004), which was then added to pymatgen before being cut.

Parameters:

structure (Structure) – Structure to analyse.
image_tol (float) – A tolerance distance for the analysis, used to determine if sites are periodic images of each other. Default (of 1e-4) is usually fine.
max_cell_range (int) – This is the range of periodic images to construct the Voronoi tessellation. A value of 1 means that we include all points from (x +- 1, y +- 1, z +- 1) in the Voronoi construction. This is because the Voronoi polyhedra extend beyond the standard unit cell because of PBC. Typically, the default value of 1 works fine for most structures and is fast. But for very small unit cells with high symmetry, this may need to be increased to 2 or higher. If there are < 5 atoms in the input structure and max_cell_range is 1, this will automatically be increased to 2.
constrained_c_frac (float) – Constrain the region where topology analysis is performed. Only sites with z fractional coordinates between constrained_c_frac +/- thickness are considered. Default of 0.5 (with thickness of 0.5) includes all sites in the unit cell.
thickness (float) – Constrain the region where topology analysis is performed. Only sites with z fractional coordinates between constrained_c_frac +/- thickness are considered. Default of 0.5 (with thickness of 0.5) includes all sites in the unit cell.
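
As a rough illustration of what max_cell_range controls, the sketch below enumerates the integer cell shifts implied by the (x ± n, y ± n, z ± n) convention described above. The helper name image_shifts is hypothetical and this is not the doped implementation; it only shows how quickly the number of periodic images grows with the range.

```python
from itertools import product


def image_shifts(max_cell_range: int = 1) -> list[tuple[int, int, int]]:
    """Return all integer cell shifts within +/- max_cell_range along each axis.

    Hypothetical helper for illustration only (not part of doped).
    """
    offsets = range(-max_cell_range, max_cell_range + 1)
    return list(product(offsets, repeat=3))


print(len(image_shifts(1)))  # 27 cells: the home cell plus its 26 neighbours
print(len(image_shifts(2)))  # 125 cells
```

The cubic growth in the number of images is one reason the default of 1 is preferred unless the cell is very small.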

class doped.utils.efficiency.DopedVacancyGenerator(symprec: float = 0.01, angle_tolerance: float = 5)[source]

Bases: VacancyGenerator

Vacancy defects generator, subclassed from pymatgen-analysis-defects to improve efficiency (particularly when handling defect complexes).

Initialize the vacancy generator.

generate(structure: Structure, rm_species: set[str | Species] | list[str | Species] | None = None, **kwargs) → Generator[Vacancy, None, None][source]

Generate vacancy defects.

Parameters:

structure (Structure) – The structure to generate vacancy defects in.
rm_species (set[str | Species] | list[str | Species] | None) – List/set of species to be removed (i.e. to consider for vacancy generation). If None, considers all species.
**kwargs – Additional keyword arguments for the Vacancy constructor.

Returns:

Generator that yields Vacancy objects.

Return type:

Generator[Vacancy, None, None]

class doped.utils.efficiency.Hashabledict[source]: Bases: dict

doped.utils.efficiency.StructureMatcher_scan_stol(struct1: Structure, struct2: Structure, func_name: str = 'get_s2_like_s1', min_stol: float | None = None, max_stol: float = 0.3, stol_factor: float = 0.5, **sm_kwargs)[source]

Utility function to scan through a range of stol values for StructureMatcher until a match is found between struct1 and struct2 (i.e. StructureMatcher.{func_name} returns a result).

The speed of the StructureMatcher.match() function (used in most StructureMatcher methods) depends heavily on stol, with smaller values being faster. Evaluation can therefore be sped up by starting with a small value and increasing it until a match is found (especially with the doped efficiency tools, which implement caching and other improvements to avoid redundant work here).

Note that ElementComparator() is used by default here! (So sites with different species but the same element (e.g. “S2-” & “S0+”) will be considered match-able). This can be controlled with sm_kwargs['comparator'].

Note: If you know reduction to primitive cells is not possible/needed, then setting primitive_cell=False in sm_kwargs can significantly speed up matching here (by avoiding expensive reduction to primitive cells for large structures).

Parameters:

struct1 (Structure) – struct1 for StructureMatcher.match().
struct2 (Structure) – struct2 for StructureMatcher.match().
func_name (str) –
The name of the StructureMatcher method to return the result of StructureMatcher.{func_name}(struct1, struct2) for, such as:
- “get_s2_like_s1” (default)
- “get_rms_dist”
- “fit”
- “fit_anonymous”
- “get_rms_anonymous”
min_stol (float) – Minimum stol value to try. Default is to use doped's get_min_stol_for_s1_s2() function to estimate the minimum stol necessary, and start with 2x this value to achieve fast structure-matching in most cases.
max_stol (float) – Maximum stol value to try. Default: 0.3 (matching StructureMatcher default).
stol_factor (float) – Fractional increment to increase stol by each time (when a match is not found). Default value of 0.5 increases stol by 50% each time.
**sm_kwargs – Additional keyword arguments to pass to StructureMatcher().

Returns:

Result of StructureMatcher.{func_name}(struct1, struct2) or None if no match is found.
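
The scanning strategy described above can be sketched in plain Python as follows. This is a minimal illustration, not the doped implementation: scan_stol and try_match are hypothetical names, try_match stands in for a call such as StructureMatcher(stol=stol).get_s2_like_s1(struct1, struct2), and the final attempt at max_stol is an assumption about how the upper bound is handled.

```python
from typing import Callable, Optional


def scan_stol(
    try_match: Callable[[float], Optional[object]],
    min_stol: float,
    max_stol: float = 0.3,
    stol_factor: float = 0.5,
) -> Optional[object]:
    """Try increasing stol values until try_match returns a non-None result."""
    stol = min_stol
    while stol < max_stol:
        result = try_match(stol)
        if result is not None:
            return result
        stol *= 1 + stol_factor  # stol_factor=0.5 increases stol by 50% each time
    # Assumption: make one last attempt at the maximum tolerance.
    return try_match(max_stol)


# Toy matcher (hypothetical): only "matches" once stol reaches 0.1.
tried = []


def toy_matcher(stol: float) -> Optional[str]:
    tried.append(stol)
    return "match" if stol >= 0.1 else None


print(scan_stol(toy_matcher, min_stol=0.02))
print(tried)  # the stol values attempted, in increasing order
```

Because each attempt with a small stol is cheap, the cost of a few failed attempts is usually small compared with a single match at the largest tolerance.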

doped.utils.efficiency.array_to_tuple(array: ArrayLike | tuple) → tuple[source]: Convert an array-like input to tuple.

doped.utils.efficiency.cache_ready_PeriodicSite__eq__(self, other)[source]: Custom __eq__ method for PeriodicSite instances, using a cached equality function to speed up comparisons.

doped.utils.efficiency.cache_species(structure_cls)[source]: Context manager that makes Structure.species a cached property, which significantly speeds up pydefect eigenvalue parsing in large structures (due to repeated use of Structure.indices_from_symbol).

doped.utils.efficiency.cached_Structure_eq_func(self_hash, other_hash)[source]: Cached equality function for Structure instances.

doped.utils.efficiency.cached_allclose(a: tuple, b: tuple, rtol: float = 1e-05, atol: float = 1e-08)[source]: Cached version of np.allclose, taking tuples as inputs (so that they are hashable and thus cacheable).
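
To illustrate why tuple inputs matter here, the sketch below memoises an allclose-style comparison with functools.lru_cache, which requires hashable arguments. The name cached_allclose_sketch is hypothetical and the body is a pure-Python stand-in for flat tuples of floats, not the doped code; unlike np.allclose it does no broadcasting and simply returns False for tuples of different length.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_allclose_sketch(
    a: tuple, b: tuple, rtol: float = 1e-05, atol: float = 1e-08
) -> bool:
    """Element-wise closeness test on two flat tuples, with cached results."""
    if len(a) != len(b):  # simplification: no broadcasting
        return False
    # Same asymmetric criterion as np.allclose: |x - y| <= atol + rtol * |y|
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


coords_a = (0.25, 0.25, 0.25)
coords_b = (0.25, 0.25, 0.25 + 1e-9)
print(cached_allclose_sketch(coords_a, coords_b))  # True
print(cached_allclose_sketch(coords_a, coords_b))  # True (served from the cache)
print(cached_allclose_sketch.cache_info().hits)
```

Lists or NumPy arrays would raise a TypeError when passed to the cached function, which is why a conversion step such as array_to_tuple is needed first.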

doped.utils.efficiency.doped_Composition_eq_func(self_hash, other_hash)[source]: Update equality function for Composition instances, which breaks early for mismatches and also uses caching, making it orders of magnitude faster than pymatgen's equality function.

doped.utils.efficiency.doped_Structure__eq__(self, other: IStructure) → bool[source]: Copied from pymatgen, but updated to break early once a mis-matching site is found, to speed up structure matching by ~2x.

doped.utils.efficiency.fast_Composition_eq(self, other)[source]: Fast equality function for Composition instances, breaking early for mismatches.

doped.utils.efficiency.get_all_distances(self, frac_coords1: ArrayLike, frac_coords2: ArrayLike) → ndarray[tuple[Any, ...], dtype[float64]][source]

Get the distances between two lists of coordinates taking into account periodic boundary conditions and the lattice.

See get_all_distances().

doped.utils.efficiency.get_dist_equiv_stol(dist: float, structure: Structure) → float[source]

Get the equivalent stol value for a given Cartesian distance (dist) in a given Structure.

stol is a site tolerance parameter used in pymatgen StructureMatcher functions, defined as the fraction of the average free length per atom := ( V / Nsites ) ** (1/3).

Parameters:

dist (float) – Cartesian distance in Å.
structure (Structure) – Structure to calculate stol for.

Returns:

Equivalent stol value for the given distance.

Return type:

float
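
As a worked example of the stol definition above, the following sketch converts a Cartesian distance into the equivalent stol using only the cell volume and number of sites. The function name dist_equiv_stol and its arguments are hypothetical (the real function takes a Structure), and it assumes the conversion is simply the distance divided by the average free length per atom.

```python
def dist_equiv_stol(dist: float, volume: float, num_sites: int) -> float:
    """Convert a Cartesian distance (in Å) to an equivalent fractional stol value."""
    free_length = (volume / num_sites) ** (1 / 3)  # average free length per atom, in Å
    return dist / free_length


# A 64-atom cell with a volume of 1000 Å^3 has a free length of 2.5 Å,
# so a 0.5 Å distance corresponds to stol of approximately 0.2.
print(dist_equiv_stol(0.5, volume=1000.0, num_sites=64))
```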

doped.utils.efficiency.get_element_indices(structure: Structure, elements: list[Element | Species | str] | None = None, comparator: AbstractComparator | None = None) → dict[str, list[int]][source]

Convenience function to generate a dictionary of {element: [indices]} for a given Structure, where indices are the indices of the sites in the structure corresponding to the given elements (default is all elements in the structure).

Parameters:

structure (Structure) – Structure to get the indices from.
elements (list[Element | Species | str] | None) – List of elements to get the indices of. If None (default), all elements in the structure are used.
comparator (AbstractComparator | None) – Comparator to check if we should return the str(element) representation (which includes charge information if element is a Species), or just the element symbol (i.e. element.element.symbol) – which is the case when comparator is None (default) or ElementComparator / FrameworkComparator.

Returns:

Dictionary of {element: [indices]} for the given elements in the structure.

Return type:

dict[str, list[int]]

doped.utils.efficiency.get_element_min_max_bond_length_dict(structure: Structure, **sm_kwargs) → dict[source]

Get a dictionary of {element: (min_bond_length, max_bond_length)} for a given Structure, where min_bond_length and max_bond_length are the minimum and maximum smallest interatomic bond lengths for each element in the structure.

Parameters:

structure (Structure) – Structure to calculate bond lengths for.
**sm_kwargs – Additional keyword arguments to pass to StructureMatcher(). Just used to check if comparator has been set here (if ElementComparator/FrameworkComparator used, then we use Elements rather than Species as the keys), or if ignored_species is set (in which case these species are ignored when calculating bond lengths).

Returns:

Dictionary of {element: (min_bond_length, max_bond_length)}.

Return type:

dict

doped.utils.efficiency.get_min_stol_for_s1_s2(struct1: Structure, struct2: Structure, **sm_kwargs) → float[source]

Get the minimum possible stol value which will give a match between struct1 and struct2 using StructureMatcher, based on the ranges of per-element minimum interatomic distances in the two structures.

Parameters:

struct1 (Structure) – Initial structure.
struct2 (Structure) – Final structure.
**sm_kwargs – Additional keyword arguments to pass to StructureMatcher(). Just used to check if ignored_species or comparator has been set here.

Returns:

Minimum stol value for a match between struct1 and struct2. If a direct match is detected (corresponding to min stol = 0), then 1e-4 is returned.

Return type:

float

doped.utils.efficiency.get_voronoi_nodes(structure: Structure) → list[PeriodicSite][source]

Get the Voronoi nodes of a pymatgen Structure.

Maximises efficiency by mapping down to the primitive cell, doing Voronoi analysis (with the efficient DopedTopographyAnalyzer class), and then mapping back to the original structure (typically a supercell).

Parameters: structure (Structure) – pymatgen Structure object.
Returns: List of PeriodicSite objects representing the Voronoi nodes.
Return type: list[PeriodicSite]

doped.utils.eigenvalues module

Helper functions for setting up PHS analysis.

Contains modified versions of functions from pydefect and vise (https://github.com/kumagai-group/pydefect / vise).

doped.utils.eigenvalues.band_edge_properties_from_vasprun(vasprun: Vasprun, integer_criterion: float = 0.1) → BandEdgeProperties[source]

Create a pydefect BandEdgeProperties object from a Vasprun object.

Parameters:

vasprun (Vasprun) – Vasprun object.
integer_criterion (float) – Threshold criterion for determining if a band is unoccupied (< integer_criterion), partially occupied (between integer_criterion and 1 - integer_criterion), or fully occupied (> 1 - integer_criterion). Default is 0.1.

Returns:

BandEdgeProperties object.
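
The role of integer_criterion can be illustrated with a short classification helper. This is a sketch under the assumption that occupations are per-spin-channel values between 0 and 1; classify_occupation is a hypothetical name and not part of doped or pydefect.

```python
def classify_occupation(occupation: float, integer_criterion: float = 0.1) -> str:
    """Classify a band occupation using the thresholds described above."""
    if occupation < integer_criterion:
        return "unoccupied"
    if occupation > 1 - integer_criterion:
        return "occupied"
    return "partially occupied"


for occ in (0.02, 0.5, 0.97):
    print(occ, classify_occupation(occ))
```

With the default of 0.1, small deviations from integer occupation (for example from electronic smearing) do not change how a band is classified.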

Generate metadata required for performing eigenvalue & orbital analysis, specifically pydefect BandEdgeOrbitalInfos, and EdgeInfo objects for the bulk VBM and CBM.

See the Perturbed Host States (Shallow Defects) tips section.

Parameters:

defect_vr (Vasprun) – Vasprun object of the defect supercell calculation. If defect_procar is not provided, then this must have the projected_eigenvalues attribute (i.e. from a calculation with LORBIT > 10 in the INCAR and parsed with parse_projected_eigen = True (default)).
bulk_vr (Vasprun) – Vasprun object of the bulk supercell calculation. If bulk_procar is not provided, then this must have the projected_eigenvalues attribute (i.e. from a calculation with LORBIT > 10 in the INCAR and parsed with parse_projected_eigen = True (default)).
defect_procar (PathLike, Procar) – Either a path to the VASP PROCAR(.gz) output file (with LORBIT > 10 in the INCAR) or a pymatgen Procar object, for the defect supercell calculation. Not required if the supplied defect_vr was parsed with parse_projected_eigen = True (default). Default is None.
bulk_procar (PathLike, Procar) – Either a path to the VASP PROCAR(.gz) output file (with LORBIT > 10 in the INCAR) or a pymatgen Procar object, for the reference bulk supercell calculation. Not required if the supplied bulk_vr was parsed with parse_projected_eigen = True (default). Default is None.
defect_supercell_site (PeriodicSite) – PeriodicSite object of the defect site in the defect supercell, from which the defect neighbours are determined for localisation analysis. If None (default), then the defect site is determined automatically from the defect and bulk supercell structures.
neighbor_cutoff_factor (float) – Sites within min_distance * neighbor_cutoff_factor of the defect site in the relaxed defect supercell are considered neighbours for localisation analysis, where min_distance is the minimum distance between sites in the defect supercell. Default is 1.3 (matching the pydefect default).

Returns:

pydefect BandEdgeOrbitalInfos, and EdgeInfo objects for the bulk VBM and CBM.

Get eigenvalue & orbital info (with automated classification of PHS states) for the band edge and in-gap electronic states for the input defect entry / calculation outputs, as well as a plot of the single-particle electronic eigenvalues and their occupation (if plot=True).

Can be used to determine if a defect is adopting a perturbed host state (PHS / shallow state), see the Perturbed Host States (Shallow Defects) tips section.

Note that the classification of electronic states as band edges or localised orbitals is based on the similarity of orbital projections and eigenvalues between the defect and bulk cell calculations (see similar_orb/energy_criterion argument descriptions below for more details). You may want to adjust the default values of these keyword arguments, as the defaults may not be appropriate in all cases. In particular, the P-ratio values can give useful insight, revealing the level of (de)localisation of the states.

Either a doped DefectEntry object can be provided, or the required VASP output files/objects for the bulk and defect supercell calculations (Vaspruns, or Vaspruns and Procars). If a DefectEntry is provided but eigenvalue data has not already been parsed (default in doped is to parse this data with DefectsParser/DefectParser, as controlled by the parse_projected_eigen flag), then this function will attempt to load the eigenvalue data from either the input Vasprun / Procar objects or files, or from the bulk/defect_paths in defect_entry.calculation_metadata. If so, it will initially try to load orbital projections from vasprun.xml(.gz) files (more accurate due to fewer rounding errors), or failing that from PROCAR(.gz) files if present.

This function uses code from pydefect, so please cite the pydefect paper: https://doi.org/10.1103/PhysRevMaterials.5.123803

Parameters:

defect_entry (DefectEntry) – doped DefectEntry object. Default is None.
plot (bool) – Whether to plot the single-particle eigenvalues. (Default: True)
filename (str) – Filename to save the eigenvalue plot to (if plot = True). If None (default), plots are not saved.
ks_labels (bool) – Whether to add band index labels to the KS levels. (Default: False)
style_file (str) – Path to a mplstyle file to use for the plot. If None (default), uses the doped displacement plot style (doped/utils/displacement.mplstyle).
bulk_vr (PathLike, Vasprun) – Not required if defect_entry provided and eigenvalue data already parsed (default behaviour when parsing with doped, data in defect_entry.calculation_metadata["eigenvalue_data"]). Either a path to the VASP vasprun.xml(.gz) output file or a pymatgen Vasprun object, for the reference bulk supercell calculation. If None (default), tries to load the Vasprun object from defect_entry.calculation_metadata["run_metadata"]["bulk_vasprun_dict"] or, failing that, from a vasprun.xml(.gz) file at defect_entry.calculation_metadata["bulk_path"].
bulk_procar (PathLike, Procar) – Not required if defect_entry provided and eigenvalue data already parsed (default behaviour when parsing with doped, data in defect_entry.calculation_metadata["eigenvalue_data"]), or if bulk_vr was parsed with parse_projected_eigen = True (default). Either a path to the VASP PROCAR output file (with LORBIT > 10 in the INCAR) or a pymatgen Procar object, for the reference bulk supercell calculation. If None (default), tries to load from a PROCAR(.gz) file at defect_entry.calculation_metadata["bulk_path"].
defect_vr (PathLike, Vasprun) – Not required if defect_entry provided and eigenvalue data already parsed (default behaviour when parsing with doped, data in defect_entry.calculation_metadata["eigenvalue_data"]). Either a path to the VASP vasprun.xml(.gz) output file or a pymatgen Vasprun object, for the defect supercell calculation. If None (default), tries to load the Vasprun object from defect_entry.calculation_metadata["run_metadata"]["defect_vasprun_dict"] or, failing that, from a vasprun.xml(.gz) file at defect_entry.calculation_metadata["defect_path"].
defect_procar (PathLike, Procar) – Not required if defect_entry provided and eigenvalue data already parsed (default behaviour when parsing with doped, data in defect_entry.calculation_metadata["eigenvalue_data"]), or if defect_vr was parsed with parse_projected_eigen = True (default). Either a path to the VASP PROCAR output file (with LORBIT > 10 in the INCAR) or a pymatgen Procar object, for the defect supercell calculation. If None (default), tries to load from a PROCAR(.gz) file at defect_entry.calculation_metadata["defect_path"].
force_reparse (bool) – Whether to force re-parsing of the eigenvalue data, even if already present in the calculation_metadata dict.
ylims (tuple[float, float]) – Custom y-axis limits for the eigenvalue plot. If None (default), the y-axis limits are automatically set to +/-5% of the eigenvalue range.
legend_kwargs (dict) – Custom keyword arguments to pass to the ax.legend call in the eigenvalue plot (e.g. “loc”, “fontsize”, “framealpha” etc.). If set to False, then no legend is shown. Default is None.
similar_orb_criterion (float) – Threshold criterion for determining if the orbitals of two eigenstates are similar (for identifying band-edge and defect states). If the summed orbital projection differences, normalised by the total orbital projection coefficients, are less than this value, then the orbitals are considered similar. Default is to try with 0.2 (pydefect default), then if this fails increase to 0.35, and lastly 0.5.
similar_energy_criterion (float) – Threshold criterion for considering two eigenstates similar in energy, used for identifying band-edge (and defect) states. Bands within this energy difference from the VBM/CBM of the bulk are considered potential band-edge states. Default is to try with the larger of either 0.25 eV or 0.1 eV + the potential alignment from defect to bulk cells as determined by the charge correction in defect_entry.corrections_metadata if present. If this fails, then it is increased to the pydefect default of 0.5 eV.

Returns:

pydefect BandEdgeStates object, containing the band-edge and defect eigenvalue information, and the eigenvalue plot (if plot=True).

doped.utils.eigenvalues.make_band_edge_orbital_infos(defect_vr: Vasprun, vbm: float, cbm: float, eigval_shift: float = 0.0, neighbor_indices: list[int] | None = None, defect_procar: Procar | None = None)[source]

Make BandEdgeOrbitalInfos from a Vasprun object.

Modified from pydefect to use projected orbitals stored in the Vasprun object.

Parameters:

defect_vr (Vasprun) – Defect Vasprun object.
vbm (float) – VBM eigenvalue in eV.
cbm (float) – CBM eigenvalue in eV.
eigval_shift (float) – Shift eigenvalues by this value in eV. Default is 0.0.
neighbor_indices (list[int]) – Indices of neighboring atoms to the defect site, for localisation analysis. Default is None.
defect_procar (Procar) – pymatgen Procar object, for the defect supercell, if projected eigenvalue/orbitals data is not provided in defect_vr.

Returns:

BandEdgeOrbitalInfos object.

doped.utils.eigenvalues.make_perfect_band_edge_state_from_vasp(vasprun: Vasprun, procar: Procar, integer_criterion: float = 0.1) → PerfectBandEdgeState[source]

Create a pydefect PerfectBandEdgeState object from just a Vasprun and Procar object, without the need for the Outcar input (as in pydefect).

Parameters:

vasprun (Vasprun) – Vasprun object.
procar (Procar) – Procar object.
integer_criterion (float) – Threshold criterion for determining if a band is unoccupied (< integer_criterion), partially occupied (between integer_criterion and 1 - integer_criterion), or fully occupied (> 1 - integer_criterion). Default is 0.1.

Returns:

PerfectBandEdgeState object.

doped.utils.legacy_corrections module

Functions for computing legacy finite-size charge corrections (Makov-Payne, Murphy-Hine, Lany-Zunger) for defect formation energies.

Mostly adapted from the deprecated AIDE package developed by the dynamic duo Adam Jackson and Alex Ganose.

Note that bandfilling corrections are no longer supported, as in most cases they shouldn’t be used (see https://doi.org/10.1038/s41578-025-00879-y). If for some reason bandfilling corrections are desired, they can be manually added to corrections attributes of DefectEntry objects. See https://github.com/materialsproject/pymatgen/pull/2193

doped.utils.legacy_corrections.get_murphy_image_charge_correction(lattice, dielectric_matrix, conv=0.3, factor=30, verbose=False)[source]

Calculates the anisotropic image charge correction by Sam Murphy in eV.

This is a rewrite of the code ‘madelung.pl’ written by Sam Murphy (see [1]). The default convergence parameter of conv = 0.3 seems to work perfectly well. However, it may be worth testing convergence of defect energies with respect to the factor (i.e. cut-off radius).

Reference: S. T. Murphy and N. D. H. Hine, Phys. Rev. B 87, 094111 (2013).

Parameters:

lattice (list) – The defect cell lattice as a 3x3 matrix.
dielectric_matrix (list) – The dielectric tensor as a 3x3 matrix.
conv (float) – A value between 0.1 and 0.9 which adjusts how much real space vs reciprocal space contribution there is.
factor – The cut-off radius, defined as a multiple of the longest cell parameter.
verbose (bool) – If True, details of the correction will be printed.

Returns:

The image charge correction as a {charge: correction} dictionary.

doped.utils.legacy_corrections.lany_zunger_corrected_defect_dict(defect_dict: dict)[source]

Convert charge corrections from (e)FNV to Lany-Zunger in the input parsed defect dictionary.

This function is used to convert the finite-size charge corrections for parsed defect entries in a dictionary to the same dictionary but with the Lany-Zunger charge correction (0.65 * Makov-Payne image charge correction, with the same potential alignment).

Parameters: defect_dict (dict) – Dictionary of parsed defect calculations. Must have 'freysoldt_meta' in DefectEntry.calculation_metadata for each charged defect (from DefectParser.load_FNV_data()).
Returns: Parsed defect dictionary with Lany-Zunger charge corrections.
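
The scaling stated above (Lany-Zunger = 0.65 × Makov-Payne image charge correction) is simple arithmetic, sketched below. The function name and the example correction values are made up for illustration; the potential alignment term is left unchanged and is therefore not shown.

```python
LZ_SCALING = 0.65  # Lany-Zunger scaling of the Makov-Payne image charge term


def lany_zunger_image_charge(makov_payne_correction: float) -> float:
    """Scale a Makov-Payne image charge correction (in eV) to the Lany-Zunger value."""
    return LZ_SCALING * makov_payne_correction


# Hypothetical {charge: Makov-Payne correction in eV} values, for illustration only.
makov_payne = {1: 0.10, 2: 0.40}
lany_zunger = {q: lany_zunger_image_charge(e) for q, e in makov_payne.items()}
print(lany_zunger)  # each value is 0.65 times the corresponding Makov-Payne value
```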

doped.utils.parsing module

Helper functions for parsing defect supercell calculations.

doped.utils.parsing.check_atom_mapping_far_from_defect(defect_supercell: Structure, bulk_supercell: Structure, defect_coords: ndarray, coords_are_cartesian: bool = False, displacement_tol: float = 0.5, fraction_tol: float = 0.2, warning: bool | str = 'verbose') → bool[source]

Check the displacement of atoms far from the determined defect site, and warn the user if they are large (often indicates a mismatch between the bulk and defect supercell definitions).

For sites of a given species outside the Wigner-Seitz radius of the defect (the radius of the largest sphere which can fit in the cell), a ‘large’ displacement is flagged if either the mean displacement exceeds displacement_tol Ångströms (capturing a systematic/global mismatch), or the fraction of such sites individually displaced by more than displacement_tol exceeds fraction_tol (capturing a partial mismatch without being triggered by single outlier sites).

Parameters:

defect_supercell (Structure) – The defect structure.
bulk_supercell (Structure) – The bulk structure.
defect_coords (np.ndarray) – The coordinates of the defect site.
coords_are_cartesian (bool) – Whether the defect coordinates are in Cartesian or fractional coordinates. Default is False (fractional).
displacement_tol (float) – The tolerance for the displacement of individual atoms far from the defect site, in Ångströms. Default is 0.5 Å.
fraction_tol (float) – The tolerance for the fraction of far-from-defect sites (of a given species) displaced by more than displacement_tol, above which a mismatch is flagged. Default is 0.2 (i.e. 20%).
warning (bool, str) – Whether to throw a warning if a mismatch is detected. If warning = "verbose" (default), the individual atomic displacements are included in the warning message.

Returns:

Returns False if a mismatch is detected, else True.

Return type:

bool
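
The two-part criterion described above (mean displacement, or fraction of individually displaced sites) can be sketched as follows for the displacements of one species outside the Wigner-Seitz radius. This is an illustration of the stated logic only: has_large_displacement is a hypothetical helper, and note that it returns True when a mismatch is flagged, whereas check_atom_mapping_far_from_defect returns False in that case.

```python
def has_large_displacement(
    displacements: list[float],
    displacement_tol: float = 0.5,
    fraction_tol: float = 0.2,
) -> bool:
    """Flag a 'large' displacement for far-from-defect sites of one species (in Å)."""
    if not displacements:
        return False
    n_sites = len(displacements)
    mean_displacement = sum(displacements) / n_sites
    fraction_displaced = sum(d > displacement_tol for d in displacements) / n_sites
    # Systematic/global mismatch OR partial mismatch affecting many sites.
    return mean_displacement > displacement_tol or fraction_displaced > fraction_tol


print(has_large_displacement([0.05] * 9 + [2.0]))      # False: a single outlier site
print(has_large_displacement([0.05] * 7 + [0.8] * 3))  # True: 30% of sites displaced
print(has_large_displacement([0.6] * 10))              # True: mean exceeds tolerance
```

The fraction test is what lets one badly displaced atom pass without a warning while still catching a mismatch that affects a subset of the lattice.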

doped.utils.parsing.find_archived_fname(fname, raise_error=True)[source]: Find a suitable filename, taking account of possible use of compression software.

doped.utils.parsing.find_missing_idx(frac_coords1: list | ndarray, frac_coords2: list | ndarray, lattice: Lattice)[source]

Find the missing/outlier index between two sets of fractional coordinates (differing in size by 1), by grouping the coordinates based on the minimum distances between coordinates or, if that doesn’t give a unique match, the site combination that gives the minimum summed squared distances between paired sites.

The index returned is the index of the missing/outlier coordinate in the larger set of coordinates.

Parameters:

frac_coords1 (list | np.ndarray) – First set of fractional coordinates.
frac_coords2 (list | np.ndarray) – Second set of fractional coordinates.
lattice (Lattice) – The lattice object to use with the fractional coordinates.

Find the nearest coords in candidate_frac_coords to target_frac_coords.

If return_idx is True, also returns the index of the nearest coords in candidate_frac_coords to target_frac_coords.

Parameters:

candidate_frac_coords (list | np.ndarray) – Fractional coordinates (typically from a bulk supercell), to find the nearest coordinates to target_frac_coords.
target_frac_coords (list | np.ndarray) – The target coordinates to find the nearest coordinates to in candidate_frac_coords.
lattice (Lattice) – The lattice object to use with the fractional coordinates.
return_idx (bool) – Whether to also return the index of the nearest coordinates in candidate_frac_coords to target_frac_coords.

doped.utils.parsing.get_coords_and_idx_of_species(structure_or_sites: SiteCollection, species_name: str, frac_coords: bool = True, use_oxi_states: bool = False) → tuple[ndarray, ndarray][source]: Get arrays of the coordinates and indices of the given species in the structure/list of sites.

doped.utils.parsing.get_core_potentials_from_outcar(outcar_path: str | Path, dir_type: str = '', total_energy: list | float | None = None)[source]

Get the core potentials from the OUTCAR file, which are needed for the Kumagai-Oba (eFNV) finite-size correction.

This parser skips the full pymatgen Outcar initialisation/parsing, to expedite parsing and make it more robust (doesn’t fail if OUTCAR is incomplete, as long as it has the core potentials information).

Parameters:

outcar_path (PathLike) – The path to the OUTCAR file.
dir_type (str) – The type of directory the OUTCAR is in (e.g. bulk or defect) for informative error messages.
total_energy (list | float | None) – The already-parsed total energy for the structure. If provided, will check that the total energy of the OUTCAR matches this value / one of these values, and throw a warning if not.

Returns:

The core potentials from the last ionic step in the OUTCAR.

Return type:

np.ndarray

doped.utils.parsing.get_defect_type_and_composition_diff(defect: Structure | Composition, bulk: Structure | Composition, _parameter_order_warn: bool = True) → tuple[str, dict][source]

Get the difference in composition between a bulk structure and a defect structure.

Parameters:

defect (Structure | Composition) – The defect structure or composition.
bulk (Structure | Composition) – The bulk structure or composition.

Returns:

The defect type (interstitial, vacancy, substitution or complex) and the composition difference between the bulk and defect structures as a dictionary.

Return type:

tuple[str, dict[str, int]]

doped.utils.parsing.get_defect_type_and_site_indices(defect_supercell: Structure, bulk_supercell: Structure, site_tol: float | None = None, abs_tol: bool = False, use_oxi_states: bool = False, use_rms: bool = False) → tuple[str, list[int], list[int]][source]

Get the defect type, and indices of defect sites in the bulk (vacancies / substitutions) and defect (interstitials / substitutions) supercells.

Defect sites are determined by matching sites in the bulk and defect structures (by element and distances), according to site_tol.

Note that this assumes consistent cell definitions (lattice vectors and bases) for the input defect and bulk supercells, and does not perform any structural re-orientations.

Parameters:

defect_supercell (Structure) – The defect supercell structure.
bulk_supercell (Structure) – The bulk supercell structure.
site_tol (float | None) – The (fractional) tolerance for matching sites between the defect and bulk structures. If abs_tol is False (default), then the distance threshold for matching is set to the product of site_tol and the shortest bond length in the bulk structure for the given species, otherwise the value is used directly (as a length in Å). If None (default), the defect is assumed to be a point defect, and the largest site mismatch is assigned as the defect site.
abs_tol (bool) – Whether to use site_tol as an absolute distance tolerance (in Å) instead of a fractional tolerance (in terms of the shortest bond length in the structure). Default is False.
use_oxi_states (bool) – Whether to use the oxidation states of the sites in the bulk and defect structures when considering matching sites (such that e.g. Fe3+ and Fe2+ would be considered different species). Default is False.
use_rms (bool) – Site mapping (using linear assignment) – used to determine defect sites – will be that which minimises either the summed RMS distances (if use_rms is True) or just simple linear sum of distances (if False, default) between all paired sites.

Returns:

defect_type (str) – The type of defect as a string (interstitial, vacancy or substitution).
missing_bulk_site_indices (list[int]) – Indices of sites in the bulk structure that do not match any site in the defect structure (according to site_tol choice).
additional_defect_site_indices (list[int]) – Indices of sites in the defect structure that do not match any site in the bulk structure (according to site_tol choice).

Return type:

tuple[str, list[int], list[int]]

doped.utils.parsing.get_dimer_bonds(structure: Structure, rtol: float = 1.05) → dict[str, list[float]][source]

Get a dictionary of all homoionic (dimer) bonds in the structure.

This function uses the get_homoionic_bonds and get_dimer_bond_length functions from shakenbreak to identify dimer bonds in the structure (where any pair of atoms of the same element with distance < rtol * get_dimer_bond_length(elt, elt) are considered a dimer bond), returning a dictionary of the site names and the dimer bond length.

Parameters:

structure (Structure) – The structure to get the dimer bond lengths for.
rtol (float) – The relative tolerance to use for classifying bonds as dimer bonds, where distances < rtol * get_dimer_bond_length(elt, elt) are considered dimer bonds. Default is 1.05.

Returns:

A dictionary of element names with values being sub-dictionaries of site names and their homoionic neighbours and distances (in Å) which are classified as dimer bonds. (e.g. {‘O’: {‘O(1)’: {‘O(3)’: ‘1.44 Å’}}})

Return type:

dict[str, list[float]]

doped.utils.parsing.get_locpot(locpot_path: str | Path)[source]: Read the LOCPOT(.gz) file as a pymatgen Locpot object.

doped.utils.parsing.get_magnetization_from_vasprun(vasprun: Vasprun) → int | float | ndarray[source]

Determine the total magnetization from a Vasprun object.

For spin-polarised calculations, this is the difference between the number of spin-up vs spin-down electrons. For non-spin-polarised calculations, there is no magnetization. For non-collinear (NCL) magnetization (e.g. spin-orbit coupling (SOC) calculations), the magnetization becomes a vector (spinor), in which case we take the vector norm as the total magnetization.

VASP does not write the total magnetization to the vasprun.xml file (but does to the OUTCAR file), and so here we have to reverse-engineer it from the eigenvalues (for normal spin-polarised calculations) or the projected magnetization & eigenvalues (for NCL calculations). For NCL calculations, we sum the projected orbital magnetizations for all occupied states, weighted by the k-point weights and normalised by the total orbital projections for each band and k-point. This gives the best estimate of the total magnetization from the projected magnetization array, but due to incomplete orbital projections and orbital-dependent non-uniform scaling factors (i.e. completeness of orbital projections for s vs p vs d orbitals etc.), there can be inaccuracies up to ~30% in the estimated total magnetization for tricky cases.

Parameters:: vasprun (Vasprun) – The Vasprun object from which to extract the total magnetization.
Returns:: The total magnetization of the system.
Return type:: int or float or np.ndarray
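As a rough illustration of the two cases described above, the sketch below computes a collinear magnetization as the k-point-weighted difference between spin-up and spin-down occupations, and a non-collinear total as a vector norm. Both function names, the input layout and the numbers are assumptions made for this example; it is not the doped implementation, which works from the data stored in a Vasprun object.

```python
import math


def collinear_magnetization(occ_up, occ_down, kpoint_weights):
    """Spin-up minus spin-down electron count, weighted over k-points.

    occ_up / occ_down: one list of band occupations (0 to 1) per k-point.
    kpoint_weights: k-point weights, assumed here to sum to 1.
    """
    return sum(
        weight * (sum(up) - sum(down))
        for weight, up, down in zip(kpoint_weights, occ_up, occ_down)
    )


def noncollinear_total_magnetization(mag_vector):
    """Total magnetization taken as the norm of the (mx, my, mz) vector."""
    return math.hypot(*mag_vector)


# Two k-points, three bands: one more occupied spin-up band than spin-down.
occ_up = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
occ_down = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
weights = [0.5, 0.5]

print(collinear_magnetization(occ_up, occ_down, weights))
print(noncollinear_total_magnetization([0.0, 0.6, 0.8]))
```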

doped.utils.parsing.get_matching_site(site: PeriodicSite | ndarray, structure: Structure, anonymous: bool = False, tol: float = 0.5) → PeriodicSite[source]

Get the (closest) matching PeriodicSite in structure for the input site, which can be a PeriodicSite or fractional coordinates.

If the closest matching site in structure is > tol Å (0.5 Å by default) away from the input site coordinates, an error is raised.

Automatically accounts for possible differences in assigned oxidation states, site property dicts etc.

Parameters:

site (PeriodicSite | np.ndarray) – The site for which to find the closest matching site in structure, either as a PeriodicSite or fractional coordinates array. If fractional coordinates, then anonymous is set to True.
structure (Structure) – The structure in which to search for matching sites to site.
anonymous (bool) – Whether to use anonymous matching, allowing different species/elements to match each other (i.e. just matching based on coordinates). Default is False if site is a PeriodicSite, and True if site is fractional coordinates.
tol (float) – A distance tolerance (in Å), where an error will be thrown if the closest matching site is > tol Å away from the input site. Default is 0.5 Å.

Returns:

The closest matching site in structure to the input site.

Return type:

PeriodicSite
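The matching logic (closest site under periodic boundary conditions, with an error beyond tol) can be sketched as follows. To stay self-contained, this sketch assumes a cubic cell and plain fractional-coordinate lists instead of pymatgen objects; closest_site_index is a hypothetical helper, not part of doped.

```python
import math


def closest_site_index(frac_coords, site_frac_coords, cell_length, tol=0.5):
    """Index of the site closest to ``frac_coords`` in a cubic cell of side ``cell_length`` (Å).

    Raises ValueError if the closest site is more than ``tol`` Å away.
    """

    def pbc_distance(a, b):
        # Minimum-image convention: wrap each fractional difference into [-0.5, 0.5].
        deltas = [(x - y) - round(x - y) for x, y in zip(a, b)]
        return cell_length * math.sqrt(sum(d * d for d in deltas))

    distances = [pbc_distance(frac_coords, other) for other in site_frac_coords]
    index = min(range(len(distances)), key=distances.__getitem__)
    if distances[index] > tol:
        raise ValueError(
            f"Closest site is {distances[index]:.2f} Å away, beyond tol = {tol} Å"
        )
    return index


sites = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
# The query point sits just across the periodic boundary from the first site.
print(closest_site_index([0.98, 0.01, 0.0], sites, cell_length=10.0))
```

For a general (non-cubic) lattice the minimum-image step needs the full lattice matrix, which is why the real function works with Structure and PeriodicSite objects.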

doped.utils.parsing.get_nelect_from_vasprun(vasprun: Vasprun) → int | float[source]

Determine the number of electrons (NELECT) from a Vasprun object.

Parameters:: vasprun (Vasprun) – The Vasprun object from which to extract NELECT.
Returns:: The number of electrons in the system.
Return type:: int or float

doped.utils.parsing.get_neutral_nelect_from_vasprun(vasprun: Vasprun, skip_potcar_init: bool = False) → int[source]

Determine the number of electrons (NELECT) from a Vasprun object, corresponding to a neutral charge state for the structure.

Parameters:

vasprun (Vasprun) – The Vasprun object from which to extract NELECT.
skip_potcar_init (bool) – Whether to skip the initialisation of the POTCAR statistics (i.e. the auto-charge determination) and instead try to reverse engineer NELECT using the DefectDictSet.

Returns:

The number of electrons in the system for a neutral charge state.

Return type:

int

doped.utils.parsing.get_outcar(outcar_path: str | Path)[source]: Read the OUTCAR(.gz) file as a pymatgen Outcar object.

doped.utils.parsing.get_procar(procar_path: str | Path) → Procar[source]

Read the PROCAR(.gz) file as a pymatgen Procar object.

Previously, pymatgen Procar parsing did not support SOC calculations; however, this was updated in https://github.com/materialsproject/pymatgen/pull/3890 to use code from easyunfold (https://smtg-bham.github.io/easyunfold – a package for unfolding electronic band structures for symmetry-broken / defect / dopant systems, with many plotting & analysis tools).

doped.utils.parsing.get_site_mappings(struct1: Structure, struct2: Structure, species: str | Element | Species | DummySpecies | None = None, allow_duplicates: bool = False, threshold: float = 2.0, anonymous: bool = False, ignored_species: list[str] | None = None, frac_coords: bool = True, use_rms: bool = False) → list[tuple[float | None, int | None, int | None]][source]

Get the site mappings between two structures (from struct1 to struct2), based on the shortest distances between sites.

The two structures may have different species orderings.

NOTE: if frac_coords = True (default), this assumes that both structures have the same lattice definitions (i.e. that they match, and aren’t rigidly translated/rotated with respect to each other), which is mostly the case unless we have a mismatching defect/bulk supercell (in which case the check_atom_mapping_far_from_defect warning should be thrown anyway during parsing).

Parameters:

struct1 (Structure) – The input structure.
struct2 (Structure) – The template structure.
species (str) – If provided, only sites of this species will be considered when matching sites. Default is None (all species).
allow_duplicates (bool) – If True, allow multiple sites in struct1 to be matched to the same site in struct2. Default is False.
threshold (float) – If the distance between a pair of matched sites is larger than this, then a warning will be thrown. Default is 2.0 Å.
anonymous (bool) – If True, the species of the sites will not be considered when matching sites. Default is False (only matching species can be matched together).
ignored_species (list[str]) – A list of species to ignore when matching sites. Default is no species ignored.
frac_coords (bool) – Whether to match sites based on their fractional coordinate distances (i.e. assuming PBC with matching lattice definitions, using the lattice of struct1). Default is True. If False, instead matches sites based on distances between their Cartesian coordinates, with no consideration of PBC.
use_rms (bool) – The returned site mapping (using linear assignment – only applicable when allow_duplicates is False) will be that which minimises either the summed RMS distances (if use_rms is True) or just simple linear sum of distances (if False, default) between all paired sites.

Returns:

A list of lists containing the distance, index in struct1 and index in struct2 for each matched site.

Return type:

list
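The effect of use_rms on the one-to-one assignment can be illustrated with a small distance matrix. The docstring states that linear assignment is used; the sketch below swaps in a brute-force search over permutations (only practical for a handful of sites) so that it runs with the standard library alone. best_assignment and the distance values are hypothetical.

```python
import itertools


def best_assignment(distance_matrix, use_rms=False):
    """Brute-force one-to-one site assignment for a square distance matrix.

    Minimises the sum of distances, or the sum of squared distances if
    ``use_rms`` is True (equivalent to minimising the RMS distance).
    Returns (distance, index in struct1, index in struct2) tuples.
    """
    power = 2 if use_rms else 1

    def cost(permutation):
        return sum(distance_matrix[i][j] ** power for i, j in enumerate(permutation))

    best = min(itertools.permutations(range(len(distance_matrix))), key=cost)
    return [(distance_matrix[i][j], i, j) for i, j in enumerate(best)]


# distances[i][j]: distance (Å) between site i of struct1 and site j of struct2.
distances = [
    [0.1, 1.1],
    [1.1, 2.0],
]
print(best_assignment(distances))                # linear sum: one short and one long pair
print(best_assignment(distances, use_rms=True))  # RMS: avoids the single long pair
```

The example is chosen so that the two criteria disagree: the linear sum tolerates one large displacement, whereas the RMS criterion penalises it more heavily.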

doped.utils.parsing.get_vasprun(vasprun_path: str | Path, parse_mag: bool = True, **kwargs)[source]: Read the vasprun.xml(.gz) file as a pymatgen Vasprun object.

doped.utils.parsing.get_wigner_seitz_radius(lattice: Structure | Lattice) → float[source]

Calculates the Wigner-Seitz radius of the structure, which corresponds to the maximum radius of a sphere fitting inside the cell.

Templated on the calc_max_sphere_radius function from pydefect, but rewritten to avoid calling vise, which causes hanging on Windows (https://github.com/SMTG-Bham/doped/issues/147).

Parameters:: lattice (Structure | Lattice) – The lattice of the structure (either a pymatgen Structure or Lattice object).
Returns:: The Wigner-Seitz radius of the structure.
Return type:: float

doped.utils.parsing.parse_projected_eigen(elem: Element, parse_mag: bool = True) → tuple[dict[Spin, ndarray], ndarray | None][source]

Parse the projected eigenvalues from a Vasprun object (used during initialisation), but excluding the projected magnetization for efficiency.

Note that following SK’s PRs to pymatgen (#4359, #4360), parsing of projected eigenvalues adds minimal additional cost to Vasprun parsing (~1-5%), while parsing of projected magnetization can add ~30% cost.

This is a modified version of _parse_projected_eigen from Vasprun, which allows skipping of projected magnetization parsing in order to expedite parsing in doped, as well as some small adjustments to maximise efficiency.

Parameters:

elem (Element) – The XML element to parse, with projected eigenvalues/magnetization.
parse_mag (bool) – Whether to parse the projected magnetization. Default is True.

Returns:

A dictionary of projected eigenvalues for each spin channel (up/down), and the projected magnetization (if parsed).

Return type:

tuple[dict[Spin, np.ndarray], np.ndarray | None]

doped.utils.parsing.reorder_s2_like_s1(s1_structure: Structure, s2_structure: Structure, threshold=5.0) → Structure[source]

Reorder the atoms of a (relaxed) structure, s2_structure, to match the ordering of the atoms in s1_structure.

s1/s2 structures may have different species orderings.

NOTE: This assumes that both structures have the same lattice definitions (i.e. that they match, and aren’t rigidly translated/rotated with respect to each other), which is mostly the case unless we have a mismatching defect/bulk supercell (in which case the check_atom_mapping_far_from_defect warning should be thrown anyway during parsing).

Parameters:

s1_structure (Structure) – The template structure.
s2_structure (Structure) – The structure to reorder, to match s1_structure.
threshold (float) – If the distance between a pair of matched sites is larger than this value in Å, then a warning will be thrown. Default is 5.0 Å.

Returns:

s2_structure reordered to match s1_structure.

Return type:

Structure

doped.utils.parsing.spin_degeneracy_from_vasprun(vasprun: Vasprun, charge_state: int | None = None) → int[source]

Get the spin degeneracy (multiplicity) of a system from a VASP vasprun output.

Spin degeneracy is determined by first getting the total magnetization and thus electron spin (S = N_μB/2, where N_μB is the magnetization in Bohr magnetons, i.e. electronic units, as used in VASP), and then using the spin multiplicity equation: g_spin = 2S + 1. The total magnetization N_μB is determined using get_magnetization_from_vasprun (see docstring for details), and if this fails, then simple spin behaviour is assumed with singlet (S = 0) behaviour for even-electron systems and doublet behaviour (S = 1/2) for odd-electron systems.

For non-collinear (NCL) magnetization (e.g. spin-orbit coupling (SOC) calculations), the magnetization N_μB becomes a vector (spinor), in which case we take the vector norm as the total magnetization. This can be non-integer in these cases (e.g. due to SOC mixing of spin states, as S is no longer a good quantum number). As an approximation for these cases, we round N_μB to the nearest integer which would be allowed under collinear magnetism (i.e. even numbers for even-electron systems, odd numbers for odd-electron systems).

Parameters:

vasprun (Vasprun) – pymatgen Vasprun for which to determine spin degeneracy.
charge_state (int) – The charge state of the system, which can be used to determine the number of electrons. If None (default), automatically determines the number of electrons using get_nelect_from_vasprun(vasprun).

Returns:

Spin degeneracy of the system.

Return type:

int
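The arithmetic described above (S = N_μB/2, g_spin = 2S + 1, with parity-aware rounding for non-integer magnetizations) can be sketched as below. The function is a simplified stand-in that takes the magnetization and an integer electron count directly, which is an assumption of this example; the real function takes a Vasprun object.

```python
def spin_degeneracy(magnetization: float, nelect: int) -> int:
    """Spin multiplicity g = 2S + 1, with S = N/2 for N unpaired electrons.

    ``magnetization`` (in Bohr magnetons) is rounded to the nearest integer
    with the same parity as ``nelect``: even for even-electron systems,
    odd for odd-electron systems.
    """
    parity = nelect % 2
    # Nearest integer of the allowed parity to |magnetization|.
    n_unpaired = 2 * round((abs(magnetization) - parity) / 2) + parity
    # g = 2S + 1 = 2 * (N / 2) + 1 = N + 1
    return n_unpaired + 1


print(spin_degeneracy(0.0, nelect=64))   # closed-shell singlet
print(spin_degeneracy(1.0, nelect=63))   # doublet
print(spin_degeneracy(1.9, nelect=64))   # non-integer NCL value rounded to 2: triplet
```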

doped.utils.parsing.total_charge_from_vasprun(vasprun: Vasprun) → int | None[source]

Determine the total charge state of a system from the vasprun, and compare to the expected charge state if provided.

Note that if the system is charged, then this function relies on access to POTCAR data, which can be set up with pymatgen as detailed on the installation page.

Parameters:: vasprun (Vasprun) – pymatgen Vasprun object for which to determine the total charge.
Returns:: The total charge state, or None if it cannot be determined.
Return type:: int or None

doped.utils.plotting module

Code for plotting defect formation energies and transition levels.

class doped.utils.plotting.TransitionLevel(TL_eV: float, charges: tuple[int, int], pos_meta: bool, neg_meta: bool, faded: bool = False)[source]

Bases: NamedTuple

A charge transition level (TL), between charge states q_pos and q_neg (q_neg = q_pos - 1 for single-electron TLs).

charges = (q_pos, q_neg) (more positive, then more negative charge state); pos_meta/neg_meta flag whether the more-positive/negative charge state is metastable. TL_eV is the TL position in eV from the VBM. faded is True if the TL should be drawn faded in the vertical TL diagram (a rendering flag, left at its False default outside of plotting).

Shared by get_transition_levels() (used for single-electron TLs) and the TL plotting routines.

Create new instance of TransitionLevel(TL_eV, charges, pos_meta, neg_meta, faded)

TL_eV: float: Alias for field number 0

charges: tuple[int, int]: Alias for field number 1

faded: bool: Alias for field number 4

neg_meta: bool: Alias for field number 3

pos_meta: bool: Alias for field number 2

class doped.utils.plotting.TransitionLevelLabel(x: float, y: float, ha: str, va: str, label: str, label_w: float, TL_eV: float, conn_y: float | None = None, conn_x: float | None = None)[source]

Bases: NamedTuple

A plot position for a charge transition level (TL) label.

(x, y) is the label anchor position with alignments ha/va; label and label_w are the label text and width; TL_eV is the TL position in eV from the VBM (same as for TransitionLevel); conn_y and conn_x are the source TL line y/column-edge x for an off-column label that needs a connector (both None for an inline label with no connector).

Create new instance of TransitionLevelLabel(x, y, ha, va, label, label_w, TL_eV, conn_y, conn_x)

TL_eV: float: Alias for field number 6

conn_x: float | None: Alias for field number 8

conn_y: float | None: Alias for field number 7

ha: str: Alias for field number 2

label: str: Alias for field number 4

label_w: float: Alias for field number 5

va: str: Alias for field number 3

x: float: Alias for field number 0

y: float: Alias for field number 1

doped.utils.plotting.doped_plot_style(style_file: str | Path | None = None, style: str = 'doped')[source]

Context manager applying a matplotlib plotting style, whether a user-supplied style_file or one of the doped defaults ("doped" or "displacement").

Installs doped’s custom font if needed, applies the chosen mplstyle within a plt.style.context (so artists built inside the with block are styled), and then wraps the draw() and print_figure() methods of any figures created within the block so the style is re-applied on every (re-)render – including Jupyter’s deferred end-of-cell display, where a bare plt.style.context would have been restored before the figure is drawn. This avoids the need for a session-wide plt.style.use, so the user’s global matplotlib style is left unchanged.

Parameters:

style_file (PathLike) – Path to a .mplstyle file. If None (default), uses doped’s bundled {style}.mplstyle (in doped/utils).
style (str) – Name of the bundled doped style to use when style_file is None; either "doped" (default) or "displacement".

Yields:

PathLike – The resolved style-file path.

doped.utils.plotting.format_defect_name(defect_species: str, include_site_info: bool = False, include_charge: bool = True, wout_charge: bool | None = None) → str | None[source]

Format defect name using LaTeX styling, intended for plot labelling/titles.

For example, converts "Cd_i_C3v_0" to "$Cd_{i}^{0}$" (or "$Cd_{i_{C3v}}^{0}$" if include_site_info is True), or "$Cd_i$" if include_charge = False.

Note that capitalised "V" is treated as Vanadium when separated from the following element by an underscore (e.g. "V_Sb"), but as a vacancy when directly concatenated (e.g. "VSb"); lowercase "v" or "Va"/"Vac" are always vacancies. Likewise "I" is treated as Iodine (not an interstitial; lowercase "i" or "Int" are used for interstitials).

Parameters:

defect_species (str) – Name of defect including charge state (e.g. "Cd_i_C3v_0").
include_site_info (bool) – Whether to include site info in name (e.g. "$Cd_{i}^{0}$" or "$Cd_{i_{C3v}}^{0}$"). Defaults to False.
include_charge (bool) – Whether to include the charge state in the formatted defect_species name. Defaults to True.
wout_charge (bool) – Deprecated alias for not include_charge (i.e. whether to exclude the charge state). Will be removed in doped v4.1; use include_charge instead.

Returns:

Formatted defect name, or None if it could not be parsed.

Return type:

str | None

doped.utils.plotting.format_defect_names(defect_names: list[str], include_charge: bool = False, include_site_info: bool | None = None) → list[str][source]

Format a list of defect names into LaTeX-like labels for plotting (e.g. plot legends or transition level diagram column headers).

Each name is formatted with format_defect_name() (e.g. "Cd_i_C3v_0" -> "$Cd_{i}^{0}$"), and the labels are then made unique so that no two defects share the same label. The handling of crystallographic site info, used to disambiguate defects of the same type at inequivalent sites, is controlled by include_site_info. If names for different defects are still not unique after formatting, and potentially including site info, then “-a”, “-b”, “-c”… suffixes are appended to differentiate defects.

Parameters:

defect_names (list[str]) – List of defect names to format (e.g. ["Cd_i_C3v_0", "Cd_Te", "v_Cd_-1", ...]), as taken from DefectEntry.name or the keys of a DefectThermodynamics transition-level dictionary.
include_charge (bool) – Whether to include the charge states in the formatted defect names. Defaults to False.
include_site_info (bool, None) –
Whether to include crystallographic site info in the formatted names (e.g. "$Cd_{i_{C3v}}$" rather than "$Cd_{i}$"):
- None (default): site info is omitted, then added only to defect names that would otherwise collide (to disambiguate them), then “-a”, “-b”, “-c”… suffixes are appended as a last resort if names still collide.
- False: site info is omitted from all names, and “-a”, “-b”, “-c”… suffixes are appended if names collide.
- True: site info is shown on all names (when available), and “-a”, “-b”, “-c”… suffixes are appended if names still collide.

Returns:

The formatted, unique defect labels, in the same order as the input defect_names.

Return type:

list[str]
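The last-resort disambiguation step (appending "-a", "-b", "-c"… to labels that still collide) might look roughly like the sketch below. It assumes that every member of a colliding group receives a suffix and uses plain strings instead of LaTeX labels; how doped orders the suffixes and where it places them within the formatted label are not specified here, and make_labels_unique is a hypothetical helper.

```python
from collections import Counter
from string import ascii_lowercase


def make_labels_unique(labels):
    """Append "-a", "-b", ... to labels that occur more than once (up to 26 duplicates)."""
    counts = Counter(labels)
    seen = Counter()
    unique = []
    for label in labels:
        if counts[label] > 1:
            # Assumption: all colliding labels get a suffix, in order of appearance.
            unique.append(f"{label}-{ascii_lowercase[seen[label]]}")
            seen[label] += 1
        else:
            unique.append(label)
    return unique


print(make_labels_unique(["Cd_i", "Te_Cd", "Cd_i"]))
```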

Produce defect formation energy vs Fermi level plot (i.e. defect formation energy / transition level diagram).

This function is not intended to be directly called. The recommended usage is plot() – see docstring for details.

Parameters:

defect_thermodynamics (DefectThermodynamics) – DefectThermodynamics object containing defect entries to plot.
abs_chempots (dict) – Dictionary of {Element: value} giving the absolute chemical potential of each element.
el_refs (dict) – Dictionary of {Element: value} giving the reference energy of each element.
all_entries (bool, str) – Whether to plot the formation energy lines of all defect entries, rather than the default of showing only the equilibrium states at each Fermi level position (traditional). If instead set to “faded”, will plot the equilibrium states in bold, and all unstable states in faded grey. (Default: False)
include_site_info (bool, None) – Whether to include site info in defect names in the plot legend (e.g. $Cd_{i_{C3v}}$ rather than $Cd_{i}$ ). If None (default), site info is omitted unless needed to disambiguate non-grouped defects with the same name (i.e. inequivalent sites for the same defect type). If False, site info is never included. If True, site info is shown for all defect names. In all cases, if duplicate defect names remain, “-a”, “-b”, “-c” etc. are appended to the names to differentiate them.
chempot_table (bool) – Whether to print the chemical potential table above the plot. (Default: True)
defect_subset (list[str], str) – If provided, only defects whose name contains at least one of the given substrings are plotted (e.g. ["v_", "Te_Cd"] would keep all vacancies plus Te_Cd). A bare string is treated as a single-element list. (Default: None – all defects)
colormap (str, matplotlib.colors.Colormap) – Colormap to use for the formation energy lines, either as a string (which can be a colormap name from https://matplotlib.org/stable/users/explain/colors/colormaps or from https://www.fabiocrameri.ch/colourmaps – append ‘S’ if using a sequential colormap from the latter) or a Colormap / ListedColormap object. If None (default), uses tab10 with alpha=0.75 (if 10 or fewer lines to plot), tab20 (if 20 or fewer lines) or batlow (if more than 20 lines).
linestyles (str, list[str]) – Linestyles to use for the formation energy lines, either as a single linestyle (str) or list of linestyles (list[str]) in the order of appearance of lines in the plot legend. Default is "-"; i.e. solid linestyle for all entries.
xlim – Tuple (min,max) giving the range of the x-axis (Fermi level). May want to set manually when including transition level labels, to avoid crossing the axes. Default is to plot from 0.3 eV below the VBM to 0.3 eV above the CBM.
ylim – Tuple (min,max) giving the range for the y-axis (formation energy). May want to set manually when including transition level labels, to avoid crossing the axes. Default is from 0 to just above the maximum formation energy value in the band gap.
fermi_level (float) – If set, plots a dashed vertical line at this Fermi level value, typically used to indicate the equilibrium Fermi level position. (Default: None)
title (str) – Title for the plot. (Default: None)
auto_labels (bool) – Whether to automatically label the transition levels with their charge states. If there are many transition levels, this can be quite ugly. (Default: False)
filename (PathLike) – Filename to save the plot to. (Default: None (not saved)).

Returns:

matplotlib Figure object.

doped.utils.plotting.get_colormap(colormap: str | Colormap | None = None, default: str = 'batlow') → Colormap[source]

Get a colormap from a string or a Colormap object.

If _alpha_X is in the colormap name, sets the alpha value to X (0-1).

cmcrameri colour maps citation: https://zenodo.org/records/8409685

Parameters:

colormap (str, matplotlib.colors.Colormap) – Colormap to use, either as a string (which can be a colormap name from https://www.fabiocrameri.ch/colourmaps or https://matplotlib.org/stable/users/explain/colors/colormaps), or a Colormap / ListedColormap object. If None (default), uses default colormap (which is "batlow" by default). Append “S” to the colormap name if using a sequential colormap from https://www.fabiocrameri.ch/colourmaps.
default (str) – Default colormap to use if colormap is None. Defaults to "batlow" from https://www.fabiocrameri.ch/colourmaps.
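One plausible way to read an _alpha_X suffix from a colormap name is sketched below. The helper split_alpha, the regular expression and the example name "batlow_alpha_0.5" are assumptions for illustration; the exact name format accepted by get_colormap should be checked against its source.

```python
import re


def split_alpha(colormap_name: str):
    """Split "name_alpha_X" into (name, X); alpha is None if no suffix is present."""
    match = re.search(r"_alpha_(\d*\.?\d+)", colormap_name)
    if match is None:
        return colormap_name, None
    alpha = float(match.group(1))
    if not 0 <= alpha <= 1:
        raise ValueError(f"alpha must be between 0 and 1, got {alpha}")
    # Remove the "_alpha_X" part to recover the bare colormap name.
    base_name = colormap_name[: match.start()] + colormap_name[match.end():]
    return base_name, alpha


print(split_alpha("batlow_alpha_0.5"))
print(split_alpha("viridis"))
```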

doped.utils.plotting.get_legend_font_size() → float[source]

Convenience function to get the current matplotlib legend font size, in points (pt).

Returns:: Current legend font size in points (pt).
Return type:: float

doped.utils.plotting.get_linestyles(linestyles: str | list[str] = '-', num_lines: int = 1) → list[str][source]

Get a list of linestyles to use for plotting, from a string or list of strings (linestyles).

If a list is provided which doesn’t match the number of lines, the list is repeated until it does.

Parameters:

linestyles (str, list[str]) – Linestyles to use for plotting. If a string, uses that linestyle for all lines. If a list, uses each linestyle in the list for each line. Defaults to "-".
num_lines (int) – Number of lines to plot (and thus number of linestyles to output in list). Defaults to 1.
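The repeat-until-it-matches behaviour can be sketched with itertools. repeat_linestyles is a hypothetical stand-in for illustration; truncating a list that is longer than num_lines is an assumption of this sketch.

```python
from itertools import cycle, islice


def repeat_linestyles(linestyles="-", num_lines=1):
    """Return ``num_lines`` linestyles, cycling through a list if it is too short."""
    if isinstance(linestyles, str):
        # A single linestyle is used for every line.
        return [linestyles] * num_lines
    return list(islice(cycle(linestyles), num_lines))


print(repeat_linestyles(["-", "--"], num_lines=5))
print(repeat_linestyles(":", num_lines=3))
```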

doped.utils.plotting.plot_chemical_potential_table(ax: Axes, chempots: dict[str, float], cellLoc: Literal['left', 'center', 'right'] = 'left', el_refs: dict[str, float] | None = None) → Table[source]

Plot a table of chemical potentials above the plot in ax.

Parameters:

ax (plt.Axes) – Axes object to plot the table in.
chempots (dict) – Dictionary of chemical potentials of the form {Element: value}.
cellLoc (str) – Alignment of text in cells. Default is “left”.
el_refs (dict) – Dictionary of elemental reference energies of the form {Element: value}. If provided, the chemical potentials are given with respect to these reference energies.

Returns:

The matplotlib.table.Table object (which has been added to the ax object).

doped.utils.plotting.transition_level_diagram(defect_thermodynamics: DefectThermodynamics, all: bool | str = 'faded', defect_subset: list[str] | str | None = None, include_site_info: bool | None = None, ylim: tuple[float, float] | None = None, show_charge_labels: bool = True, show_band_labels: bool | None = None, label_fontsize: float | None = None, column_width: float = 0.4, figsize: tuple[float, float] | None = None, filename: str | Path | None = None)[source]

Produce a vertical transition level diagram for a DefectThermodynamics object, with one column per defect and short horizontal lines marking each charge transition level position within the host band gap.

The valence band maximum (self.vbm) is at 0 eV (blue shaded region) and the conduction band minimum (self.vbm + self.band_gap) is shown in the orange shaded region at the top of the plot. Within each defect column, each transition level is drawn as a short horizontal line, labelled with the charge state transition, e.g. (+1/0), if show_charge_labels is True (default). Metastable charge states are denoted with a * in the label, as in the DefectThermodynamics methods.

Parameters:

defect_thermodynamics (DefectThermodynamics) – DefectThermodynamics object containing the defects to plot.
all (bool, str) –
Controls inclusion of single-electron transition levels involving metastable defect charge states (denoted with * in the labels). Mostly equivalent to all in get_transition_levels(). Allowed values:
- "faded" (default): show all single-electron TLs, with metastable-containing TLs drawn as faded lines without labels (keeps the plot uncluttered).
- "faded_labels": same as "faded" but with labels drawn for the faded metastable TLs too.
- True: show all single-electron TLs at full opacity.
- False: show only the thermodynamic ground-state transition levels (i.e. those visible on the standard defect formation energy diagram).
defect_subset (list[str], str) – If provided, only defects whose name contains at least one of the given substrings are plotted (e.g. ["v_", "Te_Cd"] would keep all vacancies plus Te_Cd). A bare string is treated as a single-element list. (Default: None – all defects)
include_site_info (bool, None) – Whether to include site info in defect names in the column headers (e.g. $V_{Cd_{Td}}$ rather than $V_{Cd}$ ). If None (default), site info is omitted unless needed to disambiguate non-grouped defects with the same name (i.e. inequivalent sites for the same defect type). If False, site info is never included. If True, site info is shown for all defect names. In all cases, if duplicate defect names remain, “-a”, “-b”, “-c” etc. are appended to the names to differentiate them.
ylim (tuple) – Energy axis limits in eV (relative to VBM at 0). Defaults to (-0.05 * band_gap, 1.05 * band_gap).
show_charge_labels (bool) – Whether to label each transition level with its charge states (e.g. "(+1/0)"). Defaults to True.
show_band_labels (bool) – Whether to draw the “VBM” and “CBM” labels in the blue/orange band-edge shaded zones. If None (default), they are shown only if they would not overlap any transition level label (with the right side of the plot tried first, then the left); if both sides would clash they are hidden. True forces them on the right; False hides them.
label_fontsize (float) – Font size for the charge transition level labels. Defaults to ~90% of the current font.size rcParam. Can be a useful parameter to tune for busy plots.
column_width (float) – Width (in axes units) of the horizontal line segments inside each defect column, on a scale where the column spacing is 1. Defaults to 0.4. Can be a useful parameter to tune for busy plots.
figsize (tuple) – (width, height) of the figure in inches. Defaults to a width that scales with the number of defects.
filename (PathLike) – If set, save the figure to this path. (Default: None)

Returns:

matplotlib Figure object.
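The defect_subset filtering described in the parameters above (substring matching, with a bare string treated as a single-element list) can be sketched as follows. filter_defects is a hypothetical helper operating on plain name strings, not the function doped uses internally.

```python
def filter_defects(defect_names, defect_subset=None):
    """Keep names containing at least one of the given substrings (None keeps all)."""
    if defect_subset is None:
        return list(defect_names)
    if isinstance(defect_subset, str):
        # A bare string is treated as a single-element list.
        defect_subset = [defect_subset]
    return [
        name
        for name in defect_names
        if any(substring in name for substring in defect_subset)
    ]


names = ["v_Cd", "v_Te", "Te_Cd", "Cd_i"]
print(filter_defects(names, ["v_", "Te_Cd"]))  # all vacancies plus Te_Cd
print(filter_defects(names, "Cd_i"))
```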

doped.utils.stenciling module

Utility functions to re-generate a relaxed defect structure in a different supercell.

Re-generate a (relaxed) defect structure in an (arbitrarily) different supercell.

This function takes the relaxed defect structure of the input DefectEntry (from DefectEntry.defect_supercell), or input Structure objects (if defect_entry is a tuple, see argument descriptions) and re-generates it in the target_supercell structure (which may be smaller or larger than the original supercell), using the bulk supercell to intelligently pad out the additional missing positions in the new supercell as needed. The defect is placed at the closest possible position to target_frac_coords (default is the supercell centre = [0.5, 0.5, 0.5]). In most cases, the generated supercell should correspond to the same supercell basis / tiling as the input target_supercell. In some cases, however, this is not possible (in which case a warning is thrown), and so this function also returns the corresponding bulk supercell for the generated supercell (which should be the same for each generated defect supercell given the same target_supercell and base supercell for defect_entry, see notes below).

target_supercell should be the same host crystal structure, just with different supercell dimensions. The stenciling algorithm is typically robust to small differences in lattice parameters (i.e. volume per atom) and bond lengths, but large differences in these between the original and target bulk supercells can cause issues.

Briefly, this function works by:

1. Translating the defect site to the centre of the original supercell (and applying the same translation to the bulk supercell).
2. Identifying a super-supercell of the original supercell which fully encompasses the target supercell (regardless of orientation).
3. Generating this super-supercell, using one copy of the original defect supercell (DefectEntry.defect_supercell), with the rest of the sites (outside of the original defect supercell box, with the defect translated to the centre) populated using the bulk supercell (DefectEntry.bulk_supercell). The same is done for the translated bulk supercell.
4. Orienting these super-supercells to (try to) match the orientation of target_supercell, to (try to) avoid outputting stenciled supercells with different primitive cell tiling (i.e. different structural bases) to the target supercell.
5. Translating the defect site in these super-supercells to the Cartesian coordinates of the centre of target_supercell, then stenciling out all sites in the target_supercell portion of the super-supercell, accounting for possible site displacements in the relaxed defect supercell (e.g. if target_supercell has a different shape and does not fully encompass the original defect supercell). This is done by scanning over possible combinations of sites near the boundary regions of the target_supercell portion (using edge_tol and min_dist_tol_factor etc.), and identifying the combination which maximises the minimum inter-atomic distance in the new supercell (i.e. the most bulk-like arrangement). The same is done for the bulk super-supercell.
6. Re-orienting this new stenciled supercell (and the corresponding bulk) to (attempt to) match the orientation and site positions of target_supercell.
7. If target_frac_coords is not False, scanning over all symmetry operations of target_supercell and applying that which places the defect site closest to target_frac_coords, to both the defect and bulk supercells.
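The selection criterion used when stenciling out boundary sites (choose the combination of candidate sites that maximises the minimum inter-atomic distance) can be illustrated in isolation. The sketch below is deliberately simplified: it uses Cartesian coordinates without periodic boundary conditions, and the helper names and coordinates are hypothetical, so it shows only the max-min idea and not the stenciling algorithm itself.

```python
import itertools
import math


def min_pairwise_distance(coords):
    """Smallest distance between any two points (no periodic images considered)."""
    return min(math.dist(a, b) for a, b in itertools.combinations(coords, 2))


def pick_most_bulk_like(fixed_sites, candidate_site_sets):
    """Choose the candidate set giving the largest minimum inter-atomic distance."""
    return max(
        candidate_site_sets,
        key=lambda extra_sites: min_pairwise_distance(fixed_sites + extra_sites),
    )


# Sites well inside the stenciled region (always kept), in Å.
fixed = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
# Two alternative choices for an ambiguous boundary site.
too_close = [(1.0, 0.2, 0.0)]   # about 1 Å from both fixed sites
well_spaced = [(1.0, 2.0, 0.0)]  # more than 2 Å from both fixed sites

print(pick_most_bulk_like(fixed, [too_close, well_spaced]))
```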

Note: While this function tries to ensure that the generated defect/bulk supercell position basis exactly matches that of target_supercell, this is not guaranteed. A mismatch can arise due to different tiling of primitive cells within target_supercell and the original defect supercell, which are symmetry-equivalent for the bulk (pristine) supercell but not for the defect supercell (due to the broken periodicity/symmetry). The generated supercell is guaranteed to have the exact same lattice parameters etc. This is perfectly fine if it occurs (in which case a warning will be thrown, unless check_bulk is False), just will require the use of a matching bulk/reference supercell when parsing (rather than the input target_supercell) to avoid any issues with finite-size corrections and defect site-matching – doped will also throw a warning about this when parsing if a non-matching bulk supercell is used anyway. This function returns the corresponding bulk supercell which should be used for parsing defect calculations with the generated supercell. Of course, if generating multiple defects in the same target_supercell with this issue occurring, only one such bulk supercell calculation should be required (should correspond to the same bulk supercell in each case).

We note that the algorithm employed here is not guaranteed to be deterministic. It can depend on site orderings, small differences in cell definitions etc. For these small differences, however, the returned stenciled supercell should still be effectively the same, just with small differences in atomic positions near the cell boundaries due to defect-induced strain (which for reasonably large original & stenciling supercells should be negligible).

Parameters:

defect_entry (DefectEntry | tuple[Structure, Structure] | tuple[Structure, Structure, np.ndarray]) – A DefectEntry object for which to re-generate the relaxed structure (taken from DefectEntry.defect_supercell) in the target_supercell lattice. Alternatively, a tuple of (defect_supercell, bulk_supercell) structures can be provided, in which case the defect fractional coordinates are determined automatically (by comparing the two supercells). The defect fractional coordinates can optionally be provided as a third element with the tuple input option (as a numpy array or list), to skip auto-determination in this case.
target_supercell (Structure) – The supercell structure to re-generate the relaxed defect structure in.
check_bulk (bool) – Whether to check if the generated defect/bulk supercells have different atomic position bases to target_supercell (as described above) – if so, a warning will be printed (unless check_bulk is False). Default is True.
target_frac_coords (np.ndarray | list[float] | bool) – The fractional coordinates to target for defect placement in the new supercell. If just set to True (default), will try to place the defect nearest to the centre of the supercell (i.e. target_frac_coords = [0.5, 0.5, 0.5]), as is default in doped defect generation. Note that defect placement is harder in this case than in generation with DefectsGenerator, as we are not starting from primitive cells, and we are working with relaxed geometries. If False, does not attempt any defect re-positioning.
edge_tol_range (float | range | list | np.ndarray | None) – A range or list of tolerances (in Angstrom) for site displacements at the edge of the stenciled supercell to scan over, when determining the best match of sites to stencil out in the new supercell (of target_supercell dimension). Default is None, in which case the default range is used (0.2 to 4 Å in 0.2 Å increments). Once a stenciling solution which satisfies the min_dist_tol_factor tolerance is found for a given edge_tol, the search is terminated. See stencil_target_cell_from_big_cell for further details.
min_dist_tol_factor_range (float | range | list | np.ndarray | None) – A range or list of tolerance factors for the minimum interatomic distance (relative to the bulk_min_bond_length) at the edge region of the stenciled defect supercell to scan over, when determining the best match of sites to stencil out in the new supercell. Default is None, in which case the default range is used (0.95 to 0.5 in 0.05 increments). See stencil_target_cell_from_big_cell for further details.
min_dist_warning_tol_factor (float) – A tolerance factor for the minimum interatomic distance (relative to the bulk_min_bond_length) in the stenciled defect supercell to use as a warning threshold. If the minimum interatomic distance near the edge of the stenciled defect supercell is less than this threshold, a warning is issued. Default is 0.9 (i.e. 90% of the bulk_min_bond_length).
orientation_template_radii_range (float | range | list | np.ndarray | None) – A range or list of scale factors (relative to the Wigner-Seitz radius of target_supercell) to scan over when constructing the template sub-structures of target_supercell and the bulk super-supercell used for pre-stenciling orientation matching. This matching tries to ensure that the atomic bases (i.e. the tiling of primitive cells in the supercells) of the output stenciled cells match that of target_supercell. Default is None, in which case the default test range of [0.8, 1.0, 0.6, 0.4, 1.2] is used. This parameter should rarely need tuning.
show_pbar (bool) – Whether to show a tqdm progress bar. Default is True.

Returns:

The re-generated defect supercell in the target_supercell lattice, and the corresponding bulk/reference supercell for the generated defect supercell (see explanations above).

Return type:

tuple[Structure, Structure]

doped.utils.stenciling.is_within_frac_bounds(lattice: Lattice, cart_coords: ndarray | list[float], tol: float = 1e-05) → bool[source]

Check if a given Cartesian coordinate is inside the unit cell defined by the lattice object.

Parameters:

lattice (Lattice) – Lattice object defining the unit cell.
cart_coords (np.ndarray | list[float]) – The Cartesian coordinates to check.
tol (float) – A tolerance (in Angstrom / Cartesian units) for coordinates to be considered within the unit cell. If positive, expands the bounds of the unit cell by this amount; if negative, shrinks the bounds.

Returns:

Whether the Cartesian coordinates are within the fractional bounds of the unit cell, accounting for tol.

Return type:

bool
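
As an illustration of the bounds check described above, the following is a minimal standard-library sketch, not the doped implementation. It assumes that the Cartesian tol is converted to a per-axis fractional tolerance by dividing by each lattice vector length, and that both bounds are inclusive; the function name within_frac_bounds_sketch and the plain nested-list lattice (rows = lattice vectors) are hypothetical stand-ins for the pymatgen Lattice input.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))


def within_frac_bounds_sketch(lattice_rows, cart_coords, tol=1e-5):
    """Check whether a Cartesian point lies inside the cell (hypothetical helper).

    Assumption: ``tol`` (in Angstrom) is converted to a fractional tolerance
    along each axis by dividing by that lattice vector's length.
    """
    a, b, c = lattice_rows
    volume = dot(a, cross(b, c))
    # Fractional coordinates from r = f_a*a + f_b*b + f_c*c
    frac = (
        dot(cart_coords, cross(b, c)) / volume,
        dot(cart_coords, cross(c, a)) / volume,
        dot(cart_coords, cross(a, b)) / volume,
    )
    lengths = [math.sqrt(dot(v, v)) for v in (a, b, c)]
    # Positive tol widens the accepted range; negative tol narrows it
    return all(-tol / n <= f <= 1 + tol / n for f, n in zip(frac, lengths))


cubic = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
print(within_frac_bounds_sketch(cubic, (2.0, 2.0, 2.0)))            # inside the cell
print(within_frac_bounds_sketch(cubic, (4.05, 2.0, 2.0), tol=0.1))  # just outside, but within tol
```
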

Given the input big_supercell and target_supercell (which should be fully encompassed by the former), stencil out the sites in big_supercell which correspond to the sites in target_supercell (i.e. are within the Cartesian bounds of target_supercell).

Note that this function assumes that the defect is roughly centred within big_supercell (i.e. near [0.5, 0.5, 0.5])! The midpoints of target_supercell and big_supercell are then aligned within this function, before stenciling.

We need to ensure the appropriate number of sites (and their composition) are taken, and that the sites we choose are appropriate for the new supercell (i.e. that if we have e.g. an in-plane contraction, we don’t take duplicate atoms that then correspond to tiny inter-atomic distances in the new supercell due to imperfect stenciling under PBC – so we can’t simply take the atoms that are closest to the defect). So, here we scan over possible choices of atoms to include, and take the combination which maximises the minimum inter-atomic distance in the new supercell, when accounting for PBCs.

Note that differences between the lattice parameters (volumes per atom) of big_supercell and target_supercell can cause issues with the stenciling approach herein, particularly when target_composition is not supplied and/or for defective big_supercell cells.

Parameters:

big_supercell (Structure) – The supercell structure which fully encompasses target_supercell, from which to stencil out the sites.
target_supercell (Structure) – The supercell structure giving the cell dimensions to stencil out from big_supercell.
target_composition (Composition | None) – Expected composition of the output stenciled cell (used to determine candidate sites to stencil out). Auto-determined by comparing big_supercell and target_supercell if None (default).
edge_tol_range (float | range | list | np.ndarray | None) – A range or list of tolerances (in Angstrom) for site displacements at the edge of the stenciled supercell to scan over, when determining the best match of sites to stencil out in the new supercell (of target_supercell dimension). Default is None, in which case the default range is used (0.2 to 4 Å in 0.2 Å increments). In the stenciling search, we scan over edge_tol_range first, before iterating over min_dist_tol_factor_range. Once a stenciling solution which satisfies the min_dist_tol_factor tolerance is found for a given edge_tol, the search is terminated.
bulk_min_bond_length (float) – The minimum interatomic distance in the bulk supercell. Default is None, in which case it is calculated from target_supercell.
min_dist_tol_factor_range (float | range | list | np.ndarray | None) – A range or list of tolerance factors for the minimum interatomic distance (relative to the bulk_min_bond_length) at the edge region of the stenciled defect supercell to scan over, when determining the best match of sites to stencil out in the new supercell. Default is None, in which case the default range is used (0.95 to 0.5 in 0.05 increments). In the stenciling search, we scan over edge_tol_range first, before iterating over min_dist_tol_factor_range. Once a stenciling solution which satisfies the min_dist_tol_factor tolerance is found for a given edge_tol, the search is terminated.
min_dist_warning_tol_factor (float) – A tolerance factor for the minimum interatomic distance (relative to the bulk_min_bond_length) in the stenciled defect supercell to use as a warning threshold. If the minimum interatomic distance near the edge of the stenciled defect supercell is less than this threshold, a warning is issued. Default is 0.9 (i.e. 90% of the bulk_min_bond_length).
pbar (tqdm) – tqdm progress bar object to update (for internal doped usage). Default is None.

Returns:

The stenciled supercell structure.

Return type:

Structure

doped.utils.supercells module

Utility code and functions for generating & analysing defect supercells.

doped.utils.supercells.cell_metric(cell_matrix: ndarray, target: str = 'SC', rms: bool = True, eff_cubic_length: float | None = None) → float[source]

Calculates the deviation of the given cell matrix from an ideal simple cubic (if target = “SC”) or face-centred cubic (if target = “FCC”) matrix, by evaluating the root mean square (RMS) difference of the vector lengths from that of the idealised values (i.e. the corresponding SC/FCC lattice vector lengths for the given cell volume).

For target = “SC”, the idealised lattice vector length is the effective cubic length (i.e. the cube root of the volume), while for “FCC” it is 2^(1/6) (~1.12) times the effective cubic length.

This is an expanded version of the cell metric function in ASE (get_deviation_from_optimal_cell_shape), described in https://ase-lib.org/examples_generated/tutorials/defects.html which previously did not account for rotational invariance (now fixed; https://gitlab.com/ase/ase/-/merge_requests/3404, https://gitlab.com/ase/ase/-/merge_requests/3616).

Parameters:

cell_matrix (np.ndarray) – Cell matrix for which to calculate the cell metric.
target (str) – Target cell shape, for which to calculate the normalised deviation score from. Either “SC” for simple cubic or “FCC” for face-centred cubic. Default = “SC”
rms (bool) – Whether to return the root mean square (RMS) difference of the vector lengths from that of the idealised values (default), or just the mean square difference (to reduce computation time when scanning over many possible matrices). Default = True
eff_cubic_length (float) – Effective cubic length of the cell matrix (to reduce computation time during looping). Default = None

Returns:

Cell metric (0 is perfect score).

Return type:

float
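
The metric described above can be sketched as follows, using only the standard library. This is an illustrative reading of the docstring rather than the doped source: normalising each deviation by the idealised length and averaging (rather than summing) the squared deviations are assumptions, and cell_metric_sketch is a hypothetical name.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cell_metric_sketch(cell_matrix, target="SC", rms=True):
    """Deviation of lattice vector lengths from the ideal SC/FCC length."""
    eff_cubic_length = abs(det3(cell_matrix)) ** (1 / 3)  # cube root of the volume
    if target.upper() == "SC":
        ideal_length = eff_cubic_length
    elif target.upper() == "FCC":
        ideal_length = 2 ** (1 / 6) * eff_cubic_length
    else:
        raise ValueError("target must be 'SC' or 'FCC'")

    # Assumption: deviations are normalised by the ideal length and averaged
    deviations = [
        math.sqrt(sum(x * x for x in row)) / ideal_length - 1 for row in cell_matrix
    ]
    mean_square = sum(d * d for d in deviations) / len(deviations)
    return math.sqrt(mean_square) if rms else mean_square


simple_cubic = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
fcc_primitive = [(0.0, 2.0, 2.0), (2.0, 0.0, 2.0), (2.0, 2.0, 0.0)]
print(cell_metric_sketch(simple_cubic, target="SC"))
print(cell_metric_sketch(fcc_primitive, target="FCC"))
print(cell_metric_sketch(fcc_primitive, target="SC"))
```

The FCC primitive cell scores (numerically) zero against the "FCC" target because its lattice vector length, a/√2, equals 2^(1/6) times the cube root of its volume, a³/4.
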

doped.utils.supercells.find_ideal_supercell(cell: ndarray, target_size: int, limit: int = 2, clean: bool = True, return_min_dist: bool = False, verbose: bool = False) → ndarray | tuple[ndarray, float][source]

Given an input cell matrix (e.g. Structure.lattice.matrix or Atoms.cell) and chosen target_size (size of supercell in number of cells), finds an ideal supercell matrix (P) that yields the largest minimum image distance (i.e. minimum distance between periodic images of sites in a lattice), while also being as close to cubic as possible.

Supercell matrices are searched for by first identifying the ideal (fractional) transformation matrix (P) that would yield a perfectly cubic supercell with volume equal to target_size, and then scanning over all matrices where the elements are within +/-limit of the ideal P matrix elements (rounded to the nearest integer). For relatively small target_sizes (<100) and/or cells with mostly similar lattice vector lengths, the default limit of +/-2 performs very well. For larger target_sizes, cells with very different lattice vector lengths, and/or cases where small differences in minimum image distance are very important, a larger limit may be required (though typically only improves the minimum image distance by 1-6%).

This is also known as the Shortest Vector Problem (SVP), and has no known analytical solution, requiring enumeration type approaches. https://wikipedia.org/wiki/Lattice_problem#Shortest_vector_problem_%28SVP%29

Note that this function is used by default to generate defect supercells with the doped DefectsGenerator class, unless specific supercell settings are used.

Parameters:

cell (np.ndarray) – Unit cell matrix for which to find a supercell.
target_size (int) – Target supercell size (in number of cells).
limit (int) – Supercell matrices are searched for by first identifying the ideal (fractional) transformation matrix (P) that would yield a perfectly SC/FCC supercell with volume equal to target_size, and then scanning over all matrices where the elements are within +/-limit of the ideal P matrix elements (rounded to the nearest integer). (Default = 2)
clean (bool) – Whether to return the supercell matrix which gives the ‘cleanest’ supercell (according to _lattice_matrix_sort_func; most symmetric, with mostly positive diagonals and c >= b >= a). (Default = True)
return_min_dist (bool) – Whether to return the minimum image distance (in Å) as a second return value. (Default = False)
verbose (bool) – Whether to print out extra information about the supercell search. (Default = False)

Returns:

The supercell transformation matrix (P), and if return_min_dist is True, the minimum image distance (in Å).

Return type:

np.ndarray | tuple[np.ndarray, float]
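
The first step of the search described above (finding the ideal fractional P and rounding it to give the centre of the scan) can be sketched as below. This is a standard-library illustration under the assumption that lattice vectors are the rows of cell, so that the supercell matrix is P @ cell; the helper names are hypothetical, and the subsequent scoring of candidates by minimum image distance is not reproduced here.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))


def det3(m):
    """Determinant of a 3x3 matrix (scalar triple product of its rows)."""
    return dot(m[0], cross(m[1], m[2]))


def ideal_fractional_P(cell, target_size):
    """Fractional P such that P @ cell is a perfect cube of target_size cells."""
    a, b, c = cell
    volume = det3(cell)
    target_length = (target_size * abs(volume)) ** (1 / 3)
    # Columns of inv(cell) are (b x c)/V, (c x a)/V and (a x b)/V
    inv_cols = [cross(b, c), cross(c, a), cross(a, b)]
    return [
        [target_length * inv_cols[k][r] / volume for k in range(3)]
        for r in range(3)
    ]


fcc_primitive = [(0.0, 2.0, 2.0), (2.0, 0.0, 2.0), (2.0, 2.0, 0.0)]
P_ideal = ideal_fractional_P(fcc_primitive, target_size=4)
P_rounded = [[round(x) for x in row] for row in P_ideal]
print(P_rounded, det3(P_rounded))

# Each of the 9 elements is then varied within +/-limit of P_rounded
limit = 2
n_candidates = (2 * limit + 1) ** 9
print(n_candidates)
```

For this FCC primitive cell and target_size = 4, the rounded matrix is already an integer matrix with determinant 4 (the conventional cubic cell). In general only scanned matrices whose determinant equals target_size are valid candidates.
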

doped.utils.supercells.get_min_image_distance(structure: Structure) → float[source]

Get the minimum image distance (i.e. minimum distance between periodic images of sites in a lattice) for the input structure.

This is also known as the Shortest Vector Problem (SVP), and has no known analytical solution, requiring enumeration type approaches. https://wikipedia.org/wiki/Lattice_problem#Shortest_vector_problem_%28SVP%29

Parameters:

structure (Structure) – Structure object.

Returns:

Minimum image distance.

Return type:

float
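
To illustrate the enumeration-type approach mentioned above, here is a brute-force standard-library sketch that scans integer combinations of the lattice vectors and keeps the shortest non-zero one. It is not the doped implementation; the function name and the search_range parameter are hypothetical, and a small fixed range is only adequate for cells that are not strongly skewed.

```python
import itertools
import math


def min_image_distance_sketch(cell, search_range=2):
    """Shortest non-zero lattice vector length, by brute-force enumeration.

    ``cell`` holds the lattice vectors as rows. Integer coefficients from
    -search_range to +search_range are scanned along each vector.
    """
    best = math.inf
    coefficients = range(-search_range, search_range + 1)
    for n in itertools.product(coefficients, repeat=3):
        if n == (0, 0, 0):
            continue  # the zero vector is not a periodic image
        vec = [sum(n[i] * cell[i][k] for i in range(3)) for k in range(3)]
        best = min(best, math.sqrt(sum(x * x for x in vec)))
    return best


orthorhombic = [(4.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 6.0)]
skewed = [(3.0, 0.0, 0.0), (2.0, 1.0, 0.0), (0.0, 0.0, 5.0)]
print(min_image_distance_sketch(orthorhombic))
print(min_image_distance_sketch(skewed))
```

In the skewed cell the shortest periodic image vector is a combination of lattice vectors (a - b, of length √2), not one of the lattice vectors themselves, which is why enumeration is needed.
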

doped.utils.supercells.get_pmg_cubic_supercell_dict(struct: Structure, uc_range: tuple = (1, 200)) → dict[source]

Get a dictionary of (near-)cubic supercell matrices for the given structure and range of numbers of unit cells (in the supercell).

Returns a dictionary of format:

{Number of Unit Cells:
    {"P": transformation matrix,
     "min_dist": minimum image distance}
}

for (near-)cubic supercells generated by the pymatgen CubicSupercellTransformation class. If a (near-)cubic supercell cannot be found for a given number of unit cells, then the corresponding dict value will be set to an empty dict.

Parameters:

struct (Structure) – Structure to generate supercells for.
uc_range (tuple) – Range of numbers of unit cells to search over.

Returns:

{Number of Unit Cells: {"P": transformation matrix, "min_dist": minimum image distance}}

Return type:

dict

doped.utils.supercells.min_dist(structure: Structure, ignored_species: list[str] | None = None) → float[source]

Return the minimum interatomic distance in a structure (ignoring any zero distances).

Uses numpy vectorisation for fast computation.

Parameters:

structure (Structure) – The structure to check.
ignored_species (list[str]) – A list of species symbols to ignore when calculating the minimum interatomic distance. Default is None (don’t ignore any species).

Returns:

The minimum interatomic distance in the structure.

Return type:

float
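
A minimal sketch of a periodic minimum interatomic distance calculation is given below, using only the standard library (whereas the function above uses numpy vectorisation on a pymatgen Structure). The inputs (cell rows, fractional coordinates and a parallel species list) and the function name are hypothetical stand-ins, and only the 27 nearest periodic images are checked, which assumes a cell that is not strongly skewed.

```python
import itertools
import math


def min_dist_sketch(cell, frac_coords, species, ignored_species=None):
    """Minimum distance between distinct sites under periodic boundaries."""
    ignored = set(ignored_species or [])
    kept = [f for f, s in zip(frac_coords, species) if s not in ignored]
    shifts = list(itertools.product((-1, 0, 1), repeat=3))

    best = math.inf
    # Only pairs of distinct sites are compared, so zero self-distances never enter
    for f1, f2 in itertools.combinations(kept, 2):
        for shift in shifts:
            dfrac = [f2[k] - f1[k] + shift[k] for k in range(3)]
            cart = [sum(dfrac[i] * cell[i][k] for i in range(3)) for k in range(3)]
            best = min(best, math.sqrt(sum(x * x for x in cart)))
    return best


cell = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
frac_coords = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (0.5, 0.5, 0.5)]
species = ["H", "O", "O"]
print(min_dist_sketch(cell, frac_coords, species))
print(min_dist_sketch(cell, frac_coords, species, ignored_species=["H"]))
```

The first two sites are 9 Å apart within the cell but only 1 Å apart across the periodic boundary, so the periodic treatment matters for the result.
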

doped.utils.symmetry module

Utility code and functions for symmetry analysis of structures and defects.

doped.utils.symmetry.apply_symm_op_to_site(symm_op: SymmOp, site: PeriodicSite, fractional: bool = False, rotate_lattice: Lattice | bool = True, just_unit_cell_frac_coords: bool = False) → PeriodicSite[source]

Apply the given symmetry operation to the input site (not in place) and return the new site.

By default, also rotates the lattice accordingly. If you want to apply the symmetry operation but keep the same lattice definition, set rotate_lattice=False.

Parameters:

symm_op (SymmOp) – pymatgen SymmOp object.
site (PeriodicSite) – pymatgen PeriodicSite object.
fractional (bool) – If the SymmOp is in fractional or Cartesian (default) coordinates (i.e. to apply to site.frac_coords or site.coords). Default: False
rotate_lattice (Lattice | bool) – Either a pymatgen Lattice object (to use as the new lattice basis of the transformed site, which can be provided to reduce computation time when looping) or True/False. If True (default), the SymmOp rotation matrix will be applied to the input site lattice, or if False, the original lattice will be retained.
just_unit_cell_frac_coords (bool) – If True, just returns the fractional coordinates of the transformed site (rather than the site itself), within the unit cell. Default: False

Returns:

Site with the symmetry operation applied.

Return type:

PeriodicSite
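
The distinction made by the fractional argument can be illustrated with a small standard-library sketch, in which the same operation matrix is applied either to Cartesian or to fractional coordinates. The helpers below are hypothetical and do not use pymatgen; the four-fold rotation is chosen only to show the coordinate convention (it is not a symmetry of the orthorhombic cell used here), and lattice rotation is not covered.

```python
def apply_affine(rotation, translation, coords):
    """Return rotation @ coords + translation for 3D coordinates."""
    return [
        sum(rotation[i][k] * coords[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]


def frac_to_cart(cell, frac):
    """Convert fractional to Cartesian coordinates (rows of cell = lattice vectors)."""
    return [sum(frac[i] * cell[i][k] for i in range(3)) for k in range(3)]


four_fold_z = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
cell = [[2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 6.0]]

# The same point, expressed in Cartesian and in fractional coordinates
cart_point = [1.0, 0.0, 0.0]
frac_point = [0.5, 0.0, 0.0]

# Operation interpreted in Cartesian coordinates
cart_result = apply_affine(four_fold_z, [0, 0, 0], cart_point)

# Same matrix interpreted in fractional coordinates, then converted to Cartesian
frac_result_as_cart = frac_to_cart(cell, apply_affine(four_fold_z, [0, 0, 0], frac_point))

# Fractional operation with a translation, wrapped back into the unit cell
wrapped = [x % 1 for x in apply_affine(four_fold_z, [0, 0, 0.5], [0.25, 0.5, 0.75])]

print(cart_result, frac_result_as_cart, wrapped)
```
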

doped.utils.symmetry.apply_symm_op_to_struct(symm_op: SymmOp, struct: Structure, fractional: bool = False, rotate_lattice: bool = True) → Structure[source]

Apply a symmetry operation to a structure and return the new structure.

This differs from pymatgen’s apply_operation method in that it does not apply the operation in place as well (i.e. does not modify the input structure), which avoids the use of unnecessary and slow Structure.copy() calls, making the structure manipulation / symmetry analysis functions more efficient. Also fixes an issue when applying fractional symmetry operations.

By default, also rotates the lattice accordingly. If you want to apply the symmetry operation to the sites but keep the same lattice definition, set rotate_lattice=False.

Parameters:

symm_op – pymatgen SymmOp object.
struct – pymatgen Structure object.
fractional – If the SymmOp is in fractional or Cartesian (default) coordinates (i.e. to apply to site.frac_coords or site.coords). Default: False
rotate_lattice – If the lattice of the input structure should be rotated according to the symmetry operation. Default: True.

Returns:

Structure with the symmetry operation applied.

Return type:

Structure

doped.utils.symmetry.are_equivalent_lattices(lattice_1: Lattice | Structure, lattice_2: Lattice | Structure, ltol: float = 0.005, atol: float = 1) → bool[source]

Check if two lattices are (symmetry-)equivalent, allowing for different cell sizes.

Parameters:

lattice_1 (Lattice | Structure) – The first lattice to check for equivalence.
lattice_2 (Lattice | Structure) – The second lattice to check for equivalence.
ltol (float) – Fractional tolerance for matching lattice vector lengths. Defaults to 5e-3 (i.e. 0.5% tolerance).
atol (float) – Tolerance for matching angles. Defaults to 1 degree.

Returns:

True if the two lattices are (symmetry-)equivalent, False otherwise.

Return type:

bool

doped.utils.symmetry.cached_simplify(eq)[source]

Cached simplification function for sympy equations, for efficiency.

doped.utils.symmetry.cached_solve(equation, variable)[source]

Cached solve function for sympy equations, for efficiency.

doped.utils.symmetry.cluster_coords(fcoords: ArrayLike, structure: Structure | Lattice, dist_tol: float = 0.01, method: str = 'single', criterion: str = 'distance') → ndarray[source]

Cluster fractional coordinates based on their distances (using scipy functions) and return the cluster numbers (as an array matching the shape and order of fcoords).

method chooses the clustering algorithm to use with linkage() ("single" by default, matching the scipy default), along with a dist_tol distance tolerance in Å. "single" corresponds to the Nearest Point algorithm and is the recommended choice for method when dist_tol is small, but can be sensitive to how many fractional coordinates are included in fcoords (allowing for daisy-chaining of sites to give large spaced-out clusters), while "average" or "complete" (furthest point algorithm) are good choices to avoid this issue. "centroid"/"median"/"ward" should not be used for method as they assume a flat Euclidean space, which is violated with PBC distances.

See the scipy API docs for more info.

Parameters:

fcoords (ArrayLike) – Fractional coordinates to cluster.
structure (Structure | Lattice) – Structure or Lattice to which the fractional coordinates correspond.
dist_tol (float) – Distance tolerance for clustering, in Å (default: 0.01). For the most part, fractional coordinates with distances less than this tolerance will be clustered together (when method = "single", giving the Nearest Point algorithm, as is the default).
method (str) – Clustering algorithm to use with linkage(). Default is "single" (recommended for small dist_tol), while "average" or "complete" are recommended with medium/large dist_tol (e.g. for candidate interstitial site clustering or defect site clustering (for determining defect site competition)). "centroid"/"median"/"ward" should not be used, as they assume a flat Euclidean space, which is violated with PBC distances.
criterion (str) – Criterion to use for flattening hierarchical clusters from the linkage matrix, used with fcluster(). Default: "distance".

Returns:

Array of cluster numbers, matching the shape and order of fcoords (i.e. corresponding to the index/number of the cluster to which that fractional coordinate belongs).

Return type:

np.ndarray
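
The daisy-chaining behaviour of the "single" method can be illustrated without scipy. Cutting a single-linkage hierarchy at dist_tol is equivalent to finding the connected components of the graph that links every pair of points within dist_tol, so the sketch below swaps in a small union-find in place of linkage()/fcluster(). The helper names are hypothetical, only the 27 nearest periodic images are checked, and the cluster numbering (by first appearance) may differ from scipy's.

```python
import itertools
import math


def pbc_distance(cell, f1, f2):
    """Shortest distance between two fractional coordinates under PBC."""
    best = math.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        dfrac = [f2[k] - f1[k] + shift[k] for k in range(3)]
        cart = [sum(dfrac[i] * cell[i][k] for i in range(3)) for k in range(3)]
        best = min(best, math.sqrt(sum(x * x for x in cart)))
    return best


def single_linkage_clusters(fcoords, cell, dist_tol=0.01):
    """Cluster numbers (1-based) for single linkage cut at dist_tol."""
    parent = list(range(len(fcoords)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(fcoords)), 2):
        if pbc_distance(cell, fcoords[i], fcoords[j]) <= dist_tol:
            parent[find(i)] = find(j)  # merge the two clusters

    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(len(fcoords))]


cell = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
fcoords = [
    (0.0, 0.0, 0.0),
    (0.0008, 0.0, 0.0),
    (0.0016, 0.0, 0.0),
    (0.5, 0.5, 0.5),
    (0.9992, 0.0, 0.0),  # 0.008 Å from the first point, across the boundary
]
cluster_numbers = single_linkage_clusters(fcoords, cell, dist_tol=0.01)
print(cluster_numbers)
print(pbc_distance(cell, fcoords[2], fcoords[4]))
```

Four of the points end up in one cluster because each is within 0.01 Å of a neighbour, even though the two ends of the chain are about 0.024 Å apart. This is the chaining effect that "average" or "complete" linkage avoids.
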

doped.utils.symmetry.cluster_sites_by_dist_tol(sites: Iterable[PeriodicSite | ndarray], structure: Structure | Lattice, dist_tol: float = 0.01, method: str = 'single', criterion: str = 'distance') → list[PeriodicSite | ndarray][source]

Cluster sites based on their distances (using cluster_coords).

Parameters:

sites (Iterable[PeriodicSite | np.ndarray]) – Sites to cluster, as an iterable of PeriodicSite objects or fractional coordinates.
structure (Structure | Lattice) – Structure or Lattice to which the sites correspond.
dist_tol (float) – Distance tolerance for clustering, in Å (default: 0.01).
method (str) – Clustering algorithm to use with scipy's linkage() clustering function in cluster_coords. Default is "single", which is the scipy default and is typically recommended when dist_tol is small. See the docstrings and source code of cluster_coords() for more details.
criterion (str) – Criterion to use for flattening hierarchical clusters from the linkage matrix, used with fcluster(). Default: "distance".

Returns:

List of clustered sites, as PeriodicSite objects or fractional coordinates depending on the input sites type.

Return type:

list[PeriodicSite | np.ndarray]

doped.utils.symmetry.doped_cluster_frac_coords(fcoords: ArrayLike, structure: Structure, tol: float = 0.55, symm_pref_dist_factor: float = 0.85, method: str = 'average', criterion: str = 'distance') → ndarray[source]

Cluster fractional coordinates that are within a certain distance tolerance of each other, and return the cluster site.

Modified from the pymatgen-analysis-defects function as follows: For each site cluster, the possible sites to choose from are the sites in the cluster and the cluster midpoint (average position). Of these sites, the site with the highest symmetry, and then largest min_dist (distance to any host lattice site), is chosen – if its min_dist is no less than symm_pref_dist_factor (0.85 by default) times the largest possible min_dist. This is because we want to favour the higher symmetry interstitial sites (as these are typically the more intuitive sites for placement, cleaner, easier for analysis etc, and work well when combined with ShakeNBreak or other structure-searching techniques to account for symmetry-breaking), but also interstitials are often lowest-energy when furthest from host atoms (i.e. in the largest interstitial voids – particularly for fully-ionised charge states), and so this approach tries to strike a balance between these two goals.

In pymatgen-analysis-defects, the average cluster position is used, which breaks symmetries and is less easy to manipulate in the following interstitial generation functions. pymatgen-analysis-defects also uses the default "single" method for site clustering, which can lead to large unwanted daisy-chaining effects, unintentionally grouping interstitials with distances far larger than tol.

Parameters:

fcoords (ArrayLike) – Fractional coordinates of points to cluster.
structure (Structure) – The host structure.
tol (float) – Distance tolerance for clustering Voronoi nodes. Default is 0.55 Å.
symm_pref_dist_factor (float) – Minimum acceptable ratio of distance to host atoms for symmetry-favoured sites vs distance-to-host-favoured sites, for which to prefer symmetry-favoured sites. Default is 0.85.
method (str) – Clustering algorithm to use with linkage(). Default is "average", which is typically better than the scipy default of "single" for interstitial generation, as it avoids unintentional daisy-chaining effects. Another reasonable choice is "complete", which ensures that no two sites in a given cluster are more than tol apart. See the docstrings and source code of cluster_coords() for more details. "centroid"/"median"/"ward" should not be used, as they assume a flat Euclidean space, which is violated with PBC distances.
criterion (str) – Criterion to use for flattening hierarchical clusters from the linkage matrix, used with fcluster(). Default is "distance".

Returns:

Clustered fractional coordinates.

Return type:

np.ndarray
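
The site selection rule for each cluster can be sketched as follows, reading symm_pref_dist_factor as a minimum acceptable ratio (as in its parameter description). This is an illustrative standard-library sketch, not the doped source: candidates are represented by hypothetical dictionaries with a symmetry_order (a stand-in for how symmetric the site is) and a min_dist, and falling back to the candidate with the largest min_dist is an assumption.

```python
def choose_cluster_site(candidates, symm_pref_dist_factor=0.85):
    """Pick one site from a cluster's candidates (cluster sites + midpoint)."""
    largest_min_dist = max(c["min_dist"] for c in candidates)

    # Highest symmetry first, ties broken by the largest distance to host atoms
    symm_favoured = max(candidates, key=lambda c: (c["symmetry_order"], c["min_dist"]))

    # Keep the symmetry-favoured site only if it is not much closer to host atoms
    if symm_favoured["min_dist"] >= symm_pref_dist_factor * largest_min_dist:
        return symm_favoured

    # Assumed fallback: the site furthest from any host atom
    return max(candidates, key=lambda c: c["min_dist"])


candidates = [
    {"label": "cluster site 1", "symmetry_order": 2, "min_dist": 2.00},
    {"label": "cluster site 2", "symmetry_order": 24, "min_dist": 1.80},
    {"label": "midpoint", "symmetry_order": 1, "min_dist": 1.90},
]
print(choose_cluster_site(candidates)["label"])
print(choose_cluster_site(candidates, symm_pref_dist_factor=0.95)["label"])
```

With the default factor of 0.85, the high-symmetry site is kept since 1.80 ≥ 0.85 × 2.00 = 1.70. With a stricter factor of 0.95 (threshold 1.90), the site furthest from host atoms is chosen instead.
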

doped.utils.symmetry.get_BCS_conventional_structure(structure: Structure, pbar: tqdm | None = None, return_wyckoff_dict: bool = False) → tuple[Structure, list[int]] | tuple[Structure, list[int], dict[str, list[list[Expr]]]][source]

Get the conventional crystal structure of the input structure, according to the Bilbao Crystallographic Server (BCS) definition.

Also returns an array of the lattice vector swaps (used with swap_axes) to convert from the spglib (SpaceGroupAnalyzer) conventional structure definition to the BCS definition.

Parameters:

structure (Structure) – Structure for which to get the corresponding BCS conventional crystal structure.
pbar (ProgressBar) – tqdm progress bar object, to update progress. Default is None.
return_wyckoff_dict (bool) – Whether to return the Wyckoff label dict (as {Wyckoff label: coordinates}).

Returns:

A tuple of the BCS conventional structure of the input structure, the lattice vector swapping array and, if return_wyckoff_dict is True, the Wyckoff label dict.

Return type:

tuple[Structure, np.ndarray] | tuple[Structure, np.ndarray, dict[str, np.ndarray]]

doped.utils.symmetry.get_all_equiv_sites(frac_coords: ArrayLike, structure: Structure, symprec: float = 0.01, dist_tol_factor: float = 1.0, species: str = 'X', just_frac_coords: bool = False, return_symprec_and_dist_tol_factor: bool = False, fixed_symprec_and_dist_tol_factor: bool = False, verbose: bool = False) → list[PeriodicSite | ndarray] | tuple[list[PeriodicSite | ndarray], float, float][source]

Get a list of all equivalent sites of the input fractional coordinates in structure.

Tries to use hashing and caching to accelerate if possible.

Parameters:

frac_coords (ArrayLike) – Fractional coordinates to get equivalent sites of.
structure (Structure) – Structure to use for the lattice, to which the fractional coordinates correspond, and for determining symmetry operations if not provided.
symprec (float) – Symmetry precision to use for determining symmetry operations. Default is 0.01. If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor) values, and setting return_symprec_and_dist_tol_factor to True will return the final symprec (and dist_tol_factor) used for the equivalent site generation.
dist_tol_factor (float) – Distance tolerance for clustering generated sites (to ensure they are truly distinct), as a multiplicative factor of symprec. Default is 1.0 (i.e. dist_tol = symprec, in Å). If fixed_symprec_and_dist_tol_factor is False (default), this value will also be automatically adjusted if necessary (up to 10x, down to 0.1x), after symprec adjustments, until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled dist_tol_factor (and symprec) values, and setting return_symprec_and_dist_tol_factor to True will return the final symprec (and dist_tol_factor) used for the equivalent site generation.
species (str) – Species to use for the equivalent sites (default: “X”).
just_frac_coords (bool) – If True, just returns the fractional coordinates of the equivalent sites (rather than pymatgen PeriodicSite objects). Default: False.
return_symprec_and_dist_tol_factor (bool) – If True, returns the final symmetry precision and distance tolerance factor used for the equivalent site generation (see symprec and dist_tol_factor argument descriptions). Default is False.
fixed_symprec_and_dist_tol_factor (bool) – If True, uses the provided symprec and dist_tol_factor values without any automatic adjustments (see symprec and dist_tol_factor argument descriptions). Default is False.
verbose (bool) – If True, prints information on the trialled symprec and dist_tol_factor values, and the identified equivalent sites. Default is False.

Returns:

List of equivalent sites of the input fractional coordinates in structure, either as pymatgen PeriodicSite objects or as fractional coordinates (depending on the value of just_frac_coords).

If return_symprec_and_dist_tol_factor is True (default is False), also returns the final symprec and dist_tol_factor values used for the equivalent site generation.

Return type:

list[PeriodicSite | np.ndarray]

doped.utils.symmetry.get_clean_structure(structure: Structure, return_T: bool = False, dist_precision: float = 0.001, niggli_reduce: bool = True) → Structure | tuple[Structure, ndarray][source]

Get a ‘clean’ version of the input structure by searching over equivalent cells, and finding the most optimal according to _lattice_matrix_sort_func (most symmetric, with mostly positive diagonals and c >= b >= a).

Parameters:

structure (Structure) – Structure object.
return_T (bool) – Whether to return the transformation matrix from the original structure lattice to the new structure lattice (T * Orig = New). (Default = False)
dist_precision (float) – The desired distance precision in Å for rounding of lattice parameters and fractional coordinates. (Default: 0.001)
niggli_reduce (bool) – Whether to Niggli reduce the lattice before searching for the optimal lattice matrix. If this is set to False, we also skip the search for the best positive determinant lattice matrix. (Default: True)

Returns:

The ‘clean’ version of the input structure, or a tuple of the ‘clean’ structure and the transformation matrix from the original structure lattice to the new structure lattice (T * Orig = New).

Return type:

Structure | tuple[Structure, np.ndarray]

doped.utils.symmetry.get_conv_cell_site(defect_entry: DefectEntry) → PeriodicSite | None[source]

Gets an equivalent site of the defect entry in the conventional structure of the host material. If the conventional_structure attribute is not defined for defect_entry, then it is generated using SpacegroupAnalyzer and then reoriented to match the Bilbao Crystallographic Server’s conventional structure definition.

Parameters:

defect_entry – DefectEntry object.

Returns:

The equivalent site of the defect entry in the conventional structure of the host material, or None if not found.

Return type:

PeriodicSite | None

doped.utils.symmetry.get_distance_matrix(fcoords: ArrayLike, lattice: Lattice) → ndarray[source]

Get a matrix of the distances between the input fractional coordinates in the input lattice.

Parameters:

fcoords (ArrayLike) – Fractional coordinates to get distances between.
lattice (Lattice) – Lattice for the fractional coordinates.

Returns:

Matrix of distances between the input fractional coordinates in the input lattice.

Return type:

np.ndarray

doped.utils.symmetry.get_equiv_frac_coords_in_primitive(frac_coords: ArrayLike, primitive: Structure, supercell: Structure, symprec: float = 0.01, dist_tol_factor: float = 1.0, equiv_coords: bool = True, return_symprec_and_dist_tol_factor: bool = False, fixed_symprec_and_dist_tol_factor: bool = False, verbose: bool = False) → list[ndarray] | ndarray | tuple[list[ndarray] | ndarray, float, float] | None[source]

Get equivalent fractional coordinates of frac_coords (in supercell) in the given primitive cell.

Returns a list of equivalent fractional coords in the primitive cell if equiv_coords is True (default).

Note that there may be multiple possible symmetry-equivalent sites, all of which are returned if equiv_coords is True, otherwise the first site in the list (sorted using _frac_coords_sort_func) is returned.

Parameters:

frac_coords (ArrayLike) – Fractional coordinates in the supercell, for which to get equivalent coordinates in the primitive cell.
primitive (Structure) – Primitive cell structure.
supercell (Structure) – Supercell structure.
symprec (float) – Symmetry precision to use for determining symmetry operations. Default is 0.01. If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values), and setting return_symprec_and_dist_tol_factor to True will return the final symprec (and dist_tol_factor) used for the equivalent site generation.
dist_tol_factor (float) – Distance tolerance for clustering generated sites (to ensure they are truly distinct), as a multiplicative factor of symprec. Default is 1.0 (i.e. dist_tol = symprec, in Å). If fixed_symprec_and_dist_tol_factor is False (default), this value will also be automatically adjusted if necessary (up to 10x, down to 0.1x), after any symprec adjustments, until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled dist_tol_factor (and symprec) values, and setting return_symprec_and_dist_tol_factor to True will return the final symprec (and dist_tol_factor) used for the equivalent site generation.
equiv_coords (bool) – If True, returns a list of equivalent fractional coords in the primitive cell. If False, returns the first equivalent fractional coordinates in the list, sorted using _frac_coords_sort_func. Default: True.
return_symprec_and_dist_tol_factor (bool) – If True, returns the final symmetry precision and distance tolerance factor used for the equivalent site generation (see symprec and dist_tol_factor argument descriptions). Default is False.
fixed_symprec_and_dist_tol_factor (bool) – If True, uses the provided symprec and dist_tol_factor values without any automatic adjustments (see symprec and dist_tol_factor argument descriptions). Default is False.
verbose (bool) – If True, prints information on the trialled symprec and dist_tol_factor values, and the identified equivalent sites. Default is False.

Returns:

List of equivalent fractional coordinates in the primitive cell, or the first equivalent fractional coordinate in the list (sorted using _frac_coords_sort_func), depending on the value of equiv_coords. If return_symprec_and_dist_tol_factor is True, also returns the final symprec and dist_tol_factor used for the equivalent site generation.

Return type:

list[np.ndarray] | np.ndarray | tuple[list[np.ndarray] | np.ndarray, float, float]
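
The coordinate conversion underlying this function can be sketched as follows. This covers only the change of basis from supercell to primitive fractional coordinates. It does not generate symmetry-equivalent positions or tune symprec, both of which the doped function does. The helper name supercell_to_primitive_frac and the explicit supercell_matrix argument are assumptions for illustration (the doped function takes the two Structure objects instead), and lattice vectors are assumed to be matrix rows.

```python
def supercell_to_primitive_frac(frac_coords, supercell_matrix):
    """Convert supercell fractional coords to primitive-cell fractional coords.

    Assumes supercell_lattice = supercell_matrix @ primitive_lattice (rows are
    lattice vectors), so frac_prim = frac_super @ supercell_matrix, wrapped
    back into the [0, 1) range.
    """
    frac_prim = [
        sum(frac_coords[k] * supercell_matrix[k][j] for k in range(3))
        for j in range(3)
    ]
    return [x % 1.0 for x in frac_prim]


# Hypothetical 2x2x2 supercell of the primitive cell.
supercell_matrix = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]

# Two supercell sites that are periodic images of the same primitive-cell site.
site_a = supercell_to_primitive_frac([0.125, 0.125, 0.125], supercell_matrix)
site_b = supercell_to_primitive_frac([0.625, 0.625, 0.625], supercell_matrix)
print(site_a, site_b)
```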

doped.utils.symmetry.get_min_dist_between_equiv_sites(site_1: PeriodicSite | Sequence[float] | Defect | DefectEntry, site_2: PeriodicSite | Sequence[float] | Defect | DefectEntry, structure: Structure | None = None, symprec: float = 0.01, dist_tol_factor: float = 1.0, return_symprec_and_dist_tol_factor: bool = False, fixed_symprec_and_dist_tol_factor: bool = False, verbose: bool = False) → float | tuple[float, float, float][source]

Get the minimum distance (in Å) between equivalent sites of two input site/Defect/DefectEntry objects in a structure.

Parameters:

site_1 (PeriodicSite | Sequence[float, float, float] | Defect | DefectEntry) – First site to get equivalent sites of, to determine minimum distance to equivalent sites of site_2. Can be a PeriodicSite object, a sequence of fractional coordinates, or a Defect/DefectEntry object.
site_2 (PeriodicSite | Sequence[float, float, float] | Defect | DefectEntry) – Second site to get equivalent sites of, to determine minimum distance to equivalent sites of site_1. Can be a PeriodicSite object, a sequence of fractional coordinates, or a Defect/DefectEntry object.
structure (Structure) – Structure to use for determining symmetry-equivalent sites of site_1 and site_2. Required if site_1 and site_2 are not Defect or DefectEntry objects. Default: None.
symprec (float) – Symmetry precision to use for determining symmetry operations. Default is 0.01. If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values), and setting return_symprec_and_dist_tol_factor to True will return the final symprec (and dist_tol_factor) used for the equivalent site generation.
dist_tol_factor (float) – Distance tolerance for clustering generated sites (to ensure they are truly distinct), as a multiplicative factor of symprec. Default is 1.0 (i.e. dist_tol = symprec, in Å). If fixed_symprec_and_dist_tol_factor is False (default), this value will also be automatically adjusted if necessary (up to 10x, down to 0.1x), after any symprec adjustments, until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled dist_tol_factor (and symprec) values, and setting return_symprec_and_dist_tol_factor to True will return the final symprec (and dist_tol_factor) used for the equivalent site generation.
return_symprec_and_dist_tol_factor (bool) – If True, returns the final symmetry precision and distance tolerance factor used for the equivalent site generation (see symprec and dist_tol_factor argument descriptions). Default is False.
fixed_symprec_and_dist_tol_factor (bool) – If True, uses the provided symprec and dist_tol_factor values without any automatic adjustments (see symprec and dist_tol_factor argument descriptions). Default is False.
verbose (bool) – If True, prints information on the trialled symprec and dist_tol_factor values, and the identified equivalent sites. Default is False.

Returns:

Minimum distance (in Å) between equivalent sites of site_1 and site_2, or a tuple of (minimum distance, symprec, dist_tol_factor) if return_symprec_and_dist_tol_factor is True.

Return type:

float | tuple[float, float, float]

doped.utils.symmetry.get_orientational_degeneracy(defect_entry: DefectEntry | None = None, relaxed_point_group: str | None = None, bulk_site_point_group: str | None = None, symprec: float = 0.1, bulk_symprec: float = 0.01, **kwargs) → float[source]

Get the orientational degeneracy factor for a given relaxed DefectEntry, by supplying either the DefectEntry object or the bulk-site & relaxed defect point group symbols (e.g. “Td”, “C3v” etc.).

If a DefectEntry is supplied (and the point group symbols are not), this is computed by determining the relaxed defect point symmetry and the (unrelaxed) bulk site symmetry, and then getting the ratio of their point group orders (equivalent to the ratio of partition functions or number of symmetry operations (i.e. degeneracy)).

For interstitials, the bulk site symmetry corresponds to the point symmetry of the interstitial site with no relaxation of the host structure, while for vacancies/substitutions it is simply the symmetry of their corresponding bulk site. This corresponds to the point symmetry of DefectEntry.defect, or calculation_metadata["bulk_site"]/["unrelaxed_defect_structure"].

Note: This tries to use the defect_entry.defect_supercell to determine the relaxed site symmetry. However, this is not guaranteed to work in all cases; namely for non-diagonal supercell expansions, or sometimes for non-scalar supercell expansion matrices (e.g. a 2x1x2 expansion), particularly with high-symmetry materials, which can break the periodicity of the cell. doped tries to automatically check if this is the case, and will warn you if so.

This can also be checked by using this function on your doped generated defects:

from doped.generation import get_defect_name_from_entry
for defect_name, defect_entry in defect_gen.items():
    print(defect_name,
          get_defect_name_from_entry(defect_entry, relaxed=False),
          get_defect_name_from_entry(defect_entry), "\n")

And if the point symmetries match in each case, then using this function on your parsed relaxed DefectEntry objects should correctly determine the final relaxed defect symmetry (and orientational degeneracy) – otherwise periodicity-breaking prevents this.

If periodicity-breaking prevents automatic symmetry determination, you can manually determine the relaxed defect and bulk-site point symmetries, and/or the orientational degeneracy, by visualising the structures (e.g. using VESTA), and then set the corresponding values in the calculation_metadata['relaxed point symmetry']/['bulk site symmetry'] and/or degeneracy_factors['orientational degeneracy'] attributes. get_orientational_degeneracy() can be used to obtain the orientational degeneracy factor for given defect/bulk-site point symmetries. Note that the bulk-site point symmetry corresponds to that of DefectEntry.defect, or equivalently calculation_metadata["bulk_site"]/["unrelaxed_defect_structure"], which for vacancies/substitutions is the symmetry of the corresponding bulk site, while for interstitials it is the point symmetry of the final relaxed interstitial site when placed in the (unrelaxed) bulk structure.

The degeneracy factor is used in the calculation of defect/carrier concentrations and Fermi level behaviour (discussion in https://doi.org/10.1039/D2FD00043A, https://doi.org/10.1039/D3CS00432E, https://doi.org/10.1038/s41578-025-00879-y…).

Parameters:

defect_entry (DefectEntry) – DefectEntry object. (Default = None)
relaxed_point_group (str | None) – Point group symmetry (e.g. “Td”, “C3v” etc.) of the relaxed defect structure, if already calculated / manually determined. Default is None (automatically calculated by doped).
bulk_site_point_group (str | None) – Point group symmetry (e.g. “Td”, “C3v” etc.) of the defect site in the bulk, if already calculated / manually determined. For vacancies/substitutions, this should match the site symmetry label from doped when generating the defect, while for interstitials it should be the point symmetry of the final relaxed interstitial site, when placed in the bulk structure. Default is None (automatically calculated by doped).
symprec (float) – Symmetry precision to use for determining symmetry operations and thus point symmetries with spglib, for the relaxed point symmetry. Default is 0.1 which matches that used by the Materials Project and is larger than the pymatgen default of 0.01 to account for residual structural noise in relaxed defect supercells. You may want to adjust for your system (e.g. if there are very slight octahedral distortions etc.). If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values).
bulk_symprec (float) – Symmetry precision to use for determining symmetry operations and thus point symmetries with spglib, for the unrelaxed (bulk site) point symmetry. Default is 0.01 which matches the pymatgen default. You may want to adjust for your system (e.g. if there are very slight octahedral distortions etc.). If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values).
**kwargs – Additional keyword arguments to pass to get_all_equiv_sites, such as dist_tol_factor, fixed_symprec_and_dist_tol_factor, and verbose, and to point_symmetry_from_defect_entry, such as attempt_periodicity_restoration.

Returns:

Orientational degeneracy factor for the defect.

Return type:

float
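
The ratio of point group orders described above can be sketched in a few lines of plain Python. This is an illustrative stand-in rather than doped's code: the lookup table POINT_GROUP_ORDER and the function name orientational_degeneracy are assumptions, and only a handful of point groups are included.

```python
# Orders (number of symmetry operations) of a few common point groups.
POINT_GROUP_ORDER = {
    "C1": 1, "Cs": 2, "C2v": 4, "C3v": 6, "D2d": 8, "Td": 24, "Oh": 48,
}


def orientational_degeneracy(relaxed_point_group, bulk_site_point_group):
    """Ratio of the bulk-site point group order to the relaxed defect one."""
    return (POINT_GROUP_ORDER[bulk_site_point_group]
            / POINT_GROUP_ORDER[relaxed_point_group])


# A defect on a Td site that relaxes to C3v has 24 / 6 = 4 equivalent
# orientations.
print(orientational_degeneracy("C3v", "Td"))  # 4.0
```

A defect that keeps the full site symmetry after relaxation gives a factor of 1, while a symmetry-lowering distortion gives a factor greater than 1.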

doped.utils.symmetry.get_primitive_structure(structure: Structure, ignored_species: list | None = None, clean: bool = True, return_all: bool = False, **kwargs)[source]

Get a consistent/deterministic primitive structure from a pymatgen Structure.

For some materials (e.g. zinc blende), there are multiple equivalent primitive cells (e.g. Cd (0,0,0) & Te (0.25,0.25,0.25); Cd (0,0,0) & Te (0.75,0.75,0.75) for F-43m CdTe), so for reproducibility and in line with most structure conventions/definitions, take the one with the cleanest lattice and structure definition, according to _struct_sort_func.

If ignored_species is set, then the sorting function used to determine the ideal primitive structure will ignore sites with species in ignored_species.

Parameters:

structure (Structure) – Structure to get the corresponding primitive structure of.
ignored_species (list | None) – List of species to ignore when determining the ideal primitive structure. (Default: None)
clean (bool) – Whether to return a ‘clean’ version of the primitive structure, with the lattice matrix in a standardised form. (Default: True)
return_all (bool) – Whether to return all possible primitive structures tested, sorted by the sorting function. (Default: False)
**kwargs – Additional keyword arguments to pass to the get_sga function (e.g. symprec etc).

Returns:

The primitive structure of the input structure, or a list of all possible primitive structures tested, sorted by the sorting function.

Return type:

Structure | list[Structure]

doped.utils.symmetry.get_sga(struct: Structure, symprec: float = 0.01) → SpacegroupAnalyzer[source]

Get a SpacegroupAnalyzer object of the input structure, dynamically adjusting symprec if needs be.

Note that by default, magnetic symmetry (i.e. MAGMOMs) is not used in symmetry analysis in doped, as noise in these values (particularly in structures from the Materials Project) often leads to incorrect symmetry determinations. To use magnetic moments in symmetry analyses, set the environment variable USE_MAGNETIC_SYMMETRY=1 (i.e. os.environ["USE_MAGNETIC_SYMMETRY"] = "1" in Python).

Parameters:

struct (Structure) – The input structure.
symprec (float) – The symmetry precision to use (default: 0.01).

Returns:

The symmetry analyzer object.

Return type:

SpacegroupAnalyzer

doped.utils.symmetry.get_sga_and_symprec(struct: Structure, symprec: float = 0.01) → tuple[SpacegroupAnalyzer, float][source]

Get a SpacegroupAnalyzer object of the input structure, dynamically adjusting symprec if needs be, and the final successful symprec used for SpacegroupAnalyzer initialisation.

Note that by default, magnetic symmetry (i.e. MAGMOMs) is not used in symmetry analysis in doped, as noise in these values (particularly in structures from the Materials Project) often leads to incorrect symmetry determinations. To use magnetic moments in symmetry analyses, set the environment variable USE_MAGNETIC_SYMMETRY=1 (i.e. os.environ["USE_MAGNETIC_SYMMETRY"] = "1" in Python).

Parameters:

struct (Structure) – The input structure.
symprec (float) – The symmetry precision to use (default: 0.01).

Returns:

Tuple of the SpacegroupAnalyzer object and the final symprec used.

Return type:

tuple[SpacegroupAnalyzer, float]

doped.utils.symmetry.get_spglib_conv_structure(sga: SpacegroupAnalyzer) → tuple[Structure, SpacegroupAnalyzer][source]

Get a consistent/deterministic conventional structure from a SpacegroupAnalyzer object. Also returns the corresponding SpacegroupAnalyzer (for getting Wyckoff symbols corresponding to this conventional structure definition).

For some materials (e.g. zinc blende), there are multiple equivalent primitive/conventional cells, so for reproducibility and in line with most structure conventions/definitions, take the one with the lowest summed norm of the fractional coordinates of the sites (i.e. favour Cd (0,0,0) and Te (0.25,0.25,0.25) over Cd (0,0,0) and Te (0.75,0.75,0.75) for F-43m CdTe; SGN 216).
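
The selection rule described above (lowest summed norm of the fractional coordinates) can be illustrated with the CdTe example in plain Python. The candidate labels and the helper name summed_frac_norm are assumptions for this sketch, which only compares two hand-written coordinate sets and does not reproduce how doped enumerates candidate cells.

```python
import math


def summed_frac_norm(frac_coords_list):
    """Sum of the Euclidean norms of a list of fractional coordinates."""
    return sum(math.sqrt(sum(c * c for c in fc)) for fc in frac_coords_list)


# Two equivalent settings of F-43m CdTe (one Cd and one Te site each).
candidates = {
    "Te at (0.25, 0.25, 0.25)": [(0.0, 0.0, 0.0), (0.25, 0.25, 0.25)],
    "Te at (0.75, 0.75, 0.75)": [(0.0, 0.0, 0.0), (0.75, 0.75, 0.75)],
}

# Pick the setting with the lowest summed norm.
best = min(candidates, key=lambda label: summed_frac_norm(candidates[label]))
print(best)
```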

doped.utils.symmetry.get_wyckoff(frac_coords: ArrayLike, struct: Structure, equiv_sites: bool = False, symprec: float = 0.01, **kwargs) → str | tuple[source]

Get the Wyckoff label of the input fractional coordinates in the input structure. If the symmetry operations of the structure have already been computed, these can be input as a list to speed up the calculation.

Parameters:

frac_coords (ArrayLike) – Fractional coordinates of the site to get the Wyckoff label of.
struct (Structure) – Structure to which frac_coords corresponds.
equiv_sites (bool) – If True, returns a tuple of (Wyckoff label, list of equivalent sites). Default is False.
symprec (float) – Symmetry precision to use for determining symmetry operations. Default is 0.01. If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values).
**kwargs – Additional keyword arguments to pass to get_all_equiv_sites, such as dist_tol_factor, fixed_symprec_and_dist_tol_factor, and verbose.

Returns:

The Wyckoff label of the input fractional coordinates in the structure. If equiv_sites is True, also returns a list of equivalent sites in the structure.

Return type:

str | tuple

doped.utils.symmetry.get_wyckoff_dict_from_sgn(sgn: int) → dict[str, list[list[Expr]]][source]

Get dictionary of {Wyckoff label: coordinates} for a given space group number.

The database used here for Wyckoff analysis (wyckpos.dat) was obtained from code written by JaeHwan Shim @schinavro (ORCID: 0000-0001-7575-4788) (https://gitlab.com/ase/ase/-/merge_requests/1035) based on the tabulated datasets in https://github.com/xtalopt/randSpg (also found at https://github.com/spglib/spglib/blob/develop/database/Wyckoff.csv). By default, doped uses the Wyckoff functionality of spglib (along with symmetry operations in pymatgen) when possible, however.

Parameters:

sgn (int) – Space group number.

Returns:

Dictionary of Wyckoff labels and their corresponding coordinates.

Return type:

dict[str, list[list[Expr]]]

doped.utils.symmetry.get_wyckoff_label_and_equiv_coord_list(defect_entry: DefectEntry | None = None, conv_cell_site: PeriodicSite | None = None, sgn: int | None = None, wyckoff_dict: dict | None = None) → tuple[str, list[list[float]]][source]

Return the Wyckoff label and list of equivalent fractional coordinates within the conventional cell for the input defect_entry or conv_cell_site (whichever is provided, defaults to defect_entry if both), given a dictionary of Wyckoff labels and coordinates (wyckoff_dict).

If wyckoff_dict is not provided, it is generated from the spacegroup number (sgn) using get_wyckoff_dict_from_sgn(sgn). If sgn is not provided, it is obtained from the bulk structure of the defect_entry if provided.

doped.utils.symmetry.group_order_from_schoenflies(sch_symbol)[source]

Return the order of the point group from the Schoenflies symbol.

Useful for symmetry and orientational degeneracy analysis.
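
For orientation, the standard group orders can be derived from the Schoenflies symbol with a small parser such as the one below. This is a hedged sketch of the general rules (C_n has order n, D_n has order 2n, a v/h/d suffix doubles the order, and so on), not doped's implementation. The function name group_order is an assumption, and icosahedral and infinite groups are deliberately left out.

```python
import re


def group_order(sch_symbol):
    """Order of a point group from its Schoenflies symbol (e.g. 'C3v')."""
    # Groups without a numeric index are handled by direct lookup.
    special = {"Ci": 2, "Cs": 2, "T": 12, "Td": 24, "Th": 24, "O": 24, "Oh": 48}
    if sch_symbol in special:
        return special[sch_symbol]

    match = re.fullmatch(r"([CDS])(\d+)([vhd]?)", sch_symbol)
    if match is None:
        raise ValueError(f"Unrecognised Schoenflies symbol: {sch_symbol}")
    family, n, suffix = match.group(1), int(match.group(2)), match.group(3)

    if family == "S":
        # Improper rotation groups S_n are only distinct groups for even n.
        if suffix or n % 2:
            raise ValueError(f"Unsupported S-type symbol: {sch_symbol}")
        return n
    if family == "C" and suffix == "d":
        raise ValueError(f"Unrecognised Schoenflies symbol: {sch_symbol}")

    order = n if family == "C" else 2 * n  # C_n: n, D_n: 2n
    if suffix:
        order *= 2  # v, h or d mirror planes double the number of operations
    return order


for symbol in ("C1", "C3v", "D2d", "D4h", "S4", "Td", "Oh"):
    print(symbol, group_order(symbol))
```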

doped.utils.symmetry.is_periodic_image(sites_1: Iterable[PeriodicSite | ndarray], sites_2: Iterable[PeriodicSite | ndarray], frac_tol: float = 0.01, same_image: bool = False) → bool[source]

Determine if the PeriodicSite/frac_coords in sites_1 are a periodic image of those in sites_2.

This function determines if the set of fractional coordinates in sites_1 are periodic images of those in sites_2, with only unique site matches permitted (i.e. no repeat matches; each site can only have one match).

If same_image is True, then the sites must all be of the same periodic image translation (i.e. the same rigid translation vector), such that sites_1 can be rigidly translated by any combination of lattice vectors to match the set of fractional coordinates in sites_2.

Note that this function tests whether the full set of sites is a periodic image of the other, and not just whether each site in sites_1 is (individually) a periodic image of a site in sites_2 (for which the PeriodicSite.is_periodic_image method could be used).

Parameters:

sites_1 (list) – List of PeriodicSites or frac_coords arrays.
sites_2 (list) – List of PeriodicSites or frac_coords arrays.
frac_tol (float) – Fractional coordinate tolerance for comparing sites.
same_image (bool) – If True, also check that the sites are the same periodic image translation (i.e. the same rigid translation vector). Default is False.

Returns:

True if sites_1 is a periodic image of sites_2, False otherwise.

Return type:

bool
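
The difference between the default behaviour and same_image=True can be seen in a simplified pure-Python version operating on fractional coordinates only. This is a sketch under assumptions, not doped's implementation: it requires both sets to have the same length, uses a greedy first-match strategy (adequate when sites are well separated relative to frac_tol), and the function name is reused purely for readability.

```python
def is_periodic_image(frac_coords_1, frac_coords_2, frac_tol=0.01,
                      same_image=False):
    """Check whether two sets of fractional coords are periodic images."""
    if len(frac_coords_1) != len(frac_coords_2):
        return False

    unmatched = list(frac_coords_2)
    translations = []
    for fc1 in frac_coords_1:
        for idx, fc2 in enumerate(unmatched):
            diff = [a - b for a, b in zip(fc1, fc2)]
            shift = [round(d) for d in diff]  # nearest lattice translation
            if all(abs(d - s) <= frac_tol for d, s in zip(diff, shift)):
                translations.append(tuple(shift))
                del unmatched[idx]  # each site may only be matched once
                break
        else:
            return False  # no periodic image found for this site

    if same_image:
        # All sites must be related by one and the same lattice translation.
        return len(set(translations)) <= 1
    return True


sites = [(0.1, 0.2, 0.3), (0.6, 0.7, 0.8)]
mixed_images = [(1.1, 0.2, 0.3), (0.6, 0.7, -0.2)]  # different translations
rigid_image = [(1.1, 1.2, 1.3), (1.6, 1.7, 1.8)]    # one rigid translation

print(is_periodic_image(sites, mixed_images))                   # True
print(is_periodic_image(sites, mixed_images, same_image=True))  # False
print(is_periodic_image(sites, rigid_image, same_image=True))   # True
```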

doped.utils.symmetry.point_symmetry_from_defect(defect: Defect, symprec: float = 0.01, **kwargs) → str[source]

Get the defect site point symmetry from a Defect object.

Note that this is intended only to be used for unrelaxed, as-generated Defect objects (rather than parsed defects).

Parameters:

defect (Defect) – Defect object.
symprec (float) – Symmetry precision to use for determining symmetry operations and thus point symmetries. Default is 0.01. If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values).
**kwargs – Additional keyword arguments to pass to get_all_equiv_sites, such as dist_tol_factor, fixed_symprec_and_dist_tol_factor, and verbose.

Returns:

Defect point symmetry.

Return type:

str

doped.utils.symmetry.point_symmetry_from_defect_entry(defect_entry: DefectEntry, symprec: float | None = None, relaxed: bool = True, verbose: bool | None = None, return_periodicity_breaking: bool = False, attempt_periodicity_restoration: bool = True, **kwargs) → str | tuple[str, bool][source]

Get the defect site point symmetry from a DefectEntry object.

Note: If relaxed = True (default), then this tries to use the defect_entry.defect_supercell to determine the site symmetry. This will thus give the relaxed defect point symmetry if this is a DefectEntry created from parsed defect calculations. However, this is not guaranteed to work in all cases; namely for non-diagonal supercell expansions, or sometimes for non-scalar supercell expansion matrices (e.g. a 2x1x2 expansion), particularly with high-symmetry materials, which can break the periodicity of the cell. doped tries to automatically check if this is the case, and will warn you if so.

This can also be checked by using this function on your doped generated defects:

from doped.generation import get_defect_name_from_entry
for defect_name, defect_entry in defect_gen.items():
    print(defect_name,
          get_defect_name_from_entry(defect_entry, relaxed=False),
          get_defect_name_from_entry(defect_entry), "\n")

And if the point symmetries match in each case, then using this function on your parsed relaxed DefectEntry objects should correctly determine the final relaxed defect symmetry – otherwise periodicity-breaking prevents this.

If periodicity-breaking prevents automatic symmetry determination, you can manually determine the relaxed defect and bulk-site point symmetries, and/or the orientational degeneracy, by visualising the structures (e.g. using VESTA), and then set the corresponding values in the calculation_metadata['relaxed point symmetry']/['bulk site symmetry'] and/or degeneracy_factors['orientational degeneracy'] attributes. get_orientational_degeneracy() can be used to obtain the orientational degeneracy factor for given defect/bulk-site point symmetries. Note that the bulk-site point symmetry corresponds to that of DefectEntry.defect, or equivalently calculation_metadata["bulk_site"]/["unrelaxed_defect_structure"], which for vacancies/substitutions is the symmetry of the corresponding bulk site, while for interstitials it is the point symmetry of the final relaxed interstitial site when placed in the (unrelaxed) bulk structure.

The degeneracy factor is used in the calculation of defect/carrier concentrations and Fermi level behaviour (discussion in https://doi.org/10.1039/D2FD00043A, https://doi.org/10.1039/D3CS00432E, https://doi.org/10.1038/s41578-025-00879-y…).

Parameters:

defect_entry (DefectEntry) – DefectEntry object.
symprec (float) – Symmetry precision to use for determining symmetry operations and thus point symmetries with spglib. Default is 0.01 for unrelaxed structures, 0.1 for relaxed (to account for residual structural noise, matching that used by the Materials Project). You may want to adjust for your system (e.g. if there are very slight octahedral distortions etc.). If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values).
relaxed (bool) – If False, determines the site symmetry using the defect site in the unrelaxed bulk supercell (i.e. the bulk site symmetry), otherwise tries to determine the point symmetry of the relaxed defect in the defect supercell. Default is True.
verbose (bool) – If None (default) or True, prints a warning if the supercell is detected to break the crystal periodicity (and hence not be able to return a reliable relaxed point symmetry). True corresponds to higher verbosity, where information on trialled symprec and dist_tol_factor values in equivalent site generation is also printed. Default is None.
return_periodicity_breaking (bool) – If True, also returns a boolean specifying if the supercell has been detected to break the crystal periodicity (and hence not be able to return a reliable relaxed point symmetry) or not. Mainly for internal doped usage, and always False if relaxed is False. Default is False.
attempt_periodicity_restoration (bool) – If True and periodicity-breaking is detected, will attempt to restore periodicity by stenciling the relaxed defect geometry into a supercell which retains periodicity, and then getting the point symmetry for that. Default is True.
**kwargs – Additional keyword arguments to pass to get_all_equiv_sites, such as dist_tol_factor and fixed_symprec_and_dist_tol_factor.

Returns:

Defect point symmetry (and if return_periodicity_breaking = True, a boolean specifying if the supercell has been detected to break the crystal periodicity).

Return type:

str | tuple[str, bool]

doped.utils.symmetry.point_symmetry_from_site(site: PeriodicSite | ndarray | list, structure: Structure, coords_are_cartesian: bool = False, symprec: float = 0.01, **kwargs) → str[source]

Get the point symmetry of a site in a structure.

Parameters:

site (PeriodicSite | np.ndarray | list) – Site for which to determine the point symmetry. Can be a PeriodicSite object, or a list or numpy array of the coordinates of the site (fractional coordinates by default, or Cartesian if coords_are_cartesian = True).
structure (Structure) – Structure object for which to determine the point symmetry of the site.
coords_are_cartesian (bool) – If True, the site coordinates are assumed to be in Cartesian coordinates. Default is False.
symprec (float) – Symmetry precision to use for determining symmetry operations and thus point symmetries with spglib. Default is 0.01. You may want to adjust for your system (e.g. if there are very slight octahedral distortions etc.). If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values).
**kwargs – Additional keyword arguments to pass to get_all_equiv_sites, such as dist_tol_factor, fixed_symprec_and_dist_tol_factor, and verbose.

Returns:

Site point symmetry.

Return type:

str

doped.utils.symmetry.point_symmetry_from_structure(structure: Structure, bulk_structure: Structure | None = None, symprec: float | None = None, relaxed: bool = True, verbose: bool | None = None, return_periodicity_breaking: bool = False, skip_atom_mapping_check: bool = False, **kwargs) → str | tuple[str, bool][source]

Get the point symmetry of a given structure.

Note: For certain non-trivial supercell expansions, the broken cell periodicity can break the site symmetry and lead to incorrect point symmetry determination (particularly if using non-scalar supercell matrices with high symmetry materials). If the unrelaxed bulk structure (bulk_structure) is also supplied, then doped will determine the defect site and then automatically check if this is the case, and warn you if so.

This can also be checked by using this function on your doped generated defects:

from doped.generation import get_defect_name_from_entry
for defect_name, defect_entry in defect_gen.items():
    print(defect_name,
          get_defect_name_from_entry(defect_entry, relaxed=False),
          get_defect_name_from_entry(defect_entry), "\n")

And if the point symmetries match in each case, then using this function on your parsed relaxed DefectEntry objects should correctly determine the final relaxed defect symmetry – otherwise periodicity-breaking prevents this.

If bulk_structure is supplied and relaxed is set to False, then returns the bulk site symmetry of the defect, which for vacancies/substitutions is the symmetry of the corresponding bulk site, while for interstitials it is the point symmetry of the final relaxed interstitial site when placed in the (unrelaxed) bulk structure.

Parameters:

structure (Structure) – Structure object for which to determine the point symmetry.
bulk_structure (Structure) – Structure object of the bulk structure, if known. Default is None. If provided and relaxed = True, will be used to check if the supercell is breaking the crystal periodicity (and thus preventing accurate determination of the relaxed defect point symmetry) and warn you if so.
symprec (float) – Symmetry precision to use for determining symmetry operations and thus point symmetries with spglib. Default is 0.01 for unrelaxed structures, 0.1 for relaxed (to account for residual structural noise, matching that used by the Materials Project). You may want to adjust for your system (e.g. if there are very slight octahedral distortions etc.). If fixed_symprec_and_dist_tol_factor is False (default), this value will be automatically adjusted (up to 10x, down to 0.1x) until the identified equivalent sites from spglib have consistent point group symmetries. Setting verbose to True will print information on the trialled symprec (and dist_tol_factor values).
relaxed (bool) – If False, determines the site symmetry using the defect site in the unrelaxed bulk supercell (i.e. the bulk site symmetry), otherwise tries to determine the point symmetry of the relaxed defect in the defect supercell. Default is True.
verbose (bool) – If None (default) or True, prints a warning if the supercell is detected to break the crystal periodicity (and hence not be able to return a reliable relaxed point symmetry). True corresponds to higher verbosity, where information on trialled symprec and dist_tol_factor values in equivalent site generation is also printed. Default is None.
return_periodicity_breaking (bool) – If True, also returns a boolean specifying if the supercell has been detected to break the crystal periodicity (and hence not be able to return a reliable relaxed point symmetry) or not. Default is False.
skip_atom_mapping_check (bool) – If True, skips the atom mapping check which ensures that the bulk and defect supercell lattice definitions are matched (important for accurate defect site determination and charge corrections). Can be used to speed up parsing when you are sure the cell definitions match (e.g. both supercells were generated with doped). Default is False.
**kwargs – Additional keyword arguments to pass to get_all_equiv_sites, such as dist_tol_factor and fixed_symprec_and_dist_tol_factor, and point_symmetry_from_defect_entry, such as attempt_periodicity_restoration.

Returns:

Structure point symmetry (and if return_periodicity_breaking = True, a boolean specifying if the supercell has been detected to break the crystal periodicity).

Return type:

str

doped.utils.symmetry.schoenflies_from_hermann(herm_symbol)[source]

Convert from Hermann-Mauguin to Schoenflies.
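As a rough illustration of this kind of conversion, the sketch below uses a small, deliberately partial lookup table of well-known point-group symbols. The table, the function name schoenflies_lookup and the space-stripping behaviour are assumptions made for this example; they are not taken from the doped source.

```python
# Hypothetical, partial lookup table (not doped's own): a few well-known
# Hermann-Mauguin point-group symbols and their Schoenflies equivalents.
HM_TO_SCHOENFLIES = {
    "1": "C1",
    "-1": "Ci",
    "2": "C2",
    "m": "Cs",
    "2/m": "C2h",
    "mm2": "C2v",
    "mmm": "D2h",
    "4/mmm": "D4h",
    "3m": "C3v",
    "-3m": "D3d",
    "6/mmm": "D6h",
    "-43m": "Td",
    "m-3m": "Oh",
}


def schoenflies_lookup(herm_symbol: str) -> str:
    """Look up a Schoenflies symbol, ignoring any spaces in the input."""
    key = herm_symbol.replace(" ", "")
    try:
        return HM_TO_SCHOENFLIES[key]
    except KeyError:
        # Only the symbols listed above are covered by this sketch.
        raise ValueError(f"{herm_symbol!r} is not in this partial table") from None
```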

doped.utils.symmetry.summed_dist(struct_a: Structure, struct_b: Structure, ignored_species: list[str] | None = None) → float[source]

Get the summed distance between closest-matched sites of two structures, in Å.

Note that this assumes the lattices of the two structures are equal!

Parameters:

struct_a – pymatgen Structure object.
struct_b – pymatgen Structure object.
ignored_species – List of species to ignore when calculating the summed distance (default: None).

Returns:

The summed distance between the sites of the two structures, in Å.

Return type:

float

doped.utils.symmetry.swap_axes(structure: Structure, axes: list[int] | tuple[int, ...]) → Structure[source]

Swap axes of the given structure.

The new order of the axes is given by the axes parameter. For example, axes=(2, 1, 0) will swap the first and third axes.
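The effect of an axes permutation can be sketched without pymatgen, using plain lists for the lattice vectors and fractional coordinates. This is only an illustration of the idea (swap_axes_sketch and to_cartesian are hypothetical helper names, and the real function operates on Structure objects); it assumes that lattice vectors and fractional coordinates are reordered together, so that Cartesian positions are unchanged.

```python
def swap_axes_sketch(lattice, frac_coords, axes):
    """Reorder lattice vectors and fractional coordinates consistently."""
    if sorted(axes) != [0, 1, 2]:
        raise ValueError("axes must be a permutation of (0, 1, 2)")
    # New lattice vector i is old lattice vector axes[i].
    new_lattice = [list(lattice[i]) for i in axes]
    # Fractional components follow their lattice vectors.
    new_frac_coords = [[site[i] for i in axes] for site in frac_coords]
    return new_lattice, new_frac_coords


def to_cartesian(lattice, frac):
    """Cartesian position = sum over i of frac[i] * lattice[i]."""
    return [sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3)]
```

With axes=(2, 1, 0), the first and third lattice vectors (and the matching fractional components) trade places, while to_cartesian gives the same position before and after the swap.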

doped.utils.symmetry.translate_structure(structure: Structure, vector: ndarray, frac_coords: bool = True, to_unit_cell: bool = True) → Structure[source]

Translate a structure and its sites by a given vector (not in place).

Parameters:

structure – pymatgen Structure object.
vector – Translation vector, fractional or Cartesian.
frac_coords – Whether the input vector is in fractional coordinates. (Default: True)
to_unit_cell – Whether to translate the sites to the unit cell. (Default: True)

Returns:

pymatgen Structure object with translated sites.
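The following minimal sketch shows what a fractional translation with optional wrapping into the unit cell looks like on bare coordinate lists. It covers only the frac_coords=True case, and translate_frac_coords is a hypothetical helper for illustration, not part of doped.

```python
def translate_frac_coords(frac_coords, vector, to_unit_cell=True):
    """Return new fractional coordinates shifted by ``vector``.

    The input list is left untouched, mirroring the "not in place" behaviour.
    """
    shifted = [[c + v for c, v in zip(site, vector)] for site in frac_coords]
    if to_unit_cell:
        # Wrap every component back into the interval [0, 1).
        shifted = [[c % 1.0 for c in site] for site in shifted]
    return shifted
```

With to_unit_cell=False, a site shifted past a cell boundary keeps its out-of-range coordinate (e.g. 1.1) instead of being wrapped (to 0.1).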

Module contents

Submodule for utility functions in doped.

doped.utils.__init__ contains utility functions and context managers for handling of warnings and multi-processing.

exception doped.utils.ParameterOrderWarning[source]

Bases: FutureWarning

Warning about the (bulk, defect) -> (defect, bulk) parameter ordering change for some functions in doped v4.0.

TODO: Remove all parameter-order warning handling in v4.1.

doped.utils.get_mp_context()[source]

Get a multiprocessing context that is compatible with the current OS.

doped.utils.get_mp_processes(processes: int | None = None)[source]

Get the number of processes to use with Pool.

doped.utils.patch_vise_for_windows()[source]

Context manager to patch vise.defaults.UserSettings._make_yaml_file_list, so that it returns an empty list.

Fixes an issue where this function gives an infinite recursive search on Windows, causing hanging.

doped.utils.pool_manager(processes: int | None = None)[source]

Context manager for multiprocessing Pool, to throw a clearer error message if a RuntimeError is raised when multiprocessing within doped is used in a Python script.

See the Errors with Python Scripts section.

Parameters:

processes (int | None) – Number of processes to use with Pool. If None, will use mp.cpu_count() - 1 (i.e. one less than the number of available CPUs).

Yields:

Pool – A Pool object with the specified number of processes.
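The general pattern of such a context manager can be sketched with the standard library alone. This is not doped's implementation: the names default_process_count and pool_manager_sketch, the pool_factory hook, the lower bound of one process and the wording of the error message are all assumptions made for illustration.

```python
import contextlib
import multiprocessing.pool  # also makes multiprocessing.pool.ThreadPool available


def default_process_count(processes=None):
    """Fall back to one fewer than the CPU count when ``processes`` is None.

    The lower bound of 1 is an assumption of this sketch.
    """
    if processes is not None:
        return processes
    return max(1, multiprocessing.cpu_count() - 1)


@contextlib.contextmanager
def pool_manager_sketch(processes=None, pool_factory=multiprocessing.Pool):
    """Yield a pool, re-raising RuntimeErrors with a hint about the main guard.

    ``pool_factory`` is a hypothetical hook (not a doped parameter) so that a
    thread-based pool can be swapped in, e.g. for quick experiments.
    """
    pool = None
    try:
        pool = pool_factory(default_process_count(processes))
        yield pool
    except RuntimeError as exc:
        # Keep the original error attached as the cause of the clearer one.
        raise RuntimeError(
            "Multiprocessing failed; when running as a script, wrap the "
            "entry point in `if __name__ == '__main__':`."
        ) from exc
    finally:
        if pool is not None:
            pool.close()
            pool.join()
```

In a script, the usual remedy is to place the code that ends up creating the pool inside an `if __name__ == "__main__":` block, so that child processes that re-import the script do not start pools of their own.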

doped.utils.suppress_logging(level=50)[source]

Context manager to catch and suppress logging messages.
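One common way to build a context manager like this is with logging.disable, restoring the previous threshold on exit. The sketch below assumes that approach (it is not confirmed to be what doped does), and suppress_logging_sketch is a hypothetical name. The default level of 50 corresponds to logging.CRITICAL.

```python
import contextlib
import logging


@contextlib.contextmanager
def suppress_logging_sketch(level=logging.CRITICAL):
    """Temporarily disable all log records at or below ``level``."""
    previous = logging.root.manager.disable  # current global disable threshold
    logging.disable(level)
    try:
        yield
    finally:
        logging.disable(previous)  # restore whatever was set before
```

Inside the `with` block, messages up to and including CRITICAL are dropped for every logger; on exit, the earlier behaviour returns even if the body raised an exception.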

doped.utils.vise_handling(level=50)[source]

Tame vise/pydefect side effects, by combining suppress_logging(), warnings.catch_warnings() and patch_vise_for_windows().

The steps are ordered to handle two things that must happen before vise.defaults is first imported anywhere:

vise.util.logger.get_logger is replaced with logging.getLogger, to avoid repeated vise INFO messages (and duplicate handlers) under parallelism. Only vise.util.logger is imported for this, which does not pull in vise.defaults, so the patch is in effect before vise.defaults builds its module-level logger.
vise.defaults is imported now, within catch_warnings(), so that its warnings.simplefilter("ignore", UserWarning) call is reverted on exit. Otherwise this filter is set the first time vise.defaults is imported, which, with multiprocessing parsing, happens lazily during result unpickling with Pool (importing a pydefect BandEdgeStates object -> pydefect.defaults -> vise.defaults), i.e. outside any vise_handling() block.

Calling this once at doped import leaves vise.defaults in sys.modules with both fixes applied, so later imports are no-ops and never re-trigger warning suppression.
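The reason for importing inside catch_warnings() can be demonstrated with the standard library alone. In this sketch, import_with_side_effect is a hypothetical stand-in for an import that installs a global "ignore" filter, as described above for vise.defaults. No vise code is involved.

```python
import warnings


def import_with_side_effect():
    """Hypothetical stand-in for an import that installs a global warning filter."""
    warnings.simplefilter("ignore", UserWarning)


def demo_filter_is_reverted():
    """Return the messages of the warnings that were actually shown."""
    with warnings.catch_warnings(record=True) as shown:
        warnings.simplefilter("always")  # start from a known filter state
        with warnings.catch_warnings():
            import_with_side_effect()
            warnings.warn("hidden", UserWarning)  # suppressed by the side effect
        # The inner block has exited, so the "ignore" filter is gone again.
        warnings.warn("visible", UserWarning)
    return [str(w.message) for w in shown]
```

Only the warning emitted after the inner block is recorded, which shows that the filter change made inside catch_warnings() does not leak out.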

doped.utils.warn_once(message: str, category: type[Warning] = <class 'UserWarning'>, key: ~typing.Any = None) → None[source]

Emit message as a warning at most once per unique (message, category, key), unless warn_once.cache_clear() is called.

Unlike Python’s default “show once per location” behaviour, this is immune to the __warningregistry__ being reset whenever the warning filters are mutated (which dependencies such as pandas do internally, defeating the default dedup when warning repeatedly in a loop – e.g. over temperatures/conditions).

key is an optional cheap hashable used to check if the warning has already been called for a given object/situation; e.g. a DefectEntry name to warn once per defect entry.

Used for the periodicity-breaking supercell and missing-degeneracy-factor warnings, which can otherwise be emitted many times in thermodynamic analysis loops.
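A deduplicating warning helper with a cache_clear() method can be sketched with functools.lru_cache, which provides cache_clear() on the decorated function. Whether doped implements warn_once this way is an assumption; warn_once_sketch, count_warnings and the example keys below are hypothetical.

```python
import functools
import warnings


@functools.lru_cache(maxsize=None)
def warn_once_sketch(message, category=UserWarning, key=None):
    """Warn the first time a given (message, category, key) combination is seen."""
    # Later calls with the same arguments hit the cache and never reach this line.
    warnings.warn(message, category, stacklevel=2)


def count_warnings():
    """Emit the same warning in a loop and count how many were actually shown."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # rule out Python's own deduplication
        for _temperature in (300, 600, 900):
            warn_once_sketch("Missing degeneracy factor", UserWarning, "defect_A")
        warn_once_sketch("Missing degeneracy factor", UserWarning, "defect_B")
    return len(caught)
```

The first call to count_warnings() records two warnings (one per key), a second call records none, and calling warn_once_sketch.cache_clear() resets this. Because the deduplication lives in the function's own cache, it does not depend on the __warningregistry__ state. In this sketch, key must be hashable and calls should pass arguments in a consistent style, since lru_cache may treat positional and keyword forms as separate entries.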