Generating defects with doped

If you are new to computational modelling of defects, it is highly recommended that you watch the defect calculations tutorial by Seán and Joe on YouTube. This gives a brief overview of much of the theory and rationale behind our typical defect calculation workflow.

In this tutorial we show how to generate defects and VASP input files with doped, and also perform the recommended ShakeNBreak structure-searching step at the beginning of the defect workflow. The table of contents of this tutorial is shown in the left-hand sidebar, and you can click on any of the links to jump to that section.

Tip

You can run this notebook interactively through Google Colab or Binder – just click the links! If running on Colab, then you’ll need to run !pip install doped in a cell to install the package, and !git clone https://github.com/SMTG-Bham/doped to download the example data (and update paths in the code cells accordingly).

Defect Generation

from pymatgen.core.structure import Structure
from doped.generation import DefectsGenerator
# Load our relaxed bulk (host) structure:
relaxed_primitive_CdTe = Structure.from_file("CdTe/relaxed_primitive_POSCAR")

# generate defects:
defect_gen = DefectsGenerator(relaxed_primitive_CdTe)
Generating DefectEntry objects: 100.0%|██████████| [00:12,   7.95it/s]                             
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Cd         [+1,0,-1,-2]       [0.000,0.000,0.000]  4a
v_Te         [+2,+1,0,-1]       [0.250,0.250,0.250]  4c

Substitutions    Guessed Charges        Conv. Cell Coords    Wyckoff
---------------  ---------------------  -------------------  ---------
Cd_Te            [+4,+3,+2,+1,0]        [0.250,0.250,0.250]  4c
Te_Cd            [+2,+1,0,-1,-2,-3,-4]  [0.000,0.000,0.000]  4a

Interstitials    Guessed Charges        Conv. Cell Coords    Wyckoff
---------------  ---------------------  -------------------  ---------
Cd_i_C3v         [+2,+1,0]              [0.625,0.625,0.625]  16e
Cd_i_Td_Cd2.83   [+2,+1,0]              [0.750,0.750,0.750]  4d
Cd_i_Td_Te2.83   [+2,+1,0]              [0.500,0.500,0.500]  4b
Te_i_C3v         [+4,+3,+2,+1,0,-1,-2]  [0.625,0.625,0.625]  16e
Te_i_Td_Cd2.83   [+4,+3,+2,+1,0,-1,-2]  [0.750,0.750,0.750]  4d
Te_i_Td_Te2.83   [+4,+3,+2,+1,0,-1,-2]  [0.500,0.500,0.500]  4b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 4 formula unit(s) of CdTe.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

The DefectsGenerator class is used to generate defects with doped. It has been set up to adopt reasonable defaults, such as supercell generation (tries to find the smallest possible supercell which is at least 10 Å in each direction; a typical threshold for minimising finite-size effects) and defect charge state guessing, but these are fully customisable using the class options shown in the docstring below and illustrated in the Custom Supercell Generation section of the advanced analysis tutorial. By default, defects are placed at the closest possible symmetry-equivalent site to the midpoint of the supercell, to aid visualisation e.g. in VESTA.

Important

As always with defect studies, you should carefully consider whether the default settings are appropriate for your system, and if not, you should customise them accordingly.

Some typical reasons for customising the default settings include:

  • Guessed Charges: The doped charge-state guessing algorithm has been developed to try balance efficiency (avoiding unnecessary calculations) and completeness (avoiding missing out on important defect species). However often defects behave in ways that we mightn’t expect at first (part of what makes them interesting!) and so no algorithm is able to predict exactly which charge states will/won’t be stable for your system. So, you should consider if your system is likely to have any unusual charge states (e.g. common in systems with low-symmetry (e.g. Sb₂Se₃), defect reconstructions (e.g. Sb₂S₃ & Al₂O₃ and/or amphoteric species (e.g. CdTe)) and if so, you should update the charge state ranges of your defects accordingly (example shown below).

  • Target Supercell: You may want to use a specific supercell for your calculations, perhaps to reproduce or build on calculations from a previous study with a specific supercell.

# run this to see the function documentation:
DefectsGenerator?
Init signature:
DefectsGenerator(
    structure: pymatgen.core.structure.Structure,
    extrinsic: Union[str, list, Dict, NoneType] = None,
    interstitial_coords: Optional[list] = None,
    generate_supercell: bool = True,
    charge_state_gen_kwargs: Optional[Dict] = None,
    supercell_gen_kwargs: Optional[Dict] = None,
    interstitial_gen_kwargs: Optional[Dict] = None,
    target_frac_coords: Optional[list] = None,
    processes: Optional[int] = None,
)
Docstring:      Class for generating doped DefectEntry objects.
Init docstring:
Generates doped DefectEntry objects for defects in the input host
structure. By default, generates all intrinsic defects, but extrinsic
defects (impurities) can also be created using the ``extrinsic``
argument.

Interstitial sites are generated using Voronoi tessellation by default (found
to be the most reliable), which can be controlled using the
``interstitial_gen_kwargs`` argument (passed as keyword arguments to the
``VoronoiInterstitialGenerator`` class). Alternatively, a list of interstitial
sites (or single interstitial site) can be manually specified using the
``interstitial_coords`` argument.

By default, supercells are generated for each defect using the doped
``get_ideal_supercell_matrix()`` function (see docstring), with default settings
of ``min_image_distance = 10`` (minimum distance between periodic images of 10 Å),
``min_atoms = 50`` (minimum 50 atoms in the supercell) and ``ideal_threshold = 0.1``
(allow up to 10% larger supercell if it is a diagonal expansion of the primitive
or conventional cell). This uses a custom algorithm in ``doped`` to efficiently
search over possible supercell transformations and identify that with the minimum
number of atoms (hence computational cost) that satisfies the minimum image distance,
number of atoms and ``ideal_threshold`` constraints. These settings can be controlled
by specifying keyword arguments with ``supercell_gen_kwargs``, which are passed to
``get_ideal_supercell_matrix()`` (e.g. for a minimum image distance of 15 Å with at
least 100 atoms, use:
``supercell_gen_kwargs = {'min_image_distance': 15, 'min_atoms': 100}``). If the
input structure already satisfies these constraints (for the same number of atoms as
the ``doped``-generated supercell), then it will be used.
Alternatively if ``generate_supercell = False``, then no supercell is generated
and the input structure is used as the defect & bulk supercell. (Note this
may give a slightly different (but fully equivalent) set of coordinates).

The algorithm for determining defect entry names is to use the pymatgen defect
name (e.g. ``v_Cd``, ``Cd_Te`` etc.) for vacancies/antisites/substitutions, unless
there are multiple inequivalent sites for the defect, in which case the point
group of the defect site is appended (e.g. ``v_Cd_Td``, ``Cd_Te_Td`` etc.), and if
this is still not unique, then element identity and distance to the nearest
neighbour of the defect site is appended (e.g. ``v_Cd_Td_Te2.83``, ``Cd_Te_Td_Cd2.83``
etc.). For interstitials, the same naming scheme is used, but the point group
is always appended to the pymatgen defect name.

Possible charge states for the defects are estimated using the probability of
the corresponding defect element oxidation state, the magnitude of the charge
state, and the maximum magnitude of the host oxidation states (i.e. how
'charged' the host is), with large (absolute) charge states, low probability
oxidation states and/or greater charge/oxidation state magnitudes than that of
the host being disfavoured. This can be controlled using the
``probability_threshold`` (default = 0.0075) or ``padding`` (default = 1) keys in
the ``charge_state_gen_kwargs`` parameter, which are passed to the
``_charge_state_probability()`` function. The input and computed values used to
guess charge state probabilities are provided in the
``DefectEntry.charge_state_guessing_log`` attributes. See docs for examples of
modifying the generated charge states.

Args:
    structure (Structure):
        Structure of the host material (as a pymatgen Structure object).
        If this is not the primitive unit cell, it will be reduced to the
        primitive cell for defect generation, before supercell generation.
    extrinsic (Union[str, List, Dict]):
        List or dict of elements (or string for single element) to be used
        for extrinsic defect generation (i.e. dopants/impurities). If a
        list is provided, all possible substitutional defects for each
        extrinsic element will be generated. If a dict is provided, the keys
        should be the host elements to be substituted, and the values the
        extrinsic element(s) to substitute in; as a string or list.
        In both cases, all possible extrinsic interstitials are generated.
    interstitial_coords (List):
        List of fractional coordinates (corresponding to the input structure),
        or a single set of fractional coordinates, to use as interstitial
        defect site(s). Default (when interstitial_coords not specified) is
        to automatically generate interstitial sites using Voronoi tessellation.
        The input interstitial_coords are converted to
        DefectsGenerator.prim_interstitial_coords, which are the corresponding
        fractional coordinates in DefectsGenerator.primitive_structure (which
        is used for defect generation), along with the multiplicity and
        equivalent coordinates, sorted according to the doped convention.
    generate_supercell (bool):
        Whether to generate a supercell for the output defect entries
        (using the custom algorithm in ``doped`` which efficiently searches over
        possible supercell transformations and identifies that with the minimum
        number of atoms (hence computational cost) that satisfies the minimum
        image distance, number of atoms and ``ideal_threshold`` constraints
        - which can be controlled with ``supercell_gen_kwargs``).
        If False, then the input structure is used as the defect & bulk supercell.
        (Note this may give a slightly different (but fully equivalent) set of coordinates).
    charge_state_gen_kwargs (Dict):
        Keyword arguments to be passed to the ``_charge_state_probability``
        function (such as ``probability_threshold`` (default = 0.0075, used for
        substitutions and interstitials) and ``padding`` (default = 1, used for
        vacancies)) to control defect charge state generation.
    supercell_gen_kwargs (Dict):
        Keyword arguments to be passed to the ``get_ideal_supercell_matrix``
        function (such as ``min_image_distance`` (default = 10), ``min_atoms``
        (default = 50), ``ideal_threshold`` (default = 0.1), ``force_cubic``
        - which enforces a (near-)cubic supercell output (default = False),
        or ``force_diagonal`` (default = False)).
    interstitial_gen_kwargs (Dict, bool):
        Keyword arguments to be passed to the ``VoronoiInterstitialGenerator``
        class (such as ``clustering_tol``, ``stol``, ``min_dist`` etc), or to
        ``InterstitialGenerator`` if ``interstitial_coords`` is specified.
        If set to False, interstitial generation will be skipped entirely.
    target_frac_coords (List):
        Defects are placed at the closest equivalent site to these fractional
        coordinates in the generated supercells. Default is [0.5, 0.5, 0.5]
        if not set (i.e. the supercell centre, to aid visualisation).
    processes (int):
        Number of processes to use for multiprocessing. If not set, defaults to
        one less than the number of CPUs available.

Attributes:
    defect_entries (Dict): Dictionary of {defect_species: DefectEntry} for all
        defect entries (with charge state and supercell properties) generated.
    defects (Dict): Dictionary of {defect_type: [Defect, ...]} for all defect
        objects generated.
    primitive_structure (Structure): Primitive cell structure of the host
        used to generate defects.
    supercell_matrix (Matrix): Matrix to generate defect/bulk supercells from
        the primitive cell structure.
    bulk_supercell (Structure): Supercell structure of the host
        (equal to primitive_structure * supercell_matrix).
    conventional_structure (Structure): Conventional cell structure of the
        host according to the Bilbao Crystallographic Server (BCS) definition,
        used to determine defect site Wyckoff labels and multiplicities.

    ``DefectsGenerator`` input parameters are also set as attributes.
File:           ~/Library/CloudStorage/OneDrive-ImperialCollegeLondon/Bread/Projects/Packages/doped/doped/generation.py
Type:           type
Subclasses:     

Information about the generated defects and supercell is accessible using the DefectsGenerator attributes, like defect_entries, bulk_supercell, defects etc.

Note

Note: The algorithm for determining defect entry names is to use the pymatgen defect name (e.g. v_Cd, Cd_Te etc.) for vacancies/antisites/substitutions, unless there are multiple inequivalent sites for the defect, in which case the point group of the defect site is appended (e.g. v_Cd_Td, Cd_Te_Td etc.), and if this is still not unique, then element identity and distance to the nearest neighbour of the defect site is appended (e.g. v_Cd_Td_Te2.83, Cd_Te_Td_Cd2.83 etc.). For interstitials, the same naming scheme is used, but the point group is always appended to the pymatgen defect name.

Remember that these are the point groups of the initial, unrelaxed defect structures! Which are likely to change during geometry relaxation (see e.g. 10.1039/D2FD00043A or 10.1039/D3CS00432E for a discussion on this and the corresponding effects on configurational degeneracies).

# show the generated defect entries:
defect_gen.defect_entries.keys()
dict_keys(['v_Cd_+1', 'v_Cd_0', 'v_Cd_-1', 'v_Cd_-2', 'v_Te_+2', 'v_Te_+1', 'v_Te_0', 'v_Te_-1', 'Cd_Te_+4', 'Cd_Te_+3', 'Cd_Te_+2', 'Cd_Te_+1', 'Cd_Te_0', 'Te_Cd_+2', 'Te_Cd_+1', 'Te_Cd_0', 'Te_Cd_-1', 'Te_Cd_-2', 'Te_Cd_-3', 'Te_Cd_-4', 'Cd_i_C3v_+2', 'Cd_i_C3v_+1', 'Cd_i_C3v_0', 'Cd_i_Td_Cd2.83_+2', 'Cd_i_Td_Cd2.83_+1', 'Cd_i_Td_Cd2.83_0', 'Cd_i_Td_Te2.83_+2', 'Cd_i_Td_Te2.83_+1', 'Cd_i_Td_Te2.83_0', 'Te_i_C3v_+4', 'Te_i_C3v_+3', 'Te_i_C3v_+2', 'Te_i_C3v_+1', 'Te_i_C3v_0', 'Te_i_C3v_-1', 'Te_i_C3v_-2', 'Te_i_Td_Cd2.83_+4', 'Te_i_Td_Cd2.83_+3', 'Te_i_Td_Cd2.83_+2', 'Te_i_Td_Cd2.83_+1', 'Te_i_Td_Cd2.83_0', 'Te_i_Td_Cd2.83_-1', 'Te_i_Td_Cd2.83_-2', 'Te_i_Td_Te2.83_+4', 'Te_i_Td_Te2.83_+3', 'Te_i_Td_Te2.83_+2', 'Te_i_Td_Te2.83_+1', 'Te_i_Td_Te2.83_0', 'Te_i_Td_Te2.83_-1', 'Te_i_Td_Te2.83_-2'])

If we wanted to manually add some extra charge states (e.g. some negative charge states for Cd_Te), we can do so using the add_charge_states method:

# Add some extra charge states for Cd_Te antisites
defect_gen.add_charge_states(defect_entry_name="Cd_Te", charge_states=[-2, -1])
# check our generated defect entries:
defect_gen
DefectsGenerator for input composition CdTe, space group F-43m with 52 defect entries created.
---------------------------------------------------------
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Cd         [+1,0,-1,-2]       [0.000,0.000,0.000]  4a
v_Te         [+2,+1,0,-1]       [0.250,0.250,0.250]  4c

Substitutions    Guessed Charges        Conv. Cell Coords    Wyckoff
---------------  ---------------------  -------------------  ---------
Cd_Te            [+4,+3,+2,+1,0,-1,-2]  [0.250,0.250,0.250]  4c
Te_Cd            [+2,+1,0,-1,-2,-3,-4]  [0.000,0.000,0.000]  4a

Interstitials    Guessed Charges        Conv. Cell Coords    Wyckoff
---------------  ---------------------  -------------------  ---------
Cd_i_C3v         [+2,+1,0]              [0.625,0.625,0.625]  16e
Cd_i_Td_Cd2.83   [+2,+1,0]              [0.750,0.750,0.750]  4d
Cd_i_Td_Te2.83   [+2,+1,0]              [0.500,0.500,0.500]  4b
Te_i_C3v         [+4,+3,+2,+1,0,-1,-2]  [0.625,0.625,0.625]  16e
Te_i_Td_Cd2.83   [+4,+3,+2,+1,0,-1,-2]  [0.750,0.750,0.750]  4d
Te_i_Td_Te2.83   [+4,+3,+2,+1,0,-1,-2]  [0.500,0.500,0.500]  4b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 4 formula unit(s) of CdTe.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

We see that -2 and -1 charge states have now been added for our Cd-on-Te antisite substitution. Again, the help documentation for this function can be viewed with by adding a question mark to the function/method name:

defect_gen.add_charge_states?
Signature: defect_gen.add_charge_states(defect_entry_name: str, charge_states: list)
Docstring:
Add additional ``DefectEntry``\s with the specified charge states to
``self.defect_entries``.

Args:
    defect_entry_name (str):
        Name of defect entry to add charge states to.
        Doesn't need to include the charge state.
    charge_states (list): List of charge states to add to defect entry (e.g. [-2, -3]).
File:      ~/Library/CloudStorage/OneDrive-ImperialCollegeLondon/Bread/Projects/Packages/doped/doped/generation.py
Type:      method
# likewise we can remove charge states with:
defect_gen.remove_charge_states(defect_entry_name="v_Cd", charge_states=[+1])
defect_gen
DefectsGenerator for input composition CdTe, space group F-43m with 51 defect entries created.
---------------------------------------------------------
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Cd         [0,-1,-2]          [0.000,0.000,0.000]  4a
v_Te         [+2,+1,0,-1]       [0.250,0.250,0.250]  4c

Substitutions    Guessed Charges        Conv. Cell Coords    Wyckoff
---------------  ---------------------  -------------------  ---------
Cd_Te            [+4,+3,+2,+1,0,-1,-2]  [0.250,0.250,0.250]  4c
Te_Cd            [+2,+1,0,-1,-2,-3,-4]  [0.000,0.000,0.000]  4a

Interstitials    Guessed Charges        Conv. Cell Coords    Wyckoff
---------------  ---------------------  -------------------  ---------
Cd_i_C3v         [+2,+1,0]              [0.625,0.625,0.625]  16e
Cd_i_Td_Cd2.83   [+2,+1,0]              [0.750,0.750,0.750]  4d
Cd_i_Td_Te2.83   [+2,+1,0]              [0.500,0.500,0.500]  4b
Te_i_C3v         [+4,+3,+2,+1,0,-1,-2]  [0.625,0.625,0.625]  16e
Te_i_Td_Cd2.83   [+4,+3,+2,+1,0,-1,-2]  [0.750,0.750,0.750]  4d
Te_i_Td_Te2.83   [+4,+3,+2,+1,0,-1,-2]  [0.500,0.500,0.500]  4b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 4 formula unit(s) of CdTe.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

If we want to generate defects with extrinsic elements, to investigate doping or potential impurities in the material, we can do so using the extrinsic option. This can be either a string of a single impurity element to consider, a list of multiple possible extrinsic species, or a dictionary in the form {host_element: extrinsic_element} if we only want to consider specific substitutions host/impurity substitutions. For example, to investigate Cu, Se and Na doping in CdTe, we can do:

# generate defects:
defect_gen = DefectsGenerator(relaxed_primitive_CdTe, extrinsic=["Cu", "Se", "Na"])
Generating DefectEntry objects: 100.0%|██████████| [00:12,   7.95it/s]                             
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Cd         [+1,0,-1,-2]       [0.000,0.000,0.000]  4a
v_Te         [+2,+1,0,-1]       [0.250,0.250,0.250]  4c

Substitutions    Guessed Charges        Conv. Cell Coords    Wyckoff
---------------  ---------------------  -------------------  ---------
Cd_Te            [+4,+3,+2,+1,0]        [0.250,0.250,0.250]  4c
Te_Cd            [+2,+1,0,-1,-2,-3,-4]  [0.000,0.000,0.000]  4a
Cu_Cd            [+1,0,-1]              [0.000,0.000,0.000]  4a
Cu_Te            [+4,+3,+2,+1,0]        [0.250,0.250,0.250]  4c
Na_Cd            [0,-1]                 [0.000,0.000,0.000]  4a
Na_Te            [+3,+2,+1,0]           [0.250,0.250,0.250]  4c
Se_Cd            [+2,+1,0,-1,-2,-3,-4]  [0.000,0.000,0.000]  4a
Se_Te            [+1,0]                 [0.250,0.250,0.250]  4c

Interstitials    Guessed Charges        Conv. Cell Coords    Wyckoff
---------------  ---------------------  -------------------  ---------
Cd_i_C3v         [+2,+1,0]              [0.625,0.625,0.625]  16e
Cd_i_Td_Cd2.83   [+2,+1,0]              [0.750,0.750,0.750]  4d
Cd_i_Td_Te2.83   [+2,+1,0]              [0.500,0.500,0.500]  4b
Te_i_C3v         [+4,+3,+2,+1,0,-1,-2]  [0.625,0.625,0.625]  16e
Te_i_Td_Cd2.83   [+4,+3,+2,+1,0,-1,-2]  [0.750,0.750,0.750]  4d
Te_i_Td_Te2.83   [+4,+3,+2,+1,0,-1,-2]  [0.500,0.500,0.500]  4b
Cu_i_C3v         [+2,+1,0]              [0.625,0.625,0.625]  16e
Cu_i_Td_Cd2.83   [+2,+1,0]              [0.750,0.750,0.750]  4d
Cu_i_Td_Te2.83   [+2,+1,0]              [0.500,0.500,0.500]  4b
Na_i_C3v         [+1,0]                 [0.625,0.625,0.625]  16e
Na_i_Td_Cd2.83   [+1,0]                 [0.750,0.750,0.750]  4d
Na_i_Td_Te2.83   [+1,0]                 [0.500,0.500,0.500]  4b
Se_i_C3v         [0,-1,-2]              [0.625,0.625,0.625]  16e
Se_i_Td_Cd2.83   [0,-1,-2]              [0.750,0.750,0.750]  4d
Se_i_Td_Te2.83   [0,-1,-2]              [0.500,0.500,0.500]  4b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 4 formula unit(s) of CdTe.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

Tip

If we want to save our generated defects to a file, to then use later without having to rerun the above generation functions, or use in a different notebook, or share with another user, we can do so using the to_json and from_json methods for DefectsGenerator.

Determining the ground-state defect structures

At this point, it’s recommended that you use the ShakeNBreak approach to quickly identify the groundstate structures of your defects, before continuing on with the formation energy calculation workflow below. As detailed in the theory paper, skipping this step can result in drastically incorrect formation energies, transition levels, carrier capture (basically any property associated with defects). This approach is followed below, with a more in-depth explanation and tutorial given on the ShakeNBreak website.

Note

If for some reason you are not doing the recommended ShakeNBreak ground-state structure search for your defects, you can skip ahead to the ‘Prepare VASP relaxation calculation files’ section below, and follow the instructions listed in the Note there.

While the Python API approach described below is the recommended approach for using ShakeNBreak (offering more control and ability to analyses structures), it can also be run from the command line, with more info found at the CLI docs section and a summary GIF in the CLI section of the Welcome page.

To generate our distorted defect structures with ShakeNBreak (SnB), we can directly input our DefectsGenerator object to the SnB Distortions class:

from shakenbreak.input import Distortions
Dist = Distortions(  # Initialise the Distortions class with our generated CdTe defects
    defect_entries = defect_gen,
    #oxidation_states=oxidation_states,  # optionally explicitly specify oxidation states
)  # Distortions can also be initialised from a list of DefectEntry objects or Structure objects
Oxidation states were not explicitly set, thus have been guessed as {'Cd': 2.0, 'Te': -2.0}. If this is unreasonable you should manually set oxidation_states

Important

As with the doped functions, ShakeNBreak has been built and optimised to adopt reasonable defaults that work well for most host materials, however there is again a lot of customisation and control available, and you should carefully consider the appropriate settings and choices for your system.

The ShakeNBreak workflow is shown in more detail in the SnB Python API tutorial on the SnB docs, and here we just show a brief example of the main steps.

Generating VASP input files for the trial distorted structures

defects_dict, distortion_metadata = Dist.write_vasp_files()
Applying ShakeNBreak... Will apply the following bond distortions: ['-0.6', '-0.5', '-0.4', '-0.3', '-0.2', '-0.1', '0.0', '0.1', '0.2', '0.3', '0.4', '0.5', '0.6']. Then, will rattle with a std dev of 0.28 Å 


Defect: v_Cd
Number of missing electrons in neutral state: 2

Defect v_Cd in charge state: 0. Number of distorted neighbours: 2

Defect v_Cd in charge state: -1. Number of distorted neighbours: 1

Defect v_Cd in charge state: -2. Number of distorted neighbours: 0

Defect: v_Te
Number of extra electrons in neutral state: 2

Defect v_Te in charge state: +2. Number of distorted neighbours: 0

Defect v_Te in charge state: +1. Number of distorted neighbours: 1

Defect v_Te in charge state: 0. Number of distorted neighbours: 2

Defect v_Te in charge state: -1. Number of distorted neighbours: 3

Defect: Cd_Te
Number of extra electrons in neutral state: 4

Defect Cd_Te in charge state: +4. Number of distorted neighbours: 0

Defect Cd_Te in charge state: +3. Number of distorted neighbours: 1

Defect Cd_Te in charge state: +2. Number of distorted neighbours: 2

Defect Cd_Te in charge state: +1. Number of distorted neighbours: 3

Defect Cd_Te in charge state: 0. Number of distorted neighbours: 4

Defect Cd_Te in charge state: -1. Number of distorted neighbours: 3

Defect Cd_Te in charge state: -2. Number of distorted neighbours: 2

Defect: Te_Cd
Number of missing electrons in neutral state: 4

Defect Te_Cd in charge state: +2. Number of distorted neighbours: 2

Defect Te_Cd in charge state: +1. Number of distorted neighbours: 3

Defect Te_Cd in charge state: 0. Number of distorted neighbours: 4

Defect Te_Cd in charge state: -1. Number of distorted neighbours: 3

Defect Te_Cd in charge state: -2. Number of distorted neighbours: 2

Defect Te_Cd in charge state: -3. Number of distorted neighbours: 1

Defect Te_Cd in charge state: -4. Number of distorted neighbours: 0

Defect: Cd_i_C3v
Number of extra electrons in neutral state: 2

Defect Cd_i_C3v in charge state: +2. Number of distorted neighbours: 0

Defect Cd_i_C3v in charge state: +1. Number of distorted neighbours: 1

Defect Cd_i_C3v in charge state: 0. Number of distorted neighbours: 2

Defect: Cd_i_Td_Cd2.83
Number of extra electrons in neutral state: 2

Defect Cd_i_Td_Cd2.83 in charge state: +2. Number of distorted neighbours: 0

Defect Cd_i_Td_Cd2.83 in charge state: +1. Number of distorted neighbours: 1

Defect Cd_i_Td_Cd2.83 in charge state: 0. Number of distorted neighbours: 2

Defect: Cd_i_Td_Te2.83
Number of extra electrons in neutral state: 2

Defect Cd_i_Td_Te2.83 in charge state: +2. Number of distorted neighbours: 0

Defect Cd_i_Td_Te2.83 in charge state: +1. Number of distorted neighbours: 1

Defect Cd_i_Td_Te2.83 in charge state: 0. Number of distorted neighbours: 2

Defect: Te_i_C3v
Number of missing electrons in neutral state: 2

Defect Te_i_C3v in charge state: +4. Number of distorted neighbours: 2

Defect Te_i_C3v in charge state: +3. Number of distorted neighbours: 3

Defect Te_i_C3v in charge state: +2. Number of distorted neighbours: 4

Defect Te_i_C3v in charge state: +1. Number of distorted neighbours: 3

Defect Te_i_C3v in charge state: 0. Number of distorted neighbours: 2

Defect Te_i_C3v in charge state: -1. Number of distorted neighbours: 1

Defect Te_i_C3v in charge state: -2. Number of distorted neighbours: 0

Defect: Te_i_Td_Cd2.83
Number of missing electrons in neutral state: 2

Defect Te_i_Td_Cd2.83 in charge state: +4. Number of distorted neighbours: 2

Defect Te_i_Td_Cd2.83 in charge state: +3. Number of distorted neighbours: 3

Defect Te_i_Td_Cd2.83 in charge state: +2. Number of distorted neighbours: 4

Defect Te_i_Td_Cd2.83 in charge state: +1. Number of distorted neighbours: 3

Defect Te_i_Td_Cd2.83 in charge state: 0. Number of distorted neighbours: 2

Defect Te_i_Td_Cd2.83 in charge state: -1. Number of distorted neighbours: 1

Defect Te_i_Td_Cd2.83 in charge state: -2. Number of distorted neighbours: 0

Defect: Te_i_Td_Te2.83
Number of missing electrons in neutral state: 2

Defect Te_i_Td_Te2.83 in charge state: +4. Number of distorted neighbours: 2

Defect Te_i_Td_Te2.83 in charge state: +3. Number of distorted neighbours: 3

Defect Te_i_Td_Te2.83 in charge state: +2. Number of distorted neighbours: 4

Defect Te_i_Td_Te2.83 in charge state: +1. Number of distorted neighbours: 3

Defect Te_i_Td_Te2.83 in charge state: 0. Number of distorted neighbours: 2

Defect Te_i_Td_Te2.83 in charge state: -1. Number of distorted neighbours: 1

Defect Te_i_Td_Te2.83 in charge state: -2. Number of distorted neighbours: 0

Our distorted structures and VASP input files have now been generated in the folders with names matching the doped naming scheme:

!cat v_Cd_-1/Bond_Distortion_-10.0%/INCAR
# May want to change NCORE, KPAR, AEXX, ENCUT, IBRION, LREAL, NUPDOWN, ISPIN = Typical variable parameters
# ShakeNBreak INCAR with coarse settings to maximise speed with sufficient accuracy for qualitative structure searching = 
AEXX = 0.25
ALGO = Normal  # change to all if zhegv, fexcp/f or zbrent errors encountered (done automatically by snb-run)
EDIFF = 1e-05
EDIFFG = -0.01
ENCUT = 300
GGA = Pe
HFSCREEN = 0.208
IBRION = 2
ICHARG = 1
ICORELEVEL = 0  # needed if using the kumagai-oba (efnv) anisotropic charge correction scheme
ISIF = 2
ISMEAR = 0
ISPIN = 2
ISYM = 0  # symmetry breaking extremely likely for defects
LASPH = True
LCHARG = False
LHFCALC = True
LMAXMIX = 4
LORBIT = 11
LREAL = Auto
LVHAR = True
LWAVE = False
NCORE = 16
NEDOS = 2000
NELECT = 475.0
NELM = 40
NSW = 300
NUPDOWN = 1
PREC = Accurate
PRECFOCK = Fast
ROPT = 1e-3 1e-3
SIGMA = 0.05

Send to HPCs and run relaxations

Can use the snb-run CLI function to quickly run calculations; see the Submitting the geometry optimisations section of the SnB CLI tutorial for this.

For demonstration purposes in this example notebook, we’ll use some (fake) example data:
(Don’t do this if you’re actually running the calculations and have downloaded the data as instructed above)

Note

If you want to run the below functions interactively in python, you need to clone the ShakeNBreak GitHub repository and pip install the package from there (in order for the example data to be downloaded), which can be done with:

git clone https://github.com/SMTG-Bham/ShakeNBreak
cd ShakeNBreak && pip install .
import shakenbreak
snb_path = shakenbreak.__path__[0]

!rm -r v_Te* Te_* Cd_* # remove other defect folders for example case here
!cp -r {snb_path}/../tests/data/example_results/v_Cd* .
!cp {snb_path}/../tests/data/vasp/CdTe/distortion_metadata.json .

Plot energies vs distortion

To see if SnB found any energy-lowering distortions, we can plot the results using the functions in shakenbreak.plotting. Here we show the results for cadmium vacancies only as an example:

from shakenbreak import energy_lowering_distortions, plotting
defect_charges_dict = energy_lowering_distortions.read_defects_directories()
low_energy_defects = energy_lowering_distortions.get_energy_lowering_distortions(defect_charges_dict)
v_Cd
v_Cd_0: Energy difference between minimum, found with -0.6 bond distortion, and unperturbed: -0.76 eV.
Energy lowering distortion found for v_Cd with charge 0. Adding to low_energy_defects dictionary.
v_Cd_-1: Energy difference between minimum, found with 0.2 bond distortion, and unperturbed: -0.90 eV.
New (according to structure matching) low-energy distorted structure found for v_Cd_-1, adding to low_energy_defects['v_Cd'] list.
No energy lowering distortion with energy difference greater than min_e_diff = 0.05 eV found for v_Cd with charge -2.

Comparing and pruning defect structures across charge states...

These functions give us some info about whether any energy-lowering defect distortions were identified, and we can see the results clearer by plotting:

figs = plotting.plot_all_defects(defect_charges_dict, add_colorbar=True)
_images/096408a78f6d253813c57595e805b5e1a1e56b4b04f390d1f6b55fd0330fa4be.png _images/ee54370c11d46e7818613c62163c22acb2f09656f8c74150faece706b191c66a.png

For these example results, we find energy lowering distortions for VCd0 and VCd-1. We should re-test these distorted structures for the VCd charge states where these distortions were not found, in case they also give lower energies.

The get_energy_lowering_distortions() function above automatically performs structure comparisons to determine which distortions should be tested in other charge states of the same defect, and which have already been found (see docstring for more details).

# generates the new distorted structures and VASP inputs, to do our quick 2nd round of structure testing:
energy_lowering_distortions.write_retest_inputs(low_energy_defects)
Writing low-energy distorted structure to ./v_Cd_-1/Bond_Distortion_-60.0%_from_0
Writing low-energy distorted structure to ./v_Cd_-2/Bond_Distortion_-60.0%_from_0
Writing low-energy distorted structure to ./v_Cd_0/Bond_Distortion_20.0%_from_-1
Writing low-energy distorted structure to ./v_Cd_-2/Bond_Distortion_20.0%_from_-1

Again we run the calculations on the HPCs, then parse and plot the results either using the SnB CLI functions, or through the python API as exemplified here.

# fake example data!
import shakenbreak
snb_path = shakenbreak.__path__[0]
example_results_path = f"{snb_path}/../tests/data/example_results"

!cp {example_results_path}/v_Cd_0/v_Cd_0_additional_distortions.yaml ./v_Cd_0/v_Cd_0.yaml
!cp {example_results_path}/v_Cd_-1/v_Cd_-1_additional_distortions.yaml ./v_Cd_-1/v_Cd_-1.yaml
!cp {example_results_path}/v_Cd_-2/v_Cd_-2_additional_distortions.yaml ./v_Cd_-2/v_Cd_-2.yaml
!cp {example_results_path}/v_Cd_0/Bond_Distortion_-60.0%/CONTCAR ./v_Cd_-1/Bond_Distortion_-60.0%_from_0/
!cp {example_results_path}/v_Cd_0/Bond_Distortion_-60.0%/CONTCAR ./v_Cd_-2/Bond_Distortion_-60.0%_from_0/
!cp {example_results_path}/v_Cd_-1/Unperturbed/CONTCAR ./v_Cd_-2/Bond_Distortion_20.0%_from_-1/
!cp {example_results_path}/v_Cd_-1/Unperturbed/CONTCAR ./v_Cd_0/Bond_Distortion_20.0%_from_-1/
# re-parse with the same `get_energy_lowering_distortions()` function from before:
low_energy_defects = energy_lowering_distortions.get_energy_lowering_distortions(defect_charges_dict)
v_Cd
v_Cd_0: Energy difference between minimum, found with -0.6 bond distortion, and unperturbed: -0.76 eV.
Energy lowering distortion found for v_Cd with charge 0. Adding to low_energy_defects dictionary.
v_Cd_-1: Energy difference between minimum, found with -60.0%_from_0 bond distortion, and unperturbed: -1.20 eV.
Low-energy distorted structure for v_Cd_-1 already found with charge states ['0'], storing together.
v_Cd_-2: Energy difference between minimum, found with 20.0%_from_-1 bond distortion, and unperturbed: -1.90 eV.
New (according to structure matching) low-energy distorted structure found for v_Cd_-2, adding to low_energy_defects['v_Cd'] list.

Comparing and pruning defect structures across charge states...
Ground-state structure found for v_Cd with charges [0, -1] has also been found for charge state -2 (according to structure matching). Adding this charge to the corresponding entry in low_energy_defects[v_Cd].
Ground-state structure found for v_Cd with charges [-2] has also been found for charge state 0 (according to structure matching). Adding this charge to the corresponding entry in low_energy_defects[v_Cd].

Finally we can replot the results from all our distortion tests:

figs = plotting.plot_all_defects(defect_charges_dict)
figs["v_Cd_0"]
_images/1f7ab61d331ae0eda69de5174b928717514109d49e2e5d8f5029676863169896.png

We now continue our defect calculations using the ground-state CONTCARs we’ve obtained for each defect, with our fully-converged INCAR and KPOINTS settings (via the doped vasp.DefectsSet class below, to get our final defect formation energies (confident that we’ve identified the ground-state defect structure!). The energy_lowering_distortions.write_groundstate_structure() function automatically writes these lowest-energy structures to our defect folders:

# write SnB-relaxed groundstate structure to the vasp_nkred_std subfolders for each defect:
energy_lowering_distortions.write_groundstate_structure(
    groundstate_folder="vasp_nkred_std",  
    groundstate_filename="POSCAR",
)
!head v_Cd_0/vasp_nkred_std/POSCAR  # groundstate structure from -60% distortion relaxation
-60.0%_Bond__vac_1_Cd[0. 0. 0.]_-dNELECT
   1.0000000000000000     
    13.0867679999999993    0.0000000000000000    0.0000000000000000
     0.0000000000000000   13.0867679999999993    0.0000000000000000
     0.0000000000000000    0.0000000000000000   13.0867679999999993
   Cd   Te
    31    32
Direct
  0.0014403846070577  0.0152341826280604  0.4960600473735149
  0.0018443102488570  0.5161087673464303 -0.0040398656877614

We can also optionally analyse the defect distortions found with SnB using the compare_structures(), analyse_structure(), get_homoionic_bonds() and get_site_magnetization() functions from shakenbreak.analysis; see the distortion analysis section of the SnB Python API tutorial for more.

Prepare VASP calculation files with doped

To generate our VASP input files for the defect calculations, we can use the DefectsSet class in the vasp module of doped:

from doped.vasp import DefectsSet  # append "?" to function/class to see the help docstrings:
DefectsSet?
Init signature:
DefectsSet(
    defect_entries: Union[doped.generation.DefectsGenerator, Dict[str, doped.core.DefectEntry], List[doped.core.DefectEntry], doped.core.DefectEntry],
    soc: Optional[bool] = None,
    user_incar_settings: Optional[dict] = None,
    user_kpoints_settings: Union[dict, pymatgen.io.vasp.inputs.Kpoints, NoneType] = None,
    user_potcar_functional: Optional[Literal['PBE', 'PBE_52', 'PBE_54', 'LDA', 'LDA_52', 'LDA_54', 'PW91', 'LDA_US', 'PW91_US']] = 'PBE',
    user_potcar_settings: Optional[dict] = None,
    **kwargs,
)
Docstring:     
An object for generating input files for VASP defect calculations from
doped/pymatgen ``DefectEntry`` objects.
Init docstring:
Creates a dictionary of: {defect_species: DefectRelaxSet}.

DefectRelaxSet has the attributes:

- ``DefectRelaxSet.vasp_gam``:
    ``DefectDictSet`` for Gamma-point only relaxation. Usually not needed if
    ShakeNBreak structure searching has been performed (recommended), unless
    only Γ-point `k`-point sampling is required (converged) for your system,
    and no vasp_std calculations with multiple `k`-points are required
    (determined from kpoints settings).
- ``DefectRelaxSet.vasp_nkred_std``:
    ``DefectDictSet`` for relaxation with a kpoint mesh and using ``NKRED``.
    Not generated for GGA calculations (if ``LHFCALC`` is set to ``False`` in
    ``user_incar_settings``) or if only Gamma kpoint sampling is required.
- ``DefectRelaxSet.vasp_std``:
    ``DefectDictSet`` for relaxation with a kpoint mesh, not using ``NKRED``.
    Not generated if only Gamma kpoint sampling is required.
- ``DefectRelaxSet.vasp_ncl``:
    ``DefectDictSet`` for singlepoint (static) energy calculation with SOC
    included. Generated if ``soc=True``. If ``soc`` is not set, then by default
    is only generated for systems with a max atomic number (Z) >= 31
    (i.e. further down the periodic table than Zn).

where ``DefectDictSet`` is an extension of ``pymatgen``'s ``DictSet`` class for
defect calculations, with ``incar``, ``poscar``, ``kpoints`` and ``potcar``
attributes for the corresponding VASP defect calculations (see docstring).
Also creates the corresponding ``bulk_vasp_...`` attributes for singlepoint
(static) energy calculations of the bulk (pristine, defect-free) supercell.
This needs to be calculated once with the same settings as the final defect
calculations, for the later calculation of defect formation energies.

See the ``RelaxSet.yaml`` and ``DefectSet.yaml`` files in the
``doped/VASP_sets`` folder for the default ``INCAR`` settings, and
``PotcarSet.yaml`` for the default ``POTCAR`` settings.

Note that any changes to the default ``INCAR``/``POTCAR`` settings should
be consistent with those used for all defect and competing phase (chemical
potential) calculations.

Args:
    defect_entries (``DefectsGenerator``, dict/list of ``DefectEntry``\s, or ``DefectEntry``):
        Either a ``DefectsGenerator`` object, or a dictionary/list of
        ``DefectEntry``\s, or a single ``DefectEntry`` object, for which
        to generate VASP input files.
        If a ``DefectsGenerator`` object or a dictionary (->
        {defect_species: DefectEntry}), the defect folder names will be
        set equal to ``defect_species``. If a list or single ``DefectEntry``
        object is provided, the defect folder names will be set equal to
        ``DefectEntry.name`` if the ``name`` attribute is set, otherwise
        generated according to the ``doped`` convention (see doped.generation).
        Defect charge states are taken from ``DefectEntry.charge_state``.
    soc (bool):
        Whether to generate ``vasp_ncl`` DefectDictSet attribute for spin-orbit
        coupling singlepoint (static) energy calculations. If not set, then
        by default is set to True if the max atomic number (Z) in the
        structure is >= 31 (i.e. further down the periodic table than Zn).
    user_incar_settings (dict):
        Dictionary of user INCAR settings (AEXX, NCORE etc.) to override
        default settings. Highly recommended to look at output INCARs or the
        ``RelaxSet.yaml`` and ``DefectSet.yaml`` files in the ``doped/VASP_sets``
        folder, to see what the default INCAR settings are. Note that any
        flags that aren't numbers or True/False need to be input as strings
        with quotation marks (e.g. ``{"ALGO": "All"}``).
        (default: None)
    user_kpoints_settings (dict or Kpoints):
        Dictionary of user KPOINTS settings (in pymatgen DictSet() format)
        e.g. {"reciprocal_density": 123}, or a Kpoints object, to use for the
        ``vasp_std``, ``vasp_nkred_std`` and ``vasp_ncl`` DefectDictSets (Γ-only for
        ``vasp_gam``). Default is Gamma-centred, reciprocal_density = 100 [Å⁻³].
    user_potcar_functional (str):
        POTCAR functional to use. Default is "PBE" and if this fails, tries
        "PBE_52", then "PBE_54".
    user_potcar_settings (dict):
        Override the default POTCARs, e.g. {"Li": "Li_sv"}. See
        ``doped/VASP_setsPotcarSet.yaml`` for the default ``POTCAR`` set.
    **kwargs: Additional kwargs to pass to each ``DefectRelaxSet()``.

Attributes:
    defect_sets (Dict):
        Dictionary of {defect_species: ``DefectRelaxSet``}.
    defect_entries (Dict):
        Dictionary of {defect_species: DefectEntry} for the input defect
        species, for which to generate VASP input files.
    bulk_vasp_gam (DefectDictSet):
        DefectDictSet for a `bulk` Γ-point-only singlepoint (static)
        supercell calculation. Often not used, as the bulk supercell only
        needs to be calculated once with the same settings as the final
        defect calculations, which may be with ``vasp_std`` or ``vasp_ncl``.
    bulk_vasp_nkred_std (DefectDictSet):
        DefectDictSet for a singlepoint (static) `bulk` ``vasp_std`` supercell
        calculation (i.e. with a non-Γ-only kpoint mesh) and ``NKRED(X,Y,Z)``
        INCAR tag(s) to downsample kpoints for the HF exchange part of the
        hybrid DFT calculation. Not generated for GGA calculations (if
        ``LHFCALC`` is set to ``False`` in user_incar_settings) or if only Gamma
        kpoint sampling is required.
    bulk_vasp_std (DefectDictSet):
        DefectDictSet for a singlepoint (static) `bulk` ``vasp_std`` supercell
        calculation with a non-Γ-only kpoint mesh, not using ``NKRED``. Not
        generated if only Gamma kpoint sampling is required.
    bulk_vasp_ncl (DefectDictSet):
        DefectDictSet for singlepoint (static) energy calculation of the `bulk`
        supercell with SOC included. Generated if ``soc=True``. If ``soc`` is not
        set, then by default is only generated for systems with a max atomic
        number (Z) >= 31 (i.e. further down the periodic table than Zn).
    json_obj (Union[Dict, DefectsGenerator]):
        Either the DefectsGenerator object if input ``defect_entries`` is a
        ``DefectsGenerator`` object, otherwise the ``defect_entries`` dictionary,
        which will be written to file when ``write_files()`` is called, to
        aid calculation provenance.
    json_name (str):
        Name of the JSON file to save the ``json_obj`` to.

    Input parameters are also set as attributes.
File:           ~/Library/CloudStorage/OneDrive-ImperialCollegeLondon/Bread/Projects/Packages/doped/doped/vasp.py
Type:           type
Subclasses:     
defect_set = DefectsSet(
    defect_gen,  # our DefectsGenerator object, can also input individual DefectEntry objects if desired
    user_incar_settings={"ENCUT": 350},  # custom INCAR settings, any that aren't numbers or True/False need to be input as strings with quotation marks!
)
# We can also customise the KPOINTS and POTCAR settings (and others), see the docstrings above for more info

The DefectsSet class prepares a dictionary in the form {defect_species: DefectRelaxSet} and stores this in the defect_sets attribute, where DefectRelaxSet has attributes: vasp_nkred_std, vasp_std (if our final k-point sampling is non-Γ-only with multiple k-points), vasp_ncl (if SOC included; see docstrings for default behaviour) and/or vasp_gam (if our final k-point sampling is Γ-only). These attributes are DefectDictSet objects, subclasses of the pymatgen DictSet object, and contain information about the calculation inputs.

# for example, let's look at the generated inputs for a `vasp_std` calculation with `NKRED`, for Cd_Te_+2:
print(f"INCAR:\n{defect_set.defect_sets['Cd_Te_+2'].vasp_nkred_std.incar}")
print(f"KPOINTS:\n{defect_set.defect_sets['Cd_Te_+2'].vasp_nkred_std.kpoints}")
print(f"POTCAR (symbols):\n{defect_set.defect_sets['Cd_Te_+2'].vasp_nkred_std.potcar_symbols}")
INCAR:
# May want to change NCORE, KPAR, AEXX, ENCUT, IBRION, LREAL, NUPDOWN, ISPIN = Typical variable parameters
AEXX = 0.25
ALGO = Normal  # change to all if zhegv, fexcp/f or zbrent errors encountered
EDIFF = 1e-05
EDIFFG = -0.01
ENCUT = 350
GGA = Pe
HFSCREEN = 0.208
IBRION = 2
ICHARG = 1
ICORELEVEL = 0  # needed if using the kumagai-oba (efnv) anisotropic charge correction scheme
ISIF = 2
ISMEAR = 0
ISPIN = 2
ISYM = 0  # symmetry breaking extremely likely for defects
KPAR = 2
LASPH = True
LHFCALC = True
LMAXMIX = 4
LORBIT = 11
LREAL = Auto
LVHAR = True
NCORE = 16
NEDOS = 2000
NELECT = 490.0
NKRED = 2
NSW = 200
NUPDOWN = 0
PREC = Accurate
PRECFOCK = Fast
SIGMA = 0.05

KPOINTS:
KPOINTS from doped, with reciprocal_density = 100/Å⁻³
0
Gamma
2 2 2

POTCAR (symbols):
['Cd', 'Te']

Tip

The use of (subclasses of) pymatgen DictSet objects here allows these functions to be readily used with high-throughput frameworks such as atomate(2) or AiiDA.

We can then write these files to disk with the write_files() method:

defect_set.write_files()  # again add "?" to see the docstring and options
Generating and writing input files: 100%|██████████| 49/49 [00:08<00:00,  5.72it/s]

Note

The recommended defects workflow is to use the ShakeNBreak approach with the initial generated defect configurations, which allows us to identify the ground-state structures and also accelerates the defect calculations by performing the initial fast vasp_gam relaxations, which ‘pre-converge’ our structures by bringing us very close to the final converged relaxed structure, much quicker (10-100x) than if we performed the fully-converged vasp_std relaxations from the beginning.

As such, DefectsSet.write_files() assumes by default that the defect structures (POSCARs) have already been generated and pre-relaxed with ShakeNBreak, and that you will then copy in the ground-state POSCARs (as shown above) to continue with the final defect relaxations in these folders. If for some reason you are not following this recommended approach, you can either set vasp_gam = True in write_files() to generate the unperturbed defect structures and input files for vasp_gam relaxations, or you can use unperturbed_poscar = True to also write the unperturbed defect POSCARs to the defect folders.

!ls *v_Cd*/*vasp* # list the generated VASP input files
v_Cd_-1/vasp_ncl:
INCAR        KPOINTS      POTCAR       v_Cd_-1.json

v_Cd_-1/vasp_nkred_std:
INCAR        KPOINTS      POSCAR       POTCAR       v_Cd_-1.json

v_Cd_-1/vasp_std:
INCAR        KPOINTS      POTCAR       v_Cd_-1.json

v_Cd_-2/vasp_ncl:
INCAR        KPOINTS      POTCAR       v_Cd_-2.json

v_Cd_-2/vasp_nkred_std:
INCAR        KPOINTS      POSCAR       POTCAR       v_Cd_-2.json

v_Cd_-2/vasp_std:
INCAR        KPOINTS      POTCAR       v_Cd_-2.json

v_Cd_0/vasp_ncl:
INCAR       KPOINTS     POTCAR      v_Cd_0.json

v_Cd_0/vasp_nkred_std:
INCAR       KPOINTS     POSCAR      POTCAR      v_Cd_0.json

v_Cd_0/vasp_std:
INCAR       KPOINTS     POTCAR      v_Cd_0.json

We can see that for each defect species, we have a vasp_nkred_std, vasp_std and vasp_ncl folder with corresponding VASP input files. This follows our recommended defect workflow (see the YouTube defects tutorial, which is to first perform defect structure-searching with ShakeNBreak using vasp_gam, then copy in the ground-state POSCARs to continue the defect relaxations with vasp_std (multiple k-points) but using NKRED to reduce the cost of hybrid DFT calculations (typically good accuracy for forces)(-> vasp_nkred_std input files), then without NKRED (-> vasp_std folder), then if SOC is important (which it is in this case for CdTe) we do a final SOC singlepoint calculation with vasp_ncl.

We see here our ground-state POSCARs from ShakeNBreak are present in the vasp_nkred_std folders, as we used the groundstate_folder="vasp_nkred_std" option with write_groundstate_structure() above (equivalent to -d vasp_nkred_std with snb-groundstate if using the CLI).

!cat v_Te_0/vasp_nkred_std/INCAR
# May want to change NCORE, KPAR, AEXX, ENCUT, IBRION, LREAL, NUPDOWN, ISPIN = Typical variable parameters
AEXX = 0.25
ALGO = Normal  # change to all if zhegv, fexcp/f or zbrent errors encountered
EDIFF = 1e-05
EDIFFG = -0.01
ENCUT = 350
GGA = Pe
HFSCREEN = 0.208
IBRION = 2
ICHARG = 1
ICORELEVEL = 0  # needed if using the kumagai-oba (efnv) anisotropic charge correction scheme
ISIF = 2
ISMEAR = 0
ISPIN = 2
ISYM = 0  # symmetry breaking extremely likely for defects
KPAR = 2
LASPH = True
LHFCALC = True
LMAXMIX = 4
LORBIT = 11
LREAL = Auto
LVHAR = True
NCORE = 16
NEDOS = 2000
NELECT = 480.0
NKRED = 2
NSW = 200
NUPDOWN = 0
PREC = Accurate
PRECFOCK = Fast
SIGMA = 0.05

Note

The NELECT INCAR tag (-> number of electrons, sets the defect charge state in the calculation) is automatically determined based on the choice of POTCARs. The default in both doped and ShakeNBreak is to use the MPRelaxSet POTCAR choices, but if you’re using different ones, make sure to set potcar_settings in the VASP file generation functions, so that NELECT is then set accordingly. This requires the pymatgen config file $HOME/.pmgrc.yaml to be properly set up as detailed on the Installation docs page

As well as the INCAR, POTCAR and KPOINTS files for each calculation in the defects workflow, we also see that a {defect_species}.json file is created by default, which contains the corresponding DefectEntry python object, which can later be reloaded with DefectEntry.from_json() (useful if we later want to recheck some of the defect generation info). The DefectsGenerator object is also saved to JSON by default here, which can be reloaded with DefectsGenerator.from_json().

A folder with the input files for calculating the reference bulk (pristine, defect-free) supercell is also generated (this calculation is needed to later compute defect formation energies and charge corrections):

!ls CdTe_bulk/vasp_ncl  # only the final workflow VASP calculation is required for the bulk, see docstrings
INCAR   KPOINTS POSCAR  POTCAR

Chemical Potentials

As well as performing our defect supercell calculations, we also need to compute the chemical potentials of the elements in our system, in order to later calculate the defect formation energies. For more information on this, see our defect calculations tutorial video and/or this useful quick-start guide article on defect calculations here.

The workflow for computing and analysing the chemical potentials is described in the Competing Phases tutorial.