Advanced Defect Analysis

Here we describe some more targeted analysis you can do for your defect calculations parsed with doped (including comparing the relaxed configurations for different initial interstitial positions, structure & bond length analysis of defects, and plotting/analysis of the defect charge corrections), which may be useful for in certain cases.

Tip

You can run this notebook interactively through Google Colab or Binder – just click the links! If running on Colab, then you’ll need to run !pip install doped in a cell to install the package, and !git clone https://github.com/SMTG-Bham/doped to download the example data (and update paths in the code cells accordingly).

Defect-Induced Site Displacements (Strain)

Using the DefectEntry.plot_site_displacements() method, we can analyse the displacements of atoms around the defect site (i.e. defect-induced local strain) during geometry relaxation.

%matplotlib inline
from monty.serialization import loadfn
CdTe_defects_thermo = loadfn("CdTe/CdTe_thermo_wout_meta.json")  # load our DefectThermodynamics object

Absolute Displacements

Let’s look at the displacements of atoms around the Cd vacancy in CdTe, and how this changes with charge state:

from doped.utils.plotting import format_defect_name

v_Cd_entries = [entry for entry in CdTe_defects_thermo.defect_entries if "v_Cd" in entry.name]
for defect_entry in v_Cd_entries:
    fig = defect_entry.plot_site_displacements(separated_by_direction=False)
    fig.suptitle(format_defect_name(defect_entry.name, include_site_info_in_name=False),
                 fontsize=18)
_images/14f7354ffe15dbb29958b78be6e76f20ec9aed20cc2fafd77becdb55321e29cb.png _images/f4cdf9723da3a4c8d517bc2f1ae79cd86d24ce5850239bb697245e036de91b67.png _images/c897a62bb583b96a479fc79e59506beff00610451375fa6430222460219989f5.png

Separated by Direction

for defect_entry in v_Cd_entries:
    fig = defect_entry.plot_site_displacements(separated_by_direction=True)
    fig.suptitle(format_defect_name(defect_entry.name, include_site_info_in_name=False),
                 y=1.2, fontsize=22)
_images/b1c9295729deca41cbcac3912204767868fe0b11b697f57b57fc9a03dcda52c4.png _images/0e9ff1a9ec6f9cf5c47396438d054a0d6aaf8f38ac4aad0f1f1347ed90b4505f.png _images/d402fb560ab06be0be49dd15493950c5dd417d0269be07b66b1c2d8f558f2783.png

Here we see that \(V_{Cd}^{-2}\) has isotropic (symmetric) displacements of atoms around the vacancy site in the x/y/z directions, which makes sense as it adopts a tetrahedral (Td) geometry (as shown in the get_symmetries_and_degeneracies output below and discussed in detail in this paper). As expected, we see an exponential tail-off in the site displacement magnitudes as we move away from the defect, and it is the Te nearest neighbours which are most strongly perturbed.

For \(V_{Cd}^{-1}\), we have a Jahn-Teller-distorted \(C_{3v}\) geometry where a neighbouring Te atom displaces along the [111] direction away from the vacancy site (while the other Te atoms displace towards the vacancy but by a smaller degree), while for \(V_{Cd}^{0}\), we have a \(C_{2v}\) geometry where two neighbouring Te displace significantly towards the vacancy and each other (~1 Å) forming a dimer bond, while the other two Te move a smaller distance towards the vacancy (~0.2 Å) as seen in the displacement plots and illustrated below.

from doped.thermodynamics import DefectThermodynamics
v_Cd_thermo = DefectThermodynamics(v_Cd_entries)
v_Cd_thermo.get_symmetries_and_degeneracies()
Defect q Site_Symm Defect_Symm g_Orient g_Spin g_Total Mult
0 v_Cd 0 Td C2v 6.0 1 6.0 1.0
1 v_Cd -1 Td C3v 4.0 2 8.0 1.0
2 v_Cd -2 Td Td 1.0 1 1.0 1.0
_images/V_Cd_geometries.jpeg

The high symmetry of \(V_{Cd}^{-2}\) is evident from the displacement plots above, where it looks like there are much fewer atoms in the plots, however this is just because we have many symmetry-equivalent atoms in this case and so we end up with many overlapping points (and so much less distinct points). Then for \(C_{3v}\) \(V_{Cd}^{-1}\) we have more distinct sites appearing, and then more so for the lower-symmetry \(C_{2v}\) \(V_{Cd}^{0}\) structure.

Relative Displacements

Instead of plotting the absolute displacements in the x, y, z directions, we can also plot the displacements of atoms relative to the defect site (i.e. displacement of the atom along the line connecting itself to the defect). A negative displacement indicates that the atom moves towards the defect (compressive strain) while a positive displacement indicates that the atom moves away from the defect (tensile strain):

for defect_entry in v_Cd_entries:
    fig = defect_entry.plot_site_displacements(relative_to_defect=True)
    fig.suptitle(format_defect_name(defect_entry.name, include_site_info_in_name=False),
                 fontsize=18)
_images/86a25401e9654c1c52f3c785eabda791194f6abbdd96adc7e2729fce49260eb4.png _images/885ab0a79e29466bb2739d5023de9f6478a740fd7cc80f37cdd589e324dd1676.png _images/dac2b29fcb79127b9f6f65dca8459dc7dcb17cb045cae6979a6c7152f56a90bb.png

The above plots clearly show the different reconstructions of each charge state:

  • The formation of the neutral vacancy introduces two holes which localise in the Te-Te dimer. These Te atoms forming the dimer displace significantly towards the defect, while the two other NN Te atoms also move towards the vacancy, but to a smaller degree.

  • The singly charged vacancy introduces a single hole which localises in one of the Te NNs. This Te atom displaces away from the vacancy in the (-1,-1,-1) direction, while the other 3 Te atoms displace towards the vacancy.

  • The doubly charged vacancy keeps the \(T_d\) symmetry, with the 4 Te NNs displacing towards the vacancy by 0.2 Å (from the original Cd-Te bond distance of 2.83 Å to a distance of 2.61 Å) to allow for greater hybridization between dangling bonds.

Note

The data for the atomic site displacements in the relaxed defect supercell is stored in the DefectEntry.calculation_metadata["site_displacements"] attributes, which has a dictionary of site displacement vectors (relative to the unperturbed bulk positions) and their (relaxed) distances to the defect site, ordered by the atom type in the bulk formula, then by distance from the defect site.

Because of the high symmetry and thus overlapping displacement datapoints in the \(V_{Cd}^{-2}\) plot, we can also directly access the site displacement data (stored in DefectEntry.calculation_metadata["site_displacements"], or by directly using calc_site_displacements from doped.utils.displacements) to confirm the number of sites with a given displacement, or to perform other analyses:

import numpy as np
defect_entry = [entry for entry in v_Cd_entries if entry.name == "v_Cd_-2"][0]
# this is a dict of 'displacements' vectors and (original) 'distances' lists, 
# sorted by atom type, then by distance from the defect site:  
displacements_dict = defect_entry.calculation_metadata["site_displacements"]
Te_indices = [i for i, site in enumerate(defect_entry.defect_supercell) if site.specie.symbol == "Te"]
Te_displacements = np.array(displacements_dict["displacements"])[Te_indices]
for i, disp in enumerate(Te_displacements[:5]):
    print(f"Te atom {i} displacement vector: {disp}, magnitude: {np.linalg.norm(disp):.2f} Å")
Te atom 0 displacement vector: [ 0.12698994 -0.1269902  -0.1269885 ], magnitude: 0.22 Å
Te atom 1 displacement vector: [-0.12698929 -0.12698915  0.12698863], magnitude: 0.22 Å
Te atom 2 displacement vector: [-0.12698902  0.12698915 -0.12698785], magnitude: 0.22 Å
Te atom 3 displacement vector: [0.12698902 0.12698929 0.12698772], magnitude: 0.22 Å
Te atom 4 displacement vector: [-0.02291624  0.00363145  0.00363119], magnitude: 0.02 Å

Nice, we can see that it is 4 Te neighbours that each displace by the same magnitude along the {111} directions by 0.22 Å toward \(V_{Cd}^{-2}\), which thus retains the \(T_d\) symmetry of this site.

Displacements Along Specific Directions

We can also analyse the displacements along a specific direction. For instance, for the \(V_{Cd}^{-1}\) defect, we can plot the displacements of atoms along the (-1, -1, -1) direction (the vector along which one of the Te NNs displaces away from the vacancy):

defect_entry = v_Cd_entries[1]
fig = defect_entry.plot_site_displacements(vector_to_project_on=[-1,-1,-1])
_ = fig.suptitle(format_defect_name(defect_entry.name, include_site_info_in_name=False),
                y=1.2, fontsize=18)
_images/88a2f0b5d8d7d1a8c3377fa18a171eab7f17881d0c781636e302cdc4d334e6fb.png

which shows how one of the Te NNs displaces in the (-1,-1,-1) direction away from the vacancy, while the other 3 Te atoms displace mostly perpendicular to this direction (overlaped points in the right plot at a distance of 2.5 Å). This is also shown when orienting the defect environment along the (-1,-1,-1) direction, where we see that the Te atom behind the vacancy (in shaded black) has displaced away from the vacancy (from an original distance of 2.8 to 3.0 Å).

_images/v_Cd_-1_oeriented_along_m1m1m1.png

Similarly, for the neutral case, we can plot the displacements along the vector connecting the Te dimer ((1,1,0) vector). This will show two Te atoms significantly displacing along this vector, but in opposite directions (moving towards each other to form the dimer bond):

defect_entry = v_Cd_entries[0]  # Neutral state
vector_to_project_on = [1,1,0]  # Vector connecting Te dimer
fig = defect_entry.plot_site_displacements(vector_to_project_on=vector_to_project_on)
_ = fig.suptitle(format_defect_name(defect_entry.name, include_site_info_in_name=False),
                y=1.2, fontsize=18)
_images/21aed5c01af3f0b19af0e528cd65e7e580f82a517d773169f572ddfec92aa644.png

which indeed shows one Te moving by 1 Å along the (1,1,0) direction while the other Te moves by ~0.9 Å along the (-1,-1,0) direction, reducing the original Te-Te bond distance from 4.63 Å to 2.75 Å. This can also be visualised by orienting the defect environment along the (1,1,0) direction:

_images/v_Cd_0_oriented_along_110.png

Substitution Example

For illustration purposes, here we plot the site displacements for \(Te_{Cd}^{+1}\):

te_cd_entry = [entry for entry in CdTe_defects_thermo.defect_entries if entry.name == "Te_Cd_+1"][0]
plot = te_cd_entry.plot_site_displacements()
_images/e04d2e134741c510182dbf9d0c899cd306630240bd455949a9e49ae3a4acbc26.png

We see that the antisite Te atom displaces significantly from the original Cd lattice site (on which it is substituting, at \(x\) = 0 in the plot below), and then three of the neighbouring Te displace by ~0.2 Å while the fourth Te neighbour only displaces by ~0.03 Å, resulting in the \(C_{3v}\) symmetry of this defect:

te_cd_entry.calculation_metadata["relaxed point symmetry"]
'C3v'

Eigenvalue / Electronic Structure Analysis

As discussed on the docs Tips page, we can use the DefectEntry.get_eigenvalue_analysis() method to analyse the electronic structure of our underlying defect calculations, looking at the eigenvalues, orbital character and localisation, which allows the determination of deep or shallow (perturbed host states (PHS)) behaviour etc.

Let’s plot the eigenvalues for the vacancy defects in CdTe:

v_Cd_entries = [entry for entry in CdTe_defects_thermo.defect_entries if "v_Cd" in entry.name]
for defect_entry in v_Cd_entries:
    bes, fig = defect_entry.get_eigenvalue_analysis()
    fig.suptitle(format_defect_name(defect_entry.name, include_site_info_in_name=False),
                 x=0.53, y=1.04, fontsize=18)
_images/62722d3b2a04b7c1c10de77e80b9d018814f1518721ce1a67dc7a19873d4f33f.png _images/bb4c65e4dac503d9251f235412be8ccea2eadde1c7b675883e89060e1d48d7b3.png _images/868a64033a7a96ca09930e395fa526f6e530487fb2b2f6bc6ec8d094c5d11380.png

Nice, we see that these plots match the schematic depiction in this paper on vacancies in CdTe, where we have no in-gap states for the fully-ionised \(V_{Cd}^{-2}\) as expected, an in-gap hole polaron state for \(V_{Cd}^{-1}\), and an anti-bonding dimer state for \(V_{Cd}^{0}\) just above the CBM:

_images/V_Cd_ACS_Energy_Lett_Figure_1.jpeg

We can further verify these conclusions by looking at the band-edge states info output from these functions:

v_Cd_m1 = [entry for entry in CdTe_defects_thermo.defect_entries if entry.name == "v_Cd_-1"][0]
bes = v_Cd_m1.get_eigenvalue_analysis(plot=False)
print(bes)
 -- band-edge states info
Spin-up
     Index  Energy  P-ratio  Occupation  OrbDiff  Orbitals                K-point coords
VBM  565    1.639   0.20     1.00        0.04     Te-p: 0.52              ( 0.000,  0.000,  0.000)
CBM  567    3.136   0.06     0.00        0.09     Cd-s: 0.28, Te-s: 0.21  ( 0.000,  0.000,  0.000)
vbm has acceptor phs: False (0.000 vs. 0.2)
cbm has donor phs: False (0.000 vs. 0.2)
---
Localized Orbital(s)
Index  Energy  P-ratio  Occupation  Orbitals
566    2.030   0.68     0.00        Cd-p: 0.14, Cd-d: 0.16, Te-p: 3.66

Here we see that we have identified the localised Te p hole polaron state for \(V_{Cd}^{-1}\), at band index 566.

Important

Note that the automated band-edge states analysis isn’t guaranteed to work perfectly in all cases, and you may need to adjust parameters such as similar_orb_criterion and similar_energy_criterion in the get_eigenvalue_analysis method – doped will attempt to warn you if a failure is detected.

Eigenvalue Analysis & Shallow / Perturbed Host States

One of the most common reasons for performing this electronic structure analysis is to identify and analyse shallow defect states (i.e. ‘perturbed host states’ (PHS)). As discussed on the docs Tips page, current supercell correction schemes can not accurately account for finite-size errors obtained when calculating the energies of PHS in moderate supercells, so it is recommended to denote such shallow defects as PHS and conclude only qualitatively that their transition level is located near the corresponding band edge.

The optional argument parse_projected_eigen in DefectsParser (True by default) controls whether to load the projected orbitals, and in combination with defect_entry.get_eigenvalue_analysis() returns additional information about the nature of the band edges, allowing defect states (and whether they are deep or shallow (PHS)) to be automatically identified.

Below we give an example for the neutral copper vacancy in Cu₂SiSe₃, which was determined to be a PHS.

from doped.analysis import DefectParser
defect_entry = DefectParser.from_paths(defect_path = "Cu2SiSe3/v_Cu_0/vasp_std",
                                       bulk_path = "Cu2SiSe3/bulk/vasp_std",
                                       dielectric = [[8.73, 0, -0.48],[0., 7.78, 0],[-0.48, 0, 10.11]]
                                      ).defect_entry
bes, fig = defect_entry.get_eigenvalue_analysis()
_images/72e987a233a770bb8968c8243d95278a9ccdec27bb42832cd1886db55d8165f1.png

Here we can see we are getting partial unoccupation of our VBM (i.e. a shallow, delocalised hole state; a PHS). This is further confirmed by printing the band edge info which states that “vbm has acceptor phs” = True:

print(bes)
 -- band-edge states info
Spin-up
     Index  Energy  P-ratio  Occupation  OrbDiff  Orbitals                            K-point coords
VBM  347    3.539   0.05     1.00        0.04     Cu-d: 0.34, Se-p: 0.36              ( 0.000,  0.000,  0.000)
CBM  348    5.139   0.04     0.00        0.10     Se-s: 0.20, Se-p: 0.12, Si-s: 0.13  ( 0.000,  0.000,  0.000)
vbm has acceptor phs: False (0.000 vs. 0.2)
cbm has donor phs: False (0.000 vs. 0.2)
---
Localized Orbital(s)
Index  Energy  P-ratio  Occupation  Orbitals

Spin-down
     Index  Energy  P-ratio  Occupation  OrbDiff  Orbitals                            K-point coords
VBM  347    3.677   0.06     0.00        0.04     Cu-d: 0.34, Se-p: 0.36              ( 0.000,  0.000,  0.000)
CBM  348    5.142   0.04     0.00        0.10     Se-s: 0.20, Se-p: 0.12, Si-s: 0.13  ( 0.000,  0.000,  0.000)
vbm has acceptor phs: True (1.000 vs. 0.2)
cbm has donor phs: False (0.000 vs. 0.2)
---
Localized Orbital(s)
Index  Energy  P-ratio  Occupation  Orbitals

Important terms included in print(bes):

  1. P-ratio: The ratio of the summed projected orbital contributions of the defect & neighbouring sites to the total sum of orbital contributions from all atoms to that electronic state. A value close to 1 indicates a localised state.

  2. Occupation: Occupation of the electronic state / orbital.

  3. vbm has acceptor phs/cbm has donor phs: Whether a PHS has been automatically identified. Depends on how VBM-like/CBM-like the defect states are and the occupancy of the state. (X vs. 0.2) refers to the hole/electron occupancy at the band edge vs the default threshold of 0.2 for flagging as a PHS (but you should use your own judgement of course).

  4. Localized Orbital(s): Information about localised defect states, if present.

Additionally, Index refers to the band/eigenvalue index in the DFT calculation, Energy is its eigenvalue energy at the given K-point coords, Orbitals lists the projected orbital contributions to that state, and OrbDiff is the normalised difference in projected orbital contributions to the VBM/CBM states between the bulk and defect supercells.

print(f"{0.0001:7.2e}")
1.00e-04

Custom Supercell Generation

Here we’ll illustrate some of the ways in which we can customise the supercell generation process for defect calculations in doped, including:

  • Generation of optimal supercell based on minimum image distance and atom number constraints (doped default)

  • Manual generation of an arbitrary supercell

  • Enforced (near-)cubic supercell

  • Enforced diagonal expansion

We’ll use silicon (the “go-to”) as our example host material here.

Optimal Supercell (Default)

When choosing a simulation supercell for charged defects in materials, we typically want to maximise the minimum distance between periodic images of the defect (to reduce finite-size errors) while keeping the supercell to a tractable number of atoms/electrons to calculate.

As detailed in the JOSS paper, doped integrates an efficient algorithm for calculating minimum image distances and (if no specific supercell is chosen) then directly optimises the supercell choice for this goal – often identifying non-trivial ‘root 2’/’root 3’ type supercells. This leads to a significant reduction in the supercell size (and thus computational cost) required to achieve a threshold minimum image distance (mean improvements of 7-35% in minimum image distance compared pymatgen/ASE functions). Let’s see this in action:

from doped.generation import DefectsGenerator
from pymatgen.core.structure import Structure

si_poscar = Structure.from_file("../tests/data/Si_MP_conv_POSCAR")
defect_gen = DefectsGenerator(si_poscar)
Generating DefectEntry objects: 100.0%|██████████| [00:11,   8.75it/s]                             
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Si         [+1,0,-1]          [0.000,0.000,0.000]  8a

Interstitials    Guessed Charges    Conv. Cell Coords    Wyckoff
---------------  -----------------  -------------------  ---------
Si_i_D3d         [+4,+3,+2,+1,0]    [0.625,0.625,0.625]  16d
Si_i_Td          [+4,+3,+2,+1,0]    [0.500,0.500,0.500]  8b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 8 formula unit(s) of Si.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

defect_gen.supercell_matrix  # show the supercell matrix identified
array([[3., 0., 0.],
       [0., 3., 0.],
       [0., 0., 3.]])
print(f"Here `doped` identifies a 3x3x3 expansion of the primitive unit cell as the optimal supercell for diamond-like silicon (giving a FCC-like supercell), having a minimum image distance of {defect_gen.min_image_distance:.2f} Å with {defect_gen.bulk_supercell.num_sites} atoms, satisfying the default minimum image distance and atom number constraints of 10 Å and 50 atoms respectively.")
Here `doped` identifies a 3x3x3 expansion of the primitive unit cell as the optimal supercell for diamond-like silicon (giving a FCC-like supercell), having a minimum image distance of 11.55 Å with 54 atoms, satisfying the default minimum image distance and atom number constraints of 10 Å and 50 atoms respectively.

Note

DefectsGenerator.supercell_matrix is defined with respect to the primitive unit cell of the host material, which may differ from the input structure (if it’s not the primitive cell).

# these default constraints and behaviour are noted in the docstring, which can be printed with:
DefectsGenerator?

Ok, so maybe we have a case where we want to be stricter with our minimum image distance requirement, and want to set it to 13 Å, and also make sure we have at least 120 atoms in the supercell. We can do this by just setting the min_image_distance and min_atoms entries in supercell_gen_kwargs with DefectsGenerator:

defect_gen = DefectsGenerator(si_poscar,
                              supercell_gen_kwargs={"min_image_distance": 13, "min_atoms": 120})
Generating DefectEntry objects: 100.0%|██████████| [00:13,   7.62it/s]                             
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Si         [+1,0,-1]          [0.000,0.000,0.000]  8a

Interstitials    Guessed Charges    Conv. Cell Coords    Wyckoff
---------------  -----------------  -------------------  ---------
Si_i_D3d         [+4,+3,+2,+1,0]    [0.625,0.625,0.625]  16d
Si_i_Td          [+4,+3,+2,+1,0]    [0.500,0.500,0.500]  8b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 8 formula unit(s) of Si.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

defect_gen.supercell_matrix  # show the supercell matrix identified
array([[4., 0., 0.],
       [0., 4., 0.],
       [0., 0., 4.]])
print(f"Here `doped` identifies a 4x4x4 expansion of the primitive unit cell as the optimal supercell for diamond-like silicon (giving a FCC-like supercell), having a minimum image distance of {defect_gen.min_image_distance:.2f} Å with {defect_gen.bulk_supercell.num_sites} atoms, satisfying our updated minimum image distance and atom number constraints of 13 Å and 120 atoms respectively.")
Here `doped` identifies a 4x4x4 expansion of the primitive unit cell as the optimal supercell for diamond-like silicon (giving a FCC-like supercell), having a minimum image distance of 15.40 Å with 128 atoms, satisfying our updated minimum image distance and atom number constraints of 13 Å and 120 atoms respectively.

As detailed in the DefectsGenerator docstring, there is also the ideal_threshold = x parameter, which allows a supercell larger by a fraction x (default = 0.1 = 10%) than that which satisfies the min image distance / atom number constraints, if it is a perfect diagonal expansion of the primitive or conventional cell (because diagonal expansions often help simplify things like visualisation, symmetry analysis, finite-size error cancellation etc., so are typically preferred if the additional computational cost is relatively low).

But as always, this parameter can be adjusted. Perhaps we don’t care about this and just want the minimum supercell size that satisfies our min image distance / atom number constraints; we can just set this to 0:

defect_gen = DefectsGenerator(si_poscar,
                              supercell_gen_kwargs={"min_image_distance": 13,
                                                    "ideal_threshold": 0})
Generating DefectEntry objects: 100.0%|██████████| [00:15,   6.62it/s]                             
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Si         [+1,0,-1]          [0.000,0.000,0.000]  8a

Interstitials    Guessed Charges    Conv. Cell Coords    Wyckoff
---------------  -----------------  -------------------  ---------
Si_i_D3d         [+4,+3,+2,+1,0]    [0.625,0.625,0.625]  16d
Si_i_Td          [+4,+3,+2,+1,0]    [0.500,0.500,0.500]  8b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 8 formula unit(s) of Si.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

defect_gen.supercell_matrix  # show the supercell matrix identified
array([[-4.,  2.,  2.],
       [ 2., -4.,  2.],
       [ 2.,  2.,  0.]])
print(f"Here `doped` identifies a non-diagonal (non-FCC) expansion of the primitive unit cell as the optimal supercell for diamond-like silicon, having a minimum image distance of {defect_gen.min_image_distance:.2f} Å with {defect_gen.bulk_supercell.num_sites} atoms, satisfying our minimum image distance constraint of 13 Å, with no testing of larger supercells which may give diagonal expansions.")
Here `doped` identifies a non-diagonal (non-FCC) expansion of the primitive unit cell as the optimal supercell for diamond-like silicon, having a minimum image distance of 13.33 Å with 96 atoms, satisfying our minimum image distance constraint of 13 Å, with no testing of larger supercells which may give diagonal expansions.

Of course, we could also increase ideal_threshold from the default 0.1 if we were particularly keen on adopting diagonal supercell expansions when possible. If we only want diagonal supercell expansions, we can just set force_diagonal: True in supercell_gen_kwargs.

Manual Supercell Generation

Often we may be trying to perform defect calculations following a pre-defined workflow – from a paper in the literature or a previous project for instance. In these cases, to remove potential sources of variance in the results, we may want to ensure we’re using the exact same supercell specification, which may not be the optimal supercell for that system.

To use a specific supercell in doped, we can just load or generate this supercell manually and then input this structure to DefectsGenerator along with generate_supercell = False, in which case it will use the input structure as the supercell:

si_custom_supercell = si_poscar * [3, 2, 3]  # e.g. a 3x2x3 supercell
defect_gen = DefectsGenerator(si_custom_supercell, generate_supercell=False)
Generating DefectEntry objects: 100.0%|██████████| [00:05,  18.37it/s]
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Si         [+1,0,-1]          [0.000,0.000,0.000]  8a

Interstitials    Guessed Charges    Conv. Cell Coords    Wyckoff
---------------  -----------------  -------------------  ---------
Si_i_D3d         [+4,+3,+2,+1,0]    [0.625,0.625,0.625]  16d
Si_i_Td          [+4,+3,+2,+1,0]    [0.500,0.500,0.500]  8b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 8 formula unit(s) of Si.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

defect_gen.supercell_matrix  # show the supercell matrix
array([[-3.,  3.,  3.],
       [ 2., -2.,  2.],
       [ 3.,  3., -3.]])

As mentioned above DefectsGenerator.supercell_matrix is defined with respect to the primitive unit cell of the host material rather than the input structure (which does not need to be the primitive cell). Here our original structure was the conventional cell of diamond-like silicon, and so a 3x2x3 expansion of it corresponds to a supercell matrix of [[-3, 3, 3], [2, -2, 2], [3, 3, -3]] with respect to the primitive cell.

print(f"Here our custom supercell has a minimum image distance of {defect_gen.min_image_distance:.2f} Å with {defect_gen.bulk_supercell.num_sites} atoms.")
Here our custom supercell has a minimum image distance of 10.89 Å with 144 atoms.

Enforced (Near-)Cubic Supercell

If we want to enforce generation of a (near-)cubic supercell, we can just set force_cubic: True in supercell_gen_kwargs. This will generate the smallest supercell that is cubic or near-cubic (i.e. where the lattice vectors are all within a certain tolerance of each other) that also satisfies the minimum image distance and atom number constraints.

defect_gen = DefectsGenerator(si_poscar,
                              supercell_gen_kwargs={"force_cubic": True})
Generating DefectEntry objects: 100.0%|██████████| [00:04,  20.49it/s]
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Si         [+1,0,-1]          [0.000,0.000,0.000]  8a

Interstitials    Guessed Charges    Conv. Cell Coords    Wyckoff
---------------  -----------------  -------------------  ---------
Si_i_D3d         [+4,+3,+2,+1,0]    [0.625,0.625,0.625]  16d
Si_i_Td          [+4,+3,+2,+1,0]    [0.500,0.500,0.500]  8b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 8 formula unit(s) of Si.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

defect_gen.supercell_matrix  # show the supercell matrix wrt primitive cell
array([[-2,  2,  2],
       [ 2, -2,  2],
       [ 2,  2, -2]])
print(f"Here our enforced-cubic supercell has a minimum image distance of {defect_gen.min_image_distance:.2f} Å with {defect_gen.bulk_supercell.num_sites} atoms.")
Here our enforced-cubic supercell has a minimum image distance of 10.89 Å with 64 atoms.
defect_gen.bulk_supercell.lattice  # show the lattice vectors, cubic as expected
Lattice
    abc : 10.8874 10.8874 10.8874
 angles : 90.0 90.0 90.0
 volume : 1290.5431714516237
      A : 10.8874 0.0 0.0
      B : 0.0 10.8874 0.0
      C : 0.0 0.0 10.8874
    pbc : True True True

Enforced Diagonal Expansion

If we only want to allow diagonal supercell expansions (of the primitive cell), we can just set force_diagonal: True in supercell_gen_kwargs. This will generate the smallest supercell with a diagonal expansion matrix that also satisfies the minimum image distance and atom number constraints:

defect_gen = DefectsGenerator(si_poscar,
                              supercell_gen_kwargs={"force_diagonal": True})
Generating DefectEntry objects: 100.0%|██████████| [00:04,  21.02it/s]
Vacancies    Guessed Charges    Conv. Cell Coords    Wyckoff
-----------  -----------------  -------------------  ---------
v_Si         [+1,0,-1]          [0.000,0.000,0.000]  8a

Interstitials    Guessed Charges    Conv. Cell Coords    Wyckoff
---------------  -----------------  -------------------  ---------
Si_i_D3d         [+4,+3,+2,+1,0]    [0.625,0.625,0.625]  16d
Si_i_Td          [+4,+3,+2,+1,0]    [0.500,0.500,0.500]  8b

The number in the Wyckoff label is the site multiplicity/degeneracy of that defect in the conventional ('conv.') unit cell, which comprises 8 formula unit(s) of Si.
Note that Wyckoff letters can depend on the ordering of elements in the conventional standard structure, for which doped uses the spglib convention.

defect_gen.supercell_matrix  # show the supercell matrix wrt primitive cell
array([[3, 0, 0],
       [0, 3, 0],
       [0, 0, 3]])

Here we get the same supercell expansion as in our first example case with silicon above, as it found that a diagonal expansion of our FCC-like primitive cell gave the optimal supercell under the default minimum image distance and atom number constraints.

print(f"Here `doped` identifies a 3x3x3 expansion of the primitive unit cell as the optimal supercell for diamond-like silicon (giving a FCC-like supercell), having a minimum image distance of {defect_gen.min_image_distance:.2f} Å with {defect_gen.bulk_supercell.num_sites} atoms, satisfying the default minimum image distance and atom number constraints of 10 Å and 50 atoms respectively.")
Here `doped` identifies a 3x3x3 expansion of the primitive unit cell as the optimal supercell for diamond-like silicon (giving a FCC-like supercell), having a minimum image distance of 11.55 Å with 54 atoms, satisfying the default minimum image distance and atom number constraints of 10 Å and 50 atoms respectively.

Point Symmetry Analysis

As shown in the thermodynamics analysis tutorial, if we have parsed our defect calculations with doped, we can automatically determine the corresponding point symmetries of the relaxed defect and original bulk site, which gives us the orientational degeneracy factor for the defect concentrations (regardless of whether the calculation inputs were generated with doped) – as shown below:

from monty.serialization import loadfn
CdTe_thermo = loadfn("CdTe/CdTe_LZ_thermo_wout_meta.json")  # load our DefectThermodynamics object
CdTe_thermo.get_symmetries_and_degeneracies()
Defect q Site_Symm Defect_Symm g_Orient g_Spin g_Total Mult
0 v_Cd 0 Td C2v 6.0 1 6.0 1.0
1 v_Cd -1 Td C3v 4.0 2 8.0 1.0
2 v_Cd -2 Td Td 1.0 1 1.0 1.0
3 v_Te +2 Td Td 1.0 1 1.0 1.0
4 v_Te +1 Td Td 1.0 2 2.0 1.0
5 v_Te 0 Td C2v 6.0 1 6.0 1.0
6 v_Te -1 Td Cs 12.0 2 24.0 1.0
7 v_Te -2 Td Cs 12.0 1 12.0 1.0
8 Cd_Te +2 Td Td 1.0 1 1.0 1.0
9 Cd_Te +1 Td C2v 6.0 2 12.0 1.0
10 Cd_Te 0 Td Cs 12.0 1 12.0 1.0
11 Cd_Te -1 Td Td 1.0 2 2.0 1.0
12 Cd_Te -2 Td C1 24.0 1 24.0 1.0
13 Te_Cd +2 Td Td 1.0 1 1.0 1.0
14 Te_Cd +1 Td C3v 4.0 2 8.0 1.0
15 Te_Cd 0 Td C3v 4.0 1 4.0 1.0
16 Te_Cd -1 Td Cs 12.0 2 24.0 1.0
17 Te_Cd -2 Td Cs 12.0 1 12.0 1.0
18 Cd_i_Td_Cd2.83 +2 Td Td 1.0 1 1.0 1.0
19 Cd_i_Td_Cd2.83 +1 Td Td 1.0 2 2.0 1.0
20 Cd_i_Td_Cd2.83 0 Td Td 1.0 1 1.0 1.0
21 Cd_i_Td_Te2.83 +2 Td Td 1.0 1 1.0 1.0
22 Cd_i_Td_Te2.83 +1 Td Td 1.0 2 2.0 1.0
23 Cd_i_Td_Te2.83 0 Td Td 1.0 1 1.0 1.0
24 Te_i_Td_Te2.83 +2 C1 Cs 0.5 1 0.5 24.0
25 Te_i_Td_Te2.83 +1 Cs C2v 0.5 2 1.0 12.0
26 Te_i_Td_Te2.83 0 C1 C2 0.5 1 0.5 24.0
27 Te_i_Td_Te2.83 -1 C1 C2 0.5 2 1.0 24.0
28 Te_i_Td_Te2.83 -2 C1 C2 0.5 1 0.5 24.0

Note

The product of the defect degeneracy factors (such as orientational and spin, which are automatically computed by doped) and site multiplicity determine the pre-factor is used in the calculation of defect/carrier concentrations (and thus doping / Fermi level behaviour). For more explanation and discussion of this, see e.g. Impact of metastable defect structures on carrier recombination in solar cells & Imperfections are not 0 K: free energy of point defects in crystals.

However, we can also directly use the point_symmetry function from doped.utils.symmetry (see the docstring in the python API docs) to obtain the relaxed or unrelaxed (bulk site) point symmetries of a given defect supercell, directly from just the relaxed structures, regardless of whether these defects were generated/parsed with doped:

from doped.utils.symmetry import point_symmetry
from pymatgen.core.structure import Structure

relaxed_defect_supercell = Structure.from_file("YTOS/Int_F_-1/Relaxed_CONTCAR")
print(point_symmetry(relaxed_defect_supercell))
C4v

Note

Note: It should be noted that the point symmetry analysis is not guaranteed to work in all cases; namely for certain non-trivial supercell expansion matrices (e.g. a 2x1x2 expansion)(particularly with high-symmetry materials) which can break the periodicity of the cell. With the standard doped parsing functions (via DefectThermodynamics), doped will automatically check if this is the case, and will warn you if so. If you are using point_symmetry directly, then you can also provide the reference host supercell (with bulk_structure, as shown below) and doped will check this.

If periodicity-breaking does prevent auto-symmetry determination, you can manually determine the relaxed defect and bulk-site point symmetries, and/or orientational degeneracy, from visualising the structures (e.g. using VESTA)(can use get_orientational_degeneracy to obtain the corresponding orientational degeneracy factor for given defect/bulk-site point symmetries) and setting the corresponding values in the calculation_metadata['relaxed point symmetry']/['bulk site symmetry'] and/or degeneracy_factors['orientational degeneracy'] attributes.

relaxed_defect_supercell = Structure.from_file("YTOS/Int_F_-1/Relaxed_CONTCAR")
bulk_supercell = Structure.from_file("YTOS/Bulk/POSCAR")
print(point_symmetry(relaxed_defect_supercell, bulk_structure=bulk_supercell))
C4v

No warning is encountered here, so we can be confident that there is no periodicity breaking at play. With bulk_structure supplied, we can also get the bulk site symmetry of the defect (i.e. the symmetry at the unrelaxed defect site), by setting relaxed=False:

print(point_symmetry(relaxed_defect_supercell, bulk_structure=bulk_supercell, relaxed=False))
Cs

Further Local Structure Analysis

Here we’re manually analysing some Cdᵢ vasp_gam calculations to see which interstitial site is favoured:

import os
from doped.analysis import DefectParser

bulk_path = "CdTe/CdTe_bulk/vasp_gam/"  # path to bulk (defect-free) supercell calculation
dielectric = 9.13  # calculated dielectric constant, required for computing defect charge corrections
Cd_i_dict = {}  # Keep dictionary of parsed defect entries

for i in os.listdir("CdTe"):
    if 'Cd_i' in i:
        Cd_i_dict[i] = DefectParser.from_paths(
            defect_path=f"CdTe/{i}/vasp_gam/", bulk_path=bulk_path, dielectric=dielectric).defect_entry

for defect_name, defect_entry in Cd_i_dict.items():
    print(f"Name: {defect_name}; Raw Supercell Energy: {defect_entry.get_ediff():.3f} eV")
    # note this energy is just the energy difference of the bulk and defect supercells (including
    # finite-size charge corrections if any – none here as they're neutral defects), without Fermi
    # level or chemical potential terms (though these are constant for the same defect & charge)
Name: Cd_i_Td_Cd2.83_0; Raw Supercell Energy: 0.592 eV
Name: Cd_i_C3v_0; Raw Supercell Energy: 0.728 eV
Name: Cd_i_Td_Te2.83_0; Raw Supercell Energy: 0.728 eV

Here we see that the Cd-coordinated interstitial site is the lowest energy for neutral cadmium interstitials here!

Note

The energies here do not yet account for the chemical potentials, which are included later in the post-processing workflow (as shown earlier in this notebook). However, the chemical potential energy correction is the same for each charge state or site, for a given defect (e.g. Cdi here) - hence the relative energies are still meaningful here.

Here we see that Cd_i_C3v_0 and Cd_i_Td_Te2.83_0 have equal final energies (rounded to 1 meV/atom) suggesting they have relaxed to the same final structure (despite different initial interstitial positions). Let’s use StructureMatcher and local_env to double-check:

# Here we use the pymatgen StructureMatcher class to compare the relaxed structures of neutral Cd_i:
from pymatgen.analysis.structure_matcher import StructureMatcher
sm = StructureMatcher()
print("Are Cd_i_Td_Cd2.83_0 and Cd_i_C3v_0 final structures the same?:",
      sm.fit(Cd_i_dict['Cd_i_Td_Cd2.83_0'].defect_supercell, Cd_i_dict['Cd_i_C3v_0'].defect_supercell))
print("Are Cd_i_C3v_0 and Cd_i_Td_Te2.83_0 final structures the same?:",
      sm.fit(Cd_i_dict['Cd_i_C3v_0'].defect_supercell, Cd_i_dict['Cd_i_Td_Te2.83_0'].defect_supercell))
Are Cd_i_Td_Cd2.83_0 and Cd_i_C3v_0 final structures the same?: False
Are Cd_i_C3v_0 and Cd_i_Td_Te2.83_0 final structures the same?: True
# we can perform further defect structural analysis with these functions:
from pymatgen.analysis.local_env import CrystalNN
import numpy as np

for key, defect_entry in Cd_i_dict.items():
    # get defect site index in structure: (needed for CrystalNN)
    for i, site in enumerate(defect_entry.defect_supercell.sites):
        if np.isclose(site.frac_coords, defect_entry.defect_supercell_site.frac_coords).all():
            isite = i  # site index, starting from 0

    crystalNN = CrystalNN()
    struct = defect_entry.defect_supercell
    struct.add_oxidation_state_by_guess()
    print("Local order parameters (i.e. resemblence to given structural motif): ",
          crystalNN.get_local_order_parameters(struct, isite))
    print("Nearest-neighbour dictionary: ",
          crystalNN.get_cn_dict(struct, isite))

    bond_lengths = []  # Bond Lengths?
    for i in crystalNN.get_nn_info(struct, isite):
        bond_lengths.append({'Element': i['site'].specie.as_dict()['element'],
                             'Distance': f"{i['site'].distance(struct[isite]):.3f}"})
    print("Bond-lengths (in Angstrom) to nearest neighbours: ", bond_lengths, "\n")
Local order parameters (i.e. resemblence to given structural motif):  None
Nearest-neighbour dictionary:  {'Te0+': 6, 'Cd0+': 4}
Bond-lengths (in Angstrom) to nearest neighbours:  [{'Element': 'Te', 'Distance': '3.298'}, {'Element': 'Te', 'Distance': '3.298'}, {'Element': 'Te', 'Distance': '3.298'}, {'Element': 'Te', 'Distance': '3.298'}, {'Element': 'Te', 'Distance': '3.298'}, {'Element': 'Te', 'Distance': '3.298'}, {'Element': 'Cd', 'Distance': '3.007'}, {'Element': 'Cd', 'Distance': '3.007'}, {'Element': 'Cd', 'Distance': '3.007'}, {'Element': 'Cd', 'Distance': '3.007'}] 

Local order parameters (i.e. resemblence to given structural motif):  {'square co-planar': 0.08049643519922586, 'tetrahedral': 0.9999935468913711, 'rectangular see-saw-like': 0.007133072179242341, 'see-saw-like': 0.23547633536015408, 'trigonal pyramidal': 0.24644908542744104}
Nearest-neighbour dictionary:  {'Te0+': 4}
Bond-lengths (in Angstrom) to nearest neighbours:  [{'Element': 'Te', 'Distance': '2.911'}, {'Element': 'Te', 'Distance': '2.911'}, {'Element': 'Te', 'Distance': '2.911'}, {'Element': 'Te', 'Distance': '2.911'}] 

Local order parameters (i.e. resemblence to given structural motif):  {'square co-planar': 0.07996844283674677, 'tetrahedral': 0.9999999999971609, 'rectangular see-saw-like': 0.0070246315480141, 'see-saw-like': 0.23425410407519495, 'trigonal pyramidal': 0.2452100857961308}
Nearest-neighbour dictionary:  {'Te0+': 4}
Bond-lengths (in Angstrom) to nearest neighbours:  [{'Element': 'Te', 'Distance': '2.911'}, {'Element': 'Te', 'Distance': '2.911'}, {'Element': 'Te', 'Distance': '2.911'}, {'Element': 'Te', 'Distance': '2.911'}] 

Here we see the structural similarity of “Cd_i_C3v_0” and “Cd_i_Td_Te2.83_0”, showing that they have indeed relaxed to the same structure. This means we only need to continue with one of these for the more expensive vasp_std and vasp_ncl calculations with our full k-point mesh.

Note

If you want to do this coordination environment analysis with a vacancy, you may have to introduce a fake atom at the vacancy position, in order to create a pymatgen Site object, to then use with CrystalNN. For example:

from doped.thermodynamics import DefectThermodynamics
v_Cd_thermo = DefectThermodynamics(
    [entry for entry in CdTe_defects_thermo.defect_entries if "v_Cd" in entry.name],
    chempots=CdTe_defects_thermo.chempots
)  # only Cd vacancy defects
from pymatgen.analysis.local_env import CrystalNN
from doped.thermodynamics import bold_print

for defect_entry in v_Cd_thermo.defect_entries:
    bold_print(f"{defect_entry.name}, Charge State: {defect_entry.charge_state}")
    crystalNN = CrystalNN(distance_cutoffs=None, x_diff_weight=0.0, porous_adjustment=False, search_cutoff=5)
    struct = defect_entry.defect_supercell.copy()
    struct.append('U', defect_entry.defect_supercell_site.frac_coords) # Add a fake element
    isite = len(struct.sites) - 1 # Starts counting from zero!

    print("Local order parameters (i.e. resemblance to given structural motif): ",
          crystalNN.get_local_order_parameters(struct, isite))
    print("Nearest-neighbour dictionary: ", crystalNN.get_cn_dict(struct, isite))

    bond_lengths = []  # Bond Lengths?
    for i in crystalNN.get_nn_info(struct,isite):
        bond_lengths.append({'Element': i['site'].specie.as_dict()['element'],
                             'Distance': f"{i['site'].distance(struct[isite]):.3f}"})
    print("Bond-lengths (in Angstrom) to nearest neighbours: ",bond_lengths,"\n")
v_Cd_-2, Charge State: -2
Local order parameters (i.e. resemblance to given structural motif):  {'square co-planar': 0.07996848894580866, 'tetrahedral': 0.999999999996243, 'rectangular see-saw-like': 0.007024644113827354, 'see-saw-like': 0.23425369905750856, 'trigonal pyramidal': 0.24520967518806777}
Nearest-neighbour dictionary:  {'Te': 4}
Bond-lengths (in Angstrom) to nearest neighbours:  [{'Element': 'Te', 'Distance': '2.613'}, {'Element': 'Te', 'Distance': '2.613'}, {'Element': 'Te', 'Distance': '2.613'}, {'Element': 'Te', 'Distance': '2.613'}] 

v_Cd_-1, Charge State: -1
Local order parameters (i.e. resemblance to given structural motif):  {'square co-planar': 0.08955199275710107, 'tetrahedral': 0.9980437792997895, 'rectangular see-saw-like': 0.00914205834683717, 'see-saw-like': 0.2561471898083992, 'trigonal pyramidal': 0.2673736880526364}
Nearest-neighbour dictionary:  {'Te': 4}
Bond-lengths (in Angstrom) to nearest neighbours:  [{'Element': 'Te', 'Distance': '2.585'}, {'Element': 'Te', 'Distance': '2.587'}, {'Element': 'Te', 'Distance': '2.587'}, {'Element': 'Te', 'Distance': '3.046'}] 

v_Cd_0, Charge State: 0
Local order parameters (i.e. resemblance to given structural motif):  {'square co-planar': 0.1554382566688805, 'tetrahedral': 0.7810051379511412, 'rectangular see-saw-like': 0.052869064285435134, 'see-saw-like': 0.22758740109965894, 'trigonal pyramidal': 0.23528866099223875}
Nearest-neighbour dictionary:  {'Te': 4}
Bond-lengths (in Angstrom) to nearest neighbours:  [{'Element': 'Te', 'Distance': '2.178'}, {'Element': 'Te', 'Distance': '2.605'}, {'Element': 'Te', 'Distance': '2.235'}, {'Element': 'Te', 'Distance': '2.671'}]