# Preparation of reference data¶

Throughout the tutorial we will be using reference data for training and comparison. The present section provides a short description of the code to generate these data.

## General preparations¶

We first set a few parameters that define the structures to be generated. This includes

the size of the supercell (

`cell_size`

),the number of structures (

`number_of_structures`

),the standard deviation of the distribution of displacements (

`rattle_std`

),the minimum separation of any two atoms in the “rattled” structures (

`minimum_distance`

), andthe name of the output file (

`structures_fname`

).

In addition, we specify a primitive structure and an **ASE** calculator
object. For simplicity, here, we will employ the very simple effective medium
theory (EMT) calculator provide by ASE. In practice, one
would commonly resort to higher quality method such as density functional
theory (DFT).

```
"""
Prepare training structures for FCC-Ni using an EMT calculator and a
Monte Carlo rattle approach for generating displacements.
Runs in approximately 10 seconds on an Intel Core i5-4670K CPU.
"""
from ase.io import write
from ase.build import bulk
from ase.calculators.emt import EMT
from hiphive.structure_generation import generate_mc_rattled_structures
# parameters
n_structures = 5
cell_size = 4
rattle_std = 0.03
minimum_distance = 2.3
# setup
prim = bulk('Ni', cubic=True)
atoms_ideal = prim.repeat(cell_size)
```

## Structure generation¶

One could easily generate structures with randomized displacements by drawing
components of the atomic displacement vector from a normal distribution with a
specific standard deviation (`rattle_std`

). This procedure, however, tends to
produce structures with some very large forces, which poorly mimic the
distribution of forces observed in molecular dynamics simulations.
This behavior occurs since the _uncorrelated_ application of
displacements can lead to very short interatomic distances as shown in
the following figure. Such situations are heavily penalized in e.g.,
MD simulations because of the steep repulsive interaction.

The `generate_mc_rattled_structures`

function
provides a means to overcome this issue. It implements a simple Monte Carlo
procedure, which enforces a lower limit on any interatomic distance in the
final structure (`minimum_distance`

). Here, it is used to generate structures
with randomized displacements. A more detailed discussion of this subject can
be found here, and we recommend
looking into the phonon-rattle approach for generating large but physical displacements.

Warning

Please note that calling functions that rely on the generation of pseudo-
random numbers *repeatedly with the same seed* (i.e., repeatedly falling back
to the default value) is **strongly** discouraged as it will lead to
correlation. To circumvent this problem one can for example seed a sequence
of random numbers and then use these numbers in turn as seeds.

For each randomized structure we then compute the atomic forces. The final set of structures is written to file for later use.

```
structures = generate_mc_rattled_structures(atoms_ideal, n_structures, rattle_std, minimum_distance)
for atoms in structures:
atoms.calc = EMT()
atoms.get_forces()
# save structures
write('prim.extxyz', prim)
write('supercell_ideal.extxyz', atoms_ideal)
write('supercells_rattled.extxyz', structures)
```

## Source code¶

The complete source code is available in
`examples/tutorial/1_prepare_reference_data.py`

```
"""
Prepare training structures for FCC-Ni using an EMT calculator and a
Monte Carlo rattle approach for generating displacements.
Runs in approximately 10 seconds on an Intel Core i5-4670K CPU.
"""
from ase.io import write
from ase.build import bulk
from ase.calculators.emt import EMT
from hiphive.structure_generation import generate_mc_rattled_structures
# parameters
n_structures = 5
cell_size = 4
rattle_std = 0.03
minimum_distance = 2.3
# setup
prim = bulk('Ni', cubic=True)
atoms_ideal = prim.repeat(cell_size)
# generate structures
structures = generate_mc_rattled_structures(atoms_ideal, n_structures, rattle_std, minimum_distance)
for atoms in structures:
atoms.calc = EMT()
atoms.get_forces()
# save structures
write('prim.extxyz', prim)
write('supercell_ideal.extxyz', atoms_ideal)
write('supercells_rattled.extxyz', structures)
```