

Introduction to Molecular Dynamics

The aim of these exercises is to introduce you to basic techniques for analysing the output of a molecular dynamics simulation.

Setup (optional)

This step is needed only if you are running on Google Colab. The instructions may work on other systems but have not been tested.

[ ]:
! pip install data_tutorials
! apt install gfortran

Now download the data files needed for the exercises.

[ ]:
from data_tutorials.data import get_data
get_data(
    url="https://raw.githubusercontent.com/ccp5UK/summerschool/main/Day_5/Basics/",
    filename=["plots.py"],
    folder=".",
)
get_data(
    url="https://raw.githubusercontent.com/ccp5UK/summerschool/main/Day_5/Basics/",
    filename=["acft.f90","auto.f90","block.f90","msd.f90","sfac.f90","vcor.f90","ljmd.F90"],
    folder=".",
)

You can follow the instructions in the Jupyter cells, adding cells as needed, or run the commands in a console; it is up to you, and the instructions are given only as orientation. Please note that although the codes you run are written in Fortran, this is largely historical convenience; it also reflects the reality of research, where you may have to use tools written in diverse languages.

[ ]:
from plots import *

Basic Analysis of Molecular Dynamics Data

These exercises will introduce you to the basic techniques for analysing the output from a molecular dynamics (MD) simulation, in this case a simple Lennard-Jones (LJ) system. This will involve:

  1. Calculation of the radial distribution function and the structure factor;

  2. Statistical analysis of the thermodynamic data;

  3. Time analysis of the thermodynamic fluctuations;

  4. The mean squared displacement; and

  5. The velocity autocorrelation function.

Remember that we do not expect you to finish every piece of work and we encourage you to take the course work back to your institution where you can complete it in your own time.

The MD data will be generated by the program ljmd. You will be analysing the ljmd output data files in the following exercises.

The ljmd program, along with the other files for this exercise, can be found in the current folder. If you are not very familiar with molecular dynamics, it is a good idea to look at the source code (ljmd.F90) to see how it works. The things to look out for are:

  1. The generation of the starting structure. For LJ systems this is usually a Face-Centred-Cubic (FCC) crystal.

  2. Setting the initial velocities using random numbers.

  3. Calculating the interatomic forces (in the subroutine forces), using a ‘shifted force’ form of the LJ interaction. This has the form:

\[V(r) = 4 \epsilon \left( \left[ \frac{\sigma}{r} \right]^{12} - \left[ \frac{\sigma}{r} \right]^6 \right) + \alpha r + \beta\]

Where \(\alpha\) and \(\beta\) are chosen so that the force and energy are zero at the cut-off. These perturbations of the basic LJ form are small provided the cut-off is large. The shifted force form helps to eliminate artefacts arising from the use of a cut-off (a short sketch of this construction follows the list below).

  4. The radial distribution function calculation, which is embedded in the force calculation.

  5. The implementation of the velocity Verlet integration algorithm.

  6. The use of the minimum image convention.
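
As an aside, the choice of \(\alpha\) and \(\beta\) is easy to reproduce in a few lines of Python. The sketch below (my own, not one of the supplied exercise files) imposes zero energy and zero force at the cut-off and checks the result; the parameter values \(\sigma = \epsilon = 1\) and cut-off 2.5 are illustrative only.

[ ]:
import numpy as np

def lj_shifted_force(r, rcut, sigma=1.0, eps=1.0):
    """Shifted-force LJ: V(r) + alpha*r + beta, with energy and force zero at rcut."""
    v = 4.0*eps*((sigma/r)**12 - (sigma/r)**6)
    dv_rc = 4.0*eps*(-12.0*sigma**12/rcut**13 + 6.0*sigma**6/rcut**7)    # dV/dr at the cut-off
    alpha = -dv_rc                                                       # cancels the force at rcut
    beta = -(4.0*eps*((sigma/rcut)**12 - (sigma/rcut)**6) + alpha*rcut)  # cancels the energy at rcut
    return v + alpha*r + beta

rc = 2.5
print(lj_shifted_force(np.array([0.95, 1.5, rc]), rc))   # the last value should be ~0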

These ideas are covered in molecular dynamics lecture 1. Feel free to ask the demonstrators if you want anything explained!

To compile the ljmd.F90 source code use the command:

[ ]:
!gfortran -o ljmd ljmd.F90

To run the ljmd program you need the input file MDIN, whose contents are shown below. The variables are fairly self-explanatory, but ask if you are unsure.

[ ]:
%%writefile MDIN

1000          number of time steps
100           number of equilibration steps
10             sampling interval for properties
0.8            particle density (LJ units)
1.0            temperature (LJ units)
0.001          time step (LJ units)
.true.        trajectory file option
.false.         write positions (.true.) or velocities (.false.)
13 17          used to initialize the random numbers seed array

Note that the program uses dimensionless (or LJ) units throughout, so it does not refer to any particular LJ system but to all LJ systems in general. For these exercises you will be told which variables to change when you do the experiments. You can open MDIN from the file list on the left-hand side to see its contents. ljmd is run using the command:

[ ]:
!./ljmd < MDIN

The program immediately prints the values of the control variables (above) on-screen. Thereafter, at regular intervals, it prints the current time step and the system energy, temperature and pressure on-screen. The run will take only a couple of minutes and will finish with a summary of the average energy, temperature and pressure for the equilibrated system.

ljmd also generates the following files:

  1. STA - the time, energy, temperature and pressure for every time step.

  2. RDF - the system radial distribution function.

  3. TRJ - a history of the system configuration at regular intervals during the simulation (optional).

  4. XYZ - the XYZ coordinates of the configuration at the end of the run (suitable for viewing with a molecular graphics program).

Exercise 1: The Radial Distribution Function

The Radial Distribution Function (RDF) is one of the simplest and most useful properties of an atomic system. Formally, it is the average number of particles in a shell of thickness \(\delta r\) at a distance r from a central atom, divided by the expected number of particles assuming a uniform density, i.e.

\[g(r) = \frac{\langle n(r) \rangle}{4 \pi r^2 \rho \delta r}\]

Where g(r) is the RDF, \(\langle n(r) \rangle\) is the mean number of particles in the shell of thickness \(\delta r\), and \(\rho\) is the uniform density.
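
To see what this normalisation amounts to in practice, here is a minimal single-configuration sketch (my own, not one of the supplied programs): it histograms the minimum-image pair distances and divides by the expected count \(4 \pi r^2 \rho \delta r\). ljmd accumulates the same histogram over many configurations inside its force routine.

[ ]:
import numpy as np

def rdf_from_frame(pos, box, nbins=100):
    """g(r) for one configuration of atoms in a cubic periodic box of side `box`."""
    n = len(pos)
    rho = n/box**3
    rmax = 0.5*box                          # minimum image limits r to half the box
    dr = rmax/nbins
    hist = np.zeros(nbins)
    for i in range(n - 1):
        d = pos[i+1:] - pos[i]
        d -= box*np.rint(d/box)             # minimum image convention
        r = np.sqrt((d**2).sum(axis=1))
        h, _ = np.histogram(r[r < rmax], bins=nbins, range=(0.0, rmax))
        hist += h
    r = (np.arange(nbins) + 0.5)*dr         # shell mid-points
    expected = 4.0*np.pi*r**2*dr*rho        # uniform-density count per shell
    return r, (2.0*hist/n)/expected         # factor 2: each pair was counted once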

Calculating the RDF with ljmd is easy - the program does it for you! Take the MDIN file as supplied and run the program. While it runs, note how good the energy conservation is once the equilibration period is over; this is a consequence of the ‘shifted force’ feature of the program - there are no discontinuities in the potential function at the cut-off to contribute to energy fluctuations.

When the job finishes, use a graph plotter to view the RDF file, which is a simple XY-formatted file. The plot obtained is of the radial distribution function versus the inter-atomic separation. The peaks that appear characterise the average ‘packing structure’ of the LJ atoms in the system. Note the form of this function for later comparisons (i.e. store it somewhere safe!).

[ ]:
! ./ljmd < MDIN
[ ]:
plotFile("RDF","x","y","my title")

Exercise 2: Structure Factors

Next you should use the structure factor program SFAC. The structure factor is based on a Fourier transform of the radial distribution function:

\[S(k) = 1 + \frac{4 \pi \rho}{k} \int_{0}^{\infty} (g(r) - 1) \sin(kr) \, r \, dr\]
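
Before compiling sfac, it may help to see this radial transform written out numerically. The sketch below evaluates S(k) directly from an RDF file; the two-column file layout (r, g(r)), the density 0.8 and the k grid are assumptions of this sketch, and sfac.f90 remains the reference implementation.

[ ]:
import numpy as np
from scipy.integrate import simpson

rdf = np.loadtxt("RDF")                    # assumed columns: r, g(r)
r, g = rdf[:, 0], rdf[:, 1]
rho = 0.8                                  # density used in the MD run
k = np.linspace(0.5, 20.0, 400)            # illustrative k grid
integrand = (g - 1.0)*np.sin(np.outer(k, r))*r   # (g(r)-1) sin(kr) r, one row per k
S = 1.0 + 4.0*np.pi*rho/k*simpson(y=integrand, x=r, axis=1)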

Use your editor to look at how this program works (sfac.f90). Note the use of the radial Fourier transform. Compile it using:

[ ]:
!gfortran -o sfac sfac.f90

Run the program using the command:

[ ]:
!./sfac
[ ]:
plotFile("SOK","x","y","my title")

and then look at the output file SOK with your graph plotter. The peaks in this plot signify regularity in the structure of the system. This is an important function experimentally, as it is in this form that RDFs are obtained from scattering experiments. Note that the function is often noisy at small values of k; this is because the integral above should formally be evaluated to infinite r but must be truncated in practice. S(k) at small k can be computed directly from the Fourier transform of the particle density, but this is a more complicated and expensive procedure.

The next thing you should do is perform a series of simulations of the LJ system, lowering the system temperature (as defined in the file MDIN) by 0.1 for each run. After each run store the RDF and associated SOK files. Finally compare them against each other to see how the structure changes with temperature. Note how the peaks tend to sharpen up as the temperature falls. This is important evidence for the state of the system.
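
If you prefer to script the temperature sweep rather than editing MDIN by hand, something like the sketch below will do; the MDIN template mirrors the file written earlier, and the output names RDF_T... and SOK_T... are just a naming convention of this sketch.

[ ]:
import shutil, subprocess

MDIN_TEMPLATE = """1000          number of time steps
100           number of equilibration steps
10             sampling interval for properties
0.8            particle density (LJ units)
{T:.2f}           temperature (LJ units)
0.001          time step (LJ units)
.true.        trajectory file option
.false.         write positions (.true.) or velocities (.false.)
13 17          used to initialize the random numbers seed array
"""

for T in [1.0, 0.9, 0.8, 0.7]:
    with open("MDIN", "w") as f:
        f.write(MDIN_TEMPLATE.format(T=T))
    with open("MDIN") as f:
        subprocess.run("./ljmd", stdin=f, check=True)      # equivalent to ./ljmd < MDIN
    subprocess.run("./sfac", check=True)
    shutil.copy("RDF", f"RDF_T{T:.1f}")                    # keep the outputs for comparison
    shutil.copy("SOK", f"SOK_T{T:.1f}")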

Lastly, you should know that the importance of the RDF goes beyond the insight it offers into the structure of a bulk system; it is also important in discussing the thermodynamics. To see this, write a short Fortran, C or Python program to calculate the following integral, based on the data in one of your RDF files.

\[U_{c} = 2 \pi \rho N \int g(r) u(r) r^2 dr\]

Where N is the number of atoms (108), \(\rho\) is the mean particle density, g(r) is the radial distribution function and u(r) is the dimensionless LJ function:

\[U(r) = 4 \left( \left(\frac{1}{r}\right)^{12} - \left(\frac{1}{r}\right)^{6} \right)\]

Then add the kinetic energy, which is \(1.5*T*N\). Compare the value obtained with the average energy obtained in the corresponding simulation. How close are the two numbers? (Actually there is a deliberate mistake in the procedure described here which, if corrected, will improve the result. Can you work out what it is?)

[ ]:
import numpy as np
import scipy.integrate as integ

def LJ(r,σ,ε):
    return 4.0*ε*((σ/r)**12-(σ/r)**6)

N = 108
ρ = 0.8
E = -3.04543660E+02
T = 9.36468819E-01
box = 5.1299278 # printed in XYZ
rcut = 0.5*box
rdf = np.loadtxt("RDF")
integrand = np.array([r*r*LJ(r,1.0,1.0)*g for r,g in zip(rdf[:,0],rdf[:,1])])   # r^2 u(r) g(r)
Uc = 2.0*np.pi*N*ρ*integ.simpson(y=integrand,x=rdf[:,0])
Ekin = 1.5*T*N
print(Uc+Ekin,E)
# the two energies differ because we used the wrong potential: the MD code uses the shifted-force form (corrected below)
[ ]:
def LJs(r,σ,ε,a,b):
    return 4.0*ε*((σ/r)**12-(σ/r)**6) + a*r - b

N = 108
ρ = 0.8
E = -3.04543660E+02
T = 9.36468819E-01
box = 5.1299278 # printed in XYZ
rcut = 0.5*box
a = 24.0 * (2.0 * rcut**(-12) - rcut**(-6)) / rcut
b = 4.0 * (rcut**(-12) - rcut**(-6)) + a * rcut
rdf = np.loadtxt("RDF")
integrand = np.array([r*r*LJs(r,1.0,1.0,a,b)*g for r,g in zip(rdf[:,0],rdf[:,1])])   # r^2 u_shifted(r) g(r)
Uc = 2.0*np.pi*N*ρ*integ.simpson(y=integrand,x=rdf[:,0])
Ekin = 1.5*T*N
print(Uc+Ekin,E)

Exercise 3: Statistical Analysis of the Thermodynamic Data

The program ljmd writes the thermodynamic data into the file STA. The data consist of the system energy, temperature and pressure. In this exercise you will firstly use standard statistical methods to obtain the average of each of these and the fluctuation, and then attempt to get an accurate measure of the error in the mean value, using a standard method known as ‘blocking’. In the next exercise you will use time correlation functions to analyse the time dependence of the fluctuations.

Firstly, run the ljmd program using the original MDIN file to obtain the STA file, which will contain data for every time step of the run. The ljmd code will give the averages of the thermodynamic variables over the time steps beyond the equilibration period. Make a note of these for later.

Next, locate the program BLOCK and compile it using:

[ ]:
!./ljmd < MDIN
[ ]:
!gfortran -o block block.f90

then run the program with the command:

[ ]:
%%bash

./block <<< 1

The program will automatically locate the STA file and ask you to select one of the data columns by entering an integer value:

  1. for the energy

  2. for the temperature

  3. for the pressure.

Select the energy first. Very quickly the program will provide the average energy and a series of estimates of the standard error in the mean value and also the error in the estimate of the standard error. These will be displayed on-screen, and also saved in a file ERR. The significance of the second number will become clearer below. Now do the following:

Check the average against that obtained from ljmd. It will be different, and you should know why! You will need to edit the STA file to fix this. Do so, and rerun BLOCK. Now the average value should be the same.

With regard to the standard error estimates, you should know that the first estimate is simply \(\sigma / \sqrt{n}\), where \(\sigma\) is the standard deviation of the n energy values you have averaged over. Note how small this is! This is expected for the energy in these simulations (why?). However, it is not a good estimate of the error, because consecutive data points are highly correlated (though this fact does not affect the average).

The next estimate of the standard error follows a ‘blocking transition’ of the data. This is covered by the lecture course. Meanwhile look at the BLOCK source code in block.f90 and see what this operation amounts to. Note that the transition effectively halves the number of data points, and the error in the estimated standard error is therefore larger. Repeating the blocking transition many times produces a series of estimates of the standard error, each less correlated than the last. Ideally after many blocking transitions the estimated error should converge to the true standard error, but the increasing uncertainty in the value often masks the convergence. The trick is to pick out the converged value from the increasingly noisy data.

The best estimate of the standard error may be obtained from a plot of the estimated standard error as a function of blocking number. Try plotting the file ERR (which contains the error bars for each standard error value). If you cannot estimate the converged value with the help of the error bars, the best you can do is take the largest value as the best estimate. Whatever you do however, the value you obtain will be much larger than the original estimate before applying the blocking transitions. This is the main lesson of this exercise.
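
If you would like to cross-check BLOCK with a few lines of Python, here is a minimal sketch of the blocking procedure applied to the energy column of STA; the column layout and the number of equilibration rows to skip are assumptions to match to your own run.

[ ]:
import numpy as np

sta = np.loadtxt("STA")            # assumed columns: time, energy, temperature, pressure
x = sta[100:, 1].copy()            # energy, skipping the (assumed) equilibration rows

while len(x) >= 4:
    n = len(x)
    se = x.std(ddof=1)/np.sqrt(n)            # naive standard error at this blocking level
    se_err = se/np.sqrt(2.0*(n - 1))         # uncertainty of that estimate
    print(f"n = {n:6d}   std. error = {se:.6f} +/- {se_err:.6f}")
    m = 2*(n//2)
    x = 0.5*(x[0:m:2] + x[1:m:2])            # blocking transition: average consecutive pairs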

Repeat the exercise for the temperature and pressure data. These two have genuine thermodynamic fluctuation (not just random noise from the integration algorithm) and should give better convergence.

[ ]:
%%bash

#code to repeat runs
[ ]:
plotFileErr("ERR",xlabel="x",ylabel="y",title="my title")

Exercise 4: Time Analysis of the Thermodynamic Data

Molecular dynamics provides time dependent data, as well as average properties. In this exercise you will analyse the time dependence of the thermodynamic data using correlation functions. We shall concentrate on autocorrelation functions (or self correlations) in this exercise.

Formally an autocorrelation function is written as:

\[c(t) = \langle (a(0) - \langle a \rangle ) (a(t) - \langle a \rangle ) \rangle\]

in which c(t) is the autocorrelation function, a(t) is the variable concerned and <a> is its average value. The angular brackets formally mean an ensemble average; an average over many replicas of the system in the same thermodynamic state. In molecular dynamics, the equivalent of an ensemble average is an average over time origins:

\[c(t) = \lim_{T \rightarrow \infty} \frac{1}{T} \int_{0}^{T} (a(u) - a_{av}) (a(t + u) - a_{av}) \, du\]

The integral over the time variable u is an integration over time origins. (\(a_{av}\) is the average of a(t) over the interval 0 to T.) In terms of discrete variables (i.e. ones sampled at fixed time steps) this becomes a sum:

\[c_{k} = \frac{1}{N - k} \sum_{n=0}^{N-k-1} (a_{n} - a_{av}) (a_{n+k} - a_{av})\]

where N is the number of data samples, and the indices relate to the time step number. The quantity \(a_{av}\) is now the average of N samples of variable a.
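
As a point of comparison with auto.f90, the discrete sum above can be written in a few lines of numpy. The STA column layout and the choice of maximum lag below are assumptions of this sketch.

[ ]:
import numpy as np

def autocorr(a, kmax):
    """Unnormalised autocorrelation c_k of the fluctuations of a."""
    da = a - a.mean()
    n = len(da)
    return np.array([np.dot(da[:n - k], da[k:])/(n - k) for k in range(kmax)])

sta = np.loadtxt("STA")                # assumed columns: time, energy, temperature, pressure
temp = sta[:, 2]                       # temperature
c = autocorr(temp, kmax=len(temp)//4)  # correlate over the first quarter of the run
c_norm = c/c[0]                        # normalised so that c(0) = 1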

To do this exercise, first run the ljmd program to generate a STA file, if you don’t have one already available. Use the supplied MDIN file as input. Next locate the AUTO program and take a look at the source code auto.f90. As its name implies this is an autocorrelation program; it correlates a particular variable with itself. The source auto.f90 is a short program, so please take a look at it to see how it works.

Compile AUTO using the command:

[ ]:
!gfortran -o auto auto.f90

Then you may run the program using the command

[ ]:
%%bash

./auto <<< 2

This will ask you which of the three thermodynamic quantities in the STA file you want to process. Choose either temperature (2) or pressure (3). The program will then read the data in the STA file and, in about a second, write the file COR, which contains the correlation function. It will be useful to plot this function graphically. (Note that it is common practice to normalise a correlation function so that it has the value 1.0 at zero time.)

You may also consider plotting the correlation on a log-linear scale (take the logarithm of the time coordinate – use logscale x in gnuplot for instance). This will show more clearly the time scale over which the correlation function decays; that dC(t)/dt is zero at the origin; and that C(t) is not necessarily an exponential decay.

[ ]:
plotFile("COR",xlabel="x",ylabel="y",title="my title")
plotFileLog("COR",xlabel="x",ylabel="y",title="my title")

The autocorrelation function has many uses. Most simply it gives an indication of the relaxation time of the processed variable: how long it takes for a fluctuation (or perturbation) from the mean value to decay to the average again. Can you estimate roughly what the relaxation time is for the variable you chose? Is the plot long enough in time to provide this information? – If not, change the parameter ndiv in auto.f90 to something larger and try again. Can you devise a method for obtaining the relaxation time accurately, using integration perhaps, or curve fitting? Run ljmd again at a different density or temperature to see what impact this has on the form of the relaxation curve.

Now suppose you suspected that the fluctuations in pressure and temperature were somehow complementary – perhaps the fluctuations in pressure followed those in temperature by a short interval in time. To test this you would need to construct a cross correlation function – in which a variable a is multiplied by a variable b taken from a different time. Adaptation of auto.f90 (to make a program cross.f90) is simple. Make the changes you think are necessary and attempt to calculate the cross correlation function.

[ ]:
!gfortran -o cross cross.f90
[ ]:
! printf "2\n3\n" | ./cross   # assuming cross reads the two column indices from standard input
[ ]:
plotFile("CROSSCOR",xlabel="x",ylabel="y",title="my title")

Finally, use the program ACFT (file acft.f90). This presents another way of calculating the autocorrelation function, using discrete Fourier transforms. It is based on the theorem that the Fourier transform of a correlation function is simply the product of the Fourier transforms of the two functions involved. i.e.

\[c(t) = \int A(f)A^{*}(f) \exp(2 \pi i f t) df\]

Where A(f) is the Fourier transform of a(t) and \(A^{*}(f)\) is its complex conjugate.

Take a look at the source code in acft.f90 and note how the calculations are performed. Most crucially, note the doubling of the arrays by appending a string of zeros. This prevents the wrap-around (circular correlation) artefact of the discrete Fourier transform. Then compile and run the code. The answers should compare very well with the method in auto.f90.

Why bother with the Fourier transform approach? Two good reasons. Firstly, it is possible to use Fast Fourier Transforms (FFTs) to process very large amounts of data very quickly (much faster than the method used in auto.f90). Secondly, the Fourier transform of a correlation function (also called its power spectrum) is itself a valuable function that displays the frequency dependence of the fluctuations. This method gives the power spectrum for free!
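
The same trick can be reproduced with numpy's FFT routines, as in the sketch below: pad the mean-subtracted data with zeros, multiply the transform by its complex conjugate and transform back. The column choice is again an assumption, and the result is normalised at the end for comparison with COR.

[ ]:
import numpy as np

sta = np.loadtxt("STA")
a = sta[:, 2]                                  # temperature, as an example
da = a - a.mean()
n = len(da)
padded = np.concatenate([da, np.zeros(n)])     # double the array with zeros: no wrap-around
A = np.fft.rfft(padded)
c = np.fft.irfft(A*np.conj(A))[:n]             # inverse transform of the power spectrum
c /= (n - np.arange(n))                        # divide by the number of time origins per lag
c_norm = c/c[0]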

[ ]:
! gfortran acft.f90 -o acft.x
[ ]:
! echo 1 | ./acft.x
[ ]:
plotFile("COR",xlabel="x",ylabel="y",title="my title")

Exercise 5: The Mean Squared Displacement

Calculating the mean squared displacement (MSD) is one of the most common, and important, operations in molecular dynamics and is invaluable in calculating diffusion. It is defined (for a system of N particles) by the ensemble average

\[R^{2}(t) = \Big \langle \frac{1}{N} \sum_{i=1}^{N} | \vec{r}_{i}(t) - \vec{r}_{i}(t_{0}) | ^{2} \Big \rangle\]

where t is the time and \(t_{0}\) is the time origin, from which the accumulation of data begins.

In this exercise you will calculate the MSD for the Lennard Jones system. Firstly open the MSD program and examine the source code in the file msd.f90. The program is designed to read the atomic position data in a TRJ file produced by the program ljmd and calculate the mean squared displacement. You should spend a few minutes looking at how the program works. Notice that it does not rely on a single time origin for the calculations, but uses each new configuration as a new time origin, so that it effectively calculates many MSDs at the same time. This has the effect of smoothing the final average MSD quite considerably, by damping down the effects of fluctuation.
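
The multiple-time-origin averaging is easy to express in numpy once the trajectory is in memory. Reading the TRJ file itself depends on the format written by ljmd (see msd.f90), so the sketch below simply assumes an array of unwrapped positions of shape (frames, atoms, 3).

[ ]:
import numpy as np

def msd_multi_origin(pos, max_lag):
    """MSD averaged over all time origins.
    pos: unwrapped particle positions, shape (nframes, natoms, 3)."""
    msd = np.zeros(max_lag)
    for k in range(1, max_lag):
        d = pos[k:] - pos[:-k]                  # displacements over lag k, for every origin
        msd[k] = (d**2).sum(axis=2).mean()      # average over origins and atoms
    return msd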

Proceed as follows. First run the ljmd program with the option for producing the TRJ file switched on in the MDIN file (see the instructions for the ljmd program). Also be sure to set the option to produce positions rather than velocities in TRJ, and it is best to increase the sampling interval specified in MDIN to (say) 50 time steps. Use a density of about 0.7 and a temperature of 1.0. Run ljmd for 10000 time steps.

Next compile the MSD program using:

[ ]:
!./ljmd < MDIN
[ ]:
!gfortran -o msd msd.f90

and run it using the command:

[ ]:
!./msd

The program will run quickly and produce a file called MSD, which should be plotted. If the ljmd run was long enough and sampled at an appropriate interval, the plot should be a good straight line at long time, and close to the origin it may reveal a short parabolic curve before linearity sets in. (If the sampling interval is large this may not be obvious; if the sampling interval is too short, you may not see a linear portion.) The onset of linearity signifies that the particle motion has given way to a random walk, i.e. diffusion. The slope of the linear portion provides the diffusion coefficient D, through the well known Einstein relation:

\[R^{2}(t) = 6Dt + C\]

Where C is a constant. Once you have obtained a reasonable MSD plot, you should go on and produce a series of MSD curves for different densities and temperatures. Keep a record of your results for comparison with the results of the next exercise. You should also familiarise yourself with the difference between the MSD in a solid and in a liquid. It is useful to think about the results in the light of what you know about the RDFs for the systems concerned and attempt to relate diffusion to structure.
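
A quick way to extract D is to fit the late-time portion of the MSD file, as in the sketch below; the two-column file layout and the choice of where the linear regime starts are assumptions to adjust by eye.

[ ]:
import numpy as np

msd = np.loadtxt("MSD")                   # assumed columns: t, R^2(t)
t, r2 = msd[:, 0], msd[:, 1]
fit = t > 0.5*t.max()                     # use only the late, linear portion (adjust by eye)
slope, intercept = np.polyfit(t[fit], r2[fit], 1)
print(f"D ~ {slope/6.0:.4f} (LJ units), from R^2(t) = 6 D t + C")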

It is useful to plot the MSD on a log-log scale (take the logarithm of t and \(R^{2}(t)\) ). What this will hopefully show is the switch over from ballistic motion to the diffusive regime. At short times \(R^{2}(t)\) scales with \(t^{2}\) (why?), but in the diffusive regime \(R^{2}(t)\) scales with t, so the plot should show the slope switching over from 2 to 1 (ask if you don’t follow this).

[ ]:
plotFileLog("MSD",xlabel="x",ylabel="y",title="my title")

Exercise 6: The Velocity Autocorrelation Function

The velocity autocorrelation function (VAF) is another valuable basic property of atomic systems. It sheds light on the relaxational processes in atomic dynamics and can be used to estimate the diffusion coefficient. Its power spectrum (Fourier transform) is useful for interpreting infra-red spectroscopy experiments. Formally it is written (for N particles) as:

\[Z(t) = \Big \langle \frac{1}{N} \sum_{i=1}^{N} \vec{v}_{i}(t) \cdot \vec{v}_{i}(0) \Big \rangle\]

Open the vcor program and examine the source code in vcor.f90. Like the MSD program above, it reads data from the TRJ file and uses multiple time origins to obtain the final output function. Compile the code using the command:

[ ]:
!gfortran -o vcor vcor.f90

Before running the program, you need a suitable TRJ file – in this case one with velocity data rather than positions. Run ljmd with temperature 1.0, density 0.7, sampling interval 5 and the trajectory option set to produce velocities. When a TRJ file has been produced, run vcor using the command:

[ ]:
!./ljmd < MDIN
[ ]:
!./vcor

This will run for a few seconds and produce a file VAF, which you should plot. What you will see is a function that decays from an initial value of 1.0 to a negative value, which then gradually decays, with some undulation, towards zero. The time at which the plot first goes negative is a measure of the interparticle collision time. The plot shows that a moving atom is very quickly made aware of the influence of nearby atoms, which significantly alter its original momentum, even causing it to reverse in a large proportion of cases.

You should now go on to produce a selection of VAFs for systems at different temperatures and densities – to make yourself familiar with the behaviour of the VAF plot under different conditions. The differences between solids and high and low density liquids should be explored.

[ ]:
plotFile("VAF",xlabel="x",ylabel="y",title="my title")
[ ]:
%%bash

# your code to run at several densities, or use a console

Some more experiments you can do are:

Try integrating the VAF function numerically to obtain the diffusion constant:

\[D = \int_{0}^{\infty}Z(t)dt\]

Compare the result with that obtained from the MSD above. Note this only works reasonably well if the velocity autocorrelation function decays to zero in a reasonable time.
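
A minimal sketch of this integration is given below. It assumes the VAF file holds the time and the normalised function \(C(t) = Z(t)/Z(0)\) in two columns, and it uses the conventional three-dimensional Green-Kubo prefactor, \(D = \frac{1}{3}\int \langle \vec{v}(0) \cdot \vec{v}(t) \rangle dt\), together with \(\langle v^2 \rangle = 3T\) in LJ units, so that \(D \approx T \int C(t) dt\); check this against the convention actually used in vcor.f90 before quoting a number.

[ ]:
import numpy as np
from scipy.integrate import simpson

vaf = np.loadtxt("VAF")              # assumed columns: t, C(t) with C(0) = 1
t, c = vaf[:, 0], vaf[:, 1]
T = 1.0                              # temperature of the run (LJ units)
D = T*simpson(y=c, x=t)              # <v.v> = 3T and the Green-Kubo 1/3 combine to give T
print(f"D ~ {D:.4f} (LJ units)")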

It can be shown that the MSD and VAF are related through the integral:

\[R^{2}(t) = 6t \int_{0}^{t} (1 - s/t)Z(s)ds\]

(Can you prove this?) This interesting integral offers the possibility of accounting for the short time part of the MSD plot, where nonlinear behaviour is observed. Using the VAF you have obtained, try calculating the MSD with this formula. (You may be able to do this with your graph plotter, but if not, write a little program. Note also that since the VAF is normalised to 1 at time zero, you must re-multiply Z(t) by <V(0)V(0)>, which is of course the mean square velocity of a single particle in 3 dimensions).

To get a good comparison with the MSD, you should recalculate the MSD, this time setting the sampling interval in the MDIN file to be the same as that used in the VAF calculation. Then the two results can be compared over the same time scale. What this will show is that the nonlinear part of the MSD has a dynamical origin, while the linear part comes from random behaviour – the random walk. In other words, the linearity of the MSD sets in only when the VAF has lost all correlation. Also, if you plot the MSD on a log-log scale, you will find that the nonlinear part of the plot spans the time scale over which the correlation function is finite, as it must.

[ ]: