Documentation

From OxDNA
Jump to navigation Jump to search

Compile options

Compiling oxDNA requires that you change the first rows in the makefile to match your machine configuration. The following parameters can be passed to make:

  • dbg=1 oxDNA will be compiled with debug flags (both for nvcc and gcc). The resulting executable will be put in the Debug directory.
  • g=1 oxDNA will be compiled with both debug and optimization flags. The resulting executable will be put in the Release directory.
  • intel=1 oxDNA will be compiled using the Intel icpc compiler. The resulting executable will be named oxDNA_intel.

Usage

oxDNA input_file

The input file contains all the relevant information for the program to run, such as what initial configuration to use, the topology of the system, how often to print the energies to a file, etc. Please make sure you read the thermostat page if you use molecular dynamics.

Input file

As always in UNIX environments, everything is case sensitive. The options are in the form key = value. There can be arbitrary spaces before and after both key and value. Line with a leading # will be treated as comments. In this part | (pipe) is the separator between the different values that can be used to specify a value for the key. Keys between [ and ] are optional, the value after the equal sign is the default value.

Generic options

The options listed here define the generic behavior of the entire program.

[sim_type=MD]
MD|MC
MD = Molecular Dynamics, MC = Monte Carlo
backend
CPU
backend_precision
float|double
[debug=0]
0|1
1 if you want verbose logs, 0 otherwise.

Simulation options

The options listed here specify the behaviour of the simulation.

steps
number of steps to be performed.
[restart_step_counter=0]
0|1
0 means that the step counter will start from the value read in the configuration file; if 1, the step counter will be reset to 0. The total duration of the simulation is unchanged.
[seed=time(NULL)]
seed for the random number generator. On Unix systems, it will use by default a number from /dev/urandom + time(NULL)
T
temperature of the simulation. It can be expressed in simulation units or kelvin (append a k or K after the value) or celsius (append a c or C after the value).
Examples:
Value Simulation Units
0.1 0.1
300 K 0.1
300k 0.1
26.85c 0.1
26.85 C 0.1
verlet_skin
if a particle moves more than verlet_skin then the lists will be updated. Its name is somewhat misleading: the actual verlet skin is 2*verlet_skin.
[use_average_seq=1]
0|1
specifies whether to use the default hard-coded average parameters for base-pairing and stacking interaction strengths or not. If sequence dependence is to be used, set this to 0 and specify seq_dep_file.
[seq_dep_file]
specifies the file from which the sequence dependent parameters should be read. Mandatory if use_average_seq=no, ignored otherwise. A sample file is provided (sequence_dependent_parameters.txt).
[external_forces=0]
0|1
specifies whether there are external forces acting on the nucleotides or not. If it is set to 1, then a file which specifies the external forces' configuration has to be provided (see external_forces_file).
[external_forces_file]
specifies the file containing all the external forces' configurations. Currently there are six supported force types (see EXAMPLES/TRAPS for some examples):
  • string
  • twist
  • trap
  • repulsion_plane
  • repulsion_plane_moving
  • mutual_trap

Molecular dynamics simulations options

dt
time step of the integration.
thermostat
no|refresh|john
no means no thermostat will be used. refresh will refresh all the particle's velocities from a maxwellian every newtonian_steps steps. john is an Anderson-like thermostat (see pt). Make sure you read thermostat.
newtonian_steps
required if thermostat != no
number of steps after which a procedure of thermalization will be performed.
pt
used if thermostat == john. It's the probability that a particle's velocity will be refreshed during a thermalization procedure.
diff_coeff
required if pt is not specified
used internally to automatically compute the pt that would be needed if we wanted such a self diffusion coefficient. Not used if pt is set.

Monte Carlo simulations options

[check_energy_every=10]
this number times print_energy_every gives the number of steps after which the energy will be computed from scratch and checked against the actual value computed adding energy differences.
[check_energy_threshold=1e-4]
if abs((old_energy - new_energy)/old_energy) > check_energy_threshold then the program will die and warn the user.
ensemble
NVT
ensemble of the simulation. More ensembles could be added in future versions.
delta_translation
maximum displacement (per dimension) for translational moves in simulation units.
delta_translation
maximum displacement for rotational moves in simulation units.

Input/output

The options listed here are used to manage the I/O (read and write configurations, energies and so on)

conf_file
initial configuration file.
topology
file containing the system's topology.
trajectory_file
the main output of the program. All the configurations will be appended to this file as they are printed.
[confs_to_skip=0]
valid only if conf_file is a trajectory. Skip the first confs_to_skip configurations and then load in memory the (confs_to_skip+1)th.
[lastconf_file=last_conf.dat]
this is the file where the last configuration is saved (when the program finishes or is killed). Set to last_conf.dat by default
[refresh_vel=0]
0|1
if 1 the initial velocities will be refreshed from a maxwellian.
energy_file
energy output file.
[print_energy_every=1000]
this will make the program print the energies every print_energy_every steps.
[no_stdout_energy=0]
0|1
if 1 the energy will be printed just to the energy_file.
[time_scale=linear]
linear|log_lin
using linear configurations will be saved every print_conf_interval.
using log_lin configurations will be saved logarithmically for print_conf_ppc times. After that the logarithmic sequence will restart.
print_conf_interval
linear interval if time_scale == linear. First step of the logarithmic scale if time_scale == log_lin.
print_conf_ppc
used if time_scale == log_lin
points per logarithmic cycle.
[print_reduced_conf_every=0]
every print_reduced_conf_every steps the program will print out the reduced configurations (i.e. confs containing only the centers of mass of strands).
reduced_conf_output_dir
used if print_reduced_conf_every > 0
output directory for reduced_conf files.
[log_file=stderr]
file where generic and debug informations will be logged. If not specified then stderr will be used.
[print_timings=0]
0|1
if 1 the MD step timing have be printed to a file.
timings_filename
used if print_timings == 1
output file where the MD step timing will be appended to.

Output files

  • The log file contains all the relevant information about the simulation (specified options, activated external forces, warnings about misconfigurations, critical errors, etc.). If the log file is omitted, all this information will be displayed on the standard output.
  • The energy file layout for MD simulations is
time potential energy kinetic energy total energy hydrogen bonding energy
while for MC simulations is
time potential energy hydrogen bonding energy acceptance ratio for translational moves acceptance ratio for rotational moves
N.B. potential, kinetic and total energies are divided by the number of particles whereas the hydrogen bonding energy is not.
  • Configurations are saved in the trajectory file.

Configuration and topology files

The current state of a system, as specified by oxDNA, is described by two files: a configuration file and a topology file. The configuration file contains all the general information (timestep, energy and box size) and the orientations and positions of each nucleotide. The topology file, on the other hand, keeps track of the backbone-backbone bonds between nucleotides in the same strand. Working configuration and topology files can be found in the EXAMPLES directory.

Configuration file

The first three rows of a configuration file contain the timestep T at which the configuration has been printed, the length of the box sides Lx, Ly and Lz and the total, potential and kinetic energies, Etot, U and K, respectively:

t = T
b = Lz Ly Lz
E = Etot U K

after this header, each row contains position of the centre of mass, orientation, velocity and angular velocity of a single nucleotide in the following order:

Failed to parse (SVG (MathML can be enabled via browser plugin): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \overbrace{r_x r_y r_z}^{\rm Position} \overbrace{b_x b_y b_z}^{\rm Backbone-base versor} \overbrace{n_x n_y n_z}^{\rm Normal versor} \overbrace{v_x v_y v_z}^{\rm Velocity} \overbrace{L_x L_y L_z}^{\rm Angular velocity}}

Topology file

The topology file stores the intra-strand, fixed bonding topology (i.e. which nucleotides share backbone links). The first row contains the total number of nucleotides N and the number of strands Ns:

N Ns

After this header, the i-th row specifies strand, base and 3' and 5' neighbors of the i-th nucleotide in this way:

S B 3' 5'

where S is the index of the strand (starting from 1) which the nucleotide belongs to, B is the base and 3' and 5' specify the index of the nucleotides with which the i-th nucleotide is bonded in the 3' and 5' direction, respectively. A -1 signals that the nucleotide terminates the strand in either 3' or 5' direction. The topology file of a strand of sequence GCGTTG would be:

6 1
1 G -1 1
1 C 0 2
1 G 1 3
1 T 2 4
1 T 3 5
1 G 4 -1

Specifying the topology in this way can simplify the process of simulating, for example, circular DNA.

Generation of initial configurations

In order to generate initial configuration and topology files, we provide the ${oxDNA}/UTILS/generate-sa.py script. The usage of the script is

generate-sa.py <box side> <file with sequence>

where <box side> specifies the length of the box side in simulation units and <file with sequence> contains the sequence of the strands to be generated, one row per strand. If double strands are needed, each sequence must be preceded by DOUBLE. For example, a file containing

DOUBLE AGGGCT
CCTGTA

would generate a double strand with a sequence AGGGCT and a single strand with a sequence CCTGTA. The sequences are given in 3'-5' order.

Positions and orientations of the strands are all chosen at random in such a way that the resulting initial configuration does not contain significant excluded volume interactions between nucleotides belonging to different strands. Generated single- and double-strands have helical conformations (i.e. they are in the minimum of the intra-strand interaction energy).

The output configuration and topology are stored in generated.dat and generated.top, respectively. Since this script will initialize nucleotides' velocities and angular velocities to 0, when performing a molecular (or Brownian) dynamics simulation remember to put refresh_vel = 1 in the input file.

Analysis of configurations

The configurations produced by oxDNA can be analysed with the output_bonds program in ${oxDNA}/UTILS/process_data/ directory. This program takes an input the input file (to recover the temperature and topology file), a configuration/trajectory file and an optional number. Since output_bonds reads analyses a single configuration, the optional number selects the configuration which it needs to analyse in the trajectory. Analysing a whole trajectory can be done by looping over a counter.

Please note that output_bonds is not compiled automatically. If you never compiled it, do so as described in the installation instructions.

output_bonds can be used as follows:

${oxDNA}/UTILS/process_data/output_bonds <input_file> <trajectory_file> [counter]

The program outputs some debugging information to the standard error and information regarding the interaction energies to the standard output. The contributions arising from each of the terms in the potential (see the appendix of Ref. 2) are reported for each pair of nucleotides that have non-zero total interactions.

This output can be easily parsed to analyse the configurations.

For each pair of nucleotides that do interact in the configuration, the program prints out a line containing:

  • The id of the two particles (starting from 0)
  • The total interaction energy
  • The hydrogen bonding (base pairing) energy
  • The stacking energy
  • The cross stacking energy
  • The excluded volume energy
  • The FENE interaction energy
  • A letter indicating a status code. This will be N for pairs that interact through bonded interactions (i.e. they are neighbors along a strand) and it will be H when a base pair is present. Our definition of base pair is when two nucleotides have a hydrogen bonding energy less than 0.1 in simulation units (see Ref. 2).

Geometry of the Model

In the configuration/trajectory files only the positions and orientations of the nucleotides are stored. If one wants to recover the positions of the individual interaction sites in the model, some maths need to be done.

The position of the base, stacking and backbone sites can be recovered as follows:

base site: (center) + 0.40 * (axis vector)

stacking site: (center) + 0.34 * (axis vector)

backbone site: (center) - 0.40 * (axis_vector)

The picture in the introduction might help understanding where the sites are.

External Forces

The code implements several types of external forces that can be imposed on the system that can be used either to simulate tension exerted on DNA or simply to accelerate the formation of secondary or tertiary structure. External forces can be tricky to treat, especially in a dynamics simulation, since they are an external source of work. Care should be taken in adjusting the time step, thermostat parameters and such.

To enable external forces, one needs to specify external_forces = 1 in the input file and also supply an external force file to read from with the key external_forces_file = <file>.

The syntax of the external forces file is quite simple. Examples of such files can be found in the hairpin formation and Pseudoknot formation examples. Each force is specified within a block contained in curly brackets. Empty lines and lines beginning with an hash symbol (#) are ignored. Different forces require different keys to be present. If the file has the wrong syntax, oxDNA should spit out a sensible error message while parsing the file.

The different types of forces implemented at the moment are:

  • harmonic trap
  • string
  • repulsion plane
  • mutual trap

All forces act on the centre of the particle.

Forces of different kinds can be combined in the same simulation. There is a maximum number of 10 external forces per particle for memory reasons. This can be manually overridden recompiling the code with a different value of the macro MAX_EXT_FORCES (currently 10) in defs.h.

String

A string is implemented as a force that does not depend on the particle position. Its value can be constant or can change linearly with time. It is useful as it does not fluctuate with time.

A force of this kind is specified with type = string. The relevant keys are:

  • particle (int) the particle on which to exert the force
  • F0 (float) the value of the force at time = 0 in simulation units (please note that the value of the time may or may not be reset when starting a simulation, depending on the input file)
  • rate (float) growing rate of the force (simulation units/time steps). Typical values are very small (< 10^(-5))
  • dir (3 floats separated by commas) direction of the force (automatically normalised by the code)

The following bit of code will create an external force on the first nucleotide in the system starting at 1 simulation units (48.6 pN) and growing linearly with time at the rate of 48.6pN every million time steps. The force will pull the nucleotide along the z direction.

{
type = string
particle = 0
F0 = 1.
rate = 1e-6
dir = 0., 0., 1.
} 

Harmonic trap

This type of force implements an harmonic trap, of arbitrary stiffness, that can move linearly with time. It can be useful to fix the position of the nucleotides to simulate attachment to something or to implement (quasi) constant extension simulations.

A force of this kind is specified with type = trap. The relevant keys are:

  • particle (int) the particle on which to exert the force
  • pos0 (3 floats separated by commas) rest position of the trap
  • stiff (float) stiffness of the trap (the force is stiff * dx)
  • rate (float) speed of the trap (length simulation units/time steps)
  • dir (3 floats separated by commas) direction of movement of the trap

Here is an example input for a harmonic trap acting on the third nucleotide constraining it to stay close to the origin. In this example the trap does not move (rate=0), but one could have it move at a constant speed along the direction specified by dir, in this case the x direction.

{
type = trap
particle = 2
pos0 = 0., 0., 0.
stiff = 1.0
rate = 0.
dir = 1.,0.,0.
}

Please note that the trap does not comply with periodic boundary conditions. This is most likely what you want.

Repulsion plane

This kind of external force implements a repulsion plane that constrains a particle (or the whole system) to stay on one side of it. It is implemented as a harmonic repulsion, but the stiffness can be made arbitrarily high to mimic a hard repulsion.

A force of this kind is specified with type = repulsion_plane. The relevant keys are:

  • particle (int) the particle on which to exert the force. If set to the special value -1, the force will be exerted on all particles.
  • stiff (float) stiffness of the trap (the force is stiff * D, where D is distance from the plane. The force is exerted only if the nucleotide is below the plane)
  • dir (3 floats separated by commas) a direction normal to the plane
  • position (1 float number) specifies the position of the plane

If direction is direction = u,v,w , then the plane contains all the points (x,y,z) that satisfy the equation: u*x + v*y + w*z + position = 0. Only nucleotides with coordinates (x,y,z) that satisfy u*x + v*y + w*z + position < 0 will feel the force. The force exerted on a nucleotide is equal to stiff * D, where D is the distance of the nucleotide from the plane, where For nucleotides for which u*x + v*y + w*z + position >= 0, no force will be exerted.

Here is an example. This plane acts on the whole system and will not exert any force on nucleotides with a positive x coordinate. A force proportional to 48.6 pN * (x coordinate) will be exerted on all particles .

{
type = repulsion_plane
#whole system
particle = -1
stiff = 1. #48.6 pN in simulation units
dir = 1, 0, 0
position = 0
}

If in the above example you would specify position = 3, then the force would be exerted on all nucleotides with coordinate x > -3.

Mutual trap

This force is useful to form initial configurations. It is a harmonic force that at every moment pulls a particle towards a reference particle. It is possible to specify the separation at which the force will be 0.

A force of this kind is specified with type = mutual_trap. The relevant keys are:

  • particle (int) the particle on which to exert the force.
  • ref_particle (int) particle to pull towards. Please note that this particle will not feel any force (the name mutual trap is thus misleading).
  • stiff (float) stiffness of the trap
  • r0 (float) equilibrium distance of the trap.

Here is an example, extracted from the pseudoknot formation example. This will pull particle 14 towards particle 39, favouring an equilibrium distance of 1.4 (which corresponds roughly to the minimum of the hydrogen bonding potential, not a coincidence). The same force with opposite sign is exerted on particle 39 through a separate force. It is not necessary to have both particles feel the force, but it is usually works much better.

{
type = mutual_trap
particle = 14
ref_particle = 39
stiff = 1.
r0 = 1.2
}

{
type = mutual_trap
particle = 39
ref_particle = 14
stiff = 1.
r0 = 1.2
}

Visualisation of structures

oxDNA produces a trajectory file where all the relevant information is stored. A converter is provided (traj2vis.py) in the UTILS directory that is able to produce files in the xyz and pdb formats. The same program can be used on a configuration file and it will produce a snapshot.

Since the model is coarse-grained, we have to "trick" the visualisers into thinking that the interaction sites in the model are actually atoms. Advanced nucleic acids representations such as ribbons will not work on the outputs.

All the images in the Screenshots page were produced with the pdb representation using UCSF chimera (see later on).

xyz format

just run

$oxDNA/UTILS/traj2vis.py xyz <trajectory> <topology> 

(where $oxDNA is the oxDNA source directory) to get the xyz representation in a file called the same as the trajectory file with .xyz appended. Please note that boundary conditions are implemented strand-wise, so strands that are bound might appear at two different sizes of the box. Also, the center of mass of the system (where each strand is weighted the same regardless of the length) is set to 0 at each frame. Carbons represent the backbone sites and oxygens the base sites.

The resulting file can be read with a variety of programs. Here we will explain how to visualise it sensibly in VMD.

  • Run VMD and load the xyz file.
  • In the graphics menu, go to Representations.
  • In the Selected Atoms line, input name C. Also select Drawing method CPK, sphere scale 0.8 and Bond Radius 0.
  • In the Selected Atoms line, input name O. Also select Drawing method CPK, sphere scale 0.6 and Bond Radius 0.

This should produce a ball representation of our model DNA. Bonds automatically produced by VMD are NOT meaningful in our context.

pdb format

Run

$oxDNA/UTILS/traj2chimera.py <trajectory> <topology> 

to produce a trajectory/configuration in the pdb format. A further file called chimera.com will be produced (more on this later). All comments above about periodic boundaries and centre of mass apply here as well.

The pdb file can be visualised in VMD just like the xyz format, but a nicer output can be produced with UCSF Chimera (although only for snapshots at the moment) as follows:

Run chimera and load the pdb file. An ugly output will be displayed.

Bring up the command line under the Tools → General Controls menu. Input read chimera.com in the command line and press enter. You should get a nicer visualisation with different bases in different colors, all the covalent bonds in the right place, etc.

On large configurations, the production of ellipsoids will be extremely slow. You can remove it by removing the line

aniso scale 0.75 smoothing 4

from the commands file. Loading the resulting file should be much faster.

UCSF chimera can in turn export the scene in a variety of formats.