
Programmer_guide

Copyright (C) 1998-2007 ABINIT group (XG,DCA) 
This file is distributed under the terms of the
GNU General Public License, see ~abinit/COPYING
or http://www.gnu.org/copyleft/gpl.txt .
For the initials of contributors, see ~abinit/doc/developers/contributors.txt .

NOTE : this file has NOT yet been updated for the response function features...
NOTE : this file has NOT yet been updated for structured datatypes ...


This is a brief programmer guide describing the 
structure of the main program (abinit) of the ABINIT package.
It is intended to provide some introductory guidance to 
programmers who may want to modify parts of the code.
You will find the code fairly well commented and should explore it 
to get more details than provided below.

The reader is assumed to have already gone through
the latest version of the following files : 
~abinit/doc/users/new_user_guide.html
~abinit/doc/users/abinis_help.html
~abinit/doc/users/context .

It is important that the reader know how to compile the code, and how to
run tests. This is described in detail in the installation notes on the
Web, which can also be found in the ~abinit/doc/install_notes
directory.

From now on, we assume that you are sufficiently familiar with these
different points, and that you have sufficient experience
in the use of ABINIT.

The ABINIT group rules for coding in Fortran 90 are detailed in the
~abinit/doc/developers/rules_coding file. 
These rules are mostly code-independent.
Here we describe facts related specifically to the ABINIT package.

In order to allow programmers to develop different parts of
the code at the same time, while avoiding synchronisation problems,
a few rules have been sketched. See the 
~abinit/doc/developers/contributing.html file.

Specific facts related to parallelism in ABINIT
are explained in ~abinit/doc/developers/rules_paral .

Structure of the present file :
A. A few facts
B. The skeleton of the code.
C. Debugging, timing and statistics facilities.
D. Utility subroutines.
E. Libraries.

**************************************************************************

A. A few facts, useful to know.

  The main routine is called abinit.F90 and is present
in the directory ~abinit/src/main.  
The rest of the subroutines called by abinit.F90
are in the other ~abinit/src or ~abinit/lib directories.
Other main routines (newsp.F90, mrgddb.F90, anaddb.F90, chi.F90, sigma.F90 ...) 
are present in ~abinit/src/main.

  The subroutines are split into two groups : those that come from
other packages, like Blas, Lapack, and other numerical or IO routines ;
and those that have been written directly for ABINIT.
The former are found in the ~abinit/lib directories,
are often written in Fortran77 or C, and do not follow the coding rules of the ABINIT
project. The latter are written in Fortran90 (there are two C routines,
for timing purposes), follow the coding rules, and are found in the
different ~abinit/src directories. At the time of writing, there are more 
than 300 subroutines, and about 100000 lines of code, including the 
library routines.

  Machine dependency (IBM RS6000 model 590, Pentium Pro, DEC Alpha,
HP S-class and Exemplar, SGI Origin, Cray T3E ...) is accommodated by using 
the C preprocessor on ALL files in ~abinit/src , so the Fortran compilations
are conducted by first preprocessing every file, then passing the 
result to the compiler. See the ~abinit/doc/install_notes
directory and the ~abinit/doc/developers/use_cpp file.
The sequential and parallel versions are also produced by C preprocessing
a single source file. The routines that differ between the sequential and
parallel versions of the code are found in the Src_seqpar
and Src_basis directories.
The other Src_* directories contain, separately, the routines for XC treatment, 
pseudopotential input, parsing of the input file, those for the anaddb code,
and all the remaining (common) sources.

*****************************************************************
 
B. The skeleton of the code.
(to be updated for RF features)

One can distinguish 10 important routines, called levels :
(1) abinit  
(2) driver
(3) gstate
(4) move or (5) brdmin
(6) scfcv
(7) vtorho
(8) vtowfk
(9) cgwf
(10) getghc
The routine abinit.f calls driver.f, driver.f calls gstate.f, ...
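
The nesting of the levels listed above can be pictured with a small sketch.
It is written in Python purely for illustration (ABINIT itself is Fortran 90),
and the LEVELS table and call_chain function are assumptions of this example,
not objects present in the code.

```python
# Illustrative only: the ten "levels" listed above, keyed by level number.
# Levels 4 and 5 are alternatives; fixed-atom runs go from level 3 to level 6.
LEVELS = {
    1: "abinit", 2: "driver", 3: "gstate", 4: "move", 5: "brdmin",
    6: "scfcv", 7: "vtorho", 8: "vtowfk", 9: "cgwf", 10: "getghc",
}

def call_chain(mover=None):
    """Return the routine names from level 1 down to level 10.

    mover is None for fixed atoms, or 4 / 5 to go through move / brdmin.
    """
    if mover not in (None, 4, 5):
        raise ValueError("mover must be None, 4 or 5")
    chain = [LEVELS[n] for n in (1, 2, 3)]
    if mover is not None:
        chain.append(LEVELS[mover])
    chain += [LEVELS[n] for n in range(6, 11)]
    return chain

print(" -> ".join(call_chain()))
print(" -> ".join(call_chain(mover=5)))
```

The negative prtvol values described in section C use the same numbering.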


B.1. abinit.f

The main routine, abinit, level 1, reads the input files completely,
and checks whether the input variables
are sensible, and whether the available memory is sufficient.
These operations should be very fast, so that the user is
quickly warned if the input is incorrect.
No big array is allocated in level 1, except for testing purposes.
In detail, abinit.f performs, or calls routines that perform :
- Initialize MPI if needed (for parallel runs) 
- Initialize overall timing of run
- Print greeting for interactive user 
- Read names of files (input, output, rootinput, rootoutput, roottemporaries),
- Create the name of the status file, initialize the status subroutine.
- Open output file and print herald at top of output and log files
- Read the input file, and store the information in a long string of characters
- Take ndtset and ntypat from the input string, then allocate
   the arrays whose dimensions depend only on ndtset, ntypat and msym.
- Finish reading the "files" file,
   and also initialize mproj and mpsang
- Continue to analyze the input string, and allocate the arrays needed for input.
- Provide defaults for the variables that have not yet been initialized.
- Call the main input routine, and finish the input variable initialisation.
- Echo input data to output file and log file
- Perform additional checks on input data

At this stage, all the information from the "files" file and "input" file
has been read and checked.

- Perform main calculation  (call driver, which in turn calls gstate)

- Give final echo of coordinates, etc.
- Timing analysis
- Delete the status file, and, for built-in tests,
      analyse the correctness of results
- Write the final timing, close the output file, and write a final line
      to the log file
- Clean up the MPI run if needed
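
The "store the input file in one long string, then pick variables out of it"
approach listed above can be sketched as follows. This is a Python
illustration under stated assumptions, not the ABINIT parser : the function
names, the '#' comment marker and the default values are all hypothetical.

```python
def load_input_string(text):
    """Pack an input file into one upper-case, blank-separated string."""
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]      # assumed comment marker
        tokens.extend(line.upper().split())
    return " ".join(tokens)

def get_int(string, name, default):
    """Return the integer following keyword `name`, or `default` if absent."""
    tokens = string.split()
    name = name.upper()
    if name in tokens:
        pos = tokens.index(name)
        if pos + 1 >= len(tokens):
            raise ValueError(f"no value after {name}")
        return int(tokens[pos + 1])
    return default

sample = """
ndtset 2      # two data sets
ntypat 1
ecut  10.0
"""
string = load_input_string(sample)
ndtset = get_int(string, "ndtset", default=1)
ntypat = get_int(string, "ntypat", default=1)
# Only now can the arrays that depend on ndtset and ntypat be sized.
per_dataset = [None] * ndtset
print(string)
print(ndtset, ntypat, len(per_dataset))
```

Reading the dimensioning variables first, and everything else afterwards,
is what allows the input arrays to be allocated with the right size.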


B.2. driver.f

In driver, level 2, a loop on the data sets is present. For each data set,
either the ground state subroutine (gstate.f) or 
the response function subroutine (respfn.f) is called (to be described
in a future version of this file). 
A few big arrays are allocated at that level.


B.3. gstate.f

The routine gstate.f , level 3, performs a variety of initialisation tasks
and result analysis, for which different routines are called. 

A variety of arrays are computed or initialized in subroutine
setup1.f .  Then, based on the geometrical data input and the
values of k points and ecut, the basis sphere of planewaves is computed
by kpgio.f (that calls kpgsph.f and boundy.f).  
Then header information is written to a file, for 
use in constructing output wavefunction files, see headwr.f . These wf files 
contain a description of various input settings which were used to create
them. Other routines related to the header, and used later,
are headcopy.f , headck.f , and headlv.f , as well as pspini.f and clnup1.f .

Next, all the pseudopotential files needed for the calculation are read.
Subroutine pspini.f controls this part, and calls, for each atom type, the
routine pspatm.f , which in turn calls different routines 
(psp1in.f, psp2in.f, psp3in.f, psp5in.f, psp6in.f), according
to the pseudopotential file format.
Various transforms of the input psp data are taken
(Bessel function transforms) relevant to the local and nonlocal parts of 
the potential (psp1lo.f, psp1nl.f, psp2lo.f, psp2nl.f, psp3lo.f, psp3nl.f,
psp5lo.f, psp5nl.f) and the non-linear XC core-correction (psp1cc.f, 
psp4cc.f, and psp6cc.f) . 

The wavefunctions are initialized (read or set to random numbers) in inwffil.f .
If they are to be initialized at random,
or if the simple reading of an existing wf file is needed, then the
routine initwf.f is called by inwffil.f . If some work
has to be done on the existing wavefunctions,
the operation is more delicate, and inwffil.f needs to call the routine
newkpt.f, which calls several other routines :
- listkk.f (to find the closest k point)
- kpgsph.f (generate list of plane waves)
- sphere.f (to translate plane wave coefficients from one cut-off sphere
     to another)
- envlop.f (multiply the random coefficients by an envelope, to reduce their
     kinetic energy)
- orthon.f (to orthonormalize the wavefunctions)
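
As an illustration of the last step in the list above, here is a minimal
Python sketch of orthonormalizing a set of complex coefficient vectors. It
uses modified Gram-Schmidt, which is an assumption of this example : the
scheme actually used in orthon.f is not described in this guide.

```python
import math

def inner(a, b):
    """Hermitian inner product <a|b> of two coefficient lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def orthonormalize(vectors):
    """Modified Gram-Schmidt on a list of complex coefficient vectors."""
    result = []
    for v in vectors:
        w = [complex(x) for x in v]
        for q in result:
            overlap = inner(q, w)                 # component of w along q
            w = [x - overlap * y for x, y in zip(w, q)]
        norm = math.sqrt(inner(w, w).real)
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        result.append([x / norm for x in w])
    return result

# Three toy "bands" with three plane-wave coefficients each.
bands = orthonormalize([[1, 1j, 0], [1, 0, 1], [0, 1, 1]])
for i, band in enumerate(bands):
    print(i, [round(abs(inner(band, other)), 12) for other in bands])
```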

The symmetries are initialized in setsym.f, and
occupation numbers may be computed in newocc.f if needed. 

Then, the code computes a starting density and screening potential,
either from the existing wavefunctions (mkrho.f), or from the characteristics
of the pseudopotential (initro.f), or by reading a file (ioarr.f).

At this point, the code either pursues a fixed atom calculation (level 6)
or a moving atom calculation (levels 4 or 5).

After these calls, when the big calculations are done, gstate.f continues
by printing results, closing files, deallocating arrays, and returning
control to driver.f .


B.4. and B.5.  move.f, brdmin.f and moldyn.f

For fixed atoms, gstate.f calls scfcv.f directly
(self-consistent field convergence) ;
for molecular dynamics using Numerov's predictor, it calls move.f ; 
for Broyden structural optimization,
it calls brdmin.f (Broyden minimization) ;  
for molecular dynamics using Verlet's algorithm, it calls moldyn.f.
In all three moving-atom cases, the routines call scfcv.f many times. scfcv.f
controls the update and mixing of the density and potential, and generates
forces for a given arrangement of atoms. With these data, the
molecular dynamics or the geometry optimization can be performed.
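
The pattern described above (an outer mover that asks for forces once per
geometry) can be sketched in Python as follows. The sketch uses the velocity
form of the Verlet algorithm on a single coordinate ; which variant moldyn.f
implements is not stated here, and toy_force is a hypothetical stand-in for
the forces that scfcv.f would return.

```python
def velocity_verlet(x, v, force, mass, dt, nsteps):
    """Advance one coordinate with velocity Verlet; return the trajectory."""
    f = force(x)                     # one "SCF" call for the starting geometry
    traj = [(x, v)]
    for _ in range(nsteps):
        v_half = v + 0.5 * dt * f / mass
        x = x + dt * v_half
        f = force(x)                 # one "SCF" call per new geometry
        v = v_half + 0.5 * dt * f / mass
        traj.append((x, v))
    return traj

def toy_force(x, k=1.0):
    """Harmonic spring, standing in for the forces from the SCF cycle."""
    return -k * x

traj = velocity_verlet(x=1.0, v=0.0, force=toy_force, mass=1.0,
                       dt=0.01, nsteps=1000)
energies = [0.5 * v * v + 0.5 * x * x for x, v in traj]
print("energy drift :", max(energies) - min(energies))
```

The cost of such a run is dominated by the force evaluations, which is why
each geometry triggers exactly one call in the loop.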


B.6. scfcv.f

This routine performs the SCF loop. A few arrays, needed for that
purpose, are allocated there.
Inside the loop, scfcv.f calls :
- setvtr.f, usually only at the initialisation step, to set a first 
    trial potential ;
- vtorho.f, level 7, to get the density from the trial potential ;
- vresfo.f, to get the potential residual, the forces, and components
    of the energy ;
- newvtr.f, to precondition the potential residual, and compute the
    new trial potential.
The computation of Hartree and XC potential is done inside setvtr.f and
newvtr.f, by calling the routine rhohxc.f .
That routine, in turn, calls hartre.f , for the Hartree potential, and
many different routines for the XC potential, depending on the
different XC functionals, and the intxc option (xcden.f, xchelu.f, xcpbe.f,
xcpot.f, xcpzca.f, xcspol.f, xctetr.f, xcwign.f, xcxalp.f)

After the loop, scfcv.f computes the stress, by calling stress.f ,
and, if requested, also prints the density, potential, or other files.
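
The structure of the SCF loop described in this section can be sketched with
a toy scalar problem. This Python example is only an illustration : the two
model functions are invented, and plain linear mixing replaces the
preconditioning done in newvtr.f . The comments indicate which routine plays
the corresponding role in ABINIT.

```python
import math

def density_from_potential(v):
    """Toy stand-in for the density produced by a trial potential."""
    return 1.0 / (1.0 + math.exp(v))

def potential_from_density(rho):
    """Toy stand-in for the Hartree + XC potential built from a density."""
    return 2.0 * rho - 0.5

def scf_loop(v_trial=0.0, mix=0.5, tol=1e-10, max_iter=100):
    """Iterate until the potential residual is below tol."""
    for istep in range(1, max_iter + 1):
        rho = density_from_potential(v_trial)                # role of vtorho
        residual = potential_from_density(rho) - v_trial     # role of vresfo
        if abs(residual) < tol:
            return v_trial, rho, istep
        v_trial = v_trial + mix * residual                   # role of newvtr
    raise RuntimeError("SCF loop did not converge")

v, rho, nstep = scf_loop()
print(f"converged in {nstep} steps : v = {v:.8f}, rho = {rho:.8f}")
```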


B.7. vtorho.f

Subroutine vtorho.f (potential -v- to density -rho-, level 7) 
produces the density in a fixed potential, 
by summing the contributions of the different k points and, where
applicable, the different spins.  Forces are recomputed after each 
pass over all k points. 
Parallelism is implemented in vtorho.f , at the level of the concurrent
treatment of the different k points.
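
The summation over k points can be pictured as a weighted accumulation on a
real-space grid. The Python sketch below is an assumption-laden illustration
(the function name, the argument layout and the weights are hypothetical) ;
it shows why the k points can be treated concurrently : each contribution is
computed independently and only the final sum couples them.

```python
def accumulate_density(contributions, weights):
    """Sum per-k-point densities on a grid, each scaled by its k-point weight."""
    if len(contributions) != len(weights):
        raise ValueError("one weight is needed per k point")
    npts = len(contributions[0])
    rho = [0.0] * npts
    for weight, rho_k in zip(weights, contributions):
        if len(rho_k) != npts:
            raise ValueError("all contributions must use the same grid")
        for i in range(npts):
            rho[i] += weight * rho_k[i]
    return rho

# Two k points on a two-point grid, with weights 1/4 and 3/4.
rho = accumulate_density([[1.0, 2.0], [3.0, 0.0]], [0.25, 0.75])
print(rho)
```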


B.8. vtowfk.f

Subroutine vtowfk (potential -v- to k-point wavefunctions, level 8)
is called to improve the wavefunctions
over all bands, one k point at a time. It also gives
the contributions of each band to the kinetic energy and non-local
energy. In the case of fixed occupancies, it gives the
contribution of each k point to the density.   
Subspace diagonalization and orthogonalisation
are done within vtowfk, and might be time-consuming.


B.9. cgwf.f and getghc.f

Subroutine cgwf (Conjugate-gradient on the wavefunctions, level 9)
runs the iterative optimization of the wavefunction
for a single band and k point, in a fixed potential.
It starts from an existing wavefunction, either in central memory or
on a temporary file on disk, and refines it, finally writing the result in
central memory or on another temporary file on disk.
Deep within cgwf is a call to getghc (level 10), 
which computes <G|H|C> where |C> is the 
wavefunction.  This subroutine is the guts of the method.  Its time
is presently dominated by fft calls (about 50-60%), with the next 
bottleneck being the nonlocal operator (20-35%).
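
To give a feeling for the cgwf / getghc pair, here is a minimal Python sketch
of a conjugate-gradient minimisation of <c|H|c> for one normalised vector.
It makes several simplifying assumptions that are not taken from the ABINIT
sources : H is a small dense real symmetric matrix (so apply_h is a plain
matrix-vector product instead of FFTs plus a nonlocal operator), the
Fletcher-Reeves formula is used, and there is neither preconditioning nor
orthogonalisation to lower bands.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def apply_h(h, c):
    """Return H|c> for a dense real symmetric H (the role played by getghc)."""
    return [dot(row, c) for row in h]

def lowest_state(h, c, nline=200, tol=1e-12):
    """Minimise <c|H|c> over normalised c by conjugate-gradient line searches."""
    norm = math.sqrt(dot(c, c))
    c = [x / norm for x in c]
    d_old, gg_old = None, None
    for _ in range(nline):
        hc = apply_h(h, c)
        e = dot(c, hc)
        g = [hx - e * x for hx, x in zip(hc, c)]     # gradient, orthogonal to c
        gg = dot(g, g)
        if gg < tol:
            break
        if d_old is None:
            d = [-x for x in g]                      # first step: steepest descent
        else:
            beta = gg / gg_old                       # Fletcher-Reeves coefficient
            d = [-x + beta * y for x, y in zip(g, d_old)]
        proj = dot(c, d)
        d = [x - proj * y for x, y in zip(d, c)]     # keep d orthogonal to c
        dn = math.sqrt(dot(d, d))
        u = [x / dn for x in d]
        e_u = dot(u, apply_h(h, u))
        h_cu = dot(u, hc)
        # For cos(t) c + sin(t) u the energy is
        # (e + e_u)/2 + (e - e_u)/2 cos(2t) + h_cu sin(2t); take its minimum.
        theta = 0.5 * (math.atan2(2.0 * h_cu, e - e_u) + math.pi)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        # Search direction transported to the new point (tangent to the path).
        d_old = [dn * (cos_t * y - sin_t * x) for x, y in zip(c, u)]
        c = [cos_t * x + sin_t * y for x, y in zip(c, u)]
        gg_old = gg
    return dot(c, apply_h(h, c)), c

h = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
energy, vector = lowest_state(h, [1.0, 0.0, 0.0])
print(energy, 2.0 - math.sqrt(2.0))
```

Each pass of the loop needs two applications of H, which is consistent with
the remark above that the time is dominated by the routine applying the
Hamiltonian.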

  You will find the code fairly well commented and may explore it further
to get more details than provided above.  

*****************************************************************

C. Debugging, timing and statistics facilities.

  The abinit code has been equipped with a set of tools for the
developers. These include :

1) The log file.
2) The prtvol input variable.
3) The status file.
4) The memory subroutine.
5) The time analysis backbone.
6) The statistics provided in the make.

As mentioned in the new_user_guide or in the abinis_help,
there are two general files for output : the "output" file and
the "log" file. When something goes wrong in the code, without
causing the code to crash, the log file will mention the name
of the routine where something went wrong, and what went wrong.
Usually, corrective actions are suggested.
The output of messages is handled by wrtout.f , which is called once
the message has been packed in a character string (usually called 'message').
When something has gone wrong, the code must exit through a call to the
leave_new.f subroutine.

The use of the prtvol input variable in conjunction with
the log file is the most important tool for debugging.
As indicated by its name, prtvol controls the print volume in the
output file and in the log file. 
When equal to 0, the information in the log file is kept at the minimum.
When equal to 1, the information is already much more complete.
Much more flexibility is gained when prtvol is used with negative values.
Each of these negative values refers to one of the levels of the code
(e.g. prtvol=-10 refers to debugging of getghc, prtvol=-7 to debugging
of vtorho). When debugging some level through the use of the corresponding
negative prtvol value, the amount of data written to the log file,
coming from this level of the code, will increase dramatically. Moreover,
after the first execution of the complete level, the code will automatically
stop (except for the levels 1 and 3, which stop BEFORE entering
the next level). 
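
The negative-prtvol convention described above can be sketched as follows.
This Python fragment is a hypothetical illustration : the DebugStop exception,
the log list and the dummy density are inventions of this example, and only
the rule "prtvol = -level switches on verbose output for that level and stops
after its first complete execution" is taken from the text.

```python
class DebugStop(Exception):
    """Raised to end the run once the debugged level has executed once."""

def debug_active(prtvol, level):
    """True when prtvol selects this level for debugging."""
    return prtvol == -level

def vtorho_like(prtvol, log):
    """Stand-in for a level 7 routine with the debugging hooks in place."""
    level = 7
    if debug_active(prtvol, level):
        log.append("vtorho : enter, debug mode")
    density = [0.5, 0.5]                      # placeholder for the real work
    if debug_active(prtvol, level):
        log.append(f"vtorho : density = {density}")
        raise DebugStop("vtorho : debug mode, stop after first execution")
    return density

log = []
print(vtorho_like(prtvol=0, log=log), log)
try:
    vtorho_like(prtvol=-7, log=log)
except DebugStop as stop:
    print(stop, "|", len(log), "debug lines written")
```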

The status file is another important tool for debugging, especially 
because of the UNIX peculiarity (when running from a script) 
that the outputs are not immediately 
written to a file, but kept in a buffer, unless this file is closed.
When the code crashes (for example with a message "segmentation fault"),
it is difficult to know from the log file where the segmentation
fault happened.
The status file is a very short file that, depending on the
value of the input parameter istatr, can be opened, rewound, written,
and closed very frequently. This is done by calls to the status.f 
subroutine. Due to its frequent closing, 
it can indicate precisely where a crash just happened. 
( Note : if the status file is situated on a disk that is "local" to 
the cpu where the job is run,
the whole operation usually takes less than 0.2 msec. On a remote disk 
(NFS), the operation takes about 10 times longer. These data may differ
from machine to machine : on a Cray-T3E the I/O operations are 
relatively slow. To avoid trouble, in the default mode
the value of istatr is relatively large, so that the file is not
updated often. On some machines and depending on the disk access, 
using a small value of istatr -under 5- will cause the code to crash)
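
The open / write / close cycle of the status file can be sketched in Python
as below. The StatusFile class, the file name and the exact meaning given to
istatr (rewrite the file once every istatr calls) are assumptions made for
this illustration ; see status.f for the actual behaviour.

```python
import os
import tempfile

class StatusFile:
    """Rewrite a one-line status file once every `istatr` calls."""

    def __init__(self, path, istatr):
        self.path = path
        self.istatr = istatr
        self.ncalls = 0

    def update(self, routine, where):
        """Record the current position; return True if the file was rewritten."""
        self.ncalls += 1
        if self.istatr <= 0 or self.ncalls % self.istatr != 0:
            return False
        # Opening in "w" mode truncates the file, and leaving the "with"
        # block closes it, so the content reaches the disk immediately.
        with open(self.path, "w") as handle:
            handle.write(f"{routine} : {where}\n")
        return True

with tempfile.TemporaryDirectory() as tmp:
    status = StatusFile(os.path.join(tmp, "run_STATUS"), istatr=3)
    written = [status.update("scfcv", f"step {i}") for i in range(1, 7)]
    with open(status.path) as handle:
        last = handle.read()
print(written, last.strip())
```

A larger istatr means fewer disk operations, but a coarser indication of
where a crash occurred.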

The memory.f subroutine is the place where the memory space needed by the
code is estimated, shortly after reading the input. 
The subroutine will immediately try to allocate as much memory
as estimated, and issue an error message if this is not possible
(on the P6, this operation makes the job crash in case there is not enough
memory, but it is not difficult to understand what the problem is, thanks
to the status file). This memory estimation is hand-coded (this is very boring !).
The precise description of the allocation in the most critical 
subroutines can help in optimizing memory usage.

Another help for optimisation is provided by the time analysis backbone.
Many important routines are timed internally, thanks to two
calls to the timer routine timepw.f (one call at the entrance, 
one call at the exit). A final call to the
routine timanalys.f provides a detailed analysis of the distribution of the
CPU and wall clock time over the critical subroutines, or over the
different levels of the code. Thanks to this tool, it is rather
obvious which parts of the code should be optimized, and also 
what is going wrong when the code is ported to a new machine.
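
The "one call at the entrance, one call at the exit, one final analysis"
scheme can be sketched in Python as follows. The timer and analysis functions
and the option values 1 and 2 are hypothetical ; they only mimic the pattern
described above and are not the interfaces of timepw.f or timanalys.f .

```python
import time
from collections import defaultdict

_totals = defaultdict(lambda: [0.0, 0.0, 0])   # name -> [cpu, wall, ncalls]
_started = {}

def timer(name, option):
    """option 1 : entering the timed section ; option 2 : leaving it."""
    now = (time.process_time(), time.perf_counter())
    if option == 1:
        _started[name] = now
    elif option == 2:
        cpu0, wall0 = _started.pop(name)
        entry = _totals[name]
        entry[0] += now[0] - cpu0
        entry[1] += now[1] - wall0
        entry[2] += 1
    else:
        raise ValueError("option must be 1 or 2")

def analysis():
    """Return (name, cpu, wall, ncalls) rows, most expensive first."""
    rows = sorted(_totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(name, cpu, wall, ncalls) for name, (cpu, wall, ncalls) in rows]

def fft_like(n):
    """Dummy workload bracketed by the two timer calls."""
    timer("fft_like", 1)
    total = sum(i * i for i in range(n))
    timer("fft_like", 2)
    return total

for _ in range(3):
    fft_like(10000)
for name, cpu, wall, ncalls in analysis():
    print(f"{name:12s} cpu={cpu:.6f} wall={wall:.6f} calls={ncalls}")
```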

The last feature useful for developing the code is provided by the
statistics of the make command, in the directory ~abinit.
This makes it possible to check that no file is getting too big to be
easily manipulated, and indicates when a file should be split
(see ~abinit/doc/developers/rules_coding).

Of course, it is important that adequate care is taken to implement
these features in newly developed parts of the code.
It is thus expected that the developer read the
subroutines wrtout.f, leave_new.f, timepw.f, and possibly
timanalys.f, status.f, and memory.f. It is also advisable to read
the subroutine getghc.f (for an idea of the usage of
the prtvol=-level debugging option, there for prtvol=-10).

*****************************************************************

D. Utility subroutines.

Beyond the big main routines presented in section B, and the different
routines for timing, debugging and statistics of section C, other
routines in ~abinit/src may be worth knowing about...

There is a whole set of routines for the treatment of strings 
of characters (they should be described briefly in a future version) : 
appdig.f, fappnd.f, inarray.f, incomprs.f, inread.f, inreplsp.f, 
inupper.f, subchr.f

Routines for numerical differentiation and/or integration :
ctrap.f, der_int.f

Some numerical functions :
- besjm.f : half-integer Bessel functions
- derfc.f : complementary error function
- invcb.f : fast computation of a series of inverse cube roots
- sincos.f : fast computation of a series of sine and cosine

Routines related to symmetries and Brillouin zone (should be described) :
chkgrp.f, chkibz.f, cnstti.f, fixsym.f, irrzg.f, setsym.f, strsym.f,
sygrad.f, symatm.f, symchk.f, symdet.f, symg.f, symrhg.f, symzat.f


Vector operations :
- norm.f : normalize a vector
- normev.f : normalize a set of vectors, and fix the phases
- fxphas.f : fix the phase of a vector
- orthon.f : orthonormalize a set of vectors
- projbd.f : orthogonalize one vector to a set of other vectors
- sdirot.f : rotate a set of vectors by a unitary transformation

3x3 matrix inversion (integer and real) :
mati3inv.f, matr3inv.f
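
For reference, a 3x3 inverse can be written in closed form with cofactors,
as in the Python sketch below. It handles integer entries with exact
rational arithmetic. The calling conventions of mati3inv.f and matr3inv.f
(for instance whether the result is returned transposed) are not described
in this guide, so the sketch simply returns the ordinary inverse.

```python
from fractions import Fraction

def cofactor(m, i, j):
    """Signed cofactor of element (i, j), using cyclic indices."""
    a, b = (i + 1) % 3, (i + 2) % 3
    c, d = (j + 1) % 3, (j + 2) % 3
    return m[a][c] * m[b][d] - m[a][d] * m[b][c]

def det3(m):
    """Determinant by expansion along the first row."""
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(3))

def inv3(m):
    """Inverse of an integer 3x3 matrix : transposed cofactors over det."""
    det = det3(m)
    if det == 0:
        raise ZeroDivisionError("singular matrix")
    return [[Fraction(cofactor(m, j, i), det) for j in range(3)]
            for i in range(3)]

matrix = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
inverse = inv3(matrix)
for row in inverse:
    print([str(x) for x in row])
```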

Other (should be described) :
clsopn.f, hermit.f, iseq.f, isfile.f, mkkin.f, 
mkrdim.f, prmat.f, randac.f, xredxcart.f



*****************************************************************

E. Libraries.

As for the utility subroutines, the developer should be aware of the
routines available from the libraries, and use them instead of
coding something with the same purpose. 

The Lapack library contains a matrix diagonalizer, zhpev.f, that
is needed many times in the code. Presently, this is the only 
entry point in the Lapack library. The Blas routines are used
by Lapack, but are not directly called by ABINIT.
Only the specific subset of Lapack and Blas, needed to support
zhpev.f, is present in ABINIT.

The Numerical Recipes library contains :
- sorting routines (insort.f, isort2.f, sort2.f)
- a routine that computes the julian day number (julday.f)
- a function that returns a uniform random deviate between 0.0 and 1.0 (ran1.f)
- spline fitting routines (splfit.f and spline.f)
- (to be updated)

