Avoca for Tambo Users
This page offers early advice on using Avoca, the new IBM BlueGene/Q at VLSCI. Please be aware that this information is still very much under development.
Background
Avoca is a BlueGene/Q, similar in some ways to VLSCI's existing BlueGene/P, Tambo. However, there are differences, and job scripts written for Tambo will not work on Avoca. Here are a few pointers to help early adopters migrate from Tambo to Avoca.
File Systems and User Account
Avoca is directly connected to the same high-performance GPFS file system as Tambo (and Merri). It is also connected to the same authorisation and project management system, so you can connect to Avoca with the same credentials you use for Tambo and expect to find all your files there. Avoca works with the same quota management system as Tambo: any quota you expect to have in Q3 and Q4 of 2012, for example, will appear seamlessly on Avoca.
Recompile Your Code!
As with any new system, you must recompile your code for Avoca. Code compiled for our previous BlueGene, Tambo, will not run here because Avoca is a newer architecture. The compilers are newer versions of the compilers on Tambo, so you may get away without having to modify your Makefiles or other build scripts at all.
Requesting Resources
Avoca is far more flexible than its predecessor Tambo, which was limited to 256-core blocks; on Avoca you can request as little as one node. However, to make good use of Avoca you must understand what you get in that one node. Each Avoca node has 16 cores and 16 GB of RAM, and each core can support one, two or four threads. Purely from a compute perspective, running four threads per core on all 16 cores might deliver the most computing for the Service Units you burn up. However, for MPI code, for example, that implies 64 MPI ranks per node, each with only 256 MB of RAM. Many problems will not like that! Conversely, running only one MPI rank per node (and giving it all 16 GB of RAM) may guarantee that your job will run, but it may run quite slowly (and will still use up 16 CPU hours, or 4 Service Units, per hour).
Choosing to run one MPI rank per core (i.e. 16 per node) or two (32 MPI ranks per node) is a good compromise.
For SMP applications, a single-node job in which all threads share the node's 16 GB memory image is appropriate.
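The memory trade-off described above is simple division, and can be sketched with shell arithmetic. This is only a rough illustration: it assumes the figure quoted in this section (16 GB per node, taken here as 16384 MB) is shared evenly between ranks, and it ignores any memory the system itself reserves. The function name mem_per_rank_mb is made up for this example and is not a tool installed on Avoca.

```shell
#!/bin/sh
# Rough memory available to each MPI rank on one Avoca node.
# Assumption: 16 GB (16384 MB) per node, divided evenly between ranks.
NODE_MEM_MB=16384

# mem_per_rank_mb RANKS_PER_NODE  (hypothetical helper, not a system tool)
# Prints the whole number of MB each rank would get.
mem_per_rank_mb() {
    echo $(( NODE_MEM_MB / $1 ))
}

# Compare the layouts discussed in this section.
for ranks in 1 16 32 64; do
    echo "${ranks} ranks per node: $(mem_per_rank_mb "$ranks") MB per rank"
done
```

With 64 ranks per node this gives 256 MB per rank, matching the figure above, while 16 and 32 ranks per node give 1024 MB and 512 MB respectively.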
Job Launch
To simplify the task of writing job submission scripts we provide an interactive job script generator.
Your existing Tambo job scripts will not work on Avoca. The main problem is that a different command is needed to launch compute codes in your script. The new command is called "srun".
For instance, if on Tambo you started a NAMD job this way:
mpirun /usr/local/namd/2.8-xl-dcmf/bin/namd2 MyProtein.conf
then you could launch the same job on Avoca with 16 tasks per node (1 per core) as:
srun --ntasks-per-node=16 namd2 MyProtein.conf
You will probably want to adjust the number of MPI ranks and the number of Avoca nodes required. The options needed to specify threading vary per application (if it is a multithreaded application).
For instance, you could try running NAMD with 4 threads per process to use all 64 hardware threads on each node thus:
srun --ntasks-per-node=16 namd2 +ppn 4 MyProtein.conf
If you need more memory per process, you could run 4 tasks per node (giving 4 GB per process) with 16 threads per task (still using 64 threads per node) thus:
srun --ntasks-per-node=4 namd2 +ppn 16 MyProtein.conf
You could even try running just 1 process per node so it can access all 16GB of RAM and then tell it to use 64 hardware threads thus:
srun --ntasks-per-node=1 namd2 +ppn 64 MyProtein.conf
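The three NAMD variants above all fill the 64 hardware threads of a node (16 cores with 4 threads each); only the split between tasks and threads changes. As a rough sketch, the helper below works out the +ppn value for a given number of tasks per node and prints the matching srun line. The function names threads_per_task and namd_srun_line are invented for this illustration, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Hardware threads per Avoca node: 16 cores x 4 threads per core.
HW_THREADS_PER_NODE=64

# threads_per_task TASKS_PER_NODE  (hypothetical helper)
# Prints how many threads each task needs so that all 64 hardware
# threads on the node are used.
threads_per_task() {
    echo $(( HW_THREADS_PER_NODE / $1 ))
}

# namd_srun_line TASKS_PER_NODE CONFIG_FILE  (hypothetical helper)
# Prints, but does not run, an srun command in the style used on this page.
namd_srun_line() {
    echo "srun --ntasks-per-node=$1 namd2 +ppn $(threads_per_task "$1") $2"
}

# Reproduce the three layouts shown above.
for tasks in 16 4 1; do
    namd_srun_line "$tasks" MyProtein.conf
done
```

The division is integer division, so a tasks-per-node value that does not divide 64 evenly would leave some hardware threads unused. Other applications will need their own threading option in place of +ppn.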
IMPORTANT
Avoca is still in the very early stages of its life. If you see anything unexpected, or something you don't understand or think is misleading, PLEASE let us know. It is quite likely either a system problem that we can fix if we know about it, or something we need to document more carefully to benefit other users.