
Hydra is an IBM eServer 1350 Linux cluster with 129 nodes. A high-speed Myrinet 2000 optical fibre network connects 128 of the nodes, which are compact (1U) IBM xSeries 335 servers. The other node is used for cluster management.
Each node has dual 2.4 GHz Intel Xeon processors and 2 GBytes of RAM. The Xeon processors have 512 kBytes of L2 cache memory and are capable of two floating point operations (flops) per clock cycle if the SSE2 instructions are used, or else 1 flop per cycle. The theoretical peak speed of the cluster is 1228 GFlops, or 1.2 Tflops.
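The peak figure follows directly from the numbers above: nodes x processors per node x clock speed x flops per cycle. The short sketch below reproduces that arithmetic; the variable names are illustrative only. The product is 1228.8, which this guide quotes as 1228 GFlops.

```shell
# Figures taken from the hardware description above.
nodes=128            # compute nodes on the Myrinet network
cpus_per_node=2      # dual Xeon processors per node
clock_ghz=2.4        # clock speed in GHz
flops_per_cycle=2    # assumes SSE2 instructions are used

# awk is used because plain shell arithmetic is integer-only.
peak_gflops=$(awk -v n="$nodes" -v c="$cpus_per_node" \
                  -v g="$clock_ghz" -v f="$flops_per_cycle" \
                  'BEGIN { printf "%.1f", n * c * g * f }')
echo "Theoretical peak: $peak_gflops GFlops"
```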
Hydra has access to a pool of storage which is shared amongst several of eResearch SA's facilities. Some of this is provided from a dedicated storage node. Another node is used as the front end (or host) for the cluster, for compiling code, testing programs and the submission of jobs.
In addition, each of the compute nodes on Hydra has 30 GB of local disk space for storing data and other files whilst a job is running. These files will be removed once a job has terminated.
System software
Compilers and parallel computing libraries
Libraries
Application software
BioInformatics
Engineering
Mathematics & Computing
Time on the machine is available to researchers at any of the South Australian universities through eResearch SA. Researchers at these universities who wish to use any of eResearch SA's facilities should complete the membership form.
Anyone else who is interested in using eResearch SA facilities should consult the Conditions of Use to determine how best to gain access to the machine.
Before using the machine, you must read this User Guide. If you have any questions about the machine or its usage, email the system administrators at the eResearch SA Service Desk.
The host or front end for the cluster (the machine that you log in to, i.e. hydra.eresearchsa.edu.au) is to be used only for program development and compilation. DO NOT attempt to run any jobs on the front end.
You must use the Torque job queuing system to submit jobs to the compute nodes of Hydra. See Running jobs for information on how to do this.
NOTE: Users who violate the above policy may have their accounts disabled.
To use the cluster, you should log in to the front end, called hydra.sapac.edu.au, which has dual 2.4 GHz Intel Xeon processors running Linux (CentOS 5). The front end should be used for compiling, debugging and testing your program code, and for submitting production jobs to the cluster. You should never need to log on to a node of the cluster, only to the front end.
Please read all of this User's Guide before you try to run any jobs on the cluster, particularly the sections on compiling programs and running jobs.
You must use ssh (secure shell) to log in to Hydra, and sftp (secure FTP) to transfer data. If you have not used ssh before, see the primer below for a brief guide on how to use it.
Using ssh and sftp to access Hydra.
ssh stands for Secure SHell, and is a secure replacement for telnet, rlogin and rsh, i.e. it is for logging in to a remote machine. The standard ssh packages also provide sftp and scp, which are secure replacements for ftp and rcp, i.e. for transferring data to and from a remote machine.
If ssh is not available on your local machine, you can ask your systems administrator to install it, or install it yourself. You can download an ssh client for Unix from http://www.openssh.com/. You can download an ssh client for MS Windows from http://www.chiark.greenend.org.uk/~sgtatham/putty.
On Hydra we are using openssh version 4.3p2. Older versions of ssh may not be compatible with this one, so you may need to upgrade.
Using ssh and sftp is simple. To connect to Hydra from the eResearch SA domain:
For a Unix-based computer, use:
ssh hydra
sftp hydra
If you are outside the eresearchsa.edu.au domain, you will of course need to specify the complete hostname:
ssh hydra.sapac.edu.au
sftp hydra.sapac.edu.au
If your username on Hydra is different to your username on the machine you are logging in from, you will need to specify your username on Hydra:
ssh username@hydra.sapac.edu.au
sftp username@hydra.sapac.edu.au
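If you connect often, OpenSSH can store the full hostname and your Hydra username in a per-user configuration file so that the short form works from anywhere. The following is a minimal sketch rather than a required setup: it writes a sample entry to a scratch file (the file name is an arbitrary choice) for you to review and copy into ~/.ssh/config yourself, and "username" is a placeholder.

```shell
# Write a sample entry to a scratch file for review; nothing in ~/.ssh is touched.
# "username" is a placeholder for your own Hydra username.
sample_cfg="${TMPDIR:-/tmp}/hydra_ssh_config_sample"
cat > "$sample_cfg" <<'EOF'
Host hydra
    HostName hydra.sapac.edu.au
    User username
EOF
cat "$sample_cfg"
```

With such an entry in ~/.ssh/config, the commands ssh hydra and sftp hydra would pick up the full hostname and username automatically.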
The process for an MS Windows-based computer using the PuTTY ssh client is similar, except that the connection is made through a GUI. Please read the accompanying documentation or consult your systems administrator.
The first time you connect, ssh may tell you that you have not connected to this host before, and ask if it should go ahead and connect.
ssh username@hydra.sapac.edu.au
Host key not found from the list of known hosts.
Are you sure you want to continue connecting (yes/no)? yes
Host 'hydra' added to the list of known hosts.
username@hydra's password:
Last login: Tue Sep 26 10:16:48 2000 on tty1
NOTE: Make sure you type "yes", not just "y".
Using SCP to transfer data to and from Hydra.
Another utility ssh provides is scp, which works exactly the same way as rcp for remotely copying files. To copy a file from your local machine to your home directory on Hydra, use:
scp myfile.dat username@hydra:
Note the ':' at the end. You can also specify a directory where you want the file to go:
scp myfile.dat username@hydra:/home/username/mydir
or a new name for the file, as with the standard cp file copying command:
scp myfile.dat username@hydra:/home/username/mydir/mynewfile.dat
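Copying in the other direction, from Hydra back to your local machine, simply swaps the source and destination. The helper functions below are a sketch of that pattern; the function names and the HYDRA_LOGIN variable are made up for illustration, and "username" is a placeholder.

```shell
# Hypothetical helpers; adjust the username and host to your own account.
HYDRA_LOGIN="username@hydra.sapac.edu.au"

# Copy one file from Hydra into the current local directory.
hydra_fetch() {
    scp "$HYDRA_LOGIN:$1" .
}

# Copy a whole directory tree from Hydra (-r recurses into subdirectories).
hydra_fetch_dir() {
    scp -r "$HYDRA_LOGIN:$1" .
}
```

For example, hydra_fetch /home/username/mydir/mynewfile.dat would retrieve the file uploaded in the previous example.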
There is a GUI based scp client for MS Windows based computers that has a "drag and drop" facility and an inbuilt file editor. It can be obtained from http://winscp.net/eng/index.php
Using XWindows applications with Hydra.
If you wish to run an xterm or another XWindows application such as pgprof from a Unix based computer, you will need to enable X forwarding. This is done from a Unix based machine by passing the -X option (you may find that you will need to use the -Y option instead in some cases).
ssh -X username@hydra.sapac.edu.au
For MS Windows based computers you will also need a Windows XServer running. For advice, contact the eResearch SA Service Desk.
Login files (.cshrc) and environment variables
Every time you login to Hydra a default .cshrc system file is run. This file establishes some of your basic environment, setting your prompt and ensuring your $PATH variable gives access to basic system commands. In addition you have a file in your home directory, .cshrc.hydra, that is invoked by this global .cshrc login file and can be easily configured to allow access to the various application software packages you wish to run.
The global default .cshrc file can be found in your home directory.
NOTE: Do not attempt to change the default .cshrc file in your home directory. Any changes you make will be overwritten by the system the next time you login. Any changes you wish to make to your operating environment should be made in the .cshrc.hydra file in your home directory.
Below is an example of a typical .cshrc.hydra file:
module load intel
module load java
This file makes the following additions to your environment: it loads the intel module and the java module.
NOTE: You should avoid putting commands in your .cshrc.hydra file that write output to the screen, since this may affect some non-interactive jobs.
eResearch SA's system administrators already supply a basic .cshrc.hydra file when your account is created. You will only need to alter it if you need access to certain other installed software. If you are unsure as to how to effect changes, contact the eResearch SA Service Desk.
Modules
eResearch SA has embarked on using "modules" as the primary way to configure the user environment to provide access to software packages rather than the environment variable method used previously on Hydra and Aquila. This provides much easier access to the packages on the system. Researchers who have used APAC's HPC systems will have already had some exposure to this more dynamic mechanism for gaining access to software.
To see what modules are available to be loaded (which applications are available on the cluster), type
module avail
at the command prompt.
You can also see which modules you currently have loaded by typing
module list
Similarly, you can unload modules. For example, use module unload gaussian to unload the Gaussian module, which removes all references to the Gaussian executable and associated runtime libraries.
If you do not see a module listed for the application that you wish to run please contact the eResearch SA Service Desk.
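In a script it can be handy to test for a module before trying to load it. The function below is a rough sketch, not part of the modules package: the name have_module is invented here, and it assumes that module avail writes its listing to standard error, as the classic Environment Modules implementation does.

```shell
# Succeeds if the given name appears as a whole word in `module avail`.
# Assumption: `module avail` writes its listing to stderr, hence 2>&1.
have_module() {
    module avail 2>&1 | grep -qw -- "$1"
}

# Example use: only load gaussian if the cluster provides it.
# if have_module gaussian; then module load gaussian; fi
```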
Sequential programs
Sequential programs should run without change on a single processor of the cluster. You can therefore use the cluster without knowing how to write parallel programs, simply by submitting multiple sequential jobs.
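One common way to submit many sequential jobs is a small loop on the front end that calls qsub once per input file. The sketch below assumes a jobscript named myscript that reads its input file name from an environment variable; the variable name INPUT, the function name and the file names are all hypothetical. The -v option of qsub passes a variable into the job's environment.

```shell
# Submit one sequential job per input file given on the command line.
# INPUT is a hypothetical variable that myscript would have to read.
submit_all() {
    for input in "$@"; do
        qsub -v INPUT="$input" myscript
    done
}

# Example: submit_all run1.dat run2.dat run3.dat
```

Inside myscript, the program line would then refer to $INPUT rather than a fixed file name.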
Parallel programming
Alternatively, you can port or develop your programs using a standard parallel programming language. Programs written using High Performance Fortran (HPF), Message Passing Interface (MPI), or OpenMP (shared memory directives) can be compiled and run on the cluster. OpenMP programs can only be run on one node (2 processors) since they use shared memory. HPF programs can be run on up to 64 processors (this is a license restriction). MPI jobs can be run on any number of processors.
HPF
Programs written in Fortran 90 and using Fortran 90 array syntax can be ported to HPF fairly simply by adding compiler directives to specify the distribution of arrays over processors. For more information on HPF and the Portland Group HPF compiler that is provided on Hydra, see the PGHPF documentation. There is a good online HPF Programming Course from Edinburgh Parallel Computing Centre. The High Performance Fortran Handbook by C.H. Koelbel et al. is a useful reference. See Documentation for more information.
MPI
You can use MPI to parallelize programs written in Fortran, C or C++. This is more difficult to program than HPF or OpenMP, but typically gives better performance. For more information on MPI, you can look at this list of materials for learning MPI. There is a good online MPI Programming Course from Edinburgh Parallel Computing Centre. A standard reference book is Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk and Anthony Skjellum, MIT Press, 1994. See Documentation for more information.
Parallel scientific software libraries
For some programs, the majority of the time is taken up in standard routines such as matrix solve, FFT, or computing eigenvalues. In that case, it is possible to use libraries containing parallel versions of these routines, which should speed up your program without requiring you to write any parallel code.
ScaLAPACK is a parallel version of the well-known LAPACK linear algebra libraries that provides parallel versions of many commonly-used numerical routines. ScaLAPACK is available on Hydra through the Intel Math Kernel Libraries (MKL).
Standard software packages
Many standard software packages have parallel versions of the software available. The Software section of this User Guide lists some parallel programs that have been installed. Please contact the eResearch SA Service Desk if you would like other packages installed.
Help with parallel program development
eResearch SA periodically runs training courses on parallel programming.
For help with porting programs and optimising performance on the cluster, contact the eResearch SA Service Desk.
The following compilers are provided on Hydra. They should all be accessible, with the default path provided once a choice has been made for the $COMPILER environment variable in your .cshrc.hydra file. If you are unsure about how to do this, refer to the Getting started section of this User Guide.
NOTE: You may find that some programs will only compile, or will run faster, using certain compilers, so you may want to try them all.
Check the Documentation section of this User Guide for details on usage and options for each compiler.
MPI programs
MPI programs can be compiled using mpicc (for C programs), mpiCC (C++), mpif77 (Fortran 77) or mpif90 (Fortran 90). These wrappers use the underlying compiler suite whose module was loaded before the openmpi module, typically in your .cshrc.hydra file. Loading the openmpi module also sets the MPI include ($MPIINCDIR) and library ($MPILIBDIR) directory environment variables.
Use the which command to check you are getting the right version of the MPI compilers. For example, when using the Intel compiler Suite, you will have preloaded the intel and openmpi modules, either in your .cshrc.hydra file or on the command line with:
module load intel
module load openmpi
and then
which mpicc
should return
/opt/apps/system/openmpi/1.2.7/intel/bin/mpicc
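The same check can be scripted. The function below is only a sketch and its name is invented: it tests whether a wrapper path contains the expected compiler suite as a directory component, following the path layout shown above.

```shell
# Succeeds if the path in $1 contains the suite name in $2 as a directory.
mpi_wrapper_matches() {
    case "$1" in
        */"$2"/*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
# mpi_wrapper_matches "$(which mpicc)" intel || echo "unexpected mpicc on PATH"
```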
NOTE: You can also use the C and Fortran compilers directly and just link in the MPI libraries compiled with the Intel compiler, for example:
icc -I$MPIINCDIR -L$MPILIBDIR -o MyMpiProgram MyMpiProgram.c -lm -lmpi -lpthread
High performance fortran programs
HPF programs can be compiled using the Portland Group High Performance Fortran compiler pghpf, so you should make sure that you have added module load pgi to your .cshrc.hydra file.
When compiling to run a program you should:
For example:
pghpf -Mmpi -L$MPICHLIBDIR -o MyHPFProgram MyHPFProgram.f -lpthread
OpenMP programs
OpenMP directives for shared memory parallel programming are supported by the Portland Group and Intel compilers; however, OpenMP programs can only run on a single node of Hydra (i.e. 2 processors).
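In practice this means an OpenMP job requests one node with both processors and sets the thread count to 2. The fragment below is a sketch of the relevant jobscript lines under that assumption; OMP_NUM_THREADS is the standard OpenMP environment variable, while the executable name is a placeholder.

```shell
# Fragment of a Torque jobscript for an OpenMP program (a sketch).
# Request both processors of a single node:
#PBS -l nodes=1:ppn=2,walltime=HH:MM:SS

# OMP_NUM_THREADS is the standard OpenMP variable for the thread count.
OMP_NUM_THREADS=2
export OMP_NUM_THREADS

# Then run the executable (placeholder name):
# ./MyOpenMPProgram
```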
General tips and information
Jobs are run on Hydra by submitting a jobscript to the Torque queuing system.
Jobs are submitted to the Torque queuing system by issuing the command:
qsub myscript
where myscript contains the relevant Torque commands.
Below are some generic examples of scripts with brief descriptions of each of the various Torque components. These may be adapted to suit your needs. Please note that you need to change only the placeholder values (such as the job name, email address, walltime and program name) in order to get a functioning jobscript for Torque:
Sample Torque Jobscript for a Sequential Job
#!/bin/sh -l
#PBS -V
### Job name
#PBS -N MyJobName
### Join queuing system output and error files into a single output file
#PBS -j oe
### Send email to user when job ends or aborts
#PBS -m ae
### email address for user
#PBS -M Your-email-Address
### Queue name that job is submitted to
#PBS -q hydra
### Request nodes NB THIS IS REQUIRED
#PBS -l nodes=1,walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
module load application
# Run the executable
MyProgram+Arguments
NOTES:
Lines beginning with #PBS are interpreted as Torque commands directly to the queuing system.
Job output is written to the file MyJobName.oXXX in the directory from which the job is submitted.
Sample Torque Jobscript for an MPI Job
#!/bin/sh -l
#PBS -V
### Job name
#PBS -N MyMPIJobName
### Output files
#PBS -j oe
### Mail to user when job ends or aborts
#PBS -m ae
### Mail address for user
#PBS -M Your-email-Address
### Queue name
#PBS -q hydra
### Number of nodes
#PBS -l nodes=X:ppn=2,walltime=HH:MM:SS
# Calculate the number of processors to be used
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
echo Using nodes
cat $PBS_NODEFILE
module load Compiler
module load openmpi
module load application
# Run the executable
mpirun -np $NP MyProgram+Arguments
NOTES:
Lines beginning with #PBS are interpreted as PBS commands.
Job output is written to the file MyMPIJobName.oXXX in the directory from which the job is submitted.
mpirun instead of mpirun to run MPI programs under Torque.
The -np option indicates the number of MPI processes to run, which can be up to 2x the number of nodes requested (X).
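To see how the NP calculation in the jobscript works, the sketch below builds a stand-in for $PBS_NODEFILE and counts its lines. It assumes that Torque writes one line per allocated processor, so a request for nodes=2:ppn=2 would yield four lines; the node names are made up.

```shell
# Stand-in for $PBS_NODEFILE as it might look for nodes=2:ppn=2.
# Assumption: one line per allocated processor. Node names are invented.
nodefile=$(mktemp)
printf '%s\n' node001 node001 node002 node002 > "$nodefile"

# Same calculation as in the jobscript above: count the lines.
NP=$(wc -l "$nodefile" | awk '{print $1}')
echo "mpirun would start $NP processes"

rm -f "$nodefile"
```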
Checking a job's status in the queue
Once a job has been submitted to Torque via the qsub command a job.id of the form XXX.hydra will be displayed on the screen. This job.id is helpful for displaying the progress of your job via the qstat command. To check on a job's status in the queue on Hydra, type
qstat
Output similar to the following will be displayed:
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.hydra        flicprop3        wkamleh          2873:45: R large
886.hydra        flicprop0        wkamleh          3115:56: R large
915.hydra        flicprop2        wkamleh          2370:40: R large
920.hydra        dyn.mparappi     mparappi         00:25:22 R small
921.hydra        dyn.mparappi     mparappi                0 Q seq
You can readily identify your job's place and status in the queueing system using either the job.id or the name you provided in the jobscript.
The above sample output shows 3 jobs running on the queue large, 1 job running on the queue small and 1 queued job that has not yet started on the queue seq.
NOTE: The job names flicprop0, flicprop2 and flicprop3 are useful in that they are concise and uniquely identify jobs by name. The other 2 jobs in the queue are badly named: both are called dyn.mparappi, so they cannot be told apart by name.
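When the queue is long, a quick summary of job states can be easier to read than the full listing. The function below is a rough sketch with an invented name; it assumes the default qstat layout, with two header lines and the state (R for running, Q for queued) in the second-to-last column.

```shell
# Count running (R) and queued (Q) jobs in qstat output read from stdin.
# Assumes two header lines and the state in the second-to-last column.
count_job_states() {
    awk 'NR > 2 && NF >= 2 {
             if ($(NF-1) == "R") r++
             else if ($(NF-1) == "Q") q++
         }
         END { printf "running=%d queued=%d\n", r, q }'
}

# Example: qstat | count_job_states
```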
Deleting a queued job
To delete a queued or running job, type:
qdel job.id
where job.id is the identifier shown in the output of qstat.
NOTE: You will only be able to delete your own jobs.
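To remove all of your own jobs at once, the job ids can be collected with qselect and passed to qdel one at a time. This is a sketch under the assumption that the Torque qselect command is available on the front end; the function name is invented.

```shell
# Delete every job belonging to the current user.
# `qselect -u` prints the ids of that user's jobs, one per line.
delete_my_jobs() {
    for jobid in $(qselect -u "${USER:-$(id -un)}"); do
        qdel "$jobid"
    done
}
```

Use it with care, since it removes running jobs as well as queued ones.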
PLEASE READ THE MANUAL PAGES FOR THESE TORQUE COMMANDS !!
Temporary storage during computation
Each of eResearch SA's compute facilities has some temporary storage available, in the form of local hard disks, whilst jobs are running. Please see the Hardware section of this User Guide to determine how much temporary space is available.
Long term storage
There is 3 TB in total of shared storage available for all users of eResearch SA's facilities. This leaves a relatively modest amount of disk space available for each user. Consequently, users' home directories, /home/users, should be used ONLY for storing small data files, executables, job submission scripts, etc. Larger data files should be stored in the shared data area, /data/users.
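A quick way to see whether your home directory is growing too large is to measure it with du. The helper below is a minimal sketch with an invented name; it prints the size of a directory tree in kilobytes.

```shell
# Print the size of a directory tree in kilobytes (du -sk is POSIX).
dir_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Example: check home directory usage before deciding what to move
# to the shared data area.
# echo "Home directory uses $(dir_size_kb "$HOME") kB"
```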
Note:
In addition, some research groups, having need for greater capacity, have funded their own dedicated storage, which is managed by eResearch SA on their behalf.
Individual researchers, or research groups can arrange similar access to further storage capacity as their needs increase.
General information
Cluster hardware
Compilers
Message passing interface (MPI)
Fortran and High Performance Fortran (HPF)
For more information on eResearch SA's facilities, systems support, assistance with parallel programming and performance optimisation and to report any problems, contact the eResearch SA Service Desk.
When reporting problems, please give as much information as you can to help us in diagnosis, for example: