Aquila is an SGI Altix 3000 with 160 Intel Itanium 2 1.3 GHz processors (64-bit), each with 3 MB of L3 cache. It has 160 GB of RAM, which can be shared between the processors in various ways: for example, as 160 processors each accessing 1 GB of RAM, or as 1 processor accessing 160 GB of RAM.
Aquila uses SGI's proprietary NUMAlink interconnect fabric for system communication. This is what allows the SGI Altix to present itself as a single-image machine with 160 processors and 160 GB of available RAM, which is quite different from our clustered systems.
Aquila has access to a pool of storage which is shared amongst several eResearch SA facilities. See Data storage and backups for more information.
In addition, Aquila has approximately 1.6 TB of local disk space, /scratch, for storing data and other files whilst a job is running. These files will be removed once the job has terminated. Remember that the local disk space is a shared resource used by all running jobs.
Note: If your job will require more than 100–150 GB of local disk space, contact the eResearch SA Service Desk before you try to run the job.
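One way to work within the shared /scratch area is to give each job its own subdirectory, copy the results back before the job finishes, and remove the subdirectory afterwards. The following is a minimal sketch of that idea, written in Bourne shell (sh) rather than the csh used in this guide's jobscripts. The function name run_in_scratch, the SCRATCH_ROOT variable and the directory naming are illustrative assumptions, not part of the Aquila configuration. PBS_JOBID is the variable Torque sets inside a running job.

```shell
#!/bin/sh
# Sketch only: SCRATCH_ROOT, run_in_scratch and the directory naming are
# assumptions made for illustration.
SCRATCH_ROOT=${SCRATCH_ROOT:-/scratch}

run_in_scratch() {
    # $1 = directory to copy results back to; remaining arguments = command.
    results_dir=$1
    shift
    # PBS_JOBID is set by Torque inside a job; fall back to the shell's PID.
    work_dir="$SCRATCH_ROOT/$(id -un).${PBS_JOBID:-manual.$$}"
    mkdir -p "$work_dir" || return 1
    # Run the command inside the private scratch directory.
    ( cd "$work_dir" && "$@" )
    status=$?
    # Copy everything back before the scratch files are removed.
    cp -R "$work_dir"/. "$results_dir"/
    rm -rf "$work_dir"
    return $status
}
```

Inside a jobscript this might be called as run_in_scratch "$PBS_O_WORKDIR" "$PBS_O_WORKDIR/MyProgram", where MyProgram is a placeholder. The program is given by absolute path because the command runs from inside the scratch directory.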
Compilers and parallel computing libraries
Libraries
Installed application software
Time on the machine is available to researchers at any of the South Australian universities through eResearch SA. Researchers at these universities who wish to use any of eResearch SA's facilities should complete the membership form.
Anyone else who is interested in using eResearch SA's facilities should consult the Conditions of Use to determine how best to gain access to the machine.
Before using the machine, you must read this User Guide. If you have any questions about the machine or its usage, contact the eResearch SA system administrators at the eResearch SA Service Desk.
Aquila is a large, single image, shared memory machine. Every user is logged on to the same machine and all users have their jobs running on that machine. It is important to be aware of this when using Aquila. What you do on the machine can easily affect other users.
You must use the Torque job queuing system to submit jobs to run on Aquila. See Running jobs for information on how to do this. Only compiling programs and running very short test jobs is permitted from the command line.
Note: Users who violate the above policy may have their accounts disabled.
Please read this User Guide before you try to run any jobs on the machine, particularly the sections on compiling programs and running jobs.
You must use ssh (secure shell) to log in to Aquila, and sftp (secure FTP) to transfer data. If you have not used ssh before, see the primer below for a brief guide on how to use it.
Using ssh and sftp to access Aquila.
ssh stands for Secure SHell, and is a secure replacement for telnet, rlogin and rsh, i.e. it is for logging in to a remote machine. The standard ssh packages also provide sftp and scp, which are secure replacements for ftp and rcp, i.e. for transferring data to and from a remote machine.
If ssh is not available on your local machine, you can ask your systems administrator to install it, or install it yourself. You can download an ssh client for Unix from http://www.openssh.com/. You can download an ssh client for MS Windows from http://www.chiark.greenend.org.uk/~sgtatham/putty.
On Aquila we are using OpenSSH version 4.1p1. Older versions of ssh may not be compatible with this one, so you may need to upgrade.
Using ssh and sftp is simple. To connect to Aquila from the eResearch SA domain:
For a Unix-based computer, use:
ssh aquila
sftp aquila
If you are outside the eresearchsa.edu.au domain, you will of course need to specify the complete hostname:
ssh aquila.sapac.edu.au
sftp aquila.sapac.edu.au
If your username on Aquila is different to your username on the machine you are logging in from, you will need to specify your username on Aquila:
ssh username@aquila.sapac.edu.au
sftp username@aquila.sapac.edu.au
The process for an MS Windows based computer using the PuTTY ssh client is similar, except that the connection is made through a GUI. Please read the accompanying documentation or consult your systems administrator.
The first time you connect, ssh may tell you that you have not connected to this host before, and ask if it should go ahead and connect.
ssh username@aquila.sapac.edu.au
Host key not found from the list of known hosts.
Are you sure you want to continue connecting (yes/no)? yes
Host 'aquila' added to the list of known hosts.
username@aquila's password:
Last login: Tue Sep 26 12:23:51 2005 on tty1
Note: make sure you type "yes", not just "y".
Using SCP to transfer data to and from Aquila.
Another utility ssh provides is scp, which works exactly the same way as rcp for remotely copying files. To copy a file from your local machine to your home directory on Aquila, use:
scp myfile.dat username@aquila:
Note the ':' at the end. You can also specify a directory where you want the file to go:
scp myfile.dat username@aquila:/home/username/mydir
or a new name for the file, as with the standard cp file copying command:
scp myfile.dat username@aquila:/home/username/mydir/mynewfile.dat
There is a GUI based scp client for MS Windows based computers that has a "drag and drop" facility and an inbuilt file editor. It can be obtained from http://winscp.net/eng/index.php
For advice, contact the eResearch SA Service Desk.
Login files (.cshrc) and environment variables
Every time you log in to Aquila a default .cshrc system file is run. This file establishes some of your basic environment, setting your prompt and ensuring your $PATH variable gives access to basic system commands. In addition you have a file in your home directory, .cshrc.aquila, that is invoked by this global .cshrc login file and can be easily configured to allow access to the various application software packages you wish to run.
The global default .cshrc file can be found in your home directory.
You should take the time to read the comments in this file as they provide details of changes you may wish to make to your environment.
NOTE:
eResearch SA's system administrators already supply a basic .cshrc.aquila file when your account is created. You will only need to alter it if you need access to certain other installed software. If you are unsure how to make changes, contact the Service Desk.
Modules
eResearch SA now uses "modules" as the primary way to configure the user environment and provide access to software packages, rather than the environment variable method used previously on Hydra and Aquila. This provides much easier access to the packages on the system. Researchers who have used APAC's HPC systems will have already had some exposure to this more dynamic mechanism for gaining access to software.
To see what modules are available to be loaded (which applications are available on the cluster), type
module avail
at the command prompt.
You can also see which modules you currently have loaded by typing
module list
To load a module, use module load followed by the module name, for example module load gaussian. Similarly, you can unload modules using, for example, module unload gaussian to unload the Gaussian module, removing all references to the Gaussian executable and associated runtime libraries.
If you do not see a module listed for the application that you wish to run please contact the eResearch SA Service Desk.
Sequential programs
Sequential programs should run without change on a single processor of the machine. You can therefore use the machine without knowing how to write parallel programs, simply by submitting sequential jobs to the queuing system. Of course, your programs will first have to be compiled to run on the 64-bit architecture.
Parallel programming
Alternatively, you can port or develop your programs using a standard parallel programming model. Programs written using the Message Passing Interface (MPI) or OpenMP (shared memory directives) can be compiled and run on Aquila. OpenMP and MPI programs can be run on any number of processors, up to the number of physical processors available.
NOTE: If you wish to run a program on a large number of processors (more than 16), contact the Service Desk before you submit the job, or you may have difficulty getting the job to run.
OpenMP
The OpenMP API supports shared-memory parallel programming in C, C++ and Fortran. You can use OpenMP directives placed into your source code to allow some automatic parallelisation of loops within programs. For more information on OpenMP, a useful and comprehensive tutorial is available. Other resources can be found in the Documentation section of this User Guide.
MPI
You can use MPI to parallelise programs written in Fortran, C or C++. This is more difficult to program than OpenMP, but typically gives better performance. For more information on MPI, you can look at this list of materials for learning MPI. There is a good MPI Programming Course from Edinburgh Parallel Computing Centre. A standard reference book is Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk and Anthony Skjellum, MIT Press, 1994. More information is available in the Documentation section of this User Guide.
Parallel scientific software libraries
For some programs, the majority of the time is taken up in standard routines such as matrix solving, FFT, or computing eigenvalues. In that case, it is possible to use libraries containing parallel versions of these routines, which should speed up your program without requiring you to write any parallel code.
SGI provides its Scientific Computing Software Library, SCSL, for use on the Altix. This contains BLAS, LAPACK, Sparse Matrix solvers and FFT routines, highly optimised for the Altix architecture.
In addition, the Intel Math Kernel Library, MKL, is available on the Altix for use in conjunction with the Intel Compiler Suite with its own BLAS, LAPACK, ScaLAPACK, Sparse Matrix Solvers and FFT routines, optimised for the Intel Itanium 2 processors.
Standard software packages
Many standard software packages have parallel versions available. The Software section of this User Guide lists some parallel programs that have been installed. Please contact the Service Desk if you would like other packages installed.
Help with parallel program development
eResearch SA periodically runs training courses on parallel programming. For help with porting programs and optimising performance on the machine, contact the Service Desk.
The following compilers are provided on Aquila. They should all be accessible, with the default path provided, once a choice has been made for the $COMPILER environment variable in your .cshrc.aquila file. If you are unsure about how to do this, refer to the Getting started section of this User Guide.
NOTE: You may find that some programs will only compile, or will run faster, using certain compilers, so you may want to try them all.
Check the Documentation section of this User Guide for details on usage and options for each compiler.
OpenMP Programs
OpenMP directives for shared memory parallel programming are supported by the Intel compilers. You will need to pass the -openmp flag to the compiler, for example:
icc -openmp -o MyOMPProgram MyOMPProgram.c
NOTE: You will then also have to set the environment variable $OMP_NUM_THREADS to be equal to the number of processors you wish to run on. See Running jobs in this User Guide.
MPI Programs
MPI programs can be compiled using mpicc (for C programs), mpiCC (C++), mpif77 (Fortran 77) or mpif90 (Fortran 90). These use the underlying compiler suite selected by the $COMPILER environment variable in your .cshrc.aquila file, which can be either gnu or intel. You will also need to link the MPI library into your program, for example:
icc -o MyMPIProgram MyMPIProgram.c -lmpi
General Tips and Information
Jobs are run on Aquila by submitting a jobscript to the Torque queuing system with the command:
qsub myscript
where myscript contains the relevant Torque commands.
Below are some generic examples of scripts with brief descriptions of each of the various Torque components. These may be adapted to suit your needs. To get a functioning jobscript for Torque you only need to change the placeholder values, such as MyJobName, Your-email-Address, XX, YY, HH:MM:SS and MyProgram+Arguments:
Sample Torque Jobscript for a Sequential Job
#!/bin/csh
#PBS -V
### Job name
#PBS -N MyJobName
### Join queuing system output and error files into a single output file
#PBS -j oe
### Send email to user when job ends or aborts
#PBS -m ae
### email address for user
#PBS -M Your-email-Address
### Queue name that job is submitted to
#PBS -q aquila
### Request nodes NB THIS IS REQUIRED
#PBS -l ncpus=1,nodes=ppn=1,mem=1GB,walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
module load application
# Run the executable
MyProgram+Arguments
NOTES:
Lines beginning with #PBS are interpreted as Torque commands directly to the queuing system.
Output from the job is written to the file MyJobName.oXXX, where XXX is the job number, in the directory from which the job is submitted.
Sample Torque Jobscript for an OpenMP Job
#!/bin/csh
#PBS -V
### Job name
#PBS -N MyOpenMPJobName
### Output files
#PBS -j oe
### Mail to user when job ends or aborts
#PBS -m ae
### Mail address for user
#PBS -M Your-email-Address
### Queue name
#PBS -q aquila
### Number of processors, amount of memory and time required
#PBS -l ncpus=XX,nodes=ppn=XX,mem=YYGB,walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
setenv OMP_NUM_THREADS XX
module load application
# Run the executable
MyProgram+Arguments
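NOTES:
Lines beginning with #PBS are interpreted as Torque commands.
Output from the job is written to the file MyOpenMPJobName.oXXX, where XXX is the job number, in the directory from which the job is submitted.
Sample Torque Jobscript for an MPI Job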
#!/bin/csh
#PBS -V
### Job name
#PBS -N MyMPIJobName
### Output files
#PBS -j oe
### Mail to user when job ends or aborts
#PBS -m ae
### Mail address for user
#PBS -M Your-email-Address
### Queue name
#PBS -q aquila
### Number of nodes, amount of memory and time required
#PBS -l ncpus=XX,nodes=ppn=XX,mem=YYGB,walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
module load application
# Run the executable
mpirun -np XX MyProgram+Arguments
NOTES:
Lines beginning with #PBS are interpreted as Torque commands.
Output from the job is written to the file MyMPIJobName.oXXX, where XXX is the job number, in the directory from which the job is submitted.
Checking a Job's Status in the Queue
Once a job has been submitted to Torque via the qsub command a job.id of the form XXX.aquila will be displayed on the screen. This job.id is helpful for displaying the progress of your job via the qstat command. To check on a job's status in the queue on Aquila, type
qstat
Output similar to the following will be displayed:
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.aquila       flicprop3        wkamleh          2873:45: R aquila
886.aquila       flicprop0        wkamleh          3115:56: R aquila
915.aquila       flicprop2        wkamleh          2370:40: R aquila
920.aquila       dyn.mparappi     mparappi         00:25:22 R aquila
921.aquila       dyn.mparappi     mparappi                0 Q aquila
You can readily identify your job's place and status in the queueing system using either the job.id or the name you provided in the jobscript.
The above sample output shows four jobs running on the queue aquila and one queued job that has not yet started on the queue aquila.
NOTE: The job names flicprop0, flicprop2 and flicprop3 are useful in that they are concise and uniquely identify jobs by name. However, the other two jobs in the queue are badly named: they share the same name, so they can only be told apart by their job.id.
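When the queue is long, it can help to summarise the qstat listing rather than read it line by line. The following is a minimal sketch, written in Bourne shell (sh) with awk rather than csh. The function name summarise_qstat is hypothetical, and the sketch assumes the default qstat layout shown above, with two header lines and the job state in the fifth column.

```shell
#!/bin/sh
# Sketch only: summarise_qstat is a hypothetical helper name.
# Reads qstat-style output on standard input and counts jobs by state,
# assuming two header lines followed by one job per line, with the state
# (R = running, Q = queued) in the fifth whitespace-separated column.
summarise_qstat() {
    awk 'NR > 2 { count[$5]++ }
         END { printf "running=%d queued=%d\n", count["R"], count["Q"] }'
}

# Demonstration using the sample output from this guide, not a live queue.
summarise_qstat <<'EOF'
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.aquila flicprop3 wkamleh 2873:45: R aquila
886.aquila flicprop0 wkamleh 3115:56: R aquila
915.aquila flicprop2 wkamleh 2370:40: R aquila
920.aquila dyn.mparappi mparappi 00:25:22 R aquila
921.aquila dyn.mparappi mparappi 0 Q aquila
EOF
```

For the sample data this prints running=4 queued=1. On Aquila the same function could be fed live data with qstat | summarise_qstat.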
Deleting a Queued Job
To delete a queued or running job, type
qdel job.id
where job.id is the identifier shown in the output of qstat.
NOTE: You will only be able to delete your own jobs.
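To find the job.id values to pass to qdel, the qstat listing can be filtered by user name. The following is a minimal sketch in Bourne shell (sh) with awk. The function name jobs_for_user is hypothetical, and the sketch assumes the default qstat layout, with two header lines and the user name in the third column. It only prints the identifiers and leaves the decision to run qdel to you.

```shell
#!/bin/sh
# Sketch only: jobs_for_user is a hypothetical helper name.
# Prints the job.id of every job owned by the given user, assuming two
# header lines and the user name in the third whitespace-separated column.
jobs_for_user() {
    awk -v user="$1" 'NR > 2 && $3 == user { print $1 }'
}

# Demonstration with part of the sample output from this guide.
jobs_for_user mparappi <<'EOF'
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
915.aquila flicprop2 wkamleh 2370:40: R aquila
920.aquila dyn.mparappi mparappi 00:25:22 R aquila
921.aquila dyn.mparappi mparappi 0 Q aquila
EOF
```

For the sample data this prints 920.aquila and 921.aquila, one per line. With a live queue the input would come from qstat | jobs_for_user username, and each identifier could then be checked and passed to qdel by hand.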
Please read the manual pages for these Torque commands (qsub, qstat and qdel).
Temporary Storage During Computation
Each of eResearch SA's compute facilities has some temporary storage available, in the form of local hard disks, whilst jobs are running. Please see the Hardware section of this User Guide to determine how much temporary space is available.
Long Term Storage
eResearch SA currently maintains and manages 22 TB of storage.
There is 3 TB in total of shared storage available for all users of eResearch SA's facilities. This leaves a relatively modest amount of disk space available for each user. Consequently, users' home directories, /home/users, should be used ONLY for storing small data files, executables, job submission scripts, etc. Larger data files should be stored in the shared data area, /data/users.
Note:
Files in the data directories WILL NOT be backed up. You must make your own arrangements for backing up this data. You can use sftp to transfer data to your own computer and back it up yourself. We can assist by providing facilities for backup onto removable disks.
You may be asked to clean up your data area periodically, especially if you start to store large amounts of data there. This is to ensure everybody gets a fair allocation of a limited resource.
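Since the data directories are not backed up, one approach is to bundle a results directory into a single archive and record a checksum before transferring it with sftp, so the copy can be verified on your own computer. The following is a minimal sketch in Bourne shell (sh). The function name make_backup_archive and the file naming are illustrative assumptions; tar and cksum are standard Unix utilities.

```shell
#!/bin/sh
# Sketch only: make_backup_archive is a hypothetical helper name.
# Packs a directory into a compressed tar archive and records a checksum,
# so the copy can be verified after it has been transferred with sftp.
make_backup_archive() {
    # $1 = directory to archive, $2 = archive file to create
    src_dir=$1
    archive=$2
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # cksum prints a CRC and a byte count; compare both on the receiving machine.
    cksum "$archive" > "$archive.cksum"
}
```

A call might look like make_backup_archive /data/users/username/results results.tar.gz, where the directory and archive names are placeholders. After the transfer, running cksum on the received archive should give the same CRC and byte count as those stored in the .cksum file.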
In addition, some research groups, having need for greater capacity, have funded their own dedicated storage, which is managed by eResearch SA on their behalf.
Individual researchers or research groups can arrange similar access to further storage capacity as their needs increase.
General Information
SGI Hardware
Compilers
SGI Resources for the Altix
OpenMP Resources
MPI Resources
For more information on eResearch SA's facilities, systems support, assistance with parallel programming and performance optimisation and to report any problems, contact the Service Desk.
When reporting problems, please give as much information as you can to help us in diagnosis, for example: