PBS Submission Flags: common and useful directives
-
Below is a list of the most common and useful directives.
| Option | System type | Description |
|---|---|---|
| -k | All | Send "stdout" and/or "stderr" to your home directory when the job runs, e.g. `#PBS -k o` |
| -l | All | Precedes a resource request, e.g. processors, wallclock time |
| -M | All | Send email messages to an alternative email address, e.g. `#PBS -M me@myplace.thingy.com` |
| -m | All | Send an email when a job begins execution and/or ends or aborts, e.g. `#PBS -m b` |
| mem | Shared Memory | Specifies the amount of memory you need for a job. This is a requirement on ORAC, where you may need a lot of memory but not many processors. Memory and processors are given out together on ORAC, so if you need a high memory to processor ratio, remember to specify this directive. Example: `#PBS -l mem=80gb` |
| mpiprocs | Clusters | Number of processes per node on a cluster. In most cases this should equal the number of processors on a node. Example: `#PBS -l mpiprocs=4` |
| -N | All | Give your job a unique name, e.g. `#PBS -N galaxies1234` |
| ncpus | Shared Memory | The number of processors to use for a shared memory job. On ORAC this should always be a multiple of 4, as processors are allocated by blades, each with 4 processors. Example: `#PBS -l ncpus=4` |
| -r | All | Control whether or not jobs should automatically re-run from the start if the system crashes or is rebooted. Users with checkpoints might not wish this to happen. Use `#PBS -r n` or `#PBS -r y` |
| select | Clusters | Number of compute nodes to use. Usually combined with the mpiprocs directive. Example: `#PBS -l select=2` |
| -V | All | Make sure that the environment in which the job runs is the same as the environment in which it was submitted: `#PBS -V` |
| walltime | All | The maximum time a job can run before being stopped. If omitted, a default of a few minutes applies. Use this flag to stop jobs that go bad from running for hundreds of hours. Format is HH:MM:SS. Example: `#PBS -l walltime=12:00:00` |
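To show how these directives sit together, here is a minimal sketch of a cluster submission script. The directive lines are copied from the table above; the program name `./my_program` is a hypothetical placeholder, and the exact resource syntax your site accepts may differ, so treat this as a starting point rather than a tested recipe. `PBS_O_WORKDIR` and `PBS_JOBID` are environment variables that PBS sets inside a running job.

```shell
#!/bin/sh
# Directives are read by qsub; to the shell they are ordinary comments.
#PBS -N galaxies1234
#PBS -l select=2
#PBS -l mpiprocs=4
#PBS -l walltime=12:00:00
#PBS -m b
#PBS -M me@myplace.thingy.com
#PBS -r n
#PBS -V

# PBS sets PBS_O_WORKDIR to the directory qsub was run from.
# Fall back to the current directory so the script also runs by hand.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_JOBID is only set inside a PBS job; use a label otherwise.
job_label="${PBS_JOBID:-not-under-PBS}"
echo "Job $job_label starting on $(uname -n)"

# Launch the real work here (hypothetical placeholder, left commented out):
# ./my_program
```

Because the `#PBS` lines are comments as far as the shell is concerned, the same file can be run directly for a quick check before it is submitted with qsub.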
How can I make a PBS job wait until another has completed before running?
The trick is to make the subsequent job(s) depend on the first. In the following simple example, job 80 is not going to start until job 79 finishes (afterany means 79 just has to end, successfully or not). The second job has a line in its submission script that says:
#PBS -W depend=afterany:79
To show it works, I use qstat -a:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
79.corvus       testuser corvus   Depend01    15771   1   1 1800mb    -- R 00:02
80.corvus       testuser corvus   Depend02       --   1   1 1800mb    -- H    --
The state "H" means held, i.e. waiting for something. Obviously you'll need to figure out the PBS job IDs as you go along. If you use a script to submit the jobs, capture the last job number and use it to create the correct depend= statement.
The man qsub page is missing some of the important facts (at least on my terminal), so you might want to download the PDF user guide from sunfire01 (I emailed you about it after the rebuild of PBS). The missing details for depend are as follows:
after:arg_list
This job may be scheduled for execution at any point after all jobs
in arg_list have started execution.
afterok:arg_list
This job may be scheduled for execution only after all jobs in
arg_list have terminated with no errors. See "Warning about exit
status with csh" in EXIT STATUS.
afternotok:arg_list
This job may be scheduled for execution only after all jobs in
arg_list have terminated with errors. See "Warning about exit status
with csh" in EXIT STATUS.
afterany:arg_list
This job may be scheduled for execution after all jobs in
arg_list have terminated, with or without errors.
before:arg_list
Jobs in arg_list may begin execution once this job has begun
execution.
beforeok:arg_list
Jobs in arg_list may begin execution once this job terminates
without errors. See "Warning about exit status with csh" in
EXIT STATUS.
beforenotok:arg_list
If this job terminates execution with errors, the jobs in arg_list
may begin. See "Warning about exit status with csh" in EXIT
STATUS.
beforeany:arg_list
Jobs in arg_list may begin execution once this job terminates
execution, with or without errors.
on:count
This job may be scheduled for execution after count dependencies
on other jobs have been satisfied. This type is used in
conjunction with one of the before types listed. count is an
integer greater than 0.
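The advice above about capturing the job number in a submission script could look something like the following sketch. It relies on qsub printing the new job's identifier on standard output. The helper `depend_arg` and the script names `first_job.sh` and `second_job.sh` are hypothetical, made up for illustration; the submission step is skipped when qsub is not on the PATH.

```shell
#!/bin/sh
# Build the value for "qsub -W" from a dependency type (afterany,
# afterok, ...) and one or more job IDs, joined with colons.
# depend_arg is a hypothetical helper, not part of PBS.
depend_arg() {
    dep_type=$1
    shift
    # printf repeats its format for every remaining argument: ":79:80"
    dep_list=$(printf ':%s' "$@")
    printf 'depend=%s%s\n' "$dep_type" "$dep_list"
}

# Only attempt a real submission where qsub exists.
if command -v qsub >/dev/null 2>&1; then
    # qsub prints the new job's ID (e.g. 79.corvus) on stdout.
    first_id=$(qsub first_job.sh)
    # The second job stays held until the first one has ended.
    qsub -W "$(depend_arg afterany "$first_id")" second_job.sh
fi
```

Calling `depend_arg afterok 79 80` yields `depend=afterok:79:80`, so the same helper covers a job that has to wait for several earlier jobs.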


