Workload Management
4.4 Job Submission Process
Whenever a job is submitted, the workload management system checks the resources requested by the job script. It assigns cores, accelerators, local disk space, and memory to the job, and sends the job to the nodes for computation. If the required cores or memory are not yet available, it queues the job until these resources become available. If the job requests more resources than can ever become available, the job remains queued indefinitely.
The workload management system keeps track of the status of the job and returns its resources to the available pool when the job has finished (that is, when it has been deleted, has crashed, or has completed successfully).
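The three outcomes described above (run now, wait in the queue, or wait forever) and the return of resources to the pool can be pictured with a toy model. This is a minimal bash sketch, not the workload manager's actual code: it tracks cores only, and the names `TOTAL_CORES`, `FREE_CORES`, `STATUS`, `start_job`, and `finish_job` are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical names) of how a workload manager treats a
# resource request. Only cores are tracked; memory, accelerators and
# local disk would be checked the same way.
TOTAL_CORES=16   # cores the cluster can ever offer (assumed value)
FREE_CORES=16    # cores not currently assigned to a running job

# start_job N: try to start a job that requests N cores; sets STATUS.
start_job() {
  local requested=$1
  if (( requested > TOTAL_CORES )); then
    STATUS="queued-indefinitely"   # the request can never be satisfied
  elif (( requested > FREE_CORES )); then
    STATUS="queued"                # wait until running jobs release cores
  else
    FREE_CORES=$(( FREE_CORES - requested ))
    STATUS="running"               # cores are assigned to the job
  fi
}

# finish_job N: a job holding N cores was deleted, crashed, or completed,
# so its cores go back to the available pool.
finish_job() {
  FREE_CORES=$(( FREE_CORES + $1 ))
}

start_job 12; echo "$STATUS, free=$FREE_CORES"   # running, free=4
start_job 8;  echo "$STATUS, free=$FREE_CORES"   # queued, free=4
finish_job 12
start_job 8;  echo "$STATUS, free=$FREE_CORES"   # running, free=8
start_job 32; echo "$STATUS, free=$FREE_CORES"   # queued-indefinitely, free=8
```

A real scheduler also keeps the queued jobs in a list and re-checks them whenever resources are returned; the toy model leaves that loop out to keep the three cases visible.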
4.5 What Do Job Scripts Look Like?
A job script looks very much like an ordinary shell script, with the commands and variables that the job needs placed in it. The exact composition of a job script depends on the workload manager used, but it normally includes:
• commands to load relevant modules or set environment variables
• directives for the workload manager to request resources, control the output, and set the email addresses that messages are sent to
• an execution (job submission) line
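In Slurm, for example, the directives are `#SBATCH` comment lines. The shell ignores them, while `sbatch` reads them only up to the first line that is neither blank nor a comment, so in practice they come before the module and execution lines. The bash sketch below imitates that rule to show which lines of a job script count as directives. `list_sbatch_directives` is a hypothetical helper, not part of Slurm, and the module name `openmpi`, the job name, and the email address in the sample script are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the #SBATCH options of a job script, stopping
# at the first non-blank, non-comment line as sbatch does. A simplified
# imitation for illustration, not Slurm's real parser.
list_sbatch_directives() {
  local line prefix='#SBATCH '
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == "$prefix"* ]]; then
      printf '%s\n' "${line#"$prefix"}"   # directive: print the option only
    elif [[ -z ${line//[[:space:]]/} || $line == '#'* ]]; then
      continue                            # blank line or ordinary comment
    else
      break                               # first command ends the directives
    fi
  done < "$1"
}

# A minimal job script with the three parts listed above (placeholder values).
job=$(mktemp)
cat > "$job" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=4
#SBATCH --output=hello-%j.out
#SBATCH --mail-type=END
#SBATCH --mail-user=user@example.com
module load openmpi
srun ./a.out
EOF

list_sbatch_directives "$job"
```

Running it prints the five options. An `#SBATCH` line placed after `module load` would not be picked up, which is why the directives normally sit at the top of the script.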
When a job script runs, the workload manager is normally responsible for generating a machine file based on the requested number of processor cores (np), as well as for allocating any other requested resources.
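What such a machine file contains can be illustrated with SGE, which describes a parallel job's allocation in the file named by `$PE_HOSTFILE`, one line per host in the form `host slots queue processor-range`. The bash sketch below expands that into a machine file with one line per slot, roughly the kind of `$TMP/machines` file used in the SGE execution line later in this section. It is a simplified illustration under that assumption: `make_machinefile` is a hypothetical name, and the host and queue names in the demo are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand an SGE-style PE_HOSTFILE, whose lines read
#   <host> <slots> <queue> <processor-range>
# into an MPI machine file that repeats each host once per allocated slot.
make_machinefile() {
  local pe_hostfile=$1 machinefile=$2 host slots _rest i
  while read -r host slots _rest; do
    [[ -n $host ]] || continue           # skip blank lines
    for (( i = 0; i < slots; i++ )); do
      printf '%s\n' "$host"
    done
  done < "$pe_hostfile" > "$machinefile"
}

# Demo with a made-up two-host allocation of 4 slots in total.
demo=$(mktemp -d)
printf 'node001 2 all.q@node001 UNDEFINED\nnode002 2 all.q@node002 UNDEFINED\n' \
  > "$demo/pe_hostfile"
make_machinefile "$demo/pe_hostfile" "$demo/machines"
cat "$demo/machines"   # node001, node001, node002, node002 (one per line)

# Inside an SGE job script with an MPI parallel environment, the result
# would feed the execution line, e.g.:
#   mpirun -np "$NSLOTS" -machinefile "$TMP/machines" ./a.out
```

Repeating a host once per slot is what lets `-np` match the number of lines, so the MPI launcher places as many processes on each node as the workload manager allocated there.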
The executable submission line in a job script is the line where the job is submitted to the workload
manager. This can take various forms.
Example
For the Slurm workload manager, the line might look like:
srun --mpi=mpich1_p4 ./a.out
Example
For Torque or PBS Pro it may simply be:
mpirun ./a.out
Example
For SGE it may look like:
mpirun -np 4 -machinefile $TMP/machines ./a.out
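Because the execution line differs between workload managers, a script meant to run under more than one of them could choose the line from environment variables that each manager sets for its jobs: `SLURM_JOB_ID` under Slurm, `PBS_JOBID` under Torque and PBS Pro, and `PE_HOSTFILE` and `NSLOTS` for SGE parallel jobs. The bash sketch below only prints the chosen line, so it can be tried outside a cluster. `launch_line` is a hypothetical name, and the Slurm branch leaves out the `--mpi=` option, whose correct value depends on the MPI library in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the execution line matching the workload
# manager that is running this job, using variables each manager sets.
launch_line() {
  if [[ -n ${SLURM_JOB_ID:-} ]]; then
    echo "srun ./a.out"                      # Slurm
  elif [[ -n ${PBS_JOBID:-} ]]; then
    echo "mpirun ./a.out"                    # Torque or PBS Pro
  elif [[ -n ${PE_HOSTFILE:-} ]]; then
    # SGE parallel environment: slot count and machine file made explicit
    echo "mpirun -np ${NSLOTS:-1} -machinefile ${TMP:-/tmp}/machines ./a.out"
  else
    echo "no supported workload manager detected" >&2
    return 1
  fi
}

# Simulate a Slurm job by setting its job ID variable for this one call.
SLURM_JOB_ID=123 launch_line   # srun ./a.out
```

A real job script would run the launcher directly in each branch rather than print it.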
4.6 Running Jobs On A Workload Manager
The details of running jobs through each of the following workload managers are discussed later:
• Slurm (Chapter 5)
• SGE (Chapter 6)
• Torque (with Maui or Moab) and PBS Pro (Chapter 7)
© Bright Computing, Inc.