Workload Management

4.4 Job Submission Process

Whenever a job is submitted, the workload management system checks the resources requested by the job script. It assigns cores, accelerators, local disk space, and memory to the job, and sends the job to the nodes for computation. If the required number of cores or amount of memory is not yet available, it queues the job until these resources become available. If the job requests resources that will always exceed those that can become available, then the job remains queued indefinitely.

The workload management system keeps track of the status of the job and returns the resources to the available pool when a job has finished (that is, has been deleted, has crashed, or has completed successfully).

4.5 What Do Job Scripts Look Like?

A job script looks very much like an ordinary shell script, and certain commands and variables that are needed for the job can be put in it. The exact composition of a job script depends on the workload manager used, but normally includes:

• commands to load relevant modules or set environment variables
• directives for the workload manager to request resources, control the output, and set email addresses for messages to go to
• an execution (job submission) line

When running a job script, the workload manager is normally responsible for generating a machine file based on the requested number of processor cores (np), as well as for the allocation of any other requested resources.

The executable submission line in a job script is the line where the job is submitted to the workload manager. This can take various forms.
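The three components listed above can be put together in a minimal sketch of a Slurm job script. The job name, task count, output file name, email address, and module name are illustrative assumptions and do not come from this manual. The snippet only writes the script to a file called job.sh so that it can be inspected; it does not submit anything.

```shell
#!/bin/sh
# Sketch only: write a hypothetical Slurm job script to job.sh without submitting it.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside the script.
cat > job.sh <<'EOF'
#!/bin/bash
# Directives for the workload manager: resources, output file, email notification.
# Slurm stops reading #SBATCH lines at the first command, so they are placed first.
#SBATCH --job-name=example
#SBATCH --ntasks=4
#SBATCH --output=example-%j.out
#SBATCH --mail-user=user@example.com
#SBATCH --mail-type=END

# Load relevant modules or set environment variables.
# The module name here is hypothetical and depends on the cluster.
module load openmpi

# Execution line: launch the program on the allocated cores.
srun ./a.out
EOF

echo "wrote job.sh"
```

On a cluster running Slurm, a script like this would typically be handed to the workload manager with `sbatch job.sh`. The directives are ordinary shell comments, which is why the file still looks very much like a normal shell script.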
Example

For the Slurm workload manager, the line might look like:

    srun --mpi=mpich1_p4 ./a.out

Example

For Torque or PBS Pro it may simply be:

    mpirun ./a.out

Example

For SGE it may look like:

    mpirun -np 4 -machinefile $TMP/machines ./a.out

4.6 Running Jobs On A Workload Manager

The details of running jobs through the following workload managers are discussed later on:

• Slurm (Chapter 5)
• SGE (Chapter 6)
• Torque (with Maui or Moab) and PBS Pro (Chapter 7)

© Bright Computing, Inc.