Appendix B. TORQUE Release Information
e - remove the forking from pbs_server running a job, the thread handling the request just
waits until the job is run.
e - change qdel to simply send qdel all - previously this was executed by a qstat and a qdel
of every individual job
e - no longer fork to send mail, just use a thread
e - use hwloc as the backbone for cpuset support in TORQUE (contributed by Dr. Bernd Kallies)
e - add the boolean variable $use_smt to mom config. If set to false, this skips logical
cores and uses only physical cores for the job. It is true by default.
(contributed by Dr. Bernd Kallies)
n - with the multi-threading the pbs_server -t create and -t cold commands could no longer
ask for user input from the command line. The call to ask if the user wants to continue
was moved higher in the initialization process and some of the wording changed to
reflect what is now happening.
e - if cpusets are configured but aren’t found and cannot be mounted, pbs_mom will now
fail to start instead of failing silently.
e - Change node_spec from an N^2 (but average 5N) algorithm to an N algorithm with respect
to nodes. Each node is now visited at most once.
e - Abandon pbs_iff in favor of trqauthd. trqauthd is a daemon to be started once that can
perform pbs_iff’s functionality, increasing speed and enabling future security
enhancements
e - add mom_hierarchy functionality for reporting. The file is located in
<TORQUE_HOME>/server_priv/mom_hierarchy, and can be written to tell moms to send
updates to other moms who will pass them on to pbs_server. See docs for details
e - add a unit testing framework (check). It is compiled with --with-check and tests
are executed using make check. The framework is complete but not many tests have
been written as of yet.
b - Made changes to IM protocol where commands were either not waiting for a reply
or not sending a reply. Also made changes to close connections that were left
open.
b - Fix for the server hanging on startup when the qmgr parameter record_job_info is True.
e - Mom rejection messages are now passed back to qrun when possible
e - Added the option -c for startup. By default, the server attempts to send the mom
hierarchy file to all moms on startup, and all moms update the server and request
the hierarchy file. If both are trying to do this at once, it can cause a lot of
traffic. -c tells pbs_server to wait 10 minutes to attempt to contact moms that
haven’t contacted it, reducing this traffic.
e - Added mom parameter -w to reduce start times. With this parameter, the mom waits to
send its first update until the server sends it the mom hierarchy file, or until 10
minutes have passed. This should reduce large cluster startup times.
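
The startup timing described for the -c (pbs_server) and -w (pbs_mom) options above can be sketched as two small decision functions. This is only an illustration of the rules as worded in these notes, not TORQUE's actual code: the function names, parameters, and the use of plain elapsed seconds are all hypothetical.

```python
# Hypothetical sketch of the -c / -w startup rules described above.
# None of these names come from TORQUE itself.

HIERARCHY_WAIT_SECONDS = 10 * 60  # the 10-minute limit mentioned in the notes


def mom_should_send_first_update(hierarchy_received, seconds_since_start,
                                 max_wait=HIERARCHY_WAIT_SECONDS):
    """With -w, a mom holds its first update until the server has sent the
    mom hierarchy file, or until the wait limit has passed."""
    return hierarchy_received or seconds_since_start >= max_wait


def server_should_contact_mom(mom_has_reported, seconds_since_start,
                              max_wait=HIERARCHY_WAIT_SECONDS):
    """With -c, the server only reaches out to moms that have not contacted
    it, and only after the wait limit has passed."""
    return (not mom_has_reported) and seconds_since_start >= max_wait
```

For example, a mom started with -w that has not yet received the hierarchy file would stay quiet at 30 seconds and send its first update once 600 seconds have elapsed.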
3.0.5
b - fix for writing too much data when job_script is saved to job log.
b - fix for where pbs_mom would not automatically set gpu mode.
b - fix for aligning qstat -r output when configured with -DTXT.
e - Change size of transfer block used on job rerun from 4k to 64k.
b - With nvidia gpus, TORQUE was losing the directive of what nodes it should
run the job on from Moab. Corrected.
e - add the $PBS_WALLTIME variable to jobs, thanks to a patch from Mark Roberts
n - change moab_array_compatible server parameter so it defaults to true
e - change to allow pbs_mom to run if configured with --enable-nvidia-gpus but
installed on a node without Nvidia gpus.
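
A job could read the $PBS_WALLTIME variable added in this release to budget its own run time. The sketch below assumes the variable holds the requested walltime as a whole number of seconds; these notes do not state the format, so treat that as an assumption. The helper name walltime_budget and the 60-second safety margin are hypothetical.

```python
import os


def walltime_budget(environ=None, margin_seconds=60):
    """Return the seconds a job can safely use, or None if PBS_WALLTIME is
    missing or is not an integer.

    Assumption: PBS_WALLTIME is the requested walltime in seconds.
    """
    env = os.environ if environ is None else environ
    raw = env.get("PBS_WALLTIME")
    if raw is None:
        return None  # not running under a batch system that sets the variable
    try:
        total = int(raw)
    except ValueError:
        return None  # unexpected format; let the caller decide what to do
    # Reserve a margin for cleanup, never returning a negative budget.
    return max(total - margin_seconds, 0)
```

Inside a job script, calling walltime_budget() with no arguments reads the real environment. Passing a dictionary instead makes the helper easy to try outside a job.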
3.0.4
c - fix a buffer being overrun with nvidia gpus enabled
b - no longer leave zombie processes when munge authenticating.