Appendix B. TORQUE Release Information

e - remove the forking from pbs_server running a job; the thread handling the request just waits until the job is run.
e - change qdel to simply send qdel all; previously this was executed by a qstat and a qdel of every individual job.
e - no longer fork to send mail, just use a thread.
e - use hwloc as the backbone for cpuset support in TORQUE (contributed by Dr. Bernd Kallies).
e - add the boolean variable $use_smt to mom config. If set to false, this skips logical cores and uses only physical cores for the job. It is true by default. (contributed by Dr. Bernd Kallies)
n - with the multi-threading, the pbs_server -t create and -t cold commands could no longer ask for user input from the command line. The call to ask if the user wants to continue was moved higher in the initialization process, and some of the wording was changed to reflect what is now happening.
e - if cpusets are configured but aren’t found and cannot be mounted, pbs_mom will now fail to start instead of failing silently.
e - Change node_spec from an N^2 (but average 5N) algorithm to an N algorithm with respect to nodes. We loop over each node at most once.
e - Abandon pbs_iff in favor of trqauthd. trqauthd is a daemon, started once, that performs pbs_iff’s functionality, increasing speed and enabling future security enhancements.
e - add mom_hierarchy functionality for reporting. The file is located in <TORQUE_HOME>/server_priv/mom_hierarchy, and can be written to tell moms to send updates to other moms, which will pass them on to pbs_server. See docs for details.
e - add a unit testing framework (check). It is compiled with --with-check and tests are executed using make check. The framework is complete but not many tests have been written yet.
b - Made changes to IM protocol where commands were either not waiting for a reply or not sending a reply. Also made changes to close connections that were left open.
b - Fix for where qmgr record_job_info is True and server hangs on startup.
e - Mom rejection messages are now passed back to qrun when possible.
e - Added the option -c for startup. By default, the server attempts to send the mom hierarchy file to all moms on startup, and all moms update the server and request the hierarchy file. If both are trying to do this at once, it can cause a lot of traffic. -c tells pbs_server to wait 10 minutes before attempting to contact moms that haven’t contacted it, reducing this traffic.
e - Added mom parameter -w to reduce start times. With this parameter, the mom waits to send its first update until the server sends it the mom hierarchy file, or until 10 minutes have passed. This should reduce large cluster startup times.

3.0.5

b - fix for writing too much data when job_script is saved to job log.
b - fix for where pbs_mom would not automatically set gpu mode.
b - fix for aligning qstat -r output when configured with -DTXT.
e - Change size of transfer block used on job rerun from 4k to 64k.
b - With nvidia gpus, TORQUE was losing the directive of what nodes it should run the job on from Moab. Corrected.
e - add the $PBS_WALLTIME variable to jobs, thanks to a patch from Mark Roberts
n - change moab_array_compatible server parameter so it defaults to true
e - change to allow pbs_mom to run if configured with --enable-nvidia-gpus but installed on a node without Nvidia gpus.

3.0.4

c - fix a buffer being overrun with nvidia gpus enabled
b - no longer leave zombie processes when munge authenticating.
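
The mom -w startup behavior noted above (hold the first update until the mom hierarchy file arrives, or until 10 minutes have passed) can be illustrated with a minimal sketch. This is not TORQUE source code; the function and constant names are hypothetical, and only the 10-minute limit and the two release conditions come from the release note.

```python
# Illustrative sketch only: names below are hypothetical, not TORQUE identifiers.
# The 10-minute limit is the value stated in the release note for -w.
FIRST_UPDATE_TIMEOUT_SECONDS = 10 * 60


def should_send_first_update(hierarchy_received: bool,
                             seconds_since_start: float,
                             timeout: float = FIRST_UPDATE_TIMEOUT_SECONDS) -> bool:
    """Return True once a mom started with -w may send its first update.

    The update is released when either condition holds:
      * the server has delivered the mom hierarchy file, or
      * the timeout (10 minutes by default) has elapsed since startup.
    """
    return hierarchy_received or seconds_since_start >= timeout
```

Under these assumptions, a mom that has not yet received the hierarchy file stays quiet at 599 seconds and sends its first update at 600 seconds; a mom that receives the file earlier sends immediately.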