Chapter 3. Running Programs

• Provide knowledge of the state of the system to the application manually, through a configuration file, or through some add-on scheduling software.

With Scyld ClusterWare, most of these steps are removed. Jobs are started on the master node and are migrated out to the compute nodes via BProc. A cluster architecture where jobs may be initiated only from the master node via BProc provides the following advantages:

• Users no longer need accounts on remote nodes.
• Users no longer need authorization to spawn jobs on remote nodes.
• Neither binaries nor libraries need to be available on the remote nodes.
• The BProc system provides a consistent view of all jobs running on the system.

With all these complications removed, program execution on the compute nodes becomes a simple matter of letting BProc know about your job when you start it. The method for doing so depends on whether you are launching a parallel program (for example, an MPI job or PVM job) or any other kind of program. See the sections on running parallel programs and running non-parallelized programs later in this chapter.

Program Execution Examples

This section provides a few examples of program execution with Scyld ClusterWare. Additional examples are provided in the sections on running parallel programs and running non-parallelized programs later in this chapter.

Example 3-1. Directed Execution with bpsh

In the directed execution mode, the user explicitly defines which node (or nodes) will run a particular job. This mode is invoked using the bpsh command, the ClusterWare shell command analogous in functionality to both the rsh (remote shell) and ssh (secure shell) commands. Following are two examples of using bpsh.
The first example runs hostname on compute node 0 and writes the output back from the node to the user’s screen:

    [user@cluster user] $ bpsh 0 /bin/hostname
    n0

If /bin is in the user’s $PATH, then bpsh does not need the full pathname:

    [user@cluster user] $ bpsh 0 hostname
    n0

The second example runs the /usr/bin/uptime utility on node 1, assuming /usr/bin is in the user’s $PATH:

    [user@cluster user] $ bpsh 1 uptime
    12:56:44 up 4:57, 5 users, load average: 0.06, 0.09, 0.03

Example 3-2. Dynamic Execution with beorun and mpprun

In the dynamic execution mode, Scyld decides which node is the most capable of executing the job at that moment in time. Scyld includes two parallel execution tools that dynamically select nodes: beorun and mpprun. They differ only in that beorun runs the job concurrently on the selected nodes, while mpprun runs the job sequentially on one node at a time.

The following example shows the difference in the elapsed time to run a command with beorun vs. mpprun:

    [user@cluster user] $ date;beorun -np 8 sleep 1;date