IBM Blue Gene Applications
While debugging IBM Blue Gene MPI programs is very similar to debugging them on other platforms, starting TotalView on your program differs slightly. Unfortunately, each machine is configured differently, so you’ll need to find the details in IBM’s documentation or in documentation created at your site.
Nevertheless, the remainder of this section presents some hints based on information we have gathered at various sites.
TotalView supports debugging applications on three generations of Blue Gene systems: Blue Gene/L, Blue Gene/P, and Blue Gene/Q. While the different Blue Gene
generations are similar, there are differences that affect how you start the debugger.
In general, either launch the MPI starter program under the control of the debugger,
or start TotalView and attach to an already running MPI starter program. On Blue
Gene/L and Blue Gene/P, the starter program is named mpirun. On Blue Gene/Q,
the starter program is named runjob in most cases, or srun when the system is configured to use SLURM.
For example, on Blue Gene/L or Blue Gene/P:
{ totalview | totalviewcli } mpirun -a mpirun-command-line
On most Blue Gene/Q systems:
{ totalview | totalviewcli } runjob -a runjob-command-line
On Blue Gene/Q systems configured to use SLURM:
{ totalview | totalviewcli } srun -a srun-command-line
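If you move between Blue Gene generations, a small wrapper can record which starter each one uses. The following is a minimal sketch, not something shipped with TotalView: the function names starter_for and tv_launch, the generation labels bgl, bgp, bgq, and bgq-slurm, and the TV_DEBUGGER variable are all hypothetical. It only reproduces the command forms shown above, passing everything after the generation label to the starter through -a.

```shell
#!/bin/sh
# Hypothetical helpers (not part of TotalView): map a Blue Gene generation
# label to its MPI starter, then start that starter under the debugger.

# Print the starter program for a generation label; fail on unknown labels.
starter_for() {
  case "$1" in
    bgl|bgp)   echo mpirun ;;   # Blue Gene/L and Blue Gene/P
    bgq)       echo runjob ;;   # most Blue Gene/Q systems
    bgq-slurm) echo srun ;;     # Blue Gene/Q configured to use SLURM
    *)         return 1 ;;
  esac
}

# Usage: tv_launch GENERATION [STARTER-ARGUMENTS...]
# Arguments after GENERATION are handed to the starter unchanged via -a.
tv_launch() {
  gen=$1
  shift
  starter=$(starter_for "$gen") || {
    echo "tv_launch: unknown generation '$gen'" >&2
    return 1
  }
  # TV_DEBUGGER is a hypothetical override; set it to totalviewcli for the CLI.
  debugger=${TV_DEBUGGER:-totalview}
  "$debugger" "$starter" -a "$@"
}
```

For example, after sourcing the file, tv_launch bgq followed by your usual runjob arguments runs totalview runjob -a with those arguments. The wrapper adds nothing TotalView needs; it only keeps the per-generation starter names in one place.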
All Blue Gene systems support a scalable tool daemon launching mechanism called
“co-spawning”, in which the tool daemons, such as TotalView’s tvdsvr, are launched
along with the parallel job. As part of the startup or attach sequence, TotalView
tells the MPI starter process to launch (or co-spawn) the TotalView Debug Servers
on each Blue Gene I/O node.
To support co-spawning, TotalView must pass the servers on the I/O nodes the address of the front-end node’s network interface that connects to the I/O node network. This is usually not the interface that the outside world uses to reach the front-end node. TotalView assumes that this address can be resolved from a name of the form:
front-end-hostname-io
For example, if the hostname of the front-end is bgqfen1, TotalView will attempt to
resolve the name bgqfen1-io to an IP address that the server is able to connect to.
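The naming rule can be written as a short shell sketch. It assumes the short hostname is used (what hostname -s prints), which matches the bgqfen1 example, and that the front-end is a Linux system with the getent utility for name lookups. The function names io_interface_name and resolve_io_interface are hypothetical and not part of TotalView.

```shell
#!/bin/sh
# Sketch of the name assumed for the front-end's I/O-network interface:
# <short front-end hostname>-io. Both functions are hypothetical helpers.

# Print the -io name for a hostname (defaults to this host's name).
io_interface_name() {
  host=${1:-$(hostname)}
  # Strip any domain suffix, as `hostname -s` does: bgqfen1.example.com -> bgqfen1
  short=${host%%.*}
  echo "${short}-io"
}

# Print the first address the -io name resolves to, or fail if it does not resolve.
resolve_io_interface() {
  name=$(io_interface_name "$1")
  entry=$(getent hosts "$name") || return 1
  # getent prints "ADDRESS NAME [ALIASES]"; keep only the address field.
  echo "${entry%%[[:space:]]*}"
}
```

On a front-end named bgqfen1.example.com, io_interface_name prints bgqfen1-io, and resolve_io_interface prints the address that name maps to, if any.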
Some systems follow this convention and some do not. If your system follows it, you do not need to set the TotalView variables described in the rest of this section. To check whether it does, run this command on the front-end node:
ping -c 1 `hostname -s`-io
If the name does not resolve, the system does not follow the convention.
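A failed ping can mean either that the name does not resolve or that the host did not answer. A sketch that separates the two cases follows. It assumes a Linux front-end with getent and iputils ping (where -W sets the reply timeout in seconds); the function name check_io_convention and its return codes are hypothetical.

```shell
#!/bin/sh
# Hypothetical check, run on the front-end node. Does the <short-hostname>-io
# name resolve, and does it answer a single ping?
# Returns 0 if both succeed, 1 if the name does not resolve,
# and 2 if it resolves but does not answer.
check_io_convention() {
  name=${1:-$(hostname -s)-io}
  if ! entry=$(getent hosts "$name"); then
    echo "$name does not resolve; this system does not follow the -io convention"
    return 1
  fi
  echo "$name resolves to ${entry%%[[:space:]]*}"
  # -c 1 sends one probe, as in the check above; -W 2 bounds the wait to 2 seconds.
  if ping -c 1 -W 2 "$name" >/dev/null 2>&1; then
    echo "$name answered; the default convention should work"
    return 0
  fi
  echo "$name resolves but did not answer a ping"
  return 2
}
```

Run check_io_convention with no argument on the front-end node. A return code of 2 is worth a closer look rather than a firm conclusion, because some sites filter ICMP, so a name can resolve correctly even when it does not answer a ping.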
ROGUEWAVE.COM
Setting Up MPI Debugging Sessions