IPython Documentation, Release 1.2.1: An Afternoon Hack
5.7 Using MPI with IPython
Often, a parallel algorithm will require moving data between the engines. One way of accomplishing this is
to pull the data from one engine and then push it to another using the multiengine client. However, this is slow,
because all of the data has to travel through the controller to the client and then back through the controller to its final destination.
A much better way of moving data between engines is to use a message passing library, such as the Message Passing Interface (MPI) [MPI]. IPython’s parallel computing architecture has been designed from the
ground up to integrate with MPI. This document describes how to use MPI with IPython.
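To make the cost of the pull-then-push route concrete, the toy model below counts how many process-to-process transfers a payload makes along each path described above: source engine, controller, client, controller, destination engine, versus a direct engine-to-engine MPI message. It is a hypothetical illustration, not part of IPython; the route names and the helper functions are invented for this sketch, and it deliberately ignores latency, serialization and bandwidth differences between links.

```python
# Hypothetical toy model: each route lists the processes a payload visits, in order.
PULL_THEN_PUSH = ("engine A", "controller", "client", "controller", "engine B")
MPI_DIRECT = ("engine A", "engine B")


def links_crossed(route):
    """Number of process-to-process transfers along a route."""
    return len(route) - 1


def bytes_transferred(nbytes, route):
    """Total bytes sent, assuming the whole payload crosses every link once."""
    return nbytes * links_crossed(route)


if __name__ == "__main__":
    payload = 100 * 1024**2  # 100 MiB, an arbitrary example size
    for name, route in [("pull + push", PULL_THEN_PUSH), ("MPI", MPI_DIRECT)]:
        print(f"{name}: {links_crossed(route)} transfer(s), "
              f"{bytes_transferred(payload, route)} bytes in total")
```

Under these assumptions the pull-then-push route moves every byte four times, while a direct MPI message moves it once.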
5.7.1 Additional installation requirements
If you want to use MPI with IPython, you will need to install:
• A standard MPI implementation such as OpenMPI [OpenMPI] or MPICH.
• The mpi4py [mpi4py] package.
Note: The mpi4py package is not a strict requirement. However, you need to have some way of calling
MPI from Python. You also need some way of making sure that MPI_Init() is called when the IPython
engines start up. There are a number of ways of doing this, each with its own subtleties. We
highly recommend simply using mpi4py, as it takes care of most of these problems. If you want to do something
different, let us know and we can help you get started.
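Before starting any engines, it can help to confirm that the pieces listed above are present. The sketch below is a heuristic written for this illustration (the function name `check_mpi_prerequisites` is hypothetical): it only looks for an `mpiexec` executable on `PATH` and for an importable `mpi4py` package. It does not verify that mpi4py was built against the same MPI implementation that provides `mpiexec`.

```python
import importlib.util
import shutil


def check_mpi_prerequisites():
    """Report whether an MPI launcher and mpi4py appear to be installed."""
    return {
        # Both Open MPI and MPICH ship an `mpiexec` launcher.
        "mpiexec": shutil.which("mpiexec") is not None,
        # find_spec locates the package without importing it, so this
        # check does not trigger MPI_Init().
        "mpi4py": importlib.util.find_spec("mpi4py") is not None,
    }


if __name__ == "__main__":
    for name, found in check_mpi_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```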
5.7.2 Starting the engines with MPI enabled
Code that calls MPI typically has two requirements:
1. The process that wants to call MPI must be started using mpiexec or a batch system (like PBS) that
has MPI support.
2. Once the process starts, it must call MPI_Init().
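The two requirements can be checked from inside a Python process. The sketch below is a heuristic, not an IPython or MPI API: it assumes that Open MPI's launcher exports `OMPI_COMM_WORLD_RANK` and that MPICH's Hydra launcher exports `PMI_RANK` to each process it starts; other launchers and batch systems may use different variables. The helper names are hypothetical. For the second requirement it relies on the fact that importing `mpi4py.MPI` calls `MPI_Init()` by default.

```python
import os

# Assumption: variables exported by common MPI launchers to each started process.
_LAUNCHER_RANK_VARS = ("OMPI_COMM_WORLD_RANK", "PMI_RANK")


def launched_by_mpiexec(environ=None):
    """Heuristically decide whether this process was started by an MPI launcher."""
    env = os.environ if environ is None else environ
    return any(var in env for var in _LAUNCHER_RANK_VARS)


def launcher_rank(environ=None):
    """Return the rank reported by the launcher, or None if none is found."""
    env = os.environ if environ is None else environ
    for var in _LAUNCHER_RANK_VARS:
        if var in env:
            return int(env[var])
    return None


if __name__ == "__main__":
    if launched_by_mpiexec():
        # Requirement 2: importing mpi4py.MPI calls MPI_Init() by default.
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        print("rank", comm.Get_rank(), "of", comm.Get_size())
    else:
        print("not started by an MPI launcher; MPI may fail or run as a single process")
```

Passing an explicit `environ` mapping keeps the helpers testable without actually running under `mpiexec`.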
There are a couple of ways that you can start the IPython engines and get these things to happen.
Automatic starting using mpiexec and ipcluster
The easiest approach is to use the MPI Launchers in ipcluster, which will first start a controller and
then a set of engines using mpiexec:
$ ipcluster start -n 4 --engines=MPIEngineSetLauncher
This approach is best because interrupting ipcluster automatically stops and cleans up both the controller and
the engines.
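When the cluster is started from a Python script rather than an interactive shell, the same command can be built as an argument list and run with `subprocess`. This is a minimal sketch with hypothetical helper names; it assumes `ipcluster` is on `PATH`, a POSIX system (where `Popen.send_signal` accepts `SIGINT`), and that `ipcluster` treats `SIGINT` the same way as an interactive Ctrl-C, which is the interruption the text above relies on for cleanup.

```python
import signal
import subprocess


def ipcluster_mpi_command(n_engines, launcher="MPIEngineSetLauncher"):
    """Build the argv for `ipcluster start -n N --engines=<launcher>`."""
    if n_engines < 1:
        raise ValueError("need at least one engine")
    return ["ipcluster", "start", "-n", str(n_engines), f"--engines={launcher}"]


def start_cluster(n_engines):
    """Launch ipcluster in the background; requires ipcluster on PATH."""
    return subprocess.Popen(ipcluster_mpi_command(n_engines))


def stop_cluster(proc, timeout=30):
    """Interrupt ipcluster as Ctrl-C would, so it can stop the controller and engines."""
    proc.send_signal(signal.SIGINT)
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # Only prints the command; call start_cluster()/stop_cluster() to run it.
    print(" ".join(ipcluster_mpi_command(4)))
    # prints: ipcluster start -n 4 --engines=MPIEngineSetLauncher
```

Passing a list instead of a shell string avoids quoting problems and keeps the engine count and launcher name as ordinary parameters.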
Chapter 5. Using IPython for parallel computing