LAM/MPI Overview

LAM/MPI is a high-quality implementation of the Message Passing Interface (MPI) Standard. LAM/MPI provides high performance on a variety of platforms, from small off-the-shelf single-CPU clusters to large SMP machines with high-speed networks, even in heterogeneous environments. In addition to high performance, LAM provides a number of usability features that are key to developing large-scale MPI applications.

MPI-1 Support
LAM/MPI provides a complete implementation of the MPI-1.2 standard, ensuring source-code compatibility with any other implementation of the standard. Any MPI-1 application needs only a recompile to run under LAM/MPI.
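
As a quick illustration (the program and file names here are hypothetical, and a LAM session is assumed to already be booted with lamboot):

    # Rebuild the unmodified MPI-1 source with LAM's wrapper compiler
    mpicc -o my_app my_app.c

    # Run four copies of the recompiled application under LAM
    mpirun -np 4 my_app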

MPI-2 Support
LAM/MPI includes support for large portions of the MPI-2 standard. The most commonly used features LAM supports include dynamic process creation and management (MPI_Comm_spawn and related functions), one-sided communication, MPI I/O (via the ROMIO package), and the C++ bindings.

Checkpoint/Restart
MPI applications running under LAM/MPI can be checkpointed to disk and restarted at a later time. LAM requires a third-party single-process checkpoint/restart toolkit to actually checkpoint and restart an individual MPI process; LAM takes care of the parallel coordination. Currently, the Berkeley Lab Checkpoint/Restart (BLCR) package (Linux only) is supported. The infrastructure allows for easy addition of new checkpoint/restart packages.
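
A sketch of the workflow, assuming LAM was built with the blcr checkpoint/restart module and that BLCR's cr_checkpoint and cr_restart tools are installed (PIDs and file names are illustrative):

    # Run the application with the BLCR checkpoint/restart module selected
    mpirun -ssi cr blcr C my_app

    # From another shell, checkpoint the entire MPI job by checkpointing
    # mpirun; LAM coordinates checkpointing the individual MPI processes
    cr_checkpoint <pid-of-mpirun>

    # Later, restart the whole job from the saved context file
    cr_restart context.<pid-of-mpirun>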

Fast Job Startup
LAM/MPI utilizes a small, user-level daemon for process control, output forwarding, and out-of-band communication. The daemons are started at the beginning of a session using lamboot, which can use rsh/ssh, TM (OpenPBS / PBS Pro), SLURM, or BProc to start them remotely. Although lamboot can take a long time on large platforms when using rsh/ssh, the start-up cost is amortized over the applications that follow: mpirun does not use rsh/ssh, instead launching processes through the LAM daemons. Even for very large numbers of nodes, MPI application startup takes on the order of a couple of seconds.
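
For example, a typical session might look like the following (the hostfile contents are site-specific):

    # Boot the LAM daemons once per session; this is the only step that
    # goes through rsh/ssh (or TM, SLURM, BProc)
    lamboot -v hostfile

    # Application starts go through the LAM daemons, not rsh/ssh, so they
    # complete quickly even on large clusters
    mpirun C my_app
    mpirun C my_other_app

    # Tear down the daemons at the end of the session
    lamhalt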

High Performance Communication
LAM/MPI provides a number of options for MPI communication with very little overhead. The TCP communication system delivers bandwidth and latency close to those of the raw TCP stack, even at Gigabit Ethernet speeds. Two shared-memory communication channels are available, each using TCP for communication with remote nodes. LAM/MPI 7.0 and later support Myrinet networks using the GM interface, which provides significantly higher bandwidth and lower latency than TCP. LAM/MPI 7.1 also supports high-speed, low-latency InfiniBand networks using the Mellanox Verbs Interface (VAPI).
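
For instance, assuming LAM was configured with the gm and ib RPI modules, the transport can be requested on the mpirun command line:

    # MPI point-to-point traffic over Myrinet (GM)
    mpirun -ssi rpi gm C my_app

    # MPI point-to-point traffic over InfiniBand (Mellanox VAPI),
    # LAM/MPI 7.1 and later
    mpirun -ssi rpi ib C my_app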

Run-time Tuning and RPI Selection
LAM/MPI has always supported a large number of tuning parameters. Unfortunately, most could only be set at compile time, making application tuning painful. With LAM/MPI 7.0, almost every parameter in LAM can be altered at run time, via environment variables or flags to mpirun, making tuning much simpler. The addition of LAM's System Services Interface (SSI) also allows the RPI (the network transport used for MPI point-to-point communication) to be selected at run time rather than compile time. Rather than recompiling LAM once per transport to find which gives the best performance for an application, all that is required is a single flag to mpirun.
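
A minimal sketch of run-time selection (usysv is one of the two shared-memory transports; the environment-variable form assumes LAM's LAM_MPI_SSI_<param> naming convention):

    # Select the transport with a flag to mpirun...
    mpirun -ssi rpi usysv C my_app

    # ...or equivalently through the environment
    export LAM_MPI_SSI_rpi=tcp
    mpirun C my_app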

SMP-Aware Collectives
Clusters of SMP machines are a growing trend in high-performance computing. With LAM 7.0, many common collective operations are optimized to take advantage of the faster communication between processes on the same machine. With the SMP-aware collectives, performance gains can be seen with little or no change to user applications. Be sure to read the LAM User's Guide for important information on exploiting the full potential of the SMP-aware collectives.
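
For example, assuming the SMP-aware collective module is named smp, as shipped with LAM 7.0, it can be requested at run time:

    # Use the SMP-aware collective module for this run
    mpirun -ssi coll smp C my_app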

Integration with PBS
PBS (either OpenPBS or PBS Pro) provides scheduling services for many of the high-performance clusters in service today. By using the PBS-specific boot mechanism, LAM is able to provide process accounting and job cleanup for MPI applications. As an added bonus for MPI users, lamboot execution time is drastically reduced compared to rsh/ssh.
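
A sketch of a PBS job script using the TM boot module (the resource request is illustrative):

    #!/bin/sh
    #PBS -l nodes=4

    # Boot the LAM daemons through PBS's TM interface: no rsh/ssh, no
    # hostfile, and PBS can account for and clean up the daemons
    lamboot -ssi boot tm

    mpirun C my_app
    lamhalt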

Integration with BProc
The BProc distributed process space provides a single process space for an entire cluster. It also provides a number of mechanisms for starting applications that are not available on the compute nodes of a cluster. LAM supports booting under the BProc environment even when LAM is not installed on the compute nodes; LAM automatically migrates the required support out to the compute nodes. MPI applications themselves must still be available on all compute nodes, although the -s option to mpirun eliminates this requirement.
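
As a hedged illustration of the -s option (n0 is LAM's name for the origin node in a booted session):

    # Load the executable from node n0 at start-up rather than requiring it
    # to be installed on each compute node; N runs one process per node
    mpirun -s n0 N my_app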

Globus Enabled
LAM 7.0 includes beta support for execution in the Globus Grid environment. Be sure to read the release notes in the User's Guide for important restrictions on your Globus environment.

Extensible Component Architecture
LAM 7.0 is the first LAM release to include the System Services Interface (SSI), an extensible component architecture for LAM/MPI. Currently, "drop-in" modules are supported for booting the LAM run-time environment, MPI collectives, checkpoint/restart, and the MPI transport (RPI). Components are selected at run time, allowing users to choose the modules that provide the best performance for a specific application.
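
For example, the laminfo command shows which modules a given LAM installation provides:

    # List this installation's SSI modules (boot, coll, cr, rpi)
    laminfo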

Easy Application Debugging
Debugging with LAM/MPI is easy. Support for parallel debuggers such as the Distributed Debugging Tool (DDT) and the Etnus TotalView parallel debugger allows straightforward debugging of even the most complicated MPI applications. LAM is also capable of starting standard single-process debuggers for quick debugging of a subset of processes (see our FAQ). A number of tools, including XMPI, are available for communication debugging and analysis.
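
For instance, assuming a TotalView installation is in the user's path, a job can be launched under the debugger with mpirun's -tv flag:

    # Start the MPI application under the TotalView parallel debugger
    mpirun -tv C my_app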

Interoperable MPI
LAM implements much of the Interoperable MPI (IMPI) standard, which is intended to allow an MPI application to execute over multiple MPI implementations. Use of IMPI allows users to obtain the best performance possible, even in a heterogeneous environment.