LAM FAQ: Information about LAM itself

Table of contents:
  1. Is there information about how LAM works internally?
  2. Why does LAM use these annoying daemons?
  3. But no other MPI implementation uses daemons...?
  4. Is LAM multi-threaded?
  5. Is LAM thread safe?
  6. Does LAM/MPI provide asynchronous message passing progress?
  7. How does I/O work in LAM?
  8. Can I have access to the LAM code repository?
  9. Is LAM Y2K compliant?



1. Is there information about how LAM works internally?

Yes and no. Starting with v7.0, LAM has transformed into a component architecture -- many of its services are performed by components that are tied together by a back-end framework. While many portions of the LAM run-time environment and the MPI communication layer are part of this framework (and not components in themselves, and therefore are not documented), the component types are all formally documented.

Specifically, API documents for each of the component types are available on the LAM download site. These documents describe how each component type works and provide insight into how the overall framework functions. Other than those documents and other FAQ questions [mainly in this section], there is little additional formal documentation on how LAM works internally.

Additionally, the LAM download area contains a paper entitled "The XMPI API and trace file format" (filename xmpi_api.ps) that details both how XMPI extracts run-time trace information from LAM and the format of the tracefiles that LAM can produce with the -ton command line option to mpirun. The format of the file and the format of the data that XMPI receives are the same; the mechanism for obtaining the two is slightly different.

There are some other papers in the download area that discuss some of LAM's internals, but they are pretty much broad overviews of the techniques that LAM uses.

There are, however, a bunch of manual pages on the Trollius library API calls (recall that LAM/MPI is a layer on top of an underlying message passing system named Trollius) included in the LAM distribution. Look in sections 2 and 3 of the LAM man pages for more information on these calls.

Additionally, the entire source code to LAM is provided in the download distribution. Feel free to "source dive" and investigate how LAM works yourself. This is probably the most reliable way to get information, unfortunately.

The current LAM development tree is also available through anonymous CVS. See the LAM CVS web page for details.

The LAM Team may be able to answer some questions about the LAM implementation, but will more than likely only be able to direct you to relevant parts of the LAM source code (LAM itself is very large -- there are hundreds and hundreds of C and C++ source files) where you can find specific answers.



2. Why does LAM use these annoying daemons?

A common complaint about LAM/MPI is that it uses user-level daemons; particularly in batch queue or other automatic execution environments, some consider it inconvenient to launch and kill the LAM run-time environment.

There are many good reasons that LAM/MPI uses daemons, and we are in no hurry to get rid of them. Indeed, all major implementations of MPI now include daemons of one flavor or another (yes, even that other open source MPI implementation!). The fact is that some kind of external agent is necessary for some MPI-2 functionality such as MPI_COMM_SPAWN. LAM/MPI uses the daemons for other kinds of functionality as well, including:

  • Having an external agent that has already satisfied security/authentication requirements allows for fast request execution (e.g., mpirun).
  • A meta-network of daemons does its own monitoring and can guarantee cleanup when a user aborts an MPI job with Ctrl-C and/or MPI_ABORT.
  • Third party monitoring tools such as XMPI can tap into the daemon meta-network to provide external monitoring of running jobs.

Additionally, starting with v7.0, the mpiexec command can be used for "one-shot" MPI executions -- it will seamlessly boot the LAM run-time environment, run the MPI process, and then take down the LAM run-time environment. This hides all the background work from the user.

We consider this functionality to be essential for a robust parallel run-time environment, and therefore believe that its benefits greatly outweigh any inconvenience of starting and stopping the LAM daemons.

Some have asked why LAM/MPI doesn't have a root-level daemon so that users don't have to start up their own daemons with lamboot. If we did that, we'd have to include an authentication mechanism that, once satisfied that an incoming request came from a valid MPI user (many people run LAM/MPI on open networks, so this security/authentication is necessary), would allow the requested action to be performed (perhaps forking off an MPI program, a user-specific MPI daemon for general use, or whatever).

However, this is exactly what rshd (or sshd) already does. We don't see the need to duplicate security/authentication functionality in an MPI implementation, particularly when many robust, peer-reviewed solutions already exist. The LAM Team believes that rshd and sshd already supply this functionality well (root-level authentication and launching user-specific commands). There's no need for us to recreate this functionality and risk showing up on Bugtraq.

In batch queue and other kinds of automatic execution environments, the setup and teardown of the LAM daemons can be automated and hidden from the user (or the user can use mpiexec with the -boot and related options). Batch systems typically have inherent "setup" and "teardown" steps where lamboot and lamhalt can be hidden if the administrator/user desires. These steps can also be used to guarantee teardown in failure cases. Solutions for this (for example, PBS epilogue scripts) have been posted on the LAM user list.



3. But no other MPI implementation uses daemons...?

Actually, that's not true.

Even "that other freeware MPI implementation" uses daemons now; it's not the default, but daemons are there. All vendor MPI implementations use some kind of daemons as well (most are typically root-level, though, and have internal security/authentication mechanisms).

MPI-2 functionality such as MPI_COMM_SPAWN and MPI_PUBLISH_NAME requires an external agent. Multi-threaded MPI implementations may be able to avoid some of these issues, but it's just a heck of a lot simpler to have an external agent (i.e., a daemon).
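
For illustration, here is a minimal sketch of what such a dynamic process creation call looks like from C; the executable name "worker" and the process count are made up for this example and are not part of LAM:

    /* Sketch: spawning child processes with MPI_COMM_SPAWN.  The
     * executable name "worker" and the count of 4 are illustrative
     * only.  Starting these children on remote nodes is exactly the
     * kind of job that needs an external agent such as the LAM
     * daemons -- the MPI library cannot fork processes on other
     * nodes by itself. */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Comm children;
        int errcodes[4];

        MPI_Init(&argc, &argv);

        /* Ask the run-time environment to start 4 copies of "worker";
         * the request is handed off to the external agent. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &children, errcodes);

        MPI_Finalize();
        return 0;
    }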



4. Is LAM multi-threaded?

No, LAM is not multi-threaded. The message passing engine is single-threaded, both in the stubs that are compiled into user LAM/MPI programs as well as the LAM message passing daemons.



5. Is LAM thread safe?

It depends on exactly what you mean by "thread safe".

LAM is "thread safe" in that MPI programs may use multiple threads. It is not safe, however, to have multiple threads simultaneously executing in the MPI library. If user programs utilized multiple threads, they must ensure that only one thread uses LAM at a time. Unpredictable results (read: crash and burn) will occur if multiple threads access LAM simultaneously.

There are plans to allow multiple threads to execute simultaneously within the MPI library, but this will take quite some time.

See the related question in the "Typical Setup of LAM" section of the FAQ about how to use LAM in multi-threaded programs.



6. Does LAM/MPI provide asynchronous message passing progress?

Yes and no.

True asynchronous message passing progress depends on the ability to have threads inside LAM/MPI -- a hidden thread could continue to make progress on sending and receiving, regardless of what the user application is doing. Since LAM/MPI is single-threaded, progress on non-blocking calls such as MPI_ISEND occurs only when the user calls into the MPI library again to check for progress on these calls. Hence, there may be no progress on an MPI_ISEND unless some flavor of MPI_TEST or MPI_WAIT (or another selected MPI communication function) is invoked.
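
For example, a typical way to overlap computation with a non-blocking send under this constraint is to call into the library periodically; the following is only a sketch, and do_some_work() is a hypothetical placeholder for the application's own computation:

    /* Sketch: driving progress on a non-blocking send.  Because LAM
     * is single-threaded, the MPI_Test calls are what actually give
     * the library a chance to push the message out. */
    #include <mpi.h>

    static void do_some_work(void)
    {
        /* placeholder for the application's own computation */
    }

    void send_with_overlap(void *buf, int count, int dest, MPI_Comm comm)
    {
        MPI_Request req;
        MPI_Status status;
        int done = 0;

        MPI_Isend(buf, count, MPI_BYTE, dest, 0, comm, &req);
        while (!done) {
            do_some_work();                 /* compute */
            MPI_Test(&req, &done, &status); /* let LAM make progress */
        }
    }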

That being said, LAM does use "eager" message sending protocols for "short" messages, where the exact definition of "short message" is different in each RPI -- it is usually messages under a specific length (see the LAM/MPI User's Guide for a description of each of LAM's underlying message passing devices). For example, by default, LAM's TCP RPI will eagerly try to send messages under 64K (note that this default value is changeable at compile- and run-time). That is, LAM will try to send the entire message during the call to MPI_ISEND. Depending on current levels of operating system buffering, the entire message may be sent immediately (or, more specifically, may be copied out of the process space into the TCP stack's internal buffers). Hence, it is possible to get some level of asynchronous progress when using short messages because of eager protocols. Keep in mind, however, that operating system and/or device buffering is finite, so if a lot of short messages are sent (perhaps without corresponding receives on the receiver), it is possible that even eager sends will not be sent immediately (e.g., MPI_ISEND is not able to send the message in the first pass through the progression engine, and progress will only be made at the next call to some flavor of MPI_TEST or MPI_WAIT).

Note that some networks provide some support for independent progress. Myrinet and InfiniBand, for example, have communication co-processors -- LAM simply gives a message to be sent to the co-processor and then returns control to the user application. The co-processor can move the message across the network independent of the user application's behavior (i.e., "in the background"). However, LAM/MPI will not recognize that this has happened until MPI_TEST or MPI_WAIT (or another MPI communication function) has been invoked. Additionally, LAM's internal flow control and signaling will not occur until LAM's progression engine is invoked (during MPI_TEST, MPI_WAIT, and other communication functions). So even if an MPI_ISEND is invoked on a network with a local communication co-processor, there may only be limited progress while execution is outside of LAM's progression engine.

Also, it should be noted that the lamd RPI does provide true asynchronous message passing progress -- at a cost. The lamd RPI immediately passes all MPI messages to the local LAM daemon. The messages are passed from the local daemon to the receiving process' daemon (note that if the sending and receiving processes are on the same node, the message will stay in the local daemon) and are "ready for pickup" when the receiving process posts a corresponding receive. Simply put, all messages are sent eagerly to the local LAM daemon, and the daemons provide "in the background" progress for moving messages across the network. Remember that the daemons are separate processes, and can therefore make progress on message passing while the user's application is not in the MPI library.

As such, the lamd RPI definitely offers asynchronous message passing, but at the cost of added latency for two extra hops (from the sender process to the local LAM daemon, and from the receiver's LAM daemon to the receiver's process). Even with this additional latency, some applications can greatly benefit from the asynchronicity -- some users have reported on the LAM mailing list seeing large overall speedups of their applications using the lamd RPI.



7. How does I/O work in LAM?

In the interests of scalability (and speed in starting applications), LAM does not construct a TCP connection from every process back to the user's terminal. Instead, LAM provides scalable remote I/O via the LAM daemon processes and redirection.

Local processes (i.e., those on the node on which mpirun is invoked) inherit the stdin, stdout, and stderr of mpirun.

Processes on remote nodes have their stdout and stderr redirected to that of mpirun, and stdin is redirected to /dev/null.
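
A small example shows the practical effect; this is only an illustrative program, not part of the LAM distribution:

    /* Sketch: what LAM's I/O redirection means in practice.  printf
     * output from every rank is forwarded back to mpirun's terminal
     * by the daemons; stdin, by contrast, is only connected for
     * processes on the node where mpirun was invoked (remote ranks
     * read EOF from /dev/null). */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Appears on mpirun's stdout no matter which node this rank
         * is running on. */
        printf("hello from rank %d\n", rank);

        MPI_Finalize();
        return 0;
    }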



8. Can I have access to the LAM code repository?

Yes. Because the LAM Team tries very hard to release stable and as-bug-free-as-possible distributions, we tend to take a long time between major releases. However, there are many useful new features (and bug fixes) in our internal Subversion repository that some users have asked for access to. Additionally, for those who actually develop with the internals of LAM/MPI, Subversion access provides the most up-to-date versions rather than the periodic tarball releases. As such, the LAM Team has decided to provide read-only access to the LAM/MPI Subversion repository.

Be aware, however, that Subversion checkouts are not guaranteed to be stable. For the most part, we try very hard to not check in things that are broken, but this is an active development tree -- bugs happen. This is actually another major reason that this tree has been made available: peer review. If you find any bugs, please report them! Contributions, suggestions, and comments are welcome. To check out the Subversion development tree, see the LAM Subversion web page.



9. Is LAM Y2K compliant?

LAM is mostly Y2K compliant in the sense that (for the most part) LAM doesn't care what time it is.

However, there are at least two MPI functions that report timing information: MPI_Wtime() and MPI_Wtick(). These functions directly report whatever the underlying operating system tells them about the current time. Hence, if the underlying OS is not Y2K compliant, these two functions may report inaccurate information. LAM cannot do anything to fix this.

Also, the LAM tracing mechanism depends on the time (as reported by the underlying operating system). That is, analyzers such as XMPI use the tracing information reported by LAM to visually display communication patterns. If the time that is reported by the underlying OS is incorrect, this trace information will also likely be incorrect.

In short, LAM does not specifically store the year anywhere in its source code. The only times that LAM uses are a conglomerate of the month, day, year, hour, minute, and second, as reported by the underlying OS. LAM specifically uses the tv_sec and tv_usec members of the struct timeval filled in by the gettimeofday library call.
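
For illustration, a wall-clock value built on gettimeofday only ever manipulates seconds and microseconds since the Unix epoch; no calendar year appears anywhere. This is a sketch of the general technique, not LAM's actual source code:

    /* Sketch: an MPI_Wtime-style clock built on gettimeofday().
     * Only tv_sec and tv_usec (seconds/microseconds since the Unix
     * epoch) are used; no year field is involved. */
    #include <sys/time.h>

    double wall_clock_seconds(void)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;
    }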

Additionally, other OS functions may depend on the system date and time. If the underlying OS is not Y2K compliant and its internal time functions get hosed, LAM may behave unpredictably (because the OS is behaving unpredictably). LAM cannot do anything about this, either (chances are that LAM will be the least of your problems at this point, anyway).

Finally, normal disclaimers apply (see the disclaimer statement in the LICENSE file in the LAM distribution).
