
Performance Issues with LAM/MPI on Linux 2.2.x

  • Introduction

    Several people on the LAM mailing list have noted performance oddities with LAM running under Linux 2.2.x. To determine whether the problem lies in Linux or in LAM, extensive performance testing is under way.

  • The Test Programs

    All of the source code for the test programs is available in a gzipped tarball.
    • MPI Test Program

      The test program is a simple "ping-pong" program that sends a message back and forth between two nodes and times how long the round trip takes. The test is repeated for message sizes from 1 byte to 8 megabytes. The output is in a format that can be executed directly by MATLAB to produce graphs.
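
      As a rough illustration of the sweep and output described above (not the actual pingpong.c, which is in the tar ball), the C sketch below generates the message sizes and writes timings as a MATLAB script. The doubling step, the function names fill_sizes and emit_matlab, and the MATLAB variable names are all assumptions made for this example; in the real program the timings would come from the MPI round trips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* 8 megabytes, the largest message size mentioned in the text. */
#define MAX_MSG_BYTES (8u * 1024u * 1024u)

/* Fill sizes[] with 1, 2, 4, ... up to MAX_MSG_BYTES and return the count.
 * Doubling is an assumption; the text only gives the two end points. */
int fill_sizes(size_t *sizes, int capacity)
{
    int n = 0;
    for (size_t s = 1; s <= MAX_MSG_BYTES && n < capacity; s *= 2)
        sizes[n++] = s;
    return n;
}

/* Write the results as a MATLAB script: two row vectors and a log-log plot.
 * The variable names and the choice of plot are hypothetical. */
void emit_matlab(FILE *out, const size_t *sizes, const double *secs, int n)
{
    assert(out != NULL && n > 0);
    fprintf(out, "msg_bytes = [");
    for (int i = 0; i < n; i++)
        fprintf(out, " %zu", sizes[i]);
    fprintf(out, " ];\nrtt_secs = [");
    for (int i = 0; i < n; i++)
        fprintf(out, " %.9g", secs[i]);
    fprintf(out, " ];\nloglog(msg_bytes, rtt_secs);\n");
}
```

      Redirecting the output of emit_matlab to a .m file gives a script that MATLAB can run as is.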

    • TCP Test Program

      To test whether any oddities can be found in the Linux 2.2 TCP implementation, a simple TCP sockets (sox) program was written. It exercises the send and receive routines used by LAM by including the relevant parts of the LAM source tree.
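
      A minimal sketch of the round-trip timing idea follows. It is not the sox program: it does not include any LAM source, and it uses a Unix-domain socketpair between a parent and a forked child as a stand-in for a TCP connection so that it stays self-contained. The name pingpong_rtt and its helpers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_full(int fd, char *buf, size_t len)
{
    while (len > 0) {
        ssize_t r = read(fd, buf, len);
        if (r <= 0) return -1;
        buf += r;
        len -= (size_t)r;
    }
    return 0;
}

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w <= 0) return -1;
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Average round-trip time in seconds for messages of nbytes, or -1.0 on failure. */
double pingpong_rtt(size_t nbytes, int reps)
{
    assert(nbytes > 0 && reps > 0);
    int sv[2];
    char *buf = calloc(1, nbytes);
    if (buf == NULL) return -1.0;
    /* Assumption: a socketpair stands in for the TCP connection. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) { free(buf); return -1.0; }

    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); free(buf); return -1.0; }
    if (pid == 0) {                       /* child: echo every message back */
        close(sv[0]);
        for (int i = 0; i < reps; i++)
            if (read_full(sv[1], buf, nbytes) != 0 ||
                write_full(sv[1], buf, nbytes) != 0)
                _exit(1);
        _exit(0);
    }

    close(sv[1]);                         /* parent: send, wait for echo, time it */
    struct timespec t0, t1;
    int ok = 1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps && ok; i++)
        ok = write_full(sv[0], buf, nbytes) == 0 &&
             read_full(sv[0], buf, nbytes) == 0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(sv[0]);
    waitpid(pid, NULL, 0);
    free(buf);
    if (!ok) return -1.0;
    double elapsed = (double)(t1.tv_sec - t0.tv_sec) +
                     (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed / reps;
}
```

      Calling pingpong_rtt once per message size gives one point per size on a time-versus-size curve like the ones described below.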

  • Graphs and such

    Note: If you wish to just view the graphs, you can use the graphomatic.

    • Reference Platforms

      The issue at hand is tied to Linux: although it could possibly be the fault of LAM, it occurs only on systems running Linux 2.2.x. The test programs have been run on several platforms other than Linux 2.2.x to show this.

      • Solaris

        To have data from a non-Linux platform to compare the Linux data against, the pingpong.c program was run on Solaris 2.6 (sparc-sun-solaris2.6) running LAM 6.2b, and on Solaris 2.5.1 (sparc-sun-solaris2.5.1) running LAM 6.3-b2.
        • Solaris 2.6, LAM 6.2b
          • -lamd mode

            The -lamd mode is the default mode that LAM uses to run programs. Every LAM process communicates with the local lamd, which takes care of passing messages to other processes. The graph shows a simple curve: fairly constant time to send a single message until a certain point (10^4 in this case) is reached, after which sending time grows with message size at an expected rate.

          • -c2c mode

            The -c2c mode is the client-to-client mode: the local lamds are only responsible for starting up the processes, after which the processes communicate directly with each other, bypassing the LAM daemon. This graph again shows the same behavior as seen with -lamd, except that the sending times are a little lower (-c2c mode is generally faster).

        • Solaris 2.5.1, LAM 6.3-b2
          • -lamd mode

            Basically the same graph as for Solaris 2.6, except the 2.5.1 machines tested were on 10 Mbits/sec ethernet as opposed to 100 Mbits/sec for the 2.6 boxes, so we see slower sending times.

          • -c2c mode

            Basically the same graph as for Solaris 2.6, except the 2.5.1 machines tested were on 10 Mbits/sec ethernet as opposed to 100 Mbits/sec for the 2.6 boxes, so we see slower sending times.

      • Linux 2.0.x

        The odd performance problems we've witnessed in LAM occur only in Linux 2.2.x, not in Linux 2.0.x, indicating a potential problem in the Linux 2.2.x TCP/IP implementation. It is also possible that LAM is doing something odd that manifests itself as a performance problem only in Linux 2.2.x. This is what we are trying to determine.

        • MPI Test Program
          The next two graphs show the pingpong test program running on Linux 2.0.36, using LAM 6.3-b2. These tests were run on a dual-processor machine using loopback networking.
          • -lamd mode

            The -lamd mode is the default mode that LAM uses to run programs. Every LAM process communicates with the local lamd, which takes care of passing messages to other processes. The graph shows a simple curve: fairly constant time to send a single message until a certain point (10^4 in this case) is reached, after which sending time grows with message size at an expected rate.

          • -c2c mode

            The -c2c mode is the client-to-client mode: the local lamds are only responsible for starting up the processes, after which the processes communicate directly with each other, bypassing the LAM daemon. This graph again shows the same behavior as seen with -lamd, except that the sending times are a little lower (-c2c mode is generally faster).

          The next two graphs show the pingpong test program running on Linux 2.0.36, using LAM 6.3-b2. The test was run between two machines on the same 10 Mbit/sec ethernet segment.
          • -lamd mode

            The -lamd mode is the default mode that LAM uses to run programs. Every LAM process communicates with the local lamd, which takes care of passing messages to other processes. The graph shows a simple curve: fairly constant time to send a single message until a certain point (10^4 in this case) is reached, after which sending time grows with message size at an expected rate.

          • -c2c mode

            The -c2c mode is the client-to-client mode: the local lamds are only responsible for starting up the processes, after which the processes communicate directly with each other, bypassing the LAM daemon. This graph again shows the same behavior as seen with -lamd, except that the sending times are a little lower (-c2c mode is generally faster).

        • TCP Test Program
          tests still need running...
    • Linux 2.2 -- what is going on?

      • Linux 2.2.9 (and possibly lower)

        Two versions of Linux 2.2.x were tested: 2.2.9 and 2.2.10. The graphs we see for 2.2.9, together with reports we've seen elsewhere, suggest that earlier 2.2.x versions behave similarly to 2.2.9.
        • MPI Test Program
          Using our pingpong program we see a large performance drop when using -c2c mode in Linux 2.2.x, x <= 9.
          • loopback, -lamd mode

            This graph shows a run of the pingpong program under Linux 2.2.9, using LAM 6.3-b2, using loopback networking on a dual-processor machine. lamd mode is used, and the graph shows what we would expect.

          • loopback, -c2c mode

            This graph shows a run of the pingpong program under Linux 2.2.9, using LAM 6.3-b2, using loopback networking on a dual-processor machine. c2c mode is used, and the graph shows a huge performance drop between messages of size 64K and messages of size 128K.

          • 10 Mbit/sec, -lamd mode

            This graph shows a run of the pingpong program under Linux 2.2.9, using LAM 6.3-b2, using two machines on the same 10 Mbit/sec ethernet segment. lamd mode is used, and the graph shows what we would expect.

          • 10 Mbit/sec, -c2c mode

            This graph shows a run of the pingpong program under Linux 2.2.9, using LAM 6.3-b2, using two machines on the same 10 Mbit/sec ethernet segment. c2c mode is used, and the graph shows a performance drop between messages of size 64K and messages of size 128K. It is not as noticeable as the one in the graph above that uses loopback networking, mainly because there is more network overhead to mask the drop.

        • TCP Test Program

          The TCP Test Program can be compiled to use either regular read/write function calls, or the readv/writev calls that LAM uses. Both experience an anomaly at 2K bytes.
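
          The difference between the two build options can be sketched as follows. Both functions are hypothetical and are not taken from LAM; the split of a message into a header and a body is an assumption made only to give writev two buffers to gather.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send header and body with separate plain write() calls,
 * retrying after short writes. Returns 0 on success, -1 on error. */
int send_with_write(int fd, const void *hdr, size_t hlen,
                    const void *body, size_t blen)
{
    const char *parts[2] = { hdr, body };
    size_t lens[2] = { hlen, blen };
    for (int i = 0; i < 2; i++) {
        const char *p = parts[i];
        size_t left = lens[i];
        while (left > 0) {
            ssize_t w = write(fd, p, left);
            if (w <= 0) return -1;
            p += w;
            left -= (size_t)w;
        }
    }
    return 0;
}

/* Send header and body with one gathered writev() call,
 * retrying after short writes. Returns 0 on success, -1 on error. */
int send_with_writev(int fd, const void *hdr, size_t hlen,
                     const void *body, size_t blen)
{
    assert(fd >= 0);
    struct iovec iov[2];
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len = hlen;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = blen;

    int first = 0;                        /* index of first unsent buffer */
    while (first < 2) {
        if (iov[first].iov_len == 0) { first++; continue; }
        ssize_t w = writev(fd, &iov[first], 2 - first);
        if (w <= 0) return -1;
        size_t done = (size_t)w;
        /* Skip buffers that were sent completely. */
        while (first < 2 && done >= iov[first].iov_len) {
            done -= iov[first].iov_len;
            first++;
        }
        /* Advance into a partially sent buffer. */
        if (first < 2) {
            iov[first].iov_base = (char *)iov[first].iov_base + done;
            iov[first].iov_len -= done;
        }
    }
    return 0;
}
```

          Both functions put the same bytes on the wire; they differ only in how many system calls are made and how the kernel sees the data arrive.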

          • Using readv/writev

            Using readv/writev, two spikes are evident at message sizes of 2K and 16K bytes. The rest of the graph follows a predictable curve, but it takes as long to send a 2K or 16K byte message as it does to send a 32K byte message.
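
            One simple way to pick such spikes out of a list of timings is to look for interior points that are slower than both neighbors, since the expected curve never decreases with message size. The sketch below assumes the timings are stored in order of increasing message size; find_spikes is a hypothetical helper, not part of the test programs.

```c
#include <assert.h>
#include <stddef.h>

/* Store in spikes[] the indices i where times[i] is strictly greater than
 * both times[i - 1] and times[i + 1]. Returns the number of indices found,
 * which never exceeds capacity. */
int find_spikes(const double *times, int n, int *spikes, int capacity)
{
    assert(times != NULL && spikes != NULL);
    int found = 0;
    for (int i = 1; i + 1 < n && found < capacity; i++)
        if (times[i] > times[i - 1] && times[i] > times[i + 1])
            spikes[found++] = i;
    return found;
}
```

            The first and last points are never reported, because each has only one neighbor to compare against.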


          • Using read/write

            Using regular read and write, there are still spikes at messages of size 2K and 16K bytes, but they are not nearly as large as the spikes when using readv/writev.


      • Linux 2.2.10

        The 2.2.10 release of Linux apparently included several fixes and changes to the networking code, including a fix to the TCP_NODELAY flag. This seems to have fixed some of the bad behavior we witnessed under 2.2.9 and lower.
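
        For reference, TCP_NODELAY is the socket option that disables Nagle's algorithm, so small writes are sent immediately rather than being coalesced. The sketch below shows how a program would typically set and check it; the helper names are hypothetical, and nothing here is taken from LAM's source.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on error. */
int set_nodelay(int fd)
{
    assert(fd >= 0);
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Report whether TCP_NODELAY is set: 1 if set, 0 if not, -1 on error. */
int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

        The option is set per socket, so a program that opens several connections has to set it on each one.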
        • MPI Test Program
          Using our pingpong program, we see that the large performance drop seems to have disappeared in Linux 2.2.10. However, we also see that lamd and c2c modes have nearly identical performance.
          • loopback, -lamd mode

            This graph shows a run of the pingpong program under Linux 2.2.10, using LAM 6.3-b2, using loopback networking on a dual-processor machine. lamd mode is used, and the graph shows what we would expect.

          • loopback, -c2c mode

            This graph shows a run of the pingpong program under Linux 2.2.10, using LAM 6.3-b2, using loopback networking on a dual-processor machine. c2c mode is used, and the graph shows results very similar to the above lamd graph, except for very small message sizes where c2c mode outperforms lamd mode by a small margin.

          • 10 Mbit/sec, -lamd mode

            This graph shows a run of the pingpong program under Linux 2.2.10, using LAM 6.3-b2, using two machines on the same 10 Mbit/sec ethernet segment. lamd mode is used, and the graph shows what we would expect.

          • 10 Mbit/sec, -c2c mode

            This graph shows a run of the pingpong program under Linux 2.2.10, using LAM 6.3-b2, using two machines on the same 10 Mbit/sec ethernet segment. c2c mode is used, and the graph shows results very similar to the above lamd graph, except for very small message sizes where c2c mode outperforms lamd mode by a small margin.

        • TCP Test Program

          The TCP Test Program can be compiled to use either regular read/write function calls, or the readv/writev calls that LAM uses. Both experience an anomaly at 2K bytes.

          • Using readv/writev

            Using readv/writev, a huge spike is evident at a message size of 2K, with a minor spike at a message size of 16K. The rest of the graph follows a predictable curve, but it takes as long to send a 2K byte message as it does to send a 256K byte message.


          • Using read/write

            Using regular read and write, there are still spikes at messages of size 2K and 16K bytes, but they are not nearly as large as the spike when using readv/writev.


    • Related Links

      These are mainly links to posts to mailing lists about issues that might be related to this.
    • Conclusion

      The 2.2 series of the Linux kernel apparently messed up part of its TCP/IP implementation. This caused performance problems in the LAM/MPI software. Linux 2.2.10 appears to have fixed this problem, since LAM performance in 2.2.10 is comparable to LAM performance under 2.0.36. We have not yet found a way to explain the weird spikes we see in the TCP test program, even in 2.2.10, but since LAM behaves properly under 2.2.10, the LAM Team will probably not try to fix LAM's behaviour on 2.2.x versions of Linux (x < 10).

      The problems experienced by LAM under Linux 2.2.x have also been experienced in other pieces of software, including MPICH, another free MPI implementation.

      If you have further info you can enlighten us with, or can suggest ways in which we can further investigate this phenomenon, please contact us.