Last updated: Monday, July 18, 2022 at 12:51AM
This page gives a brief overview of what's in Subversion that's not in
the current stable release of LAM/MPI (listed in more-or-less reverse
chronological order):
- Add
gfortran to the list of Fortran compilers searched
in configure.
- Work around apparent alignment issue in OS X 10.5's
CMSG_DATA() macro when compiling on Intel Macs in
64 bit mode, which would cause mpirun to fail with
stdio errors. Also properly initialize msg_flags
in the sfh_send_fd() and sfh_recv_fd() calls.
- Fix a number of places in configure where
main() was
being improperly declared due to Autoconf's square
bracket eating. Thanks to Jeff Squyres for the patch.
- Add
-Wl,-search_paths_first to wrapper LDFLAGS on OS X
to combat issue with OS X linker finding Open MPI's
libmpi.dylib in /usr/lib instead of LAM/MPI's libmpi.a
due to the dynamic-first search policy.
- Fix Fortran interface functions for
MPIL_Trace_on and
MPIL_Trace_off .
- Clean up shared library dependencies to support use of
--as-needed linker option on GNU systems. Thanks to
Justin Bronder for bringing this to our attention.
- Released LAM/MPI 7.1.4
These changes are
available both on the Subversion trunk and the tags/lam-7-1-4
Subversion tag.
- Prevent some batch schedulers (BJS, LANL's BProc + MOAB)
from killing the lamds when lamboot exits by keeping a child
of lamboot around for the life of the lamds.
- Properly escape SSI parameters and pass SSI parameters to
lamboot when using mpiexec. Also use
/tmp or $TMPDIR for
the app schema. Thanks to Sam Steingold for bringing this
to our attention.
- Allow user to disable building the TM or SLURM boot ssi
module, even if the libraries are available on the system.
Thanks to Jens Klostermann for bringing this to our
attention.
- Fix compile issue on NetBSD 3.0 and later. Thanks to Aleksey
Cheusov for the patch.
- Properly handle slurm clusters where all nodes do not have the
same prefix in a hostname. Thanks to Moe Jette for the patch.
- Released LAM/MPI 7.1.3
These changes are
available both on the Subversion trunk and the tags/lam-7-1-3
Subversion tag.
- A number of man page cleanups suggested by Eric Raymond.
- Search for tkill, in addition to the default install
location and /bin. Also, do not segfault if tkill
is not found after searching all these locations. Thanks to
Josh Lehan and Jeff Squyres for the patch.
- Abort rather than hang if lamboot is unable to get the list of
local network devices.
- Fix for hangs in 64 bit builds on Mac OS X systems (Intel and PowerPC).
- Correct check for localhost in hostfiles during lamboot to check
for 127.0.0.0/8 instead of 127.0.0.1/32, to meet RFC 1700. Thanks
to Martin Knoblauch for the patch.
- Added support for Fortran types
MPI_REAL{4,8,16} for predefined
reduction operations supported by floating point types (MPI_MAX ,
MPI_MIN , MPI_SUM , MPI_PROD ).
- Fixed error in IB rpi that could cause compiler errors with some
compilers. Thanks to Jens Klostermann for bringing this to our
attention.
- Fixed error with
MPI_COMM_ACCEPT on Fedora 4 that would cause a
"bad address error". Thanks to Orion Poplawski for the fix.
- Renamed internal
strtonum function lam_strtonum to avoid clashes
with a function of the same name in FreeBSD.
- Fixed installation issue on Cygwin when trying to make symlinks to
executables (such as
lamwipe -> wipe ).
- Fix bug with comments in hostfiles where a comment in the middle of
a line would cause the entire line to be ignored. Thanks to
Christian Siebert for bringing this to our attention.
- Build the totalview queue debugging shared object as a dynamically
loaded shared object instead of a shared library. Fixes an issue
on Mac OS X where the TotalView library could not be found.
- Clean up restart logic in 'cr self' module. Add a bunch of documentation
regarding this module to the man page and user docs. Thanks to Jeff
Squyres for helping in this effort.
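The localhost check described above (127.0.0.0/8 rather than 127.0.0.1/32) can be sketched as follows. This is an illustration in Python using the standard ipaddress module, not LAM's C code; the function name is_loopback_v4 is hypothetical.

```python
import ipaddress

# The whole 127.0.0.0/8 block is loopback, not only the single host 127.0.0.1.
LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def is_loopback_v4(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside 127.0.0.0/8."""
    return ipaddress.ip_address(addr) in LOOPBACK_NET
```

With a /32 check, a hostfile entry that resolves to 127.0.1.1 would not be treated as the local host; with the /8 check it is.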
- Released LAM/MPI 7.1.2
These changes are
available both on the Subversion trunk and the tags/lam-7-1-2
Subversion tag.
- Fix MPI_COMM_SPAWN problem with app schema info keys.
- Fix assembly problem for AIX in 64 bit mode with usysv.
- Fix bad cast in C++ bindings with
MPI::Win::Create() that was
causing an invalid MPI_Comm to be passed to the underlying
MPI_Win_create() call.
- Workaround for newer MVAPI implementations that call
free() in
VAPI_deregister_mr(), which was causing hangs in certain situations.
As a result, sbrk() is not called with a negative value, so for
a very small number of applications, memory usage might be slightly
higher.
- Fixed references to
cr_base_dir in user docs -- the SSI parameter
is actually cr_blcr_base_dir.
- Fixed error that resulted in
wipe not properly using the session
directory prefix/suffix options.
- Fix two errors in ptmalloc2 code. The "TSD hack" was not
properly enabled, causing an infinite loop leading to a segfault.
Also, we were not properly intercepting
munmap() .
- Fix a bug deep within MPI_INIT that prevented using IMPI.
- Updated to GNU Libtool 1.5.22.
- Add another
setsid() in hboot to facilitate working in SGE
environments.
- Fixed a command-line parsing problem with
mpiexec .
- Re-enable a few virtual destructors in the C++ bindings.
- Updated to GNU Automake 1.9.6.
- Don't add external declarations for the
PMPI_W{TICK,TIME} functions
if profiling isn't enabled. It appears that some compilers (g95)
will try to resolve the symbols if they are prototyped.
- Added workaround for Apple's misinterpretation of the use of
semctl's 4th argument, as specified in IEEE Std 1003.1, 2004 Edition.
A correct reading says that the 4th argument when cmd is SETVAL
should be the specified union, with the val member having meaning. Apple
interpreted it to mean the 4th argument should be an integer. There
is a significant difference on big-endian LP64 machines. Note that
every other 64 bit big endian Unix (including Linux, Solaris, AIX,
and IRIX) takes the first interpretation.
- Added support for Fortran types
MPI_INTEGER{1,2,4,8} for predefined
reduction operations supported by Fortran integer types (MPI_MAX ,
MPI_MIN , MPI_SUM , MPI_PROD , MPI_BAND , MPI_BOR , MPI_BXOR ).
- Fix problem where singletons and jobs launched via mpirun could not
MPI_COMM_CONNECT / MPI_COMM_ACCEPT each other.
- Fix silly
putenv() mistake in hboot.c .
- Work around bug in
net/if.h header file on OS X 10.4 in 64bit mode
that was preventing ioctl(..., SIOCGIFCONF, ...) from working.
- Fixed corner case in the
rsh boot SSI module where we were invoking
.profile on the remote side for Bash shells, when it really wasn't
necessary (because bash will invoke .bashrc automatically).
- Properly handle
MPI_*_NULL for the MPI_*_c2f functions.
- Added missing
MPI_ROOT definition to mpif.h.in .
- Always populate
mpirun's MPIR_proctable structure so that
parallel debuggers can find all processes in the job. Previously,
the table was only populated if mpirun was also starting a parallel
debugger daemon on all the nodes (i.e., -tv was given as an option to
mpirun).
- Added access to the Fortran datatypes from the C
mpi.h .
- Change default behavior of
lamhalt to wait until all lamd s
are dead before returning. Add -i (immediate) option that
replicates older (deprecated) behavior -- lamhalt returns
immediately, most likely before the universe is completely halted.
- Added some missing man pages to LAM tarballs (
lamnodes ,
lamhalt ).
- Make the lack of a
PATH environment variable in hboot not be an
error.
- Added
and to various configure tests that link
Fortran executables so that systems with icky Fortran installations
can add additional linker flags / libraries. Documented that OS X
Tiger (10.4) users should probably add LIBS=-lSystemStubs to their
configure line because gfortran doesn't do it automatically, and not
having it will cause several of our tests to incorrectly fail due
to missing symbols.
-
MPI_GET_VERSION checked to see if MPI_INIT had already been called,
which is erroneous (MPI-2, 3.1).
- Somehow the processing for
lamboot 's -b option was removed; fixed.
- Per advice from the ROMIO maintainers, remove some NFS locking tests
from
romio/configure .
- Fixed a problem with the upper and lower bounds when creating DARRAY
datatypes.
- Fixed number of prefix 0's generated in SLURM host lists.
- Fixed
MPI_ALLTOALLW fortran wrapper.
- Fixed a potential infinite loop in
tkill if some system calls
returned bogus values.
- Make BSD systems have a default of "none" for the memory
manager.
- The SLURM boot SSI module help messages all accidentally had the
wrong filename, so if anything ever went wrong, no help message
would be printed. Fixed.
- Workaround for a bug in some versions of gcc that masked the
debugging definition of
struct _proc , which caused problems when
using the TotalView debugger.
- Changed LAM Basic
MPI_Bcast binomial tree algorithm to complete
send to one process before starting the next send, resulting in
much better performance in some situations.
- Fixed bug in
smp collective module that would cause corrupted
collective operation if multiple communicators with different
sizes were created.
- Fixed bug that allowed multi-word
values to accidentally
propagate to incorrect places (like WRAPPER_EXTRA_LDFLAGS ).
- Fix ROMIO test for Fortran linking convention on OS X by using
nm instead of strings on that platform.
- Update to the Totalview docs for TV v6.6.
- Fixed problem with zero padding in Slurm host list parsing.
- Add support for BProc implementations without
bproc_vexecmove
support.
- Fixed bad egrepping in configure to snarf
LDFLAGS and LIBS from
generated Makefiles.
- Fixed errors in pthread tests that could result in incorrect
flags being set for threading when f77 is used for linking. Also
fixed an error where the Linux pthreads test could give a harmless
false positive.
- Converted IMPI coll module to coll API v1.1.0.
- Fixed verbosity in coll selection to print the module that was
selected, not the last module that was examined.
- Fix problem with the
usysv RPI on the Apple G5 platform.
The G5 can reorder writes to improve memory performance, which was
causing failures in the synchronization routines. Added sync
instruction to force data / lock writes to be ordered.
- Expanded
ib RPI configure tests to look for vapi.h and the
VAPI libraries in odd places for some IB implementations.
- Fix some forgotten / bit-rot compile errors in the
impi coll
module.
- Set a number of LAM daemon sockets to be close on exec to
eliminate wasted file descriptors in clients
- Reordered
tkill shutdown to better support platforms with
/tmp in NFS.
- Patch
libtool to recognize Portland C compilers so that snarfing
flags from libtool does the right thing in the MPI wrapper
compilers.
- Fixed compile problem with recent
gcc versions (missing
#include in a really old source file).
- Fix a problem with some ancient F77 compilers and remove all
single and double quotes from
mpif.h .
- Fix a problem inadvertently caused by bug 682: instead of trying to
rectify
crmpi modules that are sent by MPI processes to the spawning
agent, simply disallow MPI_COMM_SPAWN'ed processes from being
checkpointable.
- LAM no longer examines the
(argc, argv) that comes in from
MPI_INIT because it can cause problems in some scenarios.
- Fix an uninitialized variable that can cause seg faults in the
rsh
boot SSI.
- Re-enable stdin for rank 0; this was accidentally disabled in 7.1.
- Add a configure test for
inet_ntop() in the slurm boot module so
that environments that do not have that function (e.g., Cygwin) will
not attempt to compile the slurm module.
- Fix wrapper compiler
LDFLAGS and LIBS .
- Updated User Guide to clarify the
ib RPI module scalability
restrictions.
- Re-enable a virtual destructor for
MPI::Comm_null .
- Escape linker flags added to the wrapper compilers'
LDFLAGS to
support the OS X malloc intercept code with -Wl, . The XL compilers
were getting confused by the -u _lam_darwin_malloc_linker_hack
option when it was passed to them.
- Add
zsh to the list of shells that do not have the .profile script
explicitly run for lamboot . This list includes csh -derived shells
and bash , as both have a set of scripts run for non-interactive
logins.
- Fix missing space in test for the existence of a
.profile script
when using an sh-derived shell.
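The binomial tree used by the LAM Basic MPI_Bcast entry above can be illustrated with a small sketch. This is Python, not LAM's C code, and it assumes a broadcast rooted at rank 0; the function name binomial_children is hypothetical. It only computes which ranks a given rank forwards to. In the blocking variant described in that entry, a rank would finish the send to one child before starting the send to the next.

```python
from typing import List


def binomial_children(rank: int, size: int) -> List[int]:
    """Children of `rank` in a binomial broadcast tree rooted at rank 0."""
    # Find the lowest set bit of rank (or the first power of two >= size
    # for rank 0). Bits below it select this rank's children.
    mask = 1
    while mask < size and not (rank & mask):
        mask <<= 1
    mask >>= 1

    children = []
    while mask > 0:
        child = rank | mask  # the bit is clear in rank, so this is rank + mask
        if child < size:
            children.append(child)
        mask >>= 1
    return children
```

For an 8-process job, rank 0 forwards to ranks 4, 2, and 1 in that order, rank 4 forwards to 6 and 5, and every non-root rank receives the message exactly once.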
- Released LAM/MPI 7.1.1
These changes are
available both on the Subversion trunk and the tags/lam-7-1-1
Subversion tag.
- Upgraded to Libtool v1.5.8.
- Added
rpi_ib_mtu SSI param (see User Guide for more info).
- Fixed minor problem with
ib RPI startup code that prevented it from
working on some vendor IB stacks.
- Fixed problem where
--export-dynamic showed up in the wrapper
compiler underlying commands.
- Don't emit warning on
stderr and abort if we get a
permission denied when killing a process with tkill. If the lamd
dies uncleanly, it is possible for another process (possibly belonging to
another user) to end up with that lamd's pid, which will cause
tkill to have problems later (if the pid is another user's).
- Released LAM/MPI 7.1
These changes
are available both at the Subversion trunk and the tags/lam-7-1
Subversion tag.
- Add the --with-memory-manager=external flag that allows LAM
to be configured to allow external triggering of its sbrk()
interception code. See the LAM/MPI Installation Guide release notes
on Myrinet and Infiniband for more details.
- Added first version of Infiniband RPI module (
ib ).
- Fix a problem where
$includedir/lam_config.h may end up with
permissions affected by the installer's default umask instead of
being set to 0644.
- Added preliminary support for the upcoming BProc 4.0 release.
- Added ability for
mpirun to start applications that have execute but
not read permissions. Only works if the -s option is not given to
mpirun . Also
fixed path searching problem when ./test was specified as command to
mpirun.
-
mpirun is now better about returning non-zero in the cases where
the launched job aborts before calling MPI_INIT.
- Add support for ptmalloc2 and Apple Darwin/OS X memory managers when
catching deallocations for unpinning user memory.
- Added possibility of using
IMPI_HOST_NAME environment
variable for external name publishing.
- Added support for optional MPI datatypes
MPI_INTEGER1,
MPI_INTEGER2, MPI_INTEGER4, MPI_REAL4, and MPI_REAL8. Added
support for non-existent MPI datatypes (!) MPI_INTEGER8,
MPI_REAL16.
- Added
boot_rsh_ignore_stderr SSI parameter for users too
lazy to fix their "dot" files. :-)
- Added SLURM boot SSI module.
- Added support for run-time dynamically loaded SSI modules. A
LAM installation can therefore be extended by simply adding a shared
library SSI module into a specific directory.
- Various gm RPI fixes:
- Added
--with-rpi-gm-lib option to specify a non-default location
for the GM library.
- Fix for incorrectly handling when gm dropped packets.
- Performance improvements in the
gm RPI; no more "short"
message protocol -- only "tiny" and "long".
- Added "fast" support for the gm rpi module, although it's
unreliable for communication-intense applications (and therefore
disabled by default).
- Support for building the gm rpi module dynamically.
- The gm RPI module now supports checkpoint/restart (must set the
rpi_gm_cr SSI parameter to 1).
- Enable experimental use of the gm 2.x
gm_get() function for long
messages when explicitly asked for with the --with-rpi-gm-get
configure switch.
- Added smp-aware collective algorithms for the following MPI
functions:
MPI_ALLGATHER , MPI_ALLGATHERV , MPI_REDUCE_SCATTER ,
MPI_SCAN
- Added new MPI functions:
MPI_EXSCAN and MPI_ALLTOALLW .
- Added
mpi_hostmap SSI parameter to transform the IP addresses
supplied by the LAM run-time environment to an alternate set of
addresses that will be used for MPI communications.
- Added option "
-prefix " in lamboot
and lamwipe to allow users to switch between LAM installations
without having to modify their local environments.
- Added
prefix parameter for the rsh boot module boot schema
files to allow users to specify different LAM installation paths on
different nodes.
- Added a new
MPI_COMM_SPAWN info key
(lam_no_root_node_schedule ) to disallow processes to be spawned on
the root node.
- Wrapper compilers now do not add any additional flags unless
there is at least one argv that does not begin with "-" (e.g.,
"mpicc -v" will not add any additional LAM/MPI-specific
flags).
- Added
options:cxx_exceptions output in laminfo to indicate whether
LAM was configured --with-cxx-exceptions or not.
- Added
-param option to laminfo to display available SSI parameters
and their default values.
- Added
-showme:compile and -showme:link flags to the wrapper
compilers for printing out the compiler and linker flags,
respectively. For example "cc foo.c `mpicc -showme:compile` " and
"cc foo.o `mpicc -showme:link` -o foo ".
- Renamed "
wipe " command to "lamwipe " per request from the
Mandrake Cooker team. The name "wipe " is now deprecated, and will
be removed in some future release.
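The wrapper compiler rule noted above (no extra flags unless at least one argument does not begin with "-") can be sketched as follows. This is an illustration in Python, not the wrapper compilers' actual source; the function names and the example flag list are hypothetical.

```python
def should_add_wrapper_flags(args):
    """True if at least one argument does not start with '-'."""
    return any(not arg.startswith("-") for arg in args)


def build_command(compiler, args, extra_flags):
    """Return the underlying compiler command line as a list of strings."""
    cmd = [compiler] + list(args)
    # Only append the MPI-specific flags when there is something to
    # compile or link, e.g. not for a bare "-v".
    if should_add_wrapper_flags(args):
        cmd += list(extra_flags)
    return cmd
```

Under this rule, an invocation with only "-v" is passed through untouched, while an invocation naming "foo.c" gets the extra flags appended.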
- LAM/MPI 7.0.7 (unreleased; all included in
7.1)
These changes are available both at the
Subversion trunk and the branches/branch-7-0 Subversion
branch.
- Removed the reset of the
MAKE macro in romio/Makefile.in that
disallowed using a make other than the one found at configure time.
- Fixed some missing
header files that caused unresolved
symbols on some platforms.
- Added possibility of
--without-exflags to force
not using any special C++ exception compiler flags.
- Fix man page sections.
- Only execute
.profile if it exists in the rsh module.
- Released LAM/MPI 7.0.6
These changes
are available both at the Subversion trunk and the tags/lam-7-0-6
Subversion tag.
- Fixed error in lamnet code used to find available interfaces when
we don't pre-allocate enough space.
- Fixed ordering of
LAM_SESSION_SUFFIX and batch system ID
evaluation when determining the session directory suffix.
- Released LAM/MPI 7.0.5
These changes
are available both at the Subversion trunk and the tags/lam-7-0-5
Subversion tag.
- Fix an obscure race condition that could occur if running in a LAM
universe with more than 255 nodes.
- Make
getorigin() and getnodeid() return proper value in
_kio , if they are called before kenter() , based on the pids.
- Fix the calculation of upper and lower bound of datatype which
is used for the calculation of extent. The fix handles the cases
where the block size is 0 and it is the first or the last block of
the datatype.
- Fix the value of
MPI_ERRCODES_IGNORE to be a (int *) 0
instead of (void *) 0 .
- Add fix to set TCP socket buffer size to run-time value of
ssi_rpi_tcp_short / ssi_rpi_crtcp_short in all rpi modules, as
relevant.
- Change the
lam-helpfile to correct the error in lamboot
synopsis. Add -s and delineate the options -bdhHlsvVx . Also
correct all those cases where all options were lumped together.
- Fix minor prototype problem with
lam_ksignal() .
- Make network interface code allow for arbitrary numbers of
interfaces on the localhost.
- Add dependent libraries for the PBS TM library on Solaris.
- Fixes to the SGE detection logic for the session directory.
- Updates to documentation about Globus module.
- Released LAM/MPI 7.0.4
These changes are
available both at the Subversion trunk and the tags/lam-7-0-4
Subversion tag.
- Update docs to reflect true behavior of
LAM_MPI_SESSION_PREFIX .
- Do not propagate
LAM_MPI_SESSION_PREFIX via mpirun .
- Fixed
crtcp rpi deadlock handling for deferred writes during a
checkpoint in the presence of other blocking reads.
- Better fix for Libtool 1.5 broken
icc -c /-o test; patch the
generated configure script to remove the bad commands.
- Fixed minor typo in
blcr cr module configure scripts.
- Released LAM/MPI 7.0.3
These changes are
available both at the Subversion trunk and the tags/lam-7-0-3
Subversion tag.
- Minor fixes with bad
printf() formats in the kenyad and
blcr /crlam .
- Workaround for
libtool 1.5 bug with the Intel compiler (libtool
didn't think that icc supported -c and -o at the same time).
- Changed
LAM_CONFIGURE_* macros from -D command line
arguments to #define 's to prevent problems with some compilers
that don't like -D values with embedded spaces.
- Changed search order for Fortran compilers to look for GNU
g77 before f77 so that the default matches the defaults for the
C and C++ compilers.
- Removed
LAM_NEED_SYS_SELECT_H , instead including
sys/select.h any time it is available.
- Updated SYS V semaphore and shmem tests to check for
functionality. Adds
-lrt (Solaris) and -lcygipc (Cygwin) if
needed.
- Added configure switch
--with-fd-setsize to increase the
size of an FD_SET and increase the soft per-process file
descriptor limit on platforms that support such things. This should
allow larger TCP LAM jobs on. Be sure to read the release notes for
your platform before using this option.
- Released LAM/MPI 7.0.2
These changes are
available both at the Subversion trunk and the tags/lam-7-0-2
Subversion tag.
- Fixed a problem in LAM's distribution scripts that accidentally left
out the
gm RPI from the 7.0.1 tarballs.
- Released LAM/MPI 7.0.1
These changes are
available both at the Subversion trunk and the tags/lam-7-0-1
Subversion tag.
- Removed legacy function
panic() because it conflicts with a function
in OS X's system headers with the same name.
- Fixed a problem with the
sbrk() declaration in ptmalloc.c and the
Portland C compiler.
- Fixed a problem with the
boot_rsh_agent SSI parameter not being
recognized properly.
- Fixed a problem with mpirun's default running with tracing enabled.
Tracing is now only enabled if
-t , -ton , or -toff is specified on the
mpirun command line (see mpirun(1) for more information).
- Fixed a memory leak when freeing a datatype created by
MPI_Type_create_hindexed .
- Fixed a minor problem with the
cr_base_dir SSI parameter.
- Fixed a couple of problems with duplicate symbols on OS X when
using the Fortran bindings.
- Fixed thread configure tests to test a much wider variety of thread
compiler and linker flags.
- Ensure that relevant compiler and linker flags are propagated properly
to SSI configure scripts so that we compile all of LAM with the same
flags.
- Added support for GM-2.x in the
gm rpi module.
- Removed errant "-" typo in
MPI_Intercomm_merge .
- Minor #include fixes for FreeBSD 4.x.
- Made the tests for
getsockopt() and recvfrom() more robust.
- Fixed a problem with opening unix sockets with really long filenames
(e.g., in PBS Pro environments).
- Add
--with-romio-libs=LIBS to allow passing of arbitrary
LDFLAGS /LIBS args down to the environment of ROMIO's configure
script and also into the wrapper compilers. e.g., when building
ROMIO with PVFS support, "-lpvfs" needs to be added in both places.
- Released LAM/MPI 7.0
These changes are
available both at the Subversion trunk and the tags/lam-7-0
Subversion tag.
- Allow the internal "name" (argv[0] to underlying MPI_Init) for
FORTRAN programs to be overridden by the environment variable
LAM_MPI_PROCESS_NAME.
- Fixed file descriptor leak for non-MPI processes (and MPI procs
that did not exit properly) in the lamd.
- Added
mpiexec for portable MPI process startup (described in
MPI 2 standard). mpiexec also has support for "one shot"
lamboot , mpirun and lamhalt .
- Restore umask to original value when launching application
from the
lamd , as the lamd runs with a umask of 077.
- Updated ROMIO to v1.2.5.1. Revamped ROMIO configure/build
process to be better integrated with LAM.
-
bproc boot SSI support added; can now lamboot on bproc
clusters (still launches a lamd on every node). Added bonus that
"mpirun C|N foo " will, by default, not run on the bproc head
node.
-
lamnodes now reports per-node flags, such as "origin ",
"this_node ", and "no_schedule ".
- Re-activated long-unused feature in LAM to not schedule MPI
and serial processes on selected nodes. For example, you can
lamboot on a head node and some compute nodes and have "mpirun C
foo " only run on the compute nodes.
- Added new
laminfo command to get detailed information about
LAM's configuration, including available SSI modules and their
various version numbers.
- Added support for attaching TotalView debugger to MPI
processes launched by mpirun, including support for the
partial-attach feature provided by TotalView. Also includes support
for examining message queues.
- MPI collectives have been SSI-ized. The LAM collective
algorithms have been moved into a module named
lam_basic . See
lamssi_coll(7) .
- Increased the number of MPI tags and communicator contexts
available in all RPIs where this was possible (i.e., everything
except
lamd ). MPI jobs that do not use the lamd RPI will now
automatically get use of more MPI tags and simultaneous
communicators. Additionally, increased the efficiency of the
communicator context ID allocation algorithm (at the expense of
communication efficiency during communicator construction).
- Try to use
-pthread when compiling with POSIX threads and
GNU compilers, since many Linux / BSD-flavored distributions include
this flag in the local configurations. Failing that, fall back to
-D_REENTRANT and -lpthread .
- When the LAM daemon is killed by
SIGTERM , it will gracefully
kill all of its sub-processes, release all of its resources, and die
nicely (as opposed to just dying).
- LAM will use the
$TMPDIR environment variable to determine
where to create temporary files.
- Added "promiscuous" and "expected" modes for base SSI boot
protocols, where connections are accepted from any IP address or
only from the IP addresses listed in the boot schema,
respectively.
- The back-end process for
lamboot (and friends) have been
SSI-ized with the "boot" SSI kind. See lamssi_boot(7) .
Currently have two boot modules available: rsh (which also does
ssh ) and tm (for PBS).
- Added the MPI-2 C++ bindings implementation for
MPI::Win .
- Added
--with-wrapper-extra-ldflags option to configure that
parses the output of libtool to get the extra compiler/linker
flags and put them into the wrapper compilers (e.g., shared library
run-time search path).
- The
memcpy() in glibc performs poorly if the copy size is
not divisible by 4. Added a workaround to significantly increase
LAM's shmem RPIs and unexpected message buffering performance in
these cases, as well as command line configure switches to
enable/disable this behavior (--with-prefix-memcpy and
--without-prefix-memcpy).
- Changed the bit mapping in error codes that are used in MPI
because the field specifying the MPI function was only 8 bits, yet
there are 300+ functions in MPI. This unfortunately changes the bit
mapping of the errorcode argument in
MPI_ABORT ; see the
MPI_Abort(3) man page for more information.
- Added functionality per MPI-2:4.8 -- attributes added to
MPI_COMM_SELF will be deleted as nearly the first thing in
MPI_FINALIZE , effectively allowing user-specified functions during
MPI_FINALIZE .
- Updated BSD4.4 file descriptor passing to fit expected use.
- Removed
MPIL_Spawn (LAM-specific, pre-MPI-2 spawn call).
- MPI thread support is now
MPI_THREAD_SERIALIZED. We don't
enforce any distinction between FUNNELED or SERIALIZED, so it is
possible to write a threaded application that runs fine on LAM but
causes issues on other platforms.
- Add checks for if running under an LSF job, and automatically set
the socket suffix to be the LSF job ID (a la how PBS jobs are already
handled).
- Print out friendly error message from wrapper compilers if underlying
compiler isn't found.
- Update for MPI 2.1 errata:
MPI_GET_COUNT behavior with respect to 0
byte datatypes now returns 0 (vs. MPI_UNDEFINED ) when 0 data bytes
have been transferred.
- There are now lots of run-time tunable parameters for the various
RPIs. See the
lamssi_rpi(7) man page for a list of the tunable
parameters that can be passed in to each RPI.
- The first System Services Interface (SSI) kind has been added -- the
RPI layers have been converted to SSI. Now all available RPI's are
compiled in simultaneously and you can choose which to use at
run-time. See the
mpirun(1) , lamssi(7) , and lamssi_rpi(7) man
pages.
- Fixed a problem where the IMPI client was not properly endianizing
IMPI_CMD_FINI before sending it to the IMPI server.
- Fixed a problem where if
$prefix is /usr , hf77 would
complain that it could not find the ROMIO and MPI-2 C++ libraries.
This isn't too important for 6.6.x since we've totally re-written
the wrapper compilers, but we record the bug fix anyway.
- Completely rewrote the Myri/gm RPI. It's smaller, faster, and
generally mo' better.
- Only install lam-bhost.def if one does not exist in
$(sysconfdir) .
- Renamed
lam-conf.lam and lam-conf.otb to lam-conf.lamd and
lam-conf.separate to make the meanings more obvious and less
confusing with the corresponding lam-bhost.* files. Renamed
lam-conf.lam to be lam-conf.example to make its purpose more
obvious, and no longer install it under $(sysconfdir) .
- Added the MPI-2 C++ bindings implementation for
MPI::Info .
- Fixed a problem in the main lamd kernel on NetBSD where
select() will zero out fd_sets even on "accepted" failures.
- Fixed minor issue with
show_help() that could cause problems
for help messages with large numbers of arguments.
- Added the "C++ only" datatypes specified in the MPI-2
standard:
MPI_BOOL, MPI_COMPLEX, MPI_DOUBLE_COMPLEX, and
MPI_LONG_DOUBLE_COMPLEX, as well as the built-in operands
specified in the standard. Note that the complex types will *only*
work if the implementation of complex<float> allows casting to
struct { float r; float i; }; (and likewise for double and
long double). This seems to be the case everywhere we have
seen.
- Fixed a couple small problems that prevented running the lamd
as a group of processes. Moved
-b from $inet_topo to
$socket_suffix in the lam-conf files.
- Add version checking into the LAM commands and
MPI_INIT . If a user
attempts to run a LAM or MPI program that does not match the version
of the lamd that is running, a warning message will be displayed and
the program will bail.
- Changed name of binary for C++ compiler to
mpic++ . On most systems,
there will be a symlink from mpiCC -> mpic++ . On systems without
a case sensitive file system (like HFS+ on Mac OS X), this symlink
will not be created, as it conflicts with mpicc .
- Removed linking to the C++ bindings when using
mpicc and
mpif77 because this creates a problem when using gcc 3.0, and it
doesn't make sense anyway.
- Able to finally remove the
automake_bogosity.(c|h) files
and extra noinst_HEADERS /noinst_PROGRAMS rules from various
directories/Makefile.am 's.
- Add
-nn and -np options to lamboot , recon , and wipe
to prevent adding "-n " to the remote agent command line and to
prevent the execution of $HOME/.profile on the remote side, even
if the remote shell is Bourne.
- In addition to the syslog, send
lamd debugging output to the
lam-debug-log.txt file in the LAM session directory. This is
particularly helpful since many Linux distributions do not allow
normal users to view the syslog.
- Related to the note below (LAM session directory located on a
networked filesystem), add a workaround in the
flatd when
attempting to open a new flatd temp file in the LAM session
directory with O_APPEND . If the first attempt to open a new file
fails, try again without O_APPEND .
- Fix the
lamd kernel to set the kill file to be
close-on-exec so that it is not inherited by child processes (this
can cause a problem during lamhalt if the LAM session directory is
on NFS -- tkill will inherit the open file descriptor and then
remove the file. NFS will then create a ".nfsXXXXX" cache file
entry, which will prevent the removal of the directory).
- Change LAM's registry to not depend on the
O_EXCL flag to
open() . Use an alternative locking mechanism if it is determined
(at run time) that O_EXCL will not work in the LAM session
directory. This can happen when the LAM session directory is on a
networked filesystem.
- Pass "
-d " to tkill during lamboot (through hboot ) if
lamboot was invoked with "-d ".
- Some fixes to the
gm RPI, particularly with respect to
allocating and freeing memory.
- Add specific error message for the case where the
gm RPI is
unable to allocate a gm port. This is much more helpful than an
amorphous "something went wrong during MPI_INIT" message.
- Made
lamhalt more robust: it now times out (after 15
seconds) if it doesn't receive all the HALT ACKs that it
expects, and prints an error message indicating which nodes
it didn't get ACKs from.
- Various minor improvements in the build system.
- Integrated the C++ bindings into the configure/build system
better.
- Revamped the configure system for future extensibility. Updated
build system to use Autoconf 2.52, Automake 1.5, and Libtool 1.4.2
(or higher).
- In
share/etc/kill.c , kill off LAM directory with rmdir() ,
not remove() - it appears that MacOS X will not allow remove() to
be called on a directory.
- Removed use of
.so nroff "include" directive in man
pages; it didn't work on all platforms. Also updated some text in
the mpicc and mpif77 man pages.
- Added a specific check to ensure
MPI_INIT is not called
after MPI_FINALIZE . This is a special case of the check that no
MPI function was called after MPI_FINALIZE , as new users tend not
to realize that you can't re-INIT a process.
- Released LAM 6.6b1
- Removed the LAM-version-checking code from the
mpi2c++
bindings; they're really not necessary since we're inside LAM
anyway.
- Fixed ambiguity of
RTF_KENYA flag being used for two
purposes (forked from the kenyad and attached to the kenyad ),
and split it into RTF_KENYA_CHILD and RTF_KENYA_ATTACH .
- Changed the behavior of the
--with-rsh option in configure.
Now, rather than always putting the full path in lam_config.h , it
only adds the full path when an absolute or relative path was given
(as opposed to just a binary name).
- First public release of Myrinet/
gm support in the gm
RPI.
- Fixed a problem where two different flags accidentally had the
same value on a request, which led to truncation errors in
one-sided communications in
lamd mode when the daemons were
compiled separately.
- Added better support to
mpirun and the kenyad to catch
when an MPI process dies without first detaching (i.e., calling
MPI_FINALIZE ).
- Re-added hooks to create/remove the "
impirun " sym link in
$(bindir) during "make install "/"make uninstall ". These
were lost when we converted to an automake -style build.
- Fixed a bug in
dlo_inet in fault tolerant mode. On some
OSes, recvfrom() can return ECONNREFUSED , which should not cause
an abort in FT mode.
- Fixed a problem when a process sends a LAM signal to itself
via
kdoom() ; the signal handler would erroneously get triggered
twice.
- Added the
MPI_Info key "lam_spawn_sched_round_robin " on
MPI_COMM_SPAWN to allow finer-grained control on the placement of
spawned MPI processes without the need to write an app schema to a
temporary file (and allows functionality that you can't really do
with an app schema, anyway). See MPI_Comm_spawn (1) for more
information on this key.
- Renamed the
MPI_Info key name on MPI_COMM_SPAWN "file "
to "lam_spawn_file ". Since it is a LAM-specific key, it should
have a LAM-specific name. While the "file " key still exists for
backwards compatibility, its use is deprecated.
- Added two predefined attributes on
MPI_COMM_WORLD :
LAM_UNIVERSE_NCPUS and LAM_UNIVERSE_NNODES . They return the
number of CPUs in the current LAM universe and the number of nodes
in the current LAM universe (respectively). Note that these values
can be larger than their corresponding counts from the application's
MPI_COMM_WORLD .
- Increase the default optimization flags in configure to be
-O3 for gcc /g++ , -O for all other compilers.
- Moved the handling of signals in user code from signal
handlers installed by
MPI_INIT to the lamd and mpirun . That
is, the lamd will now detect that a process died due to a signal
and send back that information to mpirun . mpirun will print out
the appropriate error messages. This has the side effect of
allowing the OS default signal handlers to be used in user programs
rather than the LAM signal handlers. In at least some cases, this
is a good thing -- some users want core dumps, for example. Two new
options have been added to mpirun -- "-sigs " and "-nsigs ",
to enable / disable the LAM signal handlers from MPI_INIT .
"-nsigs " is now the default, since the lamd /mpirun make
these signal handlers redundant. However, "-sigs " will enable
the old behavior for backwards compatibility.
- Fixed a bunch of potential signed / unsigned comparison
problems. This was a real bug in at least one case, which could
effectively result in garbage being sent to the
lamd , which would
cause the lamd to eventually die.
- Fixed up
lamnodes to print more intelligible error messages
when you specify an illegal node/CPU.
- Ensure that the directory where the
lamd named socket lives
is not left around if you invoke a LAM command when there is no
lamd running. Moved the function lam_rmdir() from tkill.c to
share/etc/kill.c , and renamed it to be lam_rmsocknamedir() , and
ensured that it is called when kinit() fails because there is no
lamd .
- Made LAM's signal handler a bit smarter by checking to see if
it is already in the signal handler. For example, if a callback function
has been registered via
atexit() /onexit() and causes a seg fault
after the signal handler has been triggered the first time, this can
cause a loop of seg faults which is quite difficult to kill. LAM's
signal handler will now detect this situation and gracefully
abort() .