How to use MPI
Note
The goal of this tutorial is to show how to set up eHive to run Jobs using Shared Memory Parallelism (threads) and Distributed Memory Parallelism (MPI).
First of all, your institution or compute-farm provider may have documentation on this topic; please refer to it for implementation details (intranet-only links: EBI, Sanger Institute).
You can find real examples in the ensembl-compara repository. It ships Runnables used for phylogenetic tree inference: RAxML and ExaML. They look very light-weight (only command-line definitions) because most of the logic is in the base class (GenericRunnable), but they nevertheless show the command lines used and the parametrisation of multi-core and MPI runs.
How to set up a module using Distributed Memory Parallelism (MPI)
This case requires a bit more attention, so please be very careful to include / load the right libraries / modules. The instructions below may not apply to your system; if in doubt, contact your systems administrators.
Tips for compiling for MPI
MPI usually comes in two implementations: OpenMPI and MPICH. A common source of problems is compiling the code with one MPI implementation and trying to run it with another. You must compile and run your code with the same MPI implementation. This can easily be taken care of by properly setting up your .bashrc.
If you have access to the Intel compilers, we strongly recommend trying to compile your code with them and checking for performance improvements.
If your compute environment uses Module
Module provides configuration files (module-files) for the dynamic modification of your environment.
Here is how to list the modules that your system provides:
module avail
And how to load one (mpich3 in this example):
module load mpich3/mpich3-3.1-icc
Don’t forget to put this line in your ~/.bashrc so that it is automatically loaded.
Otherwise, follow the recommended usage in your institute
If you don’t have modules for the MPI environment available on your system, please make sure the right libraries can be found by setting PATH and any other relevant environment variables.
The eHive bit
Here again, once the environment is properly set up, we only have to define the correct Resource Class and command lines in eHive.
You need to set up a Resource Class that uses, for instance, 64 cores and 16GB of RAM:
sub resource_classes {
    my ($self) = @_;
    return {
        # ...
        '16Gb_64c_mpi' => { 'LSF' => '-q mpi-rh7 -n 64 -M16000 -R"select[mem>16000] rusage[mem=16000] same[model] span[ptile=4]"' },
        # ...
    };
}
The Resource description is specific to our LSF environment, so adapt it to yours, but:
-q mpi-rh7
    is needed to tell LSF you will run a job (Worker) in the MPI environment. Note that some LSF installations will require you to use an additional -a option.
same[model]
    is needed to ensure that the selected compute nodes all have the same hardware. You may also need something like select[avx] to select the nodes that have the AVX instruction set.
span[ptile=4]
    specifies the granularity with which LSF will split the jobs per node. In this example we ask for each machine to be allocated a multiple of four cores. This might affect queuing times. The memory requested is allocated for each ptile (so 64/4*16GB = 256GB in total in this example).
You need to add the Analysis to your PipeConfig:
{   -logic_name => 'MPI_app',
    -module     => 'Bio::EnsEMBL::Compara::RunnableDB::ProteinTrees::MPI_app',
    -parameters => {
        'mpi_exe' => $self->o('mpi_exe'),
    },
    -rc_name    => '16Gb_64c_mpi',
    # ...
},
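The $self->o('mpi_exe') call above refers to a pipeline-wide option. As a minimal sketch of how it could be declared (the path below is only a placeholder for your own MPI-enabled binary, not the actual Compara configuration), you can add it to default_options:
sub default_options {
    my ($self) = @_;
    return {
        %{ $self->SUPER::default_options },
        # Placeholder path: point it at the MPI-enabled executable installed on your cluster
        'mpi_exe' => '/path/to/your/mpi_application',
    };
}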
How to write a module that uses MPI
Here is an excerpt of Ensembl Compara’s ExaML MPI module. Note that LSF needs the MPI command to be run through mpirun. You can also run several single-threaded commands in the same Runnable.
sub param_defaults {
my $self = shift;
return {
%{ $self->SUPER::param_defaults },
'cmd' => 'cmd 1 ; cmd 2 ; #mpirun_exe# #examl_exe# -examl_parameter_1 value1 -examl_parameter_2 value2',
};
}
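The #mpirun_exe# and #examl_exe# placeholders are filled in by eHive's parameter substitution. As an illustrative sketch only (the analysis name and parameter values are assumptions, not the actual Compara configuration), the corresponding parameters could be set in the PipeConfig:
{   -logic_name => 'ExaML_mpi',   # hypothetical analysis name
    -module     => 'Bio::EnsEMBL::Compara::RunnableDB::ProteinTrees::ExaML',   # the ExaML Runnable mentioned above; the exact module path may differ
    -parameters => {
        # 'mpirun' may need additional options, or a cluster-specific wrapper, on your system
        'mpirun_exe' => 'mpirun',
        'examl_exe'  => $self->o('examl_exe'),
    },
    -rc_name    => '16Gb_64c_mpi',
},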
Temporary files
In our case, ExaML uses MPI and also wants to share data via the filesystem.
In this specific Runnable, ExaML is set to run in eHive's managed temporary
directory, which by default is under /tmp and is not shared across the nodes of
our compute cluster.
We have to override the eHive method to use a shared directory ($self->o('examl_dir')) instead.
This can be done at the resource class level, by adding "-worker_base_tmp_dir ".$self->o('examl_dir') to the worker_cmd_args attribute of the Resource Class, as sketched below.
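As a minimal sketch, assuming the two-element form of the resource description (where the first string carries the scheduler options and the second one the worker_cmd_args), this could look like:
sub resource_classes {
    my ($self) = @_;
    return {
        # ...
        '16Gb_64c_mpi' => {
            'LSF' => [
                # scheduler (bsub) options, as in the Resource Class shown earlier
                '-q mpi-rh7 -n 64 -M16000 -R"select[mem>16000] rusage[mem=16000] same[model] span[ptile=4]"',
                # worker_cmd_args: extra arguments appended to the Worker's command line
                '-worker_base_tmp_dir '.$self->o('examl_dir'),
            ],
        },
        # ...
    };
}
With this in place, every Worker running under this Resource Class creates its temporary files under the shared examl_dir directory instead of /tmp.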