eHive exposes an interface for Runnables (jobs) to interact with the system:
- query their own parameters (see Parameter Handling),
- control its own execution and report issues,
- run system commands,
- trigger some dataflow events (e.g. create new jobs).
Reporting and logging¶
Jobs can log messages to the standard output with the
$self->say_with_header($message, $important) method. However they are only printed
when the debug mode is enabled (see below) or when the
$important flag is switched on.
They will also be prefixed with a standard prefix consisting of the
runtime context (Worker, Role, Job).
The debug mode is controlled by the
--debug X option of
beekeeper and runWorker. X is an integer,
allowing multiple levels of debug, although most of the modules will only
check whether it is 0 or not.
(so that the messages are printed on the standard output) but also stores
them in the database (in the
To indicate that a Job has to be terminated earlier (i.e. before reaching
the end of
write_output), you can call:
$self->complete_early($message)to mark the Job as DONE (successful run) and record the message in the database. Beware that this will trigger the autoflow.
$self->complete_early($message, $branch_code)is a variation of the above that will replace the autoflow (branch 1) with a dataflow on the branch given.
$self->throw($message)to log a failed attempt. The Job may be given additional retries following the analysis’ max_retry_count parameter, or is marked as FAILED in the database.
All Runnables have access to the
$self->run_system_command method to run
arbitrary system commands (the
SystemCmd Runnable is merely a wrapper
around this method).
run_system_command takes two arguments:
- The command to run, given as a single string or an arrayref. Arrayrefs
are the preferred way as they simplify the handling of whitespace and
quotes in the command-line arguments. Arrayrefs that correspond to
straightforward commands, e.g.
['find', '-type', 'd'], are passed to the underlying
systemfunction as lists. Arrayrefs can contain shell meta-characters and delimiters such as
>(to redirect the output to a file),
;(to separate two commands that have to be run sequentially) or
|(a pipe) and will be quoted and joined and passed to
systemas a single string.
- An hashref of options. Accepted options are:
use_bash_pipefail: Normally, the exit status of a pipeline (e.g.
cmd1 | cmd2is the exit status of the last command, meaning that errors in the first command are not captured. With the option turned on, the exit status of the pipeline will capture errors in any command of the pipeline, and will only be 0 if all the commands exit successfully.
use_bash_errexit: Exit immediately if a command fails. This is mostly useful for cases like
cmd1; cmd2where by default,
cmd2would always be executed, regardless of the exit status of
timeout: the maximum number of seconds the command is allowed to run for. The exit status will be set to -2 if the command had to be aborted.
During their execution, jobs may certainly have to use temporary files.
eHive provides a directory that will exist throughout the lifespan of the
Worker with the
$self->worker_temp_directory method. The directory is created
the first time the method is called, and deleted when the Worker ends. It is the Runnable’s
responsibility to leave the directory in a clean-enough state for the next
Job (by removing some files, for instance), or to clean it up completely
By default, this directory will be put under /tmp, but it can be overriden
by setting the
-worker_base_tmp_dir option for workers through their
resource classes. This can
be used to:
eHive is an event-driven system whereby agents trigger events that are immediately reacted upon. The main event is called “dataflow” (see Dataflows for more information). A dataflow event is made up of two parts: An event, which is identified by a “branch number”, with an attached data payload, consisting of parameters. A Runnable can create as many events as desired, whenever desired. The branch number can be any integer, but note that “-2”, “-1”, “0”, and “1” have special meaning within eHive. -2, -1, and 0 are special branches for error handling, and 1 is the autoflow branch.
If a Runnable explicitly generates a dataflow event on branch 1, then no autoflow event will be generated when the Job finishes. This is unusual behaviour – many pipelines expect and depend on autoflow coinciding with Job completion. Therefore, you should avoid explicitly creating dataflow on branch 1, unless no alternative exists to produce the correct logic in the Runnable. If you do override the autoflow by creating an event on branch 1, be sure to clearly indicate this in the Runnable’s documentation.
Within a Runnable, dataflow events are performed via the
$data must be of one of these types:
- A hash-reference that maps parameter names (strings) to their values,
- An array-reference of hash-references of the above type, or
undefto propagate the Job’s input_id.
If no branch number is provided, it defaults to 1.
Runnables can also use
This method simply wraps
dataflow_output_id, allowing external programs
to easily generate events. The method takes two arguments:
- The path to a file containing one JSON object per line. Each line can be prefixed with a branch number (and some whitespace), which will override the default branch number.
- The default branch number (defaults to 1).
Use of this is demonstrated in the Runnable Bio::EnsEMBL::Hive::RunnableDB::SystemCmd