beekeeper

DESCRIPTION

The Beekeeper is in charge of interfacing between the eHive database and a compute resource or ‘compute farm’. Its job is to synchronise both, to assess the compute requirements of the pipeline, and to send the requested number of workers to open machines via the runWorker.pl script.

It is also responsible for identifying workers which died unexpectedly so that dead workers can be released and unfinished Jobs reclaimed.

USAGE EXAMPLES

    # Usually run after the pipeline has been created, to calculate the internal statistics eHive needs to function
    beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -sync

    # Do not run any additional Workers, just check the current status of the pipeline
    beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname

    # Run the pipeline in automatic mode (-loop), run all the workers locally (-meadow_type LOCAL) and allow for 3 parallel workers (-total_running_workers_max 3)
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -meadow_type LOCAL -total_running_workers_max 3 -loop

    # Run in automatic mode, but restrict execution to blast-related analyses, excluding analyses 4..6
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -analyses_pattern 'blast%-4..6' -loop

    # Restrict the normal execution to one iteration only - can be used for testing a newly set up pipeline
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -run

    # Reset failed 'buggy_analysis' Jobs to 'READY' state, so that they can be run again
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -analyses_pattern buggy_analysis -reset_failed_jobs

    # Do a cleanup: find and bury dead workers, reclaim their Jobs
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -dead
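
The individual commands above are often chained into a first-run session. The following is a minimal sketch of such a session, assuming a POSIX shell. The bk wrapper and the EHIVE_URL and DRY_RUN variables are hypothetical names introduced for this illustration and are not part of eHive. With DRY_RUN left at its default of 1, the wrapper only prints each command so that it can be reviewed before anything is executed.

```sh
# Assumption: the placeholder URL from the examples above; replace it with a real one.
EHIVE_URL=${EHIVE_URL:-mysql://username:secret@hostname:port/ehive_dbname}

# Hypothetical wrapper (not part of eHive): prints the command when DRY_RUN=1
# (the default), otherwise runs beekeeper.pl with the given arguments.
bk() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "beekeeper.pl -url $EHIVE_URL${*:+ $*}"
    else
        beekeeper.pl -url "$EHIVE_URL" "$@"
    fi
}

bk -sync    # calculate the internal statistics after pipeline creation
bk          # check the current status without running any Workers
bk -run     # one iteration only, to test the newly set up pipeline
bk -meadow_type LOCAL -total_running_workers_max 3 -loop    # then run autonomously
```

Setting DRY_RUN=0 in the environment makes the same script invoke beekeeper.pl for real.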

OPTIONS

Connection parameters

--reg_conf <path>
 Path to a Registry configuration file
--reg_type <string>
 Type of the registry entry (“hive”, “core”, “compara”, etc. - defaults to “hive”)
--reg_alias <string>
 Species / alias name for the eHive DBAdaptor
--url <url string>
 URL defining where eHive database is located
--nosqlvc “No SQL Version Check” - set if you want to force working with a database created by a potentially schema-incompatible API

Configs overriding

--config_file <string>
 JSON file (with absolute path) to override the default configurations (the option may be given multiple times)

Looping control

--loop run autonomously, looping and sleeping between iterations. Equivalent to -loop_until ANALYSIS_FAILURE
--loop_until
 sets the level of event that will cause the Beekeeper to stop looping:
 JOB_FAILURE
  stop looping if any Job fails
 ANALYSIS_FAILURE
  stop looping if any Analysis has Job failures exceeding its fault tolerance
 NO_WORK
  ignore Job and Analysis failures, keep looping until there is no work
 FOREVER
  ignore failures and no work, keep looping
--keep_alive (Deprecated) alias for -loop_until FOREVER
--max_loops <num>
 perform at most this number of loops in autonomous mode. The Beekeeper stops once it has performed max_loops loops, even in FOREVER mode
--job_id <job_id>
 run one iteration for this job_id
--run run one iteration of automation loop
--sleep <num> when looping, sleep <num> minutes (default 1 min)
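
The looping options above can be combined into a single command line. The following is a minimal sketch, assuming a POSIX shell. The loop_cmd helper and the EHIVE_URL variable are hypothetical names introduced for this illustration, and the values 100 and 2 are arbitrary. The helper only prints the command, after checking that the level is one of the four listed for -loop_until.

```sh
# Assumption: the placeholder URL from the usage examples; replace it with a real one.
EHIVE_URL=${EHIVE_URL:-mysql://username:secret@hostname:port/ehive_dbname}

# Hypothetical helper (not part of eHive): loop_cmd <level> <max_loops> <sleep_minutes>
# Prints a beekeeper.pl command line, or fails if the level is not a documented one.
loop_cmd() {
    case $1 in
        JOB_FAILURE|ANALYSIS_FAILURE|NO_WORK|FOREVER) ;;
        *) echo "unknown -loop_until level: $1" >&2; return 1 ;;
    esac
    echo "beekeeper.pl -url $EHIVE_URL -loop_until $1 -max_loops $2 -sleep $3"
}

# Ignore Job and Analysis failures and loop until no work is left,
# but stop after at most 100 loops, sleeping 2 minutes between them.
loop_cmd NO_WORK 100 2
```

Capping the number of loops with -max_loops gives a bounded run even when the chosen level would otherwise keep the Beekeeper looping.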

Current Meadow control

--meadow_type <string>
 the desired Meadow class name, such as ‘LSF’ or ‘LOCAL’
--total_running_workers_max <num>
 max # workers to be running in parallel
--submit_workers_max <num>
 max # workers to create per loop iteration
--submission_options <string>
 passes <string> to the Meadow submission command as <options> (formerly lsf_options)
--submit_log_dir <dir>
 record submission output+error streams into files under the given directory (to see why some workers fail after submission)
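
As an illustration of how the Meadow options fit together, the following is a minimal sketch, assuming a POSIX shell and an LSF farm. The show_cmd helper and the EHIVE_URL variable are hypothetical names introduced here. The '-q long' submission string, the log directory and the worker counts are assumed values chosen only for the example, so check them against the local farm configuration. The helper prints the command line rather than running it.

```sh
# Assumption: the placeholder URL from the usage examples; replace it with a real one.
EHIVE_URL=${EHIVE_URL:-mysql://username:secret@hostname:port/ehive_dbname}

# Hypothetical helper (not part of eHive): prints a beekeeper.pl command line,
# single-quoting any argument that contains a space so it can be pasted back.
show_cmd() {
    out="beekeeper.pl -url $EHIVE_URL"
    for arg in "$@"; do
        case $arg in
            *" "*) out="$out '$arg'" ;;
            *)     out="$out $arg" ;;
        esac
    done
    echo "$out"
}

# At most 50 Workers in parallel, at most 10 new submissions per loop iteration,
# with submission output and error streams kept for later inspection.
show_cmd -meadow_type LSF \
    -total_running_workers_max 50 \
    -submit_workers_max 10 \
    -submission_options '-q long' \
    -submit_log_dir /path/to/submit_logs \
    -loop
```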

Worker control

--analyses_pattern <string>
 restrict the sync operation, printing of stats or looping of the Beekeeper to the specified subset of Analyses
--nocan_respecialize
 prevent workers from re-specializing into another Analysis (within resource_class) after their previous Analysis is exhausted
--force run all workers with -force (see runWorker.pl)
--killworker <worker_id>
 kill Worker by worker_id
--life_span <num>
 number of minutes each Worker is allowed to run
--job_limit <num>
 number of Jobs to run before the Worker can die naturally
--retry_throwing_jobs
 if a Job dies *knowingly* (e.g. by encountering a die statement in the Runnable), should we retry it by default?
--hive_log_dir <path>
 directory where stdout/stderr of the eHive is redirected
--worker_delay_startup_seconds <number>
 number of seconds each Worker has to wait before first talking to the database (0 by default, useful for debugging)
--worker_crash_on_startup_prob <float>
 probability of each Worker failing at startup (0 by default, useful for debugging)
--debug <debug_level>
 set debug level of the workers

Other commands/options

--help print this help
--versions report both eHive code version and eHive database schema version
--dead detect all unaccounted dead workers and reset their Jobs for resubmission
--sync re-synchronise the eHive (recalculate its internal statistics)
--unkwn detect all workers in UNKWN state and reset their Jobs for resubmission (careful, they *may* reincarnate!)
--big_red_button
 shut everything down: block all beekeepers connected to the pipeline and terminate workers
--alldead tell the database all workers are dead (no checks are performed in this mode, so be very careful!)
--balance_semaphores
 set all Semaphore counters to the numbers of unDONE fan Jobs (emergency use only)
--worker_stats show status of each running Worker
--failed_jobs show all failed Jobs
--job_output <job_id>
 print details for one Job
--reset_job_id <num>
 reset a Job back to READY so it can be rerun
--reset_failed_jobs
 reset FAILED Jobs of analyses matching -analyses_pattern back to READY so they can be rerun
--reset_done_jobs
 reset DONE and PASSED_ON Jobs of analyses matching -analyses_pattern back to READY so they can be rerun
--reset_all_jobs
 reset FAILED, DONE and PASSED_ON Jobs of analyses matching -analyses_pattern back to READY so they can be rerun
--forgive_failed_jobs
 mark FAILED Jobs of analyses matching -analyses_pattern as DONE, and update their Semaphores. NOTE: this does not trigger any dataflow from them
--discard_ready_jobs
 mark READY Jobs of analyses matching -analyses_pattern as DONE, and update their Semaphores. NOTE: this does not trigger any dataflow from them
--unblock_semaphored_jobs
 set SEMAPHORED Jobs of analyses matching -analyses_pattern to READY so they can start
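
Several of the commands above are typically used together when recovering from failures. The following is a minimal sketch of one possible sequence, assuming a POSIX shell. The bk wrapper and the EHIVE_URL, ANALYSIS and DRY_RUN variables are hypothetical names introduced for this illustration, and the order of the steps is a suggestion rather than a prescribed procedure. With DRY_RUN left at its default of 1 the commands are only printed.

```sh
# Assumptions: placeholder URL and Analysis name taken from the usage examples.
EHIVE_URL=${EHIVE_URL:-mysql://username:secret@hostname:port/long_mult_test}
ANALYSIS=${ANALYSIS:-buggy_analysis}

# Hypothetical wrapper (not part of eHive): prints the command when DRY_RUN=1
# (the default), otherwise runs beekeeper.pl with the given arguments.
bk() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "beekeeper.pl -url $EHIVE_URL${*:+ $*}"
    else
        beekeeper.pl -url "$EHIVE_URL" "$@"
    fi
}

bk -dead                                               # find dead Workers, reset their Jobs
bk -failed_jobs                                        # show all failed Jobs
bk -analyses_pattern "$ANALYSIS" -reset_failed_jobs    # FAILED Jobs back to READY
bk -analyses_pattern "$ANALYSIS" -loop                 # resume looping on that Analysis
```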