beekeeper

DESCRIPTION

The Beekeeper is in charge of interfacing between the eHive database and a compute resource or ‘compute farm’. Its job is to synchronise both, to assess the compute requirements of the pipeline, and to send the requested number of workers to open machines via the runWorker.pl script.

It is also responsible for identifying workers which died unexpectedly so that dead workers can be released and unfinished Jobs reclaimed.

USAGE EXAMPLES

    # Usually run after the pipeline has been created, to calculate the internal statistics necessary for eHive functioning
    beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname -sync

    # Do not run any additional Workers, just check the current status of the pipeline:
    beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname

    # Run the pipeline in automatic mode (-loop), run all the workers locally (-meadow_type LOCAL) and allow for 3 parallel workers (-total_running_workers_max 3)
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -meadow_type LOCAL -total_running_workers_max 3 -loop

    # Run in automatic mode, but restrict execution to blast-related analyses, excluding analyses 4..6
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -analyses_pattern 'blast%-4..6' -loop

    # Restrict the normal execution to one iteration only - can be used for testing a newly set up pipeline
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -run

    # Reset failed 'buggy_analysis' Jobs to 'READY' state, so that they can be run again
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -analyses_pattern buggy_analysis -reset_failed_jobs

    # Do a cleanup: find and bury dead workers, reclaim their Jobs
    beekeeper.pl -url mysql://username:secret@hostname:port/long_mult_test -dead

OPTIONS

Connection parameters

--reg_conf <path>: Path to a Registry configuration file
--reg_type <string>: Type of the registry entry (“hive”, “core”, “compara”, etc. - defaults to “hive”)
--reg_alias <string>: Species / alias name for the eHive DBAdaptor
--url <url string>: URL defining where the eHive database is located
--nosqlvc: “No SQL Version Check” - set if you want to force working with a database created by a potentially schema-incompatible API
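
The connection options above allow two ways of pointing the Beekeeper at a pipeline: a single -url, or a Registry file together with an alias. The sketch below prints both forms instead of running them. The Registry path and the alias my_hive are hypothetical placeholders, and run is a small dry-run helper introduced only for this illustration, not part of eHive.

```shell
# Dry-run helper (illustrative only): prints the command unless DRY_RUN=0,
# in which case the command is executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Option 1: connect with a URL (placeholder credentials, as in USAGE EXAMPLES).
run beekeeper.pl -url mysql://username:secret@hostname:port/ehive_dbname

# Option 2: connect through a Registry configuration file.
# Both values below are hypothetical.
REG_CONF=/path/to/registry.conf
REG_ALIAS=my_hive
run beekeeper.pl -reg_conf "$REG_CONF" -reg_type hive -reg_alias "$REG_ALIAS"
```

Setting DRY_RUN=0 in the environment would make the helper execute the commands rather than print them.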

Configs overriding

--config_file <string>: JSON file (with absolute path) to override the default configurations (could be multiple)

Looping control

--loop

run autonomously, looping and sleeping between iterations. Equivalent to -loop_until ANALYSIS_FAILURE

--loop_until

sets the level of event that will cause the Beekeeper to stop looping:

JOB_FAILURE: stop looping if any Job fails
ANALYSIS_FAILURE: stop looping if any Analysis has Job failures exceeding its fault tolerance
NO_WORK: ignore Job and Analysis failures, keep looping until there is no work
FOREVER: ignore failures and no work, keep looping

--keep_alive

(Deprecated) alias for -loop_until FOREVER

--max_loops <num>

perform at most this number of loops in autonomous mode. The Beekeeper will stop when it has performed max_loops loops, even in FOREVER mode

--job_id <job_id>

run one iteration for this job_id

--run

run one iteration of the automation loop

--sleep <num>

when looping, sleep <num> minutes (default 1 min)
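
As a rough illustration of how the looping options combine, the sketch below prints a bounded looping command. The values 2 and 30 are arbitrary, run is a dry-run helper defined only for this example, and the sleep estimate is a back-of-the-envelope figure derived from the option descriptions, not from measured behaviour.

```shell
# Dry-run helper (illustrative only): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

EHIVE_URL='mysql://username:secret@hostname:port/ehive_dbname'  # placeholder
SLEEP_MIN=2     # minutes to sleep between loops (-sleep)
MAX_LOOPS=30    # hard cap on the number of loops (-max_loops)

# Ignore Job and Analysis failures and keep looping until there is no work,
# but never perform more than MAX_LOOPS loops.
run beekeeper.pl -url "$EHIVE_URL" -loop_until NO_WORK \
    -sleep "$SLEEP_MIN" -max_loops "$MAX_LOOPS"

# Rough estimate of the time spent sleeping; the work done inside each
# loop comes on top of this.
SLEEP_TOTAL=$((SLEEP_MIN * MAX_LOOPS))
echo "roughly ${SLEEP_TOTAL} minutes of sleeping at most"
```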

Current Meadow control

--meadow_type <string>: the desired Meadow class name, such as ‘LSF’ or ‘LOCAL’
--total_running_workers_max <num>: maximum number of workers running in parallel
--submit_workers_max <num>: maximum number of workers to create per loop iteration
--submission_options <string>: passes <string> to the Meadow submission command as <options> (formerly lsf_options)
--submit_log_dir <dir>: record submission output+error streams into files under the given directory (to see why some workers fail after submission)
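
A minimal sketch of how the Meadow options might be combined for a farm run is given below. It assumes an LSF farm with a queue called long and a writable log directory; the queue name, the worker counts and the path are all hypothetical, and run is a dry-run helper defined only for this example.

```shell
# Dry-run helper (illustrative only): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

EHIVE_URL='mysql://username:secret@hostname:port/ehive_dbname'  # placeholder
SUBMIT_LOG_DIR=/path/to/submit_logs                             # hypothetical

# At most 50 workers alive at once, at most 10 new submissions per loop,
# each submitted with the (assumed) LSF option string '-q long'.
run beekeeper.pl -url "$EHIVE_URL" -meadow_type LSF \
    -total_running_workers_max 50 -submit_workers_max 10 \
    -submission_options '-q long' \
    -submit_log_dir "$SUBMIT_LOG_DIR" -loop
```

The dry-run output does not show the quotes around '-q long'. When typing the command by hand, keep them so that the whole string reaches the Beekeeper as a single argument.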

Worker control

--analyses_pattern <string>: restrict the sync operation, printing of stats or looping of the Beekeeper to the specified subset of Analyses
--nocan_respecialize: prevent workers from re-specializing into another Analysis (within resource_class) after their previous Analysis is exhausted
--force: run all workers with -force (see runWorker.pl)
--killworker <worker_id>: kill Worker by worker_id
--life_span <num>: number of minutes each Worker is allowed to run
--job_limit <num>: number of Jobs a Worker runs before it can die naturally
--retry_throwing_jobs: if a Job dies *knowingly* (e.g. by encountering a die statement in the Runnable), should we retry it by default?
--hive_log_dir <path>: directory where stdout/stderr of the eHive is redirected
--worker_delay_startup_seconds <number>: number of seconds each Worker has to wait before first talking to the database (0 by default, useful for debugging)
--worker_crash_on_startup_prob <float>: probability of each Worker failing at startup (0 by default, useful for debugging)
--debug <debug_level>: set debug level of the workers
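
To show how the Worker options might be used together, the sketch below prints two command lines: one that limits Worker lifetime for a regular run, and one that uses the debugging options. All numeric values, the log path and the debug level are arbitrary assumptions, and run is a dry-run helper defined only for this example.

```shell
# Dry-run helper (illustrative only): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

EHIVE_URL='mysql://username:secret@hostname:port/ehive_dbname'  # placeholder
HIVE_LOG_DIR=/path/to/hive_logs                                 # hypothetical

# Regular run: only blast-related Analyses, each Worker limited to
# 60 minutes of lifetime and 100 Jobs, with output redirected to HIVE_LOG_DIR.
run beekeeper.pl -url "$EHIVE_URL" -analyses_pattern 'blast%' \
    -life_span 60 -job_limit 100 -hive_log_dir "$HIVE_LOG_DIR" -loop

# Debugging run: a single iteration in which each Worker waits 5 seconds
# before first talking to the database and fails at startup with
# probability 0.1.
run beekeeper.pl -url "$EHIVE_URL" \
    -worker_delay_startup_seconds 5 -worker_crash_on_startup_prob 0.1 \
    -debug 1 -run
```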

Other commands/options

--help: print this help
--versions: report both eHive code version and eHive database schema version
--dead: detect all unaccounted dead workers and reset their Jobs for resubmission
--sync: re-synchronise the eHive
--unkwn: detect all workers in UNKWN state and reset their Jobs for resubmission (careful, they *may* reincarnate!)
--big_red_button: shut everything down: block all beekeepers connected to the pipeline and terminate workers
--alldead: tell the database all workers are dead (no checks are performed in this mode, so be very careful!)
--balance_semaphores: set all Semaphore counters to the numbers of unDONE fan Jobs (emergency use only)
--worker_stats: show status of each running Worker
--failed_jobs: show all failed Jobs
--job_output <job_id>: print details for one Job
--reset_job_id <num>: reset a Job back to READY so it can be rerun
--reset_failed_jobs: reset FAILED Jobs of analyses matching -analyses_pattern back to READY so they can be rerun
--reset_done_jobs: reset DONE and PASSED_ON Jobs of analyses matching -analyses_pattern back to READY so they can be rerun
--reset_all_jobs: reset FAILED, DONE and PASSED_ON Jobs of analyses matching -analyses_pattern back to READY so they can be rerun
--forgive_failed_jobs: mark FAILED Jobs of analyses matching -analyses_pattern as DONE, and update their Semaphores. NOTE: This does not make them dataflow
--discard_ready_jobs: mark READY Jobs of analyses matching -analyses_pattern as DONE, and update their Semaphores. NOTE: This does not make them dataflow
--unblock_semaphored_jobs: set SEMAPHORED Jobs of analyses matching -analyses_pattern to READY so they can start
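
One possible way of chaining these commands when investigating failures is sketched below: list the failed Jobs, inspect one of them, reset the failures of a single Analysis, then run one iteration to check the result. The job_id 1234 is hypothetical, the Analysis name is reused from USAGE EXAMPLES, and run is a dry-run helper defined only for this example.

```shell
# Dry-run helper (illustrative only): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

EHIVE_URL='mysql://username:secret@hostname:port/long_mult_test'  # placeholder
JOB_ID=1234                # hypothetical job_id
ANALYSIS=buggy_analysis    # Analysis name taken from USAGE EXAMPLES

# 1. Show all failed Jobs.
run beekeeper.pl -url "$EHIVE_URL" -failed_jobs
# 2. Print the details of one of them.
run beekeeper.pl -url "$EHIVE_URL" -job_output "$JOB_ID"
# 3. Reset the FAILED Jobs of one Analysis back to READY.
run beekeeper.pl -url "$EHIVE_URL" -analyses_pattern "$ANALYSIS" -reset_failed_jobs
# 4. Run a single iteration to see whether the Jobs now succeed.
run beekeeper.pl -url "$EHIVE_URL" -run
```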