runWorker

DESCRIPTION

runWorker.pl is an eHive component script that does the work of a single Worker. It specialises in one of the Analyses and starts executing Jobs of that Analysis one-by-one or batch-by-batch.

Most of the eHive functionality is accessible via the beekeeper.pl script, but feel free to run runWorker.pl directly if you think you need direct access to the running Jobs.

USAGE EXAMPLES

    # Run one local Worker process in ehive_dbname and let the system pick up the Analysis
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname

    # Run one local Worker process in ehive_dbname and let the system pick up the Analysis from the given resource_class
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -rc_name low_mem

    # Run one local Worker process in ehive_dbname and constrain its initial specialisation to a subset of Analyses
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -analyses_pattern '1..15,analysis_X,21'

    # Run one local Worker process in ehive_dbname and allow it to respecialize within a subset of Analyses
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -can_respecialize -analyses_pattern 'blast%-4..6'

    # Run a specific Job in a local Worker process:
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_id 123456

OPTIONS

Connection parameters:

--reg_conf <path>

path to a Registry configuration file

--reg_alias <string>

species/alias name for the eHive DBAdaptor

--reg_type <string>

type of the registry entry (“hive”, “core”, “compara”, etc.; defaults to “hive”)

--url <url string>

URL defining where the eHive database is located

--nosqlvc

“No SQL Version Check” - set if you want to force working with a database created by a potentially schema-incompatible API
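
For example, a Worker can resolve its database through a Registry file instead of a -url. This is a minimal sketch: the Registry file path and the species/alias name my_ehive are hypothetical placeholders.

    # Run one local Worker process, locating the eHive database via a Registry file
runWorker.pl -reg_conf path/to/ehive_registry.pm -reg_alias my_ehive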

Configuration overriding parameters:

--config_file <string>

JSON file (with absolute path) to override the default configuration (this option can be given multiple times)
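
As an illustration, a sketch of a run that loads two override files, assuming the option is simply repeated to supply multiple files; both JSON paths are hypothetical.

    # Run one local Worker process with two JSON configuration overrides
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -config_file /absolute/path/first.json -config_file /absolute/path/second.json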

Task specification parameters:

--rc_id <id>

resource class id

--rc_name <string>

resource class name

--analyses_pattern <string>

restrict the specialisation of the Worker to the specified subset of Analyses

--analysis_id <id>

run a Worker and have it specialise to an Analysis with this analysis_id

--job_id <id>

run a specific Job defined by its database id

--force

set if you want to force running a Worker over a BLOCKED Analysis or to run a specific DONE/SEMAPHORED job_id
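
As a sketch of how these parameters combine, -force paired with -job_id should re-run a specific Job even if it is already DONE; the job_id below is a placeholder.

    # Force a re-run of a specific Job that has already been marked DONE
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_id 123456 -force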

Worker control parameters:

--job_limit <num>

number of Jobs to run before the Worker can die naturally

--life_span <num>

number of minutes this Worker is allowed to run

--no_cleanup

don’t perform temp directory cleanup when the Worker exits

--no_write

don’t call write_output() or perform the automatic dataflow of the input Job

--worker_base_temp_dir <path>

the base directory that this Worker will use for temporary operations. This overrides the default set in the JSON config file and, ultimately, in the code (/tmp)

--hive_log_dir <path>

directory where the stdout/stderr of the whole eHive (i.e. of all its Workers) is redirected

--worker_log_dir <path>

directory where stdout/stderr of this particular Worker is redirected

--retry_throwing_jobs

By default, Jobs are allowed to fail a few times (up to the Analysis’ max_retry_count parameter) before the system “gives up” and considers them FAILED. Set this option to also retry Jobs that die knowingly (e.g. due to encountering a die statement in the Runnable).

--can_respecialize

allow this Worker to re-specialise into another Analysis (within its resource_class) after it has exhausted all Jobs of the current one

--worker_delay_startup_seconds <number>

number of seconds each Worker has to wait before first talking to the database (0 by default, useful for debugging)

--worker_crash_on_startup_prob <float>

probability of each Worker failing at startup (0 by default, useful for debugging)
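
A combined sketch of these controls; the numbers and the log directory are arbitrary examples, and presumably the Worker dies at whichever limit it reaches first.

    # Run a Worker limited to 10 Jobs or 60 minutes, logging to its own directory
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_limit 10 -life_span 60 -worker_log_dir /path/to/worker_logs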

Other options:

--help

print this help

--versions

report both eHive code version and eHive database schema version

--debug <level>

turn on debug messages at <level>
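
Finally, trivial sketches of these options in use; the URL is the same placeholder used throughout.

    # Report both the eHive code version and the eHive database schema version
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -versions

    # Run one local Worker process with debug messages turned on at level 1
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -debug 1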