runWorker

DESCRIPTION

runWorker.pl is an eHive component script that does the work of a single Worker. It specialises in one of the Analyses and starts executing Jobs of that Analysis one-by-one or batch-by-batch.

Most of the eHive functionality is accessible via the beekeeper.pl script, but feel free to run runWorker.pl directly if you think you need direct access to the running Jobs.

USAGE EXAMPLES

    # Run one local Worker process in ehive_dbname and let the system pick up the Analysis
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname

    # Run one local Worker process in ehive_dbname and let the system pick up the Analysis from the given resource_class
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -rc_name low_mem

    # Run one local Worker process in ehive_dbname and constrain its initial specialisation to a subset of Analyses
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -analyses_pattern '1..15,analysis_X,21'

    # Run one local Worker process in ehive_dbname and allow it to respecialize within a subset of Analyses
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -can_respecialize -analyses_pattern 'blast%-4..6'

    # Run a specific Job in a local Worker process:
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_id 123456

OPTIONS

Connection parameters:

--reg_conf <path>

path to a Registry configuration file

--reg_alias <string>

species/alias name for the eHive DBAdaptor

--reg_type <string>

type of the registry entry (“hive”, “core”, “compara”, etc.; defaults to “hive”)

--url <url string>

URL defining where the eHive database is located

--nosqlvc

“No SQL Version Check” - set if you want to force working with a database created by a potentially schema-incompatible API
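
For example, a Worker can resolve its database through a Registry file instead of a -url. This is a minimal sketch: the Registry file path and the species/alias name my_ehive are hypothetical placeholders.

    # Run one local Worker process, locating the eHive database via a Registry file
runWorker.pl -reg_conf path/to/ehive_registry.pm -reg_alias my_ehive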

Configuration overriding parameters:

--config_file <string>

JSON file (with absolute path) to override the default configuration (this option can be given multiple times)
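
As an illustration, a sketch of a run that loads two override files, assuming the option is simply repeated to supply multiple files; both JSON paths are hypothetical.

    # Run one local Worker process with two JSON configuration overrides
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -config_file /absolute/path/first.json -config_file /absolute/path/second.json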

Task specification parameters:

--rc_id <id>

resource class id

--rc_name <string>

resource class name

--analyses_pattern <string>

restrict the specialisation of the Worker to the specified subset of Analyses

--analysis_id <id>

run a Worker and have it specialise to an Analysis with this analysis_id

--job_id <id>

run a specific Job defined by its database id

--force

set if you want to force running a Worker over a BLOCKED Analysis or to run a specific DONE/SEMAPHORED job_id
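
As a sketch of how these parameters combine, -force paired with -job_id should re-run a specific Job even if it is already DONE; the job_id below is a placeholder.

    # Force a re-run of a specific Job that has already been marked DONE
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_id 123456 -force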

Worker control parameters:

--job_limit <num>

number of Jobs to run before the Worker can die naturally

--life_span <num>

number of minutes this Worker is allowed to run

--no_cleanup

don’t perform temp directory cleanup when the Worker exits

--no_write

don’t call write_output() or perform the automatic dataflow of the input Job

--worker_base_temp_dir <path>

the base directory that this Worker will use for temporary operations. This overrides the default set in the JSON config file and, ultimately, in the code (/tmp)

--hive_log_dir <path>

directory where the stdout/stderr of the whole eHive (i.e. of all its Workers) is redirected

--worker_log_dir <path>

directory where stdout/stderr of this particular Worker is redirected

--retry_throwing_jobs

By default, Jobs are allowed to fail a few times (up to the Analysis’ max_retry_count parameter) before the system “gives up” and considers them FAILED. Set this option to also retry Jobs that die knowingly (e.g. due to encountering a die statement in the Runnable).

--can_respecialize

allow this Worker to re-specialise into another Analysis (within its resource_class) after it has exhausted all Jobs of the current one

--worker_delay_startup_seconds <number>

number of seconds each Worker has to wait before first talking to the database (0 by default, useful for debugging)

--worker_crash_on_startup_prob <float>

probability of each Worker failing at startup (0 by default, useful for debugging)
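
A combined sketch of these controls; the numbers and the log directory are arbitrary examples, and presumably the Worker dies at whichever limit it reaches first.

    # Run a Worker limited to 10 Jobs or 60 minutes, logging to its own directory
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -job_limit 10 -life_span 60 -worker_log_dir /path/to/worker_logs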

Other options:

--help

print this help

--versions

report both eHive code version and eHive database schema version

--debug <level>

turn on debug messages at <level>
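
Finally, trivial sketches of these options in use; the URL is the same placeholder used throughout.

    # Report both the eHive code version and the eHive database schema version
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -versions

    # Run one local Worker process with debug messages turned on at level 1
runWorker.pl -url mysql://username:secret@hostname:port/ehive_dbname -debug 1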