load_resource_usage
DESCRIPTION
This script obtains resource usage data for your pipeline from the
Meadow and stores it in the worker_resource_usage table. Your Meadow
class/plugin has to support offline examination of resources in order
for this script to work.
Based on the start time of the first Worker and the end time of the
last Worker (as recorded in the pipeline database), it pulls the
relevant data out of your Meadow (for LSF, by running the bacct
command), parses the report and stores the result in the
worker_resource_usage table. You can join this table to the worker
table USING(meadow_name,process_id) in the usual MySQL way to filter by
analysis_id, compute statistics, etc.
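As a rough illustration of the join described above, the following Python sketch builds two toy tables in an in-memory SQLite database and summarises resource usage per analysis_id. Only meadow_name, process_id and analysis_id come from the text; the other column names (mem_megs, cpu_sec, lifespan_sec) and all values are assumptions made for the example, so check them against your own schema. The SELECT statement itself should carry over to MySQL largely unchanged.

```python
import sqlite3

# Toy stand-ins for the two eHive tables; the real schema has more columns.
# The column names mem_megs, cpu_sec and lifespan_sec are assumptions.
SCHEMA = """
CREATE TABLE worker (
    worker_id   INTEGER PRIMARY KEY,
    meadow_name TEXT,
    process_id  TEXT,
    analysis_id INTEGER
);
CREATE TABLE worker_resource_usage (
    meadow_name  TEXT,
    process_id   TEXT,
    mem_megs     REAL,
    cpu_sec      REAL,
    lifespan_sec REAL
);
"""

# Join on the (meadow_name, process_id) pair, then summarise per analysis.
QUERY = """
SELECT analysis_id,
       COUNT(*)      AS workers,
       MAX(mem_megs) AS max_mem_megs,
       AVG(cpu_sec)  AS avg_cpu_sec
  FROM worker
  JOIN worker_resource_usage USING (meadow_name, process_id)
 GROUP BY analysis_id
 ORDER BY analysis_id
"""


def usage_by_analysis(conn):
    """Return one (analysis_id, workers, max_mem_megs, avg_cpu_sec) row per analysis."""
    return conn.execute(QUERY).fetchall()


def demo():
    """Populate the toy tables with made-up rows and run the summary query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO worker VALUES (?, ?, ?, ?)",
        [(1, "farm", "1001[1]", 1), (2, "farm", "1001[2]", 1), (3, "farm", "1002", 2)],
    )
    conn.executemany(
        "INSERT INTO worker_resource_usage VALUES (?, ?, ?, ?, ?)",
        [
            ("farm", "1001[1]", 120.0, 10.0, 30.0),
            ("farm", "1001[2]", 250.0, 20.0, 45.0),
            ("farm", "1002", 80.0, 5.0, 12.0),
        ],
    )
    return usage_by_analysis(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Adding a WHERE analysis_id = ... clause to the query restricts the summary to a single analysis.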
You can optionally provide an external filename or command as the source of the data (don’t forget to append a “|” to the end of a command!); the data will then be read from that source and parsed, instead of being queried from the Meadow.
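The trailing “|” follows the Perl convention in which a name ending in a pipe character is run as a command and its output is read, rather than being opened as a file. The Python sketch below imitates that convention to show the two cases. open_source is a hypothetical helper, not part of the script, and the script itself may handle the source differently.

```python
import io
import shlex
import subprocess
import sys


def open_source(source):
    """Return a readable text stream for a filename or a pipe-from command.

    Hypothetical helper: a source ending in "|" is run as a command and
    its standard output is returned; anything else is opened as a file.
    """
    source = source.strip()
    if source.endswith("|"):
        # Drop the trailing pipe and split the rest into an argument list.
        command = shlex.split(source[:-1])
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        return io.StringIO(result.stdout)
    return open(source, "r")


if __name__ == "__main__":
    # A harmless stand-in command, used here in place of a real bacct call.
    with open_source(f'"{sys.executable}" -c "print(42)" |') as stream:
        print(stream.read().strip())
```

Without the trailing “|”, the same string would be treated as a filename, which is why the pipe character must not be forgotten.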
USAGE EXAMPLES
# Just run it the usual way: query the relevant data and store it in the "worker_resource_usage" table:
load_resource_usage.pl -url mysql://username:secret@hostname:port/long_mult_test
# The same, but assuming another user "someone_else" ran the pipeline:
load_resource_usage.pl -url mysql://username:secret@hostname:port/long_mult_test -username someone_else
# Assuming the dump file long_mult.bacct already exists, load the dumped bacct data into the "worker_resource_usage" table:
load_resource_usage.pl -url mysql://username:secret@hostname:port/long_mult_test -source long_mult.bacct
# Provide your own pipe-from command as the source of the worker_resource_usage data:
load_resource_usage.pl -url mysql://username:secret@hostname:port/long_mult_test -source "bacct -l -C 2012/01/25/13:33,2012/01/25/14:44 |" -meadow_type LSF
OPTIONS
- --help
print this help
- --url <url string>
URL defining where the eHive database is located
- --username <username>
if it wasn’t you who ran the pipeline, the name of that user can be provided
- --source <filename>
alternative source of worker_resource_usage data. Can be a filename or a pipe-from command.
- --meadow_type <type>
only used when -source is given. Tells the script which Meadow type the source data comes from. Defaults to the first available Meadow (LOCAL being considered the last available)
- --nosqlvc
“No SQL Version Check” - set if you want to force working with a database created by a potentially schema-incompatible API