    seed_pipeline.pl -url <url> -logic_name find_files -input_id "{ 'directory' => 'dumps' }"

=head1 DESCRIPTION

    This is an example pipeline put together from two basic building blocks:

    Analysis_1: JobFactory.pm turns the list of files found under a given directory into jobs;
                these jobs are sent down branch #2 into the second analysis.

    Analysis_2: SystemCmd.pm runs these (gzip) compression jobs in parallel.
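
As a loose illustration of the data flow just described, the sketch below models the pipeline in plain Perl without eHive. The seeded `directory` parameter fills the `#directory#` placeholder of the `find_files` `inputcmd`. Each line of that command's output becomes one job with a `filename` parameter (the name comes from `column_names`), and each of those jobs fills in the SystemCmd template `'gzip #filename#'`. The helpers `substitute_params` and `make_fan_jobs` and the file names are hypothetical. In the real pipeline, JobFactory runs the command and creates the jobs on branch #2, and eHive's own parameter substitution handles more cases than this sketch does.

```perl
use strict;
use warnings;

# Hypothetical helper (not eHive's API): replace each #name# placeholder in
# $template with the value of that parameter in %$params; unknown names are kept.
sub substitute_params {
    my ($template, $params) = @_;
    (my $result = $template) =~ s/#(\w+)#/exists $params->{$1} ? $params->{$1} : "#$1#"/ge;
    return $result;
}

# Hypothetical model of JobFactory with a single entry in 'column_names':
# every non-empty line of the command output becomes one job parameter hash.
sub make_fan_jobs {
    my ($cmd_output, $column_name) = @_;
    return [ map { +{ $column_name => $_ } } grep { length } split /\n/, $cmd_output ];
}

# 1. The seeded job: its parameters come from -input_id.
my %seed_params = ( 'directory' => 'dumps' );
my $inputcmd = substitute_params('find #directory# -type f', \%seed_params);
print "$inputcmd\n";    # prints "find dumps -type f"

# 2. Made-up output of that command (one path per line), fanned out on branch #2.
my $find_output = "dumps/a.txt\ndumps/b.txt\ndumps/c.txt\n";
my $jobs = make_fan_jobs($find_output, 'filename');

# 3. Each 'compress_a_file' job fills in the SystemCmd template.
my @cmds = map { substitute_params('gzip #filename#', $_) } @$jobs;
print "$_\n" for @cmds;    # prints "gzip dumps/a.txt", "gzip dumps/b.txt", "gzip dumps/c.txt"
```

Nothing is actually compressed here. In the real pipeline, eHive workers execute these commands, and `-analysis_capacity => 4` on `compress_a_file` limits how many workers run them in parallel.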

=head1 LICENSE

    See the NOTICE file distributed with this work for additional information
    regarding copyright ownership.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

    Unless required by applicable law or agreed to in writing, software distributed under the License
    is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and limitations under the License.

=head1 CONTACT

    Please subscribe to the Hive mailing list: http:

=cut

package Bio::EnsEMBL::Hive::Examples::Factories::PipeConfig::CompressFiles_conf;

use strict;
use warnings;

use base ('Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf');  # All Hive database configuration files should inherit from HiveGeneric, directly or indirectly

=head2 pipeline_analyses

    Implements the pipeline_analyses() method of HiveGeneric_conf, which defines the analyses of the pipeline and how they are connected.
    Here it defines two analyses:

      * 'find_files'       generates a list of all the files found under #directory#.
                           Each job of this analysis will dataflow (create jobs) via branch #2 into the 'compress_a_file' analysis.

      * 'compress_a_file'  compresses (gzips) these files in parallel.

=cut

sub pipeline_analyses {
    my ($self) = @_;
    return [
        {   -logic_name => 'find_files',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::JobFactory',
            -parameters => {
                'inputcmd'     => 'find #directory# -type f',
                'column_names' => [ 'filename' ],   # each output line of inputcmd becomes one job with a 'filename' parameter
            },
            -flow_into => {
                2 => [ 'compress_a_file' ],   # will create a fan of jobs
            },
        },

        {   -logic_name => 'compress_a_file',
            -module     => 'Bio::EnsEMBL::Hive::RunnableDB::SystemCmd',
            -parameters => {
                'cmd' => 'gzip #filename#',   # #filename# is substituted from each job's parameters
            },
            -analysis_capacity => 4,   # limit the number of workers that will be performing jobs in parallel