Please refer to the Bio::EnsEMBL::Hive::PipeConfig::GCPct_conf pipeline configuration file
to see how this runnable fits into the %GC example pipeline.
'Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CountATGC' counts the occurrences of A/T and G/C bases in
the sequences in a .fasta file. It takes a .fasta file with DNA sequences as input and flows out two parameters:
at_count and gc_count.
See the NOTICE file distributed with this work for additional information
regarding copyright ownership.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
Please subscribe to the Hive mailing list: http:
package Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CountATGC;

use strict;
use warnings;

use Bio::SeqIO;    # needed by fetch_input() to open the input file

use base ('Bio::EnsEMBL::Hive::Process');
Description : Implements param_defaults() interface method of Bio::EnsEMBL::Hive::Process that defines module defaults for parameters.
sub param_defaults {
    return {
        'take_time' => 0,    # how much time run() method will spend in sleeping state
    };
}
Description : Implements fetch_input() interface method of Bio::EnsEMBL::Hive::Process that is used to read in parameters and load data.
There are no hard and fast rules on whether to fetch parameters in fetch_input(), or to wait until run() to fetch them.
In general, fetch_input() is the place to check that parameters exist and hold valid values before the worker moves
from the FETCH_INPUT state into the RUN state.

In this case, we decide to try and open our input file in fetch_input(), so that it will fail early if there is a problem with the
sub fetch_input {
    my $self = shift;

    # param_required() fails the job if 'chunk_name' has not been set
    my $chunkfile = $self->param_required('chunk_name');
    my $chunk_in  = Bio::SeqIO->new(-file => $chunkfile);
    $self->param('chunk_in', $chunk_in);
}
Description : Implements run() interface method of Bio::EnsEMBL::Hive::Process that is used to perform the main bulk of the job.
Here, we use the file opened in fetch_input(), read in the sequences from the file, and tally up the number of
AT and GC bases seen. We then store these in parameters named at_count and gc_count.
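The tallying step described above can be sketched in isolation. This is only an illustration, not the module's own code: count_at_gc() is a hypothetical helper name, and it uses Perl's tr/// operator for counting where the module uses a regex match in list context. It assumes plain sequence strings and ignores any character other than A, T, G and C.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): tally A/T and G/C bases in one sequence string.
sub count_at_gc {
    my ($seqstring) = @_;

    # With an empty replacement list and no /d modifier, tr/// leaves the string
    # unchanged and returns the number of matching characters.
    my $at_count = ($seqstring =~ tr/AaTt//);
    my $gc_count = ($seqstring =~ tr/GgCc//);

    return ($at_count, $gc_count);
}

my ($at, $gc) = count_at_gc('ATGCatgcNN');
print "at_count=$at gc_count=$gc\n";
```

Both approaches are case-insensitive and skip ambiguity codes such as N, so the two counts need not add up to the sequence length.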
sub run {
    my $self = shift;

    my $at_count = 0;
    my $gc_count = 0;

    # next_seq() returns one sequence per call, so loop until the stream is exhausted
    while (my $chunkseq = $self->param('chunk_in')->next_seq()) {
        my $seqstring = $chunkseq->seq();
        $at_count += @{[$seqstring =~ /([AaTt])/g]};
        $gc_count += @{[$seqstring =~ /([GgCc])/g]};
    }

    $self->param('at_count', $at_count);
    $self->param('gc_count', $gc_count);

    sleep( $self->param('take_time') );
}
Description : Implements write_output() interface method of Bio::EnsEMBL::Hive::Process that is used to deal with the
job's output after the execution.
The AT and GC counts flow down branch 1 in two parameters: 'at_count' and 'gc_count'.
sub write_output {    # but this time we have something to store
    my $self = shift;

    # send both counts down branch 1
    $self->dataflow_output_id( {
        'at_count' => $self->param('at_count'),
        'gc_count' => $self->param('gc_count'),
    }, 1);
}

1;