Please refer to the Bio::EnsEMBL::Hive::PipeConfig::GCPct_conf pipeline configuration file
to see how this runnable fits into the %GC example pipeline.
'Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CalcOverallPercentage' is the final step of the pipeline.
It sums up the GC and AT counts from all of the chunked subsequences, then divides the GC count by the GC + AT
count to determine %GC.
See the NOTICE file distributed with this work for additional information
regarding copyright ownership.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
Please subscribe to the Hive mailing list: http:
package Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CalcOverallPercentage;
use List::Util qw(sum);
use base ('Bio::EnsEMBL::Hive::Process');
Description : Implements the fetch_input() interface method of Bio::EnsEMBL::Hive::Process that is used to read in parameters and load data.
There are no hard and fast rules on whether to fetch parameters in fetch_input() or to wait until run() to fetch them.
In general, fetch_input() is the place to check that parameters exist and hold valid values before the worker moves
from the FETCH_INPUT state into the RUN state. In this case, since it's a simple computation, we don't do anything
in fetch_input() and instead just handle the parameters in run().
Description : Implements the run() interface method of Bio::EnsEMBL::Hive::Process that is used to perform the main bulk of the job.
Here, we fetch AT and GC counts, then call a subroutine to calculate %GC from the counts.
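sub run {
    my $self = shift;

    # both parameters are required; each is an arrayref of per-chunk counts
    my $at_count = $self->param_required('at_count');
    my $gc_count = $self->param_required('gc_count');

    my $percentage = $self->_calc_pct($at_count, $gc_count);

    # store the result so that write_output() can flow it out
    $self->param('result', $percentage);
    $self->warning("percentage is $percentage");
}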
Description : Implements the write_output() interface method of Bio::EnsEMBL::Hive::Process that is used to deal with the job's output.
Here, it flows the result from the %GC calculation out into branch 1 in a parameter called 'result'.
sub write_output {  # dataflow
    my $self = shift;

    # flow the computed %GC out on branch 1 in a parameter called 'result'
    $self->dataflow_output_id({
        'result' => $self->param('result'),
    }, 1);
}
Description : This is a private method that does the actual %GC calculation.
$at_count is an arrayref pointing to a list of AT counts;
likewise, $gc_count is an arrayref pointing to a list of GC counts. In the
%GC pipeline, each element in the array is the count of AT or GC in one of the chunked
subsequences.
Here, we sum up the counts from all of the chunks, then divide the total GC count by the
total AT + GC count to determine %GC. The value is returned as a fraction between 0 and 1.
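As an illustration of the calculation described above, here is a minimal standalone sketch that does not depend on eHive. The helper name calc_gc_fraction and the per-chunk counts are made up for this example and are not part of the module. Unlike the method in this module, the sketch also falls back to 0 when a list is empty, since sum() from List::Util returns undef in that case.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of the module): takes two arrayrefs of
# per-chunk counts and returns GC / (AT + GC) as a fraction.
sub calc_gc_fraction {
    my ($at_counts, $gc_counts) = @_;

    # sum() returns undef for an empty list, so fall back to 0
    my $at_sum = sum(@{$at_counts}) // 0;
    my $gc_sum = sum(@{$gc_counts}) // 0;

    my $total = $at_sum + $gc_sum;
    return 0 if $total == 0;    # avoid division by zero
    return $gc_sum / $total;
}

# Made-up counts for three chunks: AT totals 150, GC totals 150
my @at = (60, 50, 40);
my @gc = (40, 50, 60);
printf "GC fraction: %.2f\n", calc_gc_fraction(\@at, \@gc);   # prints "GC fraction: 0.50"
```

Multiplying the returned fraction by 100 gives the value as a percentage.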
sub _calc_pct {
    my ($self, $at_count, $gc_count) = @_;

    # using sum from List::Util
    my $at_sum = sum @{$at_count};
    my $gc_sum = sum @{$gc_count};

    my $pct_gc = 0;    # stays 0 when there are no counts at all
    if (($at_sum + $gc_sum) != 0) {
        $pct_gc = $gc_sum / ($at_sum + $gc_sum);
    }

    return $pct_gc;
}