ensembl-hive  2.7.0
CalcOverallPercentage.pm
=pod

=head1 NAME

    Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CalcOverallPercentage

=head1 SYNOPSIS

    Please refer to the Bio::EnsEMBL::Hive::PipeConfig::GCPct_conf pipeline configuration file
    to see how this runnable fits into the %GC example pipeline.

=head1 DESCRIPTION

    'Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CalcOverallPercentage' is the final step of the pipeline.
    It sums up the GC and AT counts from all of the chunked subsequences, then divides the GC count by the GC + AT
    count to determine %GC.

=head1 LICENSE

    See the NOTICE file distributed with this work for additional information
    regarding copyright ownership.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

         http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License
    is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and limitations under the License.

=head1 CONTACT

    Please subscribe to the Hive mailing list, http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users,
    to discuss Hive-related questions or to be notified of our updates.

=cut


package Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CalcOverallPercentage;

use strict;
use warnings;

use List::Util qw(sum);

use base ('Bio::EnsEMBL::Hive::Process');


=head2 fetch_input

    Description : Implements the fetch_input() interface method of Bio::EnsEMBL::Hive::Process, which is used to read in parameters and load data.

    There are no hard and fast rules on whether to fetch parameters in fetch_input() or to wait until run() to fetch them.
    In general, fetch_input() is the place to check that parameters exist and hold valid values before the worker moves from
    the FETCH_INPUT state into the RUN state. In this case, since it's a simple computation, we don't do anything in fetch_input()
    and instead just handle the parameters in run().
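
    As a minimal sketch of what such validation could look like, the standalone helper below checks that a count
    parameter is an array reference holding only non-negative integers. The helper name validate_counts() and the
    specific checks are assumptions made for illustration; they are not part of this module or of the eHive API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of eHive): dies with a descriptive
# message unless $counts is an arrayref of non-negative integers.
sub validate_counts {
    my ($name, $counts) = @_;

    die "'$name' must be an array reference\n"
        unless ref($counts) eq 'ARRAY';

    foreach my $value (@{$counts}) {
        die "'$name' contains a non-integer or negative value\n"
            unless defined($value) && $value =~ /\A\d+\z/;
    }
    return 1;
}

# Usage: eval traps the exception, as a caller might do.
my $ok  = eval { validate_counts('at_count', [12, 7, 30]) };
my $bad = eval { validate_counts('gc_count', 'not a list') };
print "valid input accepted\n"     if $ok;
print "invalid input rejected: $@" if !$bad;
```

    Inside a real fetch_input(), the same checks could be applied to the values returned by param_required(),
    so that a malformed job fails early instead of part-way through run().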

=cut

sub fetch_input {
    # Intentionally empty: the parameters are read in run().
}

=head2 run

    Description : Implements run() interface method of Bio::EnsEMBL::Hive::Process that is used to perform the main bulk of the job.
    Here, we fetch AT and GC counts, then call a subroutine to calculate %GC from the counts.

=cut

sub run {
    my $self = shift @_;

    my $at_count = $self->param_required('at_count');
    my $gc_count = $self->param_required('gc_count');

    my $percentage = $self->_calc_pct($at_count, $gc_count);

    $self->param('result', $percentage);
    $self->warning("percentage is $percentage");
}

=head2 write_output

    Description : Implements the write_output() interface method of Bio::EnsEMBL::Hive::Process, which is used to handle the job's output
    after execution.
    Here, it flows the result of the %GC calculation out into branch 1 in a parameter called 'result'.

=cut


sub write_output {  # dataflow
    my $self = shift @_;

    $self->dataflow_output_id({
        'result' => $self->param('result'),
    }, 1);
}

=head2 _calc_pct

    Description : This is a private method that does the actual %GC calculation.
    $at_count is an arrayref pointing to a list of AT counts;
    likewise, $gc_count is an arrayref pointing to a list of GC counts. In the
    %GC pipeline, each element in the array is the count of AT or GC in one of the chunked
    sequence files.

    Here, we sum up the counts from all of the chunks, then divide the total GC count by the
    total AT + GC count. The value returned is a fraction between 0 and 1 (it is not multiplied by 100);
    it is 0 when both totals are 0.
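
    To make the arithmetic concrete, the sketch below repeats the calculation outside of eHive for three
    hypothetical chunks. The function name gc_fraction() and the sample counts are made up for illustration
    and do not come from the pipeline.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical standalone version of the calculation described above.
# Takes two arrayrefs of per-chunk counts and returns GC / (AT + GC),
# or 0 when there are no counted bases at all.
sub gc_fraction {
    my ($at_count, $gc_count) = @_;

    # A leading 0 keeps sum() from returning undef on an empty list.
    my $at_sum = sum(0, @{$at_count});
    my $gc_sum = sum(0, @{$gc_count});

    my $total = $at_sum + $gc_sum;
    return $total ? $gc_sum / $total : 0;
}

# Three chunks: AT total = 60, GC total = 40, so the result is 40 / 100.
my $fraction = gc_fraction([30, 20, 10], [10, 20, 10]);
printf "GC fraction: %.2f (%.0f%%)\n", $fraction, 100 * $fraction;
```

    Multiplying by 100 only at print time keeps the stored value as a plain ratio, which is also what
    this runnable passes on as 'result'.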

=cut

sub _calc_pct {
    my ($self, $at_count, $gc_count) = @_;

    # Using sum from List::Util. The leading 0 makes sum() return 0
    # rather than undef when a list is empty.
    my $at_sum = sum(0, @{$at_count});
    my $gc_sum = sum(0, @{$gc_count});

    my $pct_gc = 0;
    if (($at_sum + $gc_sum) != 0) {
        $pct_gc = $gc_sum / ($at_sum + $gc_sum);
    }
    return $pct_gc;
}

1;
