ensembl-hive  2.8.1
CountATGC.pm
=pod

=head1 NAME

    Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CountATGC

=head1 SYNOPSIS

    Please refer to the Bio::EnsEMBL::Hive::PipeConfig::GCPct_conf pipeline configuration file
    to see how this runnable fits into the %GC example pipeline.

=head1 DESCRIPTION

    'Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CountATGC' counts the occurrences of A/T and G/C bases in
    the sequences of a .fasta file. It takes a .fasta file of DNA sequences as input and flows out two
    parameters: at_count and gc_count.

=head1 LICENSE

    See the NOTICE file distributed with this work for additional information
    regarding copyright ownership.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

         http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License
    is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and limitations under the License.

=head1 CONTACT

    Please subscribe to the Hive mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/ehive-users to discuss Hive-related questions or to be notified of our updates.

=cut


package Bio::EnsEMBL::Hive::Examples::GC::RunnableDB::CountATGC;

use strict;
use warnings;

use Bio::SeqIO;

use base ('Bio::EnsEMBL::Hive::Process');


=head2 param_defaults

    Description : Implements param_defaults() interface method of Bio::EnsEMBL::Hive::Process that defines module defaults for parameters.

=cut

sub param_defaults {

    return {
        'take_time' => 0,   # number of seconds that run() will sleep after counting
    };
}


=head2 fetch_input

    Description : Implements fetch_input() interface method of Bio::EnsEMBL::Hive::Process that is used to read in parameters and load data.
                  There are no hard and fast rules on whether to fetch parameters in fetch_input(), or to wait until run() to fetch them.
                  In general, fetch_input() is the place to check that parameters exist and hold valid values before the worker
                  moves from the FETCH_INPUT state to the RUN state.

                  In this case, we open the input file in fetch_input(), so that the job fails early if there is a problem with the
                  file open operation.

=cut

sub fetch_input {
    my $self = shift @_;

    my $chunkfile = $self->param_required('chunk_name');    # path to the input .fasta chunk
    my $chunk_in  = Bio::SeqIO->new(-file => $chunkfile);   # opened here so that file problems surface before run()
    $self->param('chunk_in', $chunk_in);                    # keep the stream for run()
}

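=pod

    Example (illustrative only, not part of the original module): a minimal sketch of how fetch_input()
    could test the chunk file explicitly before handing it to Bio::SeqIO. It assumes that calling die()
    inside a runnable is enough for the job to be reported as failed; the error message wording is
    hypothetical.

        sub fetch_input {
            my $self = shift @_;

            my $chunkfile = $self->param_required('chunk_name');

            # Hypothetical extra check: stop here if the file is missing or unreadable
            die "Cannot read chunk file '$chunkfile'\n" unless -r $chunkfile;

            $self->param('chunk_in', Bio::SeqIO->new(-file => $chunkfile));
        }

    The explicit -r test only makes the failure message more specific; whether it is worth the extra
    line depends on how informative the Bio::SeqIO error already is in your setup.

=cut
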
=head2 run

    Description : Implements run() interface method of Bio::EnsEMBL::Hive::Process that is used to perform the main bulk of the job.
                  Here, we use the file opened in fetch_input(), read in the sequences from the file, and tally up the number of
                  AT and GC bases seen. We then store these in parameters named at_count and gc_count.

=cut

sub run {
    my $self = shift @_;

    my $at_count = 0;
    my $gc_count = 0;

    # next_seq() returns one sequence per call and undef at end of file,
    # so loop with while() to visit every sequence in the chunk
    my $chunk_in = $self->param('chunk_in');
    while (my $chunkseq = $chunk_in->next_seq()) {
        my $seqstring = $chunkseq->seq();
        $at_count += @{[$seqstring =~ /([AaTt])/g]};    # number of A/T matches, either case
        $gc_count += @{[$seqstring =~ /([GgCc])/g]};    # number of G/C matches, either case
    }

    $self->param('at_count', $at_count);
    $self->param('gc_count', $gc_count);

    sleep( $self->param('take_time') );    # optional delay, 0 by default
}

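=pod

    Example (illustrative only): the tally in run() can also be written with Perl's tr/// operator.
    With an empty replacement list and no /d modifier, tr/// leaves the string unchanged and returns
    the number of matching characters. The sequence string below is made up for the example.

        use strict;
        use warnings;

        my $seqstring = 'ATGCatgcNN';    # hypothetical input; N is counted in neither tally

        my $at_count = ($seqstring =~ tr/AaTt//);    # counts A, a, T and t
        my $gc_count = ($seqstring =~ tr/GgCc//);    # counts G, g, C and c

        print "at_count=$at_count gc_count=$gc_count\n";    # prints "at_count=4 gc_count=4"

    This form avoids building a temporary list of matches, which may matter for long sequences.

=cut
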
=head2 write_output

    Description : Implements write_output() interface method of Bio::EnsEMBL::Hive::Process that is used to deal with the
                  job's output after the execution.
                  The AT and GC counts flow down branch 1 in two parameters: 'at_count' and 'gc_count'.

=cut

sub write_output {
    my $self = shift @_;

    # flow both counts as a single output_id down branch 1
    $self->dataflow_output_id( {
        'at_count' => $self->param('at_count'),
        'gc_count' => $self->param('gc_count'),
    }, 1);
}

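=pod

    Example (illustrative only): a sketch of how a consumer of the two flowed parameters might turn
    them into a %GC value. The helper name gc_percent is hypothetical and is not part of eHive; the
    actual downstream analysis is defined by the pipeline configuration, not by this module.

        use strict;
        use warnings;

        sub gc_percent {    # hypothetical helper
            my ($at_count, $gc_count) = @_;
            my $total = $at_count + $gc_count;
            # undef when no A/T/G/C bases were seen, to avoid dividing by zero
            return $total ? 100 * $gc_count / $total : undef;
        }

        printf "%.1f\n", gc_percent(6, 4);    # prints "40.0"

=cut
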
1;
