3 See the NOTICE file distributed with
this work
for additional information
4 regarding copyright ownership.
6 Licensed under the Apache License, Version 2.0 (the
"License");
7 you may not use
this file except in compliance with the License.
8 You may obtain a copy of the License at
12 Unless required by applicable law or agreed to in writing, software
13 distributed under the License is distributed on an
"AS IS" BASIS,
14 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 See the License
for the specific language governing permissions and
16 limitations under the License.
23 Please email comments or questions to the
public Ensembl
24 developers list at <http:
26 Questions may also be sent to the Ensembl help desk at
34 utility
class to post-process projections from one assembly to another
38 # connect to an old database
40 -host =>
'ensembldb.ensembl.org',
43 -dbname =>
'mus_musculus_core_46_36g',
47 # connect to the new database containing the mapping between old and
50 -host =>
'ensembldb.ensembl.org',
53 -dbname =>
'mus_musculus_core_47_37',
58 -OLD_ASSEMBLY =>
'NCBIM36',
59 -NEW_ASSEMBLY =>
'NCBIM37',
61 -EXTERNAL_SOURCE => 1,
62 -MERGE_FRAGMENTS => 1,
66 # fetch a slice on the old assembly
67 my $slice_adaptor = $dba_old->get_SliceAdaptor;
69 $slice_adaptor->fetch_by_region(
'chromosome', 1, undef, undef,
72 my $new_slice = $assembly_projector->old_to_new($slice);
74 print $new_slice->name,
" (", $assembly_projector->last_status,
")\n";
78 This
class implements some utility functions for converting coordinates
79 between assemblies. A mapping between the two assemblies has to present
80 the database for this to work, see the 'Related Modules' section below
81 on how to generate the mapping.
83 In addition to the "raw" projecting of features and slices, the methods
84 in this module also apply some sensible rules to the results of the
85 projection (like discarding unwanted results or merging fragmented
86 projections). These are the rules (depending on configuration):
88 Discard the projected feature/slice if:
90 1. it doesn't project at all (no segments returned)
91 2. [unless MERGE_FRAGMENTS is set] the projection is fragmented (more
93 3. [if CHECK_LENGTH is set] the projection doesn't have the same
94 length as the original feature/slice
95 4. all segments are on same chromosome and strand
97 If a projection fails any of these rules, undef is returned instead of
98 a projected feature/slice. You can use the last_status() method to find
99 out
about the results of the rules tests.
101 Also note that when projecting features, only a shallow projection is
102 performed, i.e. other features attached to your features (e.g. the
103 transcripts of a gene) are not projected automatically, so it will be
104 the responsability of the user code project all levels of features
120 =head1 RELATED MODULES
122 The process of creating a whole genome alignment between two assemblies
123 (which is the basis for the use of the methods in this class) is done by
124 a series of scripts. Please see
126 ensembl/misc-scripts/assembly/README
128 for a high-level description of this process, and POD in the individual
129 scripts for the details.
137 no warnings qw(uninitialized);
142 use Scalar::Util qw(weaken);
147 for a database containing the assembly mapping
148 Arg [EXTERNAL_SOURCE] : (optional) Boolean $external_source - indicates if
149 source is from a different database
150 Arg [OLD_ASSEMBLY] : name of the old assembly
151 Arg [OLD_ASSEMBLY] : name of the new assembly
152 Arg [OBJECT_TYPE] : (optional) object type ('slice' or 'feature')
153 Arg [MERGE_FRAGMENTS] : (optional) Boolean - determines if segments are merged
154 to return a single object spanning all segments
156 Arg [CHECK_LENGTH] : (optional) Boolean - determines if projected objects
157 have to have same length as original (default: false)
160 -OLD_ASSEMBLY => NCBIM36,
161 -NEW_ASSEMBLY => NCBIM37,
163 Description : Constructor.
165 Exceptions : thrown on missing arguments
166 thrown on invalid OBJECT_TYPE
175 my $class = ref($caller) || $caller;
177 my ($adaptor, $external_source, $old_assembly, $new_assembly,
178 $merge_fragments, $check_length) = rearrange([qw(ADAPTOR EXTERNAL_SOURCE
179 OLD_ASSEMBLY NEW_ASSEMBLY MERGE_FRAGMENTS CHECK_LENGTH)], @_);
181 unless ($adaptor and ref($adaptor) and
182 $adaptor->isa(
'Bio::EnsEMBL::DBSQL::DBAdaptor')) {
183 throw(
"You must provide a DBAdaptor to a database containing the assembly mapping.");
186 unless ($old_assembly and $new_assembly) {
187 throw(
"You must provide an old and new assembly name.");
191 bless ($self, $class);
194 $self->adaptor($adaptor);
195 $self->{
'old_assembly'} = $old_assembly;
196 $self->{
'new_assembly'} = $new_assembly;
198 # by default, merge fragments
199 $self->{
'merge_fragments'} = $merge_fragments || 1;
201 # by default, do not check length
202 $self->{
'check_length'} = $check_length || 0;
204 # by default, features and slices are expected in same database as the
206 $self->{
'external_source'} = $external_source || 0;
215 the
object to project
216 Arg[2] : String $to_assembly - assembly to project to
217 Example : my $new_slice = $assembly_projector->
project($old_slice,
219 Description : Projects a Slice or Feature to the specified assembly.
221 Several tests are performed on the result to discard unwanted
222 results. All projection segments have to be on the same
223 seq_region and strand. If -MERGE_FRAGMENTS is set, gaps will be
224 bridged by creating a single
object from first_segment_start to
225 last_segment_end. If -CHECK_LENGTH is set, the projected
object
226 will have to have the same length as the original. You can use
227 the last_status() method to find out what the result of some of
228 these rule tests were. Please see the comments in the code for
229 more details
about these rules.
231 The return value of this method will always be a single
object,
232 or undef if the projection fails any of the rules.
234 Note that when projecting features, only a "shallow" projection
235 is performed, i.e. attached features aren't projected
236 automatically! (e.g. if you project a gene, its transcripts will
237 have to be projected manually before storing the new gene)
238 Return type : same a Arg 1, or undef if projection fails any of the rules
239 Exceptions : thrown on invalid arguments
240 Caller : general, $self->old_to_new, $self->new_to_old
247 my ($self, $object, $to_assembly) = @_;
249 throw(
"Need an assembly version to project to.") unless ($to_assembly);
250 throw(
"Need an object to project.") unless ($object and ref($object));
252 my ($slice, $object_type);
254 if ($object->isa(
'Bio::EnsEMBL::Feature')) {
255 $object_type =
'feature';
256 } elsif ($object->isa(
'Bio::EnsEMBL::Slice')) {
257 $object_type =
'slice';
259 throw(
"Need a Feature or Slice to project.");
262 # if the feature or slice is sourced from another db, we have to "transfer"
263 # it to the db that contains the assembly mapping. the transfer is very
264 # shallow but that should do for our purposes
265 if ($self->external_source) {
266 my $slice_adaptor = $self->adaptor->get_SliceAdaptor;
268 if ($object_type eq
'feature') {
270 # createa a new slice from the target db
271 my $f_slice = $object->slice;
272 my $target_slice = $slice_adaptor->fetch_by_name($f_slice->name);
274 # now change the feature so that it appears it's from the target db
275 $object->slice($target_slice);
279 # createa a new slice from the target db
280 $object = $slice_adaptor->fetch_by_name($object->name);
285 if ($object_type eq
'feature') {
286 $slice = $object->feature_Slice;
291 # warn if trying to project to assembly version the object already is on
292 if ($slice->coord_system->version eq $to_assembly) {
293 warning(
"Assembly version to project to ($to_assembly) is the same as your object's assembly (".$slice->coord_system->version.
").");
296 # now project the slice
297 my $cs_name = $slice->coord_system_name;
298 my @segments = @{ $slice->project($cs_name, $to_assembly) };
300 # we need to reverse the projection segment list if the orignial
301 if ($slice->strand == -1) {
302 @segments = reverse(@segments);
305 # apply rules to projection results
307 # discard the projected feature/slice if
308 # 1. it doesn't project at all (no segments returned)
309 # 2. [unless MERGE_FRAGMENTS is set] the projection is fragmented (more
311 # 3. [if CHECK_LENGTH is set] the projection doesn't have the same length
312 # as the original feature/slice
313 # 4. all segments are on same chromosome and strand
315 # keep track of the status of applied rules
319 return undef unless (@segments);
320 #warn "DEBUG: passed test 1\n";
323 return undef
if (!($self->merge_fragments) and scalar(@segments) > 1);
324 push @status,
'fragmented' if (scalar(@segments) > 1);
325 #warn "DEBUG: passed test 2\n";
328 my $first_slice = $segments[0]->to_Slice;
329 my $last_slice = $segments[-1]->to_Slice;
330 my $length_mismatch = (($last_slice->end - $first_slice->start + 1) !=
332 return undef
if ($self->check_length and $length_mismatch);
333 push @status,
'length_mismatch' if ($length_mismatch);
334 #warn "DEBUG: passed test 3\n";
339 foreach my $seg (@segments) {
340 my $sl = $seg->to_Slice;
341 $sr_names{$sl->seq_region_name}++;
342 $strands{$sl->strand}++;
344 return undef
if (scalar(keys %sr_names) > 1 or scalar(keys %strands) > 1);
345 #warn "DEBUG: passed test 4\n";
347 # remember rule status
348 $self->last_status(join(
'|', @status));
350 # everything looks fine, so adjust the coords of your feature/slice
351 my $new_slice = $first_slice;
352 $new_slice->{
'end'} = $last_slice->end;
354 if ($object_type eq
'slice') {
358 $object->start($new_slice->start);
359 $object->end($new_slice->end);
360 $object->strand($new_slice->strand);
361 $object->slice($new_slice->seq_region_Slice);
363 # undef dbID and adaptor so you can store the feature in the target db
364 $object->dbID(undef);
365 $object->adaptor(undef);
376 the
object to project
377 Example : my $new_slice = $assembly_projector->old_to_new($old_slice);
378 Description : Projects a Slice or Feature from old to
new assembly.
379 This method is just a convenience wrapper
for $self->
project.
380 Return type : same a Arg 1, or undef
389 my ($self, $object) = @_;
390 return $self->project($object, $self->new_assembly);
397 the
object to project
398 Example : my $old_slice = $assembly_projector->new_to_old($new_slice, 1);
399 Description : Projects a Slice or Feature from
new to old assembly.
400 This method is just a convenience wrapper
for $self->
project.
401 Return type : same a Arg 1, or undef
410 my ($self, $object) = @_;
411 return $self->project($object, $self->old_assembly);
420 weaken($self->{
'adaptor'} = shift)
if (@_);
421 return $self->{
'adaptor'};
425 sub external_source {
427 $self->{
'external_source'} = shift
if (@_);
428 return $self->{
'external_source'};
434 $self->{
'old_assembly'} = shift
if (@_);
435 return $self->{
'old_assembly'};
441 $self->{
'new_assembly'} = shift
if (@_);
442 return $self->{
'new_assembly'};
446 sub merge_fragments {
448 $self->{
'merge_fragments'} = shift
if (@_);
449 return $self->{
'merge_fragments'};
455 $self->{
'check_length'} = shift
if (@_);
456 return $self->{
'check_length'};
462 $self->{
'last_status'} = shift
if (@_);
463 return $self->{
'last_status'};