See the NOTICE file distributed with this work for additional information
regarding copyright ownership.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Please email comments or questions to the public Ensembl
developers list at <http:

Questions may also be sent to the Ensembl help desk at
Allows features created externally from Ensembl in a single
coordinate system to be retrieved in several other (Ensembl-style)
coordinate systems. This is intended to be a replacement for the old
    -host   => 'kaka.sanger.ac.uk',
    -dbname => 'homo_sapiens_core_9_30',

  $xf_adaptor = ExternalFeatureAdaptorSubClass->new;
  # Connect the Ensembl core database:
  $xf_adaptor->db($database_adaptor);

  # get some features in contig coords
    @{ $xf_adaptor->fetch_all_by_contig_name('AC000087.2.1.42071') };

  # get some features in assembly coords
    @{ $xf_adaptor->fetch_all_by_chr_start_end( 'X', 100000, 200000 ) };

  # get some features in clone coords
  @feats = @{ $xf_adaptor->fetch_all_by_clone_accession('AC000087') };
  # Add the adaptor to the ensembl core dbadaptor (implicitly sets db

  $database_adaptor->add_ExternalFeatureAdaptor($xf_adaptor);

  # get some features in Slice coords
  $slice_adaptor = $database_adaptor->get_SliceAdaptor;
    $slice_adaptor->fetch_by_region( 'chromosome', '1', 100000, 200000 );
  @feats = @{ $xf_adaptor->fetch_all_by_Slice($slice) };

  # now features can be retrieved directly from Slice
  @feats = @{ $slice->get_all_ExternalFeatures };
This class is intended to be used as a method of getting external
features into EnsEMBL. To work, this class must be extended and must
implement the coordinate_systems method. In addition, the subclass
is required to implement a single fetch method so that the external
features may be retrieved. By implementing a single fetch_method in a
methods become available for retrieving the data in several different
The coordinate_systems method should return a list of strings indicating
which coordinate system(s) have been implemented. If a given string is
returned from the coordinate_systems method then the corresponding fetch
method must be implemented. The reverse is also true: if a fetch method
is implemented then coordinate_systems must return the appropriate
string in its list of return values. The following are the valid
coordinate system values and the corresponding fetch methods that must
  COORD SYSTEM STRING   FETCH_METHOD
  -------------------   ------------
  'ASSEMBLY'            fetch_all_by_chr_start_end
  'CLONE'               fetch_all_by_clone_accession
  'CONTIG'              fetch_all_by_contig_name
  'SUPERCONTIG'         fetch_all_by_supercontig_name
  'SLICE'               fetch_all_by_Slice
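The two-way rule above (every declared string needs its fetch method, and every overridden fetch method needs its string) can be checked mechanically. The sketch below uses a hypothetical helper, `check_adaptor`, which is not part of the Ensembl API. It uses stub packages (`My::BaseAdaptor`, `My::ContigAdaptor`, `My::BadAdaptor`) in place of the real base class. It treats a fetch method as overridden when the subclass resolves it, via `can`, to different code than the base class does.

```perl
use strict;
use warnings;

# Coordinate-system string => fetch method a subclass must override
# (the pairs from the table above).
my %FETCH_METHOD_FOR = (
    ASSEMBLY    => 'fetch_all_by_chr_start_end',
    CLONE       => 'fetch_all_by_clone_accession',
    CONTIG      => 'fetch_all_by_contig_name',
    SUPERCONTIG => 'fetch_all_by_supercontig_name',
    SLICE       => 'fetch_all_by_Slice',
);

# Hypothetical stand-in for the real base class: every fetch method exists.
package My::BaseAdaptor;
sub coordinate_systems { die "abstract method coordinate_systems not implemented\n" }
sub fetch_all_by_chr_start_end    { return [] }
sub fetch_all_by_clone_accession  { return [] }
sub fetch_all_by_contig_name      { return [] }
sub fetch_all_by_supercontig_name { return [] }
sub fetch_all_by_Slice            { return [] }

# Consistent subclass: declares CONTIG and overrides the matching method.
package My::ContigAdaptor;
our @ISA = ('My::BaseAdaptor');
sub coordinate_systems       { return ('CONTIG') }
sub fetch_all_by_contig_name { return [] }

# Inconsistent subclass: declares CLONE but overrides no fetch method.
package My::BadAdaptor;
our @ISA = ('My::BaseAdaptor');
sub coordinate_systems { return ('CLONE') }

package main;

# Return a list of problems; an empty list means the subclass follows
# the rule in both directions.
sub check_adaptor {
    my ($class, $base) = @_;
    my %declared = map { $_ => 1 } $class->coordinate_systems;
    my @problems;
    for my $cs (sort keys %FETCH_METHOD_FOR) {
        my $method = $FETCH_METHOD_FOR{$cs};
        # Overridden if the subclass resolves the method to different code.
        my $overridden = $class->can($method) != $base->can($method);
        push @problems, "$cs declared but $method not overridden"
            if $declared{$cs} && !$overridden;
        push @problems, "$method overridden but $cs not declared"
            if $overridden && !$declared{$cs};
    }
    push @problems, 'no valid coordinate system declared'
        unless grep { exists $FETCH_METHOD_FOR{$_} } keys %declared;
    return @problems;
}

print "$_\n" for check_adaptor('My::BadAdaptor', 'My::BaseAdaptor');
```

Comparing code references avoids relying on naming conventions. It does not detect a subclass that overrides a method only to call the base implementation.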
The objects returned by the fetch methods should be EnsEMBL or BioPerl
style Feature objects. These objects MUST have start, end and strand
be called an EnsEMBL core database adaptor must be attached to the
the remappings between various coordinate systems. This may be done
through a call to the DBAdaptor add_ExternalFeatureAdaptor method or
extend this constructor and provide your own set of parameters.
    return bless {}, ref $class;
  return bless {}, $class;
  Example    : $external_feature_adaptor->ensembl_db($new_val);

  my ($self, $value) = @_;
    $self->{'ensembl_db'} = $value;
  return $self->{'ensembl_db'};
=head2 coordinate_systems

  Example    : @implemented_coord_systems = $ext_adaptor->coordinate_systems;
  Description: ABSTRACT method. Must be implemented by all
               ExternalFeatureAdaptor subclasses. This method returns a list
               of coordinate systems which are implemented by the subclass.
               At least one valid coordinate system must be implemented.
               Valid coordinate systems are: 'SLICE', 'ASSEMBLY', 'CONTIG',
  Returntype : list of strings
sub coordinate_systems {
  throw("abstract method coordinate_systems not implemented\n");
  Example    : $track_name = $xf_adaptor->track_name;
  Description: Currently this is not really used. In the future it may be
               possible to have ExternalFeatures automatically displayed by
               the EnsEMBL web code. By default this method returns
               'External features' but you are encouraged to override this
               method and provide your own meaningful name for the features
               your adaptor provides. This also allows you to distinguish the
               type of features retrieved from Slices. See

  return 'External features';
  Example    : $feature_type = $xf_adaptor->feature_type;
  Description: Currently this is not used. In the future it may be possible
               to have ExternalFeatures automatically displayed by the EnsEMBL
               web code. This method would then be used to determine the
               type of glyphs used to draw the features which are returned
               from this external adaptor.
=head2 fetch_all_by_Slice

  Example    : @features = @{$ext_adaptor->fetch_all_by_Slice($slice)};
  Description: Retrieves all features which lie in the region defined
               by $slice in slice coordinates.

               If this method is overridden then the coordinate_systems method
               must return 'SLICE' as one of its values.

               This method will work as is (i.e. without overriding it)
               providing at least one of the other fetch methods is overridden.
  Returntype : reference to a list of Bio::SeqFeature objects in the Slice
  Exceptions : Thrown on incorrect input arguments
  Caller     : general, fetch_all_by_chr_start_end
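If a subclass does not override fetch_all_by_Slice, the method chooses one supported source system in a fixed order: CLONE, then ASSEMBLY, then CONTIG, then SUPERCONTIG. It fetches features in that system and then remaps them onto the slice. The sketch below reproduces only the selection step, as a hypothetical standalone function, `choose_source_coord_system`. The Ensembl coordinate-system names it returns ('clone', 'chromosome', 'contig', 'supercontig') are the ones the method passes to the CoordSystemAdaptor's fetch_by_name. 'SLICE' is deliberately absent, because a subclass that declares it is expected to override fetch_all_by_Slice itself.

```perl
use strict;
use warnings;

# Order in which a supported coordinate-system string is looked for,
# paired with the Ensembl coordinate-system name features are fetched in.
my @SOURCE_PRIORITY = (
    [ CLONE       => 'clone' ],
    [ ASSEMBLY    => 'chromosome' ],
    [ CONTIG      => 'contig' ],
    [ SUPERCONTIG => 'supercontig' ],
);

# Given the list returned by coordinate_systems, return the Ensembl
# coordinate-system name to fetch from, or die if none is usable.
sub choose_source_coord_system {
    my %supported = map { $_ => 1 } @_;
    for my $pair (@SOURCE_PRIORITY) {
        my ($string, $cs_name) = @$pair;
        return $cs_name if $supported{$string};
    }
    die "This ExternalFeatureAdaptor does not support a known coordinate system\n";
}

my $chosen = choose_source_coord_system('CONTIG', 'ASSEMBLY');
print "$chosen\n";    # chromosome
```

Because the order is fixed, an adaptor that supports several systems always fetches from the earliest one in the list, regardless of the order in which coordinate_systems returns its strings.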
sub fetch_all_by_Slice {
  my ($self, $slice) = @_;

  unless($slice && ref $slice && $slice->isa('Bio::EnsEMBL::Slice')) {
    throw("[$slice] is not a Bio::EnsEMBL::Slice");

  my $csa = $self->ensembl_db->get_CoordSystemAdaptor();

  my $slice_start = $slice->start;
  my $slice_end = $slice->end;
  my $slice_strand = $slice->strand;
  my $slice_seq_region = $slice->seq_region_name;
  my $slice_seq_region_id = $slice->get_seq_region_id;
  my $coord_system = $slice->coord_system;

  if($self->_supported('SLICE')) {
    throw("ExternalFeatureAdaptor supports SLICE coordinate system" .
          " but fetch_all_by_Slice not implemented");
  my $from_coord_system;

  # Get all of the features from whatever coord system they are computed in
  if($self->_supported('CLONE')) {
    $fetch_method = sub {
      my ($acc, $ver) = split(/\./, $name);
      $self->fetch_all_by_clone_accession($acc,$ver,@_);
    $from_coord_system = $csa->fetch_by_name('clone');
  } elsif($self->_supported('ASSEMBLY')) {
    $from_coord_system = $csa->fetch_by_name('chromosome');
    $fetch_method = $self->can('fetch_all_by_chr_start_end');
  } elsif($self->_supported('CONTIG')) {
    $from_coord_system = $csa->fetch_by_name('contig');
    $fetch_method = $self->can('fetch_all_by_contig_name');
  } elsif($self->_supported('SUPERCONTIG')) {
    $from_coord_system = $csa->fetch_by_name('supercontig');
    $fetch_method = $self->can('fetch_all_by_supercontig_name');
    $self->_no_valid_coord_system();
  if($from_coord_system->equals($coord_system)) {
    $features{$slice_seq_region} = &$fetch_method($self, $slice_seq_region,
                                                  $slice_start,$slice_end);
    foreach my $segment (@{$slice->project($from_coord_system->name,
                                           $from_coord_system->version)}) {
      my ($start,$end,$pslice) = @$segment;
      $features{$pslice->seq_region_name } ||= [];
      push @{$features{$pslice->seq_region_name }},
        @{&$fetch_method($self, $pslice->seq_region_name,
  if(!$coord_system->equals($from_coord_system)) {
    my $asma = $self->ensembl_db->get_AssemblyMapperAdaptor();
    my $mapper = $asma->fetch_by_CoordSystems($from_coord_system,
    my $slice_adaptor = $self->ensembl_db->get_SliceAdaptor();

    #convert the coordinates of each of the features retrieved
    foreach my $fseq_region (keys %features) {
      my $feats = $features{$fseq_region};

      $slice_setter = _guess_slice_setter($feats) if(!$slice_setter);

      foreach my $f (@$feats) {
        my($sr_id, $start, $end, $strand) =
          $mapper->fastmap($fseq_region,$f->start,$f->end,$f->strand,

        next if(!defined($sr_id));

        #maps to unexpected seq region, probably error in the externally
        if($sr_id ne $slice_seq_region_id) {
          warning("Externally created Feature mapped to [$sr_id] " .
                  "which is not on requested seq_region_id [$slice_seq_region_id]");

        #update the coordinates of the feature
        &$slice_setter($f,$slice);
    #we already know the seqregion the features are on, we just have
    #to put them on the slice
    @out = @{$features{$slice_seq_region}};
    my $slice_setter = _guess_slice_setter(\@out);

    foreach my $f (@out) {
      &$slice_setter($f,$slice);
    # convert from assembly coords to slice coords
    # handle the circular slice case
    my $seq_region_len = $slice->seq_region_length();
    foreach my $f (@out) {
      my($f_start, $f_end, $f_strand);

      if ($slice->strand == 1) { # Positive strand
        $f_start = $f->start - $slice_start + 1;
        $f_end = $f->end - $slice_start + 1;
        $f_strand = $f->strand;

        if ($slice->is_circular()) { # Handle circular chromosomes
          if ($f_start > $f_end) { # Looking at a feature overlapping the chromosome origin.
            if ($f_end > $slice_start) {
              # Looking at the region in the beginning of the chromosome.
              $f_start -= $seq_region_len;
              $f_end += $seq_region_len;
          if ($slice_start > $slice_end && $f_end < 0) {
            # Looking at the region overlapping the chromosome origin and
            # a feature which is at the beginning of the chromosome.
            $f_start += $seq_region_len;
            $f_end += $seq_region_len;
      } else { # Negative strand
        my ($seq_region_start, $seq_region_end) = ($f->start, $f->end);
        $f_start = $slice_end - $seq_region_end + 1;
        $f_end = $slice_end - $seq_region_start + 1;
        $f_strand = $f->strand * -1;

        if ($slice->is_circular()) {
          if ($slice_start > $slice_end) { # slice spans origin of replication
            if ($seq_region_start >= $slice_start) {
              $f_end += $seq_region_len;
              $f_start += $seq_region_len
                if $seq_region_end > $slice_start;
            } elsif ($seq_region_start <= $slice_end) {
            } elsif ($seq_region_end >= $slice_start) {
              $f_start += $seq_region_len;
              $f_end += $seq_region_len;
            } elsif ($seq_region_end <= $slice_end) {
              $f_end += $seq_region_len
            } elsif ($seq_region_start > $seq_region_end) {
              $f_end += $seq_region_len;
            if ($seq_region_start <= $slice_end and $seq_region_end >= $slice_start) {
            } elsif ($seq_region_start > $seq_region_end) {
              if ($seq_region_start <= $slice_end) {
                $f_start -= $seq_region_len;
              } elsif ($seq_region_end >= $slice_start) {
                $f_end += $seq_region_len;
      $f->strand($f_strand);
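For a linear slice, the conversion from seq_region coordinates to slice coordinates reduces to the arithmetic in the two strand branches. On a forward-strand slice, positions are offset from the slice start. On a reverse-strand slice, they are measured back from the slice end and the feature strand is flipped. The sketch below is a hypothetical helper, `to_slice_coords`, that reproduces only this linear case. It deliberately omits the circular-chromosome adjustments.

```perl
use strict;
use warnings;

# Convert a feature's seq_region coordinates to coordinates relative to a
# linear (non-circular) slice. Returns (start, end, strand) on the slice.
sub to_slice_coords {
    my (%args) = @_;
    my ($slice_start, $slice_end, $slice_strand) =
        @args{qw(slice_start slice_end slice_strand)};
    my ($start, $end, $strand) = @args{qw(start end strand)};

    if ($slice_strand == 1) {
        # Forward-strand slice: shift so the slice start becomes position 1.
        return ($start - $slice_start + 1, $end - $slice_start + 1, $strand);
    }
    # Reverse-strand slice: count back from the slice end; the feature's
    # end becomes its new start, and its strand is inverted.
    return ($slice_end - $end + 1, $slice_end - $start + 1, $strand * -1);
}

my @fwd = to_slice_coords(slice_start => 100, slice_end => 200, slice_strand => 1,
                          start => 150, end => 160, strand => 1);
print "@fwd\n";    # 51 61 1
```

On the reverse-strand slice spanning 100..200, the same feature maps to positions 41..51 on strand -1.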
sub _guess_slice_setter {
  my $features = shift;

  #we do not know what type of features these are. They might
  #be bioperl features or old ensembl features, hopefully they are new
  #style features. Try to come up with a setter method for the

  return undef if(!@$features);

  my ($f) = @$features;

  foreach my $method (qw(slice contig attach_seq)) {
    last if($slice_setter = $f->can($method));

  if($f->can('seqname')) {
    $slice_setter = sub { $_[0]->seqname($_[1]->seq_region_name()); };
  $slice_setter = sub{} if(!$slice_setter);

  return $slice_setter;
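_guess_slice_setter relies on duck typing: it asks the first feature which methods it responds to. The following sketch mirrors that strategy with hypothetical stub classes (`My::EnsFeature`, `My::BioPerlFeature`, `My::Slice`) so that it runs without Ensembl or BioPerl installed. Like the original, it assumes that every feature in the list has the same type as the first one.

```perl
use strict;
use warnings;

# Hypothetical new-style feature: has a slice accessor.
package My::EnsFeature;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub slice { my $self = shift; $self->{slice} = shift if @_; return $self->{slice} }

# Hypothetical BioPerl-style feature: only has seqname.
package My::BioPerlFeature;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub seqname { my $self = shift; $self->{seqname} = shift if @_; return $self->{seqname} }

# Hypothetical slice exposing only the method the seqname fallback needs.
package My::Slice;
sub new             { my ($class, $name) = @_; return bless { name => $name }, $class }
sub seq_region_name { return $_[0]{name} }

package main;

# Inspect the first feature and pick a setter: a slice/contig/attach_seq
# method if one exists, else a closure that copies the slice's
# seq_region_name into seqname, else a no-op. Returns undef for an empty list.
sub guess_slice_setter {
    my ($features) = @_;
    return unless @$features;
    my ($f) = @$features;
    for my $method (qw(slice contig attach_seq)) {
        my $setter = $f->can($method);
        return $setter if $setter;
    }
    return sub { $_[0]->seqname($_[1]->seq_region_name) } if $f->can('seqname');
    return sub { };
}

my $slice = My::Slice->new('X');
for my $f (My::EnsFeature->new, My::BioPerlFeature->new) {
    my $setter = guess_slice_setter([$f]);
    $setter->($f, $slice);    # (feature, slice), as in &$slice_setter($f,$slice)
}
```

The setter is resolved once and then reused for every feature, which avoids a `can` lookup per feature.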
=head2 fetch_all_by_contig_name

  Arg [1]    : string $contig_name
  Arg [2]    : int $start (optional)
               The start of the region on the contig to retrieve features on;
               if not specified, the whole of the contig is used.
  Arg [3]    : int $end (optional)
               The end of the region on the contig to retrieve features on;
               if not specified, the whole of the contig is used.
  Example    : @fs = @{$self->fetch_all_by_contig_name('AB00879.1.1.39436')};
  Description: Retrieves features on the contig defined by the name
               $contig_name in contig coordinates.

               If this method is overridden then the coordinate_systems
               method must return 'CONTIG' as one of its values.

               This method will work as is (i.e. without being overridden)
               providing at least one other fetch method has
  Returntype : reference to a list of Bio::SeqFeature objects in the contig
  Exceptions : thrown if the input argument is incorrect
               thrown if the coordinate_systems method returns the value
               'CONTIG' and this method has not been overridden.
  Caller     : general, fetch_all_by_Slice
sub fetch_all_by_contig_name {
  my ($self, $contig_name, $start, $end) = @_;

  unless($contig_name) {
    throw("contig_name argument not defined");

  if($self->_supported('CONTIG')) {
    throw("ExternalFeatureAdaptor supports CONTIG coordinate system" .
          " but fetch_all_by_contig_name is not implemented");

  unless($self->ensembl_db) {
    throw('DB attribute not set. This value must be set for the ' .
          'ExternalFeatureAdaptor to function correctly');

  my $slice_adaptor = $self->ensembl_db->get_SliceAdaptor();
  my $slice = $slice_adaptor->fetch_by_region('contig', $contig_name,

  return $self->fetch_all_by_Slice($slice);
=head2 fetch_all_by_supercontig_name

  Arg [1]    : string $supercontig_name
  Arg [2]    : int $start (optional)
               The start of the region on the supercontig to retrieve features
               on; if not specified, the whole of the supercontig is used.
  Arg [3]    : int $end (optional)
               The end of the region on the supercontig to retrieve features
               on; if not specified, the whole of the supercontig is used.
  Example    : @fs = @{$self->fetch_all_by_supercontig_name('NT_004321')};
  Description: Retrieves features on the supercontig defined by the name
               $supercontig_name in supercontig coordinates.

               If this method is overridden then the coordinate_systems
               method must return 'SUPERCONTIG' as one of its values.

               This method will work as is (i.e. without being overridden)
               providing at least one other fetch method has
  Returntype : reference to a list of Bio::SeqFeature objects in the supercontig
  Exceptions : thrown if the input argument is incorrect
               thrown if the coordinate_systems method returns the value
               'SUPERCONTIG' and this method has not been overridden.
  Caller     : general, fetch_all_by_Slice
sub fetch_all_by_supercontig_name {
  my ($self, $supercontig_name, $start, $end) = @_;

  unless($supercontig_name) {
    throw("supercontig_name argument not defined");

  if($self->_supported('SUPERCONTIG')) {
    throw("ExternalFeatureAdaptor supports SUPERCONTIG coordinate system" .
          " but fetch_all_by_supercontig_name is not implemented");

  unless($self->ensembl_db) {
    throw('DB attribute not set. This value must be set for the ' .
          'ExternalFeatureAdaptor to function correctly');

  my $slice_adaptor = $self->ensembl_db->get_SliceAdaptor();
  my $slice = $slice_adaptor->fetch_by_region('supercontig', $supercontig_name,

  return $self->fetch_all_by_Slice($slice);
=head2 fetch_all_by_clone_accession

  Arg [1]    : string $acc
               The EMBL accession number of the clone to fetch features from.
  Arg [2]    : (optional) string $ver
  Arg [3]    : (optional) int $start
  Arg [4]    : (optional) int $end

  Example    : @fs = @{$self->fetch_all_by_clone_accession('AC000093')};
  Description: Retrieves features on the clone defined by the $acc arg in

               If this method is overridden then the coordinate_systems method
               must return 'CLONE' as one of its values. The arguments
               start, end, version are passed if this method is overridden and
               can optionally be used to reduce the scope of the query and

               This method will work as is, providing at least one other
               fetch method has been overridden.
  Returntype : reference to a list of Bio::SeqFeature objects in the Clone
  Exceptions : thrown if the input argument is incorrect
               thrown if the coordinate_systems method returns the value 'CLONE'
               and this method is not overridden.
               thrown if the coordinate_systems method does not return any
  Caller     : general, fetch_all_by_Slice
sub fetch_all_by_clone_accession {
  my ($self, $acc, $version, $start, $end) = @_;

    throw("clone accession argument not defined");

  if($self->_supported('CLONE')) {
    throw('ExternalFeatureAdaptor supports CLONE coordinate system ' .
          'but does not implement fetch_all_by_clone_accession');

  unless($self->ensembl_db) {
    throw('DB attribute not set. This value must be set for the ' .
          'ExternalFeatureAdaptor to function correctly');

  if(defined($version)) {
    $acc = "$acc.$version";
  } elsif($acc !~ /\./) {

  my $slice_adaptor = $self->ensembl_db->get_SliceAdaptor;

  my $slice = $slice_adaptor->fetch_by_region('clone', $acc, $start, $end);

  return $self->fetch_all_by_Slice($slice);
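A condition written as `!$acc =~ /\./` does not test whether the accession lacks a version suffix. In Perl, `!` binds more tightly than `=~`, so the expression matches the pattern against `!$acc`. For any non-empty accession, `!$acc` is the empty string, which never contains a dot, so the condition is always false. `$acc !~ /\./` expresses the intended test. The short script below demonstrates the difference for an unversioned accession.

```perl
use strict;
use warnings;

my $acc = 'AC000087';    # accession without a version suffix

# Parses as (!$acc) =~ /\./ : negating a true string yields '', which has no dot.
my $buggy = (!$acc =~ /\./) ? 1 : 0;

# Intended test: the accession itself does not contain a dot.
my $fixed = ($acc !~ /\./) ? 1 : 0;

print "buggy=$buggy fixed=$fixed\n";    # buggy=0 fixed=1
```

For a versioned accession such as 'AC000087.2', both forms evaluate to false. The bug therefore only shows up in the unversioned case, which is exactly the case the branch exists to handle.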
=head2 fetch_all_by_chr_start_end

  Arg [1]    : string $chr_name
               The name of the chromosome to retrieve features from
               The start coordinate of the chromosomal region to retrieve
               The end coordinate of the chromosomal region to retrieve
  Description: Retrieves features on the region defined by the $chr_name,
               $start, and $end args in assembly (chromosomal) coordinates.

               If this method is overridden then the coordinate_systems method
               must return 'ASSEMBLY' as one of its values.

               This method will work as is (i.e. without overriding it)
               providing at least one of the other fetch methods is overridden.
  Returntype : reference to a list of Bio::SeqFeatures
  Exceptions : Thrown if the coordinate_systems method returns ASSEMBLY as a
               value and this method is not overridden.
               Thrown if any of the input arguments are incorrect
  Caller     : general, fetch_all_by_Slice
sub fetch_all_by_chr_start_end {
  my ($self, $chr_name, $start, $end) = @_;

  unless($chr_name && defined $start && defined $end && $start < $end) {
    throw("Incorrect start [$start] end [$end] or chr [$chr_name] arg");

  unless($self->ensembl_db) {
    throw('DB attribute not set. This value must be set for the ' .
          'ExternalFeatureAdaptor to function correctly');

  my $slice_adaptor = $self->ensembl_db->get_SliceAdaptor();

  my $slice = $slice_adaptor->fetch_by_region('toplevel', $chr_name, $start,

  return $self->fetch_all_by_Slice($slice);
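Implementing a single fetch method is enough to make the others available through the base class. The sketch below shows what a subclass's ASSEMBLY fetch method might look like. The hypothetical `My::InMemoryAdaptor` serves features held in memory and returns those that overlap the requested chromosome interval. So that it runs standalone, it does not inherit from the real base class. A real subclass would inherit from it, and would also need its ensembl_db attribute set before the other fetch methods could remap its features.

```perl
use strict;
use warnings;

# Minimal feature class exposing the start, end and strand methods that
# fetched features are required to have.
package My::SimpleFeature;
sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub start  { return $_[0]{start} }
sub end    { return $_[0]{end} }
sub strand { return $_[0]{strand} }

# Hypothetical adaptor serving features from memory in ASSEMBLY coordinates.
package My::InMemoryAdaptor;
sub new {
    my ($class, %features_by_chr) = @_;
    return bless { features => \%features_by_chr }, $class;
}

# Must agree with the overridden fetch method below.
sub coordinate_systems { return ('ASSEMBLY') }

# Return a reference to a list of features overlapping [$start, $end] on $chr_name.
sub fetch_all_by_chr_start_end {
    my ($self, $chr_name, $start, $end) = @_;
    my $on_chr = $self->{features}{$chr_name} || [];
    return [ grep { $_->start <= $end && $_->end >= $start } @$on_chr ];
}

package main;

my $adaptor = My::InMemoryAdaptor->new(
    X => [
        My::SimpleFeature->new(start => 150_000, end => 150_500, strand => 1),
        My::SimpleFeature->new(start => 250_000, end => 250_100, strand => -1),
    ],
);
my $feats = $adaptor->fetch_all_by_chr_start_end('X', 100_000, 200_000);
print scalar(@$feats), " feature(s)\n";    # 1 feature(s)
```

The method returns an array reference rather than a list, matching the `@{ ... }` dereferencing used throughout the synopsis.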
=head2 _no_valid_coord_system

  Description: PRIVATE method. Throws an error with a descriptive message
  Exceptions : always thrown

sub _no_valid_coord_system {
  throw("This ExternalFeatureAdaptor does not support a known " .
        "coordinate system.\n Valid coordinate systems are: " .
        "[SLICE,ASSEMBLY,SUPERCONTIG,CONTIG,CLONE].\n This External Adaptor " .
        "supports: [" . join(', ', $self->coordinate_systems) . "]");
  Arg [1]    : string $system
  Example    : print "CONTIG system supported" if($self->_supported('CONTIG'));
  Description: PRIVATE method. Tests if the coordinate system defined by
               the $system argument is implemented.

  my ($self, $system) = @_;

  #construct the hash of supported coordinate systems if it has not been built already
  unless(exists $self->{_supported}) {
    $self->{_supported} = {};
    foreach my $coord_system ($self->coordinate_systems) {
      $self->{_supported}->{$coord_system} = 1;

  return $self->{_supported}->{$system};
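_supported builds its lookup table lazily. coordinate_systems is called once, on the first check, and every later check is a hash lookup. The hypothetical `My::CachingAdaptor` below reproduces that pattern. Its call counter exists only to make the caching observable.

```perl
use strict;
use warnings;

package My::CachingAdaptor;

my $calls = 0;    # counts coordinate_systems invocations (illustration only)

sub new                { return bless {}, shift }
sub coordinate_systems { $calls++; return ('CONTIG', 'ASSEMBLY') }
sub calls              { return $calls }

# Build the set of supported systems on first use, then answer from the cache.
sub _supported {
    my ($self, $system) = @_;
    unless (exists $self->{_supported}) {
        $self->{_supported} = { map { $_ => 1 } $self->coordinate_systems };
    }
    return $self->{_supported}{$system};
}

package main;

my $adaptor = My::CachingAdaptor->new;
my $msg = $adaptor->_supported('CONTIG') ? 'CONTIG supported' : 'CONTIG not supported';
print "$msg\n";
```

Because the cache lives in the object, a subclass whose coordinate_systems result could change at run time would need to delete `$self->{_supported}` to force a rebuild.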