ensembl-hive  2.6
Bio::EnsEMBL::BaseAlignFeature Class Reference

Public Member Functions

public Bio::EnsEMBL::BaseAlignFeature new ()
 
public String cigar_string ()
 
public String align_type ()
 
public Int alignment_length ()
 
protected Int _ensembl_cigar_alignment_length ()
 
public List ungapped_features ()
 
public Int strands_reversed ()
 
public void reverse_complement ()
 
protected void _ensembl_reverse_complement ()
 
public Bio::EnsEMBL::BaseAlignFeature transform ()
 
protected List _parse_ensembl_cigar ()
 
protected _parse_cigar ()
 
protected void _parse_features ()
 
protected _parse_ensembl_features ()
 
protected Int _hit_unit ()
 
protected Int _query_unit ()
 
protected Int _mdtag_alignment_length ()
 
protected Array _get_mdz_chunks ()
 
protected Array _get_mdz_alignment_length ()
 
protected String _get_mdz_chunk_type ()
 
protected Array _mdz_alignment_string ()
 
Public Member Functions inherited from Bio::EnsEMBL::FeaturePair
public Bio::EnsEMBL::FeaturePair new ()
 
public String hseqname ()
 
public Int hstart ()
 
public Int hend ()
 
public hstrand ()
 
public Bio::EnsEMBL::Slice hslice ()
 
public String hseq_region_name ()
 
public Boolean hseq_region_strand ()
 
public Int hseq_region_start ()
 
public Int hseq_region_end ()
 
public Float score ()
 
public Float percent_id ()
 
public String species ()
 
public String hspecies ()
 
public String coverage ()
 
public String hcoverage ()
 
public String external_db_id ()
 
public String db_name ()
 
public String db_display_name ()
 
public Float p_value ()
 
public String hdescription ()
 
public String display_id ()
 
public Int identical_matches ()
 
public Int positive_matches ()
 
public Int group_id ()
 
public Int level_id ()
 
public void invert ()
 
public extra_data ()
 
public type ()
 
Public Member Functions inherited from Bio::EnsEMBL::Feature
public Bio::EnsEMBL::Feature new ()
 
public Int start ()
 
public Int end ()
 
public Int strand ()
 
public void move ()
 
public Int length ()
 
public Bio::EnsEMBL::Analysis analysis ()
 
public Bio::EnsEMBL::Slice slice ()
 
public Boolean Or Undef equals ()
 
public Bio::EnsEMBL::Feature transform ()
 
public Bio::EnsEMBL::Feature transfer ()
 
public Listref project_to_slice ()
 
public Listref project ()
 
public String seqname ()
 
public String display_id ()
 
public String version ()
 
public Bio::EnsEMBL::Slice feature_Slice ()
 
public String seq_region_name ()
 
public Int seq_region_length ()
 
public Boolean seq_region_strand ()
 
public Int seq_region_start ()
 
public Int seq_region_end ()
 
public String coord_system_name ()
 
public String seq ()
 
public Listref get_all_alt_locations ()
 
public Boolean overlaps ()
 
public Boolean overlaps_local ()
 
public List get_overlapping_Genes ()
 
public Bio::EnsEMBL::Gene get_nearest_Gene ()
 
public String feature_so_acc ()
 
public String feature_so_term ()
 
public Hashref summary_as_hash ()
 
public String species ()
 
public sub_SeqFeature ()
 
public add_sub_SeqFeature ()
 
public flush_sub_SeqFeature ()
 
Public Member Functions inherited from Bio::EnsEMBL::Storable
public Bio::EnsEMBL::Storable new ()
 
public Instance new_fast ()
 
public Int dbID ()
 
public Bio::EnsEMBL::DBSQL::BaseAdaptor adaptor ()
 
public Boolean is_stored ()
 
public get_all_DAS_Features ()
 

Detailed Description

Synopsis

my $feat = new Bio::EnsEMBL::DnaPepAlignFeature(
    -slice        => $slice,
    -start        => 100,
    -end          => 120,
    -strand       => 1,
    -hseqname     => 'SP:RF1231',
    -hstart       => 200,
    -hend         => 220,
    -analysis     => $analysis,
    -cigar_string => '10M3D5M2I',
    -align_type   => 'ensembl'
);
where $analysis is a Bio::EnsEMBL::Analysis object.
Alternatively if you have an array of ungapped features:
my $feat =
new Bio::EnsEMBL::DnaPepAlignFeature( -features => \@features );
where @features is an array of Bio::EnsEMBL::FeaturePair objects.
There is a method to (re)create ungapped features from the cigar_string:
my @ungapped_features = $feat->ungapped_features();
Bio::EnsEMBL::BaseAlignFeature inherits from Bio::EnsEMBL::FeaturePair,
which in turn inherits from Bio::EnsEMBL::Feature,
so methods from both parent classes are available.
The cigar_string is a condensed representation of the matches and gaps
which make up the gapped alignment (where CIGAR stands for
Concise Idiosyncratic Gapped Alignment Report).
CIGAR format is: n <matches> [ x <deletes or inserts> m <matches> ]*
where M = match, D = delete, I = insert; n, m are match lengths;
x is delete or insert length.
Spaces are omitted, thus: "23M4I12M2D1M"
as are counts for any lengths of 1, thus 1M becomes M: "23M4I12M2DM"
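The format described above can be read back with a small regular expression. The following is a minimal standalone sketch, not the class's own parser; the function name parse_cigar is hypothetical and the code does not depend on any Ensembl module.

```perl
use strict;
use warnings;

# Split an Ensembl-style CIGAR string into [length, operation] pairs.
# A missing count means 1, so "M" is read as "1M".
# parse_cigar is a hypothetical helper name used for illustration only.
sub parse_cigar {
    my ($cigar) = @_;
    die "malformed CIGAR string: $cigar\n"
        unless $cigar =~ /^(?:\d*[MDI])+$/;
    my @ops;
    while ( $cigar =~ /(\d*)([MDI])/g ) {
        my ( $count, $op ) = ( $1, $2 );
        push @ops, [ ( $count eq '' ? 1 : $count + 0 ), $op ];
    }
    return @ops;
}

my @ops = parse_cigar('23M4I12M2DM');
print join( ' ', map { "$_->[0]$_->[1]" } @ops ), "\n";    # 23M 4I 12M 2D 1M
```

Validating the whole string first means a stray character is reported instead of being silently skipped by the global match.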
To make things clearer this is how a blast HSP would be parsed:
>AK014066
Length = 146
Minus Strand HSPs:
Score = 76 (26.8 bits), Expect = 1.4, P = 0.74
Identities = 20/71 (28%), Positives = 29/71 (40%), Frame = -1
Query: 479 GLQAPPPTPQGCRLIPPPPLGLQAPLPTLRAVGSSHHHP*GRQGSSLSSFRSSLASKASA 300
G APPP PQG R P P G + P L + + ++ R +A +
Sbjct: 7 GALAPPPAPQG-RWAFPRPTG-KRPATPLHGTARQDRQVRRSEAAKVTGCRGRVAPHVAP 64
Query: 299 SSPHNPSPLPS 267
H P+P P+
Sbjct: 65 PLTHTPTPTPT 75
The alignment goes from 479 down to 267 in the query sequence on the reverse
strand, and from 7 to 75 in the subject sequence.
The alignment is made up of the following ungapped pieces:
query_seq start 447 , sbjct_seq hstart 7 , match length 33 , strand -1
query_seq start 417 , sbjct_seq hstart 18 , match length 27 , strand -1
query_seq start 267 , sbjct_seq hstart 27 , match length 147 , strand -1
When assembled into a DnaPepAlignFeature where:
(seqname, start, end, strand) refer to the query sequence,
(hseqname, hstart, hend, hstrand) refer to the subject sequence,
these ungapped pieces are represented by the cigar string:
33M3I27M3I147M
with start 267, end 479, strand -1, and hstart 7, hend 75, hstrand 1.
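The arithmetic in this example can be checked by walking the cigar string. The sketch below is a standalone illustration under stated assumptions: cigar lengths are counted in query units, the query unit is 3 and the hit unit is 1 (DNA query against a peptide subject), "I" consumes only query positions, "D" consumes only subject positions, and the subject is on the forward strand. The function ungapped_pieces and its argument names are hypothetical, not part of the class.

```perl
use strict;
use warnings;

# Walk an Ensembl-style CIGAR string and list the ungapped (M) pieces.
# Assumptions (not taken from the library source): lengths are in query
# units, I advances only the query, D advances only the subject, and the
# subject strand is 1.
sub ungapped_pieces {
    my (%arg) = @_;
    my $qpos = $arg{strand} == 1 ? $arg{start} : $arg{end};
    my $hpos = $arg{hstart};
    my @pieces;
    while ( $arg{cigar} =~ /(\d*)([MDI])/g ) {
        my ( $count, $op ) = ( $1, $2 );
        my $len  = $count eq '' ? 1 : $count;
        my $hlen = $len * $arg{hit_unit} / $arg{query_unit};
        if ( $op eq 'M' ) {
            # On the reverse strand the piece ends at $qpos and starts below it.
            my $qstart = $arg{strand} == 1 ? $qpos : $qpos - $len + 1;
            push @pieces, { start => $qstart, hstart => $hpos, length => $len };
        }
        $qpos += $arg{strand} * $len if $op ne 'D';
        $hpos += $hlen               if $op ne 'I';
    }
    return @pieces;
}

my @pieces = ungapped_pieces(
    cigar  => '33M3I27M3I147M',
    start  => 267, end => 479, strand => -1,
    hstart => 7, query_unit => 3, hit_unit => 1,
);
printf "query start %d, hstart %d, match length %d\n",
    @{$_}{qw(start hstart length)} for @pieces;
# query start 447, hstart 7, match length 33
# query start 417, hstart 18, match length 27
# query start 267, hstart 27, match length 147
```

The query span is 33 + 3 + 27 + 3 + 147 = 213 = 479 - 267 + 1 bases, and the subject span is (33 + 27 + 147) / 3 = 69 = 75 - 7 + 1 residues, which agrees with the coordinates quoted above.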

Definition at line 90 of file BaseAlignFeature.pm.

Member Function Documentation

◆ _ensembl_cigar_alignment_length()

protected Int Bio::EnsEMBL::BaseAlignFeature::_ensembl_cigar_alignment_length ( )
  Arg [1]    : None
  Description: return the alignment length (including indels) based on the cigar_string
  Returntype : int
  Exceptions :
  Caller     :
  Status     : Stable
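As an illustration of "alignment length including indels", the sketch below sums the counts of every M, D and I operation in a cigar string. It is a standalone approximation of the idea, not the method's actual code; cigar_alignment_length is a hypothetical name.

```perl
use strict;
use warnings;

# Sum the lengths of all operations (M, D and I) in an Ensembl-style
# CIGAR string; an omitted count is treated as 1.
# cigar_alignment_length is a hypothetical helper, for illustration only.
sub cigar_alignment_length {
    my ($cigar) = @_;
    my $total = 0;
    while ( $cigar =~ /(\d*)[MDI]/g ) {
        $total += $1 eq '' ? 1 : $1;
    }
    return $total;
}

print cigar_alignment_length('10M3D5M2I'), "\n";    # 20
print cigar_alignment_length('12MI3M'),    "\n";    # 16
```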
 

◆ _ensembl_reverse_complement()

protected void Bio::EnsEMBL::BaseAlignFeature::_ensembl_reverse_complement ( )
  Args       : none
  Description: reverse complement the FeaturePair for ensembl cigar string,
               modifying strand, hstrand and cigar_string accordingly
  Returntype : none
  Exceptions : none
  Caller     : general
  Status     : Stable
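A rough sketch of what reverse complementing means for an ensembl-style alignment: both strands change sign, the order of the cigar operations is reversed, and the strands_reversed flag is toggled. The code works on a plain hash rather than a real feature object, and reverse_complement_alignment is a hypothetical name; the toggling of strands_reversed is an assumption based on the description of that attribute.

```perl
use strict;
use warnings;

# Reverse complement a plain-hash stand-in for an alignment feature.
# Assumption: strands_reversed flips on every call.
sub reverse_complement_alignment {
    my ($aln) = @_;
    $aln->{strand}  *= -1;
    $aln->{hstrand} *= -1;
    # Reverse the order of the cigar operations, keeping each count
    # attached to its own operation letter.
    my @pieces = $aln->{cigar_string} =~ /(\d*[MDI])/g;
    $aln->{cigar_string}     = join '', reverse @pieces;
    $aln->{strands_reversed} = $aln->{strands_reversed} ? 0 : 1;
    return $aln;
}

my $aln = {
    strand       => -1,
    hstrand      => 1,
    cigar_string => '10M3D5M2I',
    strands_reversed => 0,
};
reverse_complement_alignment($aln);
print "$aln->{cigar_string} $aln->{strand} $aln->{hstrand}\n";    # 2I5M3D10M 1 -1
```

Calling the function a second time restores the original strands and cigar string.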
 

◆ _get_mdz_alignment_length()

protected Array Bio::EnsEMBL::BaseAlignFeature::_get_mdz_alignment_length ( )
  Arg [1]    : array of strings
  Description: calculate the alignment length from the given chunks
  Returntype : array of strings
  Exceptions : none
  Caller     : internal
  Status     : Stable
 

◆ _get_mdz_chunk_type()

protected String Bio::EnsEMBL::BaseAlignFeature::_get_mdz_chunk_type ( )
  Arg [1]    : char
  Description: get the chunk type
  Returntype : string
  Exceptions : none
  Caller     : internal
  Status     : Stable
 

◆ _get_mdz_chunks()

protected Array Bio::EnsEMBL::BaseAlignFeature::_get_mdz_chunks ( )
  Arg [1]    : mdtag string - MD Z String for mismatching positions. Regex : [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)* (Refer:  SAM/BAM specification)
  Description: parses the mdtag string and groups it into chunks according to their type
               eg: MD:Z:35^VIVALE31^GRPLIQPRRKKAYQLEHTFQGLLGKRSLFTE10 returns ['35', '^', 'VIVALE', '31', '^', 'GRPLIQPRRKKAYQLEHTFQGLLGKRSLFTE', '10']
  Returntype : array of strings
  Exceptions : none
  Caller     : internal
  Status     : Stable
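The chunking described above can be approximated with a single global match. This is a standalone sketch that reproduces the documented example; it is not the method's source, and mdz_chunks is a hypothetical name. Stripping an optional "MD:Z:" prefix is an assumption based on the example string.

```perl
use strict;
use warnings;

# Split an MD:Z string into numeric runs, '^' markers and letter runs.
# mdz_chunks is a hypothetical helper, for illustration only.
sub mdz_chunks {
    my ($mdz) = @_;
    $mdz =~ s/^MD:Z://;    # assumed: the tag prefix is optional
    die "malformed MD:Z string: $mdz\n"
        unless $mdz =~ /^[0-9]+(?:(?:[A-Z]|\^[A-Z]+)[0-9]+)*$/;
    my @chunks = $mdz =~ /([0-9]+|\^|[A-Z]+)/g;
    return @chunks;
}

my @chunks = mdz_chunks('MD:Z:35^VIVALE31^GRPLIQPRRKKAYQLEHTFQGLLGKRSLFTE10');
print join( ',', @chunks ), "\n";
# 35,^,VIVALE,31,^,GRPLIQPRRKKAYQLEHTFQGLLGKRSLFTE,10
```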
 

◆ _hit_unit()

protected Int Bio::EnsEMBL::BaseAlignFeature::_hit_unit ( )
  Args       : none
  Description: abstract method; override it in a subclass with something
               that returns one or three
  Returntype : int 1,3
  Exceptions : none
  Caller     : internal
  Status     : Stable
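The abstract-method pattern used by _hit_unit and _query_unit can be sketched as follows. The package names My::BaseAlign and My::DnaPepAlign are hypothetical stand-ins, not the real classes. The values 3 and 1 are an assumption for a DNA query aligned to a peptide subject, consistent with the synopsis, where 213 query bases align to 69 subject residues.

```perl
use strict;
use warnings;

# Hypothetical base class: the unit methods must be overridden.
package My::BaseAlign;
sub new         { my ($class) = @_; return bless {}, $class }
sub _hit_unit   { die "Abstract method _hit_unit must be overridden\n" }
sub _query_unit { die "Abstract method _query_unit must be overridden\n" }

# Hypothetical DNA-vs-peptide subclass: three query bases per subject residue.
package My::DnaPepAlign;
our @ISA = ('My::BaseAlign');
sub _hit_unit   { return 1 }
sub _query_unit { return 3 }

package main;

my $feature = My::DnaPepAlign->new;
print $feature->_query_unit, ' ', $feature->_hit_unit, "\n";    # 3 1
```

Calling either method on the base class itself dies, which is why the base class is not meant to be instantiated directly.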
 

◆ _mdtag_alignment_length()

protected Int Bio::EnsEMBL::BaseAlignFeature::_mdtag_alignment_length ( )
  Arg [1]    : None
  Description: return the alignment length (including indels) based on the mdtag (mdz) string
  Returntype : int
  Exceptions : none
  Caller     : internal
  Status     : Stable
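One plausible reading of this calculation, shown as a standalone sketch: numeric chunks contribute their value (runs of matches) and letter chunks contribute their length (mismatched or deleted residues). An MD:Z string by itself does not record insertions, so this sketch counts only what the string encodes; whether the real method does more is not stated here. mdz_alignment_length is a hypothetical name.

```perl
use strict;
use warnings;

# Estimate an alignment length from an MD:Z string.
# Assumption: numbers are match runs, letter runs are mismatches or
# deletions (one position per letter), and '^' only marks a deletion.
sub mdz_alignment_length {
    my ($mdz) = @_;
    $mdz =~ s/^MD:Z://;
    my $total = 0;
    for my $chunk ( $mdz =~ /([0-9]+|\^|[A-Z]+)/g ) {
        if ( $chunk =~ /^[0-9]+$/ ) {
            $total += $chunk;
        }
        elsif ( $chunk ne '^' ) {
            $total += length $chunk;
        }
    }
    return $total;
}

print mdz_alignment_length('MD:Z:10^AC5T2'), "\n";    # 10 + 2 + 5 + 1 + 2 = 20
```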
 

◆ _mdz_alignment_string()

protected Array Bio::EnsEMBL::BaseAlignFeature::_mdz_alignment_string ( )
  Arg [1]    : input sequence
  Arg [2]    : MD Z String for mismatching positions. Regex : [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)* (Refer:  SAM/BAM specification)
               eg: MD:Z:96^RHKTDSFVGLMGKRALNS0V14
  Example    :
$pf->alignment_strings
  Description: Rebuilds the alignment strings of both the seq and hseq sequences
  Returntype : array reference containing 2 strings
               the first corresponds to seq
               the second corresponds to hseq
  Exceptions : none
  Caller     : general
  Status     : Stable
 

◆ _parse_cigar()

protected Bio::EnsEMBL::BaseAlignFeature::_parse_cigar ( )

Undocumented method


◆ _parse_ensembl_cigar()

protected List Bio::EnsEMBL::BaseAlignFeature::_parse_ensembl_cigar ( )
  Args       : none
  Description: PRIVATE (internal) method - creates ungapped features from 
               internally stored cigar line in ensembl format
  Returntype : list of Bio::EnsEMBL::FeaturePair
  Exceptions : none
  Caller     : ungapped_features
  Status     : Stable
 

◆ _parse_ensembl_features()

protected Bio::EnsEMBL::BaseAlignFeature::_parse_ensembl_features ( )

Undocumented method


◆ _parse_features()

protected void Bio::EnsEMBL::BaseAlignFeature::_parse_features ( )
  Arg  [1]   : listref Bio::EnsEMBL::FeaturePair $ungapped_features
  Description: creates internal cigar_string and start,end hstart,hend
               entries.
  Returntype : none, fills in values of self
  Exceptions : argument list undergoes many sanity checks - throws under many
               invalid conditions
  Caller     : new
  Status     : Stable
 

◆ _query_unit()

protected Int Bio::EnsEMBL::BaseAlignFeature::_query_unit ( )
  Args       : none
  Description: abstract method; override it in a subclass with something
               that returns one or three
  Returntype : int 1,3
  Exceptions : none
  Caller     : internal
  Status     : Stable
 

◆ align_type()

public String Bio::EnsEMBL::BaseAlignFeature::align_type ( )
  Arg [1]    : type $align_type
  Example    :
$feature->align_type( "ensembl" );
  Description: get/set for attribute align_type.
               align_type specifies which cigar string format
               is used to describe the alignment.
               The default is 'ensembl'.
  Returntype : string
  Exceptions : none
  Caller     : general
  Status     : Stable
 

◆ alignment_length()

public Int Bio::EnsEMBL::BaseAlignFeature::alignment_length ( )
  Arg [1]    : None
  Description: returns the alignment length (including indels), computed according to the align_type ('ensembl' or 'mdtag')
  Returntype : int
  Exceptions : 
  Caller     : 
  Status     : Stable
 

◆ cigar_string()

public String Bio::EnsEMBL::BaseAlignFeature::cigar_string ( )
  Arg [1]    : string $cigar_string
  Example    :
$feature->cigar_string( "12MI3M" );
  Description: get/set for attribute cigar_string.
               cigar_string describes the alignment:
                 "xM" stands for x matches (or mismatches),
                 "xI" for x inserts into the query sequence,
                 "xD" for x deletions from the query sequence
                 where the query sequence is specified by (seqname, start, ...)
                 and the subject sequence by (hseqname, hstart, ...).
               An "x" that is 1 can be omitted.
               See the SYNOPSIS for an example.
  Returntype : string
  Exceptions : none
  Caller     : general
  Status     : Stable
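To show the "an x that is 1 can be omitted" convention from the writing side, the sketch below builds a cigar string from a list of [length, operation] pairs. It is a standalone illustration; format_cigar is a hypothetical helper and not part of the class.

```perl
use strict;
use warnings;

# Build an Ensembl-style CIGAR string from [length, operation] pairs,
# leaving out the count when it is 1.
# format_cigar is a hypothetical helper, for illustration only.
sub format_cigar {
    my (@ops) = @_;
    return join '', map {
        my ( $length, $op ) = @$_;
        ( $length == 1 ? '' : $length ) . $op;
    } @ops;
}

print format_cigar( [ 12, 'M' ], [ 1, 'I' ], [ 3, 'M' ] ), "\n";    # 12MI3M
```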
 

◆ new()

public Bio::EnsEMBL::BaseAlignFeature Bio::EnsEMBL::BaseAlignFeature::new ( )
  Arg [..]   : List of named arguments. (-cigar_string , -features, -align_type) defined
               in this constructor, others defined in FeaturePair and 
               SeqFeature superclasses.  Either cigar_string or a list
               of ungapped features should be provided - not both.
  Example    :
$baf = new BaseAlignFeatureSubclass(-cigar_string => '3M3I12M', -align_type => 'ensembl');
  Description: Creates a new BaseAlignFeature using either a cigar string or
               a list of ungapped features.  BaseAlignFeature is an abstract
               baseclass and should not actually be instantiated - rather its
               subclasses should be.
  Returntype : Bio::EnsEMBL::BaseAlignFeature
  Exceptions : thrown if both -features and -cigar_string are provided
               thrown if neither -features nor -cigar_string is provided
               warns if -cigar_string is provided without -align_type
  Caller     : general
  Status     : Stable
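The argument rules listed above can be sketched as a standalone check. This is an illustration of the documented behaviour, not the constructor's code: check_align_args is a hypothetical function, the error messages are invented, and falling back to 'ensembl' after the warning is an assumption based on the documented default of align_type.

```perl
use strict;
use warnings;

# Validate constructor-style named arguments: exactly one of
# -cigar_string and -features must be given.
sub check_align_args {
    my (%args) = @_;
    my $has_cigar    = defined $args{-cigar_string};
    my $has_features = defined $args{-features};
    die "Provide either -cigar_string or -features, not both\n"
        if $has_cigar && $has_features;
    die "One of -cigar_string or -features is required\n"
        unless $has_cigar || $has_features;
    if ( $has_cigar && !defined $args{-align_type} ) {
        # Assumed fallback: use the documented default align_type.
        warn "-cigar_string given without -align_type, assuming 'ensembl'\n";
        $args{-align_type} = 'ensembl';
    }
    return %args;
}

my %checked = check_align_args(
    -cigar_string => '3M3I12M',
    -align_type   => 'ensembl',
);
print 'align_type: ', $checked{-align_type}, "\n";    # align_type: ensembl
```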
 

◆ reverse_complement()

public void Bio::EnsEMBL::BaseAlignFeature::reverse_complement ( )
  Args       : none
  Description: reverse complement the FeaturePair based on the cigar type,
               modifying strand, hstrand and cigar_string accordingly
  Returntype : none
  Exceptions : none
  Caller     : general
  Status     : Stable
 

◆ strands_reversed()

public Int Bio::EnsEMBL::BaseAlignFeature::strands_reversed ( )
 
  Arg [1]    : int $strands_reversed
  Description: get/set for attribute strands_reversed
               0 means that strand and hstrand are the original strands obtained
                 from the alignment program used
               1 means that strand and hstrand have been flipped as compared to
                 the original result provided by the alignment program used.
                 You may want to use the reverse_complement method to restore the
                 original strandness.
  Returntype : int
  Exceptions : none
  Caller     : general
  Status     : Stable
 

◆ transform()

public Bio::EnsEMBL::BaseAlignFeature Bio::EnsEMBL::BaseAlignFeature::transform ( )
  Arg [1]    : String $coordinate_system_name
  Arg [2]    : String $coordinate_system_version
  Example    :
$feature = $feature->transform('contig');
$feature = $feature->transform('chromosome', 'NCBI33');
  Description: Moves this AlignFeature to the given coordinate system.
               If the feature cannot be transformed to the destination 
               coordinate system undef is returned instead.
  Returntype : Bio::EnsEMBL::BaseAlignFeature
  Exceptions : wrong parameters
  Caller     : general
  Status     : Medium Risk
 

◆ ungapped_features()

public List Bio::EnsEMBL::BaseAlignFeature::ungapped_features ( )
  Args       : none
  Example    :
@ungapped_features = $align_feature->ungapped_features();
  Description: converts the internal cigar_string into an array of
               ungapped feature pairs
  Returntype : list of Bio::EnsEMBL::FeaturePair
  Exceptions : thrown if cigar_string is not set internally
  Caller     : general
  Status     : Stable
 

The documentation for this class was generated from the following file:
BaseAlignFeature.pm