Sequence alignments
The alignments in bio are primarily designed for exploratory use: aligning relatively short sequences (up to ~30 Kb long), visually investigating the alignments, and interacting with the sequences before and after alignment. In such cases, alignments are generated in a reasonable amount of time (about 5 seconds per 10 Kb). The implementations are mathematically optimal, but the libraries that we rely on do not scale well to longer sequences.
For studies that need high-throughput alignments, use specially designed software that relies on heuristics. Specialized software will operate (many) orders of magnitude faster. Depending on your needs, blast, blat, mummer, minimap2, lastz, lastal, exonerate, vsearch, or diamond will be far better suited for genome-wide analyses.
DNA alignment
Align the DNA corresponding to protein S
bio align ncov:S ratg13:S --end 60
# Ident=57(95.0%) Mis=3(5.0%) Gaps=0(0.0%) Target=(1, 60) Query=(1, 60) Length=60 Score=273.0 NUC.4.4(11,1)
YP_009724390 ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACC
             ||||||||||||||||||||||||||||||||.||||||||||||||||||||.|||||. 60
QHR63300.2   ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTTTCTAGTCAGTGTGTTAATCTAACAACT
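The header line summarizes the alignment. The following rough sketch is not the tool's code. It recomputes the counts, the match line, and the score from the two rows. It assumes the NUC.4.4 matrix scores +5 for an identical base and -4 for a mismatch. That assumption reproduces Score=273.0 above (57*5 - 3*4 = 273).

```python
def summarize(target: str, query: str, match: int = 5, mismatch: int = -4):
    """Count identities, mismatches and gap columns, build a match line, and score.

    Assumes NUC.4.4-like scoring (+5 identical, -4 different). Gap columns are
    only counted in this sketch, not scored.
    """
    if len(target) != len(query):
        raise ValueError("alignment rows must have the same length")
    ident = mis = gaps = score = 0
    symbols = []
    for t, q in zip(target, query):
        if t == "-" or q == "-":
            gaps += 1
            symbols.append("-")
        elif t == q:
            ident += 1
            score += match
            symbols.append("|")
        else:
            mis += 1
            score += mismatch
            symbols.append(".")
    return ident, mis, gaps, score, "".join(symbols)


ncov = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACC"
ratg13 = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTTTCTAGTCAGTGTGTTAATCTAACAACT"
ident, mis, gaps, score, line = summarize(ncov, ratg13)
print(f"Ident={ident}({100 * ident / len(line):.1f}%) Mis={mis} Gaps={gaps} Score={score}")
# Ident=57(95.0%) Mis=3 Gaps=0 Score=273
print(line)
```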
DNA alignment with 1-letter amino acid codes
bio align ratg13:S ncov:S --end 60 -1
# Ident=57(95.0%) Mis=3(5.0%) Gaps=0(0.0%) Target=(1, 60) Query=(1, 60) Length=60 Score=273.0 NUC.4.4(11,1)
              M  F  V  F  L  V  L  L  P  L  V  S  S  Q  C  V  N  L  T  T
QHR63300.2   ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTTTCTAGTCAGTGTGTTAATCTAACAACT
             ||||||||||||||||||||||||||||||||.||||||||||||||||||||.|||||. 60
YP_009724390 ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACC
              M  F  V  F  L  V  L  L  P  L  V  S  S  Q  C  V  N  L  T  T
The reading frame follows the slice: translation starts at the first base of the selected region.
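The amino acid track is a translation of the displayed DNA. The minimal sketch below translates DNA with the standard genetic code (NCBI translation table 1). It is an illustration, not the tool's implementation, and the X symbol for unreadable codons is an assumption. Translating from one base further along shows why the reading frame follows the slice.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate whole codons starting at the first base; '*' marks a stop codon.

    Codons with non-ACGT characters become 'X' (an assumption of this sketch);
    a trailing partial codon is ignored.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


ncov = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACC"
print(translate(ncov))      # MFVFLVLLPLVSSQCVNLTT
print(translate(ncov[1:]))  # frame shifted by one base: CLFFLFYCH*...
```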
DNA alignment with 3-letter amino acid codes
bio align ratg13:S ncov:S --end 60 -3
# Ident=57(95.0%) Mis=3(5.0%) Gaps=0(0.0%) Target=(1, 60) Query=(1, 60) Length=60 Score=273.0 NUC.4.4(11,1)
             MetPheValPheLeuValLeuLeuProLeuValSerSerGlnCysValAsnLeuThrThr
QHR63300.2   ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTTTCTAGTCAGTGTGTTAATCTAACAACT
             ||||||||||||||||||||||||||||||||.||||||||||||||||||||.|||||. 60
YP_009724390 ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACC
             MetPheValPheLeuValLeuLeuProLeuValSerSerGlnCysValAsnLeuThrThr
The reading frame follows the slice: translation starts at the first base of the selected region.
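The 3-letter display spells each residue with its standard three-letter abbreviation, so every code lines up with exactly one codon. Below is a small sketch of that mapping. The stop-codon and unknown-residue symbols are assumptions, not necessarily what the tool prints.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "***",  # stop codon: assumed symbol
}


def to_three_letter(protein: str) -> str:
    """Expand 1-letter residues to 3-letter codes; unknown letters become 'Xaa' (assumed)."""
    return "".join(THREE_LETTER.get(aa, "Xaa") for aa in protein.upper())


print(to_three_letter("MFVFLVLLPLVSSQCVNLTT"))
# MetPheValPheLeuValLeuLeuProLeuValSerSerGlnCysValAsnLeuThrThr
```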
DNA alignment, tabular output
bio align ncov:S ratg13:S --end 90 --table
query target pident ident mism gaps score alen tlen tstart tend qlen qstart qend
QHR63300.2 YP_009724390.1 92.2 83 7 0 387.0 90 90 1 90 90 1 90
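The tabular row is whitespace-separated, and the header gives the column names, so it is easy to load for downstream filtering. This is a minimal sketch: the column types are inferred from the example above. The sanity checks assume the +5/-4 NUC.4.4 scoring and an ungapped alignment.

```python
HEADER = "query target pident ident mism gaps score alen tlen tstart tend qlen qstart qend"
ROW = "QHR63300.2 YP_009724390.1 92.2 83 7 0 387.0 90 90 1 90 90 1 90"

TEXT_COLUMNS = {"query", "target"}
FLOAT_COLUMNS = {"pident", "score"}


def parse_row(header: str, row: str) -> dict:
    """Map each whitespace-separated value to its column name and convert its type."""
    names, values = header.split(), row.split()
    if len(names) != len(values):
        raise ValueError("row does not match header")
    record = {}
    for name, value in zip(names, values):
        if name in TEXT_COLUMNS:
            record[name] = value
        elif name in FLOAT_COLUMNS:
            record[name] = float(value)
        else:
            record[name] = int(value)
    return record


rec = parse_row(HEADER, ROW)
# pident: percentage of identical columns over the alignment length.
print(round(100 * rec["ident"] / rec["alen"], 1))  # 92.2
# With +5/-4 scoring and no gaps, the score follows from the counts.
print(5 * rec["ident"] - 4 * rec["mism"])  # 387
```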
Align the translated regions
bio align ncov:S ratg13:S --end 90 --translate
# Ident=30(100.0%) Mis=0(0.0%) Gaps=0(0.0%) Target=(1, 30) Query=(1, 30) Length=30 Score=153.0 BLOSUM62(11,1)
YP_009724390 MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN
|||||||||||||||||||||||||||||| 30
QHR63300.2 MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN
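When two protein sequences are identical and ungapped, the score is simply the sum of the BLOSUM62 diagonal (self-substitution) values. That is how the 30 identical residues above add up to Score=153.0. Here is a quick check that lists only the diagonal entries this peptide needs:

```python
# Diagonal BLOSUM62 entries for the residues that occur in this peptide.
BLOSUM62_SELF = {
    "A": 4, "C": 9, "F": 6, "L": 4, "M": 5, "N": 6, "P": 7,
    "Q": 5, "R": 5, "S": 4, "T": 5, "V": 4, "Y": 7,
}

peptide = "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN"
score = sum(BLOSUM62_SELF[aa] for aa in peptide)
print(score)  # 153
```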
Align the protein corresponding to gene S
The protein sequence is taken from the data (if it exists); it is not translated from the DNA.
bio align ncov:S ratg13:S --end 30 --protein
# Ident=30(100.0%) Mis=0(0.0%) Gaps=0(0.0%) Target=(1, 30) Query=(1, 30) Length=30 Score=153.0 BLOSUM62(11,1)
YP_009724390 MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN
|||||||||||||||||||||||||||||| 30
QHR63300.2 MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN
The slice now applies to the protein sequence.
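Each residue comes from one codon, so a protein slice covers the same region as a DNA slice with coordinates three times larger. In the examples above, --end 30 with --protein shows the same 30 residues as --end 90 with --translate. Below is a small helper for the conversion using 1-based inclusive coordinates. The function name is hypothetical, and it assumes the coding sequence starts at its first base.

```python
from typing import Tuple


def protein_to_cds(start: int, end: int) -> Tuple[int, int]:
    """Convert 1-based inclusive protein coordinates to coding-sequence coordinates."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return 3 * (start - 1) + 1, 3 * end


print(protein_to_cds(1, 30))   # (1, 90)
print(protein_to_cds(11, 11))  # (31, 33): the 11th codon
```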
Default alignment is global
With the default global alignment, end gaps have no penalty.
bio align THISLINE ISALIGNED -i
# Ident=4(36.4%) Mis=2(18.2%) Gaps=5(45.5%) Target=(3, 8) Query=(1, 8) Length=11 Score=8.0 BLOSUM62(11,1)
TARGET THISLI--NE-
       --||..--||- 11
QUERY  --ISALIGNED
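Gap costs explain the Score=8.0 above. The sketch below rescores a gapped alignment under two assumptions that reproduce every THISLINE/ISALIGNED score in this section. First, a gap run of length k costs 11 + (k - 1) * 1, matching the (11,1) in the header. Second, runs that touch either end of the alignment are free unless end gaps are penalized. Only the handful of BLOSUM62 entries these examples need are included.

```python
import re

# The few (symmetric) BLOSUM62 entries used by the THISLINE/ISALIGNED examples.
BLOSUM62_SUBSET = {
    ("I", "I"): 4, ("S", "S"): 4, ("N", "N"): 6, ("E", "E"): 5,
    ("L", "A"): -1, ("I", "L"): 2, ("T", "I"): -1, ("H", "S"): -1,
    ("I", "A"): -1, ("S", "L"): -2, ("I", "G"): -4,
}


def substitution(a: str, b: str) -> int:
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]  # KeyError if the pair is not in the subset


def gap_penalty(row: str, free_end_gaps: bool, gap_open: int, gap_extend: int) -> int:
    """Sum the affine cost of every run of '-' in one row of the alignment."""
    total = 0
    for run in re.finditer(r"-+", row):
        is_end_gap = run.start() == 0 or run.end() == len(row)
        if free_end_gaps and is_end_gap:
            continue
        total += gap_open + (len(run.group()) - 1) * gap_extend
    return total


def score(target: str, query: str, free_end_gaps: bool = True,
          gap_open: int = 11, gap_extend: int = 1) -> int:
    aligned = sum(substitution(t, q) for t, q in zip(target, query) if "-" not in (t, q))
    return (aligned
            - gap_penalty(target, free_end_gaps, gap_open, gap_extend)
            - gap_penalty(query, free_end_gaps, gap_open, gap_extend))


print(score("THISLI--NE-", "--ISALIGNED"))                       # 8
print(score("THISLINE-", "ISALIGNED", free_end_gaps=False))      # -7
print(score("THISLI--NE-", "--ISALIGNED", free_end_gaps=False))  # -15
```

With end gaps penalized, the default layout drops to -15. This is consistent with the strict mode choosing a different layout.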
There is a strict mode that applies end gap penalties.
Tabular output
All alignments may be formatted as tabular output.
bio align THISLINE ISALIGNED -i --table
query target pident ident mism gaps score alen tlen tstart tend qlen qstart qend
QUERY TARGET 36.4 4 2 5 8.0 11 8 3 8 9 1 8
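The tstart/tend and qstart/qend columns, like Target= and Query= in the header, give 1-based positions in the original sequences. The sketch below derives them from the gapped rows. It assumes the reported span runs from the first to the last column where both rows hold a residue. Under that assumption it reproduces Target=(3, 8) Query=(1, 8) for this alignment and (1, 8)/(1, 8) for the strict one.

```python
def spans(target: str, query: str):
    """Return ((tstart, tend), (qstart, qend)), 1-based and inclusive."""
    paired = [i for i, (t, q) in enumerate(zip(target, query)) if t != "-" and q != "-"]
    if not paired:
        raise ValueError("no aligned residue pairs")
    first, last = paired[0], paired[-1]

    def span(row: str):
        start = len(row[:first].replace("-", "")) + 1  # residues before the span, plus one
        end = len(row[:last + 1].replace("-", ""))     # residues up to the last paired column
        return start, end

    return span(target), span(query)


print(spans("THISLI--NE-", "--ISALIGNED"))  # ((3, 8), (1, 8))
print(spans("THISLINE-", "ISALIGNED"))      # ((1, 8), (1, 8))
```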
Local alignment
Will produce all local alignments.
bio align THISLINE ISALIGNED -i --local
# Ident=2(100.0%) Mis=0(0.0%) Gaps=0(0.0%) Target=(7, 8) Query=(7, 8) Length=2 Score=11.0 BLOSUM62(11,1)
TARGET NE
       || 2
QUERY  NE
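A local alignment keeps only the best-scoring region and ignores everything around it. In the Smith-Waterman recurrence a cell's score never drops below zero, so an alignment can start anywhere. To stay short, the sketch below returns only the best score. It also swaps BLOSUM62 and affine gaps for simple match/mismatch scores and a linear gap penalty. All parameter values are arbitrary assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets a new local alignment start at any cell.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))  # 8: the shared ACGT, flanks ignored
print(smith_waterman_score("AAAA", "CCCC"))      # 0: nothing worth aligning
```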
Global alignment
bio align THISLINE ISALIGNED -i --global
# Ident=4(36.4%) Mis=2(18.2%) Gaps=5(45.5%) Target=(3, 8) Query=(1, 8) Length=11 Score=8.0 BLOSUM62(11,1)
TARGET THISLI--NE-
       --||..--||- 11
QUERY  --ISALIGNED
Semiglobal alignment
Same as the global alignment with zero end-gap penalties, but reports only the aligned region:
bio align THISLINE ISALIGNED -i --semiglobal
# Ident=4(50.0%) Mis=2(25.0%) Gaps=2(25.0%) Target=(3, 8) Query=(1, 8) Length=8 Score=8.0 BLOSUM62(11,1)
TARGET ISLI--NE
       ||..--|| 8
QUERY  ISALIGNE
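The semiglobal output above is the global alignment with its end-gap columns cut off. The sketch below shows that trimming step; it illustrates the relationship and is not the tool's code. It turns the THISLI--NE-/--ISALIGNED rows into the ISLI--NE/ISALIGNE rows shown.

```python
def trim_end_gaps(target: str, query: str):
    """Drop leading and trailing columns where either row has a gap."""
    paired = [i for i, (t, q) in enumerate(zip(target, query)) if t != "-" and q != "-"]
    if not paired:
        return "", ""
    first, last = paired[0], paired[-1]
    return target[first:last + 1], query[first:last + 1]


t, q = trim_end_gaps("THISLI--NE-", "--ISALIGNED")
print(t)  # ISLI--NE
print(q)  # ISALIGNE
```

After trimming, the same 4 identities are divided by a length of 8 instead of 11, which is why Ident rises from 36.4% to 50.0%.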
Strict global alignment
Applies end gap penalties.
bio align THISLINE ISALIGNED -i --global --strict
# Ident=2(22.2%) Mis=6(66.7%) Gaps=1(11.1%) Target=(1, 8) Query=(1, 8) Length=9 Score=-7.0 BLOSUM62(11,1)
TARGET THISLINE-
       ......||- 9
QUERY  ISALIGNED
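The default and strict global modes differ in how the dynamic-programming borders and the final cell are handled. With free end gaps, the first row and column start at zero and the best score is read from the last row or column. In strict mode the borders accumulate gap penalties and the score is read from the bottom-right cell. Below is a minimal Needleman-Wunsch score sketch of that difference. For brevity it assumes simple match/mismatch scores and a linear gap penalty, whereas the examples above use BLOSUM62 with affine gaps.

```python
def global_score(a: str, b: str, strict: bool = False,
                 match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score; end gaps are free unless strict=True."""
    n, m = len(a), len(b)
    # First row: leading gaps cost nothing unless end gaps are penalized.
    prev = [j * gap if strict else 0 for j in range(m + 1)]
    last_column = [prev[m]]
    for i in range(1, n + 1):
        cur = [i * gap if strict else 0] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        last_column.append(cur[m])
        prev = cur
    if strict:
        return prev[m]
    # Trailing gaps are free too: the alignment may end anywhere on the last row or column.
    return max(max(prev), max(last_column))


print(global_score("ACGT", "TTACGT"))               # 8
print(global_score("ACGT", "TTACGT", strict=True))  # 4
```

In strict mode the two unmatched leading bases of the second sequence each cost a gap penalty, which lowers the score from 8 to 4.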