Alignment tools
Copyright (C) 2018 Arthur Zwaenepoel
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact: arzwa@psb.vib-ugent.be

wgd.alignment.align(in_file, out_file, aligner)
Alignment wrapper.
Parameters:
- aligner – alignment program
- in_file – input FASTA file
- out_file – output FASTA file
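A wrapper of this kind typically selects a program-specific function by name. The following is a minimal sketch of that dispatch pattern, not the actual wgd implementation: the name `align_sketch`, the `registry` argument and the stand-in callables are all hypothetical and exist only for illustration.

```python
def align_sketch(in_file, out_file, aligner, registry):
    """Dispatch to the alignment function registered under `aligner`.

    `registry` is a hypothetical dict mapping program names to callables
    that take (in_file, out_file); it is not part of wgd.alignment.
    """
    try:
        run = registry[aligner.lower()]
    except KeyError:
        raise ValueError(f"unsupported aligner: {aligner!r}") from None
    return run(in_file, out_file)


# Stand-in callables so the sketch runs without any aligner installed.
registry = {
    "mafft": lambda i, o: ("mafft", i, o),
    "muscle": lambda i, o: ("muscle", i, o),
    "prank": lambda i, o: ("prank", i, o),
}
result = align_sketch("family.fasta", "family.aln", "MAFFT", registry)
```

In real use the registry values would presumably be the documented functions align_mafft, align_muscle and align_prank.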

wgd.alignment.align_mafft(in_file, out_file, bin_path='mafft')
Perform multiple sequence alignment with MAFFT.
Parameters:
- bin_path – path to MAFFT executable
- in_file – input FASTA file
- out_file – output FASTA file
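For orientation, here is one way such a call could be made with the standard library. This is a sketch under assumptions: the helper names `build_mafft_command` and `run_mafft_sketch` are hypothetical, and the `--auto` option is a common MAFFT choice, not necessarily what wgd passes.

```python
import subprocess


def build_mafft_command(in_file, bin_path="mafft"):
    # `--auto` lets MAFFT pick a strategy; whether wgd uses it is an assumption.
    return [bin_path, "--auto", in_file]


def run_mafft_sketch(in_file, out_file, bin_path="mafft"):
    # MAFFT prints the alignment to stdout, so redirect stdout to out_file.
    with open(out_file, "w") as handle:
        subprocess.run(
            build_mafft_command(in_file, bin_path),
            stdout=handle,
            stderr=subprocess.DEVNULL,  # MAFFT reports progress on stderr
            check=True,                 # raise if MAFFT exits with an error
        )
    return out_file
```

Keeping the command construction separate from the call makes the command line easy to inspect without MAFFT installed.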

wgd.alignment.align_muscle(in_file, out_file, bin_path='muscle')
Perform multiple sequence alignment with MUSCLE.
Parameters:
- bin_path – path to MUSCLE executable
- in_file – input FASTA file
- out_file – output FASTA file

wgd.alignment.align_prank(in_file, out_file, bin_path='prank')
Perform multiple sequence alignment with PRANK.
Parameters:
- bin_path – path to PRANK executable
- in_file – input FASTA file
- out_file – output FASTA file

wgd.alignment.get_pairwise_alns(aln, nuc_seqs, min_length=3)
Get all pairwise alignments and pairwise statistics.
Parameters:
- aln – alignment file
- nuc_seqs – nucleotide sequences dictionary
- min_length – minimum stripped alignment length necessary to include a pair
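The role of min_length can be illustrated with a small sketch that enumerates all sequence pairs and drops those whose gap-free overlap is too short. The function `pairwise_alns_sketch` is hypothetical. It assumes the alignment is already loaded as a dictionary of aligned strings, ignores nuc_seqs, and counts length in alignment columns, none of which is confirmed for the documented function.

```python
from itertools import combinations


def pairwise_alns_sketch(aln, min_length=3):
    """Yield (id1, id2, s1, s2) for pairs whose gap-free length >= min_length.

    `aln` is assumed here to be a dict {sequence id: aligned sequence};
    the documented function takes an alignment file instead.
    """
    for id1, id2 in combinations(sorted(aln), 2):
        # Keep only the columns where neither sequence has a gap.
        columns = [(a, b) for a, b in zip(aln[id1], aln[id2])
                   if a != "-" and b != "-"]
        if len(columns) < min_length:
            continue  # too little overlap to include this pair
        s1 = "".join(a for a, _ in columns)
        s2 = "".join(b for _, b in columns)
        yield id1, id2, s1, s2


aln = {"g1": "ATGAAA---", "g2": "ATGAAGTTT", "g3": "------TTT"}
pairs = list(pairwise_alns_sketch(aln, min_length=3))
```

Here g1 and g3 share no gap-free column, so only the pairs (g1, g2) and (g2, g3) are kept.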

wgd.alignment.hamming_distance(s1, s2)
Return the Hamming distance between equal-length sequences.
Parameters:
- s1 – string 1
- s2 – string 2
Returns: the Hamming distance between s1 and s2
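The Hamming distance is the number of positions at which two equal-length strings differ. A minimal sketch follows. The name `hamming_distance_sketch` is hypothetical, and raising an error on unequal lengths is an assumption about reasonable behaviour, not documented behaviour of wgd.

```python
def hamming_distance_sketch(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


distance = hamming_distance_sketch("ATGAAA", "ATGAAG")  # differs at the last base
```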

wgd.alignment.pairwise_alignment_stats(aln)
Get pairwise alignment statistics.
Parameters:
- aln – alignment dictionary
Returns: dictionary with pairwise statistics

wgd.alignment.pal2nal(pal, nuc_seqs)
Protein alignment to nucleotide alignment converter.
Parameters:
- pal – protein alignment dictionary
- nuc_seqs – nucleotide sequences
Returns: nucleotide alignment dictionary
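The general idea of such a conversion is to thread each unaligned coding sequence onto its aligned protein, emitting one codon per residue and three gap characters per protein gap. The sketch below illustrates that idea only. `pal2nal_sketch` is a hypothetical name, and it assumes both inputs are dictionaries keyed by the same sequence ids, with each nucleotide sequence exactly three times as long as its ungapped protein (no stop codon, no validation).

```python
def pal2nal_sketch(pal, nuc_seqs):
    """Thread each nucleotide sequence onto its aligned protein sequence."""
    nal = {}
    for seq_id, protein in pal.items():
        nuc = nuc_seqs[seq_id]
        codons = []
        position = 0  # index of the next unused nucleotide
        for residue in protein:
            if residue == "-":
                codons.append("---")  # a protein gap spans one codon
            else:
                codons.append(nuc[position:position + 3])
                position += 3
        nal[seq_id] = "".join(codons)
    return nal


pal = {"g1": "MK-F", "g2": "MKLF"}
nuc_seqs = {"g1": "ATGAAATTT", "g2": "ATGAAACTGTTT"}
nal = pal2nal_sketch(pal, nuc_seqs)
```

The result keeps the codon reading frame intact, which is what downstream codon-based analyses rely on.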

wgd.alignment.prepare_aln(msa_file, nuc_seqs)
Main wrapper function for alignments for Ks distributions. Takes an alignment file (protein or nucleotide) as input and returns a nucleotide (codon) alignment file and pairwise alignment statistics.
Parameters:
- msa_file – multiple sequence alignment file path
- nuc_seqs – nucleotide sequences (set to None if input is a nucleotide alignment)
Returns: file path to nucleotide alignment, pairwise alignment statistics

wgd.alignment.strip_gaps_pair(s1, s2)
Strip gaps for an aligned sequence pair.
Parameters:
- s1 – sequence 1
- s2 – sequence 2
Returns: two stripped sequences
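Stripping gaps from a pair means removing every alignment column in which either sequence has a gap, so both outputs keep the same length. A minimal sketch follows. The name `strip_gaps_pair_sketch` and its `gap` parameter are hypothetical, and column-by-column removal is an assumption; the documented function may operate on whole codons instead.

```python
def strip_gaps_pair_sketch(s1, s2, gap="-"):
    """Drop every column where at least one of the two sequences has a gap."""
    kept = [(a, b) for a, b in zip(s1, s2) if a != gap and b != gap]
    if not kept:
        return "", ""  # no gap-free column at all
    stripped_1, stripped_2 = zip(*kept)
    return "".join(stripped_1), "".join(stripped_2)


stripped = strip_gaps_pair_sketch("ATG---AAA", "ATGCCCAAG")
```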

wgd.alignment.write_alignment_codeml(alignment, file_name)
Write an alignment file for codeml.
Parameters:
- alignment – alignment dictionary
- file_name – output file name
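codeml reads PHYLIP-like input: a header line with the number of sequences and the alignment length, followed by the named sequences. The sketch below writes a simple sequential variant with each name on its own line. `write_alignment_codeml_sketch` is a hypothetical name, the alignment is assumed to be a dictionary of equal-length aligned strings, and the exact layout wgd produces may differ.

```python
import os
import tempfile


def write_alignment_codeml_sketch(alignment, file_name):
    """Write {id: aligned sequence} in a sequential PHYLIP-like layout."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    with open(file_name, "w") as handle:
        # Header: number of sequences and alignment length.
        handle.write(f"  {len(alignment)} {lengths.pop()}\n")
        for seq_id, seq in alignment.items():
            # Name and sequence on separate lines.
            handle.write(f"{seq_id}\n{seq}\n")
    return file_name


# Usage: write a two-sequence codon alignment to a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "pair.codeml")
write_alignment_codeml_sketch({"g1": "ATGAAA", "g2": "ATGAAG"}, out_path)
```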