Alignment tools


Copyright (C) 2018 Arthur Zwaenepoel

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Contact: arzwa@psb.vib-ugent.be


wgd.alignment.align(in_file, out_file, aligner)

Alignment wrapper. Runs the alignment program given by aligner on in_file and writes the alignment to out_file.

Parameters:
  • aligner – alignment program
  • in_file – input fasta file
  • out_file – output fasta file

wgd.alignment.align_mafft(in_file, out_file, bin_path='mafft')

Perform multiple sequence alignment with MAFFT

Parameters:
  • bin_path – path to MAFFT executable
  • in_file – input fasta file
  • out_file – output fasta file
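
A wrapper of this kind could be sketched as follows. This is a minimal illustration, not the wgd implementation: the names mafft_command and align_mafft_sketch are hypothetical, and the choice of the --auto and --quiet MAFFT options is an assumption. MAFFT prints the alignment to standard output, so the sketch redirects stdout to out_file.

```python
import subprocess


def mafft_command(in_file, bin_path="mafft"):
    """Build the MAFFT command line (the flags are an assumption, not wgd's choice)."""
    # --auto lets MAFFT pick a strategy; --quiet suppresses progress messages.
    return [bin_path, "--auto", "--quiet", in_file]


def align_mafft_sketch(in_file, out_file, bin_path="mafft"):
    """Run MAFFT on in_file and write the aligned FASTA to out_file."""
    with open(out_file, "w") as handle:
        # check=True raises CalledProcessError if MAFFT exits with an error.
        subprocess.run(mafft_command(in_file, bin_path), stdout=handle, check=True)
    return out_file
```

Calling align_mafft_sketch("family.fasta", "family.aln") requires a MAFFT executable on the PATH, or an explicit bin_path.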

wgd.alignment.align_muscle(in_file, out_file, bin_path='muscle')

Perform multiple sequence alignment with MUSCLE

Parameters:
  • bin_path – path to MUSCLE executable
  • in_file – input fasta file
  • out_file – output fasta file

wgd.alignment.align_prank(in_file, out_file, bin_path='prank')

Perform multiple sequence alignment with PRANK

Parameters:
  • bin_path – path to PRANK executable
  • in_file – input fasta file
  • out_file – output fasta file

wgd.alignment.get_pairwise_alns(aln, nuc_seqs, min_length=3)

Get all pairwise alignments and pairwise statistics.

Parameters:
  • aln – alignment file
  • nuc_seqs – nucleotide sequences dictionary
  • min_length – minimum stripped alignment length necessary to include a pair
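
The effect of min_length can be illustrated with a small sketch. It is not the wgd code: pairwise_alns_sketch is a hypothetical name, it takes an already-parsed alignment dictionary rather than a file, it ignores nuc_seqs, and it assumes that a pair is kept when its gap-stripped length is at least min_length.

```python
from itertools import combinations


def pairwise_alns_sketch(aln, min_length=3, gap="-"):
    """Return gap-stripped pairwise alignments that are long enough.

    aln maps sequence IDs to aligned sequences of equal length.
    """
    pairs = {}
    for id1, id2 in combinations(sorted(aln), 2):
        # Keep only columns where neither sequence has a gap.
        kept = [(a, b) for a, b in zip(aln[id1], aln[id2]) if a != gap and b != gap]
        if len(kept) < min_length:
            continue  # too short after stripping: skip this pair
        pairs[(id1, id2)] = (
            "".join(a for a, _ in kept),
            "".join(b for _, b in kept),
        )
    return pairs


example = {"a": "ATGAAA", "b": "ATG---", "c": "ATGCCC"}
result = pairwise_alns_sketch(example, min_length=4)
```

With min_length=4, only the pair ("a", "c") survives here, because every pair involving "b" has just three gap-free columns.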

wgd.alignment.hamming_distance(s1, s2)

Return the Hamming distance between equal-length sequences

Parameters:
  • s1 – string 1
  • s2 – string 2
Returns:

the Hamming distance between s1 and s2
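
The Hamming distance is the number of positions at which two equal-length strings differ. A minimal sketch follows; hamming_distance_sketch is a hypothetical name, and raising ValueError on unequal lengths is an assumption about how such input might be handled.

```python
def hamming_distance_sketch(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        # Assumption: unequal lengths are treated as an error.
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


distance = hamming_distance_sketch("ATGCAT", "ATGGAA")  # differs at positions 4 and 6
```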

wgd.alignment.pairwise_alignment_stats(aln)

Get pairwise alignment statistics.

Parameters:
  • aln – alignment dictionary
Returns:

dictionary with pairwise statistics
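
The documentation does not say which statistics are computed. The sketch below is therefore purely illustrative: pairwise_stats_sketch is a hypothetical name, and the three statistics it reports (gap-stripped length, identity over gap-free columns, and the fraction of gap-free columns) are assumptions chosen as typical examples.

```python
from itertools import combinations


def pairwise_stats_sketch(aln, gap="-"):
    """Compute illustrative statistics for every pair in an alignment dictionary."""
    stats = {}
    for id1, id2 in combinations(sorted(aln), 2):
        s1, s2 = aln[id1], aln[id2]
        # Columns where neither sequence has a gap.
        columns = [(a, b) for a, b in zip(s1, s2) if a != gap and b != gap]
        matches = sum(a == b for a, b in columns)
        stats[(id1, id2)] = {
            "stripped_length": len(columns),
            "identity": matches / len(columns) if columns else 0.0,
            "coverage": len(columns) / len(s1) if s1 else 0.0,
        }
    return stats


stats = pairwise_stats_sketch({"a": "ATG-CA", "b": "ATGGCT"})
```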

wgd.alignment.pal2nal(pal, nuc_seqs)

Protein alignment to nucleotide alignment converter.

Parameters:
  • pal – protein alignment dictionary
  • nuc_seqs – nucleotide sequences
Returns:

nucleotide alignment dictionary
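
The idea behind this conversion is to replace each aligned residue with its codon and each gap with three gap characters. A minimal sketch, assuming both arguments are dictionaries keyed by sequence ID and that each nucleotide sequence is exactly three times as long as its ungapped protein; pal2nal_sketch is a hypothetical name and performs no validation.

```python
def pal2nal_sketch(pal, nuc_seqs, gap="-"):
    """Thread nucleotide sequences onto a protein alignment, codon by codon."""
    nal = {}
    for seq_id, aligned_protein in pal.items():
        nuc = nuc_seqs[seq_id]
        codons = []
        pos = 0  # index of the next unused nucleotide
        for residue in aligned_protein:
            if residue == gap:
                codons.append(gap * 3)  # a protein gap becomes a codon-sized gap
            else:
                codons.append(nuc[pos:pos + 3])
                pos += 3
        nal[seq_id] = "".join(codons)
    return nal


nal = pal2nal_sketch(
    {"a": "M-K", "b": "MAK"},
    {"a": "ATGAAA", "b": "ATGGCTAAA"},
)
```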

wgd.alignment.prepare_aln(msa_file, nuc_seqs)

Main wrapper function for preparing alignments for Ks distributions. Takes a multiple sequence alignment file (protein or nucleotide) as input and returns a nucleotide (codon) alignment file together with pairwise alignment statistics.

Parameters:
  • msa_file – multiple sequence alignment file path
  • nuc_seqs – nucleotide sequences (set to None if input is a nucleotide alignment).
Returns:

file path to nucleotide alignment, pairwise alignment statistics

wgd.alignment.strip_gaps_pair(s1, s2)

Strip gaps for an aligned sequence pair.

Parameters:
  • s1 – sequence 1
  • s2 – sequence 2
Returns:

two stripped sequences
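
A minimal sketch of gap stripping for a pair, assuming that a column is dropped whenever either sequence has a gap in it and that '-' is the gap character; strip_gaps_pair_sketch is a hypothetical name.

```python
def strip_gaps_pair_sketch(s1, s2, gap="-"):
    """Remove every column in which at least one of the two sequences has a gap."""
    if len(s1) != len(s2):
        # Assumption: aligned sequences must have the same length.
        raise ValueError("aligned sequences must have equal length")
    kept = [(a, b) for a, b in zip(s1, s2) if a != gap and b != gap]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


stripped = strip_gaps_pair_sketch("AT-GC", "A-TGC")
```

Only the first, fourth, and fifth columns are gap-free in both sequences, so both stripped sequences have length three.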

wgd.alignment.write_alignment_codeml(alignment, file_name)

Write an alignment file for codeml

Parameters:
  • alignment – alignment dictionary
  • file_name – output file name
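
As an illustration, the sketch below writes an alignment dictionary in a PHYLIP-like sequential layout, which codeml can read: a header with the number of sequences and the alignment length, then each name on its own line followed by its sequence. The exact layout wgd produces is not documented here, so this layout is an assumption, and write_codeml_sketch is a hypothetical name.

```python
import os
import tempfile


def write_codeml_sketch(alignment, file_name):
    """Write an alignment dictionary as a PHYLIP-like sequential file."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    with open(file_name, "w") as handle:
        # Header: number of sequences and alignment length.
        handle.write(f"{len(alignment)} {length}\n")
        for seq_id, seq in alignment.items():
            handle.write(f"{seq_id}\n{seq}\n")


# Usage: write a two-sequence codon alignment to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.codeml")
    write_codeml_sketch({"gene_a": "ATGAAA", "gene_b": "ATG---"}, path)
    with open(path) as handle:
        written = handle.read()
```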