wgd 1.0.1 documentation

Markov clustering related functions

Python functions that wrap BLAST and mcl, the Markov clustering algorithm invented and developed by Stijn van Dongen. (Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000.)


Copyright (C) 2018 Arthur Zwaenepoel

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Contact: arzwa@psb.vib-ugent.be


wgd.blast_mcl.all_v_all_blast(query, db, output_directory='./', output_file='blast.tsv', eval_cutoff=1e-10, n_threads=4)

Perform an all-versus-all blastp search by running blastp with the query sequences against db.

Parameters:
  • query – query sequences fasta file
  • db – database sequences fasta file
  • output_directory – output directory
  • output_file – output file name
  • eval_cutoff – e-value cut-off
  • n_threads – number of threads to use for blastp
Returns:

blast file path
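
To make the parameters above concrete, here is a minimal sketch of the kind of command lines such a wrapper could assemble. It is not taken from the wgd source: the helper name build_blastp_commands is hypothetical, and the assumption that the wrapper uses makeblastdb followed by blastp with tabular output (-outfmt 6) is ours. The sketch only builds the argument lists and does not run anything.

```python
import os


def build_blastp_commands(query, db, output_directory="./",
                          output_file="blast.tsv", eval_cutoff=1e-10,
                          n_threads=4):
    """Hypothetical helper: return (makeblastdb_cmd, blastp_cmd, output_path).

    Assumes BLAST+ is used with tabular output; nothing is executed here.
    """
    out_path = os.path.join(output_directory, output_file)
    # Build a protein database from the db fasta file.
    makeblastdb_cmd = ["makeblastdb", "-in", db, "-dbtype", "prot"]
    # Search the query sequences against that database.
    blastp_cmd = [
        "blastp",
        "-query", query,
        "-db", db,
        "-evalue", str(eval_cutoff),
        "-outfmt", "6",
        "-num_threads", str(n_threads),
        "-out", out_path,
    ]
    return makeblastdb_cmd, blastp_cmd, out_path


mkdb, blastp, path = build_blastp_commands("seqs.fasta", "seqs.fasta")
print(" ".join(blastp))
```

Each list could be passed to subprocess.run(cmd, check=True) on a machine where BLAST+ is installed. For an all-versus-all search, query and db point to the same fasta file, as in the call above.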

wgd.blast_mcl.ava_blast_to_abc(ava_file, col_1=0, col_2=1, col_3=10)

Convert tab separated all-vs-all (ava) Blast results to an input graph in abc format for mcl.

Parameters:
  • ava_file – all-vs-all Blast results file (tab separated)
  • col_1 – column in file for gene 1 (starts from 0)
  • col_2 – column in file for gene 2
  • col_3 – column in file for e-value (weight in graph)
Returns:

graph in abc format for mcl
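
The conversion described above amounts to keeping three columns of each Blast hit. The following is a minimal sketch under that assumption, not the wgd implementation: the function name blast_lines_to_abc is hypothetical, it works on an iterable of lines rather than a file path, and the gene IDs in the example are made up. The default columns match the signature above (column 10 is the e-value in the standard 12-column tabular Blast output).

```python
def blast_lines_to_abc(lines, col_1=0, col_2=1, col_3=10):
    """Hypothetical helper: turn tab-separated Blast hits into abc lines.

    Each abc line has the form "<gene 1>\t<gene 2>\t<weight>".
    """
    abc = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        abc.append("\t".join([fields[col_1], fields[col_2], fields[col_3]]))
    return abc


# Made-up example hit in 12-column tabular format.
hit = "\t".join(["ath|G1", "ath|G2", "45.0", "100", "50", "2",
                 "1", "100", "1", "100", "1e-30", "120"])
print(blast_lines_to_abc([hit]))
```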

wgd.blast_mcl.get_one_v_one_orthologs_rbh(blast_file, output_dir)

Get one-vs-one orthologs using reciprocal best hits (RBHs). Implemented for an arbitrary number of species. Note that every gene ID in the blast file should be prefixed with a species ID, e.g. ath|AT1G01000.

Parameters:
  • blast_file – all vs. all blastp results, gene IDs should be prefixed
  • output_dir – output directory
Returns:

the last output file that was written
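
The reciprocal best hit idea can be sketched as follows. This is an illustration only, not the wgd code: the function name reciprocal_best_hits is hypothetical, ranking hits by lowest e-value is an assumption (the documentation does not say which score is used), and the sketch returns a dictionary instead of writing output files. Species are read from the prefix before the | character, as required above.

```python
from collections import defaultdict


def reciprocal_best_hits(lines, col_1=0, col_2=1, col_e=10):
    """Hypothetical helper: return {(species_a, species_b): [(gene_a, gene_b), ...]}.

    Assumption: the best hit is the one with the lowest e-value.
    """
    # best[(gene, other_species)] = (e-value, best hit in other_species)
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= col_e:
            continue
        g1, g2, evalue = fields[col_1], fields[col_2], float(fields[col_e])
        sp1, sp2 = g1.split("|")[0], g2.split("|")[0]
        if sp1 == sp2:
            continue  # only between-species hits matter for orthologs
        key = (g1, sp2)
        if key not in best or evalue < best[key][0]:
            best[key] = (evalue, g2)

    rbh = defaultdict(list)
    for (g1, sp2), (_, g2) in best.items():
        sp1 = g1.split("|")[0]
        back = best.get((g2, sp1))
        # Keep the pair once (sp1 < sp2) if g2's best hit in sp1 is g1.
        if back is not None and back[1] == g1 and sp1 < sp2:
            rbh[(sp1, sp2)].append((g1, g2))
    return {pair: sorted(genes) for pair, genes in rbh.items()}
```

Because best hits are tracked per gene and per target species, the same function handles two species or many, yielding one list of ortholog pairs per species pair.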

wgd.blast_mcl.run_mcl_ava(graph, output_dir='./', inflation=2, output_file='out.mcl', preserve=False, return_dict=False)

Run mcl on all-vs-all Blast results for a species of interest. Note that if output_file is not given and return_dict is set to True, only a Python dictionary is returned and no file is written.

Parameters:
  • graph – input graph (abc format)
  • output_dir – output directory
  • output_file – output file name (optional)
  • inflation – inflation factor for mcl
  • return_dict – boolean, return results as dictionary?
  • preserve – boolean, preserve tmp/intermediate files?
Returns:

results as output file or as gene family dictionary
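
As a rough illustration of the two result forms mentioned above, the sketch below builds an mcl command line for an abc graph and parses mcl's label output (one cluster per line, members separated by tabs) into a dictionary. Both function names and the GF_<n> family keys are hypothetical; the actual dictionary layout returned by run_mcl_ava is not documented here.

```python
def build_mcl_command(graph, output_path, inflation=2):
    """Hypothetical helper: mcl command line for a graph in abc format."""
    return ["mcl", graph, "--abc", "-I", str(inflation), "-o", output_path]


def parse_mcl_clusters(lines):
    """Hypothetical helper: map family IDs to lists of gene IDs.

    Assumes one cluster per line with whitespace-separated gene IDs;
    the "GF_<n>" keys are made up for this example.
    """
    families = {}
    for line in lines:
        genes = line.split()
        if genes:
            families["GF_{}".format(len(families) + 1)] = genes
    return families


print(build_mcl_command("graph.abc", "out.mcl"))
print(parse_mcl_clusters(["g1\tg2\tg3\n", "g4\n"]))
```

The command list could be handed to subprocess.run(cmd, check=True) where mcl is installed, after which the output file can be read line by line and passed to the parser.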

© Copyright 2018, Arthur Zwaenepoel. Created using Sphinx 1.8.6.