Markov clustering related functions
Python functions that wrap blast and mcl, the Markov clustering algorithm invented and developed by Stijn van Dongen. (Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000.)
Copyright (C) 2018 Arthur Zwaenepoel
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact: arzwa@psb.vib-ugent.be
wgd.blast_mcl.all_v_all_blast(query, db, output_directory='./', output_file='blast.tsv', eval_cutoff=1e-10, n_threads=4)
Perform an all-versus-all blastp search: runs a blast of query vs. db.
Parameters:
- query – query sequences fasta file
- db – database sequences fasta file
- output_directory – output directory
- output_file – output file name
- eval_cutoff – e-value cut-off
- n_threads – number of threads to use for blastp
Returns: blast file path
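For orientation, the following is a minimal sketch of how the parameters above could map onto a BLAST+ command line. It only builds the argument lists and runs nothing. The helper name build_blast_commands is hypothetical, and whether wgd calls makeblastdb and blastp with exactly these options is an assumption; the flags themselves are standard BLAST+ options.

```python
import os


def build_blast_commands(query, db, output_directory="./",
                         output_file="blast.tsv", eval_cutoff=1e-10,
                         n_threads=4):
    """Hypothetical helper: build BLAST+ argument lists (not the wgd code)."""
    out_path = os.path.join(output_directory, output_file)
    # Build a protein BLAST database from the database fasta file.
    make_db = ["makeblastdb", "-in", db, "-dbtype", "prot"]
    # Tabular output (-outfmt 6) puts the e-value in column 10 (0-based),
    # which matches the default col_3=10 of ava_blast_to_abc.
    blastp = [
        "blastp", "-query", query, "-db", db,
        "-evalue", str(eval_cutoff), "-outfmt", "6",
        "-num_threads", str(n_threads), "-out", out_path,
    ]
    return make_db, blastp, out_path


make_db, blastp, out_path = build_blast_commands("ath.fasta", "ath.fasta")
print(" ".join(blastp))
```

Passing the same fasta file as query and db, as in the usage line, is what makes the search all-versus-all. The argument lists could be handed to subprocess.run once BLAST+ is installed.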
wgd.blast_mcl.ava_blast_to_abc(ava_file, col_1=0, col_2=1, col_3=10)
Convert tab-separated all-vs-all (ava) Blast results to an input graph in abc format for mcl.
Parameters:
- ava_file – all-vs-all Blast results file (tab separated)
- col_1 – column in file for gene 1 (starts from 0)
- col_2 – column in file for gene 2
- col_3 – column in file for e-value (weight in graph)
Returns: graph in abc format for mcl
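The conversion described above amounts to selecting three columns per Blast hit. The sketch below illustrates this on two made-up rows; blast_lines_to_abc is a hypothetical stand-in, not the wgd implementation, and it works on an iterable of lines rather than on a file path.

```python
import io


def blast_lines_to_abc(lines, col_1=0, col_2=1, col_3=10):
    """Hypothetical helper: yield 'gene1<TAB>gene2<TAB>e-value' records."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(col_1, col_2, col_3):
            continue  # skip blank or truncated lines
        yield "\t".join((fields[col_1], fields[col_2], fields[col_3]))


# Two made-up rows in BLAST tabular (-outfmt 6) layout.
example = io.StringIO(
    "g1\tg2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "g2\tg3\t80.0\t100\t20\t0\t1\t100\t1\t100\t2e-30\t150\n"
)
abc = list(blast_lines_to_abc(example))
print("\n".join(abc))
```

Each output record is one weighted edge of the graph: two node labels followed by the edge weight.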
wgd.blast_mcl.get_one_v_one_orthologs_rbh(blast_file, output_dir)
Get one-vs-one orthologs using reciprocal best hits (RBHs). Implemented for an arbitrary number of species. Note that every gene ID in the blast file should be prefixed with a species ID, e.g. ath|AT1G01000.
Parameters:
- blast_file – all vs. all blastp results, gene IDs should be prefixed
- output_dir – output directory
Returns: the last output file that was written
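To make the reciprocal best hit idea concrete, here is a minimal sketch on in-memory hits with species-prefixed gene IDs. The function reciprocal_best_hits and the use of a "higher is better" score (such as the bit score) to rank hits are assumptions for illustration; the actual wgd function reads a blast file, may rank hits differently, and writes its results to output_dir.

```python
from collections import defaultdict


def reciprocal_best_hits(hits):
    """Hypothetical helper: hits is an iterable of (query, subject, score)."""
    best = {}  # (query gene, subject species) -> (score, subject gene)
    for query, subject, score in hits:
        q_sp, s_sp = query.split("|")[0], subject.split("|")[0]
        if q_sp == s_sp:
            continue  # only between-species hits can be orthologs
        key = (query, s_sp)
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)
    pairs = defaultdict(set)  # (species a, species b) -> set of gene pairs
    for (query, s_sp), (_, subject) in best.items():
        q_sp = query.split("|")[0]
        # Keep the pair only if the subject's best hit in the query species
        # points back to the query.
        back = best.get((subject, q_sp))
        if back is not None and back[1] == query:
            a, b = sorted((query, subject))
            pairs[(a.split("|")[0], b.split("|")[0])].add((a, b))
    return dict(pairs)


hits = [
    ("ath|A1", "vvi|V1", 300.0), ("ath|A1", "vvi|V2", 120.0),
    ("vvi|V1", "ath|A1", 295.0), ("vvi|V2", "ath|A1", 110.0),
    ("ath|A1", "ath|A2", 400.0),
]
rbh = reciprocal_best_hits(hits)
print(rbh)
```

In this toy input, vvi|V2 has ath|A1 as its best hit, but ath|A1 prefers vvi|V1, so only the ath|A1 and vvi|V1 pair is reciprocal.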
wgd.blast_mcl.run_mcl_ava(graph, output_dir='./', inflation=2, output_file='out.mcl', preserve=False, return_dict=False)
Run mcl on all-vs-all Blast results for a species of interest. Note: if the parameter output_file is not given and the parameter return_dict is set to True, only a Python dictionary is returned and no file is written.
Parameters:
- graph – input graph (abc format)
- output_dir – output directory
- output_file – output file name (optional)
- inflation – inflation factor for mcl
- return_dict – boolean, return results as dictionary?
- preserve – boolean, preserve tmp/intermediate files?
Returns: results as output file or as gene family dictionary
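As a rough illustration of the two steps involved, the sketch below builds an mcl command line for an abc-format graph and parses mcl's label output (one cluster per line, labels separated by tabs) into a dictionary. Both helper names and the GF_000001-style family IDs are hypothetical; the exact mcl invocation used by wgd and the key format of its gene family dictionary are not given in this documentation.

```python
def build_mcl_command(graph, output_file="out.mcl", inflation=2):
    """Hypothetical helper: argument list for mcl on an abc-format graph."""
    return ["mcl", graph, "--abc", "-I", str(inflation), "-o", output_file]


def parse_mcl_output(lines, prefix="GF"):
    """Hypothetical helper: map a made-up family ID to each cluster's genes."""
    families = {}
    for line in lines:
        genes = line.split()
        if genes:  # ignore empty lines
            families["{}_{:06d}".format(prefix, len(families) + 1)] = genes
    return families


print(" ".join(build_mcl_command("graph.abc")))
families = parse_mcl_output(["g1\tg2\tg3\n", "g4\tg5\n", "\n"])
print(families)
```

Higher inflation values generally give finer-grained clusters, so the inflation parameter is the main knob to adjust when gene families come out too large or too fragmented.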