Co-linearity analyses module


Copyright (C) 2018 Arthur Zwaenepoel

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Contact: arzwa@psb.vib-ugent.be


The co-linearity analysis functions currently only support intragenomic analyses. They can be a bit finicky, but that’s mainly due to I-ADHoRe and people messing up the GFF format, not me! (definitely also me)

wgd.colinearity._write_gene_lists(genome, output_dir='gene_lists')

Write out the gene lists

Parameters:
  • genome – Genome object
  • output_dir – output directory
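
As a rough illustration of what writing gene lists involves, the sketch below writes one file per chromosome, with one gene per line followed directly by its strand. That layout is an assumption about the list format I-ADHoRe reads and should be checked against the I-ADHoRe manual. The function name `write_gene_lists_sketch`, the `.lst` extension and the dictionary input are hypothetical stand-ins; the documented function takes a Genome object instead.

```python
import os


def write_gene_lists_sketch(genes_by_chromosome, output_dir="gene_lists"):
    """Write one gene list file per chromosome (hypothetical helper).

    genes_by_chromosome maps a chromosome name to a list of
    (gene_id, strand) tuples, already sorted by genomic position.
    """
    os.makedirs(output_dir, exist_ok=True)
    paths = {}
    for chromosome, genes in genes_by_chromosome.items():
        path = os.path.join(output_dir, chromosome + ".lst")
        with open(path, "w") as handle:
            for gene_id, strand in genes:
                # Assumed format: gene ID immediately followed by + or -
                handle.write(gene_id + strand + "\n")
        paths[chromosome] = path
    # Return the file paths so they can be referenced in a config file
    return paths
```
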
wgd.colinearity.get_anchor_pairs(anchors, ks_distribution=None, out_file='anchors_ks.csv')

Get anchor pairs and their corresponding Ks values (if provided)

Parameters:
  • anchors – anchorpoints.txt output from I-ADHoRe 3.0
  • ks_distribution – Ks distribution data frame
  • out_file – output file name
Returns:

pandas data frame(s): the anchor pairs and, if a Ks distribution was provided, the data frame with their Ks values
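
The idea of pairing anchors with their Ks values can be sketched with pandas as follows. This is not the module's implementation: the function name, the `gene_x` / `gene_y` column names, and the convention that the Ks data frame is indexed by a sorted `geneA__geneB` pair ID are all assumptions made for the example.

```python
import pandas as pd


def anchor_pairs_with_ks(anchors, ks_distribution=None):
    """Return the anchors and the Ks rows that belong to anchor pairs."""
    # Build an order-independent pair ID so (a, b) and (b, a) match
    pair_ids = [
        "__".join(sorted(pair))
        for pair in zip(anchors["gene_x"], anchors["gene_y"])
    ]
    anchors = anchors.assign(pair_id=pair_ids).set_index("pair_id")
    if ks_distribution is None:
        return anchors, None
    # Keep only the Ks rows whose pair ID is also an anchor pair
    shared = ks_distribution.index.intersection(anchors.index)
    return anchors, ks_distribution.loc[shared]


# Toy data: two anchor pairs, three pairs in the Ks distribution
anchors = pd.DataFrame({"gene_x": ["g2", "g3"], "gene_y": ["g1", "g4"]})
ks = pd.DataFrame(
    {"Ks": [0.8, 1.5, 2.1]}, index=["g1__g2", "g3__g4", "g5__g6"]
)
anchor_df, ks_anchor_df = anchor_pairs_with_ks(anchors, ks)
```

In this toy case `ks_anchor_df` keeps the rows for `g1__g2` and `g3__g4` and drops `g5__g6`, which is not an anchor pair.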

wgd.colinearity.run_adhore(config_file)

Run I-ADHoRe for a given config file

Parameters:
  • config_file – path to I-ADHoRe configuration file
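
Running an external program on a config file typically comes down to a subprocess call. The sketch below assumes the I-ADHoRe executable is on the PATH under the name `i-adhore` and takes the config file as its only argument; both the helper names and that invocation are assumptions, not a description of how the module does it.

```python
import shutil
import subprocess


def build_adhore_command(config_file, executable="i-adhore"):
    """Assemble the command line (assumed form: i-adhore <config>)."""
    return [executable, config_file]


def run_adhore_sketch(config_file, executable="i-adhore"):
    """Run I-ADHoRe and return the completed process (hypothetical helper)."""
    # Fail early with a clear message if the program is not installed
    if shutil.which(executable) is None:
        raise FileNotFoundError(
            "{} not found on PATH".format(executable)
        )
    # check=True raises CalledProcessError on a non-zero exit status
    return subprocess.run(
        build_adhore_command(config_file, executable),
        capture_output=True,
        text=True,
        check=True,
    )
```

Checking for the executable first gives a clearer error than the one raised by the subprocess call itself.
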
wgd.colinearity.segments_to_chords_table(segments_file, genome, output_file='chords.tsv')

Create a chords table for visualization in a chord diagram, using the segments.txt output of I-ADHoRe. Each chord is defined by a source chromosome and a target chromosome, with begin and end coordinates on each.

TODO: the length of each syntenic block should be included in the table as well, with length defined as the number of genes, not physical length.

Parameters:
  • segments_file – path to the I-ADHoRe segments.txt output file
  • genome – a gff_parser.Genome object
  • output_file – output file name
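
To make the notion of a chord concrete, the sketch below pairs up the segments that share a multiplicon and looks up begin and end coordinates for each. The input layout (dictionaries with `multiplicon`, `list`, `first` and `last` keys), the `gene_positions` lookup and the output field names are all assumptions for illustration; the documented function reads segments.txt and a Genome object.

```python
from collections import defaultdict
from itertools import combinations


def segments_to_chords_sketch(segments, gene_positions):
    """Turn segments into chords (hypothetical helper).

    segments: dicts with 'multiplicon', 'list' (chromosome),
              'first' and 'last' (bounding gene IDs).
    gene_positions: {gene_id: (start, end)} coordinates.
    """
    # Group segments by the multiplicon they belong to
    by_multiplicon = defaultdict(list)
    for segment in segments:
        by_multiplicon[segment["multiplicon"]].append(segment)

    chords = []
    for members in by_multiplicon.values():
        # Every pair of segments in a multiplicon yields one chord
        for source, target in combinations(members, 2):
            chords.append({
                "source_id": source["list"],
                "source_start": gene_positions[source["first"]][0],
                "source_end": gene_positions[source["last"]][1],
                "target_id": target["list"],
                "target_start": gene_positions[target["first"]][0],
                "target_end": gene_positions[target["last"]][1],
            })
    return chords


# Toy data: one multiplicon linking a block on Chr1 to a block on Chr2
segments = [
    {"multiplicon": 1, "list": "Chr1", "first": "g1", "last": "g3"},
    {"multiplicon": 1, "list": "Chr2", "first": "g7", "last": "g9"},
]
gene_positions = {
    "g1": (100, 500), "g3": (2000, 2600),
    "g7": (300, 900), "g9": (4000, 4800),
}
chords = segments_to_chords_sketch(segments, gene_positions)
```
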
wgd.colinearity.write_config_adhore(gene_lists, families, config_file_name='i-adhore.conf', genome='genome', output_path='i-adhore_out', gap_size=30, cluster_gap=35, q_value=0.75, prob_cutoff=0.01, anchor_points=3, alignment_method='gg2', level_2_only='false', table_type='family', multiple_hypothesis_correction='FDR', visualize_ghm='false', visualize_alignment='true')

Write out the config file for I-ADHoRe. See I-ADHoRe manual for information on parameter settings.

Parameters:
  • gene_lists – directory with gene lists per chromosome
  • families – file with gene to family mapping
  • config_file_name – name for the config file
  • genome – genome name
  • output_path – output path name
  • gap_size – see I-ADHoRe 3.0 documentation
  • cluster_gap – see I-ADHoRe 3.0 documentation
  • q_value – see I-ADHoRe 3.0 documentation
  • prob_cutoff – see I-ADHoRe 3.0 documentation
  • anchor_points – see I-ADHoRe 3.0 documentation
  • alignment_method – see I-ADHoRe 3.0 documentation
  • level_2_only – see I-ADHoRe 3.0 documentation
  • table_type – see I-ADHoRe 3.0 documentation
  • multiple_hypothesis_correction – see I-ADHoRe 3.0 documentation
  • visualize_ghm – see I-ADHoRe 3.0 documentation
  • visualize_alignment – see I-ADHoRe 3.0 documentation
Returns:

configuration file (see I-ADHoRe 3.0 documentation)
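
The sketch below shows roughly what assembling such a config file could look like. It assumes a simple `key= value` layout with one line per gene list under the genome name; the key names mirror the function's parameter names and are not verified against the I-ADHoRe manual, which remains the reference. The helper name and the dictionary form of `gene_lists` (chromosome name to file path) are hypothetical; the documented function takes a directory.

```python
def adhore_config_text(gene_lists, families, genome="genome",
                       output_path="i-adhore_out", **settings):
    """Build the text of a config file (hypothetical helper).

    gene_lists: {chromosome: path to its gene list file}
    settings: further key/value pairs written as 'key= value'
    """
    lines = ["genome= " + genome]
    # One line per chromosome: name followed by the gene list path
    for chromosome, path in sorted(gene_lists.items()):
        lines.append(chromosome + " " + path)
    lines.append("")
    lines.append("blast_table= " + families)
    lines.append("output_path= " + output_path)
    for key, value in settings.items():
        lines.append("{}= {}".format(key, value))
    return "\n".join(lines) + "\n"


# Values taken from the defaults in the signature above
config_text = adhore_config_text(
    {"Chr1": "gene_lists/Chr1.lst", "Chr2": "gene_lists/Chr2.lst"},
    "families.tsv",
    genome="ath",
    gap_size=30,
    cluster_gap=35,
    q_value=0.75,
    prob_cutoff=0.01,
    anchor_points=3,
    table_type="family",
)
```
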

wgd.colinearity.write_families_file(families, all_genes, output_file='families.tsv')

Write out families file

Parameters:
  • families – gene families
  • all_genes – set object with all genes for the species of interest
  • output_file – output file name
Returns:

nothing (None)
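
A minimal sketch of writing a gene-to-family mapping follows. It assumes the families come as a dictionary of family ID to gene IDs and that the file holds one tab-separated `gene, family` pair per line; both are assumptions for the example, as is the helper name. Unlike the documented function, the sketch returns the number of lines written so the result is easy to check.

```python
def write_families_sketch(families, all_genes, output_file="families.tsv"):
    """Write gene-to-family lines for genes of interest (hypothetical helper).

    families: {family_id: iterable of gene IDs}
    all_genes: set of gene IDs for the species of interest
    """
    written = 0
    with open(output_file, "w") as handle:
        for family_id, genes in families.items():
            for gene in genes:
                # Skip genes that do not belong to the species of interest
                if gene in all_genes:
                    handle.write(gene + "\t" + family_id + "\n")
                    written += 1
    return written
```
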