Co-linearity analyses module
Copyright (C) 2018 Arthur Zwaenepoel
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Contact: arzwa@psb.vib-ugent.be
The co-linearity analysis functions currently support only intragenomic analyses. They can be a bit finicky, but that’s mainly due to I-ADHoRe and people messing up the GFF format, not me! (definitely also me)
-
wgd.colinearity._write_gene_lists(genome, output_dir='gene_lists')
Write out the gene lists.
Parameters: - genome – Genome object
- output_dir – output directory
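As an illustration of what "gene lists" means here: I-ADHoRe reads one list file per chromosome, with each line holding a gene ID immediately followed by its orientation (`+` or `-`), in chromosomal order. The sketch below is not the module's implementation. The input tuples, the `.lst` extension and the function name `write_gene_lists_sketch` are assumptions made for the example, since the actual `Genome` object is not described on this page.

```python
import os
import tempfile
from collections import defaultdict


def write_gene_lists_sketch(genes, output_dir):
    """Write one gene list per chromosome (illustrative only).

    genes: iterable of (chromosome, gene_id, strand, start) tuples.
    This input shape is hypothetical, not the wgd Genome object.
    """
    per_chrom = defaultdict(list)
    for chrom, gene_id, strand, start in genes:
        per_chrom[chrom].append((start, gene_id, strand))

    os.makedirs(output_dir, exist_ok=True)
    paths = {}
    for chrom, entries in per_chrom.items():
        path = os.path.join(output_dir, f"{chrom}.lst")
        with open(path, "w") as handle:
            # Genes are written in order of start coordinate,
            # as gene ID directly followed by the strand symbol.
            for _, gene_id, strand in sorted(entries):
                handle.write(f"{gene_id}{strand}\n")
        paths[chrom] = path
    return paths


demo_genes = [
    ("chr1", "g2", "-", 500),
    ("chr1", "g1", "+", 100),
    ("chr2", "g3", "+", 50),
]
demo_dir = tempfile.mkdtemp()
demo_paths = write_gene_lists_sketch(demo_genes, demo_dir)
```

The returned mapping of chromosome name to file path is a convenience for the example; it is the kind of information a config writer would need afterwards.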
-
wgd.colinearity.get_anchor_pairs(anchors, ks_distribution=None, out_file='anchors_ks.csv')
Get anchor pairs and their corresponding Ks values (if provided).
Parameters: - anchors – anchorpoints.txt output from I-ADHoRe 3.0
- ks_distribution – Ks distribution data frame
- out_file – output file name
Returns: pandas data frame(s): the anchor pairs and the data frame with their Ks values
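The idea of pairing anchors with Ks values can be sketched as a left join in pandas. This is a minimal sketch, not the function's actual code. It assumes the anchor table has `gene_x` and `gene_y` columns and that the Ks data frame is indexed by a pair ID made of the two sorted gene IDs joined with `__`; both the indexing convention and the helper name `anchors_with_ks` are assumptions for illustration.

```python
import pandas as pd


def anchors_with_ks(anchors: pd.DataFrame, ks: pd.DataFrame) -> pd.DataFrame:
    """Attach Ks values to anchor pairs (illustrative only)."""
    anchors = anchors.copy()
    # Assumption: Ks rows are keyed by "<geneA>__<geneB>" with the two
    # gene IDs sorted, so the key does not depend on pair orientation.
    anchors["pair_id"] = [
        "__".join(sorted(pair))
        for pair in zip(anchors["gene_x"], anchors["gene_y"])
    ]
    # Left join keeps every anchor pair, with NaN where no Ks is known.
    return anchors.join(ks, on="pair_id", how="left")


anchors = pd.DataFrame({"gene_x": ["g2", "g5"], "gene_y": ["g1", "g9"]})
ks = pd.DataFrame({"Ks": [0.8]}, index=["g1__g2"])
result = anchors_with_ks(anchors, ks)
```

A left join is used so that anchor pairs without a Ks estimate stay visible in the result rather than being dropped silently.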
-
wgd.colinearity.run_adhore(config_file)
Run I-ADHoRe for a given config file.
Parameters: config_file – path to I-ADHoRe configuration file
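Running I-ADHoRe amounts to calling its executable with the configuration file as the only argument. The sketch below shows one way to do that with `subprocess`. It assumes the executable is on the `PATH` under the name `i-adhore`; the helper names `build_adhore_command` and `run_adhore_sketch` are hypothetical and are not part of wgd.

```python
import subprocess


def build_adhore_command(config_file, executable="i-adhore"):
    """Return the argument list for an I-ADHoRe run (illustrative only)."""
    # Assumption: the executable is called "i-adhore" and is on the PATH.
    return [executable, config_file]


def run_adhore_sketch(config_file, executable="i-adhore"):
    """Run I-ADHoRe and return its exit code."""
    completed = subprocess.run(
        build_adhore_command(config_file, executable),
        capture_output=True,
        text=True,
    )
    # A non-zero exit code usually points at a problem in the config
    # file or in the gene lists it refers to.
    return completed.returncode
```

Passing the command as a list avoids shell quoting problems when the config file path contains spaces.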
-
wgd.colinearity.segments_to_chords_table(segments_file, genome, output_file='chords.tsv')
Create a chords table for visualization in a chord diagram. Uses the segments.txt output of I-ADHoRe. Each chord is defined by a source chromosome and a target chromosome, with begin and end coordinates on each.
TODO: the length of each syntenic block should be included in the table as well with length defined as number of genes, not physical length.
Parameters: - segments_file – path to the I-ADHoRe segments.txt output file
- genome – a gff_parser.Genome object
- output_file – output file name
-
wgd.colinearity.write_config_adhore(gene_lists, families, config_file_name='i-adhore.conf', genome='genome', output_path='i-adhore_out', gap_size=30, cluster_gap=35, q_value=0.75, prob_cutoff=0.01, anchor_points=3, alignment_method='gg2', level_2_only='false', table_type='family', multiple_hypothesis_correction='FDR', visualize_ghm='false', visualize_alignment='true')
Write out the config file for I-ADHoRe. See the I-ADHoRe manual for information on parameter settings.
Parameters: - gene_lists – directory with gene lists per chromosome
- families – file with gene to family mapping
- config_file_name – name for the config file
- genome – genome name
- output_path – output path name
- gap_size – see I-ADHoRe 3.0 documentation
- cluster_gap – see I-ADHoRe 3.0 documentation
- q_value – see I-ADHoRe 3.0 documentation
- prob_cutoff – see I-ADHoRe 3.0 documentation
- anchor_points – see I-ADHoRe 3.0 documentation
- alignment_method – see I-ADHoRe 3.0 documentation
- level_2_only – see I-ADHoRe 3.0 documentation
- table_type – see I-ADHoRe 3.0 documentation
- multiple_hypothesis_correction – see I-ADHoRe 3.0 documentation
- visualize_ghm – see I-ADHoRe 3.0 documentation
- visualize_alignment – see I-ADHoRe 3.0 documentation
Returns: the configuration file (see the I-ADHoRe 3.0 documentation)
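To give a feel for the resulting file, here is a rough sketch of how such a configuration could be rendered as `key=value` lines, with the genome name followed by one line per chromosome gene list. It is not the module's code. The key names simply mirror the parameter names above and should be checked against the I-ADHoRe 3.0 manual; the visualization options are left out because their exact key names are not given on this page. Passing `gene_lists` as a mapping from chromosome to file path (rather than a directory) and the name `render_adhore_config` are assumptions made to keep the example self-contained.

```python
def render_adhore_config(gene_lists, families, genome="genome",
                         output_path="i-adhore_out", **settings):
    """Render an I-ADHoRe style configuration as text (illustrative only).

    gene_lists: mapping of chromosome name to gene list path
                (hypothetical input shape for this sketch).
    settings:   extra key=value pairs, written in the order given.
    """
    lines = [f"genome={genome}"]
    for chrom, path in sorted(gene_lists.items()):
        lines.append(f"{chrom} {path}")
    lines.append("")  # blank line closes the genome block
    # Assumption: the gene-to-family table is passed under "blast_table".
    lines.append(f"blast_table={families}")
    lines.append(f"output_path={output_path}")
    for key, value in settings.items():
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


config_text = render_adhore_config(
    {"chr1": "gene_lists/chr1.lst", "chr2": "gene_lists/chr2.lst"},
    "families.tsv",
    gap_size=30, cluster_gap=35, q_value=0.75, prob_cutoff=0.01,
    anchor_points=3, alignment_method="gg2", level_2_only="false",
    table_type="family", multiple_hypothesis_correction="FDR",
)
```

Returning the text instead of writing it directly makes the output easy to inspect before it is saved under a name such as `i-adhore.conf`.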
-
wgd.colinearity.write_families_file(families, all_genes, output_file='families.tsv')
Write out the families file.
Parameters: - families – gene families
- all_genes – set object with all genes for the species of interest
- output_file – output file name
Returns: None
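With `table_type='family'`, I-ADHoRe expects a table that maps each gene to a family identifier. A minimal sketch of writing such a table, restricted to the genes of the species of interest, could look as follows. The input shape (a dict from family ID to gene list) and the name `write_families_sketch` are assumptions; the format in which wgd actually receives `families` is not specified on this page.

```python
import io


def write_families_sketch(families, all_genes, handle):
    """Write 'gene<TAB>family' lines for genes in all_genes (illustrative only).

    families:  mapping of family ID to a list of gene IDs
               (hypothetical input shape for this sketch).
    all_genes: set of gene IDs for the species of interest.
    """
    written = 0
    for family_id, genes in families.items():
        for gene in genes:
            # Skip genes from other species or otherwise unknown genes.
            if gene in all_genes:
                handle.write(f"{gene}\t{family_id}\n")
                written += 1
    return written


buffer = io.StringIO()
n_written = write_families_sketch(
    {"fam1": ["g1", "g2"], "fam2": ["g3", "x9"]},
    {"g1", "g2", "g3"},
    buffer,
)
```

Writing to a file-like handle keeps the sketch testable in memory; in practice the handle would be an open file such as `families.tsv`.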