.. _viz_info:

Visualization module
********************

The visualization module supports interactive visualization using bokeh as
well as the generation of static image files. A screenshot of the interactive
interface is shown below:

.. image:: ath_cpa_viz.png

The interactive interface allows you to modify key parameters, such as the
histogram bin width and the KDE bandwidth. You are strongly encouraged to observe
the effects of changing these parameters, as they may reveal visualization
artifacts. As the screenshot shows, the interface supports overlaying multiple
distributions, overlaying histograms and KDEs, and dynamically hiding and
showing distributions (by clicking the entries in the legend). Note that the
interactive visualization requires a running bokeh server, which you can start
with the following command::

    bokeh serve &

Note that ``bokeh`` should be installed automatically when installing ``wgd``.

Alternatively, the ``viz`` module generates static images when the
``--interactive`` flag is not set.

A note on histogram visualization
=================================

|Ks| distributions can be visualized in three main ways: (1) a histogram of
pairwise |Ks| values, (2) a node-averaged histogram and (3) a node-weighted
histogram.

In the first case, all pairwise estimates are added to the distribution with
equal weight. However, an ancient duplication that was followed by further
duplications in the same gene family is then represented by multiple pairwise
estimates. Such a representation is thus rather flawed, as it will artificially
amplify peaks in high |Ks| regions simply because there are more estimates for
older duplication events. This representation is not used in ``wgd``, but it can
be generated by plotting the |Ks| column of the tsv output from ``wgd ksd`` in R
or Python.
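
The over-representation of old duplications can be illustrated with a small
sketch. It assumes a single hypothetical gene family with four paralogs and the
tree ``((a, b), (c, d))``; the node labels and |Ks| values are made up for
illustration and do not come from ``wgd`` output.

.. code-block:: python

    from collections import Counter

    # Hypothetical pairwise estimates for one gene family, given as
    # (duplication node, Ks). Labels and values are illustrative only.
    pairwise = [
        ("n1", 0.20),    # a-b, young duplication
        ("n2", 0.30),    # c-d, young duplication
        ("root", 1.90),  # a-c, oldest duplication
        ("root", 2.10),  # a-d
        ("root", 2.00),  # b-c
        ("root", 2.20),  # b-d
    ]

    # Count how many pairwise estimates each duplication node contributes.
    estimates_per_node = Counter(node for node, _ in pairwise)

    for node, count in estimates_per_node.items():
        print(node, count)

In this toy family there are three duplication events, but the oldest one
supplies four of the six values that would enter a pairwise histogram.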

Node-averaging addresses this problem by averaging |Ks| estimates for a particular
duplication node in a gene family tree. This is the default distribution used for
modeling purposes such as **mixture modeling and KDEs**.
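
A minimal sketch of node averaging is given below. It is not the ``wgd``
implementation; the node labels and |Ks| values are hypothetical, and with real
data the grouping key would presumably need to combine the gene family and the
node.

.. code-block:: python

    from collections import defaultdict
    from statistics import mean

    # Hypothetical (duplication node, Ks) records for one gene family.
    pairwise = [
        ("n1", 0.20),
        ("n2", 0.30),
        ("root", 1.90),
        ("root", 2.10),
        ("root", 2.00),
        ("root", 2.20),
    ]

    # Collect all estimates that belong to the same duplication node.
    by_node = defaultdict(list)
    for node, ks in pairwise:
        by_node[node].append(ks)

    # Each node contributes exactly one value: the mean of its estimates.
    node_averaged = {node: mean(values) for node, values in by_node.items()}

Every duplication node now enters the distribution once, regardless of how
many pairwise estimates it has.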

Node-weighted |Ks| values use the same principle as node averaging, but keep the
original values. Instead of plotting a histogram of averages for all nodes, a
histogram is plotted where every |Ks| estimate for a particular duplication node
is added with equal weight, such that the weights of all estimates for that node
sum to one. Since this is arguably the representation closest to the actual
data, it is the default output when running ``wgd ksd``. Node-weighted
histograms can also be plotted using the ``--weighted`` flag in ``wgd viz``.
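
The weighting scheme can be sketched as follows. This is an illustration under
stated assumptions rather than the code used by ``wgd``: the records and the
bin width of 0.5 are hypothetical.

.. code-block:: python

    from collections import Counter

    # Hypothetical (duplication node, Ks) records for one gene family.
    pairwise = [
        ("n1", 0.20),
        ("n2", 0.30),
        ("root", 1.90),
        ("root", 2.10),
        ("root", 2.00),
        ("root", 2.20),
    ]

    # Each estimate gets weight 1 / (number of estimates for its node),
    # so the weights belonging to one node sum to one.
    n_per_node = Counter(node for node, _ in pairwise)
    weighted = [(ks, 1.0 / n_per_node[node]) for node, ks in pairwise]

    # Weighted histogram: add the weight, not a count of one, to each bin.
    bin_width = 0.5  # illustrative value
    histogram = Counter()
    for ks, weight in weighted:
        histogram[int(ks // bin_width)] += weight

The total mass of the histogram equals the number of duplication nodes, while
the original spread of the estimates is preserved. With NumPy, a comparable
histogram could be computed by passing the weights to ``numpy.histogram``
through its ``weights`` argument.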

Another subtle point is whether the weights or averages are computed before or
after the filtering steps are applied. By default, ``wgd`` computes weights or
averages *after* filtering, effectively treating the filtered-out values as
outliers. The ``wgd viz`` tool provides an option to inspect the effect of
calculating averages before filtering.
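
The difference between the two orders can be made concrete with a small sketch
for node weights. The filter used here (a maximum |Ks| of 5.0) and the helper
``node_weights`` are hypothetical and only serve to illustrate the idea; they do
not reflect the actual filtering criteria or functions in ``wgd``.

.. code-block:: python

    from collections import Counter

    KS_MAX = 5.0  # hypothetical filter threshold


    def node_weights(records):
        """Return (node, Ks, weight) with weight = 1 / estimates per node."""
        counts = Counter(node for node, _ in records)
        return [(node, ks, 1.0 / counts[node]) for node, ks in records]


    # Hypothetical estimates for one duplication node; the last one is
    # removed by the filter.
    records = [("root", 1.9), ("root", 2.1), ("root", 2.0), ("root", 7.5)]

    # Weights computed after filtering: the three remaining estimates
    # share the full weight of the node.
    after = node_weights([r for r in records if r[1] <= KS_MAX])

    # Weights computed before filtering: the removed estimate takes its
    # share of the weight with it.
    before = [w for w in node_weights(records) if w[1] <= KS_MAX]

    total_after = sum(weight for _, _, weight in after)
    total_before = sum(weight for _, _, weight in before)

When weights are computed after filtering, the node keeps a total weight of
about one. When they are computed before filtering, the node in this example
retains only about three quarters of its weight.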

Reference
=========

.. automodule:: wgd.viz
    :members:
    :private-members:
    :special-members: __init__

.. |Ks| replace:: K\ :sub:`S`
.. |Ka| replace:: K\ :sub:`A`