Sleipnir: Funcographer

Funcographer mines heavy subgraphs (clusters) from three inputs: functional associations between gene sets (e.g. from Funcifier), functional activities linking datasets to functions (e.g. from BNTruster), and correlations between datasets' functional activities. This makes it possible to mine large data collections for interacting pathways and for the datasets in which those pathways are most active.

Usage

Basic Usage

 Funcographer -f <functions.dab> -t <trusts.pcl> -d <datasets.dab> -o <cograph.dab>
        -r <initial_specificity> -w <final_specificity_ratio> -Z <datasets>

Using the process/process associations in functions.dab, the dataset/process associations in trusts.pcl (from BNTruster), and the dataset/dataset similarities in datasets.dab, Funcographer constructs a single graph, cograph.dab, containing both node types and all edges. It then mines the functions.dab portion of this graph and writes to standard output every heavy subgraph whose initial specificity ratio is at least initial_specificity and whose final specificity ratio is at least final_specificity_ratio times that initial value, together with the N datasets most strongly associated with each subgraph's functions, where N is given by -Z <datasets>.
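To make the merged graph concrete, the following Python sketch combines the three inputs into one weighted, undirected graph holding both node types. It is a hypothetical illustration, not the tool's own code: the plain dictionaries stand in for the parsed DAT/DAB and PCL files, the function name merge_graph and the toy gene set and dataset names are made up, node names are tagged with their type so that a gene set and a dataset sharing a name stay distinct, and adjust_data mirrors the -a flag by shifting only dataset/dataset scores.

```python
from collections import defaultdict

# Hypothetical stand-ins for the parsed inputs (not Funcographer's API):
#   functions: {(gene_set_a, gene_set_b): score}   from functions.dab (-f)
#   trusts:    {(dataset, gene_set): activity}     from trusts.pcl (-t)
#   datasets:  {(dataset_a, dataset_b): score}     from datasets.dab (-d)


def merge_graph(functions, trusts, datasets, adjust_data=0.0):
    """Return {node: {neighbour: weight}} with ("function", name) and ("dataset", name) nodes."""
    graph = defaultdict(dict)

    def add_edge(u, v, weight):
        # Store both directions so the graph is undirected.
        graph[u][v] = weight
        graph[v][u] = weight

    for (a, b), score in functions.items():        # gene set / gene set edges
        add_edge(("function", a), ("function", b), score)
    for (ds, gs), activity in trusts.items():      # gene set / dataset edges
        add_edge(("function", gs), ("dataset", ds), activity)
    for (a, b), score in datasets.items():         # dataset / dataset edges, shifted as with -a
        add_edge(("dataset", a), ("dataset", b), score + adjust_data)
    return dict(graph)


# Toy values, purely illustrative.
g = merge_graph(
    functions={("ribosome", "translation"): 2.0},
    trusts={("GSE1", "ribosome"): 1.5},
    datasets={("GSE1", "GSE2"): -0.5},
    adjust_data=1.0,
)
print(g[("dataset", "GSE1")][("dataset", "GSE2")])  # 0.5
```

Keeping each node's type in its key is one simple way to let the later mining step restrict itself to the function portion of the graph while still reaching dataset neighbours when ranking associated datasets.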

Detailed Usage

package "Funcographer"
version "1.0"
purpose "Function and dataset interaction network builder"

section "Main"
option  "functions"         f   "Function association DAT/DAB"
                                string  typestr="filename"  yes
option  "trusts"            t   "Trusts PCL"
                                string  typestr="filename"  yes
option  "datasets"          d   "Shared dataset activity DAT/DAB"
                                string  typestr="filename"  yes
option  "output"            o   "Merged network DAT/DAB"
                                string  typestr="filename"

section "Miscellaneous"
option  "adjust_data"       a   "Adjustment to dataset z-scores"
                                double  default="0"

section "Subgraphs"
option  "subgraphs"         n   "Number of function subgraphs to explore"
                                int default="-1"
option  "heavy"             w   "Minimum final subgraph specificity fraction"
                                double  default="0.5"
option  "specificity"       r   "Minimum initial subgraph specificity ratio"
                                double  default="25"
option  "size_functions"    z   "Minimum size of subgraphs"
                                int default="0"
option  "size_datasets"     Z   "Number of associated datasets to output"
                                int default="10"

section "Optional"
option  "skip"              s   "Skip columns"
                                int default="0"
option  "memmap"            m   "Memory map input"
                                flag    off
option  "verbosity"         v   "Message verbosity"
                                int default="5"

Flag	Default	Type	Description
-f	None	DAT/DAB file	Input DAT/DAB file containing pairwise functional associations between gene sets (e.g. from Funcifier).
-t	None	PCL text file	Input PCL file containing functional activities for each dataset/gene set pair (e.g. from BNTruster).
-d	None	DAT/DAB file	Input DAT/DAB file containing shared functional activity scores between datasets (e.g. from running Distancer on BNTruster output).
-o	stdout	DAT/DAB file	Output DAT/DAB file containing nodes for both gene sets and datasets; gene set/gene set scores are taken from `-f`, gene set/dataset scores from `-t`, and dataset/dataset scores from `-d`.
-a	0	Double	Adjustment value added to each dataset/dataset score; useful for ensuring that all three score types fall within more or less the same range.
-n	-1	Integer	Number of dense subgraphs to output; -1 will run until no appropriate subgraphs are available.
-w	0.5	Double	Ratio of final to initial specificity score for heavy subgraphs. For example, when searching for dense subgraphs with a seed specificity of 25, a ratio of 0.5 stops building the subgraph once a specificity of 12.5 is reached. Value should not exceed `-r` or fall below 1 / `-r`.
-r	25	Double	Initial specificity score for heavy subgraphs. Guarantees that the ratio of in- to out-connectivity for a cluster seed is at least the given value. A value of 0 will find full cliques of edges above `-c` instead of dense subgraphs.
-z	0	Integer	Minimum number of gene sets a heavy subgraph must contain to be output.
-Z	10	Integer	Number of datasets to be output in association with each heavy subgraph.
-s	0	Integer	Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL.
-m	off	Flag	If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped.
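The table describes -r, -w, -z and -Z only through their thresholds. The Python sketch below shows one way such thresholds can drive a greedy heavy-subgraph search. It is not Funcographer's implementation: the specificity score (internal edge density divided by the density of edges leaving the cluster), the pair-seeded greedy growth, and the ranking of datasets by summed activity are all assumptions, and the functions specificity, heavy_subgraph and top_datasets, along with their dictionary inputs, are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory inputs (not Funcographer's API):
#   graph  = {gene_set: {gene_set: weight}}, symmetric, weights assumed non-negative
#   trusts = {(dataset, gene_set): activity}


def specificity(graph, cluster, n_nodes):
    """Assumed score: internal edge density / cut edge density (needs len(cluster) >= 2)."""
    internal = sum(w for u in cluster for v, w in graph[u].items() if v in cluster) / 2
    cut = sum(w for u in cluster for v, w in graph[u].items() if v not in cluster)
    k, outside = len(cluster), n_nodes - len(cluster)
    if outside == 0 or cut == 0:
        return float("inf")  # nothing leaves the cluster
    in_density = internal / (k * (k - 1) / 2)
    out_density = cut / (k * outside)
    return in_density / out_density


def heavy_subgraph(graph, min_specificity=25.0, heavy_fraction=0.5, min_size=0):
    """Greedy sketch of one heavy-subgraph search using -r, -w and -z analogues."""
    n = len(graph)
    seed, seed_score = None, 0.0
    # Seed with the most specific connected pair; ties go to the first pair in sorted order.
    for u in sorted(graph):
        for v in sorted(graph[u]):
            if u < v:
                score = specificity(graph, {u, v}, n)
                if score > seed_score:
                    seed, seed_score = {u, v}, score
    if seed is None or seed_score < min_specificity:
        return None
    cluster, floor = set(seed), heavy_fraction * seed_score
    while True:
        frontier = {v for u in cluster for v in graph[u]} - cluster
        if not frontier:
            break
        score, best = max((specificity(graph, cluster | {v}, n), v) for v in frontier)
        if score < floor:
            break  # the best extension would fall below the final specificity threshold
        cluster.add(best)
    return cluster if len(cluster) >= min_size else None


def top_datasets(trusts, cluster, count=10):
    """Rank datasets by summed activity over the cluster's gene sets (-Z analogue)."""
    totals = defaultdict(float)
    for (dataset, gene_set), activity in trusts.items():
        if gene_set in cluster:
            totals[dataset] += activity
    return sorted(totals, key=lambda d: (-totals[d], d))[:count]


# Toy example: two triangles (A-B-C and X-Y-Z) joined by one weak C-X edge.
edges = {("A", "B"): 10, ("A", "C"): 10, ("B", "C"): 10, ("C", "X"): 1,
         ("X", "Y"): 12, ("X", "Z"): 12, ("Y", "Z"): 12}
graph = defaultdict(dict)
for (u, v), w in edges.items():
    graph[u][v] = graph[v][u] = w
graph = dict(graph)
trusts = {("GSE1", "A"): 2.0, ("GSE1", "B"): 1.0, ("GSE2", "C"): 5.0, ("GSE3", "X"): 9.0}

cluster = heavy_subgraph(graph, min_specificity=2.0)
print(sorted(cluster), top_datasets(trusts, cluster, count=2))
```

With these toy weights the seed pair A/B scores 4, so the default `-r` of 25 would reject it. With min_specificity=2.0 the search adds C and then stops, because adding X would drop the score to about 1.7, below the 0.5 × 4 = 2 floor; the script prints `['A', 'B', 'C'] ['GSE2', 'GSE1']`.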