Sleipnir
Funcographer mines heavy subgraphs (clusters) from three inputs: gene set functional associations (e.g. from Funcifier), functional activities between datasets and functions (e.g. from BNTruster), and correlations between datasets' functional activities. This provides a way of mining large data collections for pathways that are interacting and the datasets in which those pathways are most active.
Funcographer -f <functions.dab> -t <trusts.pcl> -d <datasets.dab> -o <cograph.dab> -r <initial_specificity> -w <final_specificity_ratio> -Z <datasets>
Using the process/process associations in `functions.dab`, the dataset/process associations in `trusts.pcl` (from BNTruster), and the dataset/dataset similarities in `datasets.dab`, construct a single graph `cograph.dab` containing both node types and all edges. Then mine the `functions.dab` portion of this graph, outputting (to standard output) all heavy subgraphs with an initial specificity ratio of at least `initial_specificity`, a final specificity ratio that's at least `final_specificity_ratio` fraction of the initial value, and the `datasets` datasets most strongly associated with those functions.
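
The merge step described above can be pictured with a small sketch. This is not Sleipnir's implementation (Sleipnir is a C++ library, and its internal data structures are not shown here). The function name `merge_graphs`, the dictionary-based edge representation, and the example node names are all hypothetical; the sketch also assumes that function and dataset identifiers never collide. The `adjust` argument mirrors the `-a` option, which is documented as a value added to each dataset/dataset score.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def merge_graphs(
    functions: Dict[Edge, float],
    trusts: Dict[Edge, float],
    datasets: Dict[Edge, float],
    adjust: float = 0.0,
) -> Dict[Edge, float]:
    """Combine three score sources into one undirected edge dictionary.

    Hypothetical illustration only: keys are (node, node) pairs, stored in
    sorted order so that (a, b) and (b, a) refer to the same edge.
    """
    merged: Dict[Edge, float] = {}
    # Function/function edges (the -f input).
    for (a, b), score in functions.items():
        merged[tuple(sorted((a, b)))] = score
    # Dataset/function edges (the -t input).
    for (dataset, function), score in trusts.items():
        merged[tuple(sorted((dataset, function)))] = score
    # Dataset/dataset edges (the -d input), shifted by the -a adjustment.
    for (a, b), score in datasets.items():
        merged[tuple(sorted((a, b)))] = score + adjust
    return merged


# Made-up scores purely for illustration.
example = merge_graphs(
    functions={("f1", "f2"): 0.8},
    trusts={("d1", "f1"): 1.5},
    datasets={("d1", "d2"): 2.0},
    adjust=-1.0,
)
```

In this toy example the merged graph holds three edges, and only the dataset/dataset edge is affected by the adjustment.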
package "Funcographer"
version "1.0"
purpose "Function and dataset interaction network builder"
section "Main"
option "functions" f "Function association DAT/DAB"
string typestr="filename" yes
option "trusts" t "Trusts PCL"
string typestr="filename" yes
option "datasets" d "Shared dataset activity DAT/DAB"
string typestr="filename" yes
option "output" o "Merged network DAT/DAB"
string typestr="filename"
section "Miscellaneous"
option "adjust_data" a "Adjustment to dataset z-scores"
double default="0"
section "Subgraphs"
option "subgraphs" n "Number of function subgraphs to explore"
int default="-1"
option "heavy" w "Minimum final subgraph specificity fraction"
double default="0.5"
option "specificity" r "Minimum initial subgraph specificity ratio"
double default="25"
option "size_functions" z "Minimum size of subgraphs"
int default="0"
option "size_datasets" Z "Number of associated datasets to output"
int default="10"
section "Optional"
option "skip" s "Skip columns"
int default="0"
option "memmap" m "Memory map input"
flag off
option "verbosity" v "Message verbosity"
int default="5"
Flag | Default | Type | Description |
---|---|---|---|
-f | None | DAT/DAB file | Input DAT/DAB file containing pairwise functional associations between gene sets (e.g. from Funcifier). |
-t | None | PCL text file | Input PCL file containing functional activities for each dataset/gene set pair (e.g. from BNTruster). |
-d | None | DAT/DAB file | Input DAT/DAB file containing shared functional activity scores between datasets (e.g. from running Distancer on BNTruster output). |
-o | stdout | DAT/DAB file | Output DAT/DAB file containing nodes for both gene sets and datasets; gene set/gene set scores are taken from -f , gene set/dataset scores from -t , and dataset/dataset scores from -d . |
-a | 0 | Double | Adjustment value added to each dataset/dataset score; useful for ensuring that all three score types fall within more or less the same range. |
-n | -1 | Integer | Number of dense subgraphs to output; -1 will run until no appropriate subgraphs are available. |
-w | 0.5 | Double | Ratio of final to initial specificity scores for heavy subgraphs. For example, if searching for dense subgraphs with a seed specificity of 25, a ratio of 0.5 will stop building the subgraph when a specificity of 12.5 is reached. Value should not exceed -r or fall below 1 / -r . |
-r | 25 | Double | Initial specificity score for heavy subgraphs. Guarantees that the ratio of in- to out-connectivity for a cluster seed is at least the given value. A value of 0 will find full cliques of edges above -c instead of dense subgraphs. |
-z | 0 | Integer | Minimum number of gene sets a heavy subgraph must contain to be output. |
-Z | 10 | Integer | Number of datasets to be output in association with each heavy subgraph. |
-s | 0 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |
-m | off | Flag | If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped. |
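
As a worked illustration of how `-r` and `-w` interact, the arithmetic in the `-w` row can be written out as below. This is only a sketch of the documented example (seed specificity 25, ratio 0.5, stopping point 12.5); the function name is hypothetical and is not part of Sleipnir.

```python
def final_specificity_threshold(initial_specificity: float, final_fraction: float) -> float:
    """Return the specificity at which subgraph growth stops.

    initial_specificity corresponds to -r and final_fraction to -w;
    the stopping point is simply their product.
    """
    return initial_specificity * final_fraction


# The default settings: -r 25 and -w 0.5.
threshold = final_specificity_threshold(25.0, 0.5)
```

With the defaults, `threshold` evaluates to 12.5, matching the example in the table.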