Sleipnir
SpeciesConnector calculates the mutual information (or other similarity measure) between pairs of input datasets. This can be used to approximate how much information is shared between two experimental datasets or how similar two predicted functional relationship networks are.
SpeciesConnector <data.dab>*
Compute a mutual information score for each pair of the given datasets (the data.dab files) and write the scores to standard output.
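As a rough illustration of the score being computed, the sketch below calculates mutual information in bits between two datasets that have already been discretized into bins. It is a minimal sketch and not SpeciesConnector's implementation: the function name `mutual_information_bits` and the list-of-bins input are assumptions made for the example, and missing values are not handled.

```python
import math


def mutual_information_bits(xs, ys):
    """Mutual information (in bits) between two equal-length bin sequences.

    xs[i] and ys[i] are the (hypothetical) bin numbers that two datasets
    assign to the same gene pair i.
    """
    if len(xs) != len(ys):
        raise ValueError("both datasets must cover the same gene pairs")
    n = len(xs)
    if n == 0:
        return 0.0

    # Count joint and marginal occurrences of each bin value.
    joint, count_x, count_y = {}, {}, {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        count_x[x] = count_x.get(x, 0) + 1
        count_y[y] = count_y.get(y, 0) + 1

    # Sum p(x,y) * log2(p(x,y) / (p(x) * p(y))) over observed bin pairs.
    mi = 0.0
    for (x, y), c in joint.items():
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


identical = mutual_information_bits([0, 1, 0, 1], [0, 1, 0, 1])
independent = mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1])
```

Two identical, evenly split binary datasets share one full bit, while the second pair of datasets is statistically independent and shares none.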
Note that to use Counter with the output from SpeciesConnector, you should first convert the table of raw (bit) mutual information scores to exponentially scaled sums of relative shared information. This can be done using the half2relative.rb and half2weights.rb scripts included with Sleipnir. The combination of these two scripts' outputs creates a weights file appropriate for use with Counter's alphas parameter.
package "SpeciesConnector"
version "1.0"
purpose "Cross species connection calculator."

section "Main"
option "ddirectory" d "Data directory"
    string typestr="directory" default="."
option "adirectory" w "Answer directory"
    string typestr="directory" default="."
option "odirectory" o "Output directory"
    string typestr="directory" default="."
option "jdirectory" p "Learned joint directory"
    string typestr="directory" default="."
option "l1directory" j "Likelihood wrt 1 directory"
    string typestr="directory" default="."
option "l0directory" k "Likelihood wrt 0 directory"
    string typestr="directory" default="."

section "Stage: Learn/Prediction"
option "learn" L "Learn flag"
    flag off

section "Network Features"
option "zeros" Z "Read zeroed node IDs/outputs from the given file"
    string typestr="filename"

section "Optional"
option "memmap" m "Memory map input/output"
    flag off
option "random" r "Seed random generator"
    int default="0"
option "verbosity" v "Message verbosity"
    int default="5"
option "genex" G "Gene exclusion file"
    string typestr="filename"
option "genelist" l "Print gene list on the screen"
    flag off
option "uniformjoint" u "Uniform joint distribution"
    flag off
option "threshold" t "Threshold for joint"
    float default="0.5"
option "holdout" h "Holdout target dataset"
    flag off
Flag | Default | Type | Description |
---|---|---|---|
None | None | DAT/DAB files | Datasets for which pairwise mutual information (or other similarity measure) will be calculated. |
-d | mi | mi, pearson, quickpear, euclidean, kendalls, kolm-smir, hypergeom, innerprod, bininnerprod | Similarity measure to be used for dataset comparisons. |
-z | off | Flag | If on, assume that all missing gene pairs in all datasets have a value of 0 (i.e. the first bin). |
-Z | None | Tab-delimited text file | If given, argument must be a tab-delimited text file containing two columns, the first node IDs (see BNCreator) and the second bin numbers (zero indexed). For each node ID present in this file, missing values will be substituted with the given bin number. |
-R | on | Flag | If on, assign missing values randomly; this generally results in much better approximations of mutual information. |
-t | on | Flag | If on, format output as a tab-delimited table; otherwise, format as one pair per line. |
-y | -1 | Integer | If nonnegative, process only pairs of datasets containing (and beginning with) the given dataset index. This can be used to parallelize many mutual information calculations by running processes with different -y values. |
-m | off | Flag | If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped. |
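
The parallelization described for -y can be pictured with the following sketch. It assumes that "containing (and beginning with)" dataset index y means the pairs (y, j) with j > y; that reading, and the helper names `pairs_for_index` and `all_pairs`, are assumptions for illustration rather than a description of the tool's internals. Under that reading, running one process per index covers every unordered pair exactly once.

```python
def pairs_for_index(y, n):
    """Pairs assumed to be handled by one process run with -y set to y."""
    return [(y, j) for j in range(y + 1, n)]


def all_pairs(n):
    """Every unordered pair of n datasets, as processed without -y."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


n_datasets = 4  # hypothetical number of input datasets

# Concatenate the work done by one process per dataset index.
covered = [p for y in range(n_datasets) for p in pairs_for_index(y, n_datasets)]

# Each pair appears exactly once, so the split runs do no duplicate work.
complete = covered == all_pairs(n_datasets)
```

Note that lower indices carry more pairs than higher ones under this split, so the per-process workload is uneven.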