Sleipnir: MIer

MIer calculates the mutual information (or other similarity measure) between pairs of input datasets. This can be used to approximate how much information is shared between two experimental datasets or how similar two predicted functional relationship networks are.
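As a rough illustration of the quantity involved, the sketch below computes mutual information in bits between two equal-length sequences of discrete bin values, where each position stands for the same gene pair in two datasets. This is a minimal textbook estimate under those assumptions, not MIer's actual implementation; the function name mutual_information is hypothetical.

```python
from collections import defaultdict
from math import log2


def mutual_information(xs, ys):
    """Estimate mutual information (in bits) between two discrete sequences.

    xs[i] and ys[i] are assumed to be the bin values observed for the same
    gene pair in two different datasets (illustrative assumption).
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0

    # Count joint and marginal occurrences of each bin value.
    joint = defaultdict(int)
    count_x = defaultdict(int)
    count_y = defaultdict(int)
    for x, y in zip(xs, ys):
        joint[(x, y)] += 1
        count_x[x] += 1
        count_y[y] += 1

    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    total = 0.0
    for (x, y), c in joint.items():
        total += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return total
```

Two identical binary datasets with balanced bins share one full bit, while two independent ones share none.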

Usage

Basic Usage

 MIer <data.dab>*

Compute the mutual information score for each pair of the given datasets (the data.dab files) and write the scores to standard output.

Detailed Usage

Note that to use Counter with the output from MIer, you should first convert the table of raw (bit) mutual information scores to exponentially scaled sums of relative shared information. This can be done using the half2relative.rb and half2weights.rb scripts included with Sleipnir. The combination of these two scripts' outputs creates a weights file appropriate for use with Counter's alphas parameters.

Also note that the measure is calculated on quantized values if .quant files exist in the same directory as the .dab files. If no .quant files exist, MIer writes "could not open quant file" to the error log and proceeds using non-quantized values.
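To make the idea of quantized values concrete, the sketch below maps a continuous score to a bin index given a sorted list of bin edges. The layout of .quant files is not described here, so the edges, the function name quantize, and the rule of taking the first edge at or above the value are all assumptions made for illustration.

```python
from bisect import bisect_left


def quantize(value, edges):
    """Return the zero-indexed bin for value.

    edges is a sorted list of hypothetical upper bin edges. The bin is the
    first edge that is >= value; values above the last edge fall into the
    last bin (assumed rule, for illustration only).
    """
    if not edges:
        raise ValueError("at least one bin edge is required")
    return min(bisect_left(edges, value), len(edges) - 1)
```

With edges [0.0, 1.0, 2.0], a score of 0.5 would land in bin 1 and any score above 2.0 in bin 2.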

package "MIer"
version "1.0"
purpose "Data set pairwise mutual information calculator."

section "Main"
option  "distance"  d   "Similarity measure"
                        values="pearson","quickpear","euclidean","kendalls","kolm-smir",
                        "hypergeom","innerprod","bininnerprod","mi"

section "Network Features"
option  "zero"      z   "Zero missing values"
                        flag    off
option  "zeros"     Z   "Read zeroed node IDs/outputs from the given file"
                        string  typestr="filename"
option  "randomize" R   "Assign missing values randomly"
                        flag    on

section "Optional"
option  "subsample" s   "Maximum pairs to subsample"
                        int default="100000"
option  "table"     t   "Format output as a 2D table"
                        flag    on
option  "only"      y   "Process only the given input file"
                        int default="-1"
option  "threads"   T   "Number of threads to use, note that enough memory is required to load threads number of datasets concurrently. This doesn't change memory requirements under bigmem."
                        int default="1"
option  "memmap"    m   "Memory map input/output"
                        flag    off
option  "bigmem"    M   "Load complete collection of datasets/networks into memory, faster but requires enough memory to hold all datasets."
                        flag    off
option  "random"    r   "Seed random generator"
                        int default="0"
option  "verbosity" v   "Message verbosity"
                        int default="5"

Flag	Default	Type	Description
None	None	DAT/DAB files	Datasets for which pairwise mutual information (or other similarity measure) will be calculated.
-d	mi	mi, pearson, quickpear, euclidean, kendalls, kolm-smir, hypergeom, innerprod, bininnerprod	Similarity measure to be used for dataset comparisons.
-z	off	Flag	If on, assume that all missing gene pairs in all datasets have a value of 0 (i.e. the first bin).
-Z	None	Tab-delimited text file	If given, argument must be a tab-delimited text file containing two columns, the first node IDs (see BNCreator) and the second bin numbers (zero indexed). For each node ID present in this file, missing values will be substituted with the given bin number.
-R	on	Flag	If on, assign missing values randomly; this generally results in much better approximations of mutual information.
-t	on	Flag	If on, format output as a tab-delimited table; otherwise, format as one pair per line.
-y	-1	Integer	If nonnegative, process only pairs of datasets containing (and beginning with) the given dataset index. This can be used to parallelize many mutual information calculations by running processes with different -y values.
-m	off	Flag	If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped.
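
The parallelization described for -y can be sketched as follows: build one MIer command line per dataset index and hand each to a separate process or cluster job. The helper name mier_commands and the file names are hypothetical, and the sketch assumes dataset indices are zero-based and follow the order of the files on the command line. It only constructs the argument lists and does not run anything.

```python
def mier_commands(datasets, executable="MIer"):
    """Build one command line per dataset index, each restricted with -y.

    datasets is the list of input files passed to every invocation.
    Indices are assumed to be zero-based and to follow command-line order.
    """
    return [
        [executable, "-y", str(index), *datasets]
        for index in range(len(datasets))
    ]


if __name__ == "__main__":
    # Hypothetical input files; print the commands instead of running them.
    for command in mier_commands(["a.dab", "b.dab", "c.dab"]):
        print(" ".join(command))
```

Each argument list could then be passed to subprocess.run with its standard output redirected to a separate file, and the per-index results merged afterwards.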