Sleipnir: BNUnraveler

BNUnraveler is a multithreaded tool that performs inference on many context-specific Bayesian classifiers to produce predicted functional relationship networks. It is generally paired with BNWeaver: BNWeaver learns the classifiers from data, and BNUnraveler then uses them to infer context-specific functional relationships.

Usage

Basic Usage

 BNUnraveler -i <contexts_dir> -d <data_dir> -o <networks_dir> [-t <threads>]
        <contexts.txt>*

For each biological context file `contexts.txt` given on the command line, load the context-specific Bayesian classifier of the same name from `contexts_dir`, perform Bayesian inference using the data files in `data_dir` (named identically to the classifier's node IDs), and write a context-specific functional relationship network to `networks_dir`. Optionally, use `-t` to run `threads` parallel threads.
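The classifiers BNUnraveler reads are naive Bayes networks: one functional-relationship class node with one child node per dataset. For a single gene pair, inference therefore reduces to the standard naive Bayes posterior. The sketch below is a generic illustration of that computation, not BNUnraveler's implementation; the function and parameter names (`naive_bayes_posterior`, `prior`, `cpts`, `evidence`) are hypothetical, and it assumes a binary class node and that a dataset with no value for the pair is simply left unobserved.

```python
from math import exp, log
from typing import Dict, List, Optional, Sequence


def naive_bayes_posterior(
    prior: Sequence[float],
    cpts: Dict[str, List[List[float]]],
    evidence: Dict[str, Optional[int]],
) -> float:
    """Return P(functional relationship = 1 | observed bins) for one gene pair.

    prior[c]         P(class = c), c in {0, 1} (assumed binary class node)
    cpts[node][c][b] P(node = bin b | class = c), one CPT per dataset node
    evidence[node]   observed (zero-indexed) bin, or None if the pair is
                     missing from that dataset
    """
    log_scores = [log(p) for p in prior]
    for node, observed_bin in evidence.items():
        if observed_bin is None:
            # An unobserved leaf sums to 1 over its bins, so it drops out.
            continue
        for c in range(len(prior)):
            log_scores[c] += log(cpts[node][c][observed_bin])
    # Normalize in log space to avoid underflow with many datasets.
    top = max(log_scores)
    weights = [exp(s - top) for s in log_scores]
    return weights[1] / sum(weights)


# Hypothetical two-dataset classifier with two bins per dataset.
cpts = {
    "coexpr": [[0.7, 0.3], [0.2, 0.8]],
    "ppi": [[0.95, 0.05], [0.5, 0.5]],
}
print(naive_bayes_posterior([0.9, 0.1], cpts, {"coexpr": 1, "ppi": None}))
print(naive_bayes_posterior([0.9, 0.1], cpts, {"coexpr": 1, "ppi": 1}))
```

Repeating this for every gene pair (and for every context's classifier, which shares the structure but has different parameters) yields one network per context.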

Detailed Usage

package "BNUnraveler"
version "1.0"
purpose "Bayes net evaluation from data"

section "Main"
option  "input"         i   "Input (X)DSL file directory"
                            string  typestr="directory" default="."
option  "directory"     d   "Data directory"
                            string  typestr="directory" default="."
option  "output"        o   "Output directory"
                            string  typestr="directory" default="."

section "Miscellaneous"
option  "everything"    e   "Evaluate non-term pairs"
                            flag    on
option  "answers"       w   "Answer file"
                            string  typestr="filename"

section "Learning/Evaluation"
option  "genes"         g   "Gene inclusion file"
                            string  typestr="filename"
option  "genome"        G   "Gene list of interest"
                            string  typestr="filename"

section "Network Features"
option  "zero"          z   "Zero missing values"
                            flag    off
option  "zeros"         Z   "Read zeroed node IDs/outputs from the given file"
                            string  typestr="filename"

section "Optional"
option  "memmap"        m   "Memory map input files"
                            flag    off
option  "threads"       t   "Maximum number of threads to spawn"
                            int default="-1"
option  "xdsl"          x   "Assume XDSL input rather than DSL"
                            flag    on
option  "group"         u   "Group identical inputs"
                            flag    on
option  "verbosity"     v   "Message verbosity"
                            int default="5"

Flag	Default	Type	Description
None	None	Gene text files	Gene sets representing biological contexts (sets of related genes) for which Bayesian classifiers have been learned. Must have filenames corresponding to the (X)DSL files to be loaded, e.g. if "mitotic_cell_cycle.txt" is given on the command line, "mitotic_cell_cycle.xdsl" must exist in `-i`.
-i	.	Directory	Directory from which (X)DSL Bayesian classifier files are read. Must be naive classifiers with identical structure (but presumably different parameters).
-o	.	Directory	Directory into which inferred functional relationship networks (DAB files) are placed.
-d	.	Directory	Directory from which data files are read. Must be DAT/DAB files with names identical to the nodes of the Bayesian classifiers.
-e	on	Flag	If on, predict relationship probabilities for all genes with any data, regardless of context.
-w	None	DAT/DAB file	If given, predict relationship probabilities only for gene pairs in the given answer file.
-g	None	Text gene list	If given, predict relationship probabilities for all gene pairs for which both genes are in the list (regardless of context).
-G	None	Text gene list	If given, predict relationship probabilities only for gene pairs for which both genes are in the list (in addition to context filtering).
-z	off	Flag	If on, assume that all missing gene pairs in all datasets have a value of 0 (i.e. the first bin).
-Z	None	Tab-delimited text file	If given, argument must be a tab-delimited text file containing two columns, the first node IDs (see BNCreator) and the second bin numbers (zero indexed). For each node ID present in this file, missing values will be substituted with the given bin number.
-m	off	Flag	If on, memory map the input files when possible. DAT and PCL inputs cannot be memory mapped.
-t	1	Integer	Number of simultaneous threads to use for individual CPT inferences. Threads are per classifier node (dataset), so the number of threads actually used is the minimum of `-t` and the number of datasets.
-x	on	Flag	If on, assume XDSL files will be used instead of DSL files.
-u	on	Flag	If on, group identical examples into one heavily weighted example. This greatly improves efficiency, and there's essentially never a reason to deactivate it.
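The `-z`, `-Z`, and `-u` options all act on the per-pair evidence vectors (one bin per dataset node) before inference. The sketch below illustrates them under stated assumptions: it parses a `-Z` file (node ID, tab, zero-indexed bin), substitutes missing values, and then groups identical vectors into weighted examples. The helper names are hypothetical, and the precedence of `-Z` entries over `-z` is an assumption, not documented behavior.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Example = Tuple[Optional[int], ...]


def read_zeros_file(handle: Iterable[str]) -> Dict[str, int]:
    """Parse a -Z file: two tab-delimited columns, node ID and zero-indexed bin."""
    defaults: Dict[str, int] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        defaults[row[0]] = int(row[1])
    return defaults


def fill_missing(
    nodes: List[str],
    values: List[Optional[int]],
    defaults: Dict[str, int],
    zero_all: bool = False,
) -> Example:
    """Substitute missing bins for one gene pair.

    Assumption: a node listed in the -Z file uses its given bin; otherwise,
    if -z is on, the missing value becomes bin 0; otherwise it stays missing.
    """
    filled: List[Optional[int]] = []
    for node, value in zip(nodes, values):
        if value is None:
            if node in defaults:
                value = defaults[node]
            elif zero_all:
                value = 0
        filled.append(value)
    return tuple(filled)


def group_examples(examples: Iterable[Example]) -> Counter:
    """-u: collapse identical evidence vectors into one weighted example each."""
    return Counter(examples)


# Hypothetical usage: two dataset nodes, three gene pairs.
zeros = read_zeros_file(io.StringIO("ppi\t0\n"))
nodes = ["coexpr", "ppi"]
pairs = [[1, None], [1, None], [0, 1]]
grouped = group_examples(fill_missing(nodes, v, zeros) for v in pairs)
print(grouped)
```

Grouping pays off because identical evidence vectors produce identical posteriors, so each distinct vector needs to be evaluated only once and its result reused for every pair it represents.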