Sleipnir: BNWeaver

BNWeaver is a multithreaded tool for learning context-specific naive Bayesian classifiers from datasets (DAT/DAB files). It is generally paired with BNUnraveler to learn classifiers from data and then infer context-specific functional relationships.

Usage

Basic Usage

 BNWeaver -w <answers.dab> -o <contexts_dir> -d <data_dir> [-b <global.xdsl>] [-t <threads>]
        <contexts.txt>*

Using the gold standard answers in answers.dab and all DAT/DAB files in data_dir, create one context-specific Bayesian classifier in contexts_dir for each biological context file contexts.txt. Optionally, use probability tables from global.xdsl as fallbacks when too little data is available for context-specific learning, and run with the number of parallel threads given by threads.
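As a rough illustration of the invocation above, the following Python sketch assembles the argument list piece by piece. The helper name build_bnweaver_command and every file and directory name in the example call are hypothetical placeholders, not part of Sleipnir; only the flags (-w, -o, -d, -b, -t) and the trailing context files come from the usage line. The sketch only builds the command and does not run it.

```python
from typing import List, Optional


def build_bnweaver_command(
    answers: str,
    contexts_dir: str,
    data_dir: str,
    context_files: List[str],
    default_net: Optional[str] = None,
    threads: Optional[int] = None,
) -> List[str]:
    """Assemble a BNWeaver argument list (hypothetical helper, not part of Sleipnir)."""
    # Required arguments: gold standard, output directory, data directory.
    cmd = ["BNWeaver", "-w", answers, "-o", contexts_dir, "-d", data_dir]
    # Optional fallback network for probability tables with too little data.
    if default_net is not None:
        cmd += ["-b", default_net]
    # Optional thread count.
    if threads is not None:
        cmd += ["-t", str(threads)]
    # One or more context gene set files go last, as positional arguments.
    cmd += list(context_files)
    return cmd


# Example call with placeholder names (assumptions for illustration only).
example_cmd = build_bnweaver_command(
    answers="answers.dab",
    contexts_dir="contexts",
    data_dir="data",
    context_files=["context_a.txt", "context_b.txt"],
    default_net="global.xdsl",
    threads=4,
)
print(" ".join(example_cmd))
```

The resulting list could be handed to subprocess.run once BNWeaver is on the PATH; keeping it as a list rather than a single string avoids shell quoting problems with unusual file names.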

Detailed Usage

package "BNWeaver"
version "1.0"
purpose "Bayes net construction and training from data"

section "Main"
option  "answers"   w   "Answer file"
                        string  typestr="filename"  yes
option  "output"    o   "Output directory"
                        string  typestr="directory" default="."
option  "directory" d   "Data directory"
                        string  typestr="directory" default="."

section "Learning/Evaluation"
option  "genex"     G   "Gene exclusion file"
                        string  typestr="filename"
option  "negatives" n   "Gene set for negative pairs"
                        string  typestr="filename"
option  "randomize" a   "Randomize data before training"
                        flag    off

section "Network Features"
option  "default"   b   "Bayes net containing defaults for cases with missing data"
                        string  typestr="filename"
option  "zero"      z   "Zero missing values"
                        flag    off
option  "zeros"     Z   "Read zeroed node IDs/outputs from the given file"
                        string  typestr="filename"

section "Optional"
option  "memmap"    m   "Memory map input files"
                        flag    off
option  "threads"   t   "Maximum number of threads to spawn"
                        int default="-1"
option  "xdsl"      x   "Generate XDSL output rather than DSL"
                        flag    on
option  "group"     u   "Group identical inputs"
                        flag    on
option  "random"    r   "Seed random generator"
                        int default="0"
option  "verbosity" v   "Message verbosity"
                        int default="5"

Flag	Default	Type	Description
None	None	Gene text files	Gene sets representing biological contexts (sets of related genes) for which Bayesian classifiers will be learned.
-w	None	DAT/DAB file	Functional gold standard for learning. Should consist of gene pairs with scores of 0 (unrelated), 1 (related), or missing (NaN).
-o	.	Directory	Directory into which learned naive Bayesian classifiers ((X)DSL files) are placed.
-d	.	Directory	Directory from which data files are read. Must be DAT/DAB files with names from which the node IDs of the Bayesian classifiers can be created.
-G	None	Text gene list	If given, use only gene pairs for which neither gene is in the list. For details, see Sleipnir::CDat::FilterGenes.
-n	None	Text gene list	If given, use only gene pairs including at least one gene from the given set. For details, see Sleipnir::CDat::FilterGenes.
-a	off	Flag	If on, randomly shuffle all data values (by gene pair) before learning.
-b	None	(X)DSL file	If present during learning, parameters from the given (X)DSL file are used instead of learned parameters for probability tables with too few examples. For details, see Sleipnir::CBayesNetSmile::SetDefault.
-z	off	Flag	If on, assume that all missing gene pairs in all datasets have a value of 0 (i.e. the first bin).
-Z	None	Tab-delimited text file	If given, argument must be a tab-delimited text file containing two columns: node IDs (see BNCreator) in the first and zero-indexed bin numbers in the second. For each node ID present in this file, missing values will be substituted with the given bin number.
-m	off	Flag	If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped.
-t	1	Integer	Number of simultaneous threads to use for individual CPT learning. Threads are per classifier node (dataset), so the number of threads actually used is the minimum of `-t` and the number of datasets.
-x	on	Flag	If on, assume XDSL files will be used instead of DSL files.
-u	on	Flag	If on, group identical examples into one heavily weighted example. This greatly improves efficiency, and there's essentially never a reason to deactivate it.
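
The -Z file layout described in the table (two tab-delimited columns: node ID, then zero-indexed bin number) can be sketched as follows. This is a minimal illustration only: the helper name format_zeros and the node IDs dataset_a and dataset_b are hypothetical, and real node IDs depend on the names of the data files in the data directory.

```python
from typing import Dict


def format_zeros(zeros: Dict[str, int]) -> str:
    """Render a node ID -> bin number mapping as tab-delimited text.

    Hypothetical helper, not part of Sleipnir. Each output line holds a
    node ID, a tab, and the zero-indexed bin substituted for missing values.
    """
    lines = []
    for node_id, bin_number in zeros.items():
        # Bin numbers are zero indexed, so negative values make no sense.
        if bin_number < 0:
            raise ValueError(f"bin number for {node_id!r} must be >= 0")
        lines.append(f"{node_id}\t{bin_number}")
    return "\n".join(lines) + "\n"


# Placeholder node IDs (assumptions for illustration only).
zeros_text = format_zeros({"dataset_a": 0, "dataset_b": 2})
print(zeros_text, end="")
```

Writing zeros_text to a file and passing that file with -Z would ask for missing values in dataset_a to be treated as bin 0 and those in dataset_b as bin 2, under the assumptions above.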