Sleipnir: DataDumper

DataDumper opens an answer file, multiple datasets, and a Bayesian network and displays the data exactly as it would be provided to the network for Bayesian learning or evaluation. The output shows exactly which genes, gene pairs, and data values are used for learning under different circumstances.

Usage

Basic Usage

 DataDumper -w <answers.dab> <data.dab>*

Writes to standard output the discretized answers for all gene pairs in answers.dab, together with the data from the DAT/DAB files data.dab (and their associated QUANT files), exactly as they would be used in Bayesian learning. Combined with the other command line arguments, this lets a user see exactly which data is used for learning or evaluation under specific circumstances, e.g. context-specific learning or a holdout test set.
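As an illustration of how the basic usage combines with the optional arguments, the following is a minimal Python sketch that assembles a DataDumper command line. The flags (-w, -g, -G, -c, -z, -Z) are those documented on this page; the helper name build_datadumper_args, its parameter names, and the file names in the example are hypothetical and not part of Sleipnir. Placing the options before the dataset files is a choice made for this sketch.

```python
from typing import List, Optional, Sequence


def build_datadumper_args(
    answers: str,
    datasets: Sequence[str],
    genes: Optional[str] = None,   # -g: gene inclusion file
    genex: Optional[str] = None,   # -G: gene exclusion file
    genet: Optional[str] = None,   # -c: term inclusion file
    zero: bool = False,            # -z: zero missing values
    zeros: Optional[str] = None,   # -Z: file of per-node zeroed bins
    executable: str = "DataDumper",
) -> List[str]:
    """Assemble an argument list for DataDumper (hypothetical helper)."""
    args = [executable, "-w", answers]
    if genes is not None:
        args += ["-g", genes]
    if genex is not None:
        args += ["-G", genex]
    if genet is not None:
        args += ["-c", genet]
    if zero:
        args.append("-z")
    if zeros is not None:
        args += ["-Z", zeros]
    # Dataset DAT/DAB files are passed as plain positional arguments.
    args += list(datasets)
    return args


if __name__ == "__main__":
    # Hypothetical file names: inspect the data seen when learning is
    # restricted to the genes listed in context_genes.txt.
    command = build_datadumper_args(
        "answers.dab",
        ["expression.dab", "interactions.dab"],
        genes="context_genes.txt",
    )
    print(" ".join(command))
```

The resulting list can be passed to subprocess.run to execute the tool, assuming DataDumper is installed and on the PATH.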

Detailed Usage

package "DataDumper"
version "1.0"
purpose "Examination of data used for large scale Bayes net learning"

section "Main"
option  "answers"   w   "Answer file"
                        string  typestr="filename"  yes

section "Learning/Evaluation"
option  "genes"     g   "Gene inclusion file"
                        string  typestr="filename"
option  "genex"     G   "Gene exclusion file"
                        string  typestr="filename"
option  "genet"     c   "Term inclusion file"
                        string  typestr="filename"

section "Network Features"
option  "zero"      z   "Zero missing values"
                        flag    off
option  "zeros"     Z   "Read zeroed node IDs/outputs from the given file"
                        string  typestr="filename"

section "Optional"
option  "verbosity" v   "Message verbosity"
                        int default="5"

Flag	Default	Type	Description
None	None	DAT/DAB files	Input DAT/DAB files from which data is drawn for display in the output.
-w	None	DAT/DAB file	Functional gold standard for learning. Should consist of gene pairs with scores of 0 (unrelated), 1 (related), or missing (NaN).
-g	None	Text gene list	If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes.
-G	None	Text gene list	If given, use only gene pairs for which neither gene is in the list. For details, see Sleipnir::CDat::FilterGenes.
-c	None	Text gene list	If given, use only gene pairs passing a "term" filter against the list. For details, see Sleipnir::CDat::FilterGenes.
-z	off	Flag	If on, assume that all missing gene pairs in all datasets have a value of 0 (i.e. the first bin).
-Z	None	Tab-delimited text file	If given, argument must be a tab-delimited text file containing two columns: node IDs in the first and zero-indexed bin numbers in the second. For each node ID present in this file, missing values will be replaced with the given bin number.
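
The filtering and missing-value rules in the table above can be sketched in Python as follows. This is an illustration of the documented semantics only, not Sleipnir's implementation: the function names are hypothetical, the -c term filter is omitted because its details are described in Sleipnir::CDat::FilterGenes rather than here, and the precedence of a -Z entry over -z when both are given is an assumption made for this sketch.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def keep_pair(
    pair: Tuple[str, str],
    include: Optional[Set[str]] = None,  # gene list given with -g
    exclude: Optional[Set[str]] = None,  # gene list given with -G
) -> bool:
    """Return True if a gene pair passes the -g and -G filters."""
    a, b = pair
    # -g: keep the pair only if both genes are in the inclusion list.
    if include is not None and not (a in include and b in include):
        return False
    # -G: keep the pair only if neither gene is in the exclusion list.
    if exclude is not None and (a in exclude or b in exclude):
        return False
    return True


def parse_zeros(lines: Iterable[str]) -> Dict[str, int]:
    """Parse -Z style lines: node ID, tab, zero-indexed bin number."""
    zeros: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        node_id, bin_number = line.split("\t")[:2]
        zeros[node_id] = int(bin_number)
    return zeros


def fill_missing(
    node_id: str,
    value: Optional[int],
    zeros: Dict[str, int],
    zero_all: bool = False,  # corresponds to the -z flag
) -> Optional[int]:
    """Return the bin for one node; a value of None means missing."""
    if value is not None:
        return value            # observed values are never changed
    if node_id in zeros:
        return zeros[node_id]   # -Z: per-node substitute bin (assumed to win over -z)
    if zero_all:
        return 0                # -z: missing values fall into the first bin
    return None                 # otherwise the value stays missing
```

For example, with a -Z file containing the line "node1<TAB>2", a missing value for node1 would be reported as bin 2, while a missing value for any other node stays missing unless -z is also given.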