Sleipnir: Data2Bnt

Data2Bnt converts a collection of DAT/DAB files into a features file appropriate for use with either the MATLAB Bayes Net Toolkit (BNT) or the Weka machine learning package. Data2Bnt provides one way to bridge the gap between single-gene function and gene-pair functional relationships: it produces one example per gene pair, but labels each example based on whether one of that pair's genes belongs to a positive gene set (e.g. a pathway, process, or complex). It is similar to Data2Features.
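
The labeling rule can be illustrated with a short Python sketch. This is not Data2Bnt's implementation: the function name label_pairs and the gene names are hypothetical, and the sketch assumes a pair counts as positive when at least one of its two genes is in the positive set.

```python
from itertools import combinations


def label_pairs(genes, positives):
    """Yield (gene_a, gene_b, label) for every unordered gene pair.

    Assumption: a pair is positive (1) when at least one of its genes
    is in the positive set, and negative (0) otherwise.
    """
    positives = set(positives)
    for a, b in combinations(sorted(genes), 2):
        yield a, b, int(a in positives or b in positives)


# Toy input with hypothetical gene names; only YBR002W is a positive.
examples = list(label_pairs(["YAL001C", "YBR002W", "YCL003W"], ["YBR002W"]))
```

Under this assumption, both pairs containing YBR002W are labeled 1 and the remaining pair is labeled 0, so a single-gene annotation turns into pair-level training labels.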

Usage

Basic Usage

 Data2Bnt -i <positives.txt> -f <features.txt> -d <data.txt> -q <bins.quant> <data.pcl/dab>*

Generate a feature matrix compatible with BNT in which each row represents a gene pair, labeled positive if one of the genes is in positives.txt, with features as described in features.txt, default values drawn from data.txt, and feature values from the microarray PCL files data.pcl or DAT/DAB files data.dab. If PCL files are used, they are discretized using the bin edges in the QUANT file bins.quant.
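
As a rough illustration of discretization with QUANT bin edges, the Python sketch below assigns each continuous value to a bin index. The helper names parse_quant_line and discretize are hypothetical, and the binning rule (first bin whose edge is greater than or equal to the value, with larger values falling into the last bin) is an assumption; Sleipnir::CDataPair documents the actual behavior.

```python
from bisect import bisect_left


def parse_quant_line(line):
    """Parse one tab-delimited line of bin edges into floats."""
    return [float(tok) for tok in line.strip().split("\t")]


def discretize(value, edges):
    """Return the bin index for a continuous value.

    Assumption: a value goes in the first bin whose edge is >= the value,
    and anything above the last edge is placed in the last bin.
    """
    return min(bisect_left(edges, value), len(edges) - 1)


# Hypothetical QUANT line with three bin edges.
edges = parse_quant_line("-0.5\t0.5\t1.5\n")
bins = [discretize(v, edges) for v in (-1.0, 0.0, 2.0)]  # [0, 1, 2]
```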

Detailed Usage

package "Data2Bnt"
version "1.0"
purpose "BNT data file generation from data"

section "Main"
option  "input"         i   "Positive gene list"
                            string  typestr="filename"
option  "features"      f   "List of features (nodes) and default values"
                            string  typestr="filename"  yes
option  "data"          d   "Feature values for each data set"
                            string  typestr="filename"  yes
option  "quants"        q   "Quantization file for membership values"
                            string  typestr="filename"  yes

section "Miscellaneous"
option  "genome"        g   "SGD features file"
                            string  typestr="filename"
option  "fraction"      p   "Fraction of genome to cover with default values"
                            double  default="1"

section "Output Format"
option  "sparse"        s   "Output sparse matrix"
                            flag    off
option  "comments"      c   "Include informational comments"
                            flag    off
option  "xrff"          x   "Generate XRFF formatted output"
                            flag    off
option  "weights"       w   "Weight XRFF features"
                            flag    off

section "Optional"
option  "verbosity"     v   "Message verbosity"
                            int default="5"

Flag	Default	Type	Description
None	None	DAT/DAB files	Input DAT/DAB files from which data is drawn for features in the output BNT/Weka file.
-i	stdin	Gene text file	Set of genes such that pairs including them will be labeled as positive examples.
-f	None	Text file	Tab-delimited text file containing three columns: feature name, |-delimited feature values, and an optional default value. Lines starting with # are ignored as comments.
-d	None	Text file	Tab-delimited text file containing one dataset per line. The first tab-delimited token of each line should be a dataset name, with all subsequent tokens of the form <feature name>|<feature value>.
-q	None	QUANT text file	Tab-delimited QUANT file containing exactly one line of bin edges; these are used to discretize all continuous input data. For details, see Sleipnir::CDataPair.
-g	None	SGD features text file	SGD_features.tab file; if given, process only genes appearing in this file.
-p	1	Double	Randomly subsample the requested fraction of the available data.
-s	off	Flag	If on, display only non-default values in the output matrix. This is incompatible with BNT/Weka and should only be used for informational purposes.
-c	off	Flag	If on, include #-prefixed comments in the output indicating which examples represent which gene pairs. This is incompatible with BNT/Weka and should only be used for informational purposes.
-x	off	Flag	If on, generate Weka-compatible XRFF output; if off, generate BNT-compatible textual output.
-w	off	Flag	If on, generate weighted XRFF output by combining and upweighting identical examples.
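
The -f and -d file layouts described in the table can be read with a few lines of Python. This is a minimal sketch, not Sleipnir code: the function names, feature names, and dataset names are hypothetical, and skipping blank lines is an assumption beyond what the table states.

```python
def parse_features(lines):
    """Map feature name -> (list of allowed values, default or None)."""
    features = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumed) and # comments
        cols = line.split("\t")
        name, values = cols[0], cols[1].split("|")
        default = cols[2] if len(cols) > 2 and cols[2] else None
        features[name] = (values, default)
    return features


def parse_data(lines):
    """Map dataset name -> {feature name: feature value}."""
    datasets = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines (assumed)
        name, *tokens = line.split("\t")
        # Each remaining token has the form <feature name>|<feature value>.
        datasets[name] = dict(tok.split("|", 1) for tok in tokens)
    return datasets


# Hypothetical file contents, given here as lists of lines.
features_txt = [
    "# name\tvalues\tdefault",
    "platform\tcdna|affy|other\tother",
    "quality\tlow|high",
]
data_txt = [
    "dataset1\tplatform|affy\tquality|high",
    "dataset2\tplatform|cdna",
]
features = parse_features(features_txt)
datasets = parse_data(data_txt)
```

In this toy input, dataset2 gives no value for quality, and quality has no default in the features file, so a consumer of these structures would have to treat that feature as missing for dataset2.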