Sleipnir
Data2Bnt converts a collection of DAT/DAB files into a features file suitable for use with either the MATLAB Bayes Net Toolkit (BNT) or the Weka machine learning package. It provides one way to bridge the gap between single-gene function and gene-pair functional relationships: it produces one example per gene pair, but labels each example based on whether one of that pair's genes belongs to a positive gene set (e.g. a pathway, process, or complex). It is similar to Data2Features.
Data2Bnt -i <positives.txt> -f <features.txt> -d <data.txt> -q <bins.quant> <data.pcl/dab>*
Generate a feature matrix compatible with BNT in which each row represents a gene pair, labeled positive if one of the genes is in positives.txt, with features as described in features.txt, default values drawn from data.txt, and feature values from the microarray PCL files data.pcl or DAT/DAB files data.dab. If PCL files are used, they are discretized using the bin edges in the QUANT file bins.quant.
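The labeling rule described above can be illustrated with a minimal Python sketch. This is not Data2Bnt's actual implementation. The function name `label_gene_pairs` and the gene identifiers are hypothetical, and the sketch assumes that "one of the genes" means at least one gene of the pair appears in the positive set.

```python
from itertools import combinations


def label_gene_pairs(genes, positives):
    """Yield (gene_a, gene_b, label) for every unordered pair of genes.

    Assumption: a pair is labeled 1 when at least one of its two genes is in
    the positive set, and 0 otherwise.
    """
    positives = set(positives)
    for gene_a, gene_b in combinations(sorted(genes), 2):
        label = 1 if gene_a in positives or gene_b in positives else 0
        yield gene_a, gene_b, label


# Hypothetical gene identifiers, used only for illustration.
pairs = list(label_gene_pairs(["YAL001C", "YAL002W", "YAL003W"], ["YAL002W"]))
for gene_a, gene_b, label in pairs:
    print(gene_a, gene_b, label)
```

Each gene pair yields exactly one example, so the number of rows grows quadratically with the number of genes considered.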
package "Data2Bnt"
version "1.0"
purpose "BNT data file generation from data"
section "Main"
option "input" i "Positive gene list"
string typestr="filename"
option "features" f "List of features (nodes) and default values"
string typestr="filename" yes
option "data" d "Feature values for each data set"
string typestr="filename" yes
option "quants" q "Quantization file for membership values"
string typestr="filename" yes
section "Miscellaneous"
option "genome" g "SGD features file"
string typestr="filename"
option "fraction" p "Fraction of genome to cover with default values"
double default="1"
section "Output Format"
option "sparse" s "Output sparse matrix"
flag off
option "comments" c "Include informational comments"
flag off
option "xrff" x "Generate XRFF formatted output"
flag off
option "weights" w "Weight XRFF features"
flag off
section "Optional"
option "verbosity" v "Message verbosity"
int default="5"
Flag | Default | Type | Description |
---|---|---|---|
None | None | DAT/DAB files | Input DAT/DAB files from which data is drawn for features in the output BNT/Weka file. |
-i | stdin | Gene text file | Set of genes such that pairs including them will be labeled as positive examples. |
-f | None | Text file | Tab-delimited text file containing three columns: feature name, |-delimited feature values, and an optional default value. Lines starting with # are ignored as comments. |
-d | None | Text file | Tab-delimited text file containing one dataset per line. The first tab-delimited token of each line should be a dataset name, with all subsequent tokens of the form <feature name>|<feature value>. |
-q | None | QUANT text file | Tab-delimited QUANT file containing exactly one line of bin edges; these are used to discretize all continuous input data. For details, see Sleipnir::CDataPair. |
-g | None | SGD features text file | SGD_features.tab file; if given, process only genes appearing in this file. |
-p | 1 | Double | Randomly subsample the requested fraction of the available data. |
-s | off | Flag | If on, display only non-default values in the output matrix. This is incompatible with BNT/Weka and should only be used for informational purposes. |
-c | off | Flag | If on, include #-prefixed comments in the output indicating which examples represent which gene pairs. This is incompatible with BNT/Weka and should only be used for informational purposes. |
-x | off | Flag | If on, generate Weka-compatible XRFF output; if off, generate BNT-compatible textual output. |
-w | off | Flag | If on, generate weighted XRFF output by combining and upweighting identical examples. |
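The text file formats described for -f, -d, and -q can be sketched in Python as follows. This is an illustration only, not Sleipnir code. The function names, feature names, and dataset name are hypothetical. The binning rule in `quantize` (a value falls into the first bin whose edge is greater than or equal to it, and values above the last edge go into the last bin) is an assumption; see Sleipnir::CDataPair for the authoritative behavior.

```python
import bisect


def parse_features(lines):
    """Parse -f style lines into {feature: (values, default_or_None)}."""
    features = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and # comments
        columns = line.split("\t")
        values = columns[1].split("|")
        # The third column (default value) is optional.
        default = columns[2] if len(columns) > 2 and columns[2] else None
        features[columns[0]] = (values, default)
    return features


def parse_datasets(lines):
    """Parse -d style lines into {dataset: {feature: value}}."""
    datasets = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, *tokens = line.split("\t")
        # Each remaining token has the form <feature name>|<feature value>.
        datasets[name] = dict(token.split("|", 1) for token in tokens)
    return datasets


def quantize(value, edges):
    """Return a bin index for value given ascending bin edges (assumed rule)."""
    return min(bisect.bisect_left(edges, value), len(edges) - 1)


# Hypothetical file contents, used only for illustration.
features = parse_features([
    "# feature\tvalues\tdefault",
    "experiment\tmicroarray|two_hybrid\tmicroarray",
    "organism\tyeast|human",
])
datasets = parse_datasets(["dataset1\texperiment|two_hybrid\torganism|yeast"])
edges = [float(token) for token in "0.0\t1.0\t2.0".split("\t")]
bins = [quantize(value, edges) for value in (-0.5, 0.5, 1.0, 5.0)]
```

Under the assumed rule, three bin edges produce three bins, so the last edge acts as a catch-all for larger values.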