Sleipnir
Data2Bnt converts a collection of DAT/DAB files into a features file appropriate for use with either the MATLAB Bayes Net Toolkit (BNT) or the Weka machine learning package. It provides one way to bridge the gap between single-gene function and gene-pair functional relationships: it produces one example per gene pair, but labels each example according to whether one of the pair's genes belongs to a positive gene set (e.g. a pathway, process, or complex). It is similar to Data2Features.
Data2Bnt -i <positives.txt> -f <features.txt> -d <data.txt> -q <bins.quant> <data.pcl/dab>*
Generate a feature matrix compatible with BNT in which each row represents a gene pair, labeled positive if one of the genes is in positives.txt, with features as described in features.txt, default values drawn from data.txt, and feature values from the microarray PCL files data.pcl or DAT/DAB files data.dab. If PCL files are used, they are discretized using the bin edges in the QUANT file bins.quant.
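The labeling rule described above can be sketched in a few lines of Python. This is only an illustration, not Data2Bnt's implementation: the function name `label_gene_pairs` and the gene identifiers are hypothetical, and "one of the genes" is read here as "at least one of the two genes".

```python
from itertools import combinations


def label_gene_pairs(genes, positives):
    """Yield (gene_a, gene_b, label) for every unordered gene pair.

    The label is 1 if at least one gene of the pair is in `positives`
    (an assumed reading of the rule), and 0 otherwise.
    """
    positives = set(positives)
    for gene_a, gene_b in combinations(sorted(genes), 2):
        yield gene_a, gene_b, int(gene_a in positives or gene_b in positives)


# Hypothetical gene identifiers, for illustration only.
rows = list(label_gene_pairs(["YAL001C", "YBR002W", "YCL003W"], ["YBR002W"]))
for gene_a, gene_b, label in rows:
    print(gene_a, gene_b, label)
```

With three genes this produces three pairs, two of which contain the single positive gene and are therefore labeled 1.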
package "Data2Bnt"
version "1.0"
purpose "BNT data file generation from data"
section "Main"
option "input" i "Positive gene list"
string typestr="filename"
option "features" f "List of features (nodes) and default values"
string typestr="filename" yes
option "data" d "Feature values for each data set"
string typestr="filename" yes
option "quants" q "Quantization file for membership values"
string typestr="filename" yes
section "Miscellaneous"
option "genome" g "SGD features file"
string typestr="filename"
option "fraction" p "Fraction of genome to cover with default values"
double default="1"
section "Output Format"
option "sparse" s "Output sparse matrix"
flag off
option "comments" c "Include informational comments"
flag off
option "xrff" x "Generate XRFF formatted output"
flag off
option "weights" w "Weight XRFF features"
flag off
section "Optional"
option "verbosity" v "Message verbosity"
int default="5"
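When Data2Bnt is driven from a script, the options listed above can be assembled into an argument list. The helper below is a minimal sketch: `build_data2bnt_argv` is a hypothetical name, all file names are placeholders, and the function only builds the list without running anything.

```python
def build_data2bnt_argv(features, data, quants, inputs,
                        positives=None, xrff=False, weights=False):
    """Build a Data2Bnt argument list from the documented options.

    -f, -d and -q are always given; -i is omitted when `positives`
    is None, in which case the tool reads the gene list from stdin.
    """
    argv = ["Data2Bnt", "-f", features, "-d", data, "-q", quants]
    if positives is not None:
        argv += ["-i", positives]
    if xrff:
        argv.append("-x")  # Weka-compatible XRFF instead of BNT text
    if weights:
        argv.append("-w")  # weighted XRFF output
    return argv + list(inputs)


# Placeholder file names, for illustration only.
argv = build_data2bnt_argv("features.txt", "data.txt", "bins.quant",
                           ["a.dab", "b.dab"],
                           positives="positives.txt", xrff=True)
print(" ".join(argv))
```

The resulting list could be passed to `subprocess.run(argv)` on a system where the Data2Bnt binary is on the path.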
| Flag | Default | Type | Description |
|---|---|---|---|
| None | None | DAT/DAB files | Input DAT/DAB files from which data is drawn for features in the output BNT/Weka file. |
| -i | stdin | Gene text file | Set of genes such that pairs including them will be labeled as positive examples. |
| -f | None | Text file | Tab-delimited text file containing three columns: feature name, \|-delimited feature values, and an optional default value. Lines starting with # are ignored as comments. |
| -d | None | Text file | Tab-delimited text file containing one dataset per line. The first tab-delimited token of each line should be a dataset name, with all subsequent tokens of the form <feature name>\|<feature value>. |
| -q | None | QUANT text file | Tab-delimited QUANT file containing exactly one line of bin edges; these are used to discretize all continuous input data. For details, see Sleipnir::CDataPair. |
| -g | None | SGD features text file | SGD_features.tab file; if given, process only genes appearing in this file. |
| -p | 1 | Double | Randomly subsample the requested fraction of the available data. |
| -s | off | Flag | If on, display only non-default values in the output matrix. This is incompatible with BNT/Weka and should only be used for informational purposes. |
| -c | off | Flag | If on, include #-prefixed comments in the output indicating which examples represent which gene pairs. This is incompatible with BNT/Weka and should only be used for informational purposes. |
| -x | off | Flag | If on, generate Weka-compatible XRFF output; if off, generate BNT-compatible textual output. |
| -w | off | Flag | If on, generate weighted XRFF output by combining and upweighting identical examples. |
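The text formats accepted by -f, -d, and -q can be illustrated with a small Python sketch. All function names and file contents below are hypothetical. The binning rule in `quantize` (first bin whose edge is greater than or equal to the value, with larger values clamped to the last bin) is an assumption; Sleipnir::CDataPair is the authoritative reference.

```python
import bisect


def parse_features(lines):
    """Parse -f lines: name <TAB> |-delimited values [<TAB> default]."""
    features = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and # comments
        fields = line.split("\t")
        default = fields[2] if len(fields) > 2 else None
        features[fields[0]] = (fields[1].split("|"), default)
    return features


def parse_datasets(lines):
    """Parse -d lines: dataset name <TAB> feature|value <TAB> ..."""
    datasets = {}
    for line in lines:
        tokens = line.rstrip("\n").split("\t")
        if not tokens[0]:
            continue
        datasets[tokens[0]] = dict(t.split("|", 1) for t in tokens[1:] if t)
    return datasets


def quantize(value, edges):
    """Assumed rule: first bin with edge >= value, clamped to the last bin."""
    return min(bisect.bisect_left(edges, value), len(edges) - 1)


# Hypothetical file contents, for illustration only.
features = parse_features(["# comment line",
                           "platform\taffy|agilent\taffy",
                           "stress\tno|yes"])
datasets = parse_datasets(["dataset1\tplatform|agilent\tstress|yes"])
edges = [float(x) for x in "-0.5\t0.5\t1.5".split("\t")]
bins = [quantize(v, edges) for v in (-1.0, 0.5, 0.7, 9.0)]
print(features, datasets, bins)
```

Under the assumed rule, a QUANT line with three edges defines three bins, so every continuous value maps to an index between 0 and 2.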