Sleipnir
SVMer learns and evaluates support vector machine (SVM) models from PCL or DAT/DAB datasets in several modes. If given PCL inputs, SVMer constructs one example per gene pair, using the concatenation of the two genes' expression vectors as features. If given DAT/DAB inputs, SVMer constructs one example per gene pair, using each dataset's value for that pair as one feature. In genewise mode, SVMer learns one model per gene, with one example constructed for each pair in which that gene participates.
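
The three ways of building examples described above can be sketched in a few lines of Python. This is an illustration only, not SVMer's implementation: the function names, the in-memory dictionaries standing in for PCL and DAT/DAB files, the concatenation order, and the use of NaN for a pair missing from a dataset are all assumptions made for the example.

```python
from itertools import combinations

def pcl_pair_features(expr, gene_a, gene_b):
    """PCL mode: concatenate the two genes' expression vectors (order assumed)."""
    return list(expr[gene_a]) + list(expr[gene_b])

def dab_pair_features(datasets, gene_a, gene_b):
    """DAT/DAB mode: one feature per dataset, the value stored for the pair."""
    key = frozenset((gene_a, gene_b))
    # Assumption: a pair absent from a dataset is represented as NaN.
    return [ds.get(key, float("nan")) for ds in datasets]

def genewise_examples(genes, target):
    """Genewise mode: one example per pair in which `target` participates."""
    return [(target, g) for g in genes if g != target]

# Toy stand-ins for one PCL file and two DAT/DAB files (hypothetical data).
expr = {"G1": [0.1, 0.2], "G2": [1.0, 1.1], "G3": [-0.5, 0.0]}
datasets = [
    {frozenset(("G1", "G2")): 0.9, frozenset(("G1", "G3")): 0.2},
    {frozenset(("G1", "G2")): 0.4},
]

pairs = list(combinations(sorted(expr), 2))
pcl_examples = {p: pcl_pair_features(expr, *p) for p in pairs}
dab_examples = {p: dab_pair_features(datasets, *p) for p in pairs}
```

In the PCL case the feature count is twice the number of conditions, while in the DAT/DAB case it equals the number of input datasets.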
SVMer -i <answers.dab> -m <learned.svm> <data.pcl/dab>*
Learn an SVM model learned.svm for gene pairs using labels from answers.dab and data from data.pcl (for features built from PCL conditions) or data.dab (for feature values drawn from DAT/DAB files).
SVMer -m <learned.svm> -o <predictions.dab> <data.pcl/dab>*
Using the SVM model in learned.svm, predict labels for gene pairs using data from data.pcl or data.dab and store the resulting predicted functional interaction network in predictions.dab.
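
When SVMer is driven from a script, the two usage patterns above map onto two argument lists. The following is a minimal sketch that only assembles those lists from the flags shown in the usage lines; the helper names are hypothetical, and the file names are the placeholders from the usage text.

```python
def svmer_learn_args(answers, model, data_files):
    """Arguments for learning: labels via -i, model written to -m."""
    return ["SVMer", "-i", answers, "-m", model, *data_files]

def svmer_predict_args(model, predictions, data_files):
    """Arguments for evaluation: model read from -m, predictions to -o."""
    return ["SVMer", "-m", model, "-o", predictions, *data_files]

learn = svmer_learn_args("answers.dab", "learned.svm", ["data.pcl"])
predict = svmer_predict_args("learned.svm", "predictions.dab", ["data.pcl"])

print(" ".join(learn))
print(" ".join(predict))
```

Assuming an SVMer binary is on the PATH, either list could be passed to `subprocess.run(args, check=True)` to run the learning step followed by the prediction step.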
package "SVMer"
version "1.0"
purpose "SVM training and evaluation"
section "Main"
option "input" i "Input answer DAT/DAB file"
string typestr="filename"
option "output" o "Output prediction DAT/DAB file"
string typestr="filename"
option "model" m "SVM model file or directory"
string typestr="filename/directory"
section "Feature Mode"
option "pcl" p "PCL input mode"
flag on
option "binary" b "Input binary training file"
string typestr="filename"
option "genewise" w "Learn per-gene SVMs for pairwise predictions"
flag off
option "genel" l "Gene skip file for per-gene SVMs"
string typestr="filename"
section "Learning/Evaluation"
option "genes" g "Gene inclusion file"
string typestr="filename"
option "genex" G "Gene exclusion file"
string typestr="filename"
option "genet" c "Term inclusion file"
string typestr="filename"
section "SVM"
option "kernel" k "SVM kernel function"
values="linear","poly","rbf" default="linear"
option "cache" e "SVM cache size"
int default="40"
option "tradeoff" C "Classification tradeoff"
float
option "gamma" M "RBF gamma"
float default="1"
option "degree" d "Polynomial degree"
int default="3"
option "alphas" a "SVM alphas file"
string typestr="filename"
option "iterations" t "SVM iterations"
int default="100000"
section "Optional"
option "skip" s "Columns to skip in input PCLs"
int default="2"
option "random" r "Seed random generator"
int default="0"
option "verbosity" v "Message verbosity"
int default="5"
Flag | Default | Type | Description |
---|---|---|---|
None | None | PCL or DAT/DAB files | Input data files from which features are constructed, either PCLs from which expression vectors are concatenated or DAT/DABs from which pairwise values are read. |
-i | stdin | DAT/DAB file | If given, functional gold standard for learning. Should consist of gene pairs with scores of 0 (unrelated), 1 (related), or missing (NaN). If not given, evaluation is assumed and the SVM model(s) are read from -m. |
-o | stdout | DAT/DAB file | Output predictions from the SVM model(s) for each available gene pair. |
-m | None | SVM model file or directory | In standard mode, output learned SVM model file (if -i is given) or input SVM model file to be evaluated (if it is not). In genewise mode, directory containing output learned or input evaluated SVM model files. |
-p | on | Flag | If on, assume input files are PCLs from which features are constructed by concatenation of expression vectors. If off, assume input files are DAT/DABs from which one feature is drawn per dataset for each gene pair example. |
-b | None | Binary feature file | If given, ignore other inputs and assume the given binary file is to be used for model evaluation (if -o is specified) or learning (if it is not). |
-w | off | Flag | If on, learn/evaluate one SVM model per gene, using only the gene pairs including that gene (and thus each example represents one other gene). If off, learn/evaluate one global SVM model in which each example represents a gene pair. |
-l | None | Gene text file | If given, in genewise mode, learn/evaluate models only for genes in the given gene set. |
-g | None | Text gene list | If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes. |
-G | None | Text gene list | If given, use only gene pairs for which neither gene is in the list. For details, see Sleipnir::CDat::FilterGenes. |
-c | None | Text gene list | If given, use only gene pairs passing a "term" filter against the list. For details, see Sleipnir::CDat::FilterGenes. |
-k | linear | linear, poly, or rbf | SVM kernel type: linear, polynomial, or radial basis function. |
-e | 40 | Integer (MB) | SVM cache size in megabytes. |
-C | None | Float | SVM tradeoff between misclassification and margin; an appropriate default is calculated if no value is given. |
-M | 1 | Float | Gamma parameter for RBF kernel. |
-d | 3 | Integer | Degree parameter for polynomial kernel. |
-a | None | Alphas file | If given, SVM Light alphas file used to initialize the SVM model. |
-t | 100000 | Integer | Maximum number of iterations to run per SVM learning epoch. |
-s | 2 | Integer | Number of columns to skip in any PCL data files between the initial ID column and the experimental data columns. Must be the same number for all PCL files. |
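
The kernel choices selected by -k, together with the -M (gamma) and -d (degree) parameters, correspond to the standard kernel families sketched below. These are the textbook forms, written here as an assumption about what the options control; the exact parameterization SVMer uses is not given in the table, and the `scale` and `offset` arguments of the polynomial kernel are hypothetical names with no matching SVMer flag.

```python
import math

def linear_kernel(a, b):
    """Linear kernel: the dot product of the two feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def poly_kernel(a, b, degree=3, scale=1.0, offset=1.0):
    """Polynomial kernel: (scale * <a, b> + offset) ** degree."""
    return (scale * linear_kernel(a, b) + offset) ** degree

def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||a - b||^2)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

# Two toy feature vectors (hypothetical values).
u = [1.0, 2.0]
v = [3.0, 4.0]
k_linear = linear_kernel(u, v)
k_poly = poly_kernel(u, v, degree=3)
k_rbf = rbf_kernel(u, v, gamma=1.0)
```

A larger gamma makes the RBF kernel fall off faster with distance, and a higher degree lets the polynomial kernel model higher-order feature interactions; the defaults in the sketch mirror the table's defaults of 1 and 3.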