Sleipnir
SVMer

SVMer learns and evaluates support vector machine (SVM) models from PCL or DAT/DAB datasets. If given PCL inputs, SVMer constructs one example per gene pair by concatenating the two genes' expression vectors to create features. If given DAT/DAB inputs, SVMer constructs one example per gene pair, using each dataset as a feature. In genewise mode, SVMer learns one model per gene, with one example constructed for each pair in which that gene participates.
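
The two feature modes described above can be illustrated with a short Python sketch. This is not SVMer's implementation, only a minimal picture of how examples are shaped. The helper names (pcl_example, dab_example), the toy values, the concatenation order, and the use of NaN for a pair missing from a dataset are all assumptions made for illustration.

```python
from itertools import combinations

def pcl_example(expr, gene_a, gene_b):
    """PCL mode: concatenate the two genes' expression vectors."""
    return expr[gene_a] + expr[gene_b]

def dab_example(datasets, gene_a, gene_b):
    """DAT/DAB mode: one feature per dataset, holding the pair's value."""
    pair = frozenset((gene_a, gene_b))
    # Assumption: a pair absent from a dataset is represented as NaN here.
    return [ds.get(pair, float("nan")) for ds in datasets]

# Hypothetical toy inputs.
expr = {"G1": [0.1, 0.2], "G2": [1.0, 1.5], "G3": [-0.3, 0.4]}
datasets = [
    {frozenset(("G1", "G2")): 0.9, frozenset(("G1", "G3")): 0.2},
    {frozenset(("G1", "G2")): 0.7},
]

# One example per unordered gene pair in either mode.
pairs = list(combinations(sorted(expr), 2))
pcl_features = {p: pcl_example(expr, *p) for p in pairs}
dab_features = {p: dab_example(datasets, *p) for p in pairs}
```

In PCL mode the feature vector length is twice the number of conditions, while in DAT/DAB mode it equals the number of input datasets.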

Usage

Basic Usage

 SVMer -i <answers.dab> -m <learned.svm> <data.pcl/dab>*

Learn an SVM model learned.svm for gene pairs using labels from answers.dab and data from data.pcl (for features built from PCL conditions) or data.dab (for feature values drawn from DAT/DAB files).

 SVMer -m <learned.svm> -o <predictions.dab> <data.pcl/dab>*

Using the SVM model in learned.svm, predict labels for gene pairs using data from data.pcl or data.dab and store the resulting predicted functional interaction network in predictions.dab.
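
The learn-then-predict workflow above can be scripted. The following Python sketch only assembles the two argument lists shown in the usage lines; the function names and file names are hypothetical, and it assumes an SVMer binary is installed and on the PATH if the commands are actually run.

```python
import shlex

def learn_command(answers, model, data_files):
    """Build: SVMer -i <answers.dab> -m <learned.svm> <data.pcl/dab>*"""
    return ["SVMer", "-i", answers, "-m", model, *data_files]

def predict_command(model, predictions, data_files):
    """Build: SVMer -m <learned.svm> -o <predictions.dab> <data.pcl/dab>*"""
    return ["SVMer", "-m", model, "-o", predictions, *data_files]

# Hypothetical file names.
data = ["data1.pcl", "data2.pcl"]
learn = learn_command("answers.dab", "learned.svm", data)
predict = predict_command("learned.svm", "predictions.dab", data)

# Print the shell-quoted commands for inspection.
print(shlex.join(learn))
print(shlex.join(predict))

# To execute them (assuming SVMer is available):
#   import subprocess
#   subprocess.run(learn, check=True)
#   subprocess.run(predict, check=True)
```

The same data files are passed to both calls so that the features seen at prediction time match those used for learning.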

Detailed Usage

package "SVMer"
version "1.0"
purpose "SVM training and evaluation"

section "Main"
option  "input"         i   "Input answer DAT/DAB file"
                            string  typestr="filename"
option  "output"        o   "Output prediction DAT/DAB file"
                            string  typestr="filename"
option  "model"         m   "SVM model file or directory"
                            string  typestr="filename/directory"

section "Feature Mode"
option  "pcl"           p   "PCL input mode"
                            flag    on
option  "binary"        b   "Input binary training file"
                            string  typestr="filename"
option  "genewise"      w   "Learn per-gene SVMs for pairwise predictions"
                            flag    off
option  "genel"         l   "Gene skip file for per-gene SVMs"
                            string  typestr="filename"

section "Learning/Evaluation"
option  "genes"         g   "Gene inclusion file"
                            string  typestr="filename"
option  "genex"         G   "Gene exclusion file"
                            string  typestr="filename"
option  "genet"         c   "Term inclusion file"
                            string  typestr="filename"

section "SVM"
option  "kernel"        k   "SVM kernel function"
                            values="linear","poly","rbf"    default="linear"
option  "cache"         e   "SVM cache size"
                            int default="40"
option  "tradeoff"      C   "Classification tradeoff"
                            float
option  "gamma"         M   "RBF gamma"
                            float   default="1"
option  "degree"        d   "Polynomial degree"
                            int default="3"
option  "alphas"        a   "SVM alphas file"
                            string  typestr="filename"
option  "iterations"    t   "SVM iterations"
                            int default="100000"

section "Optional"
option  "skip"          s   "Columns to skip in input PCLs"
                            int default="2"
option  "random"        r   "Seed random generator"
                            int default="0"
option  "verbosity"     v   "Message verbosity"
                            int default="5"
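
The kernel, gamma, and degree options in the SVM section select among the usual SVM kernel families. As a rough guide, the textbook forms of these kernels are sketched below in Python. The exact scaling and offset terms used inside SVMer are not stated here, so the coef0 parameter of the polynomial kernel is an assumption.

```python
import math

def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def linear_kernel(a, b):
    """Linear kernel: plain inner product."""
    return dot(a, b)

def poly_kernel(a, b, degree=3, coef0=1.0):
    """Polynomial kernel; coef0 is an assumed offset, degree mirrors -d."""
    return (dot(a, b) + coef0) ** degree

def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel; gamma mirrors -M."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)

a, b = [1.0, 2.0], [3.0, 4.0]
k_linear = linear_kernel(a, b)
k_poly = poly_kernel(a, b)
k_rbf = rbf_kernel(a, b)
```

A larger gamma makes the RBF kernel decay faster with distance, and a higher degree makes the polynomial kernel more sensitive to large inner products.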

Flag | Default | Type | Description
None | None | PCL or DAT/DAB files | Input data files from which features are constructed, either PCLs from which expression vectors are concatenated or DAT/DABs from which pairwise values are read.
-i | stdin | DAT/DAB file | If given, functional gold standard for learning. Should consist of gene pairs with scores of 0 (unrelated), 1 (related), or missing (NaN). If not given, evaluation is assumed and the SVM model(s) are read from -m.
-o | stdout | DAT/DAB file | Output predictions from the SVM model(s) for each available gene pair.
-m | None | SVM model file or directory | In standard mode, output learned SVM model file (if -i is given) or input SVM model file to be evaluated (if it is not). In genewise mode, directory containing output learned or input evaluated SVM model files.
-p | on | Flag | If on, assume input files are PCLs from which features are constructed by concatenation of expression vectors. If off, assume input files are DAT/DABs from which one feature is drawn per dataset for each gene pair example.
-b | None | Binary feature file | If given, ignore other inputs and assume the given binary file is to be used for model evaluation (if -o is specified) or learning (if it is not).
-w | off | Flag | If on, learn/evaluate one SVM model per gene, using only the gene pairs including that gene (and thus each example represents one other gene). If off, learn/evaluate one global SVM model in which each example represents a gene pair.
-l | None | Gene text file | If given, in genewise mode, learn/evaluate models only for genes in the given gene set.
-g | None | Text gene list | If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes.
-G | None | Text gene list | If given, use only gene pairs for which neither gene is in the list. For details, see Sleipnir::CDat::FilterGenes.
-c | None | Text gene list | If given, use only gene pairs passing a "term" filter against the list. For details, see Sleipnir::CDat::FilterGenes.
-k | linear | linear, poly, or rbf | SVM kernel type: linear, polynomial, or radial basis function.
-e | 40 | Integer (MB) | SVM cache size in megabytes.
-C | None | Float | SVM tradeoff between misclassification and margin; an appropriate default is calculated if no value is given.
-M | 1 | Float | Gamma parameter for RBF kernel.
-d | 3 | Integer | Degree parameter for polynomial kernel.
-a | None | Alphas file | If given, SVM Light alphas file used to initialize the SVM model.
-t | 100000 | Integer | Maximum number of iterations to run per SVM learning epoch.
-s | 2 | Integer | Number of columns to skip in any PCL data files between the initial ID column and the experimental data columns. Must be the same number for all PCL files.
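
The gene filters (-g, -G) and genewise mode (-w) described in the table can be pictured with a small Python sketch. It is an illustration of the stated rules only, not the behavior of Sleipnir::CDat::FilterGenes itself; the function names and gene identifiers are hypothetical.

```python
def keep_pair(pair, include=None, exclude=None):
    """Apply the -g rule (both genes in `include`) and the
    -G rule (neither gene in `exclude`) to one gene pair."""
    a, b = pair
    if include is not None and not (a in include and b in include):
        return False
    if exclude is not None and (a in exclude or b in exclude):
        return False
    return True

def genewise_examples(pairs):
    """Genewise mode: for each gene, collect its partner genes, so that
    each example for that gene's model represents one other gene."""
    by_gene = {}
    for a, b in pairs:
        by_gene.setdefault(a, []).append(b)
        by_gene.setdefault(b, []).append(a)
    return by_gene

# Hypothetical gene pairs.
pairs = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4")]

included = [p for p in pairs if keep_pair(p, include={"G1", "G2", "G3"})]
excluded = [p for p in pairs if keep_pair(p, exclude={"G3"})]
per_gene = genewise_examples(pairs)
```

Here the inclusion list drops the pair containing G4, the exclusion list leaves only the pair that does not involve G3, and per_gene maps each gene to the partners that would form its examples in genewise mode.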