Sleipnir: SVMer

SVMer learns and evaluates support vector machine (SVM) models from PCL or DAT/DAB datasets in several ways. If given PCL inputs, SVMer will construct one example per gene pair by concatenating the two genes' expression vectors to create features. If given DAT/DAB inputs, SVMer will construct one example per gene pair using each dataset as a feature. In genewise mode, SVMer will learn one model per gene, with one example constructed for each pair in which that gene participates.
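The three ways of building examples described above can be illustrated with a minimal Python sketch. This is not SVMer's implementation: the function names, the in-memory dictionaries standing in for PCL and DAT/DAB files, the concatenation order, and the use of `None` for missing values are all assumptions made for illustration.

```python
from itertools import combinations

def pcl_examples(expr):
    """PCL mode: one example per gene pair, built by concatenating
    the two genes' expression vectors (order is an assumption)."""
    return {(a, b): expr[a] + expr[b] for a, b in combinations(sorted(expr), 2)}

def dab_examples(datasets, genes):
    """DAT/DAB mode: one example per gene pair, with one feature per
    dataset. Missing pairs are represented as None (an assumption)."""
    return {
        (a, b): [d.get((a, b)) for d in datasets]
        for a, b in combinations(sorted(genes), 2)
    }

def genewise_examples(examples, gene):
    """Genewise mode: the examples for one gene's model are the pairs
    in which that gene participates."""
    return {pair: feats for pair, feats in examples.items() if gene in pair}

# Toy data standing in for one PCL file and two DAT/DAB files.
expr = {"G1": [0.1, 0.2], "G2": [1.0, 1.1], "G3": [-0.5, 0.0]}
datasets = [
    {("G1", "G2"): 0.9, ("G1", "G3"): 0.1},
    {("G1", "G2"): 0.4, ("G2", "G3"): 0.7},
]

pcl = pcl_examples(expr)
dab = dab_examples(datasets, expr)
g1 = genewise_examples(dab, "G1")
```

In the PCL case the feature count is twice the number of conditions, whereas in the DAT/DAB case it equals the number of datasets.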

Usage

Basic Usage

 SVMer -i <answers.dab> -m <learned.svm> <data.pcl/dab>*

Learn an SVM model learned.svm for gene pairs using labels from answers.dab and data from data.pcl (for features built from PCL conditions) or data.dab (for feature values drawn from DAT/DAB files).

 SVMer -m <learned.svm> -o <predictions.dab> <data.pcl/dab>*

Using the SVM model in learned.svm, predict labels for gene pairs using data from data.pcl or data.dab and store the resulting predicted functional interaction network in predictions.dab.
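For scripted pipelines, the two invocations above can be assembled as argument lists. The sketch below uses only the `-i`, `-m`, and `-o` flags shown in the usage lines; the helper names and file names are hypothetical placeholders, and actually running the commands assumes an `SVMer` binary is available on the PATH.

```python
def svmer_learn_cmd(answers, model, data_files):
    """Argument list for learning: SVMer -i <answers> -m <model> <data>*"""
    return ["SVMer", "-i", answers, "-m", model, *data_files]

def svmer_predict_cmd(model, predictions, data_files):
    """Argument list for prediction: SVMer -m <model> -o <predictions> <data>*"""
    return ["SVMer", "-m", model, "-o", predictions, *data_files]

data = ["data1.pcl", "data2.pcl"]  # placeholder file names
learn = svmer_learn_cmd("answers.dab", "learned.svm", data)
predict = svmer_predict_cmd("learned.svm", "predictions.dab", data)

# To execute (requires SVMer to be installed):
#   import subprocess
#   subprocess.run(learn, check=True)
#   subprocess.run(predict, check=True)
```

The same data files are passed to both steps so that the features seen at prediction time match those used for learning.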

Detailed Usage

package "SVMer"
version "1.0"
purpose "SVM training and evaluation"

section "Main"
option  "input"         i   "Input answer DAT/DAB file"
                            string  typestr="filename"
option  "output"        o   "Output prediction DAT/DAB file"
                            string  typestr="filename"
option  "model"         m   "SVM model file or directory"
                            string  typestr="filename/directory"

section "Feature Mode"
option  "pcl"           p   "PCL input mode"
                            flag    on
option  "binary"        b   "Input binary training file"
                            string  typestr="filename"
option  "genewise"      w   "Learn per-gene SVMs for pairwise predictions"
                            flag    off
option  "genel"         l   "Gene skip file for per-gene SVMs"
                            string  typestr="filename"

section "Learning/Evaluation"
option  "genes"         g   "Gene inclusion file"
                            string  typestr="filename"
option  "genex"         G   "Gene exclusion file"
                            string  typestr="filename"
option  "genet"         c   "Term inclusion file"
                            string  typestr="filename"

section "SVM"
option  "kernel"        k   "SVM kernel function"
                            values="linear","poly","rbf"    default="linear"
option  "cache"         e   "SVM cache size"
                            int default="40"
option  "tradeoff"      C   "Classification tradeoff"
                            float
option  "gamma"         M   "RBF gamma"
                            float   default="1"
option  "degree"        d   "Polynomial degree"
                            int default="3"
option  "alphas"        a   "SVM alphas file"
                            string  typestr="filename"
option  "iterations"    t   "SVM iterations"
                            int default="100000"

section "Optional"
option  "skip"          s   "Columns to skip in input PCLs"
                            int default="2"
option  "random"        r   "Seed random generator"
                            int default="0"
option  "verbosity"     v   "Message verbosity"
                            int default="5"
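
The `kernel`, `gamma`, and `degree` options in the "SVM" section select among the usual SVM kernel functions. As a rough guide, the textbook forms of these kernels can be sketched in Python as follows. The exact parameterization used by the underlying SVM library may differ (for example, in scaling or offset terms), and the `coef0` argument here is an assumption that is not exposed as an SVMer option.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: the dot product of the two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree.
    coef0 is an illustrative assumption, not an SVMer option."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

The defaults in the sketch mirror the option defaults listed above (degree 3, gamma 1).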

Flag	Default	Type	Description
None	None	PCL or DAT/DAB files	Input data files from which features are constructed, either PCLs from which expression vectors are concatenated or DAT/DABs from which pairwise values are read.
-i	stdin	DAT/DAB file	If given, functional gold standard used for learning. Should consist of gene pairs with scores of 0 (unrelated), 1 (related), or missing (NaN). If not given, evaluation mode is assumed and the SVM model(s) are read from `-m`.
-o	stdout	DAT/DAB file	Output predictions from the SVM model(s) for each available gene pair.
-m	None	SVM model file or directory	In standard mode, output learned SVM model file (if `-i` is given) or input SVM model file to be evaluated (if it is not). In genewise mode, directory containing the output learned or input evaluated SVM model files.
-p	on	Flag	If on, assume input files are PCLs from which features are constructed by concatenation of expression vectors. If off, assume input files are DAT/DABs from which one feature is drawn per dataset for each gene pair example.
-b	None	Binary feature file	If given, ignore other inputs and assume the given binary file is to be used for model evaluation (if `-o` is specified) or learning (if it is not).
-w	off	Flag	If on, learn/evaluate one SVM model per gene, using only the gene pairs including that gene (and thus each example represents one other gene). If off, learn/evaluate one global SVM model in which each example represents a gene pair.
-l	None	Gene text file	If given, in genewise mode, learn/evaluate models only for genes in the given gene set.
-g	None	Text gene list	If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes.
-G	None	Text gene list	If given, use only gene pairs for which neither gene is in the list. For details, see Sleipnir::CDat::FilterGenes.
-c	None	Text gene list	If given, use only gene pairs passing a "term" filter against the list. For details, see Sleipnir::CDat::FilterGenes.
-k	linear	linear, poly, or rbf	SVM kernel type: linear, polynomial, or radial basis function.
-e	40	Integer (MB)	SVM cache size in megabytes.
-C	None	Float	SVM tradeoff between misclassification and margin; an appropriate default is calculated if no value is given.
-M	1	Float	Gamma parameter for RBF kernel.
-d	3	Integer	Degree parameter for polynomial kernel.
-a	None	Alphas file	If given, SVM Light alphas file used to initialize the SVM model.
-t	100000	Integer	Maximum number of iterations to run per SVM learning epoch.
-s	2	Integer	Number of columns to skip in any PCL data files between the initial ID column and the experimental data columns. Must be the same number for all PCL files.
-r	0	Integer	Seed for the random number generator.
-v	5	Integer	Message verbosity level.
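
The inclusion (`-g`) and exclusion (`-G`) filters in the table can be illustrated with a short Python sketch over a list of gene pairs. The function names and toy data are hypothetical; the authoritative behavior, including the "term" filter used by `-c`, is defined by Sleipnir::CDat::FilterGenes and is not reproduced here.

```python
def filter_include(pairs, genes):
    """Keep only pairs for which both genes are in the list (as for -g)."""
    genes = set(genes)
    return [(a, b) for a, b in pairs if a in genes and b in genes]

def filter_exclude(pairs, genes):
    """Keep only pairs for which neither gene is in the list (as for -G)."""
    genes = set(genes)
    return [(a, b) for a, b in pairs if a not in genes and b not in genes]

pairs = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4")]
included = filter_include(pairs, ["G1", "G2", "G3"])
excluded = filter_exclude(pairs, ["G1"])
```

Here `included` drops the pair containing G4, and `excluded` drops every pair containing G1.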