Sleipnir
Data2Svm

Data2Svm learns a support vector machine (SVM) model that classifies individual genes as members (positive) or non-members (negative) of a given gene set. It constructs the features for each example (gene) from the data in an input PCL file. It is similar to SVMer.
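The labeling scheme described above can be illustrated with a minimal Python sketch. It is not Data2Svm's implementation: the function name label_examples, the +1/-1 label encoding, and the toy gene identifiers and values are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

def label_examples(
    features: Dict[str, List[float]],
    gene_set: Set[str],
) -> List[Tuple[str, int, List[float]]]:
    """Pair each gene's feature vector with a label:
    +1 if the gene is in the gene set, -1 otherwise (assumed encoding)."""
    examples = []
    for gene, values in features.items():
        label = 1 if gene in gene_set else -1
        examples.append((gene, label, values))
    return examples

# Toy expression values; one row of features per gene (hypothetical numbers).
features = {
    "YAL001C": [0.5, -1.2, 0.3],
    "YAL002W": [1.1, 0.4, -0.7],
    "YAL003W": [-0.2, 0.0, 0.9],
}
examples = label_examples(features, {"YAL001C", "YAL003W"})
for gene, label, values in examples:
    print(gene, label, values)
```

Each gene becomes one example whose features are its row of values and whose label depends only on gene set membership.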

Usage

Basic Usage

 Data2Svm -i <data.pcl> -m <learned.svm> -g <context.txt> -G <holdout.txt>

Learns a support vector machine model (saved as learned.svm) using the microarray expression values in data.pcl as features. Genes in context.txt are labeled as positive examples and all other genes as negatives; genes in holdout.txt are held out of training. After learning, the predicted SVM classifications for all genes are written to standard output.
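When Data2Svm is driven from a script, the basic invocation can be assembled programmatically. The Python sketch below is one way to do that, assuming the Data2Svm binary is on the PATH; the helper names data2svm_command and run_data2svm are hypothetical, and only flags documented on this page (-i, -m, -g, -G, -l) are used.

```python
import subprocess
from typing import List, Optional

def data2svm_command(
    pcl: str,
    model: str,
    positives: str,
    holdout: Optional[str] = None,
    heldout_only: bool = False,
) -> List[str]:
    """Build the argument list for a basic Data2Svm run."""
    cmd = ["Data2Svm", "-i", pcl, "-m", model, "-g", positives]
    if holdout is not None:
        cmd += ["-G", holdout]  # genes held out of training
    if heldout_only:
        cmd.append("-l")        # report predictions for test genes only
    return cmd

def run_data2svm(cmd: List[str], output_path: str) -> None:
    """Run the command and capture the predictions written to standard output."""
    with open(output_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)

cmd = data2svm_command("data.pcl", "learned.svm", "context.txt", "holdout.txt")
print(" ".join(cmd))
# prints: Data2Svm -i data.pcl -m learned.svm -g context.txt -G holdout.txt
```

Calling run_data2svm(cmd, "predictions.txt") would then execute the tool and save its standard output; the output file name here is only an example.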

Detailed Usage

package "Data2Svm"
version "1.0"
purpose "SVM evaluation of data for GO term prediction"

section "Main"
option  "input"             i   "Data set to analyze (PCL)"
                                string  typestr="filename"  yes
option  "model"             m   "SVM model file"
                                string  typestr="filename"

section "Learning/Evaluation"
option  "genes"             g   "List of positive genes"
                                string  typestr="filename"
option  "genex"             G   "List of test genes"
                                string  typestr="filename"
option  "heldout"           l   "Evaluate only test genes"
                                flag    off
option  "random_features"   z   "Randomize input features"
                                flag    off
option  "random_output"     Z   "Randomize output values"
                                flag    off

section "SVM"
option  "cache"             e   "SVM cache size"
                                int default="40"
option  "kernel"            k   "SVM kernel function"
                                values="linear","poly","rbf"    default="linear"
option  "tradeoff"          C   "Classification tradeoff"
                                float
option  "gamma"             M   "RBF gamma"
                                float   default="1"
option  "degree"            d   "Polynomial degree"
                                int default="3"
option  "alphas"            a   "SVM alphas file"
                                string  typestr="filename"
option  "iterations"        t   "SVM iterations"
                                int default="100000"

section "Optional"
option  "normalize"         n   "Z-score normalize feature values"
                                flag    off
option  "skip"              s   "Columns to skip in input PCL"
                                int default="2"
option  "random"            r   "Seed random generator"
                                int default="0"
option  "verbosity"         v   "Message verbosity"
                                int default="5"
Flag  Default  Type                  Description
-i    stdin    PCL text file         Input PCL file from which features will be drawn to construct SVM examples.
-m    stdout   SVM model file        Output learned SVM model.
-g    None     Gene text file        Set of genes to be labeled as positive examples.
-G    None     Gene text file        If given, set of genes to be held out of training and evaluated as test examples.
-l    off      Flag                  If on, evaluate and output SVM predictions only for test genes; if off, evaluate all genes.
-z    off      Flag                  If on, randomize input feature values within each row (gene).
-Z    off      Flag                  If on, randomize output SVM prediction labels across all genes.
-e    40       Integer (MB)          SVM cache size in megabytes.
-k    linear   linear, poly, or rbf  SVM kernel type: linear, polynomial, or radial basis function.
-C    None     Float                 SVM tradeoff between misclassification and margin; an appropriate default is calculated if no value is given.
-M    1        Float                 Gamma parameter for RBF kernel.
-d    3        Integer               Degree parameter for polynomial kernel.
-a    None     Alphas file           If given, SVM Light alphas file used to initialize the SVM model.
-t    100000   Integer               Maximum number of iterations to run per SVM learning epoch.
-n    off      Flag                  If on, normalize input feature values to z-scores (subtract mean, divide by standard deviation) before processing.
-s    2        Integer               Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL.
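The effect of -s and -n on the input can be pictured with a short Python sketch. It is an illustration under stated assumptions, not Data2Svm's code: the input is assumed to be tab-delimited with one header row, an optional EWEIGHT row, and no missing values; the z-score is computed per gene row with the population standard deviation, which may differ from the scope the tool actually uses. The names parse_pcl and zscore and the toy data are hypothetical.

```python
import statistics
from typing import Dict, List

def parse_pcl(lines: List[str], skip: int = 2) -> Dict[str, List[float]]:
    """Read gene rows, dropping the ID column plus `skip` annotation columns."""
    features: Dict[str, List[float]] = {}
    for row in lines[1:]:  # the first line is the column header
        cells = row.rstrip("\n").split("\t")
        if cells[0] in ("", "EWEIGHT"):
            continue  # skip blank lines and the experiment-weight row
        features[cells[0]] = [float(c) for c in cells[1 + skip:]]
    return features

def zscore(values: List[float]) -> List[float]:
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant row: avoid dividing by zero
    return [(v - mean) / sd for v in values]

# Toy PCL content with the default two skipped columns (NAME and GWEIGHT).
pcl_lines = [
    "ORF\tNAME\tGWEIGHT\tcond1\tcond2\tcond3",
    "EWEIGHT\t\t\t1\t1\t1",
    "YAL001C\tTFC3\t1\t1.0\t2.0\t3.0",
    "YAL002W\tVPS8\t1\t0.5\t0.5\t0.5",
]
features = parse_pcl(pcl_lines, skip=2)
normalized = {gene: zscore(values) for gene, values in features.items()}
print(normalized)
```

With skip=2 the NAME and GWEIGHT columns are ignored, so only the three experimental columns become features.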