Sleipnir
Data2Svm learns a support vector machine (SVM) model that classifies individual genes as inside or outside (positive or negative) a given gene set. It constructs features for each example (gene) from the data in an input PCL file. It is similar to SVMer.
```
Data2Svm -i <data.pcl> -m <learned.svm> -g <context.txt> -G <holdout.txt>
```
Learn a support vector machine model (saved as `learned.svm`) using the microarray expression values in `data.pcl` as features, labeling the genes in `context.txt` as positive examples (and all other genes as negatives), and holding out the genes in `holdout.txt` from training. Outputs (to standard output) the predicted SVM classifications for all genes after learning.
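The labeling scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Data2Svm's actual code: the function name `assign_labels` is hypothetical, and the `+1`/`-1` label encoding is an assumption borrowed from common SVM conventions.

```python
def assign_labels(all_genes, positives, holdout=()):
    """Split genes into training and held-out sets with +1/-1 labels.

    Hypothetical helper for illustration; the +1/-1 encoding is an assumption.
    """
    positives = set(positives)
    holdout = set(holdout)
    train, test = {}, {}
    for gene in all_genes:
        # Genes in the positive list are positives; every other gene is a negative.
        label = 1 if gene in positives else -1
        # Held-out genes are excluded from training but keep their label
        # so that predictions on them can be evaluated afterwards.
        if gene in holdout:
            test[gene] = label
        else:
            train[gene] = label
    return train, test


train, test = assign_labels(["A", "B", "C", "D"], positives=["A", "C"], holdout=["C", "D"])
```

Here `train` holds the labels for genes `A` and `B`, while `C` and `D` are reserved for testing.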
```
package "Data2Svm"
version "1.0"
purpose "SVM evaluation of data for GO term prediction"

section "Main"
option "input" i "Data set to analyze (PCL)"
    string typestr="filename" yes
option "model" m "SVM model file"
    string typestr="filename"

section "Learning/Evaluation"
option "genes" g "List of positive genes"
    string typestr="filename"
option "genex" G "List of test genes"
    string typestr="filename"
option "heldout" l "Evaluate only test genes"
    flag off
option "random_features" z "Randomize input features"
    flag off
option "random_output" Z "Randomize output values"
    flag off

section "SVM"
option "cache" e "SVM cache size"
    int default="40"
option "kernel" k "SVM kernel function"
    values="linear","poly","rbf" default="linear"
option "tradeoff" C "Classification tradeoff"
    float
option "gamma" M "RBF gamma"
    float default="1"
option "degree" d "Polynomial degree"
    int default="3"
option "alphas" a "SVM alphas file"
    string typestr="filename"
option "iterations" t "SVM iterations"
    int default="100000"

section "Optional"
option "normalize" n "Z-score normalize feature values"
    flag off
option "skip" s "Columns to skip in input PCL"
    int default="2"
option "random" r "Seed random generator"
    int default="0"
option "verbosity" v "Message verbosity"
    int default="5"
```
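To make the `kernel`, `gamma`, and `degree` options more concrete, the following Python sketch shows the textbook forms of the three kernel functions. It is not taken from Data2Svm or its SVM library: the function names and the `coef0` parameter are hypothetical, and the exact scaling and offset terms used internally may differ.

```python
import math


def linear_kernel(x, y):
    # Plain dot product between two feature vectors.
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, degree=3, coef0=1.0):
    # Textbook polynomial kernel; coef0 is an assumed offset, not a Data2Svm option.
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # Radial basis function: decays with the squared Euclidean distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

In this sketch, the `degree` and `gamma` defaults mirror the defaults of 3 and 1 listed in the option definitions.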
Flag | Default | Type | Description |
---|---|---|---|
-i | stdin | PCL text file | Input PCL file from which features will be drawn to construct SVM examples. |
-m | stdout | SVM model file | Output learned SVM model. |
-g | None | Gene text file | Set of genes to be labeled as positive examples. |
-G | None | Gene text file | If given, set of genes to be held out of training and evaluated as test examples. |
-l | off | Flag | If on, evaluate and output SVM predictions only for test genes; if off, evaluate all genes. |
-z | off | Flag | If on, randomize input feature values within each row (gene). |
-Z | off | Flag | If on, randomize output SVM prediction labels across all genes. |
-e | 40 | Integer (MB) | SVM cache size in megabytes. |
-k | linear | linear, poly, or rbf | SVM kernel type: linear, polynomial, or radial basis function. |
-C | None | Float | SVM tradeoff between misclassification and margin; an appropriate default is calculated if no value is given. |
-M | 1 | Float | Gamma parameter for RBF kernel. |
-d | 3 | Integer | Degree parameter for polynomial kernel. |
-a | None | Alphas file | If given, SVM Light alphas file used to initialize the SVM model. |
-t | 100000 | Integer | Maximum number of iterations to run per SVM learning epoch. |
-n | off | Flag | If on, normalize input feature values to z-scores (subtract mean, divide by standard deviation) before processing. |
-s | 2 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |
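
The `-s` and `-n` options can be illustrated with a small Python sketch that reads a PCL file and z-scores a vector of values. This is a simplified illustration under stated assumptions, not Data2Svm's implementation: `parse_pcl` and `zscore` are hypothetical names, the input is assumed to be tab-delimited with one header row and an optional `EWEIGHT` row, empty cells are treated as missing values, and the population standard deviation is used. The documentation does not say whether normalization is applied per gene, per experiment, or over the whole matrix, so `zscore` simply operates on whatever list it is given.

```python
import math


def parse_pcl(text, skip=2):
    """Parse PCL text into (experiment names, {gene ID: feature values}).

    `skip` is the number of columns between the ID column and the first data column.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    experiments = header[1 + skip:]
    features = {}
    for ln in lines[1:]:
        cells = ln.split("\t")
        if cells[0] == "EWEIGHT":
            continue  # Experiment-weight row carries no gene data.
        # Empty cells are treated as missing values (an assumption).
        features[cells[0]] = [
            float(c) if c.strip() else math.nan for c in cells[1 + skip:]
        ]
    return experiments, features


def zscore(values):
    """Subtract the mean and divide by the standard deviation, ignoring NaNs."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return list(values)
    mean = sum(present) / len(present)
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / len(present))
    if std == 0:
        # Constant input: map present values to 0 and keep missing ones missing.
        return [v if math.isnan(v) else 0.0 for v in values]
    return [(v - mean) / std for v in values]
```

With the default `skip=2`, the `NAME` and `GWEIGHT` columns of a typical PCL file are ignored and every remaining column is used as a feature.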