Sleipnir
SVMperfer

SVMperfer performs SVM learning using the SVMperf library. It supports cross-validation and can read binary PCL files created by PCL2Bin. SVMperfer has been used in network inference and gene prediction studies.

NOTE: Delimiters are tabs in all the following formats -- doxygen converts them to spaces automatically.

Usage

Basic Usage

 SVMperfer -l <labels_file> -p <params_file> -i <data.bin> -o <output_directory> -a
 SVMperfer -l <labels_file> -c 5 -t 50 -i <PCL/Dat file> -o <output_file> 

The labels file and the test labels file are each assumed to contain, on every line, an example name (i.e. a row name of the input file) and its known label (-1 for negative examples and 1 for positive examples), separated by a tab. In the following example, the examples are genes.

 ACTA2  -1
 ACTN4  1
 ADAM10 -1
 AGRN   1
 AGTR1  -1
 ALDOB  -1
 ALOX12 1
 ANGPT2 1
 APOA4  1
 AQP1   1
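
The labels format above can be checked before a run with a short script. The following is a minimal Python sketch, not part of Sleipnir; the function name `read_labels` is hypothetical, and it assumes one tab-delimited `name<TAB>label` pair per line as described above.

```python
import io


def read_labels(handle):
    """Parse tab-delimited `name<TAB>label` lines into a dict (hypothetical helper)."""
    labels = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields")
        name, label = fields[0].strip(), int(fields[1].strip())
        if label not in (-1, 1):
            raise ValueError(f"line {line_no}: label must be -1 or 1, got {label}")
        labels[name] = label
    return labels


# First three rows of the example above, with real tab delimiters.
example = "ACTA2\t-1\nACTN4\t1\nADAM10\t-1\n"
labels = read_labels(io.StringIO(example))
```

Rejecting any label other than -1 or 1 is a choice made for this sketch; it mirrors the format description rather than any validation SVMperfer itself is known to perform.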

Output is of the format

 IGHV1-69  0   1.94073
 DAG1      1   1.9401
 FNDC3B    0   1.93543
 HPGD      -1  1.93181
 TPSAB1    0   1.92928
 CLIC5     1   1.92759

where the first column is the example name, the second column is the known label (given in the label file) and the third column is the SVM prediction (soft value). Unlabelled examples are given a label of 0. Examples are sorted by their predicted SVM output soft value.
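
As an illustration of how the three output columns can be used, here is a minimal Python sketch (not part of Sleipnir) that reads predictions in this format and computes a pairwise AUC over the labelled examples only. The function names `read_predictions` and `pairwise_auc` are hypothetical, and the file is assumed to be tab-delimited.

```python
import io


def read_predictions(handle):
    """Return (name, known_label, score) tuples from tab-delimited output lines."""
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        name, label, score = line.split("\t")
        rows.append((name, int(label), float(score)))
    return rows


def pairwise_auc(rows):
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for _, y, s in rows if y == 1]
    neg = [s for _, y, s in rows if y == -1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# The six example rows above; unlabelled rows (label 0) are ignored by the AUC.
sample = (
    "IGHV1-69\t0\t1.94073\n"
    "DAG1\t1\t1.9401\n"
    "FNDC3B\t0\t1.93543\n"
    "HPGD\t-1\t1.93181\n"
    "TPSAB1\t0\t1.92928\n"
    "CLIC5\t1\t1.92759\n"
)
rows = read_predictions(io.StringIO(sample))
auc = pairwise_auc(rows)
```

On these six rows, two positives (DAG1, CLIC5) are compared with one negative (HPGD); only DAG1 scores higher than HPGD, so `auc` is 0.5.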

The params_file is of the format

 10  0.1       0.5
 10  0.01      0.5
 10  0.001     0.5
 10  0.0001    0.5
 10  0.00001   0.5
 10  0.000001  0.5

where the first column represents the error function, the second column represents the tradeoff constant, and the third column represents k_value (used for precision at k and recall at k, but unused for the AUC error function in the example above).
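
A parameter grid like the one above can be generated rather than typed by hand. The following Python sketch is an illustration only; the helper name `params_lines` is hypothetical, and it assumes the three tab-delimited columns described above (error function, tradeoff constant, k_value).

```python
from decimal import Decimal


def params_lines(error_function, tradeoffs, k_value):
    """Build one tab-delimited params line per tradeoff value (hypothetical helper)."""
    lines = []
    for c in tradeoffs:
        # Decimal with the "f" format avoids scientific notation such as 1e-05.
        fields = [
            str(error_function),
            format(Decimal(c), "f"),
            format(Decimal(k_value), "f"),
        ]
        lines.append("\t".join(fields))
    return lines


# Tradeoff constants 0.1 down to 0.000001, as in the example above.
grid = [f"1e-{n}" for n in range(1, 7)]
lines = params_lines(10, grid, "0.5")
params_text = "\n".join(lines) + "\n"
```

Writing `params_text` to a file and passing that file with `-p` would correspond to the first basic usage line; the file name is left to the user.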

Detailed Usage

package "SVMperfer"
version "1.0"
purpose "Wrapper for SVM perf"

section "Main"
option  "labels"                l   "Labels file"
                                        string  typestr="filename"  no
option  "output"                o   "Output file "
                                        string  typestr="filename"  no
option  "input"                 i   "Input PCL file "
                                        string  typestr="filename"  yes
option  "model"                 m   "Model file"
                                        string  typestr="filename"  no
option  "test_labels"           T   "Test Labels file"
                                        string  typestr="filename"  no
option  "all"                   a   "Always classify all genes in PCLs"  
                                        flag off

option  "slack"                 S   "Use slack rescaling (not implemented for ROC loss)"
                                        flag off

section "Options"
option "verbosity"              v   "Sets the svm_struct verbosity"
                                        int default="0" no
option "skip"                   s   "Number of columns to skip in input pcls"
                                        int default="2" no
option  "normalize"             n   "Normalize PCLS to 0 mean 1 variance"
                                        flag    off
option  "cross_validation"      c   "Number of cross-validation sets ( arg of 1 will turn off cross-validation )"
                                        int default="5" no
option "error_function"         e   "Sets the loss function for SVM learning: Choice of:
0\tZero/one loss: 1 if vector of predictions contains error, 0 otherwise.
1\tF1: 100 minus the F1-score in percent.
2\tErrorrate: Percentage of errors in prediction vector.
3\tPrec/Rec Breakeven: 100 minus PRBEP in percent.
4\tPrec@k: 100 minus precision at k in percent.
5\tRec@k: 100 minus recall at k in percent.
10\tROCArea: Percentage of swapped pos/neg pairs (i.e. 100 - ROCArea).\n" 
                                        int default="10" no

option "k_value"                k   "Value of k parameter used for Prec@k and Rec@k in (0,1)"
                                        float default="0.5" no
option "tradeoff"               t   "SVM tradeoff constant C"
                                        float default="1" no
option "simple_model"           A   "Write model files with only linear weights"
                                        flag    on
option "params"                 p   "Parameter file"
                                        string  typestr="filename"   no
option  "mmap"                  M   "Memory map binary input"
                                        flag    off

 Flag  Default  Type          Description
 -i    None     PCL/BIN file  Input PCL file
 -o    None     Directory     Output directory.
 -l    None     Labels file   The file with examples formatted as noted above.
 -m    None     Model file    If present, output the learned model to this file.
 -a    off      Flag          If on output predictions for all genes in the PCL.
 -S    off      Flag          If on, use slack rescaling.
 -s    2        int           Number of columns to skip from PCL file.
 -n    off      Flag          Normalize PCL to 0 mean, 1 variance.
 -c    5        int           Number of cross validation intervals.
 -e    10       int           Which loss function should be used? (options: 0, 1, 2, 3, 4, 5, 10).
 -k    0.5      float         value of k for precision or recall.
 -t    1        float         SVM tradeoff constant C (note that this differs from the version in SVM light by a constant factor, check SVMPerf docs for details).
 -p    None     Filename      Parameters file (to test with multiple parameters).
 -M    off      Flag          Memory map binary input PCLs (BIN files).
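
When SVMperfer is driven from a script, the flags listed above can be assembled into an argument list. The Python sketch below only builds the list and does not run anything; the helper name `build_command` and the file names are hypothetical placeholders, and only flags documented above are used.

```python
VALID_ERROR_FUNCTIONS = (0, 1, 2, 3, 4, 5, 10)


def build_command(labels, data, output, cross_validation=5, tradeoff=1,
                  error_function=10, classify_all=False):
    """Assemble an SVMperfer argument list from documented flags (hypothetical helper)."""
    if error_function not in VALID_ERROR_FUNCTIONS:
        raise ValueError(f"unsupported error function: {error_function}")
    argv = [
        "SVMperfer",
        "-l", labels,
        "-i", data,
        "-o", output,
        "-c", str(cross_validation),
        "-t", str(tradeoff),
        "-e", str(error_function),
    ]
    if classify_all:
        argv.append("-a")  # also output predictions for unlabelled genes
    return argv


# Mirrors the second basic usage line: 5-fold cross-validation with C = 50.
argv = build_command("labels.txt", "data.pcl", "predictions.txt", tradeoff=50)
```

The resulting list could be passed to `subprocess.run(argv, check=True)`, assuming the SVMperfer binary is on the `PATH`.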