Sleipnir
LibSVMer

LibSVMer performs SVM learning using the LibSVM library. It supports cross validation and reading from binary PCL files created by PCL2Bin.

Usage

Basic Usage

 LibSVMer -l <labels_file> -p <params_file> -i <data.bin> -o <output_directory> -a

The labels file has the following format. Note that in all of the formats below the delimiters are tabs; Doxygen converts them to spaces automatically when rendering this page.

 ACTA2  -1
 ACTN4  1
 ADAM10 -1
 AGRN   1
 AGTR1  -1
 ALDOB  -1
 ALOX12 1
 ANGPT2 1
 APOA4  1
 AQP1   1

where -1 indicates a negative example and 1 indicates a positive example. The example name and its label must be separated by a tab.
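Because the tab requirement is easy to violate when editing by hand, a small script can write and check a labels file. The following is a minimal sketch, not part of Sleipnir; the helper names `write_labels` and `read_labels` are hypothetical.

```python
import io


def write_labels(labels, handle):
    """Write (name, label) pairs as tab-delimited lines; labels must be 1 or -1."""
    for name, label in labels:
        if label not in (1, -1):
            raise ValueError(f"label for {name} must be 1 or -1, got {label!r}")
        handle.write(f"{name}\t{label}\n")


def read_labels(handle):
    """Parse tab-delimited label lines into a dict mapping name -> label."""
    labels = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields")
        label = int(fields[1])
        if label not in (1, -1):
            raise ValueError(f"line {line_no}: label must be 1 or -1")
        labels[fields[0]] = label
    return labels


# Round-trip three of the examples shown above through an in-memory file.
buffer = io.StringIO()
write_labels([("ACTA2", -1), ("ACTN4", 1), ("ADAM10", -1)], buffer)
labels = read_labels(io.StringIO(buffer.getvalue()))
print(labels)
```

A space-delimited line such as `ACTA2 -1` is rejected by `read_labels` in this sketch, which catches the most common formatting mistake before the file reaches LibSVMer.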

Output is of the format

 IGHV1-69   0   1.94073
 DAG1   1   1.9401
 FNDC3B 0   1.93543
 HPGD   -1  1.93181
 TPSAB1 0   1.92928
 CLIC5  1   1.92759

where the first column is the example name, the second column is the gold standard status (matching labels) and the third column is the prediction from the SVM.
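As an illustration of how this output might be consumed, the sketch below parses the three columns and computes the AUC over the labeled examples. It assumes that a gold standard value of 0 marks an example absent from the labels file; the source does not state this, so treat it as an assumption. The function names are hypothetical.

```python
def parse_predictions(text):
    """Parse tab-delimited output lines into (name, gold, score) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, gold, score = line.split("\t")
        rows.append((name, int(gold), float(score)))
    return rows


def auc(rows):
    """AUC by pair counting over labeled rows (gold of 1 or -1).

    Rows with gold 0 are ignored (assumed to be unlabeled examples).
    """
    pos = [score for _, gold, score in rows if gold == 1]
    neg = [score for _, gold, score in rows if gold == -1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# The sample output shown above, with tab delimiters restored.
sample = (
    "IGHV1-69\t0\t1.94073\n"
    "DAG1\t1\t1.9401\n"
    "FNDC3B\t0\t1.93543\n"
    "HPGD\t-1\t1.93181\n"
    "TPSAB1\t0\t1.92928\n"
    "CLIC5\t1\t1.92759\n"
)
rows = parse_predictions(sample)
print(auc(rows))
```

Pair counting is quadratic in the number of labeled examples; it is used here only because it makes the definition of AUC explicit.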

The params_file is of the format

 10 0.1 0.5
 10 0.01    0.5
 10 0.001   0.5
 10 0.0001  0.5
 10 0.00001 0.5
 10 0.000001    0.5

where the first column is the error function, the second column is the tradeoff constant, and the third column is k_value (used for precision at k recall, but unused for the AUC error function in the example above).
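A parameters file like the one above, which sweeps the tradeoff constant over powers of ten, can be generated rather than typed. This is a minimal sketch; `params_lines` is a hypothetical helper, and `Decimal` is used only so that small values print as `0.00001` rather than in scientific notation.

```python
from decimal import Decimal


def params_lines(error_function=10, exponents=range(1, 7), k_value="0.5"):
    """Build tab-delimited parameter lines with tradeoff = 10**-exponent."""
    lines = []
    for exponent in exponents:
        # scaleb shifts the decimal exponent exactly; "f" forces fixed notation.
        tradeoff = format(Decimal(1).scaleb(-exponent), "f")
        lines.append(f"{error_function}\t{tradeoff}\t{k_value}")
    return lines


lines = params_lines()
print("\n".join(lines))
```

Writing the joined lines to a file gives a parameters file with the same six rows as the example.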

LibSVMer can also be used to output a model or learn a network, although currently those features are undocumented.

Detailed Usage

package "LibSVMer"
version "1.0"
purpose "Wrapper for LibSVM"

section "Main"
option  "labels"                l   "Labels file"
                                        string  typestr="filename"  no
option  "output"                o   "Output file "
                                        string  typestr="filename"  no
option  "input"                 i   "Input PCL file "
                                        string  typestr="filename"  yes
option  "model"                 m   "Model file"
                                        string  typestr="filename"  no
option  "all"                   a   "Always classify all genes in PCLs"  
                                        flag off

section "Options"
option  "skip"                  s   "Number of columns to skip in input pcls"
                                        int default="2" no
option  "normalize"             n   "Normalize PCLS to 0 mean 1 variance"
                                        flag    off
option  "cross_validation"      c   "Number of cross-validation sets ( arg of 1 will turn off cross-validation )"
                                        int default="5" no
option  "num_cv_runs"           r   "Number of cross-validation runs"
                                        int default="1" no
option  "svm_type"              v   "Sets type of SVM (default 0)
0\tC-SVC
1\tnu-SVC
2\tone-class SVM\n"
                                        int default="0" no
option  "balance"               b   "weight classes such that C_P * n_P = C_N * n_N"
                                        flag off
option  "tradeoff"              t   "SVM tradeoff constant C of C-SVC"
                                        float default="1" no
option  "nu"                    u   "nu parameter of nu-SVC, one-class SVM"
                                        float default="0.5" no
option  "mmap"                  M   "Memory map binary input"
                                        flag    off
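The "balance" option above states the constraint C_P * n_P = C_N * n_N, where n_P and n_N are the numbers of positive and negative examples. The sketch below shows one way to pick costs that satisfy it. It assumes the negative class keeps the base tradeoff C and only the positive cost is rescaled; the scaling LibSVMer actually uses is not described here, and `balanced_costs` is a hypothetical helper.

```python
def balanced_costs(n_pos, n_neg, c=1.0):
    """Return (c_pos, c_neg) such that c_pos * n_pos == c_neg * n_neg.

    Assumption: the negative class keeps the base cost c.
    """
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both classes need at least one example")
    c_neg = c
    c_pos = c * n_neg / n_pos
    return c_pos, c_neg


# The example labels file above has 6 positives and 4 negatives.
c_pos, c_neg = balanced_costs(n_pos=6, n_neg=4)
print(c_pos, c_neg)
```

With these costs, errors on the smaller class are penalized more heavily per example, so both classes contribute the same total weight.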
Flag  Default  Type          Description
-i    None     PCL/BIN file  Input PCL file
-o    None     Directory     Output directory.
-l    None     Labels file   The file with examples formatted as noted above.
-m    None     Model file    If present, output the learned model to this file.
-a    off      Flag          If on output predictions for all genes in the PCL.
-S    off      Flag          If on, use slack rescaling.
-s    2        int           Number of columns to skip from PCL file.
-n    off      Flag          Normalize PCL to 0 mean, 1 variance.
-c    5        int           Number of cross validation intervals.
-e    10       int           Which loss function should be used? (options: 0, 1, 2, 3, 4, 5, 10).
-k    0.5      float         value of k for precision or recall.
-t    1        float         SVM tradeoff constant C (note that this differs from the version in SVM light by a constant factor, check LibSVM docs for details).
-p    None     Filename      Parameters file (to test with multiple parameters).
-M    off      Flag          Memory map binary input PCLs (BIN files).
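When LibSVMer is driven from a script, it can help to assemble the argument list programmatically. The sketch below uses only flags listed above (-l, -i, -o, -c, -t, -n, -a). The file names are placeholders, `build_command` is a hypothetical helper, and the command is printed rather than executed.

```python
import shlex


def build_command(labels, data, output, folds=5, tradeoff=1.0,
                  normalize=False, classify_all=False, binary="LibSVMer"):
    """Assemble a LibSVMer argument list from the documented flags."""
    argv = [
        binary,
        "-l", labels,
        "-i", data,
        "-o", output,
        "-c", str(folds),
        "-t", str(tradeoff),
    ]
    if normalize:
        argv.append("-n")
    if classify_all:
        argv.append("-a")
    return argv


# Placeholder file names; substitute real paths before running.
argv = build_command("labels.txt", "data.bin", "predictions", classify_all=True)
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would launch the tool, provided the LibSVMer binary is on the PATH.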