Sleipnir
SVMperfer performs SVM learning using the SVMperf library. It supports cross-validation and can read binary PCL files created by PCL2Bin. SVMperfer has been used for network inference and gene prediction studies.
NOTE: Delimiters are tabs in all the following formats -- doxygen converts them to spaces automatically.
SVMperfer -l <labels_file> -p <params_file> -i <data.bin> -o <output_directory> -a
SVMperfer -l <labels_file> -c 5 -t 50 -i <PCL/Dat file> -o <output_file>
The labels file and the test labels file are each assumed to contain one example per line: an example name (i.e. a row name of the input file) and its known label (-1 for negative examples, 1 for positive examples), separated by a tab. In the following example, the examples are genes.
ACTA2 -1
ACTN4 1
ADAM10 -1
AGRN 1
AGTR1 -1
ALDOB -1
ALOX12 1
ANGPT2 1
APOA4 1
AQP1 1
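The labels format described above can be checked before a run with a short script. The following is a minimal Python sketch, not part of Sleipnir: the function name `read_labels` is hypothetical, and the strictness of the checks (exactly two tab-separated fields, labels limited to -1 and 1) is an assumption drawn from the description above rather than from the SVMperfer source.

```python
import io

def read_labels(handle):
    """Parse 'name<TAB>label' lines into a dict mapping name -> label.

    Hypothetical helper: assumes exactly two tab-separated fields per line
    and labels of -1 or 1, as described in the text above.
    """
    labels = {}
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 tab-separated fields, "
                f"got {len(fields)}"
            )
        name, label = fields[0], int(fields[1])
        if label not in (-1, 1):
            raise ValueError(f"line {line_number}: label must be -1 or 1")
        labels[name] = label
    return labels

# A few rows from the example above, written with real tab delimiters.
example = "ACTA2\t-1\nACTN4\t1\nADAM10\t-1\nAGRN\t1\n"
labels = read_labels(io.StringIO(example))
print(labels)
```

Passing an open file instead of the `io.StringIO` object works the same way, since the function only iterates over lines.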
Output is of the format
IGHV1-69 0 1.94073
DAG1 1 1.9401
FNDC3B 0 1.93543
HPGD -1 1.93181
TPSAB1 0 1.92928
CLIC5 1 1.92759
where the first column is the example name, the second column is the known label (as given in the labels file), and the third column is the SVM prediction (soft value). Unlabelled examples are given a label of 0. Examples are sorted by their predicted SVM soft value, highest first in the example above.
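As an illustration of how this output can be consumed, the sketch below reads the three columns and computes the percentage of swapped positive/negative pairs among the labelled examples, which is how the ROCArea loss (error function 10) is described in the option list. This is a hedged Python sketch, not Sleipnir code: the function names are hypothetical, ties are counted as not swapped (an assumption), and examples with label 0 are ignored.

```python
import io

def read_predictions(handle):
    """Parse 'name<TAB>label<TAB>score' lines into (name, label, score) tuples."""
    rows = []
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        rows.append((fields[0], int(fields[1]), float(fields[2])))
    return rows

def swapped_pair_percent(rows):
    """Percentage of (positive, negative) pairs where the negative scores higher.

    Unlabelled examples (label 0) are ignored. Returns None when there are
    no positives or no negatives, since no pairs can be formed.
    """
    positives = [score for _, label, score in rows if label == 1]
    negatives = [score for _, label, score in rows if label == -1]
    if not positives or not negatives:
        return None
    swapped = sum(1 for p in positives for n in negatives if n > p)
    return 100.0 * swapped / (len(positives) * len(negatives))

# The six rows from the example output above, with real tab delimiters.
example = (
    "IGHV1-69\t0\t1.94073\n"
    "DAG1\t1\t1.9401\n"
    "FNDC3B\t0\t1.93543\n"
    "HPGD\t-1\t1.93181\n"
    "TPSAB1\t0\t1.92928\n"
    "CLIC5\t1\t1.92759\n"
)
rows = read_predictions(io.StringIO(example))
print(swapped_pair_percent(rows))
```

In these six rows there are two positives (DAG1, CLIC5) and one negative (HPGD); HPGD outranks CLIC5 but not DAG1, so one of the two pairs is swapped.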
The params_file is of the format
10 0.1 0.5
10 0.01 0.5
10 0.001 0.5
10 0.0001 0.5
10 0.00001 0.5
10 0.000001 0.5
where the first column is the error function, the second column is the tradeoff constant, and the third column is the k_value (used for the precision at k and recall at k error functions, but unused for the ROC area error function, 10, in the example above).
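A parameters file like the one above can be generated rather than typed by hand. The following Python sketch is only an illustration: the helper name `params_lines` is hypothetical, and the column order (error function, tradeoff constant, k_value) is taken from the description above. Tradeoff values are kept as strings so that small values are written as `0.00001` rather than in scientific notation; whether SVMperfer accepts scientific notation is not stated here, so the sketch avoids it.

```python
def params_lines(error_function, tradeoffs, k_value):
    """Build one tab-delimited params line per tradeoff constant.

    Column order follows the text above: error function, tradeoff, k_value.
    """
    return [f"{error_function}\t{tradeoff}\t{k_value}" for tradeoff in tradeoffs]

# Reproduce the example grid: ROC area loss (10) over six tradeoff values.
tradeoffs = ["0.1", "0.01", "0.001", "0.0001", "0.00001", "0.000001"]
lines = params_lines(10, tradeoffs, "0.5")
print("\n".join(lines))
```

The printed text can be redirected to a file and passed to SVMperfer with `-p`.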
package "SVMperfer"
version "1.0"
purpose "Wrapper for SVM perf"
section "Main"
option "labels" l "Labels file"
string typestr="filename" no
option "output" o "Output file "
string typestr="filename" no
option "input" i "Input PCL file "
string typestr="filename" yes
option "model" m "Model file"
string typestr="filename" no
option "test_labels" T "Test Labels file"
string typestr="filename" no
option "all" a "Always classify all genes in PCLs"
flag off
option "slack" S "Use slack rescaling (not implemented for ROC loss)"
flag off
section "Options"
option "verbosity" v "Sets the svm_struct verbosity"
int default="0" no
option "skip" s "Number of columns to skip in input pcls"
int default="2" no
option "normalize" n "Normalize PCLS to 0 mean 1 variance"
flag off
option "cross_validation" c "Number of cross-validation sets ( arg of 1 will turn off cross-validation )"
int default="5" no
option "error_function" e "Sets the loss function for SVM learning: Choice of:
0\tZero/one loss: 1 if vector of predictions contains error, 0 otherwise.
1\tF1: 100 minus the F1-score in percent.
2\tErrorrate: Percentage of errors in prediction vector.
3\tPrec/Rec Breakeven: 100 minus PRBEP in percent.
4\tPrec@k: 100 minus precision at k in percent.
5\tRec@k: 100 minus recall at k in percent.
10\tROCArea: Percentage of swapped pos/neg pairs (i.e. 100 - ROCArea).\n"
int default="10" no
option "k_value" k "Value of k parameter used for Prec@k and Rec@k in (0,1)"
float default="0.5" no
option "tradeoff" t "SVM tradeoff constant C"
float default="1" no
option "simple_model" A "Write model files with only linear weights"
flag on
option "params" p "Parameter file"
string typestr="filename" no
option "mmap" M "Memory map binary input"
flag off
Flag | Default | Type | Description |
---|---|---|---|
-i | None | PCL/BIN file | Input PCL file |
-o | None | File or directory | Output file, or an output directory when a parameters file (-p) is given, as in the usage lines above. |
-l | None | Labels file | The file with examples formatted as noted above. |
-m | None | Model file | If present, output the learned model to this file. |
-a | off | Flag | If on, output predictions for all genes in the PCL. |
-S | off | Flag | If on, use slack rescaling. |
-s | 2 | int | Number of columns to skip from PCL file. |
-n | off | Flag | Normalize PCL to 0 mean, 1 variance. |
-c | 5 | int | Number of cross validation intervals. |
-e | 10 | int | Which loss function should be used? (options: 0, 1, 2, 3, 4, 5, 10). |
-k | 0.5 | float | Value of k, in (0,1), for the precision at k and recall at k loss functions. |
-t | 1 | float | SVM tradeoff constant C (note that this differs from the version in SVM light by a constant factor, check SVMPerf docs for details). |
-p | None | Filename | Parameters file (to test with multiple parameters). |
-M | off | Flag | Memory map binary input PCLs (BIN files). |
-T | None | Labels file | Test labels file, in the same format as the labels file. |
-v | 0 | int | Sets the svm_struct verbosity. |
-A | on | Flag | Write model files with only linear weights. |