Sleipnir
Provides an interface for learning and evaluating support vector machines using svm_perf.
#include <svm.h>
Public Types | |
enum | EKernel { EKernelLinear = 0, EKernelPolynomial = EKernelLinear + 1, EKernelRBF = EKernelPolynomial + 1 } |
Type of kernel used by the SVM. | |
Public Member Functions | |
bool | OpenAlphas (std::istream &istm) |
Open an initial file of alphas to be used as a starting point during learning. | |
bool | Open (std::istream &istm) |
Open an SVM model file in the given stream. | |
bool | Save (std::ostream &ostm) const |
Save an SVM model file to the given stream. | |
bool | Learn (const CPCL &PCL, const CGenes &GenesPositive) |
Learn an SVM recognizing the given genes' values in the given PCL. | |
bool | Learn (const CPCL &PCL, const CGenes &GenesPositive, const CGenes &GenesNegative) |
Learn an SVM recognizing the given genes' values in the given PCL. | |
bool | Evaluate (const CPCL &PCL, std::vector< float > &vecdResults) const |
Evaluate the SVM's output for each row of the given PCL. | |
bool | Learn (const char *szData) |
Learn an SVM using the given binary example file. | |
bool | Learn (const CPCLSet &PCLs, const CDataPair &Answers) |
Learn an SVM using pairs of values from the given PCLs. | |
bool | Learn (const IDataset *pData, const CDataPair &Answers) |
Learn an SVM using data from the given dataset. | |
bool | Evaluate (const char *szFile, CDat &DatResults) const |
Evaluate an SVM using the given binary example file. | |
bool | Evaluate (const CPCLSet &PCLs, CDat &DatResults) const |
Evaluate an SVM using pairs of values from the given PCLs. | |
bool | Evaluate (const IDataset *pData, CDat &DatResults) const |
Evaluate an SVM using values from the given dataset. | |
bool | Evaluate (const CPCLSet &PCLs, const CGenes &GenesInclude, CDat &DatResults) const |
Evaluate an SVM using pairs of values from the given PCLs. | |
bool | Evaluate (const IDataset *pData, const CGenes &GenesInclude, CDat &DatResults) const |
Evaluate an SVM using values from the given dataset for the requested genes. | |
void | SetIterations (size_t iIterations) |
Set the maximum number of iterations during SVM learning. | |
void | SetCache (size_t iMegabytes) |
Set the cache size for SVM learning and evaluation. | |
void | SetTradeoff (float dTradeoff) |
Set the error/margin tradeoff for SVM learning. | |
void | SetGamma (float dGamma) |
Set the gamma parameter for learning an RBF kernel. | |
void | SetDegree (size_t iDegree) |
Set the degree parameter for learning a polynomial kernel. | |
void | SetKernel (EKernel eKernel) |
Set the kernel type parameter for SVM learning. | |
void | SetVerbosity (size_t iVerbosity) |
Set the verbosity parameter for svm_perf. |
Provides an interface for learning and evaluating support vector machines using svm_perf.
The CSVM class provides methods for learning and evaluating SVMs from biological data types. All SVM manipulation is done using the svm_perf library (http://svmlight.joachims.org/), and the interface between Sleipnir and svm_perf is optimized to pass the relevant data types (datasets, PCLs, etc.) as efficiently as possible. SVM learning requires the entire dataset to be in memory at once, so large answer sets often must be subsampled for SVMs even when Bayesian learning would not need it. Evaluation, by contrast, can process data points individually, so memory is a potential issue only during learning.
bool Sleipnir::CSVM::Evaluate | ( | const CPCL & | PCL, |
std::vector< float > & | vecdResults | ||
) | const |
Evaluate the SVM's output for each row of the given PCL.
PCL | PCL from which features are read. |
vecdResults | Values output by the SVM for each row of the given PCL. |
Evaluates the current SVM model, assuming each column of the given PCL is a feature and each row a record.
Definition at line 527 of file svm.cpp.
References Sleipnir::CPCL::GetGenes(), and Sleipnir::CPCL::IsMasked().
Referenced by Evaluate().
bool Sleipnir::CSVM::Evaluate | ( | const char * | szFile, |
CDat & | DatResults | ||
) | const [inline] |
Evaluate an SVM using the given binary example file.
szFile | File of examples for which SVM is evaluated. |
DatResults | CDat into which SVM predictions are placed. |
Evaluates the current SVM on an example file roughly equivalent to that given to svm_classify, but in binary form. This can greatly speed the loading of large example files. The binary layout of the file is:
4-byte unsigned integer, number of features F
4-byte unsigned integer, number of examples E
E times:
  4-byte float, label of the example
  F times 4-byte float, values of the example's features
  4-byte integer, number of characters in the example's user data, MUST be 8
  4-byte integer, index of first gene in the pair corresponding to this example
  4-byte integer, index of second gene in the pair corresponding to this example
4-byte unsigned integer, total number of genes
Example labels are ignored during evaluation. The output CDat is filled based on the total number of genes and the gene pair indices labeling each example.
Definition at line 208 of file svm.h.
References Evaluate().
bool Sleipnir::CSVM::Evaluate | ( | const CPCLSet & | PCLs, |
CDat & | DatResults | ||
) | const [inline] |
Evaluate an SVM using pairs of values from the given PCLs.
PCLs | PCLs from which features are read. |
DatResults | CDat into which SVM predictions are placed. |
Evaluates the current SVM using the given data. For each gene pair in the given PCLs, a set of features is generated by pairing the vectors of values for the two genes from the PCLs. For example, for two genes A and B, if A has values of [0, 1, 2] in the given PCL, and B has values [2, 4, 0], an SVM example will be generated of the form:
1:0 2:1 3:2 4:2 5:4 6:0
Multiple PCLs within the set are concatenated (e.g. if the PCL set contains two PCLs with two and one condition, respectively, A's values [0, 1] might come from the first PCL and [2] from the second; likewise for B's [2, 4] and [0]).
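The pairing and concatenation described above can be sketched as follows. This is an illustration only, not Sleipnir's implementation: Row, PairFeatures, and FormatExample are hypothetical names, and each gene's values are represented as one plain vector per PCL.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Values for one gene across the conditions of one PCL
// (hypothetical stand-in for a PCL row).
using Row = std::vector<float>;

// Concatenate the first gene's rows across all PCLs, then the second gene's.
std::vector<float> PairFeatures(const std::vector<Row>& rowsA,
                                const std::vector<Row>& rowsB) {
    std::vector<float> features;
    for (const Row& row : rowsA)
        features.insert(features.end(), row.begin(), row.end());
    for (const Row& row : rowsB)
        features.insert(features.end(), row.begin(), row.end());
    return features;
}

// Format features as space-separated, 1-based "index:value" tokens.
std::string FormatExample(const std::vector<float>& features) {
    std::ostringstream out;
    for (std::size_t i = 0; i < features.size(); ++i) {
        if (i > 0) out << ' ';
        out << (i + 1) << ':' << features[i];
    }
    return out.str();
}
```

With A's rows [0, 1] and [2] and B's rows [2, 4] and [0], FormatExample(PairFeatures(...)) yields the line shown in the example above.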
Definition at line 240 of file svm.h.
References Evaluate().
bool Sleipnir::CSVM::Evaluate | ( | const IDataset * | pData, |
CDat & | DatResults | ||
) | const [inline] |
Evaluate an SVM using values from the given dataset.
pData | Dataset from which features are read. |
DatResults | CDat into which SVM predictions are placed. |
Evaluates the current SVM using the given data. For each gene pair, a set of features is generated using each non-hidden experiment in the given dataset. For example, if the dataset contains three experiments with values of 2, 4, and 0 for the pair AB, an SVM example will be generated of the form:
1:2 2:4 3:0
Definition at line 268 of file svm.h.
References Evaluate().
bool Sleipnir::CSVM::Evaluate | ( | const CPCLSet & | PCLs, |
const CGenes & | GenesInclude, | ||
CDat & | DatResults | ||
) | const [inline] |
Evaluate an SVM using pairs of values from the given PCLs.
PCLs | PCLs from which features are read. |
GenesInclude | Genes to be evaluated. |
DatResults | CDat into which SVM predictions are placed. |
Evaluates the given PCLs as per other Evaluate methods, but only over pairs for which both genes are in the given gene set.
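To make the restriction concrete, here is a small sketch of enumerating only those gene pairs whose members both belong to an inclusion set. It uses standard containers rather than CGenes, and IncludedPairs is a hypothetical name; how Sleipnir iterates pairs internally is not specified here.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Return index pairs (i < j) for which both genes appear in the include set.
std::vector<std::pair<std::size_t, std::size_t>> IncludedPairs(
    const std::vector<std::string>& genes,
    const std::set<std::string>& include) {
    // First keep the indices of genes that are in the set.
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < genes.size(); ++i)
        if (include.count(genes[i]) > 0) kept.push_back(i);

    // Then emit every unordered pair among the kept indices.
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t a = 0; a < kept.size(); ++a)
        for (std::size_t b = a + 1; b < kept.size(); ++b)
            pairs.emplace_back(kept[a], kept[b]);
    return pairs;
}
```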
Definition at line 298 of file svm.h.
References Evaluate().
bool Sleipnir::CSVM::Evaluate | ( | const IDataset * | pData, |
const CGenes & | GenesInclude, | ||
CDat & | DatResults | ||
) | const [inline] |
Evaluate an SVM using values from the given dataset for the requested genes.
pData | Dataset from which features are read. |
GenesInclude | Genes to be evaluated. |
DatResults | CDat into which SVM predictions are placed. |
Evaluates the given dataset as per other Evaluate methods, but only over pairs for which both genes are in the given gene set.
Definition at line 328 of file svm.h.
References Evaluate().
bool Sleipnir::CSVM::Learn | ( | const CPCL & | PCL, |
const CGenes & | GenesPositive | ||
) |
Learn an SVM recognizing the given genes' values in the given PCL.
PCL | PCL from which features are read. |
GenesPositive | Set of positive examples to be learned. |
Learns an SVM model using the current settings, assuming each column of the given PCL is a feature and each row a record. Genes in the given positive set become positive examples, and all other genes are assumed to be negative.
Definition at line 341 of file svm.cpp.
References Sleipnir::CGenes::GetGenome().
Referenced by Learn().
bool Sleipnir::CSVM::Learn | ( | const CPCL & | PCL, |
const CGenes & | GenesPositive, | ||
const CGenes & | GenesNegative | ||
) |
Learn an SVM recognizing the given genes' values in the given PCL.
PCL | PCL from which features are read. |
GenesPositive | Set of positive examples to be learned. |
GenesNegative | Set of negative examples to be learned. |
Learns an SVM model using the current settings, assuming each column of the given PCL is a feature and each row a record. Genes in the given positive set become positive examples, genes in the given negative set become negative examples, and all other rows are unlabeled.
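The difference between the two Learn overloads for a single PCL lies in how rows outside the positive set are labeled. The sketch below illustrates this with standard containers; Label and LabelRow are hypothetical names, and the numeric values +1, -1, and 0 follow the usual SVM-light convention, which is an assumption rather than something stated in this documentation. Treating a gene that appears in both sets as positive is also an arbitrary choice made for the sketch.

```cpp
#include <set>
#include <string>

// Assumed numeric labels, following the common SVM-light convention.
enum class Label { Negative = -1, Unlabeled = 0, Positive = 1 };

// Two-set form: rows in neither set are left unlabeled.
Label LabelRow(const std::string& gene,
               const std::set<std::string>& positives,
               const std::set<std::string>& negatives) {
    if (positives.count(gene) > 0) return Label::Positive;
    if (negatives.count(gene) > 0) return Label::Negative;
    return Label::Unlabeled;
}

// One-set form: every row outside the positive set is treated as negative.
Label LabelRow(const std::string& gene,
               const std::set<std::string>& positives) {
    return positives.count(gene) > 0 ? Label::Positive : Label::Negative;
}
```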
Definition at line 366 of file svm.cpp.
References Sleipnir::CGenes::GetGenes(), and Learn().
bool Sleipnir::CSVM::Learn | ( | const char * | szData | ) | [inline] |
Learn an SVM using the given binary example file.
szData | File of examples from which SVM is learned. |
Learns an SVM from an example file roughly equivalent to that given to svm_learn, but in binary form. This can greatly speed the loading of large example files. The binary layout of the file is:
4-byte unsigned integer, number of features F
4-byte unsigned integer, number of examples E
E times:
  4-byte float, label of the example
  F times 4-byte float, values of the example's features
  4-byte integer, number of characters in the example's comment C
  C times 1-byte character, example's user data comment
Definition at line 100 of file svm.h.
References Learn().
bool Sleipnir::CSVM::Learn | ( | const CPCLSet & | PCLs, |
const CDataPair & | Answers | ||
) | [inline] |
Learn an SVM using pairs of values from the given PCLs.
PCLs | PCLs from which features are read. |
Answers | Answer set indicating positive and negative examples. |
Learns an SVM using the given answer set. For each gene pair marked as positive (1) or negative (0) in the answer set, a set of features is generated by pairing the vectors of values for the two genes from the given PCLs. For example, if the answer set indicates that genes A and B are related, A has values of [0, 1, 2] in the given PCL, and B has values [2, 4, 0], an SVM training example will be generated of the form:
+1 1:0 2:1 3:2 4:2 5:4 6:0
Multiple PCLs within the set are concatenated (e.g. if the PCL set contains two PCLs with two and one condition, respectively, A's values [0, 1] might come from the first PCL and [2] from the second; likewise for B's [2, 4] and [0]).
Definition at line 133 of file svm.h.
References Learn().
bool Sleipnir::CSVM::Learn | ( | const IDataset * | pData, |
const CDataPair & | Answers | ||
) | [inline] |
Learn an SVM using data from the given dataset.
pData | Dataset from which features are read. |
Answers | Answer set indicating positive and negative examples. |
Learns an SVM using the given answer set. For each gene pair marked as positive (1) or negative (0) in the answer set, a set of features is generated using each non-hidden experiment in the given dataset. For example, if the answer set indicates that genes A and B are related, and the dataset contains three experiments with values of 2, 4, and 0 for the pair AB, an SVM training example will be generated of the form:
+1 1:2 2:4 3:0
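A rough sketch of how one answer value and a pair's experiment values could be turned into such a training line follows. FormatTrainingExample is a hypothetical name, not a Sleipnir function. Two assumptions are made: a pair with no entry in the answer set is represented as NaN and skipped, and a negative answer (0) is written with the label -1, as in the usual SVM-light text format.

```cpp
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Build one training line from an answer value and a pair's experiment values.
// Returns false, leaving line untouched, when the answer is missing (NaN).
bool FormatTrainingExample(float answer, const std::vector<float>& values,
                           std::string& line) {
    if (std::isnan(answer)) return false;  // pair absent from the answers: skip
    std::ostringstream out;
    out << (answer > 0 ? "+1" : "-1");     // assumed mapping: 1 -> +1, 0 -> -1
    for (std::size_t i = 0; i < values.size(); ++i)
        out << ' ' << (i + 1) << ':' << values[i];
    line = out.str();
    return true;
}
```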
Definition at line 164 of file svm.h.
References Learn().
bool Sleipnir::CSVM::Open | ( | std::istream & | istm | ) |
Open an SVM model file in the given stream.
istm | Stream from which model file is loaded. |
Definition at line 627 of file svm.cpp.
References Sleipnir::CMeta::c_szWS, and Sleipnir::CMeta::Tokenize().
bool Sleipnir::CSVM::OpenAlphas | ( | std::istream & | istm | ) |
Open an initial file of alphas to be used as a starting point during learning.
istm | Stream containing a file of alphas, one number per line, to be used to initialize learning. |
bool Sleipnir::CSVM::Save | ( | std::ostream & | ostm | ) | const |
void Sleipnir::CSVM::SetCache | ( | size_t | iMegabytes | ) | [inline] |
void Sleipnir::CSVM::SetDegree | ( | size_t | iDegree | ) | [inline] |
void Sleipnir::CSVM::SetGamma | ( | float | dGamma | ) | [inline] |
void Sleipnir::CSVM::SetIterations | ( | size_t | iIterations | ) | [inline] |
void Sleipnir::CSVM::SetKernel | ( | EKernel | eKernel | ) | [inline] |
void Sleipnir::CSVM::SetTradeoff | ( | float | dTradeoff | ) | [inline] |
void Sleipnir::CSVM::SetVerbosity | ( | size_t | iVerbosity | ) | [inline] |