Sleipnir
Sleipnir::CSVM Class Reference

Provides an interface for learning and evaluating support vector machines using svm_perf.

#include <svm.h>

Inheritance diagram for Sleipnir::CSVM:
Sleipnir::CSVMImpl

Public Types

enum  EKernel { EKernelLinear = 0, EKernelPolynomial = EKernelLinear + 1, EKernelRBF = EKernelPolynomial + 1 }
 Type of kernel used by the SVM.

Public Member Functions

bool OpenAlphas (std::istream &istm)
 Open an initial file of alphas to be used as a starting point during learning.
bool Open (std::istream &istm)
 Open an SVM model file in the given stream.
bool Save (std::ostream &ostm) const
 Save an SVM model file to the given stream.
bool Learn (const CPCL &PCL, const CGenes &GenesPositive)
 Learn an SVM recognizing the given genes' values in the given PCL.
bool Learn (const CPCL &PCL, const CGenes &GenesPositive, const CGenes &GenesNegative)
 Learn an SVM recognizing the given genes' values in the given PCL.
bool Evaluate (const CPCL &PCL, std::vector< float > &vecdResults) const
 Evaluate the SVM's output for each row of the given PCL.
bool Learn (const char *szData)
 Learn an SVM using the given binary example file.
bool Learn (const CPCLSet &PCLs, const CDataPair &Answers)
 Learn an SVM using pairs of values from the given PCLs.
bool Learn (const IDataset *pData, const CDataPair &Answers)
 Learn an SVM using data from the given dataset.
bool Evaluate (const char *szFile, CDat &DatResults) const
 Evaluate an SVM using the given binary example file.
bool Evaluate (const CPCLSet &PCLs, CDat &DatResults) const
 Evaluate an SVM using pairs of values from the given PCLs.
bool Evaluate (const IDataset *pData, CDat &DatResults) const
 Evaluate an SVM using values from the given dataset.
bool Evaluate (const CPCLSet &PCLs, const CGenes &GenesInclude, CDat &DatResults) const
 Evaluate an SVM using pairs of values from the given PCLs.
bool Evaluate (const IDataset *pData, const CGenes &GenesInclude, CDat &DatResults) const
 Evaluate an SVM using values from the given dataset for the requested genes.
void SetIterations (size_t iIterations)
 Set the maximum number of iterations during SVM learning.
void SetCache (size_t iMegabytes)
 Set the cache size for SVM learning and evaluation.
void SetTradeoff (float dTradeoff)
 Set the error/margin tradeoff for SVM learning.
void SetGamma (float dGamma)
 Set the gamma parameter for learning an RBF kernel.
void SetDegree (size_t iDegree)
 Set the degree parameter for learning a polynomial kernel.
void SetKernel (EKernel eKernel)
 Set the kernel type parameter for SVM learning.
void SetVerbosity (size_t iVerbosity)
 Set the verbosity parameter for svm_perf.

Detailed Description

Provides an interface for learning and evaluating support vector machines using svm_perf.

The CSVM class provides a variety of methods for learning and evaluating SVMs based on biological data types. All SVM manipulation is done using the svm_perf library (http://svmlight.joachims.org/), but the interface between Sleipnir and svm_perf has been optimized to pass appropriate data types (datasets, PCLs, etc.) as efficiently as possible. Note that SVM learning requires the entire dataset to be in memory simultaneously, so subsampling large answer sets is often necessary for SVMs when it would not be for Bayesian learning. On the other hand, individual data points can be evaluated easily, so memory is only a potential issue during SVM learning.

Definition at line 43 of file svm.h.


Member Enumeration Documentation

enum Sleipnir::CSVM::EKernel

Type of kernel used by the SVM.

Enumerator:
EKernelLinear        Linear kernel.
EKernelPolynomial    Polynomial kernel.
EKernelRBF           Radial basis function kernel.

Definition at line 49 of file svm.h.
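
As an illustration of what the three kernel types compute, the following minimal sketch evaluates the textbook form of each kernel on two feature vectors. It is not taken from Sleipnir or svm_perf: the local EKernel mirror, the SKernelParams struct, and KernelValue are hypothetical names, and the polynomial form (x . y + 1)^d is an assumption, since the scaling and offset used by svm_perf are not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Local mirror of CSVM::EKernel so the sketch compiles without Sleipnir.
enum EKernel { EKernelLinear = 0, EKernelPolynomial = 1, EKernelRBF = 2 };

// Hypothetical parameter bundle; dGamma and iDegree play the roles of
// the values passed to SetGamma and SetDegree.
struct SKernelParams {
    EKernel     eKernel;
    float       dGamma;   // used by EKernelRBF only
    std::size_t iDegree;  // used by EKernelPolynomial only
};

// Textbook kernel value for two equal-length feature vectors.
// Assumption: polynomial kernel is (x . y + 1)^d.
double KernelValue(const SKernelParams& sParams,
                   const std::vector<float>& vecdX,
                   const std::vector<float>& vecdY) {
    assert(vecdX.size() == vecdY.size());
    double dDot = 0, dDist = 0;
    for (std::size_t i = 0; i < vecdX.size(); ++i) {
        const double dDiff = static_cast<double>(vecdX[i]) - vecdY[i];
        dDot += static_cast<double>(vecdX[i]) * vecdY[i];
        dDist += dDiff * dDiff;
    }
    switch (sParams.eKernel) {
        case EKernelPolynomial:
            return std::pow(dDot + 1, static_cast<double>(sParams.iDegree));
        case EKernelRBF:
            return std::exp(-static_cast<double>(sParams.dGamma) * dDist);
        default:
            return dDot;  // EKernelLinear
    }
}
```

The gamma and degree fields are only consulted by the kernel that uses them, which matches the way SetGamma and SetDegree are documented as kernel-specific.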


Member Function Documentation

bool Sleipnir::CSVM::Evaluate ( const CPCL &  PCL,
std::vector< float > &  vecdResults 
) const

Evaluate the SVM's output for each row of the given PCL.

Parameters:
PCL    PCL from which features are read.
vecdResults    Values output by the SVM for each row of the given PCL.
Returns:
True if the evaluation completed successfully.

Evaluates the current SVM model, assuming each column of the given PCL is a feature and each row a record.

Definition at line 527 of file svm.cpp.

References Sleipnir::CPCL::GetGenes(), and Sleipnir::CPCL::IsMasked().

Referenced by Evaluate().

bool Sleipnir::CSVM::Evaluate ( const char *  szFile,
CDat &  DatResults
) const [inline]

Evaluate an SVM using the given binary example file.

Parameters:
szFile    File of examples for which SVM is evaluated.
DatResults    CDat into which SVM predictions are placed.
Returns:
True if the evaluation completed successfully.

Evaluates the current SVM on an example file roughly equivalent to that given to svm_classify, but in binary form. This can greatly speed the loading of large example files. The binary layout of the file is:

 4-byte unsigned integer, number of features F
 4-byte unsigned integer, number of examples E
 E times:
   4-byte float, label of the example
   F times 4-byte float, values of the example's features
   4-byte integer, number of characters in the example's user data, MUST be 8
   4-byte integer, index of first gene in the pair corresponding to this example
   4-byte integer, index of second gene in the pair corresponding to this example
 4-byte unsigned integer, total number of genes

Example labels are ignored during evaluation. The output CDat is filled based on the total number of genes and the gene pair indices labeling each example.

Remarks:
Sorry the file format is so quirky; it was made to play nice with svm_perf. It's at least arranged so that it can be marginally compatible with the equivalent Learn function. Of course, feeding either method bad input will cause who-knows-what horrific consequences.
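
The layout above can be sketched as a small writer. This is an illustrative example rather than Sleipnir code: SExample, WriteRaw, and WriteEvaluationFile are hypothetical names, values are written in host byte order (the layout does not state an endianness), and the 4-byte integer fields are written as unsigned, which produces the same bytes for non-negative values.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// One example in the evaluation file (hypothetical helper type).
struct SExample {
    float              dLabel;        // ignored during evaluation
    std::vector<float> vecdFeatures;  // exactly F values
    std::uint32_t      iGeneOne;      // index of first gene in the pair
    std::uint32_t      iGeneTwo;      // index of second gene in the pair
};

// Write the raw bytes of a 4-byte value in host byte order (assumption).
template <typename T>
static void WriteRaw(std::ostream& ostm, T value) {
    static_assert(sizeof(T) == 4, "the layout uses 4-byte fields only");
    ostm.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Emit the header, E examples, and the trailing gene count.
bool WriteEvaluationFile(std::ostream& ostm, std::uint32_t iFeatures,
                         std::uint32_t iGenes,
                         const std::vector<SExample>& vecExamples) {
    WriteRaw(ostm, iFeatures);
    WriteRaw(ostm, static_cast<std::uint32_t>(vecExamples.size()));
    for (const SExample& sEx : vecExamples) {
        assert(sEx.vecdFeatures.size() == iFeatures);
        WriteRaw(ostm, sEx.dLabel);
        for (float d : sEx.vecdFeatures)
            WriteRaw(ostm, d);
        WriteRaw(ostm, static_cast<std::uint32_t>(8));  // user data length, must be 8
        WriteRaw(ostm, sEx.iGeneOne);
        WriteRaw(ostm, sEx.iGeneTwo);
    }
    WriteRaw(ostm, iGenes);
    return ostm.good();
}
```

With F features and E examples, the resulting file is 8 + E * (4F + 16) + 4 bytes long, which gives a quick sanity check before passing a file to Evaluate.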

Definition at line 208 of file svm.h.

References Evaluate().

bool Sleipnir::CSVM::Evaluate ( const CPCLSet &  PCLs,
CDat &  DatResults
) const [inline]

Evaluate an SVM using pairs of values from the given PCLs.

Parameters:
PCLs    PCLs from which features are read.
DatResults    CDat into which SVM predictions are placed.
Returns:
True if the evaluation completed successfully.

Evaluates the current SVM using the given data. For each gene pair in the given PCLs, a set of features is generated by pairing the vectors of values for the two genes from the PCLs. For example, for two genes A and B, if A has values of [0, 1, 2] in the given PCL, and B has values [2, 4, 0], an SVM example will be generated of the form:

 1:0 2:1 3:2 4:2 5:4 6:0

Multiple PCLs within the set are concatenated (e.g. if the PCL set contains two PCLs with two and one condition, respectively, A's values [0, 1] might come from the first PCL and [2] from the second; likewise for B's [2, 4] and [0]).
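
The pairing and concatenation rule can be sketched as follows. This is a minimal illustration, not Sleipnir's implementation: TPCL is a hypothetical stand-in for a PCL (one row of values per gene), and PairFeatures and FormatExample are assumed names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one PCL: rows are genes, columns are conditions.
typedef std::vector<std::vector<float> > TPCL;

// Append one gene's values from every PCL in the set, in order.
static void AppendGene(const std::vector<TPCL>& vecPCLs, std::size_t iGene,
                       std::vector<float>& vecdOut) {
    for (const TPCL& pcl : vecPCLs) {
        assert(iGene < pcl.size());
        vecdOut.insert(vecdOut.end(), pcl[iGene].begin(), pcl[iGene].end());
    }
}

// Features for a gene pair: all of A's values, then all of B's values.
std::vector<float> PairFeatures(const std::vector<TPCL>& vecPCLs,
                                std::size_t iA, std::size_t iB) {
    std::vector<float> vecdFeatures;
    AppendGene(vecPCLs, iA, vecdFeatures);
    AppendGene(vecPCLs, iB, vecdFeatures);
    return vecdFeatures;
}

// Render features as 1-based "index:value" tokens separated by spaces.
std::string FormatExample(const std::vector<float>& vecdFeatures) {
    std::ostringstream ossm;
    for (std::size_t i = 0; i < vecdFeatures.size(); ++i) {
        if (i)
            ossm << ' ';
        ossm << (i + 1) << ':' << vecdFeatures[i];
    }
    return ossm.str();
}
```

Splitting the same values across two PCLs (two conditions, then one) yields the same feature vector as a single three-condition PCL, which is the behavior the concatenation note describes.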

Definition at line 240 of file svm.h.

References Evaluate().

bool Sleipnir::CSVM::Evaluate ( const IDataset *  pData,
CDat &  DatResults
) const [inline]

Evaluate an SVM using values from the given dataset.

Parameters:
pData    Dataset from which features are read.
DatResults    CDat into which SVM predictions are placed.
Returns:
True if the evaluation completed successfully.

Evaluates the current SVM using the given data. For each gene pair, a set of features is generated using each non-hidden experiment in the given dataset. For example, if the dataset contains three experiments with values of 2, 4, and 0 for the pair AB, an SVM example will be generated of the form:

 1:2 2:4 3:0

Definition at line 268 of file svm.h.

References Evaluate().

bool Sleipnir::CSVM::Evaluate ( const CPCLSet &  PCLs,
const CGenes &  GenesInclude,
CDat &  DatResults
) const [inline]

Evaluate an SVM using pairs of values from the given PCLs.

Parameters:
PCLs    PCLs from which features are read.
GenesInclude    Genes to be evaluated.
DatResults    CDat into which SVM predictions are placed.
Returns:
True if the evaluation completed successfully.

Evaluates the given PCLs as per other Evaluate methods, but only over pairs for which both genes are in the given gene set.

Remarks:
The output size (number of genes) of DatResults will be the same as the input size of GenesInclude.
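
The restriction to pairs within the gene set can be pictured with the sketch below. It is an assumption-laden illustration rather than the library's code: IncludedPairs is a hypothetical helper, genes are represented by name strings, and gene names in the data are assumed to be unique.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::size_t, std::size_t> TPair;

// Return index pairs (i, j), i < j, into vecstrGenes for which both genes
// appear in setstrInclude. Assumes gene names in vecstrGenes are unique.
std::vector<TPair> IncludedPairs(const std::vector<std::string>& vecstrGenes,
                                 const std::set<std::string>& setstrInclude) {
    // First keep only the data rows whose gene is in the include set.
    std::vector<std::size_t> veciKept;
    for (std::size_t i = 0; i < vecstrGenes.size(); ++i)
        if (setstrInclude.count(vecstrGenes[i]))
            veciKept.push_back(i);

    // Then enumerate every unordered pair of kept rows.
    std::vector<TPair> vecPairs;
    for (std::size_t a = 0; a < veciKept.size(); ++a)
        for (std::size_t b = a + 1; b < veciKept.size(); ++b)
            vecPairs.push_back(TPair(veciKept[a], veciKept[b]));

    // n kept genes always yield n * (n - 1) / 2 pairs.
    assert(vecPairs.size() == veciKept.size() * (veciKept.size() - 1) / 2);
    return vecPairs;
}
```

For n included genes this produces n(n - 1)/2 examples instead of one per pair in the full data, which is the main reason to pass a gene set when only a subset of predictions is needed.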

Definition at line 298 of file svm.h.

References Evaluate().

bool Sleipnir::CSVM::Evaluate ( const IDataset *  pData,
const CGenes &  GenesInclude,
CDat &  DatResults
) const [inline]

Evaluate an SVM using values from the given dataset for the requested genes.

Parameters:
pData    Dataset from which features are read.
GenesInclude    Genes to be evaluated.
DatResults    CDat into which SVM predictions are placed.
Returns:
True if the evaluation completed successfully.

Evaluates the given dataset as per other Evaluate methods, but only over pairs for which both genes are in the given gene set.

Remarks:
The output size (number of genes) of DatResults will be the same as the input size of GenesInclude.

Definition at line 328 of file svm.h.

References Evaluate().

bool Sleipnir::CSVM::Learn ( const CPCL &  PCL,
const CGenes &  GenesPositive
)

Learn an SVM recognizing the given genes' values in the given PCL.

Parameters:
PCL    PCL from which features are read.
GenesPositive    Set of positive examples to be learned.
Returns:
True if the model was learned successfully.

Learns an SVM model using the current settings, assuming each column of the given PCL is a feature and each row a record. Genes in the given positive set become positive examples, and all other genes are assumed to be negative.

Definition at line 341 of file svm.cpp.

References Sleipnir::CGenes::GetGenome().

Referenced by Learn().

bool Sleipnir::CSVM::Learn ( const CPCL &  PCL,
const CGenes &  GenesPositive,
const CGenes &  GenesNegative
)

Learn an SVM recognizing the given genes' values in the given PCL.

Parameters:
PCL    PCL from which features are read.
GenesPositive    Set of positive examples to be learned.
GenesNegative    Set of negative examples to be learned.
Returns:
True if the model was learned successfully.

Learns an SVM model using the current settings, assuming each column of the given PCL is a feature and each row a record. Genes in the given positive set become positive examples, genes in the given negative set become negative examples, and all other rows are unlabeled.
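
The labeling rules of the two PCL-based Learn overloads can be contrasted in a short sketch. This is illustrative only: LabelRows and TGeneSet are hypothetical names, labels use +1 and -1, and 0 is an assumed placeholder for an unlabeled row (how the library treats unlabeled rows is not specified here).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> TGeneSet;

// Positive-set form: every row outside the positive set is negative.
std::vector<int> LabelRows(const std::vector<std::string>& vecstrRows,
                           const TGeneSet& setPositive) {
    std::vector<int> veciLabels;
    for (const std::string& strGene : vecstrRows)
        veciLabels.push_back(setPositive.count(strGene) ? 1 : -1);
    return veciLabels;
}

// Positive/negative form: rows in neither set stay unlabeled.
// Assumption: 0 is used here as the "unlabeled" marker.
std::vector<int> LabelRows(const std::vector<std::string>& vecstrRows,
                           const TGeneSet& setPositive,
                           const TGeneSet& setNegative) {
    std::vector<int> veciLabels;
    for (const std::string& strGene : vecstrRows) {
        const bool fPositive = setPositive.count(strGene) != 0;
        const bool fNegative = setNegative.count(strGene) != 0;
        assert(!(fPositive && fNegative));  // a gene cannot be in both sets
        veciLabels.push_back(fPositive ? 1 : (fNegative ? -1 : 0));
    }
    return veciLabels;
}
```

The practical difference is that the positive-only form treats every unlisted gene as a negative example, so an explicit negative set is the safer choice when unlisted genes are merely unknown.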

Definition at line 366 of file svm.cpp.

References Sleipnir::CGenes::GetGenes(), and Learn().

bool Sleipnir::CSVM::Learn ( const char *  szData) [inline]

Learn an SVM using the given binary example file.

Parameters:
szData    File of examples from which SVM is learned.
Returns:
True if the SVM was learned successfully.

Learns an SVM from an example file roughly equivalent to that given to svm_learn, but in binary form. This can greatly speed the loading of large example files. The binary layout of the file is:

 4-byte unsigned integer, number of features F
 4-byte unsigned integer, number of examples E
 E times:
   4-byte float, label of the example
   F times 4-byte float, values of the example's features
   4-byte integer, number of characters in the example's comment C
   C times 1-byte character, example's user data comment
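
A round trip through this layout can be sketched as below. The code is a minimal example under stated assumptions, not part of Sleipnir: STrainingExample, WriteTrainingFile, and ReadTrainingFile are hypothetical names, values use host byte order, and the comment length is written as an unsigned 4-byte integer.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One training example (hypothetical helper type).
struct STrainingExample {
    float              dLabel;
    std::vector<float> vecdFeatures;  // exactly F values
    std::string        strComment;    // user data, C characters
};

template <typename T>
static void WriteRaw(std::ostream& ostm, T value) {
    static_assert(sizeof(T) == 4, "the layout uses 4-byte fields only");
    ostm.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

template <typename T>
static bool ReadRaw(std::istream& istm, T& value) {
    return !istm.read(reinterpret_cast<char*>(&value), sizeof(value)).fail();
}

bool WriteTrainingFile(std::ostream& ostm, std::uint32_t iFeatures,
                       const std::vector<STrainingExample>& vecExamples) {
    WriteRaw(ostm, iFeatures);
    WriteRaw(ostm, static_cast<std::uint32_t>(vecExamples.size()));
    for (const STrainingExample& sEx : vecExamples) {
        assert(sEx.vecdFeatures.size() == iFeatures);
        WriteRaw(ostm, sEx.dLabel);
        for (float d : sEx.vecdFeatures)
            WriteRaw(ostm, d);
        WriteRaw(ostm, static_cast<std::uint32_t>(sEx.strComment.size()));
        ostm.write(sEx.strComment.data(),
                   static_cast<std::streamsize>(sEx.strComment.size()));
    }
    return ostm.good();
}

bool ReadTrainingFile(std::istream& istm, std::uint32_t& iFeatures,
                      std::vector<STrainingExample>& vecExamples) {
    std::uint32_t iExamples = 0;
    if (!ReadRaw(istm, iFeatures) || !ReadRaw(istm, iExamples))
        return false;
    vecExamples.clear();
    for (std::uint32_t i = 0; i < iExamples; ++i) {
        STrainingExample sEx;
        std::uint32_t iComment = 0;
        if (!ReadRaw(istm, sEx.dLabel))
            return false;
        sEx.vecdFeatures.resize(iFeatures);
        for (float& d : sEx.vecdFeatures)
            if (!ReadRaw(istm, d))
                return false;
        if (!ReadRaw(istm, iComment))
            return false;
        sEx.strComment.resize(iComment);
        if (iComment && istm.read(&sEx.strComment[0], iComment).fail())
            return false;
        vecExamples.push_back(sEx);
    }
    return true;
}
```

The reader trusts the counts in the header, so it is only suitable for files produced by a matching writer; a production reader would bound F, E, and C before allocating.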
See also:
Evaluate

Definition at line 100 of file svm.h.

References Learn().

bool Sleipnir::CSVM::Learn ( const CPCLSet &  PCLs,
const CDataPair &  Answers
) [inline]

Learn an SVM using pairs of values from the given PCLs.

Parameters:
PCLs    PCLs from which features are read.
Answers    Answer set indicating positive and negative examples.
Returns:
True if the model was learned successfully.

Learns an SVM using the given answer set. For each gene pair marked as positive (1) or negative (0) in the given answer file, a set of features is generated by pairing the vectors of values for the two genes from the given PCLs. For example, if the answer file indicates that genes A and B are related, A has values of [0, 1, 2] in the given PCL, and B has values [2, 4, 0], an SVM training example will be generated of the form:

 +1 1:0 2:1 3:2 4:2 5:4 6:0

Multiple PCLs within the set are concatenated (e.g. if the PCL set contains two PCLs with two and one condition, respectively, A's values [0, 1] might come from the first PCL and [2] from the second; likewise for B's [2, 4] and [0]).

Definition at line 133 of file svm.h.

References Learn().

bool Sleipnir::CSVM::Learn ( const IDataset *  pData,
const CDataPair &  Answers
) [inline]

Learn an SVM using data from the given dataset.

Parameters:
pData    Dataset from which features are read.
Answers    Answer set indicating positive and negative examples.
Returns:
True if the model was learned successfully.

Learns an SVM using the given answer set. For each gene pair marked as positive (1) or negative (0) in the given answer file, a set of features is generated using each non-hidden experiment in the given dataset. For example, if the answer file indicates that genes A and B are related, and the dataset contains three experiments with values of 2, 4, and 0 for the pair AB, an SVM training example will be generated of the form:

 +1 1:2 2:4 3:0
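
The construction of such a training line can be sketched as follows. This is an illustration under assumptions, not the library's code: SExperiment and FormatTrainingExample are hypothetical names, a negative answer (0) is assumed to map to the label -1, and feature indices are assumed to be renumbered consecutively after hidden experiments are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Value of one experiment for a gene pair (hypothetical helper type).
struct SExperiment {
    float dValue;
    bool  fHidden;  // hidden experiments contribute no feature
};

// Build one training line for a gene pair.
// iAnswer is 1 for a positive pair and 0 for a negative pair.
std::string FormatTrainingExample(int iAnswer,
                                  const std::vector<SExperiment>& vecExperiments) {
    assert(iAnswer == 0 || iAnswer == 1);
    std::ostringstream ossm;
    ossm << (iAnswer ? "+1" : "-1");  // assumption: negatives are labeled -1
    std::size_t iFeature = 0;
    for (const SExperiment& sExp : vecExperiments) {
        if (sExp.fHidden)
            continue;
        // Assumption: features are numbered from 1 over non-hidden experiments.
        ossm << ' ' << ++iFeature << ':' << sExp.dValue;
    }
    return ossm.str();
}
```

Under these assumptions, hiding an experiment shortens every example by one feature, so a model learned with one set of hidden experiments should be evaluated with the same set hidden.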

Definition at line 164 of file svm.h.

References Learn().

bool Sleipnir::CSVM::Open ( std::istream &  istm)

Open an SVM model file in the given stream.

Parameters:
istm    Stream from which model file is loaded.
Returns:
True if the model file was loaded successfully.
Remarks:
Equivalent to the model file generated by svm_learn or input to svm_classify.
See also:
Save

Definition at line 627 of file svm.cpp.

References Sleipnir::CMeta::c_szWS, and Sleipnir::CMeta::Tokenize().

bool Sleipnir::CSVM::OpenAlphas ( std::istream &  istm)

Open an initial file of alphas to be used as a starting point during learning.

Parameters:
istm    Stream containing a file of alphas, one number per line, to be used to initialize learning.
Returns:
True if the file was opened successfully.
Remarks:
Equivalent to the svm_learn -y parameter.

Definition at line 249 of file svm.cpp.

bool Sleipnir::CSVM::Save ( std::ostream &  ostm) const

Save an SVM model file to the given stream.

Parameters:
ostm    Stream to which model file is saved.
Returns:
True if the model file was saved successfully.
Remarks:
Equivalent to the model file generated by svm_learn or input to svm_classify.
See also:
Open

Definition at line 441 of file svm.cpp.

void Sleipnir::CSVM::SetCache ( size_t  iMegabytes) [inline]

Set the cache size for SVM learning and evaluation.

Parameters:
iMegabytes    Cache size in megabytes for SVM learning/evaluation.
Remarks:
Equivalent to the -m argument to svm_learn.

Definition at line 360 of file svm.h.

void Sleipnir::CSVM::SetDegree ( size_t  iDegree) [inline]

Set the degree parameter for learning a polynomial kernel.

Parameters:
iDegree    Degree parameter for polynomial kernel learning.
Remarks:
Equivalent to the -d parameter to svm_learn.

Definition at line 402 of file svm.h.

void Sleipnir::CSVM::SetGamma ( float  dGamma) [inline]

Set the gamma parameter for learning an RBF kernel.

Parameters:
dGamma    Gamma parameter for RBF kernel learning.
Remarks:
Equivalent to the -g parameter to svm_learn.

Definition at line 388 of file svm.h.

void Sleipnir::CSVM::SetIterations ( size_t  iIterations) [inline]

Set the maximum number of iterations during SVM learning.

Parameters:
iIterations    Maximum number of iterations during SVM learning.
Remarks:
Equivalent to the -# argument to svm_learn.

Definition at line 346 of file svm.h.

void Sleipnir::CSVM::SetKernel ( EKernel  eKernel) [inline]

Set the kernel type parameter for SVM learning.

Parameters:
eKernel    Kernel type to use for SVM learning.
Remarks:
Equivalent to the -t parameter to svm_learn.

Definition at line 416 of file svm.h.

void Sleipnir::CSVM::SetTradeoff ( float  dTradeoff) [inline]

Set the error/margin tradeoff for SVM learning.

Parameters:
dTradeoff    Error/margin tradeoff parameter C for SVM learning.
Remarks:
Equivalent to the -c argument to svm_learn.

Definition at line 374 of file svm.h.

void Sleipnir::CSVM::SetVerbosity ( size_t  iVerbosity) [inline]

Set the verbosity parameter for svm_perf.

Parameters:
iVerbosity    Verbosity of messages from svm_perf.
Remarks:
Equivalent to the -v parameter to svm_learn/svm_classify.

Definition at line 430 of file svm.h.
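
The correspondence between the setters above and the svm_learn command-line flags named in their remarks can be summarized in a small sketch. SSettings and EquivalentFlags are hypothetical names introduced for illustration; the struct does not reflect CSVM's internal state or its default values, and the kernel is passed as the integer value of EKernel.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical bundle of the values passed to the CSVM setters.
struct SSettings {
    std::size_t iIterations;  // SetIterations -> -#
    std::size_t iCache;       // SetCache      -> -m
    float       dTradeoff;    // SetTradeoff   -> -c
    float       dGamma;       // SetGamma      -> -g
    std::size_t iDegree;      // SetDegree     -> -d
    int         iKernel;      // SetKernel     -> -t (EKernel value 0, 1, or 2)
    std::size_t iVerbosity;   // SetVerbosity  -> -v
};

// Render the settings as the flag string each setter is documented
// to be equivalent to.
std::string EquivalentFlags(const SSettings& sSettings) {
    assert(sSettings.iKernel >= 0 && sSettings.iKernel <= 2);
    std::ostringstream ossm;
    ossm << "-# " << sSettings.iIterations
         << " -m " << sSettings.iCache
         << " -c " << sSettings.dTradeoff
         << " -g " << sSettings.dGamma
         << " -d " << sSettings.iDegree
         << " -t " << sSettings.iKernel
         << " -v " << sSettings.iVerbosity;
    return ossm.str();
}
```

A string like this is mainly useful for logging or for reproducing a run with the command-line tools; gamma and degree only matter for the RBF and polynomial kernels, respectively.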


The documentation for this class was generated from the following files: