Sleipnir
Sleipnir::CBayesNetMinimal Class Reference

Implements a heavily optimized discrete naive Bayesian classifier.

#include <bayesnet.h>

Inheritance diagram for Sleipnir::CBayesNetMinimal:
Sleipnir::CBayesNetMinimalImpl Sleipnir::CBayesNetImpl

Public Member Functions

bool Open (const CBayesNetSmile &BNSmile)
 Construct a new minimal Bayes net from the given SMILE-based network.
bool Open (std::istream &istm)
 Load a minimal Bayes net from the given binary stream.
bool OpenCounts (const char *szFileCounts, const std::map< std::string, size_t > &mapstriNodes, const std::vector< unsigned char > &vecbDefaults, const std::vector< float > &vecdAlphas, float dPseudocounts=HUGE_VAL, const CBayesNetMinimal *pBNDefault=NULL)
 Constructs a naive Bayesian classifier using count data for each network node.
void Save (std::ostream &ostm) const
 Save a minimal Bayes net to the given binary stream.
float Evaluate (const std::vector< unsigned char > &vecbDatum, size_t iOffset=0) const
 Perform Bayesian inference to obtain the class probability given evidence for some number of nodes.
bool Evaluate (const std::vector< unsigned char > &vecbData, float *adResults, size_t iGenes, size_t iStart=0) const
 Repeatedly perform Bayesian inference to obtain the class probability given evidence for some number of nodes.
float Regularize (std::vector< float > &vecdAlphas) const
const CDataMatrix & GetCPT (size_t iNode) const
 Return the conditional probability table matrix for the indicated node.
size_t GetNodes () const
 Return the total number of nodes in the Bayes net.
void SetID (const std::string &strID)
 Sets the string identifier of the network.
const std::string & GetID () const
 Returns the string identifier of the network.
const unsigned char GetDefault (size_t iNode) const
 Returns the default value (if no input is provided) for the requested node.

Detailed Description

Implements a heavily optimized discrete naive Bayesian classifier.

CBayesNetMinimal provides a custom implementation of a discrete naive Bayesian classifier heavily optimized for rapid inference. The intended use is to learn an appropriate network and parameters offline using one of the more complex Bayes net implementations. The resulting network can then be converted to a minimal form and used for online (realtime) inference. A minimal Bayes net always consists of one output (class) node and zero or more data nodes, all discrete and taking one or more different values.
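
As a point of reference, the inference a naive classifier of this shape performs can be sketched as follows. This is a generic, textbook naive Bayes computation written for illustration; it is not taken from the Sleipnir sources. The names `Cpt`, `kMissing`, and `NaivePosterior` are hypothetical, evidence is passed as one unpacked value per node rather than in nibbles, and missing evidence is simply skipped (default values are not modeled).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one data node's CPT: row = class, column = value.
using Cpt = std::vector<std::vector<float>>;

// "No evidence" marker, matching the 0xF convention described for Evaluate.
const unsigned char kMissing = 0xF;

// Posterior probability of the last (largest) class value given one value per
// data node. priors[c] is P(class = c); cpts[n][c][v] is P(node n = v | class = c).
float NaivePosterior(const std::vector<float>& priors,
                     const std::vector<Cpt>& cpts,
                     const std::vector<unsigned char>& values) {
    assert(!priors.empty());
    assert(values.size() == cpts.size());
    // Start from the class prior and multiply in each observed likelihood.
    std::vector<double> joint(priors.begin(), priors.end());
    for (std::size_t n = 0; n < cpts.size(); ++n) {
        if (values[n] == kMissing)
            continue;  // ASSUMPTION: a missing node contributes no evidence
        for (std::size_t c = 0; c < joint.size(); ++c)
            joint[c] *= cpts[n][c][values[n]];
    }
    // Normalize over classes and report the largest class value.
    double sum = 0;
    for (double d : joint)
        sum += d;
    assert(sum > 0);
    return static_cast<float>(joint.back() / sum);
}
```

With priors of 0.9 and 0.1 and a single two-valued data node, a node with no evidence leaves the posterior equal to the prior for the last class.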

Definition at line 291 of file bayesnet.h.


Member Function Documentation

float Sleipnir::CBayesNetMinimal::Evaluate ( const std::vector< unsigned char > &  vecbDatum,
size_t  iOffset = 0 
) const

Perform Bayesian inference to obtain the class probability given evidence for some number of nodes.

Parameters:
vecbDatum  Values for each evidence node; 0xF indicates missing data (no evidence) for a particular node. Note that each evidence value is stored in four bits, not a full byte.
iOffset  Position of the first piece of evidence within vecbDatum; zero by default. This can be used to store multiple data in a single vector and rapidly perform inference for each subsequent data setting.
Returns:
Posterior probability of the largest value of the class node given the evidence (generally the probability of functional relationship).
Remarks:
Evidence is stored in nibbles, not full bytes, so for a network containing N evidence (non-root) nodes, vecbDatum must be of size at least iOffset + ceil(N/2).
See also:
IBayesNet::Evaluate
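
The nibble layout described in the remarks can be illustrated with a small packing helper. This is a sketch only: `PackNibbles` is a hypothetical function, not part of the class, and the nibble order within each byte (even-indexed node in the low four bits) is an assumption, since the documentation above does not state it. Check the library source before relying on that order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack one 4-bit value per evidence node into bytes, two nodes per byte.
// ASSUMPTION: even-indexed nodes occupy the low nibble and odd-indexed nodes
// the high nibble; the documented text does not specify the order.
std::vector<unsigned char> PackNibbles(const std::vector<unsigned char>& values) {
    // ceil(N / 2) bytes, pre-filled with 0xF nibbles so that an unused
    // half-byte at the end reads as "missing".
    std::vector<unsigned char> packed((values.size() + 1) / 2, 0xFF);
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] <= 0xF);  // each value must fit in four bits
        unsigned char& b = packed[i / 2];
        if (i % 2 == 0)
            b = static_cast<unsigned char>((b & 0xF0) | values[i]);
        else
            b = static_cast<unsigned char>((b & 0x0F) | (values[i] << 4));
    }
    return packed;
}
```

Three evidence values therefore occupy two bytes, which matches the ceil(N/2) size requirement given above.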

Definition at line 829 of file bayesnetfn.cpp.

References Sleipnir::CFullMatrix< tType >::Get(), Sleipnir::CFullMatrix< tType >::GetColumns(), Sleipnir::CMeta::GetNaN(), and Sleipnir::CFullMatrix< tType >::GetRows().

Referenced by Evaluate().

bool Sleipnir::CBayesNetMinimal::Evaluate ( const std::vector< unsigned char > &  vecbData,
float *  adResults,
size_t  iGenes,
size_t  iStart = 0 
) const

Repeatedly perform Bayesian inference to obtain the class probability given evidence for some number of nodes.

Parameters:
vecbData  Values for each evidence node; 0xF indicates missing data (no evidence) for a particular node. Note that each evidence value is stored in four bits, not a full byte. Multiple sets of evidence can be included in vecbData; e.g., for N nodes, bytes 0 through ceil(N/2) - 1 comprise one set of evidence, bytes ceil(N/2) through 2 * ceil(N/2) - 1 the next, and so forth.
adResults  Array into which posterior probabilities of the largest value of the class node are inserted given the evidence (generally probabilities of functional relationships).
iGenes  Number of inferences to perform and probabilities to generate.
iStart  First gene to process; this means that the first output probability is placed into the iStart element of adResults, and the first element read from vecbData is at iStart * ceil(N/2).
Returns:
True if evaluation was successful.

Perform Bayesian inference iGenes - iStart times using evidence from vecbData, which consists of zero or more sets of evidence values for the N non-root nodes in the Bayes net. In pseudocode:

 for( i = iStart; i < iGenes; ++i )
   adResults[ i ] = Evaluate( vecbData, i * floor((N+1)/2) );
Remarks:
Evidence is stored in nibbles, not full bytes, so for a network containing N evidence (non-root) nodes, vecbData must be of size at least iGenes * ceil(N/2).
See also:
IBayesNet::Evaluate
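
The offset arithmetic in the pseudocode can be made concrete with a short, self-contained sketch. `ChunkBytes` and `EvaluateAll` are hypothetical names introduced here; the `evaluate` callback stands in for the single-datum Evaluate overload, and nothing below is taken from the library's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one packed evidence set for `nodes` non-root nodes:
// ceil(nodes / 2), written with integer arithmetic.
std::size_t ChunkBytes(std::size_t nodes) { return (nodes + 1) / 2; }

// Hypothetical driver mirroring the pseudocode: calls evaluate(offset) once
// for each gene in [start, genes) and stores the result at the gene's index.
template <typename Eval>
void EvaluateAll(Eval evaluate, std::size_t nodes, std::size_t bufferBytes,
                 float* results, std::size_t genes, std::size_t start = 0) {
    assert(results != nullptr);
    // Size rule from the remarks: at least genes * ceil(nodes / 2) bytes.
    assert(bufferBytes >= genes * ChunkBytes(nodes));
    for (std::size_t i = start; i < genes; ++i)
        results[i] = evaluate(i * ChunkBytes(nodes));
}
```

For five evidence nodes each set takes three bytes, so starting at gene 1 of 4 reads offsets 3, 6, and 9 and leaves the first result slot untouched.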

Definition at line 898 of file bayesnetfn.cpp.

References Evaluate().

const CDataMatrix& Sleipnir::CBayesNetMinimal::GetCPT ( size_t  iNode) const [inline]

Return the conditional probability table matrix for the indicated node.

Parameters:
iNode  Index of node whose CPT is returned (zero-based).
Returns:
CPT of requested node.
Remarks:
Classes are row-oriented, data values are column-oriented. The root node is index zero, remaining data nodes begin at index one.

Definition at line 320 of file bayesnet.h.

Referenced by Sleipnir::CBayesNetSmile::Open().

const unsigned char Sleipnir::CBayesNetMinimal::GetDefault ( size_t  iNode) const [inline]

Returns the default value (if no input is provided) for the requested node.

Parameters:
iNode  Node for which default value is returned.
Returns:
Default value for the requested node (-1 if none).
Remarks:
Requested node index must be less than the number of nodes (beginning with the root node at index 0).

Definition at line 379 of file bayesnet.h.

Referenced by Sleipnir::CBayesNetSmile::Open().

const std::string& Sleipnir::CBayesNetMinimal::GetID ( ) const [inline]

Returns the string identifier of the network.

Returns:
String identifier for the network.
Remarks:
ID is not used internally and is purely for human convenience.

Definition at line 362 of file bayesnet.h.

size_t Sleipnir::CBayesNetMinimal::GetNodes ( ) const [inline]

Return the total number of nodes in the Bayes net.

Returns:
Number of nodes in the Bayes net (including root and data nodes).
Remarks:
Includes root/class node and non-root/data nodes.

Definition at line 334 of file bayesnet.h.

Referenced by Sleipnir::CBayesNetSmile::Open().

bool Sleipnir::CBayesNetMinimal::Open ( const CBayesNetSmile &  BNSmile)

Construct a new minimal Bayes net from the given SMILE-based network.

Parameters:
BNSmile  SMILE-based network from which to copy node parameters; must have naive structure.
Returns:
True if Bayes net was successfully constructed.
Remarks:
BNSmile must have only discrete nodes and naive structure.

Definition at line 719 of file bayesnetfn.cpp.

References Sleipnir::CBayesNetSmile::GetCPT(), Sleipnir::CBayesNetSmile::GetDefault(), Sleipnir::CBayesNetSmile::GetNodes(), and Sleipnir::CFullMatrix< tType >::GetRows().

bool Sleipnir::CBayesNetMinimal::Open ( std::istream &  istm)

Load a minimal Bayes net from the given binary stream.

Parameters:
istm  Stream from which Bayes net is loaded.
Returns:
True if Bayes net was successfully loaded.
Remarks:
istm must be binary and contain a minimal Bayes net stored by CBayesNetMinimal::Save.

Definition at line 756 of file bayesnetfn.cpp.

References Sleipnir::CFullMatrix< tType >::GetRows(), and Sleipnir::CFullMatrix< tType >::Open().

bool Sleipnir::CBayesNetMinimal::OpenCounts ( const char *  szFileCounts,
const std::map< std::string, size_t > &  mapstriNodes,
const std::vector< unsigned char > &  vecbDefaults,
const std::vector< float > &  vecdAlphas,
float  dPseudocounts = HUGE_VAL,
const CBayesNetMinimal *  pBNDefault = NULL 
)

Constructs a naive Bayesian classifier using count data for each network node.

Parameters:
szFileCounts  Text file containing counts from which CPTs are derived.
mapstriNodes  Mapping of node identifiers in counts file to integer indices (zero-based).
vecbDefaults  If non-empty, vector of default values for each node if data is missing (-1 for none).
vecdAlphas  If non-empty, vector of prior beliefs alpha for each node.
dPseudocounts  If not equal to NaN, effective sample size to use for all nodes.
pBNDefault  If non-null, Bayes net to use for default values when a distribution's counts are too sparse to use accurately.
Returns:
True if Bayes net was successfully constructed.

Creates a naive Bayesian classifier by estimating maximum likelihood parameter values from counts for each node's data values. These counts should be given in a text file where each set of counts is tab delimited in the form:

 network_name   number_of_nodes
 class  prior   counts
 node_name_1
 node   1   counts  for class   0
 node   1   counts  for class   1
 node_name_2
 ...

For example, suppose we are constructing a network with two output classes and three datasets, which can take two, five, or two distinct values, respectively. Valid count data might resemble:

 my_network_name    3
 90 10
 dataset_name_1
 80 20
 1  9
 dataset_name_2
 30 40  60  40  30
 5  10  20  30  35
 dataset_name_3
 15 19
 30 36

These would generate prior probabilities of 0.9 and 0.1 for the two classes, for example; CPTs for each node would similarly be calculated by dividing each set of counts by their sum. If default values are provided, they will be recorded and used during inference if there is no data available for the appropriate nodes. If a fallback network is provided, probability distributions with too few counts to estimate accurately will be replaced with fallback values.
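
The unregularized estimate described above amounts to dividing each row of counts by its sum. A minimal sketch, assuming one row per class and using a hypothetical helper `NormalizeCounts` that is not part of the class:

```cpp
#include <cassert>
#include <vector>

// Convert one row of counts (all values of a node for a single class, or the
// class prior counts) into probabilities by dividing by the row sum.
std::vector<float> NormalizeCounts(const std::vector<float>& counts) {
    double sum = 0;
    for (float c : counts)
        sum += c;
    assert(sum > 0);  // an all-zero row has no maximum likelihood estimate
    std::vector<float> probs;
    probs.reserve(counts.size());
    for (float c : counts)
        probs.push_back(static_cast<float>(c / sum));
    return probs;
}
```

Applied to the class prior counts 90 and 10 from the example file, this gives 0.9 and 0.1; applied to the first row of dataset_name_2 (30, 40, 60, 40, 30), it gives 0.15, 0.2, 0.3, 0.2, and 0.15.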

The parameters can be regularized by providing prior belief weights alpha and an effective sample size (pseudocounts). If given, CPT parameters will be calculated as if there were the requested pseudocount number of data points and a uniform prior for each node with relative weight alpha. For example, if dataset_name_1 in the example above was given a pseudocount of 5 and an alpha of 6, the CPT parameters would be calculated as:

 P(0|0) = (5 * 80 / (80 + 20) + 6 * 1 / 2) / (5 + 6) = 0.636
 P(1|0) = (5 * 20 / (80 + 20) + 6 * 1 / 2) / (5 + 6) = 0.364
 P(0|1) = (5 * 1 / (1 + 9) + 6 * 1 / 2) / (5 + 6) = 0.318
 P(1|1) = (5 * 9 / (1 + 9) + 6 * 1 / 2) / (5 + 6) = 0.682

Regularization "smooths" the parameters towards a uniform prior belief with strength alpha relative to the effective sample size (pseudocounts), so these probabilities are closer to 0.5 than they would be otherwise.
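
The worked arithmetic above follows a single formula, sketched here as a standalone function. `Regularized` is a hypothetical name chosen for illustration, and the code reproduces only the calculation shown in this section, not the library's internal routine.

```cpp
#include <cassert>
#include <cstddef>

// Smoothed P(value | class): the empirical frequency count / rowSum weighted
// by the pseudocount, blended with a uniform prior 1 / numValues weighted by
// alpha, then renormalized by the total weight.
float Regularized(float count, float rowSum, std::size_t numValues,
                  float pseudocounts, float alpha) {
    assert(rowSum > 0 && numValues > 0);
    assert(pseudocounts + alpha > 0);
    return (pseudocounts * count / rowSum + alpha / numValues) /
           (pseudocounts + alpha);
}
```

For example, `Regularized(80, 100, 2, 5, 6)` evaluates (5 * 0.8 + 3) / 11, the first of the four values above. Because the uniform term is the same for every value of a node, the smoothed probabilities in each row still sum to one.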

Remarks:
If non-empty, vecbDefaults and vecdAlphas must be of the same length as the number of classifier nodes (including the root node), which must also agree with the maximum node index in mapstriNodes.

Definition at line 991 of file bayesnetfn.cpp.

References Sleipnir::CFullMatrix< tType >::GetRows(), Sleipnir::CFullMatrix< tType >::Initialize(), Sleipnir::CFullMatrix< tType >::Set(), and Sleipnir::CMeta::Tokenize().

void Sleipnir::CBayesNetMinimal::Save ( std::ostream &  ostm) const

Save a minimal Bayes net to the given binary stream.

Parameters:
ostm  Stream to which Bayes net is saved.
See also:
CBayesNetMinimal::Open

Definition at line 790 of file bayesnetfn.cpp.

References Sleipnir::CFullMatrix< tType >::Save().

void Sleipnir::CBayesNetMinimal::SetID ( const std::string &  strID) [inline]

Sets the string identifier of the network.

Parameters:
strID  String identifier for the network.
Remarks:
ID is not used internally and is purely for human convenience.

Definition at line 348 of file bayesnet.h.


The documentation for this class was generated from the following files: