Implements IBayesNet for networks using the SMILE library from the U. Pittsburgh Decision Systems Lab.

#include <bayesnet.h>

Public Member Functions
	CBayesNetSmile (bool fGroup=true)
	Construct a new SMILE-based Bayes net.
bool	Open (const std::vector< std::string > &vecstrFiles, size_t iValues)
	Construct a new SMILE-based naive Bayes net with nodes corresponding to the given datasets.
bool	Open (const IDataset *pDataset, const std::vector< std::string > &vecstrNames, const std::vector< size_t > &veciDefaults)
	Construct a new SMILE-based naive Bayes net with nodes corresponding to the given datasets.
bool	Open (const CBayesNetSmile &BNPrior, const std::vector< CBayesNetSmile * > &vecpBNs)
	Construct a new SMILE-based naive Bayes net by merging the given class and data nodes.
bool	Open (const CBayesNetMinimal &BNMinimal, const std::vector< std::string > &vecstrNames)
	Creates a SMILE Bayes net equivalent to the given minimal naive Bayesian classifier.
float	Evaluate (size_t iNode, unsigned char bValue) const
	Evaluate the output of a Bayesian classifier given only a single node's evidence value.
unsigned char	GetDefault (size_t iNode) const
	Returns the default value (if any) for the requested node.
void	SetDefault (const CBayesNetSmile &Defaults)
	Provide a Bayes net of identical structure from which default parameter values can be obtained.
bool	Learn (const IDataset *pDataset, size_t iIterations, bool fZero=false, bool fELR=false)
	Learn conditional probabilities from data using Expectation Maximization, naive Bayesian learning, or Extended Logistic Regression.
bool	Evaluate (const std::vector< unsigned char > &vecbDatum, std::vector< float > &vecdResults, bool fZero=false, size_t iNode=0, bool fIgnoreMissing=false) const
	Perform Bayesian inference to obtain probabilities given values for each other Bayes net node.
bool	Evaluate (const CPCLPair &PCLData, CPCL &PCLResults, bool fZero=false, int iAlgorithm=DSL_ALG_BN_LAURITZEN) const
	Perform Bayesian inference to obtain probabilities over all nodes in the network given some amount of data.
void	GetNodes (std::vector< std::string > &vecstrNodes) const
	Retrieve the string IDs of all nodes in the Bayes net.
void	Randomize ()
	Randomizes every parameter in the Bayes net.
void	Randomize (size_t iNode)
	Randomizes every parameter in the requested node.
void	Reverse (size_t iNode)
	Reverses the parameters of the requested node over its possible values.
bool	Open (const char *szFile)
	Load a Bayes net from a file.
bool	Save (const char *szFile) const
	Save a Bayes net to a file.
bool	GetCPT (size_t iNode, CDataMatrix &MatCPT) const
	Retrieves the parameters of the requested Bayes net node.
unsigned char	GetValues (size_t iNode) const
	Returns the number of different values taken by the requested node.
bool	IsContinuous (size_t iNode) const
	Returns true if the requested node is non-discrete (e.g. Gaussian).
bool	IsContinuous () const
bool	Evaluate (const IDataset *pDataset, std::vector< std::vector< float > > &vecvecdResults, bool fZero) const
	Perform Bayesian inference to obtain probabilities for each element of a dataset.
bool	Evaluate (const IDataset *pDataset, CDat &DatResults, bool fZero) const
	Perform Bayesian inference to obtain probabilities for each element of a dataset.

Detailed Description

Implements IBayesNet for networks using the SMILE library from the U. Pittsburgh Decision Systems Lab.

CBayesNetSmile loads and saves Bayes nets from DSL/XDSL files and performs Bayesian inference using the SMILE library from the University of Pittsburgh Decision Systems Laboratory. While SMILE is used for internal representation of the Bayes net and for inference in many cases, Sleipnir implements several optimizations. Networks detected to have naive structures are learned and evaluated using more efficient maximum likelihood methods, and Sleipnir implements its own EM and ELR parameter learning algorithms. Naive SMILE-based Bayes nets can be converted to extremely efficient CBayesNetMinimal objects, and if the PNL library is present, they can also be converted to CBayesNetPNL representations.

Remarks:: Should minimally support any network type allowed by SMILE; only tested using discrete networks with hierarchical structure. Default values for individual nodes are stored in the "zero" property of the SMILE network (can be visualized in the "User Properties" pane in GeNIe); for example, to provide a default value of 2 for some node, give it a user property named "zero" with value "2".

Definition at line 53 of file bayesnet.h.

Constructor & Destructor Documentation

Sleipnir::CBayesNetSmile::CBayesNetSmile ( bool fGroup = true )

Construct a new SMILE-based Bayes net.

Parameters:

fGroup If true, group identical learning/evaluation examples together into a single heavily weighted example.

Remarks:: There's essentially never a reason to set fGroup to false.

Definition at line 98 of file bayesnetsmile.cpp.
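
The grouping controlled by fGroup can be pictured with the following minimal sketch. It is not Sleipnir's implementation; the Example type and GroupExamples function are hypothetical names introduced only for illustration. Identical examples are collapsed into a single entry whose weight is the number of times it occurred.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// One observation: a discrete value per node (hypothetical representation).
typedef std::vector<unsigned char> Example;

// Collapse identical examples into (example, weight) pairs, where the
// weight is the number of times that exact example occurred.
std::map<Example, std::size_t> GroupExamples(const std::vector<Example>& vecExamples) {
    std::map<Example, std::size_t> mapWeights;
    for (std::size_t i = 0; i < vecExamples.size(); ++i) {
        // All examples are expected to cover the same number of nodes.
        assert(vecExamples[i].size() == vecExamples[0].size());
        ++mapWeights[vecExamples[i]];
    }
    return mapWeights;
}
```

Learning or inference then needs to visit each distinct example only once and scale its contribution by the weight, which is presumably why grouping is rarely worth disabling.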

Member Function Documentation

float Sleipnir::CBayesNetSmile::Evaluate	(	size_t	iNode,
		unsigned char	bValue
	)		const

Evaluate the output of a Bayesian classifier given only a single node's evidence value.

Parameters:

iNode	Node for which evidence is set.
bValue	Value of evidence to set.

Returns:: Posterior probability of the classifier node's first value given the data.

Evaluates the posterior probability of the Bayesian network's first node (i.e. the class node) given only a single piece of evidence. This can be used to calculate the impact of a single dataset on predicted probabilities, for example.

Remarks:: Unlike other evaluation methods, ignores default values for all nodes.

Definition at line 453 of file bayesnetsmile.cpp.

References Sleipnir::CMeta::GetNaN(), and IsContinuous().

Referenced by Evaluate().
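
For a naive classifier, the posterior described above follows from Bayes' rule applied to one child node. The sketch below assumes the class prior and the likelihood of the observed value under each class value are already available as plain vectors; SingleEvidencePosterior is a hypothetical helper, not part of the Sleipnir API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Posterior of the first class value given one observed value of one child.
// vecdPrior[c]      : P(class = c)
// vecdLikelihood[c] : P(child = observed value | class = c)
float SingleEvidencePosterior(const std::vector<float>& vecdPrior,
                              const std::vector<float>& vecdLikelihood) {
    assert(!vecdPrior.empty() && vecdPrior.size() == vecdLikelihood.size());
    float dTotal = 0;
    for (std::size_t c = 0; c < vecdPrior.size(); ++c)
        dTotal += vecdPrior[c] * vecdLikelihood[c];  // P(observed value)
    assert(dTotal > 0);
    return vecdPrior[0] * vecdLikelihood[0] / dTotal;
}
```

Comparing this posterior with the prior of the first class value gives one way to gauge how strongly a single dataset shifts the prediction.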

bool Sleipnir::CBayesNetSmile::Evaluate	(	const std::vector< unsigned char > &	vecbDatum,
		std::vector< float > &	vecdResults,
		bool	fZero = `false`,
		size_t	iNode = `0`,
		bool	fIgnoreMissing = `false`
	)		const `[virtual]`

Perform Bayesian inference to obtain probabilities given values for each other Bayes net node.

Parameters:

vecbDatum	One-indexed values for each node in the Bayes net (zero indicates missing data).
vecdResults	Inferred probabilities for each possible value of the requested node.
fZero	If true, assume all missing values are zero (i.e. the first bin).
iNode	The node for which output probabilities are inferred.
fIgnoreMissing	If true, do not default missing values to zero or any other value.

Returns:: True if evaluation was successful.

This Evaluate assumes a discrete Bayes net and, given a vector of evidence values for each node, infers the probability distribution over possible values of the requested node.

Remarks:: vecbDatum should contain one plus the discrete bin value of each node, and a value of zero indicates missing data for the corresponding node. If the requested output node can take N values, the output vector will contain only the first N-1 probabilities, since the Nth can be calculated to sum to one.

Implements Sleipnir::IBayesNet.

Definition at line 413 of file bayesnetsmile.cpp.

References IsContinuous().
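
The one-indexed encoding of vecbDatum can be illustrated as follows. EncodeDatum is a hypothetical helper, and the use of a negative bin to mark missing data is a convention chosen for this sketch only.

```cpp
#include <cassert>
#include <vector>

// Encode zero-based bins as a one-indexed evidence vector: bin b becomes
// b + 1, and a negative bin (this sketch's marker for missing data)
// becomes 0.
std::vector<unsigned char> EncodeDatum(const std::vector<int>& veciBins) {
    std::vector<unsigned char> vecbDatum;
    for (int iBin : veciBins) {
        assert(iBin < 255);  // bin + 1 must fit in an unsigned char
        vecbDatum.push_back(iBin < 0 ? 0 : static_cast<unsigned char>(iBin + 1));
    }
    return vecbDatum;
}
```

A vector produced this way has the layout expected for vecbDatum, with one entry per node in network order.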

bool Sleipnir::CBayesNetSmile::Evaluate	(	const CPCLPair &	PCLData,
		CPCL &	PCLResults,
		bool	fZero = `false`,
		int	iAlgorithm = `DSL_ALG_BN_LAURITZEN`
	)		const `[virtual]`

Perform Bayesian inference to obtain probabilities over all nodes in the network given some amount of data.

Parameters:

PCLData	Input data; each column (experiment) is mapped by label to a node in the Bayes net, and PCL entries correspond to observed (or missing) data values.
PCLResults	Output probabilities; each column (experiment) is mapped to a node:value pair from the Bayes net, and PCL entries correspond to the probability of that value in that node.
fZero	If true, assume all missing values are zero (i.e. the first bin).
iAlgorithm	Implementation-specific ID of the Bayesian inference algorithm to use.

Returns:: True if evaluation was successful.

This version of Evaluate will perform one Bayesian inference for each row (gene) of the given PCLData. Here, each PCL "experiment" column corresponds to a node in the Bayes net as identified by the experiment labels in the PCL and the IDs of the Bayes net nodes. Values are read from the given PCL and (if present; missing values are allowed) discretized into Bayes net value bins using the accompanying quantization information. For each input row, all given non-missing values are observed for the appropriate Bayes net nodes, and Bayesian inference is used to provide probabilities for each remaining, unobserved node value.

Remarks:: PCLResults must be initialized with the correct number of experimental columns before calling Evaluate; that is, the total number of node values in the Bayes net. For example, if the Bayes net has three nodes A, B, and C, node A can take two values 0 and 1, and nodes B and C can take values 0, 1, and 2, then PCLResults must have 8 experimental columns corresponding to A:0, A:1, B:0, B:1, B:2, C:0, C:1, and C:2. Columns of PCLData are mapped to Bayes net nodes by experiment and node labels; experiment labels not corresponding to any Bayes net node ID are ignored, and Bayes net nodes with no corresponding experiment are assumed to be unobserved (hidden). Only the genes in PCLResults are used, and they need not be in the same order as in PCLData.

Implements Sleipnir::IBayesNet.

Definition at line 648 of file bayesnetsmile.cpp.

References Sleipnir::CPCL::Get(), Sleipnir::CPCL::GetExperiment(), Sleipnir::CPCL::GetExperiments(), Sleipnir::CPCL::GetGene(), Sleipnir::CPCL::GetGenes(), GetValues(), IsContinuous(), and Sleipnir::CPCL::Set().
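
The column layout required of PCLResults can be enumerated with a short sketch. ResultColumns is a hypothetical helper; the "node:value" label format mirrors the example in the remarks above, and whether the library requires exactly these labels is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Build "node:value" labels, one per value of each node, in node order.
// Each input pair is (node ID, number of values the node can take).
std::vector<std::string> ResultColumns(
    const std::vector<std::pair<std::string, std::size_t> >& vecNodes) {
    std::vector<std::string> vecstrColumns;
    for (const auto& node : vecNodes) {
        assert(node.second > 0);
        for (std::size_t i = 0; i < node.second; ++i)
            vecstrColumns.push_back(node.first + ":" + std::to_string(i));
    }
    return vecstrColumns;
}
```

For nodes A (two values), B, and C (three values each), this yields the eight columns A:0 through C:2 listed in the remarks.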

bool Sleipnir::CBayesNetSmile::Evaluate	(	const IDataset *	pDataset,
		std::vector< std::vector< float > > &	vecvecdResults,
		bool	fZero
	)		const `[inline, virtual]`

Perform Bayesian inference to obtain probabilities for each element of a dataset.

Parameters:

pDataset	Dataset to be used as input for inference.
vecvecdResults	Vector of output probabilities; each element of the outer vector represents the result for one gene pair, and each element of the inner vectors represents the probability for one possible value from the output node (i.e. the answer).
fZero	If true, assume all missing values are zero (i.e. the first bin).

Returns:: True if evaluation was successful.

The inverse of the corresponding IBayesNet::Learn method; given an IDataset, ignore the first (gold standard) dataset and infer the corresponding output probabilities for each other gene pair for which data is available. For each gene pair within the IDataset for which IDataset::IsExample is true, vecvecdResults will contain one vector. This vector will contain inferred probabilities for each possible value of the output node, generally the probability of functional unrelatedness (i.e. one minus the probability of functional relationship).

Remarks:: The order of datasets in the given IDataset must correspond to the order of nodes within the Bayes network, and the first dataset (index 0) is assumed to be a gold standard (and is thus ignored). Only data for which IDataset::IsExample is true will be used, which usually means that at least one other dataset must have a value. If the output node can take N values, each output vector will contain only the first N-1 probabilities, since the Nth can be calculated to sum to one.

Implements Sleipnir::IBayesNet.

Definition at line 120 of file bayesnet.h.

References Evaluate().
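
Because each output vector holds only the first N-1 probabilities, callers that need the full distribution must recover the last entry themselves. A minimal sketch, using a hypothetical CompleteDistribution helper:

```cpp
#include <cassert>
#include <vector>

// Append the implied last probability to a vector holding the first N-1
// probabilities of an N-valued node.
std::vector<float> CompleteDistribution(std::vector<float> vecdFirst) {
    float dSum = 0;
    for (float d : vecdFirst)
        dSum += d;
    assert(dSum >= 0 && dSum <= 1.0001f);  // small tolerance for rounding
    vecdFirst.push_back(1 - dSum);
    return vecdFirst;
}
```

For a two-valued output node the single returned entry is generally the probability of functional unrelatedness, so the appended entry is the probability of functional relationship.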

bool Sleipnir::CBayesNetSmile::Evaluate	(	const IDataset *	pDataset,
		CDat &	DatResults,
		bool	fZero
	)		const `[inline, virtual]`

Perform Bayesian inference to obtain probabilities for each element of a dataset.

Parameters:

pDataset	Dataset to be used as input for inference.
DatResults	Output CDat in which the inferred probability of functional relationship is stored for each gene pair.
fZero	If true, assume all missing values are zero (i.e. the first bin).

Returns:: True if evaluation was successful.

The inverse of the corresponding IBayesNet::Learn method; given an IDataset, ignore the first (gold standard) dataset and infer the corresponding output probability for each other gene pair for which data is available. For each gene pair within the IDataset for which IDataset::IsExample is true, the probability of functional relationship (i.e. the largest possible value of the output node) will be placed in the given CDat.

Remarks:: The order of datasets in the given IDataset must correspond to the order of nodes within the Bayes network, and the first dataset (index 0) is assumed to be a gold standard (and is thus ignored). Only data for which IDataset::IsExample is true will be used, which usually means that at least one other dataset must have a value.

Implements Sleipnir::IBayesNet.

Definition at line 125 of file bayesnet.h.

References Evaluate().

bool Sleipnir::CBayesNetSmile::GetCPT	(	size_t	iNode,
		CDataMatrix &	MatCPT
	)		const `[inline, virtual]`

Retrieves the parameters of the requested Bayes net node.

Parameters:

iNode	Index of node for which parameters should be retrieved.
MatCPT	Parameters of the requested node in tabular form; the columns of the matrix represent parental values, the rows node values.

Returns:: True if parameter retrieval succeeded, false if it failed or the requested node has more than one parent.

Retrieves node parameters in an implementation-specific manner, often only allowing nodes with at most one parent. For discrete nodes, matrix entries are generally conditional probabilities. For continuous nodes, matrix entries may represent distribution parameters such as Gaussian mean and standard deviation.

Remarks:: Only allowed for nodes with at most one parent; nodes with more parents are supported by some implementations, but their parameters can't be retrieved by this function.

Implements Sleipnir::IBayesNet.

Definition at line 103 of file bayesnet.h.

Referenced by Sleipnir::CBayesNetMinimal::Open().
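
The tabular layout described for MatCPT (rows are node values, columns are parental values) implies that, for a discrete node, each column would be expected to form a probability distribution. The sketch below checks this on a plain nested vector standing in for CDataMatrix; Table and ColumnSum are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a CPT: matCPT[row][column], rows = node values,
// columns = parent values.
typedef std::vector<std::vector<float> > Table;

// Sum of one column, i.e. the node's distribution for one parent value.
float ColumnSum(const Table& matCPT, std::size_t iColumn) {
    float dSum = 0;
    for (const auto& vecdRow : matCPT) {
        assert(iColumn < vecdRow.size());
        dSum += vecdRow[iColumn];
    }
    return dSum;
}
```

A column sum far from one would suggest either a continuous node, whose entries are distribution parameters rather than probabilities, or a transposed table.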

unsigned char Sleipnir::CBayesNetSmile::GetDefault ( size_t iNode ) const

Returns the default value (if any) for the requested node.

Parameters:

iNode Node whose default value should be retrieved.

Returns:: Default value of the requested node, or -1 if none exists.

Definition at line 483 of file bayesnetsmile.cpp.

Referenced by Sleipnir::CBayesNetMinimal::Open().

void Sleipnir::CBayesNetSmile::GetNodes ( std::vector< std::string > & vecstrNodes ) const [virtual]

Retrieve the string IDs of all nodes in the Bayes net.

Parameters:

vecstrNodes Output containing the IDs of all nodes in the Bayes net.

Implements Sleipnir::IBayesNet.

Definition at line 334 of file bayesnetsmile.cpp.

Referenced by Sleipnir::CBayesNetMinimal::Open().

unsigned char Sleipnir::CBayesNetSmile::GetValues ( size_t iNode ) const [inline, virtual]

Returns the number of different values taken by the requested node.

Parameters:

iNode Bayes net node for which values should be returned.

Returns:: Number of different values taken by the requested node.

Remarks:: Not applicable for continuous nodes.

Implements Sleipnir::IBayesNet.

Definition at line 107 of file bayesnet.h.

Referenced by Evaluate().

bool Sleipnir::CBayesNetSmile::IsContinuous ( size_t iNode ) const [inline, virtual]

Returns true if the requested node is non-discrete (e.g. Gaussian).

Parameters:

iNode Node to be inspected.

Returns:: True if the requested node is continuous.

Implements Sleipnir::IBayesNet.

Definition at line 111 of file bayesnet.h.

References IsContinuous().

Referenced by Evaluate(), and IsContinuous().

bool Sleipnir::CBayesNetSmile::Learn	(	const IDataset *	pDataset,
		size_t	iIterations,
		bool	fZero = `false`,
		bool	fELR = `false`
	)		`[virtual]`

Learn conditional probabilities from data using Expectation Maximization, naive Bayesian learning, or Extended Logistic Regression.

Parameters:

pDataset	Dataset to be used for learning.
iIterations	Maximum number of iterations for EM or ELR.
fZero	If true, assume all missing values are zero (i.e. the first bin).
fELR	If true, use ELR to learn network parameters.

Returns:: True if parameters were learned successfully.

Using the given IDataset, learn parameters for the underlying Bayes network. If requested, learning is performed discriminatively using Extended Logistic Regression due to Greiner, Zhou, et al. Otherwise, maximum likelihood estimates are used for naive structures, and Expectation Maximization is used for other network structures.

Remarks:: The order of datasets in the given IDataset must correspond to the order of nodes within the Bayes network, and the first dataset (index 0) is assumed to be a gold standard. Only data for which IDataset::IsExample is true will be used, which usually means that the first dataset and at least one other dataset must have a value.

Implements Sleipnir::IBayesNet.

Definition at line 267 of file bayesnetsmile.cpp.
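
For naive structures, maximum likelihood estimation reduces to counting. The following sketch estimates one child node's conditional probability table from (class value, child value) pairs; it is an illustration under simplifying assumptions (no smoothing, no missing data, no default network), and EstimateCPT is a hypothetical name rather than a Sleipnir function.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Estimate P(child = v | class = c) by counting co-occurrences.
// Each example is (class value, child value), both zero-based.
// The result is indexed [child value][class value].
std::vector<std::vector<float> > EstimateCPT(
    const std::vector<std::pair<std::size_t, std::size_t> >& vecExamples,
    std::size_t iClasses, std::size_t iValues) {
    std::vector<std::vector<float> > matCPT(iValues, std::vector<float>(iClasses, 0.0f));
    std::vector<float> vecdTotals(iClasses, 0.0f);
    for (const auto& ex : vecExamples) {
        assert(ex.first < iClasses && ex.second < iValues);
        matCPT[ex.second][ex.first] += 1;
        vecdTotals[ex.first] += 1;
    }
    // Normalize each class column; columns with no examples stay at zero.
    for (std::size_t v = 0; v < iValues; ++v)
        for (std::size_t c = 0; c < iClasses; ++c)
            if (vecdTotals[c] > 0)
                matCPT[v][c] /= vecdTotals[c];
    return matCPT;
}
```

Networks with hidden nodes or non-naive structure cannot be handled by simple counting, which is where Expectation Maximization comes in.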

bool Sleipnir::CBayesNetSmile::Open	(	const std::vector< std::string > &	vecstrFiles,
		size_t	iValues
	)

Construct a new SMILE-based naive Bayes net with nodes corresponding to the given datasets.

Parameters:

vecstrFiles	Filenames of datasets, one per node.
iValues	Number of values into which each dataset will be quantized.

Returns:: True if Bayes net was successfully constructed.

This version of Open can be used to quickly construct a uniform, naive Bayes net corresponding to a particular set of data. These data files are usually PCLs or DATs containing microarray data, since large numbers of microarray datasets can be processed in this manner. In addition to one node per given file, one additional class node will be created at the top of the naive model with two possible values (generally corresponding to functional unrelatedness or relatedness).

See also:: CDat | CDataPair | CPCL | CPCLPair

Definition at line 725 of file bayesnetsmile.cpp.

References Sleipnir::CMeta::Deextension(), and Sleipnir::CMeta::Filename().

bool Sleipnir::CBayesNetSmile::Open	(	const IDataset *	pData,
		const std::vector< std::string > &	vecstrNames,
		const std::vector< size_t > &	veciDefaults
	)

Construct a new SMILE-based naive Bayes net with nodes corresponding to the given datasets.

Parameters:

pData	Datasets from which new Bayes net nodes should be constructed.
vecstrNames	String identifiers of the newly constructed nodes.
veciDefaults	Default values (if any) for missing data from each dataset. -1 is ignored, any other value is used as a default value when data is missing for the corresponding node.

Returns:: True if Bayes net was successfully constructed.

Constructs a naive Bayes classifier from the given datasets, with one node per dataset plus one additional class node at the top of the naive model. This class node corresponds to the first dataset in pData and will take two values, generally corresponding to functional unrelatedness and relatedness. Each other node is named as indicated and takes the number of discrete values indicated by the dataset. Each value in veciDefaults not equal to -1 is used as a default when data is missing for the corresponding node.

Remarks:: The order and length of pData, vecstrNames, and veciDefaults must be identical.

Definition at line 779 of file bayesnetsmile.cpp.

References Sleipnir::IDataset::GetBins(), and Sleipnir::IDataset::GetExperiments().

bool Sleipnir::CBayesNetSmile::Open	(	const CBayesNetSmile &	BNPrior,
		const std::vector< CBayesNetSmile * > &	vecpBNs
	)

Construct a new SMILE-based naive Bayes net by merging the given class and data nodes.

Parameters:

BNPrior	Bayes net from which class (root) node is taken.
vecpBNs	Bayes nets from which data (child) nodes are taken.

Returns:: True if Bayes net was successfully constructed.

Constructs a new SMILE-based Bayes net by merging the root (prior or class) node from one Bayes net with the child (non-root) nodes from zero or more other networks. For example, suppose BNPrior is a naive network with root P1 and children P2 and P3, and vecpBNs contains two networks, one with root A1 and data nodes A2 and A3 and one with root B1 and child node B2. The newly constructed Bayes net will have a root node with P1's parameters and three children with the parameters of A2, A3, and B2. This can be used to merge multiple naive classifiers created independently from the same answer set.

Remarks:: In the prior (class) network, only the root (first) node is used. In the data (child) networks, the root (first) node is ignored and all remaining nodes are copied into the new network as child nodes.

Definition at line 836 of file bayesnetsmile.cpp.
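
The merge can be traced on node IDs alone. In this sketch each naive network is reduced to a list of IDs with the root first; NodeList and MergeNaive are hypothetical names, and real networks would also carry the parameters of each node.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A naive network reduced to its node IDs: root first, then children.
typedef std::vector<std::string> NodeList;

// Keep the prior network's root and append every non-root node of each
// data network, in order.
NodeList MergeNaive(const NodeList& vecstrPrior, const std::vector<NodeList>& vecData) {
    assert(!vecstrPrior.empty());
    NodeList vecstrMerged(1, vecstrPrior[0]);  // only the prior's root is used
    for (const auto& vecstrNet : vecData)
        if (vecstrNet.size() > 1)  // skip the root, copy the children
            vecstrMerged.insert(vecstrMerged.end(), vecstrNet.begin() + 1, vecstrNet.end());
    return vecstrMerged;
}
```

With a prior of P1, P2, P3 and data networks A1, A2, A3 and B1, B2, the result is P1, A2, A3, B2.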

bool Sleipnir::CBayesNetSmile::Open	(	const CBayesNetMinimal &	BNMinimal,
		const std::vector< std::string > &	vecstrNames
	)

Creates a SMILE Bayes net equivalent to the given minimal naive Bayesian classifier.

Parameters:

BNMinimal	Minimal naive Bayesian classifier to be copied into the new SMILE network.
vecstrNames	Node IDs to be assigned to the SMILE network nodes.

Returns:: True if Bayes net was successfully constructed.

Remarks:: vecstrNames must contain the same number of strings as BNMinimal has non-root nodes.

Definition at line 883 of file bayesnetsmile.cpp.

References Sleipnir::CMeta::Filename(), Sleipnir::CFullMatrix< tType >::Get(), Sleipnir::CFullMatrix< tType >::GetColumns(), Sleipnir::CBayesNetMinimal::GetCPT(), Sleipnir::CBayesNetMinimal::GetDefault(), Sleipnir::CBayesNetMinimal::GetNodes(), and Sleipnir::CFullMatrix< tType >::GetRows().

bool Sleipnir::CBayesNetSmile::Open ( const char * szFile ) [inline, virtual]

Load a Bayes net from a file.

Parameters:

szFile Path to file.

Returns:: True if Bayes net was loaded successfully.

Remarks:: Behavior is implementation specific; it is assumed that the network will be completely reinitialized from the given file, although it may be left in an inconsistent state if the return value is false.

Implements Sleipnir::IBayesNet.

Definition at line 95 of file bayesnet.h.

void Sleipnir::CBayesNetSmile::Randomize ( ) [virtual]

Randomizes every parameter in the Bayes net.

Remarks:: Parameter values are generated uniformly at random and normalized to represent a valid probability distribution.

Implements Sleipnir::IBayesNet.

Definition at line 494 of file bayesnetsmile.cpp.
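
One way to generate parameters uniformly at random and normalize them into a valid distribution is sketched below. It uses the standard library's random facilities rather than whatever generator Sleipnir uses internally, and RandomDistribution is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw iValues uniform random weights and normalize them to sum to one.
std::vector<float> RandomDistribution(std::size_t iValues, std::mt19937& rng) {
    assert(iValues > 0);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    std::vector<float> vecd(iValues);
    float dSum = 0;
    for (float& d : vecd) {
        d = uniform(rng);
        dSum += d;
    }
    if (dSum <= 0)  // degenerate draw: fall back to a flat distribution
        return std::vector<float>(iValues, 1.0f / iValues);
    for (float& d : vecd)
        d /= dSum;
    return vecd;
}
```

Randomizing a whole node would repeat this once per parameter column, and randomizing the network would repeat it for every node.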

void Sleipnir::CBayesNetSmile::Randomize ( size_t iNode ) [virtual]

Randomizes every parameter in the requested node.

Parameters:

iNode Index of node to be randomized.

Remarks:: Parameter values are generated uniformly at random and normalized to represent a valid probability distribution.

Implements Sleipnir::IBayesNet.

Definition at line 504 of file bayesnetsmile.cpp.

void Sleipnir::CBayesNetSmile::Reverse ( size_t iNode ) [virtual]

Reverses the parameters of the requested node over its possible values.

Parameters:

iNode Index of node to be reversed.

Remarks:: May be ignored by some implementations, particularly continuously valued nodes.

Implements Sleipnir::IBayesNet.

Definition at line 523 of file bayesnetsmile.cpp.
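
One plausible reading of reversing a node's parameters over its values is that, for every parent value, the probability of the first node value is swapped with that of the last, the second with the second to last, and so on. The sketch below implements that reading on a nested vector with rows as node values; this interpretation, along with the Table and ReverseValues names, is an assumption and not taken from the Sleipnir source.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in for a CPT: matCPT[row][column], rows = node values,
// columns = parent values.
typedef std::vector<std::vector<float> > Table;

// Reverse the order of node values; each column stays a valid
// distribution because only the row order changes.
void ReverseValues(Table& matCPT) {
    assert(!matCPT.empty());
    std::reverse(matCPT.begin(), matCPT.end());
}
```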

bool Sleipnir::CBayesNetSmile::Save ( const char * szFile ) const [inline, virtual]

Save a Bayes net to a file.

Parameters:

szFile Path to file.

Returns:: True if Bayes net was saved successfully.

Remarks:: Behavior is implementation specific; the Bayes net will not be modified, but the contents of the output file may be inconsistent if the return value is false.

Implements Sleipnir::IBayesNet.

Definition at line 99 of file bayesnet.h.

void Sleipnir::CBayesNetSmile::SetDefault ( const CBayesNetSmile & Defaults ) [inline]

Provide a Bayes net of identical structure from which default parameter values can be obtained.

Parameters:

Defaults Bayes net with identical structure whose parameters will be used when insufficient data is available during parameter learning.

If a default network is provided, the IBayesNet::Learn method will use that network's probability distribution for any parameter column in which fewer than CBayesNetSmileImpl::c_iMinimum examples are present. This prevents error introduced by methods such as Laplace smoothing when too few examples are present to estimate a reasonable maximum likelihood probability distribution.

Definition at line 81 of file bayesnet.h.
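
The fallback rule can be sketched for a single parameter column. The threshold is passed in as iMinimum because the value of CBayesNetSmileImpl::c_iMinimum is not given here; ChooseColumn is a hypothetical helper, and the plain counting estimate stands in for whatever estimator Learn actually applies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Choose the distribution for one parameter column: the counted estimate
// when at least iMinimum examples were seen, otherwise the corresponding
// column of the default network.
std::vector<float> ChooseColumn(const std::vector<std::size_t>& veciCounts,
                                const std::vector<float>& vecdDefault,
                                std::size_t iMinimum) {
    assert(veciCounts.size() == vecdDefault.size());
    std::size_t iTotal = 0;
    for (std::size_t i : veciCounts)
        iTotal += i;
    if (iTotal < iMinimum || iTotal == 0)
        return vecdDefault;
    std::vector<float> vecd(veciCounts.size());
    for (std::size_t i = 0; i < vecd.size(); ++i)
        vecd[i] = static_cast<float>(veciCounts[i]) / iTotal;
    return vecd;
}
```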

The documentation for this class was generated from the following files:

src/bayesnet.h
src/bayesnetsmile.cpp
