Sleipnir
Augments a dataset with a mask to dynamically exclude specific gene pairs.
#include <dataset.h>
Public Member Functions
void Attach (const IDataset *pDataset)
    Associates the data mask with the given dataset.
void AttachRandom (const IDataset *pDataset, float dFraction)
    Associates the data mask with the given dataset and randomly hides a fraction of its data.
void AttachComplement (const CDataMask &DataMask)
    Associates the data mask with the given mask's underlying dataset and reverses its mask.
bool IsExample (size_t iY, size_t iX) const
    Returns true if some data file can be accessed at the requested position.
void Remove (size_t iY, size_t iX)
    Remove all data for the given dataset position.
const std::vector< std::string > & GetGeneNames () const
    Return a vector of all gene names in the dataset.
size_t GetExperiments () const
    Return the number of experimental nodes in the dataset.
size_t GetGene (const std::string &strGene) const
    Return the index of the given gene name, or -1 if it is not included in the dataset.
size_t GetBins (size_t iNode) const
    Return the number of discrete values in the requested experimental node; -1 if the node is hidden or continuous.
size_t GetGenes () const
    Returns the number of genes in the dataset.
bool IsHidden (size_t iNode) const
    Returns true if the requested experimental node is hidden (does not correspond to a data file).
size_t GetDiscrete (size_t iY, size_t iX, size_t iNode) const
    Return the discretized value at the requested position.
float GetContinuous (size_t iY, size_t iX, size_t iNode) const
    Return the continuous value at the requested position.
const std::string & GetGene (size_t iGene) const
    Returns the gene name at the requested index.
void FilterGenes (const CGenes &Genes, CDat::EFilter eFilter)
    Remove values from the dataset based on the given gene set and filter type.
void Save (std::ostream &ostm, bool fBinary) const
    Save a dataset to the given stream in binary or tabular (human readable) form.
Augments a dataset with a mask to dynamically exclude specific gene pairs.
A data mask wraps an underlying dataset with a binary matrix that allows each gene pair to be masked individually. A masked gene pair returns false from IsExample and behaves like missing data; unmasked gene pairs are retrieved from the underlying dataset. This allows data to be hidden temporarily without modifying the underlying dataset. A data mask can be used in combination with CDataFilter.
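The wrapping behavior described above can be illustrated with a small, self-contained model. This is only a sketch: `ToyDataset`, `ToyMask`, `Hide`, and `Index` are hypothetical names invented for illustration, not Sleipnir classes or methods, and the storage layout (a full symmetric matrix with NaN for missing values) is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an underlying dataset (not a Sleipnir class):
// a full symmetric matrix of pair values, with NaN marking missing data.
struct ToyDataset {
    std::size_t genes;
    std::vector<float> values;  // genes * genes entries, row-major

    bool IsExample(std::size_t y, std::size_t x) const {
        assert(y < genes && x < genes);
        return !std::isnan(values[y * genes + x]);
    }
};

// Hypothetical mask: one visibility bit per gene pair, layered over the data.
class ToyMask {
public:
    explicit ToyMask(const ToyDataset& data)
        : data_(data), visible_(data.genes * data.genes, true) {}

    // Hide a pair in the mask only; the underlying dataset is untouched.
    void Hide(std::size_t y, std::size_t x) { visible_[Index(y, x)] = false; }

    // A pair is usable only if it is unmasked and present in the dataset.
    bool IsExample(std::size_t y, std::size_t x) const {
        return visible_[Index(y, x)] && data_.IsExample(y, x);
    }

private:
    // Map (y, x) and (x, y) to the same slot, since gene pairs are unordered.
    std::size_t Index(std::size_t y, std::size_t x) const {
        assert(y < data_.genes && x < data_.genes);
        return y < x ? y * data_.genes + x : x * data_.genes + y;
    }

    const ToyDataset& data_;
    std::vector<bool> visible_;
};
```

In this model, hiding a pair changes only what the mask reports; querying the toy dataset directly still returns the original answer, which mirrors the "temporarily hidden without modifying the underlying dataset" property.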
void Sleipnir::CDataMask::Attach (const IDataset *pDataset)
Associates the data mask with the given dataset.
pDataset | Dataset to be associated with the overlaying mask. |
Definition at line 713 of file dataset.cpp.
References Sleipnir::IDataset::GetGenes(), Sleipnir::CHalfMatrix< tType >::GetSize(), Sleipnir::CBinaryMatrix::Initialize(), Sleipnir::IDataset::IsExample(), and Sleipnir::CBinaryMatrix::Set().
Referenced by AttachComplement(), and AttachRandom().
void Sleipnir::CDataMask::AttachComplement (const CDataMask &DataMask)
Associates the data mask with the given mask's underlying dataset and reverses its mask.
DataMask | Mask to be reversed by the current mask. |
This associates the current mask with the given mask's underlying dataset and generates an inverted mask: all pairs hidden in the given mask are unhidden, and all unhidden pairs are hidden. Data pairs missing in the underlying dataset will return false from IsExample regardless.
Definition at line 698 of file dataset.cpp.
References Attach(), Sleipnir::CBinaryMatrix::Get(), Sleipnir::CHalfMatrix< tType >::GetSize(), and Sleipnir::CBinaryMatrix::Set().
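The inversion rule for AttachComplement can be sketched with flat vectors of flags. The names `PairFlags`, `Complement`, and the free function `IsExample` below are hypothetical and exist only for this illustration; they are not part of Sleipnir. The sketch shows why a pair that is missing in the underlying dataset is unusable under both the original and the complementary mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat representation: one flag per gene pair.
struct PairFlags {
    std::vector<bool> present;  // underlying dataset has a non-missing value
    std::vector<bool> visible;  // mask bit: true = not hidden
};

// Build the complementary mask: hidden pairs become visible and vice versa.
// The "present" flags belong to the shared dataset and are copied unchanged.
PairFlags Complement(const PairFlags& in) {
    assert(in.present.size() == in.visible.size());
    PairFlags out{in.present, std::vector<bool>(in.visible.size())};
    for (std::size_t i = 0; i < in.visible.size(); ++i) {
        out.visible[i] = !in.visible[i];
    }
    return out;
}

// A pair is a usable example only if it is both visible and present.
bool IsExample(const PairFlags& flags, std::size_t i) {
    assert(i < flags.visible.size());
    return flags.visible[i] && flags.present[i];
}
```

For a pair that is present in the data, exactly one of the two masks reports it as an example; for a pair that is missing, neither does.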
void Sleipnir::CDataMask::AttachRandom (const IDataset *pDataset, float dFraction)
Associates the data mask with the given dataset and randomly hides a fraction of its data.
pDataset | Dataset to be associated with the overlaying mask. |
dFraction | Fraction of gene pairs (between 0 and 1) to be randomly masked. |
Definition at line 679 of file dataset.cpp.
References Attach(), Sleipnir::CBinaryMatrix::Get(), Sleipnir::CHalfMatrix< tType >::GetSize(), and Sleipnir::CBinaryMatrix::Set().
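One way to hide a random fraction of gene pairs is to shuffle the pair indices and mask a prefix of the shuffled order. The sketch below takes that approach; it is an assumption chosen for illustration, since this page does not describe how AttachRandom samples pairs. The function name `RandomVisibility` and its `seed` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns one visibility flag per gene pair, with floor(fraction * pairs)
// pairs hidden (false) at randomly chosen positions.
std::vector<bool> RandomVisibility(std::size_t pairs, float fraction,
                                   unsigned seed) {
    assert(fraction >= 0.0f && fraction <= 1.0f);

    // Shuffle the pair indices so the hidden subset is chosen at random.
    std::vector<std::size_t> order(pairs);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    // Hide the first `hidden` indices of the shuffled order.
    const std::size_t hidden =
        static_cast<std::size_t>(static_cast<double>(fraction) * pairs);
    std::vector<bool> visible(pairs, true);
    for (std::size_t i = 0; i < hidden; ++i) {
        visible[order[i]] = false;
    }
    return visible;
}
```

Masking a shuffled prefix hides an exact number of pairs, whereas drawing each pair independently with probability `fraction` would hide that fraction only on average; either reading is consistent with the description above.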
void Sleipnir::CDataMask::FilterGenes (const CGenes &Genes, CDat::EFilter eFilter) [inline, virtual]
Remove values from the dataset based on the given gene set and filter type.
Genes | Gene set used to filter the dataset. |
eFilter | Way in which to use the given genes to remove values. |
Remove values and genes (by removing all incident edges) from the dataset based on one of several algorithms. For details, see CDat::EFilter.
Implements Sleipnir::IDataset.
size_t Sleipnir::CDataMask::GetBins (size_t iNode) const [inline, virtual]
Return the number of discrete values in the requested experimental node, or -1 if the node is hidden or continuous. Because the return type is the unsigned size_t, -1 is returned as the largest size_t value.
iNode | Experimental node for which bin number should be returned. |
Implements Sleipnir::IDataset.
float Sleipnir::CDataMask::GetContinuous (size_t iY, size_t iX, size_t iNode) const [inline, virtual]
Return the continuous value at the requested position.
iY | Data row. |
iX | Data column. |
iNode | Experimental node from which to retrieve the requested pair's value. |
Implements Sleipnir::IDataset.
size_t Sleipnir::CDataMask::GetDiscrete (size_t iY, size_t iX, size_t iNode) const [inline, virtual]
Return the discretized value at the requested position.
iY | Data row. |
iX | Data column. |
iNode | Experimental node from which to retrieve the requested pair's value. |
Implements Sleipnir::IDataset.
size_t Sleipnir::CDataMask::GetExperiments () const [inline, virtual]
Return the number of experimental nodes in the dataset.
Implements Sleipnir::IDataset.
size_t Sleipnir::CDataMask::GetGene (const std::string &strGene) const [inline, virtual]
Return the index of the given gene name, or -1 if it is not included in the dataset. Because the return type is the unsigned size_t, -1 is returned as the largest size_t value.
strGene | Gene name to retrieve. |
Implements Sleipnir::IDataset.
Definition at line 680 of file dataset.h.
Referenced by GetGene().
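Because the name lookup returns -1 through an unsigned size_t, a caller has to compare the result against that converted value rather than test for a negative number. The sketch below illustrates the pattern with standalone helpers; `kNotFound`, `FindGene`, and `RequireGene` are hypothetical names for this example and are not Sleipnir identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel: -1 converted to size_t is the largest size_t value.
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Linear lookup of a gene name; returns kNotFound when the name is absent.
std::size_t FindGene(const std::vector<std::string>& names,
                     const std::string& gene) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == gene) {
            return i;
        }
    }
    return kNotFound;
}

// Checks the sentinel before indexing, so a missing gene fails loudly
// instead of reading far out of bounds.
const std::string& RequireGene(const std::vector<std::string>& names,
                               const std::string& gene) {
    const std::size_t index = FindGene(names, gene);
    assert(index != kNotFound);
    return names[index];
}
```

A check such as `index < 0` can never be true for an unsigned type, so the explicit comparison against the sentinel is the part that matters.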
const std::string& Sleipnir::CDataMask::GetGene (size_t iGene) const [inline, virtual]
Returns the gene name at the requested index.
iGene | Index of gene name to return. |
Implements Sleipnir::IDataset.
Definition at line 704 of file dataset.h.
References GetGene().
const std::vector<std::string>& Sleipnir::CDataMask::GetGeneNames () const [inline, virtual]
Return a vector of all gene names in the dataset.
Implements Sleipnir::IDataset.
size_t Sleipnir::CDataMask::GetGenes () const [inline, virtual]
Returns the number of genes in the dataset.
Implements Sleipnir::IDataset.
bool Sleipnir::CDataMask::IsExample (size_t iY, size_t iX) const [inline, virtual]
Returns true if some data file can be accessed at the requested position.
iY | Data row. |
iX | Data column. |
A dataset position is a usable example if at least one data file can be accessed at that position; that is, if some data file provides a non-missing value for that gene pair. Implementations that filter pairs in some manner can also prevent particular positions from being usable examples.
Implements Sleipnir::IDataset.
Definition at line 664 of file dataset.h.
References Sleipnir::CBinaryMatrix::Get().
bool Sleipnir::CDataMask::IsHidden (size_t iNode) const [inline, virtual]
Returns true if the requested experimental node is hidden (does not correspond to a data file).
iNode | Experimental node to investigate. |
Since a dataset can be constructed either directly on a collection of data files or by tying a model such as a Bayes net to data files, IDataset can determine which model nodes are hidden by testing whether a data file exists for them. If no such file exists, the node is hidden and, for example, can be treated specially during Bayesian learning.
Implements Sleipnir::IDataset.
void Sleipnir::CDataMask::Remove (size_t iY, size_t iX) [inline, virtual]
Remove all data for the given dataset position.
iY | Data row. |
iX | Data column. |
Unloads or masks data from all encapsulated files for the requested gene pair.
Implements Sleipnir::IDataset.
Definition at line 668 of file dataset.h.
References Sleipnir::CBinaryMatrix::Set().
void Sleipnir::CDataMask::Save (std::ostream &ostm, bool fBinary) const [inline, virtual]
Save a dataset to the given stream in binary or tabular (human readable) form.
ostm | Stream into which dataset is saved. |
fBinary | If true, save the dataset as a binary file; if false, save it as a text-based tab-delimited file. |
Implements Sleipnir::IDataset.