Sleipnir
An IDataset abstracts a collection of individual datasets, usually CDats, using various continuous and/or discrete encodings.
#include <dataset.h>
Public Member Functions
virtual bool IsHidden (size_t iNode) const =0
  Returns true if the requested experimental node is hidden (does not correspond to a data file).
virtual size_t GetDiscrete (size_t iY, size_t iX, size_t iNode) const =0
  Return the discretized value at the requested position.
virtual float GetContinuous (size_t iY, size_t iX, size_t iNode) const =0
  Return the continuous value at the requested position.
virtual const std::string & GetGene (size_t iGene) const =0
  Returns the gene name at the requested index.
virtual size_t GetGenes () const =0
  Returns the number of genes in the dataset.
virtual bool IsExample (size_t iY, size_t iX) const =0
  Returns true if some data file can be accessed at the requested position.
virtual const std::vector<std::string> & GetGeneNames () const =0
  Return a vector of all gene names in the dataset.
virtual size_t GetExperiments () const =0
  Return the number of experimental nodes in the dataset.
virtual size_t GetGene (const std::string &strGene) const =0
  Return the index of the given gene name, or -1 if it is not included in the dataset.
virtual size_t GetBins (size_t iNode) const =0
  Return the number of discrete values in the requested experimental node; -1 if the node is hidden or continuous.
virtual void Remove (size_t iY, size_t iX)=0
  Remove all data for the given dataset position.
virtual void FilterGenes (const CGenes &Genes, CDat::EFilter eFilter)=0
  Remove values from the dataset based on the given gene set and filter type.
virtual void Save (std::ostream &ostm, bool fBinary) const =0
  Save a dataset to the given stream in binary or tabular (human-readable) form.
An IDataset is intended to manage a collection of individual datasets, usually CDats. This is often used for integration of many datasets in a model such as a Bayes net or SVM, and as such, IDatasets can be used to learn or evaluate these models. Although most datasets will be backed by discretized CDats with no hidden data (e.g. CDatasetCompact), the IDataset interface allows:
The IDataset interface merges the gene lists from all contained data files into a single gene list, which it exposes through GetGenes, GetGene, GetGeneNames, and related methods. Gene indices are normalized in the same way: requesting gene pair (i, j) refers to the same two genes in every encapsulated dataset. Missing values are filled in as necessary for data files that contain no information about the requested pair. QUANT files associated with non-continuous data files are loaded automatically.
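The gene-list merging described above can be sketched as follows. This is a minimal illustration, not Sleipnir's implementation: `MergedGenes` and `MergeGeneLists` are hypothetical names, and each data file is reduced to its list of gene names. The per-file map records where each merged gene lives in each file, with (size_t)-1 marking a gene the file does not contain, which is where a missing value would be filled in.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: the merged gene list plus, for every data file, a map
// from merged gene index to that file's local index.
struct MergedGenes {
    std::vector<std::string> names;            // merged list, first-seen order
    std::map<std::string, std::size_t> index;  // gene name -> merged index
    // perFile[f][i] = local index of merged gene i in file f, or (size_t)-1
    std::vector<std::vector<std::size_t>> perFile;
};

MergedGenes MergeGeneLists(const std::vector<std::vector<std::string>>& files) {
    MergedGenes m;
    // Pass 1: union of all gene names, keeping the first occurrence's position.
    for (const auto& genes : files)
        for (const auto& g : genes)
            if (m.index.emplace(g, m.names.size()).second)
                m.names.push_back(g);
    // Pass 2: per-file lookup tables so pair (i, j) means the same genes everywhere.
    for (const auto& genes : files) {
        std::vector<std::size_t> local(m.names.size(), static_cast<std::size_t>(-1));
        for (std::size_t k = 0; k < genes.size(); ++k)
            local[m.index.at(genes[k])] = k;
        m.perFile.push_back(local);
    }
    return m;
}
```

With files {A, B} and {B, C}, the merged list is A, B, C; the first file's table reports C as absent, so a query for any pair involving C would fall back to a missing value for that file.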
virtual void Sleipnir::IDataset::FilterGenes (const CGenes &Genes, CDat::EFilter eFilter) [pure virtual]
Remove values from the dataset based on the given gene set and filter type.
Parameters:
Genes: Gene set used to filter the dataset.
eFilter: Way in which to use the given genes to remove values.
Remove values and genes (by removing all incident edges) from the dataset based on one of several algorithms. For details, see CDat::EFilter.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
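The idea of filtering by removing all edges incident to certain genes can be sketched on a toy in-memory dataset. This is an assumption-laden illustration: `FilterMode`, `ToyDataset`, and the two modes shown are hypothetical and only loosely modeled on the idea behind CDat::EFilter, whose real values and semantics are documented there. Values are stored as one dense gene-by-gene matrix per experimental node, with NaN standing for "missing".

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <set>
#include <string>
#include <vector>

// Hypothetical filter modes (not the real CDat::EFilter values).
enum class FilterMode {
    KeepInside,  // keep a pair only if both genes are in the set
    DropInside   // drop a pair if either gene is in the set
};

// Toy dataset: nodes[n][y][x] holds node n's value for pair (y, x); NaN = missing.
struct ToyDataset {
    std::vector<std::string> genes;
    std::vector<std::vector<std::vector<float>>> nodes;

    // Mask the pair symmetrically in every node, mirroring IDataset::Remove.
    void Remove(std::size_t iY, std::size_t iX) {
        const float kNaN = std::numeric_limits<float>::quiet_NaN();
        for (auto& m : nodes) {
            m[iY][iX] = kNaN;
            m[iX][iY] = kNaN;
        }
    }

    // Visit each unordered pair once and remove those the mode rejects.
    void FilterGenes(const std::set<std::string>& geneSet, FilterMode mode) {
        for (std::size_t y = 0; y < genes.size(); ++y)
            for (std::size_t x = y + 1; x < genes.size(); ++x) {
                const bool inY = geneSet.count(genes[y]) > 0;
                const bool inX = geneSet.count(genes[x]) > 0;
                const bool drop = (mode == FilterMode::KeepInside) ? !(inY && inX)
                                                                   : (inY || inX);
                if (drop) Remove(y, x);
            }
    }
};
```

Removing every edge touching a gene effectively removes that gene from the dataset, which is how a gene-level filter can be expressed purely through pair removal.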
virtual size_t Sleipnir::IDataset::GetBins (size_t iNode) const [pure virtual]
Return the number of discrete values in the requested experimental node, or -1 (that is, (size_t)-1, since the return type is unsigned) if the node is hidden or continuous.
Parameters:
iNode: Experimental node for which the number of bins should be returned.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
Referenced by Sleipnir::CTrie< tType >::CTrie(), and Sleipnir::CBayesNetSmile::Open().
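A caller such as a model loader typically walks every experimental node and skips those without discrete bins. The sketch below assumes a hypothetical two-method stand-in, `INodeInfo`, for the relevant IDataset members; `ToyNodes` and `DiscreteNodes` are likewise illustrative names. The key point is comparing against (size_t)-1 rather than a negative number.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two IDataset members used here.
struct INodeInfo {
    virtual ~INodeInfo() = default;
    virtual std::size_t GetExperiments() const = 0;
    virtual std::size_t GetBins(std::size_t iNode) const = 0;
};

// "-1" as returned through an unsigned size_t.
const std::size_t kNoBins = static_cast<std::size_t>(-1);

// Toy implementation: a stored 0 marks a hidden or continuous node.
struct ToyNodes : INodeInfo {
    std::vector<std::size_t> bins;
    std::size_t GetExperiments() const override { return bins.size(); }
    std::size_t GetBins(std::size_t i) const override { return bins[i] ? bins[i] : kNoBins; }
};

// Collect the nodes that carry discrete data, skipping hidden or continuous ones.
std::vector<std::size_t> DiscreteNodes(const INodeInfo& data) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < data.GetExperiments(); ++i)
        if (data.GetBins(i) != kNoBins) out.push_back(i);
    return out;
}
```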
virtual float Sleipnir::IDataset::GetContinuous (size_t iY, size_t iX, size_t iNode) const [pure virtual]
Return the continuous value at the requested position.
Parameters:
iY: Data row.
iX: Data column.
iNode: Experimental node from which to retrieve the requested pair's value.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
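Continuous and discrete values are related through bin edges such as those a QUANT file supplies. The function below is a hypothetical sketch of that mapping, not Sleipnir's quantization routine. It assumes ascending edges, places a value in the first bin whose upper edge is at least the value, sends values above every edge to the last bin, and maps NaN (a missing continuous value, by assumption) to (size_t)-1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical continuous-to-discrete mapping using ascending bin edges.
std::size_t Discretize(float value, const std::vector<float>& edges) {
    // Missing input or no edges: report "no bin" as (size_t)-1.
    if (std::isnan(value) || edges.empty()) return static_cast<std::size_t>(-1);
    for (std::size_t i = 0; i < edges.size(); ++i)
        if (value <= edges[i]) return i;  // first edge not below the value
    return edges.size() - 1;              // above every edge: last bin
}
```

With edges {0, 0.5, 1}, a value of 0.3 lands in bin 1 and a value of 2 is clamped into bin 2.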
virtual size_t Sleipnir::IDataset::GetDiscrete (size_t iY, size_t iX, size_t iNode) const [pure virtual]
Return the discretized value at the requested position.
Parameters:
iY: Data row.
iX: Data column.
iNode: Experimental node from which to retrieve the requested pair's value.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
Referenced by Sleipnir::CTrie< tType >::CTrie(), and Sleipnir::CBayesNetFN::Learn().
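Learning code usually tallies discrete values over every usable gene pair. This sketch uses a hypothetical minimal interface, `IPairData`, exposing only GetGenes, IsExample, and GetDiscrete, and it assumes (not stated above) that a missing discrete value is reported as (size_t)-1. `CountBins` and `ToyPairs` are illustrative names; the loop visits only the upper triangle because gene pairs are treated as unordered.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the IDataset members used below.
struct IPairData {
    virtual ~IPairData() = default;
    virtual std::size_t GetGenes() const = 0;
    virtual bool IsExample(std::size_t iY, std::size_t iX) const = 0;
    virtual std::size_t GetDiscrete(std::size_t iY, std::size_t iX, std::size_t iNode) const = 0;
};

// Assumption: a missing discrete value is reported as (size_t)-1.
const std::size_t kMissing = static_cast<std::size_t>(-1);

// Histogram of one node's discrete values over all usable unordered gene pairs.
std::vector<std::size_t> CountBins(const IPairData& data, std::size_t iNode, std::size_t iBins) {
    std::vector<std::size_t> counts(iBins, 0);
    for (std::size_t y = 0; y < data.GetGenes(); ++y)
        for (std::size_t x = y + 1; x < data.GetGenes(); ++x) {
            if (!data.IsExample(y, x)) continue;
            const std::size_t v = data.GetDiscrete(y, x, iNode);
            if (v != kMissing && v < iBins) ++counts[v];
        }
    return counts;
}

// Toy single-node dataset: values[y][x], with kMissing meaning "no data".
struct ToyPairs : IPairData {
    std::vector<std::vector<std::size_t>> values;
    std::size_t GetGenes() const override { return values.size(); }
    bool IsExample(std::size_t y, std::size_t x) const override { return values[y][x] != kMissing; }
    std::size_t GetDiscrete(std::size_t y, std::size_t x, std::size_t) const override { return values[y][x]; }
};
```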
virtual size_t Sleipnir::IDataset::GetExperiments () const [pure virtual]
Return the number of experimental nodes in the dataset.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
Referenced by Sleipnir::CTrie< tType >::CTrie(), and Sleipnir::CBayesNetSmile::Open().
virtual const std::string& Sleipnir::IDataset::GetGene (size_t iGene) const [pure virtual]
Returns the gene name at the requested index.
Parameters:
iGene: Index of the gene name to return.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
virtual size_t Sleipnir::IDataset::GetGene (const std::string &strGene) const [pure virtual]
Return the index of the given gene name, or -1 (that is, (size_t)-1, since the return type is unsigned) if it is not included in the dataset.
Parameters:
strGene: Gene name to look up.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
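The two GetGene overloads form a round trip between names and indices. The hypothetical `ToyGeneIndex` class below mirrors that pair with a hash map for name lookup; it is a sketch of the contract, not how any Sleipnir class stores its genes. Callers should compare the result of the name lookup against (size_t)-1 before using it as an index.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy mirroring IDataset's two GetGene overloads.
class ToyGeneIndex {
public:
    explicit ToyGeneIndex(std::vector<std::string> names) : m_names(std::move(names)) {
        for (std::size_t i = 0; i < m_names.size(); ++i) m_index.emplace(m_names[i], i);
    }
    // Index -> name.
    const std::string& GetGene(std::size_t iGene) const { return m_names[iGene]; }
    // Name -> index, or (size_t)-1 when the gene is absent.
    std::size_t GetGene(const std::string& strGene) const {
        auto it = m_index.find(strGene);
        return it == m_index.end() ? static_cast<std::size_t>(-1) : it->second;
    }
    std::size_t GetGenes() const { return m_names.size(); }

private:
    std::vector<std::string> m_names;
    std::unordered_map<std::string, std::size_t> m_index;
};
```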
virtual const std::vector<std::string>& Sleipnir::IDataset::GetGeneNames () const [pure virtual]
Return a vector of all gene names in the dataset.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
virtual size_t Sleipnir::IDataset::GetGenes () const [pure virtual]
Returns the number of genes in the dataset.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
Referenced by Sleipnir::CDataMask::Attach(), Sleipnir::CTrie< tType >::CTrie(), and Sleipnir::CBayesNetFN::Learn().
virtual bool Sleipnir::IDataset::IsExample (size_t iY, size_t iX) const [pure virtual]
Returns true if some data file can be accessed at the requested position.
Parameters:
iY: Data row.
iX: Data column.
A dataset position is a usable example if at least one data file can be accessed at that position; that is, if some data file provides a non-missing value for that gene pair. Implementations that filter pairs in some manner can also prevent particular positions from being usable examples.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompactMap, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
Referenced by Sleipnir::CDataMask::Attach(), Sleipnir::CTrie< tType >::CTrie(), Sleipnir::CDataFilter::IsExample(), and Sleipnir::CBayesNetFN::Learn().
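Both halves of the IsExample contract can be shown with a small wrapper design: a base dataset answers "does any file have a value here?", and a filtering layer can veto positions on top of that. `IExamples`, `ToyExamples`, and `GeneSetFilter` are hypothetical names; NaN is assumed to mark a missing value, and the gene-set rule is just one example of a filter.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical minimal interface exposing only IsExample.
struct IExamples {
    virtual ~IExamples() = default;
    virtual bool IsExample(std::size_t iY, std::size_t iX) const = 0;
};

// Toy dataset: files[n][y][x] is one data file's value; NaN marks "missing".
struct ToyExamples : IExamples {
    std::vector<std::vector<std::vector<float>>> files;

    // A usable example if at least one file has a non-missing value at (iY, iX).
    bool IsExample(std::size_t iY, std::size_t iX) const override {
        for (const auto& f : files)
            if (!std::isnan(f[iY][iX])) return true;
        return false;
    }
};

// Toy filtering layer: also requires both genes to be allowed, so it can turn
// positions the wrapped dataset accepts into non-examples.
class GeneSetFilter : public IExamples {
public:
    GeneSetFilter(const IExamples& inner, std::vector<bool> allowed)
        : m_inner(inner), m_allowed(std::move(allowed)) {}
    bool IsExample(std::size_t iY, std::size_t iX) const override {
        return m_allowed[iY] && m_allowed[iX] && m_inner.IsExample(iY, iX);
    }

private:
    const IExamples& m_inner;
    std::vector<bool> m_allowed;
};
```

Layering the filter over the dataset, rather than editing the data, leaves the underlying files untouched and lets several filters be stacked.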
virtual bool Sleipnir::IDataset::IsHidden (size_t iNode) const [pure virtual]
Returns true if the requested experimental node is hidden (does not correspond to a data file).
Parameters:
iNode: Experimental node to investigate.
Since a dataset can be constructed either directly on a collection of data files or by tying a model such as a Bayes net to data files, IDataset can determine which model nodes are hidden by testing whether a data file exists for them. If no such file exists, the node is hidden and, for example, can be treated specially during Bayesian learning.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
Referenced by Sleipnir::CTrie< tType >::CTrie().
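Tying model nodes to data files by name makes hidden-node detection a simple lookup. In this sketch the set of available files is passed in directly instead of being read from disk, and `ToyModelDataset` along with the node names used below are hypothetical; how Sleipnir actually matches nodes to files is not described here.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a node is hidden when no data file matches its name.
class ToyModelDataset {
public:
    ToyModelDataset(std::vector<std::string> nodes, const std::set<std::string>& files)
        : m_nodes(std::move(nodes)) {
        for (const auto& n : m_nodes) m_hidden.push_back(files.count(n) == 0);
    }
    std::size_t GetExperiments() const { return m_nodes.size(); }
    bool IsHidden(std::size_t iNode) const { return m_hidden[iNode]; }

private:
    std::vector<std::string> m_nodes;
    std::vector<bool> m_hidden;
};
```

For example, a model with nodes "root", "coexpr", and "ppi" backed only by "coexpr" and "ppi" files reports node 0 as hidden, which a learner could then treat specially.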
virtual void Sleipnir::IDataset::Remove (size_t iY, size_t iX) [pure virtual]
Remove all data for the given dataset position.
Parameters:
iY: Data row.
iX: Data column.
Unloads or masks data from all encapsulated files for the requested gene pair.
Implemented in Sleipnir::CDataSubset, Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompactMap, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
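The "masks" option for Remove can be sketched as a single boolean mask shared by all encapsulated files, so one call hides the pair everywhere while the stored values stay intact. `ToyMaskedData` is a hypothetical class storing each file as a flat gene-by-gene array, and returning NaN for a masked position is an assumption made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical masking layer: Remove() hides a pair in every file at once.
class ToyMaskedData {
public:
    ToyMaskedData(std::size_t genes, std::vector<std::vector<float>> files)
        : m_genes(genes), m_files(std::move(files)), m_mask(genes * genes, true) {}

    // Mask both orientations so (iY, iX) and (iX, iY) stay consistent.
    void Remove(std::size_t iY, std::size_t iX) {
        m_mask[iY * m_genes + iX] = false;
        m_mask[iX * m_genes + iY] = false;
    }

    // Masked positions read as missing (NaN) in every node.
    float GetContinuous(std::size_t iY, std::size_t iX, std::size_t iNode) const {
        if (!m_mask[iY * m_genes + iX]) return std::numeric_limits<float>::quiet_NaN();
        return m_files[iNode][iY * m_genes + iX];
    }

private:
    std::size_t m_genes;
    std::vector<std::vector<float>> m_files;  // m_files[n][y * genes + x]
    std::vector<bool> m_mask;                 // true = position still usable
};
```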
virtual void Sleipnir::IDataset::Save (std::ostream &ostm, bool fBinary) const [pure virtual]
Save a dataset to the given stream in binary or tabular (human-readable) form.
Parameters:
ostm: Stream into which the dataset is saved.
fBinary: If true, save the dataset as a binary file; if false, save it as a text-based, tab-delimited file.
Implemented in Sleipnir::CDataFilter, Sleipnir::CDataMask, Sleipnir::CDatasetCompact, and Sleipnir::CDataset.
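The binary/tabular switch can be sketched as one function with two branches. The file layouts below are invented for illustration and do not match the formats CDataset or CDatasetCompact actually write: the binary branch emits two 32-bit counts followed by raw float matrices, and the text branch emits one tab-delimited row per unordered gene pair with an empty cell for a missing (NaN) value. `ToySaveData` and `SaveToString` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical layouts only; the real Sleipnir formats differ.
struct ToySaveData {
    std::vector<std::string> genes;
    std::vector<std::vector<float>> nodes;  // nodes[n][y * genes.size() + x]

    void Save(std::ostream& ostm, bool fBinary) const {
        const std::size_t g = genes.size();
        if (fBinary) {
            // Binary: gene count, node count, then each node's raw float matrix.
            const std::uint32_t header[2] = {static_cast<std::uint32_t>(g),
                                             static_cast<std::uint32_t>(nodes.size())};
            ostm.write(reinterpret_cast<const char*>(header), sizeof(header));
            for (const auto& m : nodes)
                ostm.write(reinterpret_cast<const char*>(m.data()),
                           static_cast<std::streamsize>(m.size() * sizeof(float)));
            return;
        }
        // Text: one row per unordered gene pair, one tab-separated column per node.
        for (std::size_t y = 0; y < g; ++y)
            for (std::size_t x = y + 1; x < g; ++x) {
                ostm << genes[y] << '\t' << genes[x];
                for (const auto& m : nodes) {
                    ostm << '\t';
                    if (!std::isnan(m[y * g + x])) ostm << m[y * g + x];
                }
                ostm << '\n';
            }
    }
};

// Convenience wrapper for inspecting either encoding in memory.
std::string SaveToString(const ToySaveData& data, bool fBinary) {
    std::ostringstream ostm;
    data.Save(ostm, fBinary);
    return ostm.str();
}
```

Taking a std::ostream rather than a file name, as the documented signature does, lets the same code write to files, memory buffers, or standard output.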