Sleipnir
A simple implementation of IDataset directly loading unmodified CDats for each non-hidden data node.
#include <dataset.h>
Public Member Functions
bool Open (const char *szAnswerFile, const std::vector< std::string > &vecstrDataFiles)
  Construct a dataset corresponding to the given answer file and data files.
bool Open (const std::vector< std::string > &vecstrDataFiles)
  Construct a dataset corresponding to the given files.
bool Open (const char *szAnswerFile, const char *szDataDirectory, const IBayesNet *pBayesNet)
  Construct a dataset corresponding to the given Bayes net using the provided answer file and data files from the given directory.
bool Open (const CDataPair &Answers, const char *szDataDirectory, const IBayesNet *pBayesNet)
  Construct a dataset corresponding to the given Bayes net using the provided pre-loaded answers and data files from the given directory.
bool Open (const char *szDataDirectory, const IBayesNet *pBayesNet)
  Construct a dataset corresponding to the given Bayes net using files from the given directory.
bool OpenGenes (const std::vector< std::string > &vecstrDataFiles)
  Open only the merged gene list from the given data files.
size_t GetDiscrete (size_t iY, size_t iX, size_t iNode) const
  Return the discretized value at the requested position.
bool IsExample (size_t iY, size_t iX) const
  Returns true if some data file can be accessed at the requested position.
void Remove (size_t iY, size_t iX)
  Remove all data for the given dataset position.
float GetContinuous (size_t iY, size_t iX, size_t iNode) const
  Return the continuous value at the requested position.
void FilterGenes (const CGenes &Genes, CDat::EFilter eFilter)
  Remove values from the dataset based on the given gene set and filter type.
const std::vector< std::string > & GetGeneNames () const
  Return a vector of all gene names in the dataset.
bool IsHidden (size_t iNode) const
  Returns true if the requested experimental node is hidden (does not correspond to a data file).
const std::string & GetGene (size_t iGene) const
  Returns the gene name at the requested index.
size_t GetGenes () const
  Returns the number of genes in the dataset.
size_t GetExperiments () const
  Return the number of experimental nodes in the dataset.
size_t GetGene (const std::string &strGene) const
  Return the index of the given gene name, or -1 (as a size_t, i.e. SIZE_MAX) if it is not included in the dataset.
size_t GetBins (size_t iNode) const
  Return the number of discrete values in the requested experimental node, or -1 (as a size_t, i.e. SIZE_MAX) if the node is hidden or continuous.
void Save (std::ostream &ostm, bool fBinary) const
  Save a dataset to the given stream in binary or tabular (human-readable) form.
void Sleipnir::CDataset::FilterGenes (const CGenes &Genes, CDat::EFilter eFilter) [inline, virtual]
Remove values from the dataset based on the given gene set and filter type.
Parameters:
  Genes    Gene set used to filter the dataset.
  eFilter  Way in which to use the given genes to remove values.
Remove values and genes (by removing all incident edges) from the dataset based on one of several algorithms. For details, see CDat::EFilter.
Implements Sleipnir::IDataset.
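To make "removing all incident edges" concrete, here is a minimal self-contained sketch of one plausible filter mode: an exclude-style filter that masks every value touching a gene in the set. It does not use Sleipnir. `ToyNode` and `ExcludeGenes` are hypothetical names, NaN is assumed as the missing-value marker, and the exact behavior of each CDat::EFilter option should be checked against that enum's own documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one data node: a gene-by-gene matrix in which
// NaN marks a missing value. This is not Sleipnir's CDat.
struct ToyNode {
    std::vector<std::string> genes;
    std::vector<std::vector<float>> values;  // values[iY][iX]
};

// Sketch of an exclude-style filter (assumed semantics): every edge incident
// to a gene in the given set is masked, so that gene contributes no values.
void ExcludeGenes(std::vector<ToyNode>& nodes, const std::set<std::string>& excluded) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    for (ToyNode& node : nodes) {
        assert(node.values.size() == node.genes.size());
        for (size_t i = 0; i < node.genes.size(); ++i) {
            if (!excluded.count(node.genes[i])) continue;
            for (size_t j = 0; j < node.genes.size(); ++j) {
                node.values[i][j] = nan;  // row of the excluded gene
                node.values[j][i] = nan;  // column of the excluded gene
            }
        }
    }
}
```

Other filter modes (for example, keeping only pairs whose genes are both in the set) would change only the test inside the loop.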
size_t Sleipnir::CDataset::GetBins (size_t iNode) const [inline, virtual]
Return the number of discrete values in the requested experimental node, or -1 (as a size_t, i.e. SIZE_MAX) if the node is hidden or continuous.
Parameters:
  iNode  Experimental node for which the bin count should be returned.
Implements Sleipnir::IDataset.
float Sleipnir::CDataset::GetContinuous (size_t iY, size_t iX, size_t iNode) const [inline, virtual]
Return the continuous value at the requested position.
Parameters:
  iY     Data row.
  iX     Data column.
  iNode  Experimental node from which to retrieve the requested pair's value.
Implements Sleipnir::IDataset.
size_t Sleipnir::CDataset::GetDiscrete (size_t iY, size_t iX, size_t iNode) const [virtual]
Return the discretized value at the requested position.
Parameters:
  iY     Data row.
  iX     Data column.
  iNode  Experimental node from which to retrieve the requested pair's value.
Implements Sleipnir::IDataset.
Definition at line 521 of file dataset.cpp.
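The sketch below illustrates what a discretized value and a bin count mean, using a hypothetical upper-edge quantizer (`Quantize`, `kNoBin`). How Sleipnir chooses the bin boundaries for a node is not described on this page, so the binning rule and the treatment of NaN here are assumptions; only the `(size_t)-1` sentinel mirrors the convention documented for GetBins.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sentinel for "no bin", mirroring the -1 convention documented for GetBins
// (a size_t cannot hold -1, so the stored value is SIZE_MAX).
const size_t kNoBin = static_cast<size_t>(-1);

// Hypothetical quantizer: returns the index of the first bin whose upper edge
// is >= value; values above every edge fall into the last bin. NaN (missing)
// maps to kNoBin. The number of bins equals upperEdges.size().
size_t Quantize(float value, const std::vector<float>& upperEdges) {
    assert(!upperEdges.empty());
    if (std::isnan(value)) return kNoBin;
    for (size_t i = 0; i + 1 < upperEdges.size(); ++i) {
        if (value <= upperEdges[i]) return i;
    }
    return upperEdges.size() - 1;
}
```

With edges {0, 1, 2}, a value of 0.5 falls in bin 1 and a value of 9 falls in the last bin, 2.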
size_t Sleipnir::CDataset::GetExperiments () const [inline, virtual]
Return the number of experimental nodes in the dataset.
Implements Sleipnir::IDataset.
const std::string& Sleipnir::CDataset::GetGene (size_t iGene) const [inline, virtual]
Returns the gene name at the requested index.
Parameters:
  iGene  Index of the gene name to return.
Implements Sleipnir::IDataset.
Definition at line 403 of file dataset.h.
Referenced by GetGene().
size_t Sleipnir::CDataset::GetGene (const std::string &strGene) const [inline, virtual]
Return the index of the given gene name, or -1 (as a size_t, i.e. SIZE_MAX) if it is not included in the dataset.
Parameters:
  strGene  Gene name whose index should be retrieved.
Implements Sleipnir::IDataset.
Definition at line 415 of file dataset.h.
References GetGene().
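Because GetGene(const std::string&) returns size_t, the documented -1 is really static_cast<size_t>(-1), so callers should compare against that value before using the result as an index. The hypothetical `GeneIndex` class below mirrors the two GetGene overloads with a hash map; whether CDataset uses a map or a linear search is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical gene index mirroring the two GetGene overloads:
// Find(name) returns the gene's index, or static_cast<size_t>(-1) if absent;
// Name(index) returns the gene name stored at that index.
class GeneIndex {
public:
    explicit GeneIndex(const std::vector<std::string>& genes) : genes_(genes) {
        for (size_t i = 0; i < genes_.size(); ++i) index_.emplace(genes_[i], i);
    }

    size_t Find(const std::string& gene) const {
        auto it = index_.find(gene);
        return it == index_.end() ? static_cast<size_t>(-1) : it->second;
    }

    const std::string& Name(size_t i) const {
        assert(i < genes_.size());
        return genes_[i];
    }

private:
    std::vector<std::string> genes_;
    std::unordered_map<std::string, size_t> index_;
};
```

The same sentinel comparison applies to GetBins, which also documents -1 for hidden or continuous nodes.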
const std::vector<std::string>& Sleipnir::CDataset::GetGeneNames () const [inline, virtual]
Return a vector of all gene names in the dataset.
Implements Sleipnir::IDataset.
size_t Sleipnir::CDataset::GetGenes () const [inline, virtual]
Returns the number of genes in the dataset.
Implements Sleipnir::IDataset.
bool Sleipnir::CDataset::IsExample (size_t iY, size_t iX) const [virtual]
Returns true if some data file can be accessed at the requested position.
Parameters:
  iY  Data row.
  iX  Data column.
A dataset position is a usable example if at least one data file can be accessed at that position; that is, if some data file provides a non-missing value for that gene pair. Implementations that filter pairs in some manner can also prevent particular positions from being usable examples.
Implements Sleipnir::IDataset.
Definition at line 530 of file dataset.cpp.
References Sleipnir::CMeta::IsNaN().
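The rule for a usable example can be written as a short loop. This is a sketch under stated assumptions, not CDataset's code: each node is a hypothetical dense `Matrix`, NaN marks a missing value (consistent with the reference to CMeta::IsNaN()), and an empty matrix stands in for a hidden node with no data file.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: nodes[k][iY][iX] holds node k's value for the gene
// pair (iY, iX); NaN marks a missing value; an empty matrix is a hidden node.
using Matrix = std::vector<std::vector<float>>;

// A position is a usable example if at least one loaded node has a
// non-missing value there.
bool IsExample(const std::vector<Matrix>& nodes, size_t iY, size_t iX) {
    for (const Matrix& m : nodes) {
        if (m.empty()) continue;  // hidden node: no data to test
        assert(iY < m.size() && iX < m[iY].size());
        if (!std::isnan(m[iY][iX])) return true;
    }
    return false;
}
```

An implementation that also filters pairs would add its own check before returning true.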
bool Sleipnir::CDataset::IsHidden (size_t iNode) const [inline, virtual]
Returns true if the requested experimental node is hidden (does not correspond to a data file).
Parameters:
  iNode  Experimental node to investigate.
Since a dataset can be constructed either directly on a collection of data files or by tying a model such as a Bayes net to data files, IDataset can determine which model nodes are hidden by testing whether a data file exists for them. If no such file exists, the node is hidden and, for example, can be treated specially during Bayesian learning.
Implements Sleipnir::IDataset.
bool Sleipnir::CDataset::Open (const char *szAnswerFile, const std::vector< std::string > &vecstrDataFiles)
Construct a dataset corresponding to the given answer file and data files.
Parameters:
  szAnswerFile     Answer file which will become the first node of the dataset.
  vecstrDataFiles  Vector of file paths to load.
Creates a dataset with nodes corresponding to the given data files; the given answer file is inserted as the first (0th) node. All files are assumed to be continuous.
Definition at line 450 of file dataset.cpp.
References Sleipnir::CDat::GetGene(), Sleipnir::CDat::GetGenes(), and Sleipnir::CDataPair::Open().
Referenced by Open().
bool Sleipnir::CDataset::Open (const std::vector< std::string > &vecstrDataFiles)
Construct a dataset corresponding to the given files.
Parameters:
  vecstrDataFiles  Vector of file paths to load.
Creates a dataset with nodes corresponding to the given data files. All files are assumed to be continuous.
Definition at line 494 of file dataset.cpp.
References Sleipnir::CDataPair::Open(), and OpenGenes().
bool Sleipnir::CDataset::Open (const char *szAnswerFile, const char *szDataDirectory, const IBayesNet *pBayesNet)
Construct a dataset corresponding to the given Bayes net using the provided answer file and data files from the given directory.
Parameters:
  szAnswerFile     Answer file which will become the first node of the dataset.
  szDataDirectory  Directory from which data files are loaded.
  pBayesNet        Bayes net whose nodes will correspond to files in the dataset.
Creates a dataset with nodes corresponding to the given Bayes net structure; the given answer file is always inserted as the first (0th) data file, and thus corresponds to the first node in the Bayes net (generally the class node predicting functional relationships). Data is loaded continuously or discretely as indicated by the Bayes net, and nodes for which a corresponding data file (i.e. one with the same name followed by an appropriate CDat extension) cannot be located are marked as hidden.
Definition at line 356 of file dataset.cpp.
References Sleipnir::IBayesNet::IsContinuous(), Sleipnir::CDataPair::Open(), and Open().
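The matching rule described above (a node is backed by a file carrying the node's name plus a CDat extension, otherwise it is hidden) can be sketched without the library. In this hypothetical `BindNodes` helper, the directory listing is passed in as a set of file names, `.dab` is assumed as the extension, and node 0 is bound to the answer file as in the overloads that take one; none of these names belong to Sleipnir's API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical result of tying one Bayes net node to a data file.
struct NodeBinding {
    std::string name;
    std::string file;  // empty when the node is hidden
    bool hidden;
};

// Node 0 takes the answer file; every other node k is backed by
// "<node name><extension>" if that file is present, and is hidden otherwise.
// The ".dab" default is an assumption for illustration.
std::vector<NodeBinding> BindNodes(const std::vector<std::string>& netNodes,
                                   const std::string& answerFile,
                                   const std::set<std::string>& filesInDirectory,
                                   const std::string& extension = ".dab") {
    assert(!netNodes.empty());  // node 0 is reserved for the answers
    std::vector<NodeBinding> bindings;
    bindings.push_back({netNodes[0], answerFile, false});
    for (size_t k = 1; k < netNodes.size(); ++k) {
        const std::string candidate = netNodes[k] + extension;
        const bool found = filesInDirectory.count(candidate) > 0;
        bindings.push_back({netNodes[k], found ? candidate : std::string(), !found});
    }
    return bindings;
}
```

Passing the directory contents in as a set keeps the sketch free of filesystem access; a real loader would probe the directory instead, for example with std::filesystem::exists. Choosing continuous versus discrete loading per node (IBayesNet::IsContinuous()) is left out.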
bool Sleipnir::CDataset::Open (const CDataPair &Answers, const char *szDataDirectory, const IBayesNet *pBayesNet) [inline]
Construct a dataset corresponding to the given Bayes net using the provided answer file and data files from the given directory.
Parameters:
  Answers          Pre-loaded answer file which will become the first node of the dataset.
  szDataDirectory  Directory from which data files are loaded.
  pBayesNet        Bayes net whose nodes will correspond to files in the dataset.
Creates a dataset with nodes corresponding to the given Bayes net structure; the given answer file is always inserted as the first (0th) data file, and thus corresponds to the first node in the Bayes net (generally the class node predicting functional relationships). Data is loaded continuously or discretely as indicated by the Bayes net, and nodes for which a corresponding data file (i.e. one with the same name followed by an appropriate CDat extension) cannot be located are marked as hidden.
Definition at line 329 of file dataset.h.
References Open().
bool Sleipnir::CDataset::Open (const char *szDataDirectory, const IBayesNet *pBayesNet) [inline]
Construct a dataset corresponding to the given Bayes net using files from the given directory.
Parameters:
  szDataDirectory  Directory from which data files are loaded.
  pBayesNet        Bayes net whose nodes will correspond to files in the dataset.
Creates a dataset (without an answer file) with nodes corresponding to the given Bayes net structure. Data is loaded continuously or discretely as indicated by the Bayes net, and nodes for which a corresponding data file (i.e. one with the same name followed by an appropriate CDat extension) cannot be located are marked as hidden.
Definition at line 355 of file dataset.h.
References Open().
bool Sleipnir::CDataset::OpenGenes (const std::vector< std::string > &vecstrDataFiles) [inline]
Open only the merged gene list from the given data files.
Parameters:
  vecstrDataFiles  Vector of file paths to load.
Provides a way to rapidly list the set of all genes present in a given collection of data files while avoiding the overhead of loading the data itself.
Reimplemented from Sleipnir::CDataImpl.
Definition at line 379 of file dataset.h.
Referenced by Open().
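A merged gene list is the union of the genes named in each file. In the sketch below, `MergeGeneLists` is a hypothetical helper that receives each file's gene names directly (standing in for reading only each file's gene list rather than its data) and returns them deduplicated. Its sorted output order is an artifact of std::set and is not necessarily the order OpenGenes produces.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Union of the gene names from every file, with duplicates removed.
// Output is sorted because std::set keeps its elements ordered.
std::vector<std::string> MergeGeneLists(
        const std::vector<std::vector<std::string>>& perFileGenes) {
    std::set<std::string> merged;
    for (const auto& genes : perFileGenes) merged.insert(genes.begin(), genes.end());
    return std::vector<std::string>(merged.begin(), merged.end());
}
```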
void Sleipnir::CDataset::Remove (size_t iY, size_t iX) [virtual]
Remove all data for the given dataset position.
Parameters:
  iY  Data row.
  iX  Data column.
Unloads or masks data from all encapsulated files for the requested gene pair.
Implements Sleipnir::IDataset.
Definition at line 542 of file dataset.cpp.
References Sleipnir::CMeta::GetNaN().
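Masking a pair can be sketched the same way. Given the reference to CMeta::GetNaN(), this sketch assumes that removal means overwriting the pair's value with NaN in every loaded node; `RemovePair` and the dense `Matrix` layout are hypothetical, and hidden nodes are represented by empty matrices and skipped.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical layout: nodes[k][iY][iX] holds node k's value for (iY, iX).
using Matrix = std::vector<std::vector<float>>;

// Mask one gene pair in every loaded node by writing NaN (the assumed
// missing-value marker). Hidden nodes (empty matrices) are skipped.
void RemovePair(std::vector<Matrix>& nodes, size_t iY, size_t iX) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    for (Matrix& m : nodes) {
        if (m.empty()) continue;
        assert(iY < m.size() && iX < m[iY].size());
        m[iY][iX] = nan;
    }
}
```

After this call, no node has a non-missing value at (iY, iX), so that position no longer qualifies as an example under the IsExample rule.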
void Sleipnir::CDataset::Save (std::ostream &ostm, bool fBinary) const [inline, virtual]
Save a dataset to the given stream in binary or tabular (human-readable) form.
Parameters:
  ostm     Stream into which the dataset is saved.
  fBinary  If true, save the dataset as a binary file; if false, save it as a text-based tab-delimited file.
Implements Sleipnir::IDataset.