Sleipnir
Utility class containing static utility functions.
#include <meta.h>
Public Member Functions
CMeta (int iVerbosity, size_t iRandomSeed=0)
Utility constructor that initializes Sleipnir (primarily log4cpp) at construction time and performs cleanup when destroyed.
Static Public Member Functions
static std::string Filename (const std::string &strString, char cReplacement='_')
Replace all non-alphanumeric characters in a string with the given replacement.
static std::string Basename (const char *szPath)
Attempt to return the filename portion of a path in a platform-independent manner.
static void Tokenize (const char *szString, std::vector< std::string > &vecstrTokens, const char *szSeparators="\t", bool fNoEmpties=false)
Tokenize a given string based on one or more delimiter characters.
static std::string Trim (const char *szString)
Trim whitespace from the beginning and end of the given string.
static bool MapRead (unsigned char *&pbData, HANDLE &hndlMap, size_t &iSize, const char *szFile)
Memory map an existing file read-only in a largely platform-independent manner.
static bool MapWrite (unsigned char *&pbData, HANDLE &hndlMap, size_t iSize, const char *szFile)
Create a new writeable memory mapped file in a largely platform-independent manner.
static void Unmap (const unsigned char *pbData, HANDLE hndlMap, size_t iSize)
Unmap a memory map in a largely platform-independent manner.
static size_t GetMemoryUsage ()
Returns (very approximately) the process's current memory usage in bytes.
template<class tType >
static bool IsNaN (tType Value)
Return true if the given value represents a missing value.
static float GetNaN ()
Return a standard missing value marker.
static std::string Deextension (const std::string &strName)
Given a filename, remove the file type extension (if any).
template<class tIterator >
static void Permute (tIterator Items, const std::vector< size_t > &veciOrder)
Reorder a given item list based on a target ordering.
template<class tType >
static void Permute (std::vector< tType > &vecItems, const std::vector< size_t > &veciOrder)
Reorder a given item list based on a target ordering.
template<class tType >
static size_t Quantize (tType Value, const std::vector< tType > &vecQuants)
Discretize a given continuous value based on a vector of bin edges.
static size_t GetMicroseconds (const struct timeval &sBegin, const struct timeval &sEnd)
Calculates the difference in microseconds between two timevals.
static bool SkipEdge (bool fAnswer, size_t i, size_t j, const std::vector< bool > &vecfHere, const std::vector< bool > &vecfUbik, bool fCtxtPos, bool fCtxtNeg, bool fBridgePos, bool fBridgeNeg, bool fOutPos, bool fOutNeg)
Determines whether or not an item should be skipped (based on potential ubiquitous genes, context genes, and flags).
static bool IsExtension (const std::string &strFile, const std::string &strExtension)
Returns true if the given file path ends with the given extension.
Static Public Attributes
static const char c_szWS [] = " \t\r\n"
String constant containing basic whitespace characters: space, tab, carriage return, and newline.
Utility class containing static utility functions.
CMeta is critical in that its constructor and destructor perform Sleipnir's startup and shutdown, which should happen at the beginning and end of every process using Sleipnir (usually by creating a CMeta object in the main function). These exist primarily to set up and tear down logging, and can also be used to standardize the random seed for a process (useful for testing). Most other methods in CMeta are generic utilities for string manipulation and a few operating system abstractions (particularly memory mapping).
Sleipnir::CMeta::CMeta (int iVerbosity, size_t iRandomSeed = 0)
Utility constructor that initializes Sleipnir (primarily log4cpp) at construction time and performs cleanup when destroyed.
Parameters:
iVerbosity: If linked with log4cpp, the verbosity level for logging.
iRandomSeed: Random seed for use with srand; if -1 (that is, static_cast<size_t>(-1)), the current time is used instead.
One (and only one) CMeta object should be created in a Sleipnir client's main function before making any library calls. The object is destroyed automatically when main exits, guaranteeing proper cleanup of Sleipnir (and log4cpp).
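The construct-once-in-main rule above is the C++ RAII idiom: initialization happens in the constructor, and cleanup happens in the destructor, which runs automatically when main returns. The sketch below illustrates the pattern with a hypothetical stand-in class, `CMetaSketch`; it is not Sleipnir's implementation. It only seeds the C random number generator, assuming the reading of iRandomSeed given above, and marks with comments where log4cpp setup and teardown would go.

```cpp
#include <cstddef>
#include <cstdlib>
#include <ctime>

// Hypothetical stand-in for Sleipnir::CMeta; not the library's implementation.
class CMetaSketch {
public:
    explicit CMetaSketch(int iVerbosity, size_t iRandomSeed = 0)
        : m_iVerbosity(iVerbosity) {
        // Assumption: static_cast<size_t>(-1) means "seed from the current time",
        // mirroring the iRandomSeed parameter description.
        if (iRandomSeed == static_cast<size_t>(-1))
            std::srand(static_cast<unsigned>(std::time(nullptr)));
        else
            std::srand(static_cast<unsigned>(iRandomSeed));
        // The real class would configure logging (log4cpp) at m_iVerbosity here.
    }

    ~CMetaSketch() {
        // The real class would shut down logging here.
    }

    int Verbosity() const { return m_iVerbosity; }

private:
    int m_iVerbosity;
};
```

In a client, the object would be declared as the first statement of main, for example `CMetaSketch Meta(5, 42);`. Because local objects are destroyed in reverse order of construction, it outlives every other object later created in main.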
string Sleipnir::CMeta::Basename (const char *szPath) [static]
Attempt to return the filename portion of a path in a platform-independent manner.
Parameters:
szPath: File path from which the filename is extracted.
Definition at line 134 of file meta.cpp.
Referenced by Sleipnir::CDatabase::Open().
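Treating both / and \ as separators is one platform-independent way to take the last component of a path. The hypothetical `BasenameSketch` below assumes that approach, which may differ in detail from the library's.

```cpp
#include <string>

// Hypothetical sketch: returns everything after the last '/' or '\\'.
std::string BasenameSketch(const char* szPath) {
    std::string strPath(szPath);
    std::string::size_type i = strPath.find_last_of("/\\");
    return (i == std::string::npos) ? strPath : strPath.substr(i + 1);
}
```

Under this sketch, a path ending in a separator, such as "data/", yields an empty string.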
static std::string Sleipnir::CMeta::Deextension (const std::string &strName) [inline, static]
Given a filename, remove the file type extension (if any).
Parameters:
strName: Filename from which the extension is removed.
Definition at line 145 of file meta.h.
Referenced by Sleipnir::CBayesNetSmile::Open(), and Sleipnir::CDatabase::Open().
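A minimal hypothetical sketch, assuming the extension is everything from the last period onward:

```cpp
#include <string>

// Hypothetical sketch: drops everything from the last '.' onward, if any.
std::string DeextensionSketch(const std::string& strName) {
    std::string::size_type i = strName.rfind('.');
    return (i == std::string::npos) ? strName : strName.substr(0, i);
}
```

Only the last extension is removed, so "archive.tar.gz" becomes "archive.tar". This simple version also misreads a period in a directory name (such as "data.v2/README") as an extension; a more careful variant would search only after the last path separator.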
string Sleipnir::CMeta::Filename (const std::string &strString, char cReplacement = '_') [static]
Replace all non-alphanumeric characters in a string with the given replacement.
Parameters:
strString: String in which non-alphanumeric characters are replaced.
cReplacement: Character used to replace non-alphanumeric characters.
This method is intended to clean a string to make it appropriate for use as a file name or other alphanumeric identifier; given a string, non-alphanumeric characters are replaced with a configurable character, usually underscore.
Definition at line 70 of file meta.cpp.
Referenced by Sleipnir::CBayesNetSmile::Open(), Sleipnir::CDat::SaveDOT(), and Sleipnir::CDat::SaveGDF().
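The replacement rule amounts to a single pass over the string. The hypothetical `FilenameSketch` below assumes "alphanumeric" means std::isalnum in the default C locale.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch: replaces every character that is not a letter or digit.
std::string FilenameSketch(const std::string& strString, char cReplacement = '_') {
    std::string strRet = strString;
    for (char& c : strRet)
        // Cast to unsigned char: std::isalnum is undefined for negative char values.
        if (!std::isalnum(static_cast<unsigned char>(c)))
            c = cReplacement;
    return strRet;
}
```

For example, a term name such as "GO:0008150 biological process" becomes "GO_0008150_biological_process".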
size_t Sleipnir::CMeta::GetMemoryUsage () [static]
Returns (very approximately) the process's current memory usage in bytes.
static size_t Sleipnir::CMeta::GetMicroseconds (const struct timeval &sBegin, const struct timeval &sEnd) [inline, static]
Calculates the difference in microseconds between two timevals.
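The difference combines whole seconds with the microsecond remainder. This hypothetical `GetMicrosecondsSketch` is POSIX-only (struct timeval comes from <sys/time.h>) and assumes sEnd is not earlier than sBegin, since the result is unsigned.

```cpp
#include <cstddef>
#include <sys/time.h>

// Hypothetical sketch; assumes sEnd is not earlier than sBegin.
size_t GetMicrosecondsSketch(const struct timeval& sBegin, const struct timeval& sEnd) {
    // Signed 64-bit arithmetic lets a smaller tv_usec in sEnd borrow from the seconds.
    long long llDiff =
        static_cast<long long>(sEnd.tv_sec - sBegin.tv_sec) * 1000000LL +
        static_cast<long long>(sEnd.tv_usec - sBegin.tv_usec);
    return static_cast<size_t>(llDiff);
}
```

The two timestamps would typically come from calls to gettimeofday before and after the code being timed.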
static float Sleipnir::CMeta::GetNaN () [inline, static]
Return a standard missing value marker.
Definition at line 128 of file meta.h.
Referenced by Sleipnir::CPCL::AddGenes(), Sleipnir::CSeekDataset::CSeekDataset(), Sleipnir::CPCL::Distance(), Sleipnir::CBayesNetSmile::Evaluate(), Sleipnir::CBayesNetFN::Evaluate(), Sleipnir::CBayesNetMinimal::Evaluate(), Sleipnir::CDat::FilterGenes(), Sleipnir::CPCLSet::Get(), Sleipnir::CDatFilter::Get(), Sleipnir::CDatasetCompact::GetContinuous(), Sleipnir::CDataFilter::GetContinuous(), Sleipnir::CDataSubset::GetContinuous(), Sleipnir::CCoalesceMotifLibrary::GetMatch(), Sleipnir::CPCL::Impute(), Sleipnir::CStatistics::InverseNormal01CDF(), Sleipnir::CStatistics::KullbackLeiblerDivergence(), Sleipnir::CStatistics::MatrixLUDeterminant(), Sleipnir::CMeasureAutocorrelate::Measure(), Sleipnir::CMeasureEuclidean::Measure(), Sleipnir::CMeasureEuclideanScaled::Measure(), Sleipnir::CMeasureKolmogorovSmirnov::Measure(), Sleipnir::CMeasureKendallsTau::Measure(), Sleipnir::CMeasureSpearman::Measure(), Sleipnir::CMeasureHypergeometric::Measure(), Sleipnir::CMeasureInnerProduct::Measure(), Sleipnir::CMeasureBinaryInnerProduct::Measure(), Sleipnir::CMeasureMutualInformation::Measure(), Sleipnir::CMeasureRelativeAUC::Measure(), Sleipnir::CMeasurePearsonSignificance::Measure(), Sleipnir::CMeasureDistanceCorrelation::Measure(), Sleipnir::CMeasureSignedDistanceCorrelation::Measure(), Sleipnir::CMeasureDice::Measure(), Sleipnir::CPCL::MedianMultiples(), Sleipnir::CStatistics::MultivariateNormalCDF(), Sleipnir::CStatistics::MultivariateNormalPDF(), Sleipnir::CPCL::Open(), Sleipnir::CDat::Open(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CMeasurePearson::Pearson(), Sleipnir::CStatistics::Percentile(), Sleipnir::CDataset::Remove(), and Sleipnir::CSeekDataset::~CSeekDataset().
static bool Sleipnir::CMeta::IsExtension (const std::string &strFile, const std::string &strExtension) [inline, static]
Returns true if the given file path ends with the given extension.
Parameters:
strFile: File path (or name).
strExtension: Extension to test (including period, if desired).
Definition at line 395 of file meta.h.
Referenced by Sleipnir::CDatabase::Open().
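A hypothetical suffix test matching the description above. The comparison is assumed to be case-sensitive, and a period is matched only if it is included in strExtension.

```cpp
#include <string>

// Hypothetical sketch: true if strFile ends with exactly strExtension.
bool IsExtensionSketch(const std::string& strFile, const std::string& strExtension) {
    return strFile.size() >= strExtension.size() &&
           strFile.compare(strFile.size() - strExtension.size(),
                           strExtension.size(), strExtension) == 0;
}
```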
static bool Sleipnir::CMeta::IsNaN (tType Value) [inline, static]
Return true if the given value represents a missing value.
Parameters:
Value: Value to test.
Definition at line 110 of file meta.h.
Referenced by Sleipnir::CClustKMeans::Cluster(), Sleipnir::CClustPivot::Cluster(), Sleipnir::CClustQTC::Cluster(), Sleipnir::CPCL::Distance(), Sleipnir::CDatFilter::Get(), Sleipnir::CCoalesceMotifLibrary::GetMatch(), Sleipnir::CPCL::Impute(), Sleipnir::CSeekDataset::InitializeGeneMap(), Sleipnir::CDat::Invert(), Sleipnir::CDataset::IsExample(), Sleipnir::CMeasureEuclidean::Measure(), Sleipnir::CMeasureEuclideanScaled::Measure(), Sleipnir::CMeasureHypergeometric::Measure(), Sleipnir::CMeasureInnerProduct::Measure(), Sleipnir::CMeasureBinaryInnerProduct::Measure(), Sleipnir::CMeasureMutualInformation::Measure(), Sleipnir::CMeasureRelativeAUC::Measure(), Sleipnir::CMeasurePearsonSignificance::Measure(), Sleipnir::CMeasureDice::Measure(), Sleipnir::CPCL::MedianMultiples(), Sleipnir::CStatistics::MultivariateNormalPDF(), Sleipnir::CPCL::Normalize(), Sleipnir::CDat::Open(), Sleipnir::CMeasurePearson::Pearson(), Sleipnir::CStatistics::Percentile(), Sleipnir::CPCL::populate(), Sleipnir::CDataPair::Quantize(), Quantize(), Sleipnir::CDat::Randomize(), Sleipnir::CDat::Rank(), Sleipnir::CPCL::RankTransform(), Sleipnir::CDataPair::Save(), Sleipnir::CDat::SaveDOT(), Sleipnir::CDat::SaveGDF(), Sleipnir::CPCL::SaveGene(), Sleipnir::CDat::SaveMATISSE(), Sleipnir::CDat::SaveNET(), Sleipnir::CCoalesceCluster::Subtract(), Sleipnir::CSeekCentral::VarianceWeightSearch(), and Sleipnir::CStatistics::WilcoxonRankSum().
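GetNaN and IsNaN are documented only as a missing-value marker and its test. A common way to implement such a pair is with an IEEE 754 quiet NaN, which is the only floating-point value that compares unequal to itself. The hypothetical sketch below assumes that representation; the actual marker value is not given above.

```cpp
#include <limits>

// Hypothetical sketch: uses a quiet NaN as the missing-value marker.
float GetNaNSketch() {
    return std::numeric_limits<float>::quiet_NaN();
}

// Hypothetical sketch: under IEEE 754, NaN is the only floating-point value
// that compares unequal to itself; integral values never do.
template <class tType>
bool IsNaNSketch(tType Value) {
    return Value != Value;
}
```

Under aggressive floating-point optimization (for example GCC's -ffast-math), `Value != Value` may be folded to false; std::isnan is the safer test for floating-point types.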
bool Sleipnir::CMeta::MapRead (unsigned char *&pbData, HANDLE &hndlMap, size_t &iSize, const char *szFile) [static]
Memory map an existing file read-only in a largely platform-independent manner.
Parameters:
pbData: Output pointer to mapped file data.
hndlMap: Output handle to file map; ignored on non-Windows platforms.
iSize: Output size of mapped file.
szFile: File name to map.
Definition at line 199 of file meta.cpp.
References Unmap().
Referenced by Sleipnir::CDat::Open(), Sleipnir::CPCL::Open(), and Sleipnir::CDatasetCompactMap::Open().
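On POSIX systems, a read-only memory map of this kind is typically built from open, fstat, and mmap, and released with munmap. The following is a minimal sketch under that assumption, not Sleipnir's code: `MapReadSketch` and `UnmapSketch` are hypothetical names, the Windows-only HANDLE argument is dropped, and an empty file is reported as failure because mmap rejects zero-length mappings.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical POSIX-only sketch; the Windows-only HANDLE argument is omitted.
bool MapReadSketch(unsigned char*& pbData, size_t& iSize, const char* szFile) {
    int fd = open(szFile, O_RDONLY);
    if (fd < 0)
        return false;
    struct stat sStat;
    if (fstat(fd, &sStat) != 0 || sStat.st_size == 0) {
        // mmap rejects zero-length mappings, so an empty file counts as failure.
        close(fd);
        return false;
    }
    iSize = static_cast<size_t>(sStat.st_size);
    void* pMap = mmap(nullptr, iSize, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (pMap == MAP_FAILED)
        return false;
    pbData = static_cast<unsigned char*>(pMap);
    return true;
}

// Hypothetical counterpart of Unmap for mappings created above.
void UnmapSketch(const unsigned char* pbData, size_t iSize) {
    if (pbData)
        munmap(const_cast<unsigned char*>(pbData), iSize);
}
```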
bool Sleipnir::CMeta::MapWrite (unsigned char *&pbData, HANDLE &hndlMap, size_t iSize, const char *szFile) [static]
Create a new writeable memory mapped file in a largely platform-independent manner.
Parameters:
pbData: Output pointer to mapped file data.
hndlMap: Output handle to file map; ignored on non-Windows platforms.
iSize: Size of desired memory map.
szFile: File name to map.
This function creates a new file, sizes it to the requested number of bytes, and memory maps it writeably. Using it on an existing file will generally destroy it and overwrite it with new data.
Definition at line 267 of file meta.cpp.
References Unmap().
Referenced by Sleipnir::CDat::Open().
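The create, size, and map sequence described above can be sketched with POSIX calls. This is a hypothetical `MapWriteSketch`, not Sleipnir's code: the HANDLE argument is dropped, and O_TRUNC is what causes an existing file to lose its previous contents.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical POSIX-only sketch; the Windows-only HANDLE argument is omitted.
bool MapWriteSketch(unsigned char*& pbData, size_t iSize, const char* szFile) {
    // O_TRUNC discards any existing contents of szFile.
    int fd = open(szFile, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;
    // Grow the (now empty) file to the requested size; mmap cannot map zero bytes.
    if (iSize == 0 || ftruncate(fd, static_cast<off_t>(iSize)) != 0) {
        close(fd);
        return false;
    }
    void* pMap = mmap(nullptr, iSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (pMap == MAP_FAILED)
        return false;
    pbData = static_cast<unsigned char*>(pMap);
    return true;
}
```

Because the mapping is MAP_SHARED, bytes written through pbData reach the file; release the mapping with munmap (the POSIX counterpart of Unmap) when done.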
static void Sleipnir::CMeta::Permute (tIterator Items, const std::vector< size_t > &veciOrder) [inline, static]
Reorder a given item list based on a target ordering.
Parameters:
Items: Iterator over items to be reordered.
veciOrder: Indices at which each item should be placed.
Reorders a list of items based on a list of target indices. For example, suppose the input list is [A, B, C] and the target order is [1, 0, 2]. Then after permutation, the vector of items will contain [B, A, C]. The reordering is done without copying more than one element at a time.
Definition at line 166 of file meta.h.
Referenced by Sleipnir::CPST::GetPWM(), Permute(), and Sleipnir::CPCL::SortGenes().
static void Sleipnir::CMeta::Permute (std::vector< tType > &vecItems, const std::vector< size_t > &veciOrder) [inline, static]
Reorder a given item list based on a target ordering.
Parameters:
vecItems: Vector of items to be reordered.
veciOrder: Indices at which each item should be placed.
Reorders a list of items based on a list of target indices. For example, suppose the input list is [A, B, C] and the target order is [1, 0, 2]. Then after permutation, the vector of items will contain [B, A, C]. The reordering is done without copying more than one element at a time.
Definition at line 205 of file meta.h.
References Permute().
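One way to reorder a vector in place while holding only one element aside is to follow each cycle of the permutation. The hypothetical `PermuteSketch` below reads veciOrder[i] as the destination of item i, as the parameter description states, and assumes veciOrder is a valid permutation of 0 to N-1. Because the [1, 0, 2] example is its own inverse, it cannot distinguish destination indices from source indices, so the usage note also uses a three-element cycle.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: moves vecItems[i] to position veciOrder[i], in place.
// Assumes veciOrder is a permutation of 0..N-1.
template <class tType>
void PermuteSketch(std::vector<tType>& vecItems, const std::vector<size_t>& veciOrder) {
    std::vector<bool> vecfPlaced(vecItems.size(), false);
    for (size_t iStart = 0; iStart < vecItems.size(); ++iStart) {
        if (vecfPlaced[iStart])
            continue;
        // Walk one cycle of the permutation, carrying a single displaced item.
        tType Carried = vecItems[iStart];
        size_t iPos = iStart;
        do {
            size_t iDest = veciOrder[iPos];
            // Drop the carried item at its destination and pick up the one it displaces.
            std::swap(Carried, vecItems[iDest]);
            vecfPlaced[iDest] = true;
            iPos = iDest;
        } while (iPos != iStart);
    }
}
```

With items [A, B, C] and order [2, 0, 1], the sketch yields [B, C, A]; an implementation that instead read veciOrder[i] as the source of position i would yield [C, A, B].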
static size_t Sleipnir::CMeta::Quantize (tType Value, const std::vector< tType > &vecQuants) [inline, static]
Discretize a given continuous value based on a vector of bin edges.
Parameters:
Value: Continuous value to be discretized.
vecQuants: Bin edges used to discretize the given value.
Given N bin edges, the continuous value is discretized into the range [0, N-1] according to the first bin edge it is less than or equal to; values greater than every edge are placed in bin N-1. As a result, the last bin edge is effectively ignored. Upper bin edges are inclusive and lower bin edges are exclusive, so for bin edges [-0.1, 0.3, 0.6], the given value is discretized into one of three outputs:
0: (-infinity, -0.1]
1: (-0.1, 0.3]
2: (0.3, infinity)
Definition at line 234 of file meta.h.
References IsNaN().
Referenced by Sleipnir::CDataPair::Quantize(), and Sleipnir::CPCLPair::Quantize().
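The binning rule above translates into a linear scan. In this hypothetical `QuantizeSketch`, vecQuants is assumed non-empty and sorted in ascending order, and the missing-value handling implied by the reference to IsNaN() is omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch; assumes vecQuants is non-empty and sorted ascending.
template <class tType>
size_t QuantizeSketch(tType Value, const std::vector<tType>& vecQuants) {
    size_t i;
    // Find the first bin edge the value is less than or equal to.
    for (i = 0; i < vecQuants.size(); ++i)
        if (Value <= vecQuants[i])
            break;
    // Values above every edge fall into the last bin, so the last edge has no effect.
    return (i < vecQuants.size()) ? i : vecQuants.size() - 1;
}
```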
static bool Sleipnir::CMeta::SkipEdge (bool fAnswer, size_t i, size_t j, const std::vector< bool > &vecfHere, const std::vector< bool > &vecfUbik, bool fCtxtPos, bool fCtxtNeg, bool fBridgePos, bool fBridgeNeg, bool fOutPos, bool fOutNeg) [inline, static]
Determines whether an edge (the gene pair i, j) should be skipped, based on context genes, potentially ubiquitous genes, and the given flags.
Parameters:
fAnswer: The answer (positive or negative) as a boolean.
i: Index of the first gene.
j: Index of the second gene.
vecfHere: Vector of flags marking genes in the context (if any).
vecfUbik: Vector of flags marking ubiquitous genes.
fCtxtPos: Should within-context positives be used?
fCtxtNeg: Should within-context negatives be used?
fBridgePos: Should bridging positives be used?
fBridgeNeg: Should bridging negatives be used?
fOutPos: Should outside positives be used?
fOutNeg: Should outside negatives be used?
Definition at line 343 of file meta.h.
Referenced by Sleipnir::CStatistics::WilcoxonRankSum().
void Sleipnir::CMeta::Tokenize (const char *szString, std::vector< std::string > &vecstrTokens, const char *szSeparators = "\t", bool fNoEmpties = false) [static]
Tokenize a given string based on one or more delimiter characters.
Parameters:
szString: String to be tokenized.
vecstrTokens: Output vector of tokens from the given string.
szSeparators: One or more separator characters used to split tokens.
fNoEmpties: If true, discard empty strings between delimiters; otherwise, include them in the output.
Definition at line 96 of file meta.cpp.
Referenced by Sleipnir::CPCL::Distance(), Sleipnir::CSeekCentral::Initialize(), Sleipnir::COrthology::Open(), Sleipnir::CCoalesceCluster::Open(), Sleipnir::CSVM::Open(), Sleipnir::CFASTA::Open(), Sleipnir::CGenome::Open(), Sleipnir::CGenes::Open(), Sleipnir::CBayesNetMinimal::OpenCounts(), Sleipnir::CCoalesceMotifLibrary::OpenKnown(), Sleipnir::CGenes::OpenWeighted(), Sleipnir::CSeekTools::ReadListTwoColumns(), Sleipnir::CSeekTools::ReadMultiGeneOneLine(), Sleipnir::CSeekTools::ReadMultipleQueries(), and Sleipnir::CSeekTools::ReadQuantFile().
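The splitting rule described by szSeparators and fNoEmpties can be illustrated with a short hypothetical sketch, `TokenizeSketch`. It treats every character of szSeparators as a delimiter and appends tokens to vecstrTokens without clearing it first; whether the library clears the vector is not stated here, so that detail is an assumption.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch: splits szString on any character in szSeparators,
// appending tokens to vecstrTokens (the vector is not cleared first).
void TokenizeSketch(const char* szString, std::vector<std::string>& vecstrTokens,
                    const char* szSeparators = "\t", bool fNoEmpties = false) {
    std::string strCur;
    for (const char* pc = szString;; ++pc) {
        // Check for the terminator first: strchr also "finds" '\0' in szSeparators.
        if (*pc == '\0' || std::strchr(szSeparators, *pc)) {
            if (!(fNoEmpties && strCur.empty()))
                vecstrTokens.push_back(strCur);
            strCur.clear();
            if (*pc == '\0')
                break;
        } else {
            strCur += *pc;
        }
    }
}
```

With the default tab separator, "a\t\tb" yields the tokens "a", "", and "b"; passing fNoEmpties = true drops the empty middle token.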
string Sleipnir::CMeta::Trim (const char *szString) [static]
Trim whitespace from the beginning and end of the given string.
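A hypothetical sketch of Trim, assuming "whitespace" means the characters in c_szWS (space, tab, carriage return, and newline):

```cpp
#include <string>

// Hypothetical sketch: strips leading and trailing characters found in the
// whitespace set (assumed to match CMeta::c_szWS).
std::string TrimSketch(const char* szString) {
    static const char c_szWhitespace[] = " \t\r\n";
    std::string strIn(szString);
    std::string::size_type iBegin = strIn.find_first_not_of(c_szWhitespace);
    if (iBegin == std::string::npos)
        return "";  // the string is empty or all whitespace
    std::string::size_type iEnd = strIn.find_last_not_of(c_szWhitespace);
    return strIn.substr(iBegin, iEnd - iBegin + 1);
}
```

Interior whitespace is left untouched, so "  a b\r\n" becomes "a b".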
void Sleipnir::CMeta::Unmap (const unsigned char *pbData, HANDLE hndlMap, size_t iSize) [static]
Unmap a memory map in a largely platform-independent manner.
Parameters:
pbData: Pointer to mapped file data.
hndlMap: Handle to file map; ignored on non-Windows platforms.
iSize: Size of memory map.
Definition at line 318 of file meta.cpp.
Referenced by MapRead(), MapWrite(), and Sleipnir::CDatasetCompactMap::Open().