Sleipnir
Sleipnir::CMeta Class Reference

Utility class containing static utility functions.

#include <meta.h>

Inheritance diagram for Sleipnir::CMeta:
Sleipnir::CMetaImpl

Public Member Functions

 CMeta (int iVerbosity, size_t iRandomSeed=0)
 Utility constructor that initializes Sleipnir (primarily log4cpp) at construction time and performs cleanup when destroyed.

Static Public Member Functions

static std::string Filename (const std::string &strString, char cReplacement= '_')
 Replace all non-alphanumeric characters in a string with the given replacement.
static std::string Basename (const char *szPath)
 Attempt to return the filename portion of a path in a platform-independent manner.
static void Tokenize (const char *szString, std::vector< std::string > &vecstrTokens, const char *szSeparators="\t", bool fNoEmpties=false)
 Tokenize a given string based on one or more delimiter characters.
static std::string Trim (const char *szString)
 Trim whitespace from the beginning and end of the given string.
static bool MapRead (unsigned char *&pbData, HANDLE &hndlMap, size_t &iSize, const char *szFile)
 Memory map an existing file read-only in a largely platform-independent manner.
static bool MapWrite (unsigned char *&pbData, HANDLE &hndlMap, size_t iSize, const char *szFile)
 Create a new writeable memory mapped file in a largely platform-independent manner.
static void Unmap (const unsigned char *pbData, HANDLE hndlMap, size_t iSize)
 Unmap a memory map in a largely platform-independent manner.
static size_t GetMemoryUsage ()
 Returns (very approximately) the process's current memory usage in bytes.
template<class tType >
static bool IsNaN (tType Value)
 Return true if the given value represents a missing value.
static float GetNaN ()
 Return a standard missing value marker.
static std::string Deextension (const std::string &strName)
 Given a filename, remove the file type extension (if any).
template<class tIterator >
static void Permute (tIterator Items, const std::vector< size_t > &veciOrder)
 Reorder a given item list based on a target ordering.
template<class tType >
static void Permute (std::vector< tType > &vecItems, const std::vector< size_t > &veciOrder)
 Reorder a given item list based on a target ordering.
template<class tType >
static size_t Quantize (tType Value, const std::vector< tType > &vecQuants)
 Discretize a given continuous value based on a vector of bin edges.
static size_t GetMicroseconds (const struct timeval &sBegin, const struct timeval &sEnd)
 Calculates the difference in microseconds between two timevals.
static bool SkipEdge (bool fAnswer, size_t i, size_t j, const std::vector< bool > &vecfHere, const std::vector< bool > &vecfUbik, bool fCtxtPos, bool fCtxtNeg, bool fBridgePos, bool fBridgeNeg, bool fOutPos, bool fOutNeg)
 Determines whether or not an item should be skipped (based on potential ubiquitous genes, context genes, and flags).
static bool IsExtension (const std::string &strFile, const std::string &strExtension)
 Returns true if the given file path ends with the given extension.

Static Public Attributes

static const char c_szWS [] = " \t\r\n"
 String constant containing basic whitespace characters: space, tab, carriage return, newline.

Detailed Description

Utility class containing static utility functions.

CMeta is critical in that its constructor and destructor perform Sleipnir's startup and shutdown, so one CMeta object should be created at the beginning of every process using Sleipnir (usually in the main function). Startup and shutdown exist primarily to set up and tear down logging, and can also be used to standardize the random seed for a process (useful for testing). Most other methods in CMeta are generic utilities for string manipulation and a few operating system abstractions (particularly memory mapping).

Definition at line 73 of file meta.h.


Constructor & Destructor Documentation

Sleipnir::CMeta::CMeta ( int  iVerbosity,
size_t  iRandomSeed = 0 
)

Utility constructor that initializes Sleipnir (primarily log4cpp) at construction time and performs cleanup when destroyed.

Parameters:
iVerbosity  If linked with log4cpp, the verbosity level for logging.
iRandomSeed  Random seed for use with srand; if -1, the current time is used.

One (and only one) CMeta object should be created in a Sleipnir client's main function before making any library calls. The object will be automatically destroyed as main exits, guaranteeing proper cleanup of Sleipnir (and log4cpp).

Remarks:
If Sleipnir is configured without log4cpp, iVerbosity will be ignored.

Definition at line 29 of file meta.cpp.
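The construct-in-main pattern described above is ordinary RAII. The following is a minimal sketch of that pattern only, not Sleipnir's source: the class CMetaSketch and its live-instance counter are hypothetical stand-ins, logging setup is replaced by the counter, and the time-based seed for -1 follows the parameter description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <ctime>

// Hypothetical stand-in for CMeta: sets up in the constructor, cleans up in
// the destructor. The counter replaces real logging setup and teardown.
class CMetaSketch {
public:
    explicit CMetaSketch(int iVerbosity, std::size_t iRandomSeed = 0)
        : m_iVerbosity(iVerbosity) {
        unsigned int iSeed = static_cast<unsigned int>(iRandomSeed);
        // A seed of -1 (that is, the largest size_t) selects the current time.
        if (iRandomSeed == static_cast<std::size_t>(-1))
            iSeed = static_cast<unsigned int>(std::time(nullptr));
        std::srand(iSeed);
        ++s_iLive;
    }
    ~CMetaSketch() { --s_iLive; }

    // Only one instance is intended, so copying is disabled.
    CMetaSketch(const CMetaSketch&) = delete;
    CMetaSketch& operator=(const CMetaSketch&) = delete;

    int Verbosity() const { return m_iVerbosity; }
    static int Live() { return s_iLive; }

private:
    int m_iVerbosity;
    static int s_iLive;
};

int CMetaSketch::s_iLive = 0;
```

In a client, the object would be declared as the first statement of main, so that cleanup runs automatically when main returns. Passing a fixed seed makes subsequent rand calls repeatable from run to run.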


Member Function Documentation

string Sleipnir::CMeta::Basename ( const char *  szPath) [static]

Attempt to return the filename portion of a path in a platform-independent manner.

Parameters:
szPath  File path from which filename is extracted.
Returns:
Filename portion of the given path.
Remarks:
Actually looks for the last / or \ character in the string and returns everything to the right of that. Of course this won't always work, but it tends to do an awfully good job.

Definition at line 134 of file meta.cpp.

Referenced by Sleipnir::CDatabase::Open().
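The rule given in the remarks (return everything to the right of the last / or \) can be sketched in standard C++ as follows. This is an illustrative reimplementation, not Sleipnir's source; the name BasenameSketch is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CMeta::Basename: returns everything to the right
// of the last '/' or '\\', or the whole string if neither character occurs.
std::string BasenameSketch(const char* szPath) {
    assert(szPath != nullptr);
    const std::string strPath(szPath);
    const std::string::size_type iPos = strPath.find_last_of("/\\");
    return (iPos == std::string::npos) ? strPath : strPath.substr(iPos + 1);
}
```

In this sketch a path that ends in a separator, such as "dir/", yields an empty string, which is one example of the rule not always matching what a caller expects.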

static std::string Sleipnir::CMeta::Deextension ( const std::string &  strName) [inline, static]

Given a filename, remove the file type extension (if any).

Parameters:
strName  Filename to be de-extensioned.
Returns:
Filename without the trailing extension.
Remarks:
Actually removes anything after the last . in the given string.

Definition at line 145 of file meta.h.

Referenced by Sleipnir::CBayesNetSmile::Open(), and Sleipnir::CDatabase::Open().
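The behavior in the remarks (drop anything after the last period) can be sketched as below. This is an illustration under that description only; DeextensionSketch is a hypothetical name, and the sketch also drops the period itself, which the remarks do not state explicitly.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CMeta::Deextension: removes the last '.' and
// everything after it; a string without '.' is returned unchanged.
std::string DeextensionSketch(const std::string& strName) {
    const std::string::size_type iDot = strName.rfind('.');
    return (iDot == std::string::npos) ? strName : strName.substr(0, iDot);
}
```

Because only the last period matters, "archive.tar.gz" becomes "archive.tar" in this sketch, and a period inside a directory name is treated the same way as one in the file name.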

string Sleipnir::CMeta::Filename ( const std::string &  strString,
char  cReplacement = '_' 
) [static]

Replace all non-alphanumeric characters in a string with the given replacement.

Parameters:
strString  String in which non-alphanumeric characters are replaced.
cReplacement  Character used to replace non-alphanumeric characters.
Returns:
String with non-alphanumeric characters replaced.

This method is intended to clean a string to make it appropriate for use as a file name or other alphanumeric identifier; given a string, non-alphanumeric characters are replaced with a configurable character, usually underscore.

Definition at line 70 of file meta.cpp.

Referenced by Sleipnir::CBayesNetSmile::Open(), Sleipnir::CDat::SaveDOT(), and Sleipnir::CDat::SaveGDF().
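A minimal sketch of the replacement described above, assuming that "alphanumeric" means what std::isalnum reports in the default C locale. FilenameSketch is a hypothetical name and this is not Sleipnir's source.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for CMeta::Filename: every character that is not a
// letter or digit is replaced with cReplacement.
std::string FilenameSketch(const std::string& strString, char cReplacement = '_') {
    std::string strRet(strString);
    for (char& c : strRet) {
        // Cast to unsigned char first: std::isalnum is undefined for negative values.
        if (!std::isalnum(static_cast<unsigned char>(c)))
            c = cReplacement;
    }
    return strRet;
}
```

For example, the sketch turns "GO:0006412 (translation)" into "GO_0006412__translation_". Note that periods are replaced as well, so any extension should be appended after cleaning.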

size_t Sleipnir::CMeta::GetMemoryUsage ( ) [static]

Returns (very approximately) the process's current memory usage in bytes.

Returns:
Current (approximate) memory usage in bytes.
Remarks:
Disabled by default on Windows because it requires an extra library (psapi.lib); reads /proc/<pid>/statm on Linux and returns the resident set size, which is better than nothing.

Definition at line 342 of file meta.cpp.

static size_t Sleipnir::CMeta::GetMicroseconds ( const struct timeval &  sBegin,
const struct timeval &  sEnd 
) [inline, static]

Calculates the difference in microseconds between two timevals.

Parameters:
sBegin  Earlier timeval.
sEnd  Later timeval.
Returns:
Difference in microseconds between later and earlier timeval.

Definition at line 298 of file meta.h.
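The arithmetic is the difference in whole seconds scaled to microseconds, plus the difference in the microsecond fields (which may be negative). A sketch on a POSIX system, where struct timeval comes from sys/time.h; GetMicrosecondsSketch is a hypothetical name and the code is not taken from Sleipnir.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/time.h>

// Hypothetical stand-in for CMeta::GetMicroseconds.
// Assumes sEnd is not earlier than sBegin.
std::size_t GetMicrosecondsSketch(const struct timeval& sBegin,
                                  const struct timeval& sEnd) {
    const long long iSec =
        static_cast<long long>(sEnd.tv_sec) - static_cast<long long>(sBegin.tv_sec);
    const long long iUsec =
        static_cast<long long>(sEnd.tv_usec) - static_cast<long long>(sBegin.tv_usec);
    const long long iDiff = iSec * 1000000LL + iUsec;
    assert(iDiff >= 0);
    return static_cast<std::size_t>(iDiff);
}
```

With sBegin = (1 s, 900000 us) and sEnd = (3 s, 100000 us), the result is 2 * 1000000 - 800000 = 1200000 microseconds.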

static float Sleipnir::CMeta::GetNaN ( ) [inline, static]

Return a standard missing value marker.

Returns:
Standard missing value marker.
Remarks:
Should be used anywhere a missing value is required, e.g. CPCL or CDat.
See also:
IsNaN

Definition at line 128 of file meta.h.

Referenced by Sleipnir::CPCL::AddGenes(), Sleipnir::CSeekDataset::CSeekDataset(), Sleipnir::CPCL::Distance(), Sleipnir::CBayesNetSmile::Evaluate(), Sleipnir::CBayesNetFN::Evaluate(), Sleipnir::CBayesNetMinimal::Evaluate(), Sleipnir::CDat::FilterGenes(), Sleipnir::CPCLSet::Get(), Sleipnir::CDatFilter::Get(), Sleipnir::CDatasetCompact::GetContinuous(), Sleipnir::CDataFilter::GetContinuous(), Sleipnir::CDataSubset::GetContinuous(), Sleipnir::CCoalesceMotifLibrary::GetMatch(), Sleipnir::CPCL::Impute(), Sleipnir::CStatistics::InverseNormal01CDF(), Sleipnir::CStatistics::KullbackLeiblerDivergence(), Sleipnir::CStatistics::MatrixLUDeterminant(), Sleipnir::CMeasureAutocorrelate::Measure(), Sleipnir::CMeasureEuclidean::Measure(), Sleipnir::CMeasureEuclideanScaled::Measure(), Sleipnir::CMeasureKolmogorovSmirnov::Measure(), Sleipnir::CMeasureKendallsTau::Measure(), Sleipnir::CMeasureSpearman::Measure(), Sleipnir::CMeasureHypergeometric::Measure(), Sleipnir::CMeasureInnerProduct::Measure(), Sleipnir::CMeasureBinaryInnerProduct::Measure(), Sleipnir::CMeasureMutualInformation::Measure(), Sleipnir::CMeasureRelativeAUC::Measure(), Sleipnir::CMeasurePearsonSignificance::Measure(), Sleipnir::CMeasureDistanceCorrelation::Measure(), Sleipnir::CMeasureSignedDistanceCorrelation::Measure(), Sleipnir::CMeasureDice::Measure(), Sleipnir::CPCL::MedianMultiples(), Sleipnir::CStatistics::MultivariateNormalCDF(), Sleipnir::CStatistics::MultivariateNormalPDF(), Sleipnir::CPCL::Open(), Sleipnir::CDat::Open(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CMeasurePearson::Pearson(), Sleipnir::CStatistics::Percentile(), Sleipnir::CDataset::Remove(), and Sleipnir::CSeekDataset::~CSeekDataset().

static bool Sleipnir::CMeta::IsExtension ( const std::string &  strFile,
const std::string &  strExtension 
) [inline, static]

Returns true if the given file path ends with the given extension.

Parameters:
strFile  File path (or name).
strExtension  Extension to test (including period, if desired).
Returns:
True if the given file path ends with the given extension.

Definition at line 395 of file meta.h.

Referenced by Sleipnir::CDatabase::Open().
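A suffix test of this kind can be sketched with std::string::compare. IsExtensionSketch is a hypothetical name; the sketch assumes a case-sensitive comparison, which the description above does not specify.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CMeta::IsExtension: true if strFile ends with
// strExtension (case-sensitive in this sketch).
bool IsExtensionSketch(const std::string& strFile, const std::string& strExtension) {
    if (strFile.size() < strExtension.size())
        return false;
    return strFile.compare(strFile.size() - strExtension.size(),
                           strExtension.size(), strExtension) == 0;
}
```

Because this is a plain suffix match, passing "dab" instead of ".dab" also matches a file named "net.dab", and would match "netdab" too; including the period makes the test stricter.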

template<class tType >
static bool Sleipnir::CMeta::IsNaN ( tType  Value) [inline, static]

Return true if the given value represents a missing value.

Parameters:
Value  Value to test.
Returns:
True if the given value represents a missing value.
Remarks:
Returns true for either infinite or not-a-number (NaN) values, the latter of which is used as a standard missing value marker. Templated to work with either doubles or floats, although the latter are more standard. Used since == isn't reliable for NaN.
See also:
GetNaN

Definition at line 110 of file meta.h.

Referenced by Sleipnir::CClustKMeans::Cluster(), Sleipnir::CClustPivot::Cluster(), Sleipnir::CClustQTC::Cluster(), Sleipnir::CPCL::Distance(), Sleipnir::CDatFilter::Get(), Sleipnir::CCoalesceMotifLibrary::GetMatch(), Sleipnir::CPCL::Impute(), Sleipnir::CSeekDataset::InitializeGeneMap(), Sleipnir::CDat::Invert(), Sleipnir::CDataset::IsExample(), Sleipnir::CMeasureEuclidean::Measure(), Sleipnir::CMeasureEuclideanScaled::Measure(), Sleipnir::CMeasureHypergeometric::Measure(), Sleipnir::CMeasureInnerProduct::Measure(), Sleipnir::CMeasureBinaryInnerProduct::Measure(), Sleipnir::CMeasureMutualInformation::Measure(), Sleipnir::CMeasureRelativeAUC::Measure(), Sleipnir::CMeasurePearsonSignificance::Measure(), Sleipnir::CMeasureDice::Measure(), Sleipnir::CPCL::MedianMultiples(), Sleipnir::CStatistics::MultivariateNormalPDF(), Sleipnir::CPCL::Normalize(), Sleipnir::CDat::Open(), Sleipnir::CMeasurePearson::Pearson(), Sleipnir::CStatistics::Percentile(), Sleipnir::CPCL::populate(), Sleipnir::CDataPair::Quantize(), Quantize(), Sleipnir::CDat::Randomize(), Sleipnir::CDat::Rank(), Sleipnir::CPCL::RankTransform(), Sleipnir::CDataPair::Save(), Sleipnir::CDat::SaveDOT(), Sleipnir::CDat::SaveGDF(), Sleipnir::CPCL::SaveGene(), Sleipnir::CDat::SaveMATISSE(), Sleipnir::CDat::SaveNET(), Sleipnir::CCoalesceCluster::Subtract(), Sleipnir::CSeekCentral::VarianceWeightSearch(), and Sleipnir::CStatistics::WilcoxonRankSum().
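The remark about == can be seen directly: a NaN compares unequal to everything, including itself, so a dedicated test is needed. The sketch below uses the standard std::isnan and std::isinf; the names IsMissingSketch and MissingSketch are hypothetical, and the use of a quiet NaN as the marker is an assumption based on the remarks above.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for CMeta::GetNaN (assumed here to be a quiet NaN).
inline float MissingSketch() {
    return std::numeric_limits<float>::quiet_NaN();
}

// Hypothetical stand-in for CMeta::IsNaN: true for NaN or infinite values.
// Works for float and double alike.
template <class tType>
bool IsMissingSketch(tType Value) {
    return std::isnan(Value) || std::isinf(Value);
}
```

A comparison such as `d == MissingSketch()` is false even when d itself holds the marker, which is why callers should go through the test function instead.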

bool Sleipnir::CMeta::MapRead ( unsigned char *&  pbData,
HANDLE &  hndlMap,
size_t &  iSize,
const char *  szFile 
) [static]

Memory map an existing file read-only in a largely platform-independent manner.

Parameters:
pbData  Output pointer to mapped file data.
hndlMap  Output handle to file map; ignored on non-Windows platforms.
iSize  Output size of mapped file.
szFile  File name to map.
Returns:
True if file was memory mapped successfully.
Remarks:
This has been successfully tested on Windows, Linux, and (minimally) Mac OS. It plays some mildly ugly tricks to provide a standard memory mapping interface on all three systems, but it should work; your mileage may vary. On success, pbData will be of size iSize. An opened map should be closed with Unmap.
See also:
MapWrite

Definition at line 199 of file meta.cpp.

References Unmap().

Referenced by Sleipnir::CDat::Open(), Sleipnir::CPCL::Open(), and Sleipnir::CDatasetCompactMap::Open().
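On POSIX systems, a read-only mapping with this kind of interface can be built from open, fstat, mmap, and munmap. The sketch below covers that case only and is not Sleipnir's implementation: MapReadSketch and UnmapSketch are hypothetical names, the HANDLE parameter is omitted because it is documented as ignored on non-Windows platforms, and empty files are treated as failures.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// POSIX-only sketch: map an existing file read-only.
// On success, pbData points to iSize bytes of file data.
bool MapReadSketch(unsigned char*& pbData, std::size_t& iSize, const char* szFile) {
    assert(szFile != nullptr);
    pbData = nullptr;
    iSize = 0;
    const int iFile = open(szFile, O_RDONLY);
    if (iFile == -1)
        return false;
    struct stat sStat;
    if (fstat(iFile, &sStat) != 0 || sStat.st_size <= 0) {
        close(iFile);
        return false;
    }
    const std::size_t iBytes = static_cast<std::size_t>(sStat.st_size);
    void* pMap = mmap(nullptr, iBytes, PROT_READ, MAP_SHARED, iFile, 0);
    close(iFile);  // the mapping stays valid after the descriptor is closed
    if (pMap == MAP_FAILED)
        return false;
    pbData = static_cast<unsigned char*>(pMap);
    iSize = iBytes;
    return true;
}

// Release a mapping created by MapReadSketch.
void UnmapSketch(const unsigned char* pbData, std::size_t iSize) {
    if (pbData != nullptr)
        munmap(const_cast<unsigned char*>(pbData), iSize);
}
```

As with the documented interface, every successful map should be paired with exactly one unmap call using the same pointer and size.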

bool Sleipnir::CMeta::MapWrite ( unsigned char *&  pbData,
HANDLE &  hndlMap,
size_t  iSize,
const char *  szFile 
) [static]

Create a new writeable memory mapped file in a largely platform-independent manner.

Parameters:
pbData  Output pointer to mapped file data.
hndlMap  Output handle to file map; ignored on non-Windows platforms.
iSize  Size of desired memory map.
szFile  File name to map.
Returns:
True if file was memory mapped successfully.

This function creates a new file, sizes it to the requested number of bytes, and memory maps it writeably. Using it on an existing file will generally destroy it and overwrite it with new data.

Remarks:
This has been successfully tested on Windows, Linux, and (minimally) Mac OS. It plays some mildly ugly tricks to provide a standard memory mapping interface on all three systems, but it should work; your mileage may vary. On success, pbData will be of size iSize. An opened map should be closed with Unmap.
See also:
MapRead

Definition at line 267 of file meta.cpp.

References Unmap().

Referenced by Sleipnir::CDat::Open().

template<class tIterator >
static void Sleipnir::CMeta::Permute ( tIterator  Items,
const std::vector< size_t > &  veciOrder 
) [inline, static]

Reorder a given item list based on a target ordering.

Parameters:
Items  Iterator over items to be reordered.
veciOrder  Indices at which each item should be placed.

Reorders a list of items based on a list of target indices. For example, suppose the input list is [A, B, C] and the target order is [1, 0, 2]. Then after permutation, the vector of items will contain [B, A, C]. The reordering is done without copying more than one element at a time.

Definition at line 166 of file meta.h.

Referenced by Sleipnir::CPST::GetPWM(), Permute(), and Sleipnir::CPCL::SortGenes().
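An in-place reordering that holds only one displaced element at a time can be written with swaps that follow each permutation cycle. The sketch below is one such approach and is not taken from Sleipnir. It assumes the reading suggested by the parameter description, namely that the item at position i ends up at position veciOrder[i]; the documented example [1, 0, 2] gives the same result under either reading, so this is an assumption. PermuteSketch is a hypothetical name, and veciOrder must be a valid permutation of 0..N-1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-place permutation: afterwards, the item that started at
// position i sits at position veciOrder[i].
template <class tType>
void PermuteSketch(std::vector<tType>& vecItems,
                   const std::vector<std::size_t>& veciOrder) {
    assert(vecItems.size() == veciOrder.size());
    // Working copy: veciTarget[k] is where the item currently at k must go.
    std::vector<std::size_t> veciTarget(veciOrder);
    for (std::size_t i = 0; i < vecItems.size(); ++i) {
        while (veciTarget[i] != i) {
            const std::size_t j = veciTarget[i];
            std::swap(vecItems[i], vecItems[j]);      // item now at j is final
            std::swap(veciTarget[i], veciTarget[j]);  // so veciTarget[j] == j
        }
    }
}
```

Each swap puts one item in its final position, so the loop performs at most N - 1 swaps in total. The only extra storage is the working copy of the index vector.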

template<class tType >
static void Sleipnir::CMeta::Permute ( std::vector< tType > &  vecItems,
const std::vector< size_t > &  veciOrder 
) [inline, static]

Reorder a given item list based on a target ordering.

Parameters:
vecItems  Vector of items to be reordered.
veciOrder  Indices at which each item should be placed.

Reorders a list of items based on a list of target indices. For example, suppose the input list is [A, B, C] and the target order is [1, 0, 2]. Then after permutation, the vector of items will contain [B, A, C]. The reordering is done without copying more than one element at a time.

Remarks:
I'm not smart enough to get this to work seamlessly with both arrays and vectors, so this is my compromise. Crazy STL and iterators...

Definition at line 205 of file meta.h.

References Permute().

template<class tType >
static size_t Sleipnir::CMeta::Quantize ( tType  Value,
const std::vector< tType > &  vecQuants 
) [inline, static]

Discretize a given continuous value based on a vector of bin edges.

Parameters:
Value  Continuous value to be discretized.
vecQuants  Bin edges used to discretize the given value.
Returns:
Discretized value based on the given bin edges.

Given N bin edges, the continuous value will be discretized into the range [0, N-1] depending on the first bin edge it is less than or equal to. Note that this means that the last bin edge will be ignored. Upper bin edges are inclusive, lower bin edges are exclusive. This means that for bins [-0.1, 0.3, 0.6], the given value will be discretized into one of three outputs:

  • 0, corresponding to values less than or equal to -0.1.
  • 1, corresponding to values greater than -0.1 but less than or equal to 0.3.
  • 2, corresponding to values greater than 0.3.
See also:
CDataPair

Definition at line 234 of file meta.h.

References IsNaN().

Referenced by Sleipnir::CDataPair::Quantize(), and Sleipnir::CPCLPair::Quantize().
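The binning rule above can be sketched as a linear scan over all but the last edge. This is an illustration of the documented rule, not Sleipnir's source: QuantizeSketch is a hypothetical name, missing values are not handled, and returning 0 for an empty edge vector is a choice made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CMeta::Quantize: index of the first bin edge that
// Value is less than or equal to. The last edge is never compared, so any
// value above the second-to-last edge falls into bin N-1.
template <class tType>
std::size_t QuantizeSketch(tType Value, const std::vector<tType>& vecQuants) {
    for (std::size_t i = 0; i + 1 < vecQuants.size(); ++i) {
        if (Value <= vecQuants[i])
            return i;
    }
    // Sketch-only choice: an empty edge vector maps everything to bin 0.
    return vecQuants.empty() ? 0 : vecQuants.size() - 1;
}
```

With edges [-0.1, 0.3, 0.6], the sketch maps -0.5 and -0.1 to 0, 0.0 and 0.3 to 1, and both 0.31 and 0.9 to 2, matching the three cases listed above.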

static bool Sleipnir::CMeta::SkipEdge ( bool  fAnswer,
size_t  i,
size_t  j,
const std::vector< bool > &  vecfHere,
const std::vector< bool > &  vecfUbik,
bool  fCtxtPos,
bool  fCtxtNeg,
bool  fBridgePos,
bool  fBridgeNeg,
bool  fOutPos,
bool  fOutNeg 
) [inline, static]

Determines whether or not an item should be skipped (based on potential ubiquitous genes, context genes, and flags).

Parameters:
fAnswer  The answer (pos/neg) as a boolean.
i  Index of the first gene.
j  Index of the second gene.
vecfHere  A vector representing genes in the context (if any) as bool.
vecfUbik  A vector representing ubiquitous genes (as bool).
fCtxtPos  Should within-context positives be used?
fCtxtNeg  Should within-context negatives be used?
fBridgePos  Should bridging positives be used?
fBridgeNeg  Should bridging negatives be used?
fOutPos  Should outside positives be used?
fOutNeg  Should outside negatives be used?
Returns:
True if this edge should be used given these parameters.

Definition at line 343 of file meta.h.

Referenced by Sleipnir::CStatistics::WilcoxonRankSum().

void Sleipnir::CMeta::Tokenize ( const char *  szString,
std::vector< std::string > &  vecstrTokens,
const char *  szSeparators = "\t",
bool  fNoEmpties = false 
) [static]

Tokenize a given string based on one or more delimiter characters.

Parameters:
szString  String to be tokenized.
vecstrTokens  Output vector of tokens from the given string.
szSeparators  One or more separator characters used to split tokens.
fNoEmpties  If true, discard empty strings between delimiters; otherwise, include them in the output.

Definition at line 96 of file meta.cpp.

Referenced by Sleipnir::CPCL::Distance(), Sleipnir::CSeekCentral::Initialize(), Sleipnir::COrthology::Open(), Sleipnir::CCoalesceCluster::Open(), Sleipnir::CSVM::Open(), Sleipnir::CFASTA::Open(), Sleipnir::CGenome::Open(), Sleipnir::CGenes::Open(), Sleipnir::CBayesNetMinimal::OpenCounts(), Sleipnir::CCoalesceMotifLibrary::OpenKnown(), Sleipnir::CGenes::OpenWeighted(), Sleipnir::CSeekTools::ReadListTwoColumns(), Sleipnir::CSeekTools::ReadMultiGeneOneLine(), Sleipnir::CSeekTools::ReadMultipleQueries(), and Sleipnir::CSeekTools::ReadQuantFile().
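The effect of fNoEmpties is easiest to see on adjacent delimiters. The following sketch splits on any character in the separator set using std::string::find_first_of. It is not Sleipnir's source: TokenizeSketch is a hypothetical name, and appending to the output vector without clearing it first is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for CMeta::Tokenize: splits szString at every
// character found in szSeparators and appends the pieces to vecstrTokens.
void TokenizeSketch(const char* szString, std::vector<std::string>& vecstrTokens,
                    const char* szSeparators = "\t", bool fNoEmpties = false) {
    assert(szString != nullptr && szSeparators != nullptr);
    const std::string strString(szString);
    std::string::size_type iBegin = 0;
    while (true) {
        const std::string::size_type iEnd =
            strString.find_first_of(szSeparators, iBegin);
        const std::string strToken = (iEnd == std::string::npos)
            ? strString.substr(iBegin)
            : strString.substr(iBegin, iEnd - iBegin);
        if (!fNoEmpties || !strToken.empty())
            vecstrTokens.push_back(strToken);
        if (iEnd == std::string::npos)
            break;
        iBegin = iEnd + 1;  // skip the separator itself
    }
}
```

In this sketch "a\t\tb" yields the three tokens "a", "", and "b" by default, and only "a" and "b" when fNoEmpties is true. Keeping empties is usually what is wanted for tab-delimited tables, where an empty field still occupies a column.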

string Sleipnir::CMeta::Trim ( const char *  szString) [static]

Trim whitespace from the beginning and end of the given string.

Parameters:
szString  String from which whitespace is trimmed.
Returns:
String with whitespace removed.

Definition at line 156 of file meta.cpp.
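Trimming can be sketched with find_first_not_of and find_last_not_of. The sketch assumes that "whitespace" means the four characters in c_szWS (space, tab, carriage return, newline), which the description above does not state; TrimSketch is a hypothetical name and the code is not taken from Sleipnir.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CMeta::Trim: removes leading and trailing
// whitespace, leaving interior whitespace untouched.
std::string TrimSketch(const char* szString) {
    assert(szString != nullptr);
    static const char c_szWSSketch[] = " \t\r\n";  // assumed whitespace set
    const std::string strString(szString);
    const std::string::size_type iBegin = strString.find_first_not_of(c_szWSSketch);
    if (iBegin == std::string::npos)
        return std::string();  // empty or all whitespace
    const std::string::size_type iEnd = strString.find_last_not_of(c_szWSSketch);
    return strString.substr(iBegin, iEnd - iBegin + 1);
}
```

A string consisting only of whitespace becomes empty, and a line read from a file with Windows line endings loses its trailing carriage return.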

void Sleipnir::CMeta::Unmap ( const unsigned char *  pbData,
HANDLE  hndlMap,
size_t  iSize 
) [static]

Unmap a memory map in a largely platform-independent manner.

Parameters:
pbData  Pointer to mapped file data.
hndlMap  Handle to file map; ignored on non-Windows platforms.
iSize  Size of memory map.
Remarks:
Should be used to close any maps opened with MapRead or MapWrite.

Definition at line 318 of file meta.cpp.

Referenced by MapRead(), MapWrite(), and Sleipnir::CDatasetCompactMap::Open().


The documentation for this class was generated from the following files: