Sleipnir
Public Types | Public Member Functions
Sleipnir::CTrie< tType > Class Template Reference

A simple prefix tree implementation. More...

#include <trie.h>

Inheritance diagram for Sleipnir::CTrie< tType >:
Sleipnir::CTrieImpl< tType >

Public Types

typedef CTrieIterator< tType > iterator
 Iterator type for trie traversal.

Public Member Functions

 CTrie (const std::vector< unsigned char > &vecbSizes, const tType &Default)
 Construct a new trie with the specified branching factors and default value.
 CTrie (const IDataset *pData, const tType &Default, bool fAnswers=true)
 Construct a new trie encoding the data from the given dataset.
tType & Set (const std::vector< unsigned char > &vecbData)
 Return a writable reference to the given key's value.
const tType Get (const std::vector< unsigned char > &vecbData) const
 Retrieve the value at the given key, or a default if none exists.
size_t GetSize () const
 Return maximum depth of the trie.
unsigned char GetSize (size_t iDepth) const
 Return branching factor of the requested trie level.

Detailed Description

template<class tType>
class Sleipnir::CTrie< tType >

A simple prefix tree implementation.

Parameters:
tTypeType of element contained by the trie.

A trie, or prefix tree, is an n-ary tree which acts as a key to value map. Keys consist of a sequence of elements (usually characters making up a string) stored efficiently in overlapping tree nodes. This trie implementation can store arbitrary objects as values and use arbitrary byte strings as keys. At the simplest, think of a trie like a hash or map dictionary that might save in memory usage and lookup time for highly structured keys.

Remarks:
Lookup time is linear in the length of the key, and memory usage is linear in the values and expected logarithmic in the keys (worst case linear). However, due to the intended usage of this trie implementation, it is optimized for densely overlapping keys; the memory usage constant will be large for sparse tries.
See also:
CTrieIterator

Definition at line 54 of file trie.h.


Constructor & Destructor Documentation

template<class tType>
Sleipnir::CTrie< tType >::CTrie ( const std::vector< unsigned char > &  vecbSizes,
const tType &  Default 
) [inline]

Construct a new trie with the specified branching factors and default value.

Parameters:
vecbSizesBranching factor (maximum number of different values) at each depth within the trie (i.e. at each position within the keys).
DefaultDefault value to be returned when trie lookup fails for a given key.
Remarks:
The given sizes indicate the maximum value at each key position; for example, if a trie is initialized with sizes [2, 1, 3], then keys such as [0, 0, 0], [1, 0, 2], or [0, 0, 1] are all valid, but [2, 1, 3] or [2, 0, 0] are not. This is equivalent to the maximum branching factor at each level of the trie, and it also dictates the maximum depth of the trie (i.e. length of a key).

Definition at line 79 of file trie.h.

template<class tType>
Sleipnir::CTrie< tType >::CTrie ( const IDataset pData,
const tType &  Default,
bool  fAnswers = true 
) [inline]

Construct a new trie encoding the data from the given dataset.

Parameters:
pDataDataset to be encoded as a trie.
DefaultDefault value to be returned when trie lookup fails for a given key.
fAnswersIf true, the first (0th) data file in the dataset represents a gold standard.

Constructs a trie encoding the given dataset. Each key in the trie consists of gene pair values from each data file within the dataset, and the corresponding trie value is a count of how many times that combination of values occurs in the dataset. If fAnswers is true, only gene pairs with a value present in the first (0th) data file are included in the trie. This can provide a fairly compact way to store a highly redundant dataset, but it becomes inefficient rapidly if the dataset is sparse.

For example, suppose a dataset contains an answer file and two data files and spans three genes A, B, and C. The answer file is binary (contains only 0, 1, and missing values), the first data file is binary, and the second data file can take three values. The values present for each gene pair are:

 A  B   -1  0   -1
 A  C   0   0   0
 B  C   1   -1  2

Definition at line 113 of file trie.h.

References Sleipnir::IDataset::GetBins(), Sleipnir::IDataset::GetDiscrete(), Sleipnir::IDataset::GetExperiments(), Sleipnir::IDataset::GetGenes(), Sleipnir::CTrie< tType >::GetSize(), Sleipnir::IDataset::IsExample(), Sleipnir::IDataset::IsHidden(), and Sleipnir::CTrie< tType >::Set().


Member Function Documentation

template<class tType>
const tType Sleipnir::CTrie< tType >::Get ( const std::vector< unsigned char > &  vecbData) const [inline]

Retrieve the value at the given key, or a default if none exists.

Parameters:
vecbDataKey whose value should be retrieved.
Returns:
The value corresponding to the given key, or the trie's default value if none exists.
Remarks:
For efficiency, no bounds checking is performed. The given key's values should all be smaller than the trie's branching factors at the appropriate levels.
See also:
Set

Definition at line 190 of file trie.h.

template<class tType>
size_t Sleipnir::CTrie< tType >::GetSize ( ) const [inline]

Return maximum depth of the trie.

Returns:
Maximum depth of the trie.

Definition at line 208 of file trie.h.

Referenced by Sleipnir::CTrie< tType >::CTrie().

template<class tType>
unsigned char Sleipnir::CTrie< tType >::GetSize ( size_t  iDepth) const [inline]

Return branching factor of the requested trie level.

Parameters:
iDepthTrie level for which branching factor should be returned.
Returns:
Maximum branching factor (key value) for the requested depth.
Remarks:
For efficiency, no bounds checking is performed. The given depth must be smaller than GetSize.

Definition at line 225 of file trie.h.

template<class tType>
tType& Sleipnir::CTrie< tType >::Set ( const std::vector< unsigned char > &  vecbData) [inline]

Return a writable reference to the given key's value.

Parameters:
vecbDataKey to which value should be written.
Returns:
Writable reference to the given key's trie value.
Remarks:
For efficiency, no bounds checking is performed. The given key's values should all be smaller than the trie's branching factors at the appropriate levels.
See also:
Get

Definition at line 156 of file trie.h.

Referenced by Sleipnir::CTrie< tType >::CTrie().


The documentation for this class was generated from the following file: