Sleipnir
|
Represents a probabilistic suffix tree (PST) containing zero or more overlapping strings in a weighted manner. More...
#include <pst.h>
Public Member Functions | |
CPST (size_t iArity) | |
Initializes a new PST over an alphabet of the given size. | |
void | RemoveRCs (float dPenaltyGap, float dPenaltyMismatch, CPST &PSTOut) const |
Constructs a new PST containing only a single strand of the current PST by finding the best alignment of contained sequences and their reverse complements. | |
void | Add (const std::string &strSequence, const CPST &PST, int iOffset) |
Add the given string and strings from the given PST to the current PST, with an optional character offset between the two. | |
void | Add (const std::string &strOne, const std::string &strTwo, int iOffset) |
Add the given strings to the current PST, with an optional character offset between them. | |
void | Add (const std::string &strSequence, size_t iOffset=0) |
Add the given string to the current PST, with an optional character offset relative to the PST's root. | |
void | Add (const CPST &PST, size_t iOffset=0) |
Add the strings from the given PST to the current PST, with an optional character offset relative to the current PST's root. | |
float | GetMatch (const std::string &strTarget, size_t iOffset=0) const |
Returns the match score of the PST against the given string, with an optional offset. | |
size_t | GetDepth () const |
Returns the maximum length over all strings in the PST. | |
std::string | GetMotif () const |
Returns a string representation of the PST. | |
float | Align (const std::string &strSequence, float dPenaltyGap, float dPenaltyMismatch, float dCutoff, int &iOffset) const |
Returns the best alignment score of the given string against all strings in the PST, using the requested penalties, maximum score, and offset. | |
float | Align (const CPST &PST, float dPenaltyGap, float dPenaltyMismatch, float dCutoff, int &iOffset) const |
Returns the best alignment score of strings in the given PST against strings in the current PST, using the requested penalties, maximum score, and offset. | |
bool | Open (const std::string &strPST) |
Adds the given PST encoded in string form to the current PST. | |
bool | GetPWM (CFullMatrix< uint16_t > &MatPWM, const char *szSymbols) const |
Retrieves a PWM approximately equivalent to the current PST. | |
size_t | Integrate () const |
Counts the number of possible discrete strings encoded by the PST. | |
bool | Simplify () |
Simplifies the current PST by removing all subtrees occurring at relatively low frequency. |
Represents a probabilistic suffix tree (PST) containing zero or more overlapping strings in a weighted manner.
A probabilistic suffix tree (PST) represents a set of zero or more weighted strings in an efficient manner. For example, a PST might contain the strings:
2:AACCGG 1:AACCT 1:AACT
indicating that the set contains AACCGG
with probability 0.5, AACCT
with probability 0.25, and AACT
with probability 0.25. A PST encodes such strings as an overlapping tree of characters, which is thus compact in memory and can be compared efficiently against an input string to count the matches of any string in the tree.
Sleipnir::CPST::CPST | ( | size_t | iArity | ) | [inline] |
void Sleipnir::CPST::Add | ( | const std::string & | strSequence, |
const CPST & | PST, | ||
int | iOffset | ||
) | [inline] |
Add the given string and strings from the given PST to the current PST, with an optional character offset between the two.
strSequence | Fixed string to be added to the current PST. |
PST | PST containing strings to be added to the current PST. |
iOffset | Optional character offset between the given string and PST; if positive, the PST is added first, and if negative, the fixed string. |
Definition at line 83 of file pst.h.
Referenced by Add(), and RemoveRCs().
void Sleipnir::CPST::Add | ( | const std::string & | strOne, |
const std::string & | strTwo, | ||
int | iOffset | ||
) | [inline] |
Add the given strings to the current PST, with an optional character offset between them.
strOne | First string to be added to the current PST. |
strTwo | Second string to be added to the current PST. |
iOffset | Optional character offset between the given strings; if positive, strOne is added first, and if negative, strTwo. |
Definition at line 106 of file pst.h.
References Add().
void Sleipnir::CPST::Add | ( | const std::string & | strSequence, |
size_t | iOffset = 0 |
||
) | [inline] |
Add the given string to the current PST, with an optional character offset relative to the PST's root.
strSequence | Fixed string to be added to the current PST. |
iOffset | Optional character offset between the given string; if nonzero, iOffset characters along each branch of the current PST are skipped before strSequence is inserted. |
Definition at line 127 of file pst.h.
References GetDepth().
void Sleipnir::CPST::Add | ( | const CPST & | PST, |
size_t | iOffset = 0 |
||
) | [inline] |
Add the strings from the given PST to the current PST, with an optional character offset relative to the current PST's root.
PST | PST containng strings to be added to the current PST. |
iOffset | Optional character offset applied to the given PST; if nonzero, iOffset characters along each branch of the current PST are skipped before strings from the given PST are added. |
Definition at line 145 of file pst.h.
References GetDepth().
float Sleipnir::CPST::Align | ( | const std::string & | strSequence, |
float | dPenaltyGap, | ||
float | dPenaltyMismatch, | ||
float | dCutoff, | ||
int & | iOffset | ||
) | const [inline] |
Returns the best alignment score of the given string against all strings in the PST, using the requested penalties, maximum score, and offset.
strSequence | String to be aligned with the current PST. |
dPenaltyGap | Alignment score penalty for gaps. |
dPenaltyMismatch | Alignment score penalty for mismatches. |
dCutoff | Maximum score before an alignment is considered inviable; used for optimization. |
iOffset | If nonzero, the number of characters to skip within the input string before scoring the alignment. |
Definition at line 237 of file pst.h.
References GetDepth().
Referenced by Align(), and RemoveRCs().
float Sleipnir::CPST::Align | ( | const CPST & | PST, |
float | dPenaltyGap, | ||
float | dPenaltyMismatch, | ||
float | dCutoff, | ||
int & | iOffset | ||
) | const [inline] |
Returns the best alignment score of strings in the given PST against strings in the current PST, using the requested penalties, maximum score, and offset.
PST | PST to be aligned with the current PST. |
dPenaltyGap | Alignment score penalty for gaps. |
dPenaltyMismatch | Alignment score penalty for mismatches. |
dCutoff | Maximum score before an alignment is considered inviable; used for optimization. |
iOffset | If nonzero, the number of characters to skip within the input PST before scoring the alignment. |
Definition at line 272 of file pst.h.
References Align(), and GetDepth().
size_t Sleipnir::CPST::GetDepth | ( | ) | const [inline] |
Returns the maximum length over all strings in the PST.
Definition at line 183 of file pst.h.
Referenced by Add(), Align(), Sleipnir::CCoalesceMotifLibrary::GetMatch(), GetMatch(), and GetPWM().
float Sleipnir::CPST::GetMatch | ( | const std::string & | strTarget, |
size_t | iOffset = 0 |
||
) | const [inline] |
Returns the match score of the PST against the given string, with an optional offset.
strTarget | String against which PST is matched. |
iOffset | If nonzero, the number of characters to skip within the target string before scoring the match. |
Definition at line 168 of file pst.h.
References GetDepth().
Referenced by Sleipnir::CCoalesceMotifLibrary::GetMatch().
std::string Sleipnir::CPST::GetMotif | ( | ) | const [inline] |
Returns a string representation of the PST.
A PST is represented using pipe characters (|) to indicate choices, with parentheses indicating grouping and weights indicated with preceding counts delimited by colons (:). For example, a PST containing the strings:
2:AACCGG 1:AACCT 1:AACT
would be represented by the motif:
AAC(3:C(2:GG)|(1:T))|(1:T)
bool Sleipnir::CPST::GetPWM | ( | CFullMatrix< uint16_t > & | MatPWM, |
const char * | szSymbols | ||
) | const [inline] |
Retrieves a PWM approximately equivalent to the current PST.
MatPWM | Output PWM approximating the current PST. |
szSymbols | String of symbols mapped to the indices of the current PST. |
Definition at line 313 of file pst.h.
References Sleipnir::CFullMatrix< tType >::Clear(), Sleipnir::CFullMatrix< tType >::Get(), GetDepth(), Sleipnir::CFullMatrix< tType >::GetRows(), Sleipnir::CFullMatrix< tType >::Initialize(), and Sleipnir::CMeta::Permute().
size_t Sleipnir::CPST::Integrate | ( | ) | const [inline] |
bool Sleipnir::CPST::Open | ( | const std::string & | strPST | ) | [inline] |
Adds the given PST encoded in string form to the current PST.
strPST | PST encoded in string form, or a string from the current PST's alphabet. |
Definition at line 291 of file pst.h.
Referenced by Sleipnir::CCoalesceMotifLibrary::Open().
void Sleipnir::CPST::RemoveRCs | ( | float | dPenaltyGap, |
float | dPenaltyMismatch, | ||
CPST & | PSTOut | ||
) | const |
Constructs a new PST containing only a single strand of the current PST by finding the best alignment of contained sequences and their reverse complements.
dPenaltyGap | Alignment score penalty for gaps. |
dPenaltyMismatch | Alignment score penalty for mismatches. |
PSTOut | Output PST constructed from the single stranded best alignment of strings in the current PST. |
Constructs a new PST representing a single strand of the current PST. Which strand is chosen arbitrarily; resolution occurs by tracing all possible strings stored in the current PST (i.e. paths from root to a leaf), aligning each string and its reverse complement with previous strings, and retaining only the best alignments. This is a heuristic, but it correctly preserves most PSTs that are already single-stranded and resolves most reverse complement PSTs. However, it should only be used for tasks like generating approximate PWMs from PSTs, since it can seriously diverge from the original PST in the worst cases.
Definition at line 55 of file pst.cpp.
References Add(), Align(), and Sleipnir::CCoalesceMotifLibrary::GetReverseComplement().
bool Sleipnir::CPST::Simplify | ( | ) | [inline] |