Sleipnir
|
Stores a continuously valued half matrix paired with a list of names for matrix elements. More...
#include <dat.h>
Public Types | |
enum | EFilter { EFilterInclude = 0, EFilterTerm = EFilterInclude + 1, EFilterExclude = EFilterTerm + 1, EFilterPixie = EFilterExclude + 1, EFilterEdge = EFilterPixie + 1, EFilterHefalmp = EFilterEdge + 1, EFilterIncludePos = EFilterHefalmp +1, EFilterExEdge = EFilterIncludePos +1 } |
Ways in which nodes/edges can be removed to filter a CDat. More... | |
enum | EFormat { EFormatBinary = 0, EFormatText = EFormatBinary + 1, EFormatPCL = EFormatText + 1, EFormatSparse = EFormatPCL + 1, EFormatQdab = EFormatSparse + 1 } |
Ways in which a CDat can be persisted to/from disk. More... | |
enum | ENormalize { ENormalizeNone = 0, ENormalizeMinMax = ENormalizeNone + 1, ENormalizeMinMaxNPone = ENormalizeMinMax + 1, ENormalizeZScore = ENormalizeMinMaxNPone + 1, ENormalizeSigmoid = ENormalizeZScore + 1, ENormalizeNormCDF = ENormalizeSigmoid + 1, ENormalizePCC = ENormalizeNormCDF + 1 } |
Ways in which a CDat can have its edge values normalized. More... | |
Public Member Functions | |
bool | Open (const char *szFile, bool fMemmap=false, size_t iSkip=2, bool fZScore=false, bool fDuplicates=false, bool fSeek=false) |
Open a CDat stored in the given file, guessing the format from the file's extension. | |
bool | Open (std::istream &istm, EFormat eFormat=EFormatBinary, float dDefault=HUGE_VAL, bool fDuplicates=false, size_t iSkip=2, bool fZScore=false, bool fSeek=false) |
Open a CDat stored in the given stream with the given format, processing missing values and duplicates as specified. | |
bool | Open (const CSlim &Slim) |
Construct a CDat from the given ontology slim. | |
bool | Open (const CSlim &SlimPositives, const CSlim &SlimNonnegatives) |
Construct a CDat from the given ontology slims. | |
bool | Open (const std::vector< std::string > &vecstrGenes, bool fClear=true, const char *szFile=NULL) |
Construct a new CDat with the given gene names. | |
bool | Open (const std::vector< std::string > &vecstrGenes, const CDistanceMatrix &MatValues) |
Construct a new CDat with the given gene names and values. | |
bool | Open (const std::vector< CGenes * > &vecpPositives, const std::vector< CGenes * > &vecpNonnegatives, float dPValue, const CGenome &Genome, bool fIncident=false) |
Construct a CDat from the given gene sets. | |
bool | Open (const CDat &DatKnown, const std::vector< CGenes * > &vecpOther, const CGenome &Genome, bool fKnownNegatives, bool fIncident=false) |
Construct a CDat from the given known gene relationships and gene sets. | |
bool | Open (const CPCL &PCL, const IMeasure *pMeasure, bool fMeasureMemory) |
Construct a new CDat backed by the given PCL and similarity measure. | |
bool | Open (const CDat &Dat) |
Construct a copy of the given CDat. | |
bool | OpenGenes (std::istream &istm, bool fBinary, bool fPCL=false) |
Open the genes from a CDat stored in the stream with the given format. | |
bool | OpenGenes (const char *szFile, size_t iSkip=2) |
Open the genes from a CDat stored in the given file, guessing the format from the file's extension. | |
void | Save (std::ostream &ostm, EFormat eFormat=EFormatBinary) const |
Save a CDat to the given stream in the requested format. | |
void | Save (const char *szFile) const |
Save a CDat to the given file, guessing the format from the file's extension. | |
void | SaveDOT (std::ostream &ostm, float dCutoff=HUGE_VAL, const CGenome *pGenome=NULL, bool fUnlabeled=false, bool fHashes=true, const std::vector< float > *pvecdColors=NULL, const std::vector< float > *pvecdBorders=NULL) const |
Generate a DOT-formatted graph from the CDat, suitable for processing with AT&T's Graphviz software. | |
void | SaveGDF (std::ostream &ostm, float dCutoff=HUGE_VAL) const |
Generate a GDF-formatted graph from the CDat, suitable for processing with the GUESS graph exploration system. | |
void | SaveNET (std::ostream &ostm, float dCutoff=HUGE_VAL) const |
Generate a NET-formatted graph from the CDat. | |
void | SaveMATISSE (std::ostream &ostm, float dCutoff=HUGE_VAL, const CGenome *pGenome=NULL) const |
Generate a MATISSE-formatted graph from the CDat, suitable for processing with the MATISSE software. | |
void | Invert () |
Replace each finite value in the CDat with one minus that value. | |
void | Rank () |
Replace each finite value in the CDat with its integer rank. | |
bool | FilterGenes (const char *szGenes, EFilter eFilter, size_t iLimit=-1) |
Remove edges from the CDat based on the given gene file and filter type. | |
void | FilterGenes (const CGenes &Genes, EFilter eFilter, size_t iLimit=-1, float dEdgeAggressiveness=0.5, bool fAbsolute=false, const std::vector< float > *pvecdWeights=NULL) |
Remove edges from the CDat based on the given gene set and filter type. | |
void | NormalizeQuantiles (size_t iQuantiles) |
float * | GetRowSeek (const string &strGene) |
float * | GetRowSeek (const size_t &i) |
size_t | GetGeneIndex (const string &strGene) const |
void | AveStd (double &a, double &b, size_t &c) |
void | Clear (float dValue) |
bool | AddGene (const std::string &strGene) |
bool | AddGenes (const std::vector< std::string > &vecstrGenes) |
void | Normalize (ENormalize eNormalize) |
Normalize each finite value in the CDat by a specific function. | |
size_t | GetGene (const std::string &strGene) const |
Return the index of the given gene name, or -1 if it is not included in the CDat. | |
float * | GetFullRow (const size_t &iY) |
float & | Get (size_t iY, size_t iX) const |
Return the value at the requested CDat position. | |
size_t | GetGenes () const |
Returns the number of elements (genes) in the CDat. | |
const CDistanceMatrix & | Get () const |
Returns the symmetric matrix containing the CDat's values. | |
CDistanceMatrix & | Get () |
Returns the symmetric matrix containing the CDat's values. | |
bool | Set (size_t iY, size_t iX, float dValue) |
Set the value at the requested CDat position. | |
std::string | GetGene (size_t iGene) const |
Returns the gene name at the given CDat position. | |
const std::vector< std::string > & | GetGeneNames () const |
Returns the vector of gene names associated with this CDat. | |
void | Set (size_t iY, const float *adValues) |
Set an entire row of CDat values efficiently. | |
const float * | Get (size_t iY) const |
Get an entire row of CDat values efficiently. | |
float * | Get (size_t iY) |
Get an entire row of CDat values efficiently. | |
void | SetGene (size_t iGene, const std::string &strGene) |
Set the gene name at the given index. | |
void | Randomize () |
Randomizes the CDat's values by iterated swapping. |
Stores a continuously valued half matrix paired with a list of names for matrix elements.
Conceptually, a CDat stores a list of weighted pairs; this is equivalent to a weighted undirected graph with labeled nodes, or a symmetric matrix with labels for each matrix element. CDat entries are stored as continuous values, although they can be discretized in various ways. CDats can be constructed in several ways, read from disk, persisted to disk in multiple file formats, or calculated from existing gene sets, microarray data, or gold standards. In practice, a CDat is simply a continuously valued symmetric matrix (in which zero or more values may be missing) paired with a list of element names (assumed to be genes), but this data structure is sufficiently flexible to represent nearly any biological dataset.
CDats can be loaded (by Open) and/or stored (by Save) from/to disk in the following formats:
GENE1 GENE2 SCORE1 GENE1 GENE3 SCORE2 GENE2 GENE3 SCORE3
Ways in which nodes/edges can be removed to filter a CDat.
Ways in which a CDat can be persisted to/from disk.
Ways in which a CDat can have its edge values normalized.
bool Sleipnir::CDat::FilterGenes | ( | const char * | szGenes, |
EFilter | eFilter, | ||
size_t | iLimit = -1 |
||
) |
Remove edges from the CDat based on the given gene file and filter type.
szGenes | File from which gene names are loaded, one per line. |
eFilter | Way in which to use the given genes to remove edges. |
iLimit | For EFilterPixie and EFilterHefalmp, the maximum number of genes to retain. |
Remove edges and nodes (by removing all incident edges) from the CDat based on one of several algorithms. For details, see EFilter.
Definition at line 1450 of file dat.cpp.
References Sleipnir::CGenes::Open().
void Sleipnir::CDat::FilterGenes | ( | const CGenes & | Genes, |
EFilter | eFilter, | ||
size_t | iLimit = -1 , |
||
float | dEdgeAggressiveness = 0.5 , |
||
bool | fAbsolute = false , |
||
const std::vector< float > * | pvecdWeights = NULL |
||
) |
Remove edges from the CDat based on the given gene set and filter type.
Genes | Gene set used to filter the CDat. |
eFilter | Way in which to use the given genes to remove edges. |
iLimit | For EFilterPixie and EFilterHefalmp, the maximum number of genes to retain. |
dEdgeAggressiveness | For EFilterPixie and EFilterHefalmp, higher values result in more aggressive edge trimming. NaN completely skips edge trimming. |
Remove edges and nodes (by removing all incident edges) from the CDat based on one of several algorithms. For details, see EFilter.
Definition at line 1487 of file dat.cpp.
References EFilterEdge, EFilterExclude, EFilterExEdge, EFilterHefalmp, EFilterInclude, EFilterIncludePos, EFilterPixie, EFilterTerm, Get(), GetGene(), Sleipnir::CGenes::GetGene(), GetGenes(), Sleipnir::CGenes::GetGenes(), Sleipnir::CGene::GetName(), Sleipnir::CMeta::GetNaN(), and Set().
float& Sleipnir::CDat::Get | ( | size_t | iY, |
size_t | iX | ||
) | const [inline] |
Return the value at the requested CDat position.
Reimplemented from Sleipnir::CDatImpl.
Definition at line 347 of file dat.h.
References Get().
Referenced by Sleipnir::CPCL::Distance(), Sleipnir::CDatFilter::Get(), Sleipnir::CPCL::Impute(), Sleipnir::CDataFilter::IsExample(), Open(), Sleipnir::CPCL::Open(), and Sleipnir::CStatistics::WilcoxonRankSum().
const CDistanceMatrix& Sleipnir::CDat::Get | ( | ) | const [inline] |
Returns the symmetric matrix containing the CDat's values.
Definition at line 373 of file dat.h.
Referenced by FilterGenes(), Get(), Invert(), Open(), Sleipnir::CDataPair::Quantize(), Randomize(), Rank(), Sleipnir::CDataPair::Save(), SaveDOT(), SaveGDF(), SaveMATISSE(), and SaveNET().
CDistanceMatrix& Sleipnir::CDat::Get | ( | ) | [inline] |
const float* Sleipnir::CDat::Get | ( | size_t | iY | ) | const [inline] |
Get an entire row of CDat values efficiently.
iY | CDat row. |
Definition at line 484 of file dat.h.
References Sleipnir::CHalfMatrix< tType >::Get().
float* Sleipnir::CDat::Get | ( | size_t | iY | ) | [inline] |
Get an entire row of CDat values efficiently.
iY | CDat row. |
Definition at line 505 of file dat.h.
References Sleipnir::CHalfMatrix< tType >::Get().
size_t Sleipnir::CDat::GetGene | ( | const std::string & | strGene | ) | const [inline] |
Return the index of the given gene name, or -1 if it is not included in the CDat.
strGene | Gene name to retrieve. |
Reimplemented from Sleipnir::CDatImpl.
Definition at line 318 of file dat.h.
Referenced by Sleipnir::CDataFilter::Attach(), FilterGenes(), Sleipnir::CDatFilter::GetGene(), GetGene(), Open(), Sleipnir::CDataset::Open(), Sleipnir::CDatasetCompact::Open(), SaveDOT(), SaveGDF(), SaveMATISSE(), SaveNET(), and Sleipnir::CStatistics::WilcoxonRankSum().
std::string Sleipnir::CDat::GetGene | ( | size_t | iGene | ) | const [inline] |
Returns the gene name at the given CDat position.
iGene | Index of gene name to return. |
Reimplemented from Sleipnir::CDatImpl.
Definition at line 428 of file dat.h.
References GetGene().
const std::vector<std::string>& Sleipnir::CDat::GetGeneNames | ( | ) | const [inline] |
Returns the vector of gene names associated with this CDat.
Reimplemented from Sleipnir::CDatImpl.
Definition at line 442 of file dat.h.
Referenced by Open(), and Sleipnir::CPCL::Open().
size_t Sleipnir::CDat::GetGenes | ( | ) | const [inline] |
Returns the number of elements (genes) in the CDat.
Reimplemented from Sleipnir::CDatImpl.
Definition at line 362 of file dat.h.
Referenced by Sleipnir::CPCL::Distance(), FilterGenes(), Sleipnir::CPCL::Impute(), Invert(), Open(), Sleipnir::CPCL::Open(), Sleipnir::CDataset::Open(), Sleipnir::CDatasetCompact::Open(), Randomize(), Rank(), Sleipnir::CDataPair::Save(), SaveDOT(), SaveGDF(), SaveMATISSE(), SaveNET(), and Sleipnir::CStatistics::WilcoxonRankSum().
void Sleipnir::CDat::Invert | ( | ) |
Replace each finite value in the CDat with one minus that value.
Definition at line 1418 of file dat.cpp.
References Get(), GetGenes(), Sleipnir::CMeta::IsNaN(), and Set().
void Sleipnir::CDat::Normalize | ( | ENormalize | eNormalize | ) | [inline] |
Normalize each finite value in the CDat by a specific function.
eNormalize | Method by which scores are normalized. |
Definition at line 279 of file dat.h.
References ENormalizeMinMax, ENormalizeMinMaxNPone, and ENormalizeZScore.
Referenced by Sleipnir::CPCL::Distance(), and Sleipnir::CDatasetCompact::Open().
bool Sleipnir::CDat::Open | ( | const char * | szFile, |
bool | fMemmap = false , |
||
size_t | iSkip = 2 , |
||
bool | fZScore = false , |
||
bool | fDuplicates = false , |
||
bool | fSeek = false |
||
) |
Open a CDat stored in the given file, guessing the format from the file's extension.
szFile | Filename from which CDat is loaded. |
fMemmap | If true, memory map file rather than allocating memory and copying its contents. |
iSkip | If the given file is a PCL, the number of columns to skip between the ID and experiments. |
fZScore | If true and the given file is a PCL, z-score similarity measures after pairwise calculation. |
fDuplicates | If true, allow duplicates (DAT format only), ignoring all but the last value for each gene pair. |
Opens a CDat from the given file, guessing the file type from its extension: DAT, DAB, DAS, or PCL.
Definition at line 1220 of file dat.cpp.
References EFormatBinary, EFormatPCL, EFormatText, and Sleipnir::CMeta::MapRead().
Referenced by Sleipnir::CPCL::Distance(), Sleipnir::CPCL::Impute(), Sleipnir::CDataPair::Open(), Open(), and Sleipnir::CPCL::Open().
bool Sleipnir::CDat::Open | ( | std::istream & | istm, |
EFormat | eFormat = EFormatBinary , |
||
float | dDefault = HUGE_VAL , |
||
bool | fDuplicates = false , |
||
size_t | iSkip = 2 , |
||
bool | fZScore = false , |
||
bool | fSeek = false |
||
) |
Open a CDat stored in the given stream with the given format, processing missing values and duplicates as specified.
istm | Stream from which CDat is loaded. |
eFormat | Format in which the stream should be parsed. |
dDefault | Default value inserted for missing pairs (DAT format only). |
fDuplicates | If true, allow duplicates (DAT format only), ignoring all but the last value for each gene pair. |
iSkip | If the given stream contains a PCL, the number of columns to skip between the ID and experiments. |
fZScore | If true and the given stream contains a PCL, z-score similarity measures after pairwise calculation. |
fSeek | If true, read by seeking in the file, particularly useful if reading a few values, since there is no need to read the entire file. (for binary format only) |
Opens a CDat from the given stream with the given format.
Definition at line 492 of file dat.cpp.
References EFormatPCL, EFormatQdab, EFormatSparse, and EFormatText.
bool Sleipnir::CDat::Open | ( | const CSlim & | Slim | ) |
Construct a CDat from the given ontology slim.
Slim | Set of ontology terms from which to generate a CDat. |
Generates a CDat over all genes within the given slim. Gene pairs coannotated to at least one slim term are given a value of 1; all other gene pairs are given a value of 0. This is useful for rapidly generating gold standards from functional catalogs.
Reimplemented in Sleipnir::CDataPair.
Definition at line 129 of file dat.cpp.
References Get(), Sleipnir::CSlim::GetGenes(), Sleipnir::CSlim::GetSlim(), Sleipnir::CSlim::GetSlims(), Sleipnir::CMeta::IsNaN(), Open(), and Set().
bool Sleipnir::CDat::Open | ( | const CSlim & | SlimPositives, |
const CSlim & | SlimNonnegatives | ||
) |
Construct a CDat from the given ontology slims.
SlimPositives | Set of ontology terms from which to generate related gene pairs. |
SlimNonnegatives | Set of ontology terms from which to generate agnostic gene pairs (neither related nor unrelated). |
Generates a CDat over all genes within the given slims. Gene pairs coannotated to at least one positive slim term are given a value of 1, other gene pairs coannotated to at least one nonnegative slim term are given no value (NaN), and all other gene pairs are given a value of 0. This is useful for rapidly generating gold standards from functional catalogs.
Definition at line 176 of file dat.cpp.
References Get(), Sleipnir::CSlim::GetGenes(), Sleipnir::CSlim::GetSlim(), Sleipnir::CSlim::GetSlims(), Sleipnir::CMeta::IsNaN(), Open(), and Set().
bool Sleipnir::CDat::Open | ( | const std::vector< std::string > & | vecstrGenes, |
bool | fClear = true , |
||
const char * | szFile = NULL |
||
) |
Construct a new CDat with the given gene names.
vecstrGenes | Gene names and size to associate with the CDat. |
fClear | If true, set each value of the new CDat to missing (NaN). |
szFile | If non-null, use the given file as memory-mapped backing for the new CDat. |
Generates a CDat over the given genes (and of the same size), optionally initializing it to contain no values or backing it with a memory-mapped file rather than physical memory.
Definition at line 1095 of file dat.cpp.
References GetGene(), GetGenes(), Sleipnir::CMeta::GetNaN(), Sleipnir::CHalfMatrix< float >::GetSpace(), Sleipnir::CHalfMatrix< tType >::Initialize(), Sleipnir::CMeta::MapWrite(), and Set().
bool Sleipnir::CDat::Open | ( | const std::vector< std::string > & | vecstrGenes, |
const CDistanceMatrix & | MatScores | ||
) |
Construct a new CDat with the given gene names and values.
vecstrGenes | Gene names to associate with the CDat. |
MatScores | Values to associate with the CDat. |
Generates a CDat over the given genes (and of the same size) and associate it with the given existing pairwise values.
Reimplemented in Sleipnir::CDataPair.
Definition at line 1157 of file dat.cpp.
References Sleipnir::CHalfMatrix< tType >::GetSize(), and Sleipnir::CHalfMatrix< tType >::Initialize().
bool Sleipnir::CDat::Open | ( | const std::vector< CGenes * > & | vecpPositives, |
const std::vector< CGenes * > & | vecpNonnegatives, | ||
float | dPValue, | ||
const CGenome & | Genome, | ||
bool | fIncident = false |
||
) |
Construct a CDat from the given gene sets.
vecpPositives | Set of gene sets from which to generate related gene pairs. |
vecpNonnegatives | Set of ontology terms from which to generate agnostic gene pairs (neither related nor unrelated), possibly empty. |
dPValue | Hypergeometric p-value of overlap below which nonnegative gene set pairs are considered agnostic rather than negative. |
Genome | Genome containing all genes of interest. |
fIncident | If true, only allow negative pairs incident to at least one agnostic gene set. Otherwise, negative gene pairs can include any non-co-annotated genes. |
Generates a CDat over all genes in the given genome in which:
Definition at line 367 of file dat.cpp.
References Get(), GetGene(), Sleipnir::CGenes::GetGene(), Sleipnir::CGenome::GetGeneNames(), GetGenes(), Sleipnir::CGenes::GetGenes(), Sleipnir::CGene::GetName(), Sleipnir::CMeta::GetNaN(), Sleipnir::CStatistics::HypergeometricCDF(), Sleipnir::CGenes::IsGene(), Sleipnir::CMeta::IsNaN(), Open(), and Set().
bool Sleipnir::CDat::Open | ( | const CDat & | DatKnown, |
const std::vector< CGenes * > & | vecpOther, | ||
const CGenome & | Genome, | ||
bool | fKnownNegatives, | ||
bool | fIncident = false |
||
) |
Construct a CDat from the given known gene relationships and gene sets.
DatKnown | Known pairwise scores, either positive or negative as indicated. |
vecpOther | Gene sets, either positive or nonnegative as indicated (possibly empty). |
Genome | Genome containing all genes of interest. |
fKnownNegatives | If true, DatKnown contains known negative gene pairs (0 scores); if false, it contains known related gene pairs (1 scores). In the former case, positives are generated from pairs coannotated to the given gene sets; in the latter, negatives are generated from pairs not coannotated to the given gene sets. |
fIncident | If true, only allow negative pairs incident to at least one agnostic gene set. Otherwise, negative gene pairs can include any non-co-annotated genes. |
Constructs a CDat by copying either the positive (1) or negative (0) values from DatKnown and calculating negatives or positives from vecpOther. In either case, values are calculated for all genes in the given genome.
Definition at line 253 of file dat.cpp.
References Get(), GetGene(), Sleipnir::CGenome::GetGeneNames(), GetGenes(), Sleipnir::CMeta::GetNaN(), Sleipnir::CMeta::IsNaN(), Open(), and Set().
bool Sleipnir::CDat::Open | ( | const CPCL & | PCL, |
const IMeasure * | pMeasure, | ||
bool | fMeasureMemory | ||
) |
Construct a new CDat backed by the given PCL and similarity measure.
PCL | PCL from which CDat genes and pairwise values are drawn. |
pMeasure | Similarity measure used to calculate pairwise scores between genes. |
fMeasureMemory | If true, the CDat is responsible for freeing the given similarity measure. |
This opens a new CDat without precalculated similarity scores; the CDat retains a reference to the given PCL and similarity measure and calculates pairwise scores as needed. This can greatly reduce the amount of memory required by the CDat, but it can increase runtime when specific pairwise values are requested repeatedly.
bool Sleipnir::CDat::Open | ( | const CDat & | Dat | ) |
Construct a copy of the given CDat.
Dat | Data to be copied. |
Reimplemented in Sleipnir::CDataPair.
Definition at line 321 of file dat.cpp.
References Get(), GetGeneNames(), GetGenes(), and Open().
bool Sleipnir::CDat::OpenGenes | ( | std::istream & | istm, |
bool | fBinary, | ||
bool | fPCL = false |
||
) |
Open the genes from a CDat stored in the stream with the given format.
istm | Stream from which genes are loaded. |
fBinary | If true, assume DAB format. |
fPCL | If true, assume PCL format. |
Like Open, OpenGenes will load a CDat from the stream in the requested format, but it will load only the CDat's size and gene list from the stream. This is useful for rapidly obtaining gene lists and counts from large CDats without incurring the memory/time penalty of loading or memory mapping the whole file.
Reimplemented from Sleipnir::CDatImpl.
Definition at line 899 of file dat.cpp.
Referenced by Sleipnir::CDatasetCompact::Open(), and OpenGenes().
bool Sleipnir::CDat::OpenGenes | ( | const char * | szFile, |
size_t | iSkip = 2 |
||
) |
Open the genes from a CDat stored in the given file, guessing the format from the file's extension.
szFile | Filename from which CDat genes are loaded. |
iSkip | If the given file is a PCL, the number of columns to skip between the ID and experiments. |
Like Open, OpenGenes will guess the appropriate file format from the given filename's extension, but it will load only a CDat's size and gene list from the file. This is useful for rapidly obtaining gene lists and counts from large CDats without incurring the memory/time penalty of loading or memory mapping the whole file.
Definition at line 857 of file dat.cpp.
References EFormatPCL, EFormatText, and OpenGenes().
void Sleipnir::CDat::Rank | ( | ) |
Replace each finite value in the CDat with its integer rank.
Definition at line 1909 of file dat.cpp.
References Get(), GetGenes(), Sleipnir::CMeta::IsNaN(), and Set().
void Sleipnir::CDat::Save | ( | std::ostream & | ostm, |
EFormat | eFormat = EFormatBinary |
||
) | const |
Save a CDat to the given stream in the requested format.
Definition at line 1010 of file dat.cpp.
References EFormatSparse, and EFormatText.
Referenced by Save().
void Sleipnir::CDat::Save | ( | const char * | szFile | ) | const |
Save a CDat to the given file, guessing the format from the file's extension.
szFile | Filename into which CDat is saved. |
Save a CDat to the given file, guessing the format (DAT, DAB, or DAS) from the extension. If null, the CDat will be saved as a DAT to standard output.
Reimplemented in Sleipnir::CDataPair.
Definition at line 974 of file dat.cpp.
References EFormatPCL, EFormatText, and Save().
void Sleipnir::CDat::SaveDOT | ( | std::ostream & | ostm, |
float | dCutoff = HUGE_VAL , |
||
const CGenome * | pGenome = NULL , |
||
bool | fUnlabeled = false , |
||
bool | fHashes = true , |
||
const std::vector< float > * | pvecdColors = NULL , |
||
const std::vector< float > * | pvecdBorders = NULL |
||
) | const |
Generate a DOT-formatted graph from the CDat, suitable for processing with AT&T's Graphviz software.
ostm | Stream into which DOT is saved. |
dCutoff | If finite, edge weights below this cutoff are not included in the DOT. |
pGenome | If non-null, name synonyms from this genome are used to label gene nodes. |
fUnlabeled | If true, do not label gene nodes and use only minimal unique identifiers when generating the DOT. |
fHashes | If true, include # marks in hexadecimal color strings within the DOT. This is moronic, but some DOT parsers (*coughboostcough*) will randomly crash if # characters are included in the DOT. |
pvecdColors | If non-null, contains weights between 0 and 1 interpolating node colors between cyan and yellow. |
pvecdBorders | If non-null, contains border widths in pixels for each node. |
SaveDOT is one of the most useful methods for visualizing the weighted graph implicit in a CDat, particularly in combination with FilterGenes and EFilterPixie/EFilterHefalmp. Calling SaveDOT will generate a DOT graph file containing (optionally) each node and edge in the CDat, with edges colored by weight (scaled from green for the minimum weight to red at the maximum). Nodes can optionally be colored or given varying border widths based on external information, or labeled with different gene names (synonyms) than used internally by the CDat. DOT files can be visualized by a variety of software, notably the Graphviz package from AT&T.
Definition at line 1700 of file dat.cpp.
References Sleipnir::CColor::c_Cyan, Sleipnir::CColor::c_White, Sleipnir::CColor::c_Yellow, Sleipnir::CMeta::Filename(), Get(), GetGene(), Sleipnir::CGenome::GetGene(), GetGenes(), Sleipnir::CGene::GetName(), Sleipnir::CGene::GetSynonym(), Sleipnir::CGene::GetSynonyms(), Sleipnir::CColor::Interpolate(), Sleipnir::CMeta::IsNaN(), and Sleipnir::CColor::ToRGB().
void Sleipnir::CDat::SaveGDF | ( | std::ostream & | ostm, |
float | dCutoff = HUGE_VAL |
||
) | const |
Generate a GDF-formatted graph from the CDat, suitable for processing with the GUESS graph exploration system.
ostm | Stream into which GDF is saved. |
dCutoff | If finite, edge weights below this cutoff are not included in the GDF. |
Calling SaveGDF will generate a GDF graph file containing (optionally) each node and edge in the CDat. Nodes are labeled minimally, and edges are not given any inherent color information (although this can be added later). GDFs are visualizable using the GUESS graph exploration software, although it doesn't deal well with large or highly connected graphs.
Definition at line 1797 of file dat.cpp.
References Sleipnir::CMeta::Filename(), Get(), GetGene(), GetGenes(), and Sleipnir::CMeta::IsNaN().
void Sleipnir::CDat::SaveMATISSE | ( | std::ostream & | ostm, |
float | dCutoff = HUGE_VAL , |
||
const CGenome * | pGenome = NULL |
||
) | const |
Generate a MATISSE-formatted graph from the CDat, suitable for processing with the MATISSE software.
ostm | Stream into which MATISSE file is saved. |
dCutoff | If finite, edge weights below this cutoff are not included in the MATISSE file. |
pGenome | If non-null, name synonyms from this genome are used to label gene nodes. |
Calling SaveMATISSE will generate a MATISSE graph file containing (optionally) each node and edge in the CDat. Nodes are labeled either minimally or using the given genome's names/sysnonyms, and edges are not given any inherent color information (although this can be added later). MATISSE files are visualizable using the MATISSE software, although it doesn't deal well with large or highly connected graphs.
Definition at line 1873 of file dat.cpp.
References Get(), GetGene(), Sleipnir::CGenome::GetGene(), GetGenes(), Sleipnir::CGene::GetGloss(), Sleipnir::CGene::GetSynonym(), Sleipnir::CGene::GetSynonyms(), and Sleipnir::CMeta::IsNaN().
void Sleipnir::CDat::SaveNET | ( | std::ostream & | ostm, |
float | dCutoff = HUGE_VAL |
||
) | const |
Generate a NET-formatted graph from the CDat.
ostm | Stream into which NET is saved. |
dCutoff | If finite, edge weights below this cutoff are not included in the NET. |
Calling SaveNET will generate a NET graph file containing (optionally) each node and edge in the CDat. Nodes are labeled minimally, and edges are not given any inherent color information (although this can be added later). I honestly don't remember at all what software uses the NET format, and "net"'s impossible to Google these days (thanks, Microsoft...)
Definition at line 1836 of file dat.cpp.
References Get(), GetGene(), GetGenes(), and Sleipnir::CMeta::IsNaN().
bool Sleipnir::CDat::Set | ( | size_t | iY, |
size_t | iX, | ||
float | dValue | ||
) | [inline] |
Set the value at the requested CDat position.
Reimplemented from Sleipnir::CDatImpl.
Definition at line 411 of file dat.h.
Referenced by Sleipnir::CPCL::Distance(), FilterGenes(), Sleipnir::CPCL::Impute(), Invert(), Open(), Randomize(), and Rank().
void Sleipnir::CDat::Set | ( | size_t | iY, |
const float * | adValues | ||
) | [inline] |
Set an entire row of CDat values efficiently.
iY | CDat row. |
adValues | Values to store. |
Reimplemented from Sleipnir::CDatImpl.
Definition at line 463 of file dat.h.
References Sleipnir::CHalfMatrix< tType >::Set().
void Sleipnir::CDat::SetGene | ( | size_t | iGene, |
const std::string & | strGene | ||
) | [inline] |
Set the gene name at the given index.
iGene | Index of gene name to modify. |
strGene | Gene name to store at the requested index. |
Definition at line 525 of file dat.h.
References Sleipnir::CPCL::SetGene().