Sleipnir
Encapsulates a CDat paired with a quantization file.
#include <datapair.h>
Public Member Functions | |
bool | Open (const char *szDatafile, bool fContinuous, bool fMemmap=false, size_t iSkip=2, bool fZScore=false, bool fSeek=false) |
Open the given data file as a CDat and load discretization bin edges from an accompanying QUANT file. | |
bool | Open (const CSlim &Slim) |
Construct an unbinned CDat from the given ontology slim. | |
bool | Open (const CDat &dat) |
Construct a copy of the given CDat. | |
bool | OpenQuants (const char *szDatafile) |
Open only the QUANT file associated with the given data file name. | |
void | SetQuants (const float *adBinEdges, size_t iBins) |
Set the data pair's bin edges. | |
void | SetQuants (const std::vector< float > &vecdBinEdges) |
Set the data pair's bin edges. | |
std::vector< float > | GetQuants () |
size_t | Quantize (float dValue) const |
Return the discretized form of the given value using the data pair's current bin edges. | |
void | Quantize () |
size_t | Quantize (size_t iY, size_t iX, size_t iZero) const |
Return the discretized form of the given value using the data pair's current bin edges, or return the provided missing data value. | |
void | Save (const char *szFile) const |
Save a CDat to the given file, guessing the format from the file's extension. | |
unsigned char | GetValues () const |
Returns the number of discrete values taken by this data pair. | |
bool | IsContinuous () const |
Returns true if the data pair has no associated discretization information. | |
bool | Open (const CDat &DatKnown, const std::vector< CGenes * > &vecpOther, const CGenome &Genome, bool fKnownNegatives) |
Construct a data pair from the given known gene relationships and gene sets and with no discretization information. | |
bool | Open (const std::vector< std::string > &vecstrGenes, const CDistanceMatrix &MatScores) |
Construct a new data pair with the given gene names and values and with no discretization information. |
Encapsulates a CDat paired with a quantization file.
A data pair consists of a CDat (often on disk in DAB format) paired with quantization information. This information is generally stored in a QUANT file with the same name and location as the CDat. For example, a DAB file named data.dab and a QUANT file named data.quant might reside in the same directory; these would be loaded together as a CDataPair.
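The naming convention described above can be sketched as a small helper. This is only an illustration under stated assumptions: QuantPathFor is a hypothetical name, not a Sleipnir function, and the library's own logic for locating the QUANT file may differ.

```cpp
#include <string>

// Hypothetical helper (not part of Sleipnir): derive the QUANT filename that
// would accompany a data file by swapping its extension for ".quant".
std::string QuantPathFor(const std::string& strDatafile) {
    const std::string::size_type iSlash = strDatafile.find_last_of("/\\");
    const std::string::size_type iDot = strDatafile.rfind('.');
    // Treat the dot as an extension separator only if it lies in the
    // final path component, so "dir.v2/data" is left intact.
    const bool fHasExt = iDot != std::string::npos &&
        (iSlash == std::string::npos || iDot > iSlash);
    return (fHasExt ? strDatafile.substr(0, iDot) : strDatafile) + ".quant";
}
```

Under these assumptions, QuantPathFor("data.dab") yields "data.quant" in the same directory as the data file.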
A QUANT file consists of a single line of text containing tab-delimited numbers in increasing order. These numbers are the bin edges used to discretize the CDat associated with the QUANT. The number of bins equals the number of values in the QUANT, so the last value is never used as a boundary. Upper bin edges are inclusive and lower bin edges are exclusive. This means that for a QUANT file containing:
-0.1 0.3 0.6
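the associated CDat will be discretized into three values:
0, corresponding to values less than or equal to -0.1.
1, corresponding to values greater than -0.1 but less than or equal to 0.3.
2, corresponding to values greater than 0.3.
The final edge, 0.6, is ignored.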
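The binning rule described here (inclusive upper edges, exclusive lower edges, last edge ignored) can be sketched in a few lines. This is a minimal stand-alone illustration, not Sleipnir's implementation: ParseQuantLine and QuantizeValue are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read one line of tab- (or whitespace-) delimited
// numbers into a vector of bin edges.
std::vector<float> ParseQuantLine(const std::string& strLine) {
    std::istringstream issm(strLine);
    std::vector<float> vecdEdges;
    float d;
    while (issm >> d)
        vecdEdges.push_back(d);
    return vecdEdges;
}

// Hypothetical helper: return the index of the first bin whose upper edge
// is >= dValue. The last edge is never compared, so every value above the
// second-to-last edge lands in the final bin.
std::size_t QuantizeValue(float dValue, const std::vector<float>& vecdEdges) {
    std::size_t i = 0;
    while (i + 1 < vecdEdges.size() && dValue > vecdEdges[i])
        ++i;
    return i;
}
```

With the edges parsed from "-0.1\t0.3\t0.6", a value of 0.0 falls in bin 1, while both 0.5 and 0.9 fall in bin 2, since 0.6 is not used as a boundary.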
Definition at line 55 of file datapair.h.
unsigned char Sleipnir::CDataPair::GetValues | ( | ) | const [inline] |
Returns the number of discrete values taken by this data pair.
Definition at line 95 of file datapair.h.
Referenced by Sleipnir::CDatFilter::GetValues(), and Sleipnir::CDatasetCompact::Open().
bool Sleipnir::CDataPair::IsContinuous | ( | ) | const [inline] |
Returns true if the data pair has no associated discretization information.
Definition at line 109 of file datapair.h.
Referenced by Sleipnir::CDatasetCompact::Open().
bool Sleipnir::CDataPair::Open | ( | const char * | szDatafile, |
bool | fContinuous, | ||
bool | fMemmap = false, | ||
size_t | iSkip = 2, | ||
bool | fZScore = false, | ||
bool | fSeek = false | ||
) |
Open the given data file as a CDat and load discretization bin edges from an accompanying QUANT file.
szDatafile | Filename from which CDat is loaded. |
fContinuous | If true, do not load an associated QUANT file and only open the underlying CDat. |
fMemmap | If true, memory map file rather than allocating memory and copying its contents. |
iSkip | If the given file is a PCL, the number of columns to skip between the ID and experiments. |
fZScore | If true and the given file is a PCL, z-score similarity measures after pairwise calculation. |
Definition at line 114 of file datapair.cpp.
References Sleipnir::CDat::Open(), and OpenQuants().
Referenced by Open(), Sleipnir::CDataset::Open(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CDataSubset::Open(), and OpenQuants().
bool Sleipnir::CDataPair::Open | ( | const CSlim & | Slim | ) |
Construct an unbinned CDat from the given ontology slim.
Slim | Set of ontology terms from which to generate a QUANT-less data pair. |
Reimplemented from Sleipnir::CDat.
Definition at line 84 of file datapair.cpp.
References Open().
bool Sleipnir::CDataPair::Open | ( | const CDat & | Dat | ) |
Construct a copy of the given CDat.
Dat | Data to be copied. |
Reimplemented from Sleipnir::CDat.
Definition at line 231 of file datapair.cpp.
References Sleipnir::CDat::Open().
bool Sleipnir::CDataPair::Open | ( | const CDat & | DatKnown, |
const std::vector< CGenes * > & | vecpOther, | ||
const CGenome & | Genome, | ||
bool | fKnownNegatives | ||
) | [inline] |
Construct a data pair from the given known gene relationships and gene sets and with no discretization information.
DatKnown | Known pairwise scores, either positive or negative as indicated. |
vecpOther | Gene sets, either positive or nonnegative as indicated (possibly empty). |
Genome | Genome containing all genes of interest. |
fKnownNegatives | If true, DatKnown contains known negative gene pairs (0 scores); if false, it contains known related gene pairs (1 scores). In the former case, positives are generated from pairs coannotated to the given gene sets; in the latter, negatives are generated from pairs not coannotated to the given gene sets. |
Definition at line 142 of file datapair.h.
References Open().
bool Sleipnir::CDataPair::Open | ( | const std::vector< std::string > & | vecstrGenes, |
const CDistanceMatrix & | MatScores | ||
) | [inline] |
Construct a new data pair with the given gene names and values and with no discretization information.
vecstrGenes | Gene names and size to associate with the data pair. |
MatScores | Values to associate with the data pair. |
Reimplemented from Sleipnir::CDat.
Definition at line 166 of file datapair.h.
References Open().
bool Sleipnir::CDataPair::OpenQuants | ( | const char * | szDatafile | ) |
Open only the QUANT file associated with the given data file name.
szDatafile | CDat filename for which the accompanying QUANT file should be loaded. |
Definition at line 251 of file datapair.cpp.
References Open().
Referenced by Open().
size_t Sleipnir::CDataPair::Quantize | ( | float | dValue | ) | const |
Return the discretized form of the given value using the data pair's current bin edges.
dValue | Continuous value to be discretized. |
Discretizes a given continuous value using the data pair's bin edges. Standard usage is:
DP.Quantize( DP.Get( i, j ) );
Definition at line 282 of file datapair.cpp.
References Sleipnir::CMeta::Quantize().
Referenced by Quantize(), and Sleipnir::CDatFilter::Quantize().
size_t Sleipnir::CDataPair::Quantize | ( | size_t | iY, |
size_t | iX, | ||
size_t | iZero | ||
) | const |
Return the discretized form of the given value using the data pair's current bin edges, or return the provided missing data value.
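iY | First index of the value to be discretized. |
iX | Second index of the value to be discretized. |
iZero | Bin to return if the value at the given position is missing. |
Discretizes the value stored at the given position using the data pair's bin edges, returning iZero if that value is missing. Standard usage is:
DP.Quantize( i, j, 0 );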
Definition at line 326 of file datapair.cpp.
References Sleipnir::CDat::Get(), Sleipnir::CMeta::IsNaN(), and Quantize().
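The missing-value fallback can be sketched as follows. This is an assumption-laden illustration rather than the library's code: QuantizeOrDefault is a hypothetical free function that takes the value directly instead of looking it up by index, and it assumes missing data is represented as NaN, as the reference to CMeta::IsNaN() suggests.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the index-based overload: bin a value with the
// given edges, or return iZero when the value is missing (assumed to be NaN).
std::size_t QuantizeOrDefault(float dValue, const std::vector<float>& vecdEdges,
                              std::size_t iZero) {
    if (std::isnan(dValue))
        return iZero;
    // Same rule as for ordinary values: inclusive upper edges, last edge ignored.
    std::size_t i = 0;
    while (i + 1 < vecdEdges.size() && dValue > vecdEdges[i])
        ++i;
    return i;
}
```

Passing the bin that should stand in for "no data" as iZero lets callers treat missing pairs as a regular discrete value instead of handling them separately.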
void Sleipnir::CDataPair::Save | ( | const char * | szFile | ) | const |
Save a CDat to the given file, guessing the format from the file's extension.
szFile | Filename into which CDat is saved. |
Saves the CDat to the given file, guessing the format (DAT, DAB, or DAS) from the extension. If szFile is null, the CDat is written as a DAT to standard output.
Reimplemented from Sleipnir::CDat.
Definition at line 383 of file datapair.cpp.
References Sleipnir::CDat::Get(), Sleipnir::CDat::GetGenes(), and Sleipnir::CMeta::IsNaN().
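Extension-based format selection of the kind described for Save can be sketched as below. Everything here is hypothetical: the EFormat enum and GuessFormat function are not Sleipnir names, and the exact extension spellings (".dab", ".das") and the fallback to DAT for unrecognized extensions are assumptions made for illustration.

```cpp
#include <cstring>

// Hypothetical format tags for this sketch only.
enum class EFormat { DAT, DAB, DAS };

// Hypothetical helper: choose a format from the filename's extension.
// A null filename maps to DAT, mirroring the documented standard-output case.
EFormat GuessFormat(const char* szFile) {
    if (!szFile)
        return EFormat::DAT;
    const char* szDot = std::strrchr(szFile, '.');
    if (szDot && !std::strcmp(szDot, ".dab"))
        return EFormat::DAB;
    if (szDot && !std::strcmp(szDot, ".das"))
        return EFormat::DAS;
    // Assumption: anything else is written as text DAT.
    return EFormat::DAT;
}
```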
void Sleipnir::CDataPair::SetQuants | ( | const float * | adBinEdges, |
size_t | iBins | ||
) | [inline] |
Set the data pair's bin edges.
adBinEdges | Array of values corresponding to discretization bin edges (the last of which is ignored). |
iBins | Number of discretization bins. |
Reimplemented from Sleipnir::CDataPairImpl.
Definition at line 62 of file datapair.h.
Referenced by Sleipnir::CDatasetCompact::Open().
void Sleipnir::CDataPair::SetQuants | ( | const std::vector< float > & | vecdBinEdges | ) |
Set the data pair's bin edges.
vecdBinEdges | Vector of values corresponding to discretization bin edges (the last of which is ignored). |
Definition at line 377 of file datapair.cpp.