Sleipnir
Sleipnir::CDataPair Class Reference

Encapsulates a CDat paired with a quantization file.

#include <datapair.h>

Inheritance diagram for Sleipnir::CDataPair:
Sleipnir::CDataPairImpl Sleipnir::CPairImpl Sleipnir::CDat Sleipnir::CDatImpl

Public Member Functions

bool Open (const char *szDatafile, bool fContinuous, bool fMemmap=false, size_t iSkip=2, bool fZScore=false, bool fSeek=false)
 Open the given data file as a CDat and load discretization bin edges from an accompanying QUANT file.
bool Open (const CSlim &Slim)
 Construct an unbinned CDat from the given ontology slim.
bool Open (const CDat &dat)
 Construct a copy of the given CDat.
bool OpenQuants (const char *szDatafile)
 Open only the QUANT file associated with the given data file name.
void SetQuants (const float *adBinEdges, size_t iBins)
 Set the data pair's bin edges.
void SetQuants (const std::vector< float > &vecdBinEdges)
 Set the data pair's bin edges.
std::vector< float > GetQuants ()
size_t Quantize (float dValue) const
 Return the discretized form of the given value using the data pair's current bin edges.
void Quantize ()
size_t Quantize (size_t iY, size_t iX, size_t iZero) const
 Return the discretized form of the given value using the data pair's current bin edges, or return the provided missing data value.
void Save (const char *szFile) const
 Save a CDat to the given file, guessing the format from the file's extension.
unsigned char GetValues () const
 Returns the number of discrete values taken by this data pair.
bool IsContinuous () const
 Returns true if the data pair has no associated discretization information.
bool Open (const CDat &DatKnown, const std::vector< CGenes * > &vecpOther, const CGenome &Genome, bool fKnownNegatives)
 Construct a data pair from the given known gene relationships and gene sets and with no discretization information.
bool Open (const std::vector< std::string > &vecstrGenes, const CDistanceMatrix &MatScores)
 Construct a new data pair with the given gene names and values and with no discretization information.

Detailed Description

Encapsulates a CDat paired with a quantization file.

A data pair consists of a CDat (often on disk in DAB format) paired with quantization information. This information is generally stored in a QUANT file with the same name and location as the CDat. For example, a DAB file named data.dab and a QUANT file named data.quant might reside in the same directory; these would be loaded together as a CDataPair.
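The naming convention above can be illustrated with a small helper. This is only a sketch: QuantPathFor is a hypothetical function, not part of Sleipnir, and it assumes the QUANT name is formed by swapping the data file's extension for ".quant", as in the data.dab and data.quant example.

```cpp
#include <string>

// Hypothetical helper (not a Sleipnir function): build the name of the QUANT
// file expected to sit next to a data file, e.g. data.dab -> data.quant.
std::string QuantPathFor(const std::string& strDatafile) {
    // Only treat a dot as an extension separator if it comes after the last
    // path separator, so a dot in a directory name is left alone.
    const std::string::size_type iSlash = strDatafile.find_last_of("/\\");
    const std::string::size_type iDot = strDatafile.find_last_of('.');
    const bool fHasExt = iDot != std::string::npos &&
        (iSlash == std::string::npos || iDot > iSlash);

    return (fHasExt ? strDatafile.substr(0, iDot) : strDatafile) + ".quant";
}
```

A file without an extension simply gets ".quant" appended in this sketch; the behavior of the real library in that case is not stated here.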

A QUANT file consists of a single line of text containing tab-delimited, increasing numbers. These numbers represent bin edges for discretizing the CDat associated with the QUANT. The number of bins is equal to the number of values in the QUANT file, which means that the last value is ignored: the final bin has no upper limit. Upper bin edges are inclusive; lower bin edges are exclusive. This means that for a QUANT file containing:

 -0.1   0.3 0.6

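the associated CDat will be discretized into three values:

 0, for values less than or equal to -0.1.
 1, for values greater than -0.1 and less than or equal to 0.3.
 2, for values greater than 0.3.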

See also:
CMeta::Quantize
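The binning rule described above can be sketched in a few lines. This is a minimal illustration under the stated rules (inclusive upper edges, last edge ignored), not the Sleipnir implementation; ParseQuantLine and QuantizeSketch are hypothetical names introduced for the example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read one line of tab-delimited numbers into bin edges.
std::vector<float> ParseQuantLine(const std::string& strLine) {
    std::vector<float> vecdEdges;
    std::istringstream issm(strLine);
    float d;
    while (issm >> d)  // operator>> skips tabs and other whitespace
        vecdEdges.push_back(d);
    return vecdEdges;
}

// Hypothetical helper: return the index of the first bin whose upper edge is
// greater than or equal to dValue. The last edge is never compared, so every
// value above the second-to-last edge lands in the final bin.
std::size_t QuantizeSketch(float dValue, const std::vector<float>& vecdEdges) {
    std::size_t i = 0;
    while (i + 1 < vecdEdges.size() && dValue > vecdEdges[i])
        ++i;
    return i;
}
```

With the edges -0.1, 0.3, and 0.6, a value of 0.0 falls in bin 1 and a value of 100 falls in bin 2, even though it exceeds the last edge.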

Definition at line 55 of file datapair.h.


Member Function Documentation

unsigned char Sleipnir::CDataPair::GetValues ( ) const [inline]

Returns the number of discrete values taken by this data pair.

Returns:
Number of discrete values taken by this data pair.
Remarks:
Equivalent to number of bins in the data pair and number of bin edges in the QUANT file.
See also:
SetQuants | Quantize

Definition at line 95 of file datapair.h.

Referenced by Sleipnir::CDatFilter::GetValues(), and Sleipnir::CDatasetCompact::Open().

bool Sleipnir::CDataPair::IsContinuous ( ) const [inline]

Returns true if the data pair has no associated discretization information.

Returns:
True if the data pair has no associated discretization information.
Remarks:
Generally only useful with continuous Bayes nets, which themselves aren't that useful.

Definition at line 109 of file datapair.h.

Referenced by Sleipnir::CDatasetCompact::Open().

bool Sleipnir::CDataPair::Open ( const char *  szDatafile,
bool  fContinuous,
bool  fMemmap = false,
size_t  iSkip = 2,
bool  fZScore = false,
bool  fSeek = false 
)

Open the given data file as a CDat and load discretization bin edges from an accompanying QUANT file.

Parameters:
szDatafile  Filename from which CDat is loaded.
fContinuous  If true, do not load an associated QUANT file and only open the underlying CDat.
fMemmap  If true, memory map file rather than allocating memory and copying its contents.
iSkip  If the given file is a PCL, the number of columns to skip between the ID and experiments.
fZScore  If true and the given file is a PCL, z-score similarity measures after pairwise calculation.
Returns:
True if data pair was successfully opened.
See also:
CDat::Open

Definition at line 114 of file datapair.cpp.

References Sleipnir::CDat::Open(), and OpenQuants().

Referenced by Open(), Sleipnir::CDataset::Open(), Sleipnir::CDatasetCompact::Open(), Sleipnir::CDataSubset::Open(), and OpenQuants().

bool Sleipnir::CDataPair::Open ( const CSlim &  Slim )

Construct an unbinned CDat from the given ontology slim.

Parameters:
Slim  Set of ontology terms from which to generate a QUANT-less data pair.
Returns:
True if data pair was generated successfully.
Remarks:
Quantize will behave inconsistently if the data pair is not assigned bin edges through some other means.
See also:
CDat::Open

Reimplemented from Sleipnir::CDat.

Definition at line 84 of file datapair.cpp.

References Open().

bool Sleipnir::CDataPair::Open ( const CDat &  Dat )

Construct a copy of the given CDat.

Parameters:
Dat  Data to be copied.
Returns:
True if the copy was successful.

Reimplemented from Sleipnir::CDat.

Definition at line 231 of file datapair.cpp.

References Sleipnir::CDat::Open().

bool Sleipnir::CDataPair::Open ( const CDat &  DatKnown,
const std::vector< CGenes * > &  vecpOther,
const CGenome &  Genome,
bool  fKnownNegatives 
) [inline]

Construct a data pair from the given known gene relationships and gene sets and with no discretization information.

Parameters:
DatKnown  Known pairwise scores, either positive or negative as indicated.
vecpOther  Gene sets, either positive or nonnegative as indicated (possibly empty).
Genome  Genome containing all genes of interest.
fKnownNegatives  If true, DatKnown contains known negative gene pairs (0 scores); if false, it contains known related gene pairs (1 scores). In the former case, positives are generated from pairs coannotated to the given gene sets; in the latter, negatives are generated from pairs not coannotated to the given gene sets.
Returns:
True if data pair was generated successfully.
Remarks:
Quantize will behave inconsistently if the data pair is not assigned bin edges through some other means.
See also:
CDat::Open

Definition at line 142 of file datapair.h.

References Open().

bool Sleipnir::CDataPair::Open ( const std::vector< std::string > &  vecstrGenes,
const CDistanceMatrix &  MatScores 
) [inline]

Construct a new data pair with the given gene names and values and with no discretization information.

Parameters:
vecstrGenes  Gene names and size to associate with the data pair.
MatScores  Values to associate with the data pair.
Returns:
True if data pair was generated successfully.
Remarks:
Quantize will behave inconsistently if the data pair is not assigned bin edges through some other means.
See also:
CDat::Open

Reimplemented from Sleipnir::CDat.

Definition at line 166 of file datapair.h.

References Open().

bool Sleipnir::CDataPair::OpenQuants ( const char *  szDatafile)

Open only the QUANT file associated with the given data file name.

Parameters:
szDatafile  CDat filename for which the accompanying QUANT file should be loaded.
Returns:
True if bin edges were loaded successfully.
Remarks:
Get calls to the underlying CDat will behave inconsistently unless data is loaded through some other means; this method will only load the data pair's bin edges (which allows Quantize calls to be made).

Definition at line 251 of file datapair.cpp.

References Open().

Referenced by Open().

size_t Sleipnir::CDataPair::Quantize ( float  dValue) const

Return the discretized form of the given value using the data pair's current bin edges.

Parameters:
dValue  Continuous value to be discretized.
Returns:
Discretized version of the given value, less than GetValues; -1 if the given value is not finite.

Discretizes a given continuous value using the data pair's bin edges. Standard usage is:

 DP.Quantize( DP.Get( i, j ) );
See also:
SetQuants | CMeta::Quantize
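Because the return type is size_t, the -1 result for non-finite input is the largest representable size_t rather than a negative number, so callers should compare against that value and not test for "less than zero". The following sketch illustrates the idea; QuantizeFinite and kNoBin are hypothetical names, and the binning loop is an assumption based on the bin-edge rules in the detailed description, not the library's code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sentinel for "no bin": -1 converted to the unsigned return type.
const std::size_t kNoBin = static_cast<std::size_t>(-1);

// Hypothetical stand-in for a Quantize-style call: NaN and infinities map to
// the sentinel, everything else maps to a bin index below the edge count.
std::size_t QuantizeFinite(float dValue, const std::vector<float>& vecdEdges) {
    if (!std::isfinite(dValue))
        return kNoBin;
    std::size_t i = 0;
    // Upper edges are inclusive; the last edge is never compared.
    while (i + 1 < vecdEdges.size() && dValue > vecdEdges[i])
        ++i;
    return i;
}
```

A caller would typically check the result with something like `if( iBin == kNoBin )` before using it as an index.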

Definition at line 282 of file datapair.cpp.

References Sleipnir::CMeta::Quantize().

Referenced by Quantize(), and Sleipnir::CDatFilter::Quantize().

size_t Sleipnir::CDataPair::Quantize ( size_t  iY,
size_t  iX,
size_t  iZero 
) const

Return the discretized form of the given value using the data pair's current bin edges, or return the provided missing data value.

Parameters:
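iY  Index of the first gene of the pair.
iX  Index of the second gene of the pair.
iZero  Discrete value returned when the pair's value is missing.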
Returns:
Discretized version of the given value, less than GetValues; -1 if the given value is not finite, or if either gene does not exist in the dataset.

Discretizes a given continuous value using the data pair's bin edges. Standard usage is:

 DP.Quantize( i, j, 0 );
See also:
SetQuants | CMeta::Quantize
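One plausible reading of this overload is sketched below: an invalid gene index yields -1, a missing (NaN) score yields the caller's iZero value, and any other score is binned normally. PairScores and QuantizePair are hypothetical stand-ins introduced for illustration; they are not Sleipnir types, and the exact precedence of the -1 and iZero cases in the library is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data pair: a square score matrix in which NaN
// marks a missing value, plus the bin edges used for discretization.
struct PairScores {
    std::vector<std::vector<float> > vecvecdScores;
    std::vector<float> vecdEdges;
};

std::size_t QuantizePair(const PairScores& Scores, std::size_t iY,
                         std::size_t iX, std::size_t iZero) {
    const std::size_t iNone = static_cast<std::size_t>(-1);

    // A gene that is absent from the dataset is represented by index -1.
    if (iY == iNone || iX == iNone)
        return iNone;

    const float d = Scores.vecvecdScores[iY][iX];
    if (std::isnan(d))
        return iZero;  // missing score: fall back to the caller's value

    std::size_t i = 0;
    // Upper edges are inclusive; the last edge is never compared.
    while (i + 1 < Scores.vecdEdges.size() && d > Scores.vecdEdges[i])
        ++i;
    return i;
}
```

Passing the fallback as an argument lets each caller decide whether missing data should count as the lowest bin, a dedicated bin, or the -1 sentinel.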

Definition at line 326 of file datapair.cpp.

References Sleipnir::CDat::Get(), Sleipnir::CMeta::IsNaN(), and Quantize().

void Sleipnir::CDataPair::Save ( const char *  szFile) const

Save a CDat to the given file, guessing the format from the file's extension.

Parameters:
szFile  Filename into which CDat is saved.

Save a CDat to the given file, guessing the format (DAT, DAB, or DAS) from the extension. If szFile is null, the CDat will be saved as a DAT to standard output.

Remarks:
CDats cannot be saved to PCLs, only loaded from them. If the extension is not recognized, DAB format is assumed.
See also:
Open
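The format selection described above can be sketched as follows. ESaveFormat and GuessSaveFormat are hypothetical names, and the lowercase extensions ".dat" and ".das" are assumptions; the text names only the formats (DAT, DAB, DAS), the null-filename case, and the DAB fallback.

```cpp
#include <cstring>
#include <string>

// Hypothetical enumeration of the outcomes described in the text.
enum class ESaveFormat { DatStdout, Dat, Dab, Das };

// Hypothetical helper: choose an output format from the filename.
ESaveFormat GuessSaveFormat(const char* szFile) {
    // A null filename means "write DAT to standard output".
    if (!szFile)
        return ESaveFormat::DatStdout;

    // Take everything from the last dot onward as the extension.
    const char* pDot = std::strrchr(szFile, '.');
    const std::string strExt = pDot ? pDot : "";

    if (strExt == ".dat")  // assumed extension for DAT
        return ESaveFormat::Dat;
    if (strExt == ".das")  // assumed extension for DAS
        return ESaveFormat::Das;

    // ".dab" and anything unrecognized fall back to DAB.
    return ESaveFormat::Dab;
}
```

Note that a PCL filename would also fall through to DAB in this sketch, consistent with the remark that CDats cannot be saved to PCLs.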

Reimplemented from Sleipnir::CDat.

Definition at line 383 of file datapair.cpp.

References Sleipnir::CDat::Get(), Sleipnir::CDat::GetGenes(), and Sleipnir::CMeta::IsNaN().

void Sleipnir::CDataPair::SetQuants ( const float *  adBinEdges,
size_t  iBins 
) [inline]

Set the data pair's bin edges.

Parameters:
adBinEdges  Array of values corresponding to discretization bin edges (the last of which is ignored).
iBins  Number of discretization bins.
See also:
GetValues | Quantize

Reimplemented from Sleipnir::CDataPairImpl.

Definition at line 62 of file datapair.h.

Referenced by Sleipnir::CDatasetCompact::Open().

void Sleipnir::CDataPair::SetQuants ( const std::vector< float > &  vecdBinEdges)

Set the data pair's bin edges.

Parameters:
vecdBinEdges  Vector of values corresponding to discretization bin edges (the last of which is ignored).
See also:
GetValues | Quantize

Definition at line 377 of file datapair.cpp.


The documentation for this class was generated from the following files:

datapair.h
datapair.cpp