Sleipnir
Sleipnir::CClustQTC Class Reference

Utility class containing static quality threshold clustering methods.

#include <clustqtc.h>

Inherits Sleipnir::CClustQTCImpl.

Static Public Member Functions

static uint16_t Cluster (const CDataMatrix &MatData, const IMeasure *pMeasure, float dDiameter, size_t iSize, std::vector< uint16_t > &vecsClusters, const CDataMatrix *pMatWeights=NULL)
 Cluster a set of elements with the quality threshold algorithm using the given data and pairwise similarity score.
static uint16_t Cluster (const CDistanceMatrix &MatSimilarities, float dDiameter, size_t iSize, std::vector< uint16_t > &vecsClusters)
 Cluster a set of elements with the quality threshold algorithm using the given similarity scores.
static void Cluster (const CDataMatrix &MatData, const IMeasure *pMeasure, float dMinDiameter, float dMaxDiameter, float dDeltaDiameter, size_t iSize, CDistanceMatrix &MatResults, const CDataMatrix *pMatWeights=NULL)
 Record the smallest cluster diameter within some range at which each gene pair clusters.
static void Cluster (const CDistanceMatrix &MatSimilarities, float dMinDiameter, float dMaxDiameter, float dDeltaDiameter, size_t iSize, CDistanceMatrix &MatResults)
 Record the smallest cluster diameter within some range at which each gene pair clusters.

Detailed Description

Utility class containing static quality threshold clustering methods.

Definition at line 33 of file clustqtc.h.


Member Function Documentation

uint16_t Sleipnir::CClustQTC::Cluster ( const CDataMatrix &  MatData,
const IMeasure *  pMeasure,
float  dDiameter,
size_t  iSize,
std::vector< uint16_t > &  vecsClusters,
const CDataMatrix *  pMatWeights = NULL 
) [static]

Cluster a set of elements with the quality threshold algorithm using the given data and pairwise similarity score.

Parameters:
    MatData       Data vectors for each element, generally microarray values from a PCL file.
    pMeasure      Similarity measure to use for clustering.
    dDiameter     Maximum cluster diameter.
    iSize         Minimum cluster size.
    vecsClusters  Output cluster IDs for each gene; unclustered genes are grouped in the last cluster.
    pMatWeights   If non-null, weights to use for each gene/condition value. These can be used, for example, to up- or down-weight aneuploidies present only under certain conditions. By default, all weights are one.
Returns:
Total number of clusters.

Clusters elements using the quality threshold algorithm due to Heyer et al. Each gene is assigned to at most one cluster. Briefly, the most similar gene pair is grouped together, and each other gene within the given diameter of that group is added to the cluster. These genes are then removed from the pool and the process is repeated. Gene groups that cannot reach the minimum cluster size are discarded.

Remarks:
If N clusters are generated, unclustered genes will be assigned ID N in the output vector.
See also:
CClustKMeans::Cluster
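
The procedure summarized above can be sketched without Sleipnir as follows. This is a simplified illustration, not the library's implementation: `DistMatrix` and `SketchQTC` are hypothetical names, the input is assumed to be a square matrix of distances (smaller means more similar), and "within the given diameter of that group" is assumed to mean within the diameter of every current member. How Sleipnir itself relates similarity scores to the diameter is not stated here. The original algorithm of Heyer et al. builds a candidate cluster around every element and keeps the largest; the sketch follows the shorter description given on this page instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a pairwise matrix (not a Sleipnir type):
// dist[i][j] is a distance, smaller meaning more similar.
using DistMatrix = std::vector<std::vector<float>>;

// Returns the number N of clusters kept. Elements left unclustered, including
// members of discarded groups, receive ID N in `ids`.
std::uint16_t SketchQTC(const DistMatrix& dist, float diameter,
                        std::size_t minSize, std::vector<std::uint16_t>& ids) {
    const std::size_t n = dist.size();
    for (const auto& row : dist) assert(row.size() == n);  // square input

    ids.assign(n, 0);
    std::vector<bool> inPool(n, true), assigned(n, false);
    std::uint16_t next = 0;

    for (;;) {
        // Seed: the closest pair still in the pool that fits the diameter.
        std::size_t a = n, b = n;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = i + 1; j < n; ++j) {
                if (!inPool[i] || !inPool[j] || dist[i][j] > diameter) continue;
                if (a == n || dist[i][j] < dist[a][b]) { a = i; b = j; }
            }
        if (a == n) break;  // no remaining pair fits

        // Grow: add every pooled element within the diameter of all members.
        std::vector<std::size_t> members = {a, b};
        for (std::size_t k = 0; k < n; ++k) {
            if (!inPool[k] || k == a || k == b) continue;
            bool fits = true;
            for (std::size_t m : members)
                if (dist[k][m] > diameter) { fits = false; break; }
            if (fits) members.push_back(k);
        }

        // Remove the group from the pool; keep it only if it is large enough.
        const bool keep = members.size() >= minSize;
        for (std::size_t m : members) {
            inPool[m] = false;
            if (keep) { ids[m] = next; assigned[m] = true; }
        }
        if (keep) ++next;
    }

    for (std::size_t i = 0; i < n; ++i)
        if (!assigned[i]) ids[i] = next;  // unclustered elements share ID N
    return next;
}
```

Each pass removes at least two elements from the pool, so the loop terminates after at most n/2 passes.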

Definition at line 68 of file clustqtc.cpp.

References Sleipnir::CFullMatrix< tType >::GetRows().

Referenced by Cluster().

uint16_t Sleipnir::CClustQTC::Cluster ( const CDistanceMatrix &  MatSimilarities,
float  dDiameter,
size_t  iSize,
std::vector< uint16_t > &  vecsClusters 
) [static]

Cluster a set of elements with the quality threshold algorithm using the given similarity scores.

Parameters:
    MatSimilarities  Precalculated similarity scores, generally using microarray values from a PCL file.
    dDiameter        Maximum cluster diameter.
    iSize            Minimum cluster size.
    vecsClusters     Output cluster IDs for each gene; unclustered genes are grouped in the last cluster.
Returns:
Total number of clusters.

Clusters elements using the quality threshold algorithm due to Heyer et al. Each gene is assigned to at most one cluster. Briefly, the most similar gene pair is grouped together, and each other gene within the given diameter of that group is added to the cluster. These genes are then removed from the pool and the process is repeated. Gene groups that cannot reach the minimum cluster size are discarded.

Remarks:
If N clusters are generated, unclustered genes will be assigned ID N in the output vector.
See also:
CClustKMeans::Cluster
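
As an illustration of the ID convention in the remark above, the following sketch splits an output vector into one index list per cluster. `GroupByCluster` is a hypothetical helper, not part of Sleipnir, and `nClusters` is assumed to be the N of the remark; whether that equals the function's return value should be checked against the library before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Sleipnir function). groups[c] lists the element
// indices with cluster ID c; groups[nClusters] lists the unclustered elements.
std::vector<std::vector<std::size_t>> GroupByCluster(
        const std::vector<std::uint16_t>& ids, std::uint16_t nClusters) {
    std::vector<std::vector<std::size_t>> groups(
        static_cast<std::size_t>(nClusters) + 1);
    for (std::size_t i = 0; i < ids.size(); ++i) {
        assert(ids[i] <= nClusters);  // IDs run from 0 to N inclusive
        groups[ids[i]].push_back(i);
    }
    return groups;
}
```

With IDs `{0, 0, 1, 2, 1}` and N = 2, elements 0 and 1 form cluster 0, elements 2 and 4 form cluster 1, and element 3 is reported as unclustered.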

Definition at line 105 of file clustqtc.cpp.

References Sleipnir::CHalfMatrix< tType >::GetSize().

void Sleipnir::CClustQTC::Cluster ( const CDataMatrix &  MatData,
const IMeasure *  pMeasure,
float  dMinDiameter,
float  dMaxDiameter,
float  dDeltaDiameter,
size_t  iSize,
CDistanceMatrix &  MatResults,
const CDataMatrix *  pMatWeights = NULL 
) [static]

Record the smallest cluster diameter within some range at which each gene pair clusters.

Parameters:
    MatData         Data vectors for each element, generally microarray values from a PCL file.
    pMeasure        Similarity measure to use for clustering.
    dMinDiameter    Minimum cluster diameter at which to attempt clustering.
    dMaxDiameter    Maximum cluster diameter at which to attempt clustering.
    dDeltaDiameter  Increment of cluster diameters to scan between minimum and maximum.
    iSize           Minimum cluster size.
    MatResults      Output matrix recording the smallest diameter at which each gene pair coclustered, or NaN if the pair did not cocluster within the given diameter range.
    pMatWeights     If non-null, weights to use for each gene/condition value. These can be used, for example, to up- or down-weight aneuploidies present only under certain conditions. By default, all weights are one.

This clustering method incrementally attempts to quality threshold cluster the given elements at each cluster diameter between the given minimum and maximum, in steps of the requested delta. For each gene pair, the smallest diameter at which they coclustered (appeared in some cluster together) is recorded. This can be used to rapidly scan through a range of cluster "sizes" to find the strictest diameter cutoff at which gene pairs cocluster.

Remarks:
MatResults must be pre-initialized to the same size as MatData.
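
The scan described above can be sketched independently of Sleipnir. In this illustration `ScanDiameters`, `Labels`, and `ClusterFn` are hypothetical names, the clustering step is supplied by the caller as a callback, and the maximum diameter is assumed to be inclusive; none of these details are confirmed by this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

using Labels = std::vector<std::uint16_t>;

// Hypothetical callback: clusters at one diameter, fills one ID per element,
// and returns N, where elements carrying ID N are unclustered.
using ClusterFn = std::function<std::uint16_t(float diameter, Labels& ids)>;

// Returns an n-by-n matrix holding, for each pair, the smallest scanned
// diameter at which the pair shared a cluster, or NaN if it never did.
std::vector<std::vector<float>> ScanDiameters(std::size_t n, float dMin,
                                              float dMax, float dDelta,
                                              const ClusterFn& cluster) {
    assert(dDelta > 0 && dMin <= dMax);
    const float nan = std::numeric_limits<float>::quiet_NaN();
    std::vector<std::vector<float>> results(n, std::vector<float>(n, nan));

    // Derive each diameter from an integer step count so that rounding
    // error does not accumulate across iterations.
    for (std::size_t step = 0;; ++step) {
        const float d = dMin + static_cast<float>(step) * dDelta;
        if (d > dMax) break;

        Labels ids;
        const std::uint16_t nClusters = cluster(d, ids);
        assert(ids.size() == n);

        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = i + 1; j < n; ++j) {
                const bool together = ids[i] == ids[j] && ids[i] < nClusters;
                // Diameters increase, so the first hit is the smallest.
                if (together && std::isnan(results[i][j]))
                    results[i][j] = results[j][i] = d;
            }
    }
    return results;
}
```

Because diameters are visited in increasing order, an entry is written at most once and never needs to be compared against later values.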

Definition at line 150 of file clustqtc.cpp.

References Cluster().

void Sleipnir::CClustQTC::Cluster ( const CDistanceMatrix &  MatSimilarities,
float  dMinDiameter,
float  dMaxDiameter,
float  dDeltaDiameter,
size_t  iSize,
CDistanceMatrix &  MatResults 
) [static]

Record the smallest cluster diameter within some range at which each gene pair clusters.

Parameters:
    MatSimilarities  Precalculated similarity scores for every pair of entities, generally genes from a microarray PCL.
    dMinDiameter     Minimum cluster diameter at which to attempt clustering.
    dMaxDiameter     Maximum cluster diameter at which to attempt clustering.
    dDeltaDiameter   Increment of cluster diameters to scan between minimum and maximum.
    iSize            Minimum cluster size.
    MatResults       Output matrix recording the smallest diameter at which each gene pair coclustered, or NaN if the pair did not cocluster within the given diameter range.

This clustering method incrementally attempts to quality threshold cluster the given elements at each cluster diameter between the given minimum and maximum, in steps of the requested delta. For each gene pair, the smallest diameter at which they coclustered (appeared in some cluster together) is recorded. This can be used to rapidly scan through a range of cluster "sizes" to find the strictest diameter cutoff at which gene pairs cocluster.

Remarks:
MatResults must be pre-initialized to the same size as MatSimilarities.
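
Since pairs that never cocluster are recorded as NaN, code reading the results has to skip those entries. A minimal sketch of that filtering, using a plain nested vector in place of CDistanceMatrix; `CoPair` and `PairsWithin` are hypothetical names introduced only for this illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record of one coclustered pair (not a Sleipnir type).
struct CoPair {
    std::size_t i, j;
    float diameter;
};

// Lists the pairs (i < j) whose recorded diameter is at most `cutoff`,
// skipping NaN entries, which mark pairs that never coclustered.
std::vector<CoPair> PairsWithin(const std::vector<std::vector<float>>& results,
                                float cutoff) {
    std::vector<CoPair> out;
    for (std::size_t i = 0; i < results.size(); ++i) {
        assert(results[i].size() == results.size());  // square input
        for (std::size_t j = i + 1; j < results.size(); ++j) {
            const float d = results[i][j];
            if (!std::isnan(d) && d <= cutoff) out.push_back({i, j, d});
        }
    }
    return out;
}
```

The explicit `std::isnan` test is redundant with the comparison, which is false for NaN, but it makes the intent visible to the reader.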

Definition at line 190 of file clustqtc.cpp.

References Sleipnir::CHalfMatrix< tType >::Get(), Sleipnir::CHalfMatrix< tType >::GetSize(), Sleipnir::CMeta::IsNaN(), and Sleipnir::CHalfMatrix< tType >::Set().


The documentation for this class was generated from the following files: