Sleipnir
Utility class containing static quality threshold clustering methods.
#include <clustqtc.h>
Static Public Member Functions
static uint16_t Cluster (const CDataMatrix &MatData, const IMeasure *pMeasure, float dDiameter, size_t iSize, std::vector< uint16_t > &vecsClusters, const CDataMatrix *pMatWeights=NULL)
    Cluster a set of elements with the quality threshold algorithm using the given data and pairwise similarity score.
static uint16_t Cluster (const CDistanceMatrix &MatSimilarities, float dDiameter, size_t iSize, std::vector< uint16_t > &vecsClusters)
    Cluster a set of elements with the quality threshold algorithm using the given similarity scores.
static void Cluster (const CDataMatrix &MatData, const IMeasure *pMeasure, float dMinDiameter, float dMaxDiameter, float dDeltaDiameter, size_t iSize, CDistanceMatrix &MatResults, const CDataMatrix *pMatWeights=NULL)
    Record the smallest cluster diameter within some range at which each gene pair clusters.
static void Cluster (const CDistanceMatrix &MatSimilarities, float dMinDiameter, float dMaxDiameter, float dDeltaDiameter, size_t iSize, CDistanceMatrix &MatResults)
    Record the smallest cluster diameter within some range at which each gene pair clusters.
Definition at line 33 of file clustqtc.h.
uint16_t Sleipnir::CClustQTC::Cluster (
    const CDataMatrix & MatData,
    const IMeasure * pMeasure,
    float dDiameter,
    size_t iSize,
    std::vector< uint16_t > & vecsClusters,
    const CDataMatrix * pMatWeights = NULL
) [static]
Cluster a set of elements with the quality threshold algorithm using the given data and pairwise similarity score.
MatData        Data vectors for each element, generally microarray values from a PCL file.
pMeasure       Similarity measure to use for clustering.
dDiameter      Maximum cluster diameter.
iSize          Minimum cluster size.
vecsClusters   Output cluster IDs for each gene; unclustered genes are grouped in the last cluster.
pMatWeights    If non-null, weights to use for each gene/condition value. These can be used to up/downweight aneuploidies present under only certain conditions, for example. Default assumes all ones.
Clusters elements using the quality threshold algorithm due to Heyer et al. Each gene is assigned to at most one cluster. Briefly, the most similar gene pair is grouped together, and each other gene within the given diameter of that group is added to the cluster. These genes are then removed from the pool and the process is repeated. Gene groups that cannot reach the minimum cluster size are discarded.
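The procedure described above can be illustrated with a minimal, standalone sketch. It is not Sleipnir's implementation: it follows the classic Heyer-style formulation (grow a candidate from every remaining seed, keep the largest), works on a plain symmetric distance matrix rather than CDataMatrix or CDistanceMatrix, and assumes that smaller values mean "more similar". The names DistMatrix, GrowCandidate and QtClusterSketch are hypothetical, as is the choice to return the number of real clusters and to use that number as the ID of the leftover group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a precomputed pairwise matrix (assumption:
// symmetric, zero diagonal, smaller value = more similar).
using DistMatrix = std::vector<std::vector<float>>;

// Grow one candidate cluster from `seed`. At each step, add the unused
// element whose farthest distance to the current members is smallest,
// as long as that distance stays within `diameter`.
std::vector<std::size_t> GrowCandidate(const DistMatrix& dist,
                                       const std::vector<bool>& used,
                                       std::size_t seed, float diameter) {
    const std::size_t n = dist.size();
    std::vector<std::size_t> members{seed};
    std::vector<bool> inCluster(n, false);
    inCluster[seed] = true;
    // farthest[j]: largest distance from j to any current member.
    std::vector<float> farthest(dist[seed]);
    for (;;) {
        std::size_t best = n;
        for (std::size_t j = 0; j < n; ++j) {
            if (used[j] || inCluster[j] || farthest[j] > diameter) continue;
            if (best == n || farthest[j] < farthest[best]) best = j;
        }
        if (best == n) break;  // nothing else fits within the diameter
        inCluster[best] = true;
        members.push_back(best);
        for (std::size_t j = 0; j < n; ++j)
            farthest[j] = std::max(farthest[j], dist[best][j]);
    }
    return members;
}

// Repeatedly keep the largest candidate, remove its members from the pool,
// and stop once no candidate reaches minSize. Elements left over share the
// last ID, which equals the returned cluster count (an assumption of this
// sketch).
std::uint16_t QtClusterSketch(const DistMatrix& dist, float diameter,
                              std::size_t minSize,
                              std::vector<std::uint16_t>& ids) {
    const std::size_t n = dist.size();
    for (const auto& row : dist) assert(row.size() == n);  // must be square
    std::vector<bool> used(n, false);
    std::vector<std::vector<std::size_t>> clusters;
    for (;;) {
        std::vector<std::size_t> best;
        for (std::size_t i = 0; i < n; ++i) {
            if (used[i]) continue;
            std::vector<std::size_t> cand = GrowCandidate(dist, used, i, diameter);
            if (cand.size() > best.size()) best = std::move(cand);
        }
        if (best.empty() || best.size() < minSize) break;
        for (std::size_t i : best) used[i] = true;
        clusters.push_back(std::move(best));
    }
    const std::uint16_t leftover = static_cast<std::uint16_t>(clusters.size());
    ids.assign(n, leftover);
    for (std::size_t c = 0; c < clusters.size(); ++c)
        for (std::size_t i : clusters[c]) ids[i] = static_cast<std::uint16_t>(c);
    return leftover;
}
```

Growing a candidate from every seed costs roughly cubic time per round, so a sketch like this is only practical for small inputs.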
Definition at line 68 of file clustqtc.cpp.
References Sleipnir::CFullMatrix< tType >::GetRows().
Referenced by Cluster().
uint16_t Sleipnir::CClustQTC::Cluster (
    const CDistanceMatrix & MatSimilarities,
    float dDiameter,
    size_t iSize,
    std::vector< uint16_t > & vecsClusters
) [static]
Cluster a set of elements with the quality threshold algorithm using the given similarity scores.
MatSimilarities   Precalculated similarity scores, generally using microarray values from a PCL file.
dDiameter         Maximum cluster diameter.
iSize             Minimum cluster size.
vecsClusters      Output cluster IDs for each gene; unclustered genes are grouped in the last cluster.
Clusters elements using the quality threshold algorithm due to Heyer et al. Each gene is assigned to at most one cluster. Briefly, the most similar gene pair is grouped together, and each other gene within the given diameter of that group is added to the cluster. These genes are then removed from the pool and the process is repeated. Gene groups that cannot reach the minimum cluster size are discarded.
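Because the output is a flat vector of per-gene cluster IDs, callers often want to regroup it into member lists. The helper below is a small sketch of that step and does not call Sleipnir. GroupByCluster is a hypothetical name, and the code assumes IDs are zero-based and contiguous, so that the highest-numbered group is the "last cluster" holding unclustered genes; the source does not state the numbering scheme, so check it against real output before relying on it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Turn per-gene cluster IDs into one list of gene indices per cluster.
// Assumption: IDs are zero-based, so groups.back() is the highest ID,
// which the documentation describes as the unclustered group.
std::vector<std::vector<std::size_t>> GroupByCluster(
        const std::vector<std::uint16_t>& ids) {
    if (ids.empty()) return {};
    const std::uint16_t maxId = *std::max_element(ids.begin(), ids.end());
    std::vector<std::vector<std::size_t>> groups(
        static_cast<std::size_t>(maxId) + 1);
    for (std::size_t gene = 0; gene < ids.size(); ++gene)
        groups[ids[gene]].push_back(gene);
    return groups;
}
```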
Definition at line 105 of file clustqtc.cpp.
References Sleipnir::CHalfMatrix< tType >::GetSize().
void Sleipnir::CClustQTC::Cluster (
    const CDataMatrix & MatData,
    const IMeasure * pMeasure,
    float dMinDiameter,
    float dMaxDiameter,
    float dDeltaDiameter,
    size_t iSize,
    CDistanceMatrix & MatResults,
    const CDataMatrix * pMatWeights = NULL
) [static]
Record the smallest cluster diameter within some range at which each gene pair clusters.
MatData          Data vectors for each element, generally microarray values from a PCL file.
pMeasure         Similarity measure to use for clustering.
dMinDiameter     Minimum cluster diameter at which to attempt clustering.
dMaxDiameter     Maximum cluster diameter at which to attempt clustering.
dDeltaDiameter   Increment of cluster diameters to scan between minimum and maximum.
iSize            Minimum cluster size.
MatResults       Output matrix recording the smallest diameter at which each gene pair coclustered, or NaN if the pair did not cocluster within the given diameter range.
pMatWeights      If non-null, weights to use for each gene/condition value. These can be used to up/downweight aneuploidies present under only certain conditions, for example. Default assumes all ones.
This clustering method incrementally attempts to quality threshold cluster the given elements at each cluster diameter between the given minimum and maximum, in steps of the requested delta. For each gene pair, the smallest diameter at which they coclustered (appeared in some cluster together) is recorded. This can be used to rapidly scan through a range of cluster "sizes" to find the strictest diameter cutoff at which gene pairs cocluster.
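The scan described above can be sketched independently of any particular clustering routine. In the following illustration, ScanDiameters and ResultMatrix are hypothetical names, and the clustering step is passed in as a callable that fills a vector of cluster IDs and returns the number of real clusters, with any ID at or above that count treated as "unclustered". That contract is an assumption of the sketch, not Sleipnir's documented behavior. The scan is assumed to run upward from the minimum, so the first diameter at which a pair coclusters is also the smallest.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using ResultMatrix = std::vector<std::vector<float>>;

// Scan diameters dMin, dMin + dDelta, ... up to dMax and record, for every
// pair, the first (smallest) diameter at which both fall in the same real
// cluster. Pairs that never cocluster stay NaN.
// `cluster(d, ids)` is a caller-supplied callable (hypothetical contract):
// it fills `ids` with one cluster ID per element and returns the number of
// real clusters; IDs >= that count mean "unclustered".
template <typename ClusterFn>
ResultMatrix ScanDiameters(std::size_t n, float dMin, float dMax,
                           float dDelta, ClusterFn cluster) {
    assert(dDelta > 0.0f);  // a non-positive step would never terminate
    ResultMatrix result(
        n, std::vector<float>(n, std::numeric_limits<float>::quiet_NaN()));
    std::vector<std::uint16_t> ids;
    // Index the steps with an integer to avoid accumulating rounding error.
    for (std::size_t step = 0;; ++step) {
        const float d = dMin + static_cast<float>(step) * dDelta;
        if (d > dMax) break;
        const std::uint16_t count = cluster(d, ids);
        assert(ids.size() == n);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = i + 1; j < n; ++j) {
                const bool together = ids[i] == ids[j] && ids[i] < count;
                if (together && std::isnan(result[i][j]))
                    result[i][j] = result[j][i] = d;
            }
        }
    }
    return result;
}
```

Since NaN never compares equal to anything, results should be tested with std::isnan rather than with ==.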
Definition at line 150 of file clustqtc.cpp.
References Cluster().
void Sleipnir::CClustQTC::Cluster (
    const CDistanceMatrix & MatSimilarities,
    float dMinDiameter,
    float dMaxDiameter,
    float dDeltaDiameter,
    size_t iSize,
    CDistanceMatrix & MatResults
) [static]
Record the smallest cluster diameter within some range at which each gene pair clusters.
MatSimilarities   Precalculated similarity scores for every pair of entities, generally genes from a microarray PCL.
dMinDiameter      Minimum cluster diameter at which to attempt clustering.
dMaxDiameter      Maximum cluster diameter at which to attempt clustering.
dDeltaDiameter    Increment of cluster diameters to scan between minimum and maximum.
iSize             Minimum cluster size.
MatResults        Output matrix recording the smallest diameter at which each gene pair coclustered, or NaN if the pair did not cocluster within the given diameter range.
This clustering method incrementally attempts to quality threshold cluster the given elements at each cluster diameter between the given minimum and maximum, in steps of the requested delta. For each gene pair, the smallest diameter at which they coclustered (appeared in some cluster together) is recorded. This can be used to rapidly scan through a range of cluster "sizes" to find the strictest diameter cutoff at which gene pairs cocluster.
Definition at line 190 of file clustqtc.cpp.
References Sleipnir::CHalfMatrix< tType >::Get(), Sleipnir::CHalfMatrix< tType >::GetSize(), Sleipnir::CMeta::IsNaN(), and Sleipnir::CHalfMatrix< tType >::Set().