Sleipnir
Clusterer performs non-hierarchical clustering, either k-means or quality threshold clustering (QTC), on an input microarray dataset (PCL) using one of Sleipnir's many similarity measures.
Clusterer -i <data.pcl> -k <clusters>
Output (to standard output) a list of k-means clusters formed from the microarray PCL data data.pcl, with k equal to clusters, using the default similarity measure (which can be changed using -d).
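As a rough illustration of what k-means over a correlation-based similarity involves, the following Python sketch assigns each expression profile to the centroid with which it has the highest Pearson correlation and then recomputes centroids as member means. It is not Sleipnir's implementation; the function names `pearson` and `kmeans`, the iteration cap, and the seeding strategy are all assumptions made for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def kmeans(rows, k, iterations=100, seed=0):
    """Return one cluster index per row (hypothetical helper, not Sleipnir's API)."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random.
    centroids = [list(r) for r in rng.sample(rows, k)]
    assignment = [None] * len(rows)
    for _ in range(iterations):
        # Assign every row to its most similar centroid.
        new_assignment = [
            max(range(k), key=lambda c: pearson(row, centroids[c])) for row in rows
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid as the column-wise mean of its members.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignment) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

With two rising and two falling profiles and k = 2, the rising profiles end up sharing one cluster index and the falling profiles the other.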
Clusterer -i <data.pcl> -a qtc -k <min_size> -m <max_diameter>
Output a list of quality threshold clusters formed from the microarray PCL data data.pcl, with a minimum cluster size of min_size and a maximum cluster diameter of max_diameter.
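The roles of min_size and max_diameter can be seen in a minimal Python sketch of the general greedy QTC procedure: grow a candidate cluster around every remaining item while its diameter stays within the limit, keep the largest candidate, remove its members, and stop once the largest candidate is smaller than the minimum size. This sketch assumes a precomputed symmetric distance matrix and defines diameter as the largest pairwise distance within a cluster; the function name `qtc` is hypothetical and the details may differ from Sleipnir's own code.

```python
def qtc(dist, min_size, max_diameter):
    """Greedy quality threshold clustering over a symmetric distance matrix.

    Returns a list of clusters (sorted lists of item indices). Items that do
    not fit into any cluster of at least min_size members are left out.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            candidate = [seed]
            pool = remaining - {seed}
            while pool:
                # Adding g raises the diameter to at most the largest distance
                # from g to a current member, so pick the g minimizing that.
                g = min(sorted(pool), key=lambda j: max(dist[j][i] for i in candidate))
                if max(dist[g][i] for i in candidate) > max_diameter:
                    break
                candidate.append(g)
                pool.remove(g)
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break  # no remaining candidate is large enough
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters
```

For points at 0.0, 0.1, 0.2, 5.0, 5.1, and 9.0 on a line, with absolute difference as the distance, a minimum size of 2, and a maximum diameter of 0.5, the sketch groups the first three points, then the next two, and leaves the last point unclustered.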
Clusterer -i <data.pcl> -o <cocluster.dab> -k <min_size> -M <min_diameter> -m <max_diameter> -e <delta_diameter>
Create a DAT/DAB file cocluster.dab with gene pair scores indicating the minimum diameter at which each gene pair from data.pcl coclustered, using QTC with a minimum cluster size of min_size and testing cluster diameters from min_diameter to max_diameter in steps of delta_diameter.
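The diameter sweep described here can be sketched in Python as follows. The sketch assumes a hypothetical callable `cluster_at(diameter)` that returns the QTC clusters obtained at a given maximum diameter, and it assumes that the max_diameter endpoint is included in the sweep; neither detail is confirmed by this page. Because diameters are visited in ascending order, the first diameter at which a pair shares a cluster is the minimum one.

```python
import itertools
import math


def cocluster_thresholds(cluster_at, min_diameter, max_diameter, delta):
    """Map each gene pair to the smallest tested diameter at which it coclusters.

    cluster_at is a hypothetical callable returning a list of clusters
    (iterables of gene identifiers) for a given maximum diameter.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    scores = {}
    # Derive diameters from an integer step count to avoid accumulating
    # floating-point error across many additions.
    steps = int(math.floor((max_diameter - min_diameter) / delta + 1e-9))
    for step in range(steps + 1):
        diameter = min_diameter + step * delta
        for cluster in cluster_at(diameter):
            for pair in itertools.combinations(sorted(cluster), 2):
                # Keep only the first (smallest) diameter seen for each pair.
                scores.setdefault(pair, diameter)
    return scores
```

Pairs that never share a cluster at any tested diameter are simply absent from the returned mapping.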
package "Clusterer"
version "1.0"
purpose "QTC and other hard clustering methods"
section "Main"
option "input" i "Input PCL/DAB file"
string typestr="filename"
option "algorithm" a "Clustering algorithm"
values="qtc","kmeans" default="kmeans"
option "weights" w "Input weights file"
string typestr="filename"
section "Clustering"
option "distance" d "Similarity measure"
values="pearson","euclidean","kendalls","kolm-smir","spearman","quickpear"
default="pearson"
option "size" k "Number of clusters/minimum cluster size"
int default="10"
option "diameter" m "Maximum cluster diameter"
double default="0.5"
section "Cocluster Threshhold"
option "output" o "Output DAB file"
string typestr="filename"
option "diamineter" M "Minimum cluster diameter"
double default="0"
option "delta" e "Cluster diameter step size"
double default="0"
section "Optional"
option "output_info" O "Output file for clustering info (membership or summary)"
string typestr="filename"
option "pcl" p "PCL input if precalculated DAB provided"
string typestr="filename"
option "skip" s "Columns to skip in input PCL"
int default="2"
option "normalize" n "Normalize distances before clustering"
flag on
option "autocorrelate" c "Autocorrelate similarity measures"
flag off
option "summary" S "Summarize cluster info"
flag off
option "pcl_out" P "Output PCL and clusters as a single PCL"
flag off
option "random" r "Seed random generator"
int default="0"
option "verbosity" v "Message verbosity"
int default="5"
Flag | Default | Type | Description |
---|---|---|---|
-i | stdin | PCL text file | Input PCL file of microarray data to be clustered. |
-a | kmeans | kmeans or qtc | Clustering algorithm to be used. |
-w | None | PCL text file | If given, a PCL file with the same dimensions as the data given with -i, in which each cell holds the relative weight of the corresponding gene/experiment pair. If no weights file is given, all weights default to 1. |
-d | pearson | pearson, euclidean, kendalls, kolm-smir, spearman, or quickpear | Similarity measure to be used for clustering. "quickpear" is a simplified Pearson correlation that cannot deal with missing values or weights not equal to 1. |
-k | 10 | Integer | For k-means clustering, the desired number of clusters k. For QTC, the minimum cluster size (i.e. minimum number of genes in a cluster). |
-m | 0.5 | Double | For QTC, the maximum cluster diameter. Note that this is similarity measure dependent. |
-o | stdout | DAT/DAB file | For QTC cocluster thresholding, the output DAT/DAB file to contain the minimum diameter at which each gene pair coclusters. |
-M | 0 | Double | For QTC cocluster thresholding, the smallest maximum cluster diameter to consider. |
-e | 0 | Double | For QTC cocluster thresholding, the size of steps to take between -M and -m. Coclustering is only performed when the given value is nonzero. |
-s | 2 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |
-n | on | Flag | If on, normalize input edges to the range [0,1] before processing. |
-c | off | Flag | If on, autocorrelate similarity scores (find the maximum similarity score over all possible lags of the two vectors; see Sleipnir::CMeasureAutocorrelate). |
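To make the -c description concrete, here is a small Python sketch of taking the maximum similarity over all lags. It assumes that a lag is a cyclic rotation of the second vector and uses Pearson correlation as the underlying measure; both are assumptions for illustration, and Sleipnir::CMeasureAutocorrelate may define lags differently. The names `pearson` and `autocorrelated_similarity` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def autocorrelated_similarity(x, y, measure=pearson):
    """Maximum of measure(x, y rotated by lag) over every cyclic lag of y."""
    y = list(y)
    # Lag 0 is the unshifted comparison, so the result is never lower than
    # the plain similarity.
    return max(measure(x, y[lag:] + y[:lag]) for lag in range(len(y)))
```

Two profiles with the same shape but offset in time, such as [1, 2, 3, 0, 0] and [0, 0, 1, 2, 3], correlate negatively when compared directly but reach a perfect correlation at the lag that aligns them.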