Sleipnir: Clusterer

Clusterer performs non-hierarchical clustering, either k-means or quality threshold clustering (QTC), on an input microarray dataset (PCL) using one of Sleipnir's many similarity measures.

Usage

Basic Usage

 Clusterer -i <data.pcl> -k <clusters>

Output (to standard output) a list of k-means clusters formed from the microarray PCL data data.pcl, with k equal to clusters, using the default similarity measure (Pearson correlation, which can be changed using -d).

 Clusterer -i <data.pcl> -a qtc -k <min_size> -m <max_diameter>

Output a list of quality threshold clusters formed from the microarray PCL data data.pcl with a minimum cluster size of min_size and a maximum cluster diameter of max_diameter.
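
To make the roles of min_size and max_diameter concrete, here is a minimal Python sketch of the general greedy QTC procedure. It is not Sleipnir's implementation: the function name qt_cluster, the plain list-of-lists distance matrix, and the tie-breaking order are all assumptions made for illustration.

```python
def qt_cluster(dist, min_size, max_diameter):
    """Greedy quality threshold clustering over a symmetric distance matrix.

    dist is a list of lists of pairwise distances (illustrative input format).
    Returns a list of clusters, each a sorted list of item indices.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best = []
        # Grow one candidate cluster from every remaining item.
        for seed in sorted(remaining):
            candidate = [seed]
            pool = remaining - {seed}
            while pool:
                # The item whose farthest current member is closest adds
                # the least to the cluster diameter.
                nearest = min(sorted(pool),
                              key=lambda g: max(dist[g][m] for m in candidate))
                if max(dist[nearest][m] for m in candidate) > max_diameter:
                    break
                candidate.append(nearest)
                pool.remove(nearest)
            if len(candidate) > len(best):
                best = candidate
        # Stop once the largest candidate is below the minimum size.
        if len(best) < min_size:
            break
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters


# Six points on a line; distance is the absolute difference.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
dist = [[abs(a - b) for b in points] for a in points]
print(qt_cluster(dist, min_size=2, max_diameter=0.5))  # [[0, 1, 2], [3, 4]]
```

The isolated point at 9.0 is left unclustered because no candidate containing it reaches the minimum size, which mirrors how -k acts as a size floor under QTC rather than a cluster count.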

 Clusterer -i <data.pcl> -o <cocluster.dab> -k <min_size> -M <min_diameter> -m <max_diameter>
        -e <delta_diameter>

Create a DAT/DAB file cocluster.dab with gene pair scores indicating the minimum diameter at which each gene pair from data.pcl coclustered, using QTC with a minimum cluster size of min_size and testing cluster diameters from min_diameter to max_diameter in steps of delta_diameter.
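
The diameter sweep described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Sleipnir's code: cocluster_thresholds and toy_clusters are hypothetical names, the clustering step is passed in as a function, and toy_clusters is a deliberately simple stand-in for a real QTC run.

```python
from itertools import combinations


def cocluster_thresholds(cluster_fn, min_diameter, max_diameter, delta):
    """Map each gene pair to the smallest tested diameter at which it coclusters.

    cluster_fn(diameter) must return a list of clusters (lists of gene indices).
    Pairs that never share a cluster are absent from the result.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    scores = {}
    steps = int((max_diameter - min_diameter) / delta + 1e-9)
    for step in range(steps + 1):
        diameter = min_diameter + step * delta
        for cluster in cluster_fn(diameter):
            for a, b in combinations(sorted(cluster), 2):
                # Diameters are tested in ascending order, so the first
                # value recorded for a pair is its minimum.
                scores.setdefault((a, b), diameter)
    return scores


def toy_clusters(diameter, positions=(0.0, 0.1, 0.4, 2.0)):
    """Hypothetical stand-in for QTC: group sorted 1-D points by span."""
    clusters, current = [], [0]
    for i in range(1, len(positions)):
        if positions[i] - positions[current[0]] <= diameter:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)
    return clusters


print(cocluster_thresholds(toy_clusters, 0.0, 0.5, 0.25))
# {(0, 1): 0.25, (0, 2): 0.5, (1, 2): 0.5}
```

Tighter pairs receive smaller scores, so the resulting values behave like a distance: a low score means the pair coclusters even under a strict diameter limit.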

Detailed Usage

package "Clusterer"
version "1.0"
purpose "QTC and other hard clustering methods"

section "Main"
option  "input"         i   "Input PCL/DAB file"
                            string  typestr="filename"
option  "algorithm"     a   "Clustering algorithm"
                            values="qtc","kmeans"   default="kmeans"
option  "weights"       w   "Input weights file"
                            string  typestr="filename"

section "Clustering"
option  "distance"      d   "Similarity measure"
                            values="pearson","euclidean","kendalls","kolm-smir","spearman","quickpear"
                            default="pearson"
option  "size"          k   "Number of clusters/minimum cluster size"
                            int default="10"
option  "diameter"      m   "Maximum cluster diameter"
                            double  default="0.5"

section "Cocluster Threshhold"
option  "output"        o   "Output DAB file"
                            string  typestr="filename"
option  "diamineter"    M   "Minimum cluster diameter"
                            double  default="0"
option  "delta"         e   "Cluster diameter step size"
                            double  default="0"

section "Optional"
option  "output_info"       O   "Output file for clustering info (membership or summary)"
                            string  typestr="filename"

option  "pcl"           p   "PCL input if precalculated DAB provided"
                            string  typestr="filename"
option  "skip"          s   "Columns to skip in input PCL"
                            int default="2"
option  "normalize"     n   "Normalize distances before clustering"
                            flag    on
option  "autocorrelate" c   "Autocorrelate similarity measures"
                            flag    off
option  "summary"   S   "Summarize cluster info"
                            flag    off         
option  "pcl_out"   P   "Output PCL and clusters as a single PCL"
                            flag    off                                             
option  "random"        r   "Seed random generator"
                            int default="0"
option  "verbosity"     v   "Message verbosity"
                            int default="5"

Flag	Default	Type	Description
-i	stdin	PCL text file	Input PCL file of microarray data to be clustered.
-a	kmeans	kmeans or qtc	Clustering algorithm to be used.
-w	None	PCL text file	If given, a PCL file with the same dimensions as the data given with `-i`, in which each cell holds the relative weight given to the corresponding gene/experiment pair. If no weights file is given, all weights default to 1.
-d	pearson	pearson, euclidean, kendalls, kolm-smir, spearman, or quickpear	Similarity measure to be used for clustering. "quickpear" is a simplified Pearson correlation that cannot deal with missing values or weights not equal to 1.
-k	10	Integer	For k-means clustering, the desired number of clusters k. For QTC, the minimum cluster size (i.e. minimum number of genes in a cluster).
-m	0.5	Double	For QTC, the maximum cluster diameter. Note that this is similarity measure dependent.
-o	stdout	DAT/DAB file	For QTC cocluster thresholding, the output DAT/DAB file to contain the minimum diameter at which each gene pair coclusters.
-M	0	Double	For QTC cocluster thresholding, the smallest maximum cluster diameter to consider.
-e	0	Double	For QTC cocluster thresholding, the size of steps to take between `-M` and `-m`. Coclustering is only performed when the given value is nonzero.
-s	2	Integer	Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL.
-n	on	Flag	If on, normalize input edges to the range [0,1] before processing.
-c	off	Flag	If on, autocorrelate similarity scores (find the maximum similarity score over all possible lags of the two vectors; see Sleipnir::CMeasureAutocorrelate).
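
As an illustration of how the `-s` column skip applies to a PCL file, the following Python sketch reads tab-delimited PCL text and separates the ID column, the skipped annotation columns, and the data columns. It is a hedged example rather than Sleipnir's parser: read_pcl is a hypothetical helper, and it assumes a single header row, an optional EWEIGHT row that is ignored, and empty cells standing for missing values.

```python
import csv
import io


def read_pcl(handle, skip=2):
    """Parse PCL text into (experiment names, {gene ID: list of values}).

    skip is the number of columns between the ID column and the first
    data column. Empty cells become None (missing values).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    first_data = 1 + skip
    experiments = header[first_data:]
    genes = {}
    for row in reader:
        # Ignore blank lines and the optional experiment-weight row.
        if not row or row[0] == "EWEIGHT":
            continue
        genes[row[0]] = [float(cell) if cell.strip() else None
                         for cell in row[first_data:]]
    return experiments, genes


example = (
    "ID\tNAME\tGWEIGHT\texp1\texp2\n"
    "EWEIGHT\t\t\t1\t1\n"
    "G1\tgene one\t1\t0.5\t-1.2\n"
    "G2\tgene two\t1\t\t2.0\n"
)
experiments, genes = read_pcl(io.StringIO(example), skip=2)
print(experiments)  # ['exp1', 'exp2']
print(genes)        # {'G1': [0.5, -1.2], 'G2': [None, 2.0]}
```

With the default skip of 2, the NAME and GWEIGHT columns are passed over; a file that has only an ID column before its data would need a skip of 0.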