Sleipnir
Clusterer performs non-hierarchical clustering, either k-means or quality threshold clustering (QTC), on an input microarray dataset (PCL) using one of Sleipnir's many similarity measures.
Clusterer -i <data.pcl> -k <clusters>
Output (to standard output) a list of k-means clusters formed from the microarray PCL data data.pcl with k equal to clusters using the default similarity measure (which can be modified using -d).
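As a rough illustration of what the k-means mode computes, the following Python sketch runs Lloyd's algorithm over a few expression rows. It is not Sleipnir's implementation: the function name `kmeans`, its parameters, and the use of squared Euclidean distance (rather than Clusterer's default Pearson measure) are assumptions made to keep the example short. The `seed` argument only mirrors the idea behind the -r option.

```python
import random


def kmeans(rows, k, seed=0, max_iter=100):
    """Return a cluster index for each row (hypothetical helper, not Sleipnir code)."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random as the initial centroids.
    centroids = [list(r) for r in rng.sample(rows, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
            )
            for row in rows
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the clustering converged
        assignment = new_assignment
        # Move each centroid to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy data: two well-separated groups of two "genes" each.
rows = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels = kmeans(rows, k=2)
```

With this toy input the first two rows end up sharing one label and the last two rows the other; which group receives label 0 depends on the seed.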
Clusterer -i <data.pcl> -a qtc -k <min_size> -m <max_diameter>
Output a list of quality threshold clusters formed from the microarray PCL data data.pcl with a minimum cluster size of min_size and a maximum cluster diameter of max_diameter.
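The roles of the minimum cluster size and maximum diameter can be seen in a small Python sketch of greedy quality threshold clustering. It assumes the input is a symmetric matrix of pairwise distances in which smaller values mean more similar genes; how Clusterer derives a diameter from each similarity measure is not covered here, and the function name `qtc` is hypothetical.

```python
def qtc(dist, min_size, max_diameter):
    """Greedy QTC over a symmetric distance matrix (illustrative sketch only)."""
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best = []
        # Grow one candidate cluster from every remaining gene.
        for seed in sorted(remaining):
            cluster = [seed]
            candidates = sorted(remaining - {seed})
            while candidates:
                # Pick the gene whose addition keeps the diameter smallest.
                nxt = min(candidates, key=lambda c: max(dist[c][m] for m in cluster))
                if max(dist[nxt][m] for m in cluster) > max_diameter:
                    break  # any further addition would exceed the diameter
                cluster.append(nxt)
                candidates.remove(nxt)
            if len(cluster) > len(best):
                best = cluster
        if len(best) < min_size:
            break  # the largest candidate is too small, so stop clustering
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters


# Toy data: genes placed on a line, distance is the absolute difference.
points = [0, 1, 2, 50, 51, 90]
dist = [[abs(a - b) for b in points] for a in points]
print(qtc(dist, min_size=2, max_diameter=5))  # [[0, 1, 2], [3, 4]]
```

The isolated gene at position 90 is left unclustered because no candidate cluster containing it reaches the minimum size of 2.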
Clusterer -i <data.pcl> -o <cocluster.dab> -k <min_size> -M <min_diameter> -m <max_diameter> -e <delta_diameter>
Create a DAT/DAB file cocluster.dab with gene pair scores indicating the minimum diameter at which each gene pair from data.pcl coclustered, using QTC with a minimum cluster size of min_size and testing cluster diameters from min_diameter to max_diameter in steps of delta_diameter.
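The diameter sweep can be pictured with the following Python sketch, which records for each gene pair the smallest tested diameter at which the pair shares a cluster. The names `cocluster_thresholds` and `toy_clusters` are hypothetical, the toy clustering function merely stands in for a real QTC run, and treating max_diameter as an inclusive endpoint is an assumption.

```python
from itertools import combinations


def cocluster_thresholds(cluster_at, min_diameter, max_diameter, delta):
    """Map each gene pair to the smallest tested diameter at which it coclusters."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    scores = {}
    # Index the steps instead of accumulating floats to limit rounding drift.
    steps = int((max_diameter - min_diameter) / delta + 1e-9)
    for i in range(steps + 1):
        diameter = min_diameter + i * delta
        for cluster in cluster_at(diameter):
            for pair in combinations(sorted(cluster), 2):
                # Diameters increase, so the first hit is the minimum.
                scores.setdefault(pair, diameter)
    return scores


def toy_clusters(diameter):
    """Stand-in for QTC: group sorted 1-D points whose span fits the diameter."""
    positions = {"A": 0, "B": 1, "C": 4}
    clusters, current = [], []
    for gene in sorted(positions, key=positions.get):
        if current and positions[gene] - positions[current[0]] > diameter:
            clusters.append(current)
            current = []
        current.append(gene)
    clusters.append(current)
    return clusters


print(cocluster_thresholds(toy_clusters, 0, 4, 1))
# {('A', 'B'): 1, ('A', 'C'): 4, ('B', 'C'): 4}
```

Rejecting a non-positive step is a choice made for this sketch; in Clusterer itself a zero -e value simply means that no coclustering is performed.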
package "Clusterer"
version "1.0"
purpose "QTC and other hard clustering methods"
section "Main"
option "input" i "Input PCL/DAB file"
string typestr="filename"
option "algorithm" a "Clustering algorithm"
values="qtc","kmeans" default="kmeans"
option "weights" w "Input weights file"
string typestr="filename"
section "Clustering"
option "distance" d "Similarity measure"
values="pearson","euclidean","kendalls","kolm-smir","spearman","quickpear"
default="pearson"
option "size" k "Number of clusters/minimum cluster size"
int default="10"
option "diameter" m "Maximum cluster diameter"
double default="0.5"
section "Cocluster Threshhold"
option "output" o "Output DAB file"
string typestr="filename"
option "diamineter" M "Minimum cluster diameter"
double default="0"
option "delta" e "Cluster diameter step size"
double default="0"
section "Optional"
option "output_info" O "Output file for clustering info (membership or summary)"
string typestr="filename"
option "pcl" p "PCL input if precalculated DAB provided"
string typestr="filename"
option "skip" s "Columns to skip in input PCL"
int default="2"
option "normalize" n "Normalize distances before clustering"
flag on
option "autocorrelate" c "Autocorrelate similarity measures"
flag off
option "summary" S "Summarize cluster info"
flag off
option "pcl_out" P "Output PCL and clusters as a single PCL"
flag off
option "random" r "Seed random generator"
int default="0"
option "verbosity" v "Message verbosity"
int default="5"
| Flag | Default | Type | Description |
|---|---|---|---|
| -i | stdin | PCL text file | Input PCL file of microarray data to be clustered. |
| -a | kmeans | kmeans or qtc | Clustering algorithm to be used. |
| -w | None | PCL text file | If given, a PCL file with dimensions equal to the data given with -i. However, the values in the cells of the weights PCL represent the relative weight given to each gene/experiment pair. If no weights file is given, all weights default to 1. |
| -d | pearson | pearson, euclidean, kendalls, kolm-smir, spearman, or quickpear | Similarity measure to be used for clustering. "quickpear" is a simplified Pearson correlation that cannot deal with missing values or weights not equal to 1. |
| -k | 10 | Integer | For k-means clustering, the desired number of clusters k. For QTC, the minimum cluster size (i.e. minimum number of genes in a cluster). |
| -m | 0.5 | Double | For QTC, the maximum cluster diameter. Note that this is similarity measure dependent. |
| -o | stdout | DAT/DAB file | For QTC cocluster thresholding, the output DAT/DAB file to contain the minimum diameter at which each gene pair coclusters. |
| -M | 0 | Double | For QTC cocluster thresholding, the smallest maximum cluster diameter to consider. |
| -e | 0 | Double | For QTC cocluster thresholding, the size of steps to take between -M and -m. Coclustering is only performed when the given value is nonzero. |
| -s | 2 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |
| -n | on | Flag | If on, normalize input edges to the range [0,1] before processing. |
| -c | off | Flag | If on, autocorrelate similarity scores (find the maximum similarity score over all possible lags of the two vectors; see Sleipnir::CMeasureAutocorrelate). |
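
The autocorrelation option can be illustrated with a short Python sketch that takes the maximum of a similarity measure over every lag of one vector. Treating a lag as a circular rotation is an assumption, as are the helper names `pearson` and `autocorrelate`; Sleipnir::CMeasureAutocorrelate may handle lags, missing values, and weights differently.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length vectors (no missing values)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation; use 0 here
    return cov / sqrt(var_x * var_y)


def autocorrelate(x, y, measure=pearson):
    """Maximum similarity over all circular rotations of y (assumed lag model)."""
    return max(measure(x, y[lag:] + y[:lag]) for lag in range(len(y)))


# y is x rotated by one position, so some lag aligns the two vectors exactly.
x = [1, 2, 3, 4, 0]
y = [0, 1, 2, 3, 4]
unshifted = pearson(x, y)
best_lag_score = autocorrelate(x, y)
```

In this example the unshifted correlation is zero, while the best lag aligns the two profiles and yields a correlation of one, which is the kind of shifted similarity the option is meant to capture.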