Sleipnir: Clusters2Dab

Clusters2Dab converts the output of many common non-hierarchical clustering algorithms into pairwise scores based on the frequency or confidence of coclustering. For example, if two genes are scored by the number of times they cocluster over many random seeds, a high score indicates a stronger pairwise relationship than a low score.
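The frequency idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Clusters2Dab's implementation: the function name `cocluster_frequency`, the in-memory representation (one dict per clustering run) and the gene IDs are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cocluster_frequency(runs):
    """Fraction of runs in which each gene pair shares a cluster.

    `runs` is a hypothetical representation: a list with one dict per
    clustering run, each mapping gene ID -> cluster index.
    """
    counts = Counter()
    for assignment in runs:
        # Group genes by the cluster they were assigned to in this run.
        by_cluster = {}
        for gene, cluster in assignment.items():
            by_cluster.setdefault(cluster, []).append(gene)
        # Every pair inside one cluster coclusters once in this run.
        for members in by_cluster.values():
            for a, b in combinations(sorted(members), 2):
                counts[(a, b)] += 1
    # Pairs that never cocluster are absent and can be read as zero.
    return {pair: n / len(runs) for pair, n in counts.items()}


# Three runs (e.g. three random seeds) over illustrative gene IDs.
runs = [
    {"G1": 0, "G2": 0, "G3": 1},
    {"G1": 1, "G2": 1, "G3": 1},
    {"G1": 0, "G2": 0, "G3": 2},
]
freq = cocluster_frequency(runs)
```

Here G1 and G2 share a cluster in every run, while G3 joins them in only one of the three, so the G1/G2 pair receives the highest score.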

Usage

Basic Usage

 Clusters2Dab -i <clusters.txt> -o <coclusters.dab>

Create a new DAT/DAB file coclusters.dab in which each gene pair is given a score of one if the two genes cluster together in the hard clustering clusters.txt and a score of zero if they do not.
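As a rough illustration of this scoring, the following Python sketch reads the default `list` format (one gene ID and one cluster index per line, separated by a tab) and assigns one or zero to each pair. The function names and the sample input are hypothetical, and the sketch returns a plain dict instead of writing a DAT/DAB file.

```python
from itertools import combinations


def parse_list_clusters(text):
    """Parse 'gene<TAB>cluster' lines into a dict of gene ID -> cluster index."""
    assignment = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        gene, cluster = line.split("\t")
        assignment[gene] = cluster
    return assignment


def hard_cluster_scores(assignment):
    """Score each gene pair 1.0 if both genes share a cluster, else 0.0."""
    return {
        (a, b): 1.0 if assignment[a] == assignment[b] else 0.0
        for a, b in combinations(sorted(assignment), 2)
    }


# Illustrative input: G1 and G2 share cluster 0, G3 is alone in cluster 1.
example = "G1\t0\nG2\t0\nG3\t1\n"
scores = hard_cluster_scores(parse_list_clusters(example))
```

The sketch assumes each gene appears on exactly one line; how the tool itself treats repeated gene IDs is not covered here.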

 Clusters2Dab -t samba -i <clusters.txt> -o <coclusters.dab>

Create a new DAT/DAB file coclusters.dab in which each gene pair is given a score equal to the confidence of the strongest SAMBA bicluster in clusters.txt that contains both genes, or zero if no bicluster contains both.
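The "strongest shared bicluster" rule amounts to taking a maximum over every bicluster that contains both genes. The Python sketch below shows that rule on a hypothetical in-memory representation, a list of (confidence, gene set) tuples; it does not parse the actual EXPANDER/SAMBA file format, and the function name is an assumption made for illustration.

```python
from itertools import combinations


def strongest_shared_bicluster(biclusters):
    """Map each gene pair to the highest confidence among biclusters containing both.

    `biclusters` is a hypothetical list of (confidence, set_of_gene_ids) tuples.
    Pairs that share no bicluster are absent and can be read as zero.
    """
    scores = {}
    for confidence, genes in biclusters:
        for pair in combinations(sorted(genes), 2):
            scores[pair] = max(scores.get(pair, 0.0), confidence)
    return scores


# G1 and G2 appear together in both biclusters; G3 only in the weaker one.
biclusters = [
    (0.9, {"G1", "G2"}),
    (0.4, {"G1", "G2", "G3"}),
]
scores = strongest_shared_bicluster(biclusters)
```

Because biclusters may overlap, a pair can occur in several of them, which is why a maximum is needed instead of a single lookup.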

 Clusters2Dab -t param -i <clusters.txt> -o <coclusters.dab>

Create a new DAT/DAB file coclusters.dab in which each gene pair is given a score equal to the maximum parameter value in clusters.txt at which they cocluster.
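The `param` scoring can be pictured as follows: given one hard clustering per parameter value, each pair keeps the largest parameter value at which its two genes still share a cluster. The Python sketch below assumes a hypothetical in-memory layout (a dict from parameter value to a gene-to-cluster dict) and does not reproduce the EXPANDER file format or the tool's own code.

```python
from itertools import combinations


def max_cocluster_parameter(clusterings):
    """Map each gene pair to the largest parameter value at which it coclusters.

    `clusterings` is a hypothetical dict: parameter value -> {gene ID: cluster index}.
    Pairs that never cocluster are absent from the result.
    """
    scores = {}
    for param, assignment in clusterings.items():
        for a, b in combinations(sorted(assignment), 2):
            if assignment[a] == assignment[b]:
                # Keep the larger of the stored value and the current parameter.
                scores[(a, b)] = max(scores.get((a, b), param), param)
    return scores


# At parameter 0.5 all three genes share a cluster; at 0.8 G3 splits off.
clusterings = {
    0.5: {"G1": 0, "G2": 0, "G3": 0},
    0.8: {"G1": 0, "G2": 0, "G3": 1},
}
scores = max_cocluster_parameter(clusterings)
```

In this example G1 and G2 remain together at the stricter setting and therefore score 0.8, while pairs involving G3 score 0.5.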

Detailed Usage

package "Clusters2Dab"
version "1.0"
purpose "Generate pairwise scores from preclustered output."

section "Main"
option  "input"     i   "Input cluster file"
                        string  typestr="filename"
option  "output"    o   "Output DAT/DAB file"
                        string  typestr="filename"
option  "type"      t   "Type of input cluster"
                        values="samba","list","param","fuzzy"   default="list"

section "Optional"
option  "counts"    c   "Calculate pair weight by cocluster frequency"
                        flag    off
option  "size"      z   "Calculate pair weight by cluster size"
                        flag    off
option  "skip"      s   "Columns to skip in input PCL"
                        int default="2"
option  "verbosity" v   "Message verbosity"
                        int default="5"

Flag	Default	Type	Description
-i	stdin	Text file	Input cluster file in one of the text formats supported by `-t`.
-o	stdout	DAT/DAB file	Output DAT/DAB file containing pairwise scores appropriate to the input cluster type.
-t	list	list, samba, param, or fuzzy	Type of cluster file provided to `-i`. `list` assumes that each line contains a gene ID and a cluster index separated by a tab; `samba` reads biclustering output from the EXPANDER program by Sharan, Shamir, et al.; `param` reads hard clustering output from EXPANDER; and `fuzzy` reads output from the Aerie fuzzy k-means program by Gasch, Eisen, et al.
-c	off	Flag	If on, calculate pairwise scores solely by cocluster frequency (counts); otherwise, pairwise scores are weighted by cluster confidence.
-s	2	Integer	Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL.
-z	off	Flag	If on, calculate pair weight by cluster size.
-v	5	Integer	Message verbosity.