Sleipnir
Clusters2Dab converts the output of many common non-hierarchical clustering algorithms into pairwise scores based on the frequency or confidence of coclustering. For example, if two genes are scored by the number of times they cocluster across runs with many different random seeds, high scores indicate a stronger pairwise relationship than low scores.
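The random-seed example above can be illustrated with a minimal Python sketch. It is not the tool's own code: the function name cocluster_counts and the in-memory representation (one dict of gene to cluster index per run) are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def cocluster_counts(runs):
    """Count how often each gene pair shares a cluster across runs.

    `runs` is a list of hard clusterings, each a dict mapping a gene ID
    to a cluster index (a hypothetical in-memory format).
    """
    counts = Counter()
    for assignment in runs:
        # Group genes by cluster index for this run.
        clusters = {}
        for gene, idx in assignment.items():
            clusters.setdefault(idx, []).append(gene)
        # Every pair inside the same cluster coclusters once in this run.
        for members in clusters.values():
            for pair in combinations(sorted(members), 2):
                counts[pair] += 1
    return counts


# Three runs of the same clustering algorithm with different seeds.
runs = [
    {"G1": 0, "G2": 0, "G3": 1},
    {"G1": 1, "G2": 1, "G3": 1},
    {"G1": 0, "G2": 1, "G3": 1},
]
counts = cocluster_counts(runs)
```

Pairs are keyed by sorted gene ID, so each unordered pair appears once; pairs that never cocluster are simply absent and read as zero from the Counter.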
Clusters2Dab -i <clusters.txt> -o <coclusters.dab>
Create a new DAT/DAB file coclusters.dab in which each gene pair is given a score of one if they cluster together in the hard clustering clusters.txt and a score of zero if they do not.
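As a rough sketch of this scoring rule, the Python below reads text in the default list format (one gene ID and one cluster index per line, separated by a tab) and assigns each pair a score of one or zero. The function names parse_list_clusters and hard_pair_scores are hypothetical, and the result is a plain dict rather than a DAT/DAB file.

```python
from itertools import combinations


def parse_list_clusters(text):
    """Parse 'gene<TAB>cluster' lines into a dict of gene -> cluster index."""
    assignment = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        assignment[fields[0].strip()] = fields[1].strip()
    return assignment


def hard_pair_scores(assignment):
    """Score each gene pair 1.0 if both genes share a cluster, else 0.0."""
    genes = sorted(assignment)
    return {
        (a, b): 1.0 if assignment[a] == assignment[b] else 0.0
        for a, b in combinations(genes, 2)
    }


text = "G1\t0\nG2\t0\nG3\t1\n"
scores = hard_pair_scores(parse_list_clusters(text))
```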
Clusters2Dab -t samba -i <clusters.txt> -o <coclusters.dab>
Create a new DAT/DAB file coclusters.dab in which each gene pair is given a score equal to the confidence of their strongest shared SAMBA bicluster in clusters.txt if one exists or zero if it does not.
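The "strongest shared bicluster" rule can be sketched as follows, assuming the biclusters have already been parsed into (confidence, gene set) tuples. That representation and the function name strongest_shared_bicluster are assumptions for illustration; the EXPANDER file format itself is not handled here.

```python
from itertools import combinations


def strongest_shared_bicluster(biclusters):
    """Map each gene pair to the highest confidence among shared biclusters.

    `biclusters` is an iterable of (confidence, genes) tuples, a
    hypothetical in-memory form of SAMBA output.
    """
    scores = {}
    for confidence, genes in biclusters:
        for pair in combinations(sorted(genes), 2):
            # Keep only the largest confidence seen for this pair.
            if confidence > scores.get(pair, 0.0):
                scores[pair] = confidence
    return scores


biclusters = [
    (0.9, {"G1", "G2"}),
    (0.5, {"G1", "G2", "G3"}),
]
scores = strongest_shared_bicluster(biclusters)
# Pairs with no shared bicluster are absent; read them as zero.
missing = scores.get(("G3", "G4"), 0.0)
```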
Clusters2Dab -t param -i <clusters.txt> -o <coclusters.dab>
Create a new DAT/DAB file coclusters.dab in which each gene pair is given a score equal to the maximum parameter value in clusters.txt at which they cocluster.
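A minimal sketch of the maximum-parameter rule, assuming the input has already been read into a dict that maps each parameter value to a hard clustering (gene to cluster index). Both that layout and the name max_coclustering_parameter are hypothetical; the actual EXPANDER output format is not parsed here.

```python
from itertools import combinations


def max_coclustering_parameter(clusterings):
    """Map each gene pair to the largest parameter at which it coclusters.

    `clusterings` maps a numeric parameter value to a dict of
    gene -> cluster index (a hypothetical in-memory format).
    """
    best = {}
    for param, assignment in clusterings.items():
        # Group genes by cluster index at this parameter value.
        by_cluster = {}
        for gene, idx in assignment.items():
            by_cluster.setdefault(idx, []).append(gene)
        for members in by_cluster.values():
            for pair in combinations(sorted(members), 2):
                if pair not in best or param > best[pair]:
                    best[pair] = param
    return best


clusterings = {
    0.5: {"G1": 0, "G2": 0, "G3": 0},
    0.8: {"G1": 0, "G2": 0, "G3": 1},
}
best = max_coclustering_parameter(clusterings)
```

Pairs that never cocluster at any parameter value are absent from the result.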
package "Clusters2Dab"
version "1.0"
purpose "Generate pairwise scores from preclustered output."

section "Main"
option "input" i "Input cluster file"
    string typestr="filename"
option "output" o "Output DAT/DAB file"
    string typestr="filename"
option "type" t "Type of input cluster"
    values="samba","list","param","fuzzy" default="list"

section "Optional"
option "counts" c "Calculate pair weight by cocluster frequency"
    flag off
option "size" z "Calculate pair weight by cluster size"
    flag off
option "skip" s "Columns to skip in input PCL"
    int default="2"
option "verbosity" v "Message verbosity"
    int default="5"

Flag | Default | Type | Description |
---|---|---|---|
-i | stdin | Text file | Input cluster file in one of the text formats supported by -t. |
-o | stdout | DAT/DAB file | Output DAT/DAB file containing pairwise scores appropriate to the input cluster type. |
-t | list | list, samba, param, or fuzzy | Type of cluster file provided to -i. list assumes that each line contains a gene ID and a cluster index separated by a tab; samba reads biclustering output from the EXPANDER program by Sharan, Shamir, et al.; param reads hard clustering output from EXPANDER; and fuzzy reads output from the Aerie fuzzy k-means program by Gasch, Eisen, et al. |
-c | off | Flag | If on, calculate pairwise scores solely by cocluster frequency (counts); otherwise, pairwise scores are weighted by cluster confidence. |
-s | 2 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |