Sleipnir: Distancer

Distancer converts a microarray PCL file into a collection of pairwise similarity scores (a DAT/DAB file) using one of many similarity/distance measures (Pearson correlation, Euclidean distance, etc.).

Usage

Basic Usage

 Distancer -i <data.pcl> -o <data.dab>

Writes a DAT/DAB file, data.dab, containing pairwise similarity scores calculated from the microarray data in data.pcl. The default settings produce z-scored, z-transformed Pearson correlations (the default pearnorm measure with the -z flag on); this behavior can be configured extensively using the detailed options.
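
As a rough illustration of what "z-scored, z-transformed Pearson correlations" means, the sketch below computes a Pearson correlation for every gene pair, applies Fisher's z-transform, and then standardizes the transformed values across all pairs. It is not Distancer's implementation: the function names, the clamping constant eps, and the use of the population standard deviation are assumptions made for this example.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, eps=1e-6):
    """Fisher's z-transform; r is clamped so that r = +/-1 stays finite.

    The clamp value eps is an assumption of this sketch.
    """
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def default_like_scores(rows):
    """rows maps a gene ID to its list of expression values.

    Returns a dict mapping (gene_a, gene_b) to a z-scored, z-transformed
    Pearson correlation. Population standard deviation is an assumption.
    """
    raw = {
        (a, b): fisher_z(pearson(rows[a], rows[b]))
        for a, b in combinations(sorted(rows), 2)
    }
    mu = mean(raw.values())
    sd = pstdev(raw.values())
    return {pair: (z - mu) / sd for pair, z in raw.items()}


if __name__ == "__main__":
    # Toy data with hypothetical gene IDs.
    toy = {
        "G1": [1.0, 2.0, 3.0, 4.0],
        "G2": [2.0, 4.0, 6.0, 9.0],
        "G3": [4.0, 3.0, 2.0, 1.5],
    }
    for pair, score in default_like_scores(toy).items():
        print(pair, round(score, 3))
```

After the final step the scores have mean 0 and unit standard deviation across all gene pairs, which makes outputs from different datasets easier to compare.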

Detailed Usage

package "Distancer"
version "1.0"
purpose "PCL gene-gene distance calculation tool."

section "Main"
option  "input"         i   "Input PCL file"
                            string  typestr="filename"
option  "output"        o   "Output DAT/DAB file"
                            string  typestr="filename"
option  "distance"      d   "Similarity measure"
                            values="pearson","euclidean","kendalls","kolm-smir","spearman",
                            "pearnorm","hypergeom","innerprod","bininnerprod","quickpear",
                            "mutinfo","relauc","pearsig","dice","dcor","sdcor"  default="pearnorm"

section "Miscellaneous"
option  "weights"       w   "Input weights file"
                            string  typestr="filename"
option  "autocorrelate" a   "Autocorrelate distances"
                            flag    off
option  "freqweight"    q   "Weight conditions by frequency"
                            flag    off

section "Preprocessing"
option  "normalize"     n   "Normalize distances"
                            flag    off
option  "zscore"        z   "Convert correlations to z-scores"
                            flag    on
option  "flip"          f   "Calculate one minus values"
                            flag    off
option  "centering"     c   "Scale distance value to 0-1"
                            flag    on

section "Filtering"
option  "genes"         g   "Gene inclusion file"
                            string  typestr="filename"
option  "cutoff"        e   "Remove scores below cutoff"
                            double

section "Optional"
option  "alpha"         A   "Alpha parameter for similarity measure"
                            float   default="0"
option  "skip"          s   "Columns to skip in input PCL"
                            int default="2"
option  "limit"         l   "Gene count limit for caching"
                            int default="-1"
option  "verbosity"     v   "Message verbosity"
                            int default="5"
option  "threads"       t   "Number of threads to use"
                            int default="1"

Flag	Default	Type	Description
-i	stdin	PCL text file	Input PCL file of microarray data to be scored.
-o	stdout	DAT/DAB file	Output DAT/DAB file to contain pairwise scores.
-d	pearnorm	Measure name	Similarity measure to be used for scoring. See Sleipnir::IMeasure.
-c	on	Flag	If on, scale the calculated distances to values between 0 and 1: d = ( 1 + d ) / 2. Scaling is applied only for the following distance measures: KendallsTau, Pearson, Spearman, PearsonSignificance, and QuickPearson; for all other measures, this flag has no effect. Review this flag when using one of the affected measures: to output raw Pearson/Spearman values, turn it off.
-w	None	PCL text file	If given, a PCL file with the same dimensions as the data given with `-i`, in which each cell holds the relative weight given to that gene/experiment pair. If no weights file is given, all weights default to 1.
-a	off	Flag	If on, autocorrelate similarity scores (find the maximum similarity score over all possible lags of the two vectors; see Sleipnir::CMeasureAutocorrelate).
-n	off	Flag	If on, normalize input edges to the range [0,1] before processing.
-z	on	Flag	If on, normalize input edges to z-scores (subtract mean, divide by standard deviation) before processing.
-f	off	Flag	If on, output one minus the input's values.
-g	None	Text gene list	If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes.
-e	None	Double	If given, remove all input edges below the given cutoff (after optional normalization).
-s	2	Integer	Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL.
-l	-1	Integer	Maximum number of genes in input file before in-memory score caching is disabled. If -1, caching is never performed. Caching greatly speeds up processing, but can consume large amounts of memory for inputs with many genes (rows).
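
The arithmetic behind the -c, -n, -z, -f, and -e flags can be sketched in a few lines of Python. This is an illustration of the descriptions in the table, not Distancer's code: the function names are hypothetical, "input edges" is read here as the calculated pairwise scores, min-max scaling for -n and the population standard deviation for -z are assumptions, and only the -c formula d = ( 1 + d ) / 2 is taken directly from the table.

```python
from statistics import mean, pstdev


def center(d):
    """-c: map a correlation-like value from [-1, 1] to [0, 1]."""
    return (1 + d) / 2


def normalize_unit(scores):
    """-n: rescale scores to [0, 1] (min-max scaling is an assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Degenerate case: all scores equal, nothing to spread out.
        return {pair: 0.0 for pair in scores}
    return {pair: (v - lo) / (hi - lo) for pair, v in scores.items()}


def normalize_zscore(scores):
    """-z: subtract the mean and divide by the standard deviation."""
    mu, sd = mean(scores.values()), pstdev(scores.values())
    return {pair: (v - mu) / sd for pair, v in scores.items()}


def flip(scores):
    """-f: replace every value v with 1 - v."""
    return {pair: 1 - v for pair, v in scores.items()}


def apply_cutoff(scores, cutoff):
    """-e: drop every pair whose score is below the cutoff."""
    return {pair: v for pair, v in scores.items() if v >= cutoff}


if __name__ == "__main__":
    # Hypothetical raw Pearson correlations for three gene pairs.
    raw = {("G1", "G2"): 0.9, ("G1", "G3"): -0.8, ("G2", "G3"): 0.1}
    centered = {pair: center(v) for pair, v in raw.items()}
    print(apply_cutoff(centered, 0.5))
```

Because -c maps a raw Pearson value of 0 to 0.5, a cutoff chosen for raw correlations does not carry over unchanged to centered output, which is one reason to check the -c setting before choosing a value for -e.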