Sleipnir
Distancer converts a microarray PCL file into a collection of pairwise similarity scores (DAT/DAB file) using one of many similarity/distance measures (Pearson correlation, Euclidean distance, etc.).
Distancer -i <data.pcl> -o <data.dab>
Output a DAT/DAB file data.dab containing pairwise similarity scores calculated from the microarray data in data.pcl. The default settings will produce z-scored, z-transformed Pearson correlations; this behavior can be extensively configured using the detailed options.
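To make the default behavior concrete, here is a minimal Python sketch of what "z-scored, z-transformed Pearson correlations" could look like. It is an illustration, not Distancer's implementation: it assumes that "z-transformed" means Fisher's z-transform, that z-scoring is taken over all gene pairs, and that the data contain no missing values. The function names (`pearson`, `fisher_z`, `default_scores`) are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(vx * vy)
    # A constant vector has no defined correlation; return 0 in this sketch.
    return cov / denom if denom > 0 else 0.0


def fisher_z(r, eps=1e-6):
    """Fisher z-transform; r is clamped so that +/-1 stays finite (assumption)."""
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def default_scores(rows):
    """rows maps gene ID -> expression values; returns {(gene_a, gene_b): score}."""
    raw = {
        (a, b): fisher_z(pearson(rows[a], rows[b]))
        for a, b in combinations(sorted(rows), 2)
    }
    vals = list(raw.values())
    mean = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    # z-score every pairwise value against the distribution of all pairs.
    return {pair: (v - mean) / sd if sd > 0 else 0.0 for pair, v in raw.items()}


rows = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 9], "C": [4, 3, 2, 1]}
scores = default_scores(rows)
```

In this toy input, genes A and B rise together while C falls, so the (A, B) pair ends up with the highest score.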
package "Distancer"
version "1.0"
purpose "PCL gene-gene distance calculation tool."
section "Main"
option "input" i "Input PCL file"
string typestr="filename"
option "output" o "Output DAT/DAB file"
string typestr="filename"
option "distance" d "Similarity measure"
values="pearson","euclidean","kendalls","kolm-smir","spearman",
"pearnorm","hypergeom","innerprod","bininnerprod","quickpear",
"mutinfo","relauc","pearsig","dice","dcor","sdcor" default="pearnorm"
section "Miscellaneous"
option "weights" w "Input weights file"
string typestr="filename"
option "autocorrelate" a "Autocorrelate distances"
flag off
option "freqweight" q "Weight conditions by frequency"
flag off
section "Preprocessing"
option "normalize" n "Normalize distances"
flag off
option "zscore" z "Convert correlations to z-scores"
flag on
option "flip" f "Calculate one minus values"
flag off
option "centering" c "Scale distance value to 0-1"
flag on
section "Filtering"
option "genes" g "Gene inclusion file"
string typestr="filename"
option "cutoff" e "Remove scores below cutoff"
double
section "Optional"
option "alpha" A "Alpha parameter for similarity measure"
float default="0"
option "skip" s "Columns to skip in input PCL"
int default="2"
option "limit" l "Gene count limit for caching"
int default="-1"
option "verbosity" v "Message verbosity"
int default="5"
option "threads" t "Number of threads to use"
int default="1"
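The option listing above can be turned into a command line as in the following Python sketch. It assumes the usual gengetopt convention that passing a `flag` option toggles it away from its default, so `-z` and `-c` (both on by default) would be switched off by naming them. The helper `distancer_command` is hypothetical and only builds the argument list; it does not run the tool.

```python
def distancer_command(pcl, dab, measure="pearnorm", toggle=()):
    """Build an argument list for Distancer from the options documented above.

    toggle lists the short names of flag options to flip from their defaults
    (assumed gengetopt behavior: naming a flag inverts its default state).
    """
    cmd = ["Distancer", "-i", pcl, "-o", dab, "-d", measure]
    cmd += ["-" + flag for flag in toggle]
    return cmd


# Raw Pearson correlations: switch off z-scoring (-z) and 0-1 centering (-c).
raw_pearson = distancer_command("data.pcl", "data.dab", "pearson", toggle=("z", "c"))
print(" ".join(raw_pearson))
# prints: Distancer -i data.pcl -o data.dab -d pearson -z -c
```

The resulting list can be handed to `subprocess.run` once the Distancer binary is on the search path.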
Flag | Default | Type | Description |
---|---|---|---|
-i | stdin | PCL text file | Input PCL file of microarray data to be scored. |
-o | stdout | DAT/DAB file | Output DAT/DAB file to contain pairwise scores. |
-d | pearnorm | String | Similarity measure to be used for scoring; one of the values listed for the "distance" option above. See Sleipnir::IMeasure. |
-c | on | Flag | If on, scale the calculated distances to values between 0 and 1: d = ( 1 + d ) / 2. Scaling is applied only for the following distance measures: KendallsTau, Pearson, Spearman, PearsonSignificance, QuickPearson. For all other measures, this flag has no effect. Users of the affected measures should review this flag; to output raw Pearson/Spearman values, turn it off. |
-w | None | PCL text file | If given, a PCL file with the same dimensions as the data given with -i, in which each cell holds the relative weight given to that gene/experiment pair. If no weights file is given, all weights default to 1. |
-a | off | Flag | If on, autocorrelate similarity scores (find the maximum similarity score over all possible lags of the two vectors; see Sleipnir::CMeasureAutocorrelate). |
-n | off | Flag | If on, normalize input edges to the range [0,1] before processing. |
-z | on | Flag | If on, normalize input edges to z-scores (subtract mean, divide by standard deviation) before processing. |
-f | off | Flag | If on, output one minus the input's values. |
-g | None | Text gene list | If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes. |
-e | None | Double | If given, remove all input edges below the given cutoff (after optional normalization). |
-s | 2 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |
-l | -1 | Integer | Maximum number of genes in input file before in-memory score caching is disabled. If -1, caching is never performed. Caching greatly speeds up processing, but can consume large amounts of memory for inputs with many genes (rows). |
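The score transformations described for -c, -n, -z, -f, and -e can be sketched in Python as follows. This is a reading of the table descriptions, not Distancer's source: the function names are hypothetical, the z-score uses the population standard deviation (an assumption), and removed scores are represented as `None`.

```python
import math


def center(d):
    """-c: map a correlation-like value from [-1, 1] to [0, 1]: (1 + d) / 2."""
    return (1.0 + d) / 2.0


def normalize_01(scores):
    """-n: rescale a list of scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def zscore(scores):
    """-z: subtract the mean and divide by the standard deviation."""
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [(s - mean) / sd if sd > 0 else 0.0 for s in scores]


def flip(scores):
    """-f: output one minus each value."""
    return [1.0 - s for s in scores]


def apply_cutoff(scores, threshold):
    """-e: drop (here: replace with None) every score below the cutoff."""
    return [s if s >= threshold else None for s in scores]
```

For example, `center(-1.0)` gives 0.0 and `center(1.0)` gives 1.0, which is why -c should be turned off when raw Pearson or Spearman values are wanted.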