Sleipnir
Matcher

Matcher calculates the similarity between pairs of input datasets using a variety of similarity/distance measures. It is similar to MIer, but it makes no assumption that the input datasets cover the same gene sets (or even the same organisms), and it is optimized to quickly assess subsets of the input datasets.

Usage

Basic Usage

 Matcher -i <data1_dir> <data2.dab>*

Compute the similarity between each dataset in the directory of DAT/DAB files data1_dir and each individual DAT/DAB file data2.dab given on the command line, outputting a table of scores to standard output.
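As a minimal sketch of how such an invocation might be assembled from a script, the Python helper below builds an argument list using only the flags documented on this page (-i, -d, -z, -Z). The function name build_matcher_command and the example paths are hypothetical, and nothing here runs Matcher itself.

```python
# Hypothetical helper: assembles a Matcher command line from documented flags.
# It only builds the argument list; it does not execute anything.

# Similarity measures listed for the -d option on this page.
DISTANCES = {
    "pearson", "quickpear", "euclidean", "kendalls", "kolm-smir",
    "hypergeom", "innerprod", "bininnerprod", "mi",
}


def build_matcher_command(input_dir, datasets, distance="kolm-smir",
                          size_min=None, size_max=None):
    """Return an argv list: Matcher -i <dir> -d <measure> [-z N] [-Z N] <files>."""
    if distance not in DISTANCES:
        raise ValueError("unknown similarity measure: %s" % distance)
    argv = ["Matcher", "-i", input_dir, "-d", distance]
    if size_min is not None:
        argv += ["-z", str(size_min)]
    if size_max is not None:
        argv += ["-Z", str(size_max)]
    # Individual DAT/DAB files follow the options as positional arguments.
    argv += list(datasets)
    return argv


# Example with made-up paths.
print(build_matcher_command("data1_dir", ["data2.dab"], distance="pearson"))
```

Assuming Matcher is installed and on the PATH, the returned list could be passed to subprocess.run(argv, capture_output=True, text=True) to collect the score table from standard output.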

Detailed Usage

package "Matcher"
version "1.0"
purpose "Data set pairwise similarity calculator."

section "Main"
option  "input"     i   "Directory with input DABs"
                        string  typestr="directory" yes
option  "distance"  d   "Similarity measure"
                        values="pearson","quickpear","euclidean","kendalls","kolm-smir",
                        "hypergeom","innerprod","bininnerprod","mi" default="kolm-smir"
option  "size_min"  z   "Minimum points to compare"
                        int default="0"
option  "size_max"  Z   "Maximum points to compare"
                        int default="1000000000"

section "Optional"
option  "table"     t   "Format output as a 2D table"
                        flag    on
option  "memmap"    m   "Memory map input/output"
                        flag    off
option  "random"    r   "Seed random generator"
                        int default="0"
option  "verbosity" v   "Message verbosity"
                        int default="5"

| Flag | Default | Type | Description |
|------|---------|------|-------------|
| None | None | DAT/DAB files | Datasets for which pairwise similarities will be calculated relative to the datasets in -i. |
| -i | None | Directory | Directory of DAT/DAB files for which pairwise similarities will be calculated relative to the datasets given on the command line. |
| -d | kolm-smir | pearson, quickpear, euclidean, kendalls, kolm-smir, hypergeom, innerprod, bininnerprod, mi | Similarity measure to be used for dataset comparisons. |
| -z | 0 | Integer | Minimum number of data points to subsample from each dataset. |
| -Z | 1000000000 | Integer | Maximum number of data points to subsample from each dataset. |
| -t | on | Flag | If on, format output as a two-dimensional table; otherwise, format as a list of pairs. |
| -m | off | Flag | If given, memory map the input files when possible. |
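
To give a feel for what a distribution-based measure such as kolm-smir can compare when two datasets share no genes, the Python sketch below scores two groups of datasets with the two-sample Kolmogorov-Smirnov statistic after capping each dataset at a maximum number of sampled values. This is an illustration under stated assumptions, not Matcher's implementation: the function names, the treatment of each dataset as a flat list of scores, and the way the cap and seed are applied (loosely mirroring -Z and -r) are all hypothetical.

```python
import bisect
import random


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    best = 0.0
    for v in xs + ys:
        # Fraction of each sample that is <= v.
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        best = max(best, abs(fx - fy))
    return best


def subsample(values, size_max, rng):
    """Keep at most size_max values, drawn uniformly without replacement."""
    values = list(values)
    if len(values) <= size_max:
        return values
    return rng.sample(values, size_max)


def pairwise_scores(group_a, group_b, size_max=1000, seed=0):
    """Score every dataset in group_a against every dataset in group_b.

    Each group maps a dataset name to a flat list of numeric values
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    scores = {}
    for name_a, vals_a in group_a.items():
        for name_b, vals_b in group_b.items():
            a = subsample(vals_a, size_max, rng)
            b = subsample(vals_b, size_max, rng)
            scores[(name_a, name_b)] = ks_statistic(a, b)
    return scores


# Toy data: identical value distributions score 0.0, disjoint ranges score 1.0.
directory = {"d1": [0.1, 0.2, 0.3], "d2": [0.7, 0.8, 0.9]}
command_line = {"q": [0.1, 0.2, 0.3]}
for pair, score in sorted(pairwise_scores(directory, command_line).items()):
    print(pair, score)
```

Because the statistic compares only the distributions of values, the two datasets need not contain the same genes, and subsampling bounds the cost of each comparison.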