Sleipnir: Matcher

Matcher calculates the similarity between pairs of input datasets using a variety of similarity/distance measures. It is similar to MIer, but it makes no assumption that the input datasets cover the same gene sets (or even organisms), and it is optimized to quickly assess subsets of the input datasets.

Usage

Basic Usage

 Matcher -i <data1_dir> <data2.dab>*

Computes the similarity between each DAT/DAB file in the directory data1_dir and each of the individual DAT/DAB files data2.dab given on the command line, writing a table of scores to standard output.
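As a rough illustration of how the documented options fit together, the Python sketch below assembles a Matcher command line from the flags listed under Detailed Usage (-i, -d, -z, -Z, -m). The helper name build_matcher_command is hypothetical and not part of Sleipnir; the sketch only builds the argument list and does not run Matcher.

```python
import shlex

# Similarity measures accepted by -d, copied from the option listing.
MEASURES = {"pearson", "quickpear", "euclidean", "kendalls", "kolm-smir",
            "hypergeom", "innerprod", "bininnerprod", "mi"}


def build_matcher_command(input_dir, data_files, distance="kolm-smir",
                          size_min=None, size_max=None, memmap=False):
    """Return an argv list for Matcher (hypothetical helper, not part of Sleipnir)."""
    if distance not in MEASURES:
        raise ValueError(f"unknown similarity measure: {distance}")
    argv = ["Matcher", "-i", input_dir, "-d", distance]
    if size_min is not None:
        argv += ["-z", str(size_min)]   # minimum points to compare
    if size_max is not None:
        argv += ["-Z", str(size_max)]   # maximum points to compare
    if memmap:
        argv.append("-m")               # memory map input files
    argv += list(data_files)            # individual DAT/DAB files come last
    return argv


cmd = build_matcher_command("data1_dir", ["data2.dab"], distance="pearson")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run on a system where Matcher is installed. The -t option is deliberately left out because it is declared as a flag that is on by default.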

Detailed Usage

package "Matcher"
version "1.0"
purpose "Data set pairwise similarity calculator."

section "Main"
option  "input"     i   "Directory with input DABs"
                        string  typestr="directory" yes
option  "distance"  d   "Similarity measure"
                        values="pearson","quickpear","euclidean","kendalls","kolm-smir",
                        "hypergeom","innerprod","bininnerprod","mi" default="kolm-smir"
option  "size_min"  z   "Minimum points to compare"
                        int default="0"
option  "size_max"  Z   "Maximum points to compare"
                        int default="1000000000"

section "Optional"
option  "table"     t   "Format output as a 2D table"
                        flag    on
option  "memmap"    m   "Memory map input/output"
                        flag    off
option  "random"    r   "Seed random generator"
                        int default="0"
option  "verbosity" v   "Message verbosity"
                        int default="5"

Flag	Default	Type	Description
None	None	DAT/DAB files	Datasets for which pairwise similarities will be calculated relative to the datasets in `-i`.
-i	None	Directory	Directory of DAT/DAB files for which pairwise similarities will be calculated relative to the datasets given on the command line.
-d	kolm-smir	pearson, quickpear, euclidean, kendalls, kolm-smir, hypergeom, innerprod, bininnerprod, mi	Similarity measure to be used for dataset comparisons.
-z	0	Integer	Minimum number of data points to subsample from each dataset.
-Z	1000000000	Integer	Maximum number of data points to subsample from each dataset.
-t	on	Flag	If on, format output as a two-dimensional table; otherwise, format as a list of pairs.
-m	off	Flag	If given, memory map the input files when possible.
-r	0	Integer	Seed for the random number generator.
-v	5	Integer	Message verbosity.
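
To give a sense of what the default measure, kolm-smir, compares, here is a minimal pure-Python sketch of the standard two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical distribution functions of two samples. It is a textbook formulation, not Sleipnir's implementation, and the function name ks_statistic is an assumption made for this example. Whether Matcher reports this statistic directly, a p-value, or some transformed score is not stated here.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (textbook definition)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


print(ks_statistic([1, 2, 3], [1, 2, 3]))  # identical samples
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # fully separated samples
```

Because the statistic compares value distributions rather than matched entries, the two samples may have different sizes and need not be paired item by item.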