Sleipnir: Filterer

Filterer removes values within a given range or ranges from a DAT/DAB file. These ranges can be specified using a rudimentary query language allowing inclusions, exclusions, intersections, and unions.

Usage

Basic Usage

 Filterer -i <input.dab> -o <output.dab> i0=1

Include in output.dab only the gene pair values from input.dab that are between 0 and 1.

 Filterer -i <input.dab> -o <output.dab> x0=1

Include in output.dab only the gene pair values from input.dab that are not between 0 and 1.

 Filterer -i <input.dab> -o <output.dab> i0=1 i3=

Include in output.dab only the gene pair values from input.dab that are between 0 and 1 or at least 3.

 Filterer -i <input.dab> -o <output.dab> x0=1 i=-0.5

Include in output.dab only the gene pair values from input.dab that are at least -0.5 and not between 0 and 1.

 Filterer -i <input.dab> -o <output.dab> i=-2 i-1= x-2.5=-0.5

Include in output.dab only the gene pair values from input.dab that are at most -2 or at least -1 and are not between -2.5 and -0.5.

Detailed Usage

package "Dat2Dab"
version "1.0"
purpose "Text/binary data file interconversion"

section "Main"
option  "input"         i   "Input DAT/DAB file"
                            string  typestr="filename"
option  "output"        o   "Output DAT/DAB file"
                            string  typestr="filename"
option  "quant"         q   "Input Quant file"
                            string  typestr="filename"

section "Preprocessing"
option  "flip"          f   "Calculate one minus values"
                            flag    off
option  "abs"           B   "Calculate absolute values"
                            flag    off
option  "normalize"     n   "Normalize to the range [0,1]"
                            flag    off
option  "normalizeNPone"    w   "Normalize to the range [-1,1]"
                            flag    off
option  "normalizeDeg"      j   "Normalize by incident node degrees"
                            flag    off
option  "normalizeLoc"      k   "Normalize by local neighborhood"
                            flag    off
option  "zscore"        z   "Convert values to z-scores"
                            flag    off
option  "rank"          r   "Rank transform data"
                            flag    off
option  "randomize"     a   "Randomize data"
                            flag    off
option  "NegExp"        K   "Transform all values to their negative exponential (converts -log of prob back to prob space)"
                            flag    off

section "Filtering"
option  "genes"         g   "Process only genes from the given set"
                            string  typestr="filename"
option  "genex"         G   "Exclude all genes from the given set"
                            string  typestr="filename"
option  "genee"         D   "Process only edges including a gene from the given set"
                            string  typestr="filename"
option  "edges"         e   "Process only edges from the given DAT/DAB"
                            string  typestr="filename"
option  "exedges"       x   "Exclude edges from the given DAT/DAB"
                            string  typestr="filename"
option  "gexedges"      X   "Exclude all edges which both genes from the given set"
                            string  typestr="filename"
option  "cutoff"        c   "Exclude edges below cutoff"
                            double
option  "zero"          Z   "Zero missing values"
                            flag    off
option  "dval"          V   "set all non-missing values to a set default value"
                            float   
option  "dmissing"      M   "set missing values to a set default value"
                            float   
option  "duplicates"        d   "Allow dissimilar duplicate values"
                            flag    off
option  "subsample"     u   "Fraction of output to randomly subsample"
                            float   default="1"

section "Lookups"
option  "lookup1"       l   "First lookup gene"
                            string
option  "lookup2"       L   "Second lookup gene"
                            string
option  "lookups1"      t   "First lookup gene set"
                            string  typestr="filename"
option  "lookups2"      T   "First lookup gene set"
                            string  typestr="filename"
option  "genelist"      E   "Only list genes"
                            flag    off
option  "paircount"     P   "Only count pairs above cutoff"
                            flag    off
option  "ccoeff"        C   "Output clustering coefficient for each gene"
                            flag    off
option  "hubbiness"     H   "Output the average edge weight for each gene"
                            flag    off
option  "mar"           J   "Output the maximum adjacency ratio for each gene"
                            flag    off

section "Optional"
option  "remap"         p   "Gene name remapping file"
                            string  typestr="filename"
option  "table"         b   "Produce table formatted output"
                            flag    off
option  "skip"          s   "Columns to skip in input PCL"
                            int default="2"
option  "memmap"        m   "Memory map input/output"
                            flag    off
option  "random"        R   "Seed random generator (default -1 uses current time)"
                            int default="-1"
option  "noise"         N   "Add noise from standard Normal to all non-missing values"
                            flag    off
option  "verbosity"     v   "Message verbosity"
                            int default="5"

Flag	Default	Type	Description
-i	stdin	DAT/DAB file	Input DAT/DAB file.
-o	stdout	DAT/DAB file	Output (filtered) DAT/DAB file.
-m	off	Flag	If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped.