Sleipnir: Dat2Dab

Dat2Dab converts tab-delimited text DAT files into binary DAB files and vice versa. It can also convert PCL and DAS files (see Sleipnir::CDat), apply a variety of normalizations or filters during conversion, and look up values for individual genes or gene pairs in DAB files.

Usage

Basic Usage

 Dat2Dab -i <data.dab> -o <data.dat>

Convert the input binary DAB file data.dab into the output tab-delimited text DAT file data.dat.

 Dat2Dab -o <data.dab> -n -f -d < <data.dat>

Read the text DAT file data.dat from standard input, allowing duplicates; normalize all scores to the range [0,1], invert them (one minus each value), and save the results to the binary DAB file data.dab.
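
To make the effect of combining -n and -f concrete, here is a minimal Python sketch of the arithmetic only. It is not Sleipnir code: the function name normalize_and_flip is hypothetical, and the use of min-max scaling, the pass-through of missing values (NaN), and the handling of constant input are assumptions rather than documented Dat2Dab behavior.

```python
import math


def normalize_and_flip(scores):
    """Min-max scale scores to [0, 1], then return one minus each value.

    Assumption: missing values are represented as NaN and left untouched.
    """
    present = [s for s in scores if not math.isnan(s)]
    if not present:
        return list(scores)
    lo, hi = min(present), max(present)
    span = hi - lo
    out = []
    for s in scores:
        if math.isnan(s):
            out.append(s)  # keep missing values missing
            continue
        # Assumption: constant input normalizes to 0 (arbitrary choice).
        normalized = (s - lo) / span if span else 0.0
        out.append(1.0 - normalized)
    return out


flipped = normalize_and_flip([2.0, 4.0, 6.0, float("nan")])
print(flipped)
```

In this sketch the lowest input score becomes 1 and the highest becomes 0, which is the sense in which the scores are inverted.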

 Dat2Dab -i <data.dab> -m -l <gene1> -L <gene2>

Open the binary DAB file data.dab using memory mapping and output the score for the gene pair gene1 and gene2.
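
The pair lookup can be pictured with a small Python sketch that works on the text (DAT) form of the data. It assumes each DAT line holds two gene IDs and a score separated by tabs, and that a pair is unordered; the helper names load_dat and lookup_pair are hypothetical and are not part of Sleipnir.

```python
import io


def load_dat(handle):
    """Read 'geneA<TAB>geneB<TAB>score' lines into a dict keyed by unordered pair."""
    scores = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        gene_a, gene_b, value = line.split("\t")
        scores[frozenset((gene_a, gene_b))] = float(value)
    return scores


def lookup_pair(scores, gene1, gene2):
    """Return the score for the pair in either order, or None if it is absent."""
    return scores.get(frozenset((gene1, gene2)))


# Illustrative data only; the gene names and scores are made up.
example = io.StringIO("A\tB\t0.9\nA\tC\t0.1\nB\tC\t0.5\n")
scores = load_dat(example)
print(lookup_pair(scores, "B", "A"))
```

Keying on an unordered pair means the lookup gives the same answer whichever gene is passed first.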

Detailed Usage

package "Dat2Dab"
version "1.0"
purpose "Text/binary data file interconversion"

section "Main"
option  "input"         i   "Input DAT/DAB file"
                            string  typestr="filename"
option  "output"        o   "Output DAT/DAB file"
                            string  typestr="filename"
option  "quant"         q   "Input Quant file"
                            string  typestr="filename"

section "Preprocessing"
option  "flip"          f   "Calculate one minus values"
                            flag    off
option  "abs"           B   "Calculate absolute values"
                            flag    off
option  "normalize"     n   "Normalize to the range [0,1]"
                            flag    off
option  "normalizeNPone"    w   "Normalize to the range [-1,1]"
                            flag    off
option  "normalizeDeg"      j   "Normalize by incident node degrees"
                            flag    off
option  "normalizeLoc"      k   "Normalize by local neighborhood"
                            flag    off
option  "zscore"        z   "Convert values to z-scores"
                            flag    off
option  "rank"          r   "Rank transform data"
                            flag    off
option  "randomize"     a   "Randomize data"
                            flag    off
option  "NegExp"        K   "Transform all values to their negative exponential (converts -log of prob back to prob space)"
                            flag    off

section "Filtering"
option  "genes"         g   "Process only genes from the given set"
                            string  typestr="filename"
option  "genex"         G   "Exclude all genes from the given set"
                            string  typestr="filename"
option  "genee"         D   "Process only edges including a gene from the given set"
                            string  typestr="filename"
option  "edges"         e   "Process only edges from the given DAT/DAB"
                            string  typestr="filename"
option  "exedges"       x   "Exclude edges from the given DAT/DAB"
                            string  typestr="filename"
option  "gexedges"      X   "Exclude all edges with both genes from the given set"
                            string  typestr="filename"
option  "cutoff"        c   "Exclude edges below cutoff"
                            double
option  "zero"          Z   "Zero missing values"
                            flag    off
option  "dval"          V   "Set all non-missing values to the given default value"
                            float   
option  "dmissing"      M   "Set missing values to the given default value"
                            float   
option  "duplicates"        d   "Allow dissimilar duplicate values"
                            flag    off
option  "subsample"     u   "Fraction of output to randomly subsample"
                            float   default="1"

section "Lookups"
option  "lookup1"       l   "First lookup gene"
                            string
option  "lookup2"       L   "Second lookup gene"
                            string
option  "lookups1"      t   "First lookup gene set"
                            string  typestr="filename"
option  "lookups2"      T   "Second lookup gene set"
                            string  typestr="filename"
option  "genelist"      E   "Only list genes"
                            flag    off
option  "paircount"     P   "Only count pairs above cutoff"
                            flag    off
option  "ccoeff"        C   "Output clustering coefficient for each gene"
                            flag    off
option  "hubbiness"     H   "Output the average edge weight for each gene"
                            flag    off
option  "mar"           J   "Output the maximum adjacency ratio for each gene"
                            flag    off

section "Optional"
option  "remap"         p   "Gene name remapping file"
                            string  typestr="filename"
option  "table"         b   "Produce table formatted output"
                            flag    off
option  "skip"          s   "Columns to skip in input PCL"
                            int default="2"
option  "memmap"        m   "Memory map input/output"
                            flag    off
option  "random"        R   "Seed random generator (default -1 uses current time)"
                            int default="-1"
option  "noise"         N   "Add noise from standard Normal to all non-missing values"
                            flag    off
option  "verbosity"     v   "Message verbosity"
                            int default="5"
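
The "NegExp" option in the listing above is described as converting negative log probabilities back to probability space. A minimal Python sketch of that arithmetic follows; the function name neg_exp is hypothetical, and the use of the natural logarithm is an assumption, since the listing does not state the base.

```python
import math


def neg_exp(value):
    """Map a negative-log score back to probability space: exp(-value).

    Assumption: the scores were produced with the natural logarithm.
    """
    return math.exp(-value)


neg_log_p = -math.log(0.25)  # a probability of 0.25 stored as -log(p)
print(neg_exp(neg_log_p))
```

A score of 0 maps to a probability of 1, and larger scores map to smaller probabilities.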

Flag	Default	Type	Description
-i	stdin	DAT/DAB file	Input DAT, DAB, DAS, or PCL file.
-o	stdout	DAT/DAB file	Output DAT, DAB, or DAS file.
-f	off	Flag	If on, output one minus the input's values.
-n	off	Flag	If on, normalize input edges to the range [0,1] before processing.
-z	off	Flag	If on, normalize input edges to z-scores (subtract mean, divide by standard deviation) before processing.
-r	off	Flag	If on, transform input values to integer ranks before processing.
-g	None	Text gene list	If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes.
-c	None	Double	If given, remove all input edges below the given cutoff (after optional normalization).
-Z	off	Flag	If on, replace all missing values with zeros.
-d	off	Flag	If on, allow (with a warning) duplicate pairs in text-based input.
-G	None	Text gene list	If given, exclude all genes in the list from processing.
-l	None	String	If given, look up all values for pairs involving the requested gene.
-L	None	String	If given with `-l`, look up the value for the requested gene pair.
-t	None	Gene text file	If given with `-l`, look up all pairs between `-l` and the given gene set. If given alone, look up all pairs between genes in the given set. If given with `-T`, look up all pairs spanning the two gene sets.
-T	None	Gene text file	Must be given with `-t`; looks up all gene pairs spanning the two gene sets (i.e. one gene in the set `-t`, one in the set `-T`).
-E	off	Flag	If set, produce no output other than a list of genes that would be in at least one of the normally output pairs.
-p	None	Gene pair text file	Tab-delimited text file containing two columns, both gene IDs. If given, replace each gene ID from the first column with the corresponding ID in the second column.
-b	off	Flag	If given, produce output in a tab-delimited half matrix table. Not recommended for DAT/DABs with more than a few dozen genes!
-s	2	Integer	Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL.
-m	off	Flag	If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped.
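
The table notes that the -c cutoff is applied after any normalization such as -z. The Python sketch below illustrates that ordering on a tiny edge list. It is not Sleipnir code: the function name zscore_then_cutoff is hypothetical, and the use of the population standard deviation is an assumption, since the table only says "divide by standard deviation".

```python
import statistics


def zscore_then_cutoff(edges, cutoff):
    """Convert edge scores to z-scores, then drop edges below the cutoff.

    edges maps (gene1, gene2) tuples to scores.
    Assumption: the population standard deviation is used.
    """
    values = list(edges.values())
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    z = {pair: (value - mean) / stdev for pair, value in edges.items()}
    # The cutoff is compared against the normalized scores, not the raw ones.
    return {pair: score for pair, score in z.items() if score >= cutoff}


# Illustrative data only; the gene names and scores are made up.
edges = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "C"): 3.0}
kept = zscore_then_cutoff(edges, 0.0)
print(sorted(kept))
```

With a cutoff of 0, the edge whose raw score lies below the mean is removed even though its raw value (1.0) is above the cutoff, which is why the order of the two steps matters.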