Sleipnir
Dat2Dab converts tab-delimited text DAT files into binary DAB files and vice versa. It can also convert PCL and DAS files (see Sleipnir::CDat), apply a variety of normalizations or filters during conversion, or look up the values of individual genes or gene pairs in DAB files.
Dat2Dab -i <data.dab> -o <data.dat>
Convert the input binary DAB file data.dab into the output tab-delimited text DAT file data.dat.
Dat2Dab -o <data.dab> -n -f -d < <data.dat>
Read a text DAT file data.dat from standard input, allowing duplicates; normalize all scores to the range [0,1], then invert them and save the results to the binary DAB file data.dab.
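To make the effect of combining -n and -f concrete, the following Python sketch applies the same two steps to a small list of scores. It is an illustration of the arithmetic only, not Sleipnir's implementation: it assumes that normalization to [0,1] is plain min-max scaling, and the helper names normalize_unit and flip are hypothetical.

```python
def normalize_unit(scores):
    """Min-max scale scores to [0, 1] (assumed behaviour of -n)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant input has no spread; map everything to 0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def flip(scores):
    """Replace each score with one minus its value (as described for -f)."""
    return [1.0 - s for s in scores]


scores = [2.0, 4.0, 6.0]
flipped = flip(normalize_unit(scores))
print(flipped)  # [1.0, 0.5, 0.0]
```

After both steps the largest input score becomes 0 and the smallest becomes 1, which is why the flip is only meaningful once the scores lie in [0,1].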
Dat2Dab -i <data.dab> -m -l <gene1> -L <gene2>
Open the binary DAB file data.dab using memory mapping and output the score for the gene pair gene1 and gene2.
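For comparison, the same kind of pair lookup can be sketched against the text form of the data. The Python example below assumes a DAT file holds one gene pair per line as three tab-separated columns (gene, gene, score) and that pairs are unordered; both are assumptions made for illustration, and lookup_pair is a hypothetical helper rather than part of Sleipnir.

```python
import io


def lookup_pair(dat_lines, gene1, gene2):
    """Return the score for an unordered gene pair, or None if absent."""
    wanted = frozenset((gene1, gene2))
    for line in dat_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        a, b, score = fields
        if frozenset((a, b)) == wanted:
            return float(score)
    return None


# In-memory stand-in for a small DAT file.
dat = io.StringIO("A\tB\t0.9\nA\tC\t0.1\nB\tC\t0.5\n")
print(lookup_pair(dat, "C", "A"))  # 0.1
```

A linear scan like this reads every line of the text file, whereas the command above works on a memory-mapped binary file.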
package "Dat2Dab"
version "1.0"
purpose "Text/binary data file interconversion"
section "Main"
option "input" i "Input DAT/DAB file"
string typestr="filename"
option "output" o "Output DAT/DAB file"
string typestr="filename"
option "quant" q "Input Quant file"
string typestr="filename"
section "Preprocessing"
option "flip" f "Calculate one minus values"
flag off
option "abs" B "Calculate absolute values"
flag off
option "normalize" n "Normalize to the range [0,1]"
flag off
option "normalizeNPone" w "Normalize to the range [-1,1]"
flag off
option "normalizeDeg" j "Normalize by incident node degrees"
flag off
option "normalizeLoc" k "Normalize by local neighborhood"
flag off
option "zscore" z "Convert values to z-scores"
flag off
option "rank" r "Rank transform data"
flag off
option "randomize" a "Randomize data"
flag off
option "NegExp" K "Transform all values to their negative exponential (converts -log of prob back to prob space)"
flag off
section "Filtering"
option "genes" g "Process only genes from the given set"
string typestr="filename"
option "genex" G "Exclude all genes from the given set"
string typestr="filename"
option "genee" D "Process only edges including a gene from the given set"
string typestr="filename"
option "edges" e "Process only edges from the given DAT/DAB"
string typestr="filename"
option "exedges" x "Exclude edges from the given DAT/DAB"
string typestr="filename"
option "gexedges" X "Exclude all edges for which both genes are from the given set"
string typestr="filename"
option "cutoff" c "Exclude edges below cutoff"
double
option "zero" Z "Zero missing values"
flag off
option "dval" V "Set all non-missing values to a set default value"
float
option "dmissing" M "Set missing values to a set default value"
float
option "duplicates" d "Allow dissimilar duplicate values"
flag off
option "subsample" u "Fraction of output to randomly subsample"
float default="1"
section "Lookups"
option "lookup1" l "First lookup gene"
string
option "lookup2" L "Second lookup gene"
string
option "lookups1" t "First lookup gene set"
string typestr="filename"
option "lookups2" T "Second lookup gene set"
string typestr="filename"
option "genelist" E "Only list genes"
flag off
option "paircount" P "Only count pairs above cutoff"
flag off
option "ccoeff" C "Output clustering coefficient for each gene"
flag off
option "hubbiness" H "Output the average edge weight for each gene"
flag off
option "mar" J "Output the maximum adjacency ratio for each gene"
flag off
section "Optional"
option "remap" p "Gene name remapping file"
string typestr="filename"
option "table" b "Produce table formatted output"
flag off
option "skip" s "Columns to skip in input PCL"
int default="2"
option "memmap" m "Memory map input/output"
flag off
option "random" R "Seed random generator (default -1 uses current time)"
int default="-1"
option "noise" N "Add noise from standard Normal to all non-missing values"
flag off
option "verbosity" v "Message verbosity"
int default="5"
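Two of the preprocessing options listed above, -z and -K, are simple element-wise transforms, and a short Python sketch may help clarify them. This is not Sleipnir's code: the use of the population standard deviation for z-scores is an assumption, and the function names zscores and neg_exp are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Subtract the mean and divide by the standard deviation.

    Uses the population standard deviation, which is an assumption here.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread: every value equals the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def neg_exp(values):
    """Map each value x to exp(-x), turning -log(p) back into p."""
    return [math.exp(-v) for v in values]


z = zscores([1.0, 2.0, 3.0])
p = neg_exp([-math.log(0.25)])
print(z, p)
```

The middle value of a symmetric input maps to a z-score of zero, and applying neg_exp to -log(0.25) recovers a probability of approximately 0.25.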
Flag | Default | Type | Description |
---|---|---|---|
-i | stdin | DAT/DAB file | Input DAT, DAB, DAS, or PCL file. |
-o | stdout | DAT/DAB file | Output DAT, DAB, or DAS file. |
-f | off | Flag | If on, output one minus the input's values. |
-n | off | Flag | If on, normalize input edges to the range [0,1] before processing. |
-z | off | Flag | If on, normalize input edges to z-scores (subtract mean, divide by standard deviation) before processing. |
-r | off | Flag | If on, transform input values to integer ranks before processing. |
-g | None | Text gene list | If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes. |
-c | None | Double | If given, remove all input edges below the given cutoff (after optional normalization). |
-Z | off | Flag | If on, replace all missing values with zeros. |
-d | off | Flag | If on, allow (with a warning) duplicate pairs in text-based input. |
-G | None | Text gene list | If given, exclude all genes in the list. |
-l | None | String | If given, look up all values for pairs involving the requested gene. |
-L | None | String | If given with -l, look up the value for the requested gene pair. |
-t | None | Gene text file | If given with -l, look up all pairs between -l and the given gene set. If given alone, look up all pairs between genes in the given set. If given with -T, look up all pairs spanning the two gene sets. |
-T | None | Gene text file | Must be given with -t; looks up all gene pairs spanning the two gene sets (i.e. one gene in the set -t, one in the set -T). |
-E | off | Flag | If set, produce no output other than a list of genes that would be in at least one of the normally output pairs. |
-p | None | Gene pair text file | Tab-delimited text file containing two columns, both gene IDs. If given, replace each gene ID from the first column with the corresponding ID in the second column. |
-b | off | Flag | If given, produce output in a tab-delimited half matrix table. Not recommended for DAT/DABs with more than a few dozen genes! |
-s | 2 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |
-m | off | Flag | If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped. |
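The remapping performed by -p can be pictured with a small Python sketch. It follows the table's description of a two-column, tab-delimited mapping file, and additionally assumes that gene IDs missing from the first column are left unchanged; that fallback, along with the helper names load_remap and remap_pairs, is hypothetical and not taken from Sleipnir.

```python
import io


def load_remap(lines):
    """Read a two-column tab-delimited mapping of old ID to new ID."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            mapping[fields[0]] = fields[1]
    return mapping


def remap_pairs(pairs, mapping):
    """Rename both genes of each (gene, gene, score) triple.

    IDs absent from the mapping are kept as-is (an assumption).
    """
    return [(mapping.get(a, a), mapping.get(b, b), s) for a, b, s in pairs]


mapping = load_remap(io.StringIO("YAL001C\tTFC3\nYAL002W\tVPS8\n"))
pairs = [("YAL001C", "YAL002W", 0.7), ("YAL001C", "YAL003W", 0.2)]
print(remap_pairs(pairs, mapping))
# [('TFC3', 'VPS8', 0.7), ('TFC3', 'YAL003W', 0.2)]
```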