Sleipnir
DChecker

DChecker takes a gold standard answer file and a DAT/DAB of predicted functional relationships (or other interactions) and outputs the information needed to evaluate the performance of the given predictions (ROC curve, precision/recall curve, or AUC score).

Usage

Basic Usage

 DChecker -w <answers.dab> -i <predictions.dab>

Output (to standard output) positive gene counts, true and false positive pair counts, true and false negative pair counts, and an overall AUC score using the default binning of continuous data in predictions.dab and the binary gold standard answers in answers.dab.
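
The quantities listed above can be illustrated with a small Python sketch. It is not DChecker's implementation: the function names are hypothetical, the rule that a score at or above a cutoff counts as a predicted positive is an assumption, and the rank-based AUC shown here is one common definition that may differ from the calculation DChecker performs.

```python
def quantile_cutoffs(scores, bins):
    """Return up to `bins` cutoffs taken at evenly spaced ranks of the sorted scores."""
    ordered = sorted(scores)
    picks = {ordered[(k * len(ordered)) // bins] for k in range(bins)}
    return sorted(picks)


def confusion_counts(scores, labels, cutoff):
    """Count (TP, FP, TN, FN), assuming score >= cutoff means 'predicted positive'."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if score >= cutoff:
            if label == 1:
                tp += 1
            else:
                fp += 1
        else:
            if label == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rank_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Evaluating `confusion_counts` once per value returned by `quantile_cutoffs` gives one row of pair counts per bin, which is the raw material for a ROC or precision/recall curve.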

 DChecker -w <answers.dab> -i <predictions.dab> -f

Output true/false positive and negative pair counts assuming that predictions.dab contains only a finite number of different values and using these as bins. This is appropriate for inherently discrete data, e.g. cocluster counts.
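
As a rough illustration of finite binning, the Python sketch below uses every distinct input value as a cutoff. The function name is hypothetical, and treating a score at or above the cutoff as a predicted positive is an assumption rather than documented DChecker behavior.

```python
def finite_bin_counts(scores, labels):
    """Return one (cutoff, TP, FP, TN, FN) row per distinct score value."""
    rows = []
    for cutoff in sorted(set(scores)):
        tp = fp = tn = fn = 0
        for score, label in zip(scores, labels):
            if score >= cutoff:  # assumed rule: at or above the cutoff is a predicted positive
                if label == 1:
                    tp += 1
                else:
                    fp += 1
            else:
                if label == 1:
                    fn += 1
                else:
                    tn += 1
        rows.append((cutoff, tp, fp, tn, fn))
    return rows
```

With cocluster counts such as 0, 1, 2, and 3 this produces four rows. With continuous data it would produce roughly one row per gene pair, which is why this mode suits only inherently discrete inputs.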

 DChecker -w <answers.dab> -i <predictions.dab> -b 0 -n -m 0 -M 1 -e 0.01

Output true/false positive and negative pair counts by normalizing the data in predictions.dab to the range [0,1] and creating bin cutoffs at 0.01 increments between 0 and 1. Setting -b 0 turns off quantile binning so that -m, -M, and -e take effect. This can provide a finer-grained binning than the default -b setting for some prediction/data sets (and a coarser binning for others; when in doubt, try both).
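
The fixed-step binning can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and min-max scaling is an assumption about how normalization maps scores onto [0,1].

```python
def normalize(scores):
    """Min-max scale scores to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # all values identical; nothing to spread out
    return [(s - lo) / (hi - lo) for s in scores]


def step_cutoffs(minimum=0.0, maximum=1.0, delta=0.01):
    """Cutoffs from minimum to maximum in steps of delta, mirroring -m, -M, and -e."""
    steps = int(round((maximum - minimum) / delta))
    # Multiply by the step index instead of adding delta repeatedly,
    # which avoids accumulating floating-point error.
    return [minimum + i * delta for i in range(steps + 1)]
```

The defaults give 101 cutoffs regardless of how many gene pairs the input contains, whereas quantile binning places cutoffs according to how the scores are distributed.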

 DChecker -w <answers.dab> -i <predictions.dab> -c <context.txt>

Output true/false positive and negative pair counts for the given predictions.dab using only the gene pairs relevant to the given biological function context.txt. This is appropriate for evaluating context-specific functional relationship predictions.
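
One way to picture context-specific evaluation is as a per-pair decision driven by the six context flags in the option listing (-q, -Q, -j, -J, -u, -U). The Python sketch below uses the defaults shown in that listing. The function itself is hypothetical, and it is an assumption that these defaults match what the "term" filter applied by -c actually does.

```python
def keep_pair(gene_a, gene_b, label, context,
              ctxtpos=True, ctxtneg=True,
              bridgepos=False, bridgeneg=True,
              outpos=False, outneg=False):
    """Decide whether a gold standard pair is used, given a set of context genes.

    label is 1 for a related (positive) pair and 0 for an unrelated (negative) pair.
    """
    in_context = (gene_a in context) + (gene_b in context)
    if in_context == 2:  # both genes in the context
        return ctxtpos if label == 1 else ctxtneg
    if in_context == 1:  # bridging pair: exactly one gene in the context
        return bridgepos if label == 1 else bridgeneg
    return outpos if label == 1 else outneg  # neither gene in the context
```

Under these defaults a positive pair counts only when both genes are in the context, while a negative pair counts when at least one gene is.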

Detailed Usage

package "DChecker"
version "1.0"
purpose "Similarity to answer file checker"

section "Main"
option  "input"         i   "Similarity DAT/DAB file"
                            string  typestr="filename"  yes
option  "answers"       w   "Answer DAT/DAB file"
                            string  typestr="filename"  yes

section "Miscellaneous"
option  "directory"     d   "Output directory"
                            string  typestr="directory" default="."
option  "auc"           a   "Use alternative AUCn calculation"
                            float   default="0"
option  "randomize"     R   "Calculate specified number of randomized scores"
                            int default="0"

section "Ranking Method"
option  "bins"          b   "Bins for quantile sorting"
                            int default="1000"
option  "finite"        f   "Count finitely many bins"
                            flag    off
option  "min"           m   "Minimum correlation to process"
                            float   default="0"
option  "max"           M   "Maximum correlation to process"
                            float   default="1"
option  "delta"         e   "Size of correlation bins"
                            double  default="0.01"

section "Learning/Evaluation"
option  "genes"         g   "Gene inclusion file"
                            string  typestr="filename"
option  "genex"         G   "Gene exclusion file"
                            string  typestr="filename"
option  "ubiqg"         P   "Ubiquitous gene file (-j and -J refer to connections to ubiq instead of all bridging pairs)"
                            string  typestr="filename"
option  "genet"         c   "Term inclusion file"
                            string  typestr="filename"
option  "genee"         C   "Edge inclusion file"
                            string  typestr="filename"
option  "genep"         l   "Gene inclusion file for positives"
                            string  typestr="filename"
option  "ctxtpos"       q   "Use positive edges between context genes"
                            flag    on
option  "ctxtneg"       Q   "Use negative edges between context genes"
                            flag    on
option  "bridgepos"     j   "Use bridging positives between context and non-context genes"
                            flag    off
option  "bridgeneg"     J   "Use bridging negatives between context and non-context genes"
                            flag    on
option  "outpos"        u   "Use positive edges outside the context"
                            flag    off
option  "outneg"        U   "Use negative edges outside the context"
                            flag    off
option  "weights"       W   "Weight file"
                            string  typestr="filename"
option  "flipneg"       F   "Flip weights(one minus original) for negative standards"
                            flag    on

section "Preprocessing"
option  "normalize"     n   "Normalize scores before processing"
                            flag    off
option  "invert"        t   "Invert correlations to distances"
                            flag    off
option  "abs"           A   "Convert input to its absolute values"
                            float   default="0.0"

section "Optional"
option  "sse"           s   "Calculate sum of squared errors"
                            flag    off
option  "memmap"        p   "Memory map input DABs"
                            flag    off
option  "verbosity"     v   "Message verbosity"
                            int default="5"

| Flag | Default | Type | Description |
|------|---------|------|-------------|
| None | None | Gene text files | If given, contexts in which multiple context-specific evaluations are performed. Each gene set is read, treated as a "term" filter (see Sleipnir::CDat::FilterGenes) on the given answer file, and a context-specific evaluation is saved in the directory -d. |
| -i | stdin | DAT/DAB file | Input DAT, DAB, DAS, or PCL file. |
| -w | None | DAT/DAB file | Functional gold standard for evaluation. Should consist of gene pairs with scores of 0 (unrelated), 1 (related), or missing (NaN). |
| -d | . | Directory | If multiple contexts are being checked, output directory in which individual contexts' score files are placed. |
| -b | 1000 | Integer | If nonzero, number of quantile bins into which input scores are sorted. Each bin is then used as a cutoff for predicted positives and negatives. |
| -f | off | Flag | If on, assume the input predictions contain a small, finite number of distinct values and bin quantiles appropriately. Do not turn -f on when the input actually contains a large number of distinct values. |
| -m | 0 | Float | If -b is zero and -f is off, minimum input score to treat as a positive/negative cutoff. |
| -M | 1 | Float | If -b is zero and -f is off, maximum input score to treat as a positive/negative cutoff. |
| -e | 0.01 | Double | If -b is zero and -f is off, size of step to take for cutoffs between -m and -M. |
| -g | None | Text gene list | If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes. |
| -G | None | Text gene list | If given, use only gene pairs for which neither gene is in the list. For details, see Sleipnir::CDat::FilterGenes. |
| -c | None | Text gene list | If given, use only gene pairs passing a "term" filter against the list. For details, see Sleipnir::CDat::FilterGenes. |
| -C | None | Text gene list | If given, use only gene pairs passing an "edge" filter against the list. For details, see Sleipnir::CDat::FilterGenes. |
| -n | off | Flag | If on, normalize input edges to the range [0,1] before processing. |
| -t | off | Flag | If on, output one minus the input's values. |
| -s | off | Flag | If on, output sum of squared error between input predictions and answer file (assumes a continuous rather than discrete answer file). |
| -p | off | Flag | If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped. |
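
The sum of squared errors reported by -s can be pictured with the short Python sketch below. The function name is hypothetical, and skipping pairs whose prediction or answer is missing (NaN) is an assumption about how missing values are handled.

```python
import math


def sum_squared_error(predictions, answers):
    """Sum (prediction - answer)^2 over pairs where both values are present."""
    total = 0.0
    for predicted, answer in zip(predictions, answers):
        if math.isnan(predicted) or math.isnan(answer):
            continue  # assumed handling: pairs with a missing value are ignored
        total += (predicted - answer) ** 2
    return total
```

Unlike the cutoff-based counts, this measure uses no binning, so it is meaningful when the answer file holds continuous values rather than only 0 and 1.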