Sleipnir
DChecker takes a gold standard answer file and a DAT/DAB of predicted functional relationships (or other interactions) and outputs the information needed for a performance analysis (ROC curve, precision/recall curve, or AUC score) of the given predictions.
DChecker -w <answers.dab> -i <predictions.dab>
Output (to standard output) positive gene counts, true and false positive pair counts, true and false negative pair counts, and an overall AUC score using the default binning of continuous data in predictions.dab and the binary gold standard answers in answers.dab.
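To make the reported quantities concrete, here is a minimal Python sketch of confusion counts at quantile-based cutoffs and a rank-based AUC. It is an illustration, not DChecker's implementation: the function names are hypothetical, and both the rule "score at or above the cutoff counts as a predicted positive" and the way one cutoff is picked per quantile bin are assumptions.

```python
def quantile_cutoffs(scores, bins):
    """Pick one cutoff per quantile bin (assumed: the lowest score in each bin)."""
    ordered = sorted(scores)
    n = len(ordered)
    return sorted({ordered[(i * n) // bins] for i in range(bins)})


def confusion_at_cutoffs(scores, labels, cutoffs):
    """Return (cutoff, TP, FP, TN, FN) for each cutoff.

    Assumption: a pair with score >= cutoff is a predicted positive.
    labels holds 1 (related) or 0 (unrelated) for each pair.
    """
    rows = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        rows.append((c, tp, fp, tn, fn))
    return rows


def auc(scores, labels):
    """Rank-based AUC: probability that a positive pair outscores a negative pair.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: six gene pairs with predicted scores and gold standard labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
for row in confusion_at_cutoffs(scores, labels, quantile_cutoffs(scores, 3)):
    print(row)
print(auc(scores, labels))
```

Each row corresponds to one point on a ROC or precision/recall curve; the AUC summarizes the whole ranking in a single number.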
DChecker -w <answers.dab> -i <predictions.dab> -f
Output true/false positive and negative pair counts assuming that predictions.dab contains only a finite number of different values and using these as bins. This is appropriate for inherently discrete data, e.g. cocluster counts.
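The idea of using the distinct values themselves as bins can be sketched in Python as follows. This is an illustration under assumptions, not DChecker's code: the function names are hypothetical, and a pair is assumed to be a predicted positive when its value is at or above the cutoff.

```python
def finite_cutoffs(values):
    """One cutoff per distinct input value, in ascending order."""
    return sorted(set(values))


def counts_per_cutoff(values, labels):
    """Return {cutoff: (TP, FP, TN, FN)} with one entry per distinct value.

    Assumption: value >= cutoff means the pair is a predicted positive.
    """
    result = {}
    for c in finite_cutoffs(values):
        tp = sum(1 for v, y in zip(values, labels) if v >= c and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= c and y == 0)
        fn = sum(1 for v, y in zip(values, labels) if v < c and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < c and y == 0)
        result[c] = (tp, fp, tn, fn)
    return result


# Toy cocluster counts for six gene pairs and their gold standard labels.
cocluster_counts = [0, 1, 1, 2, 3, 3]
labels = [0, 0, 1, 0, 1, 1]
for cutoff, counts in counts_per_cutoff(cocluster_counts, labels).items():
    print(cutoff, counts)
```

With only four distinct values there are only four cutoffs, which is why this mode suits discrete data and scales poorly when the input has many distinct values.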
DChecker -w <answers.dab> -i <predictions.dab> -b 0 -n -m 0 -M 1 -e 0.01
Output true/false positive and negative pair counts by normalizing the given data predictions.dab to the range [0,1] and creating bin cutoffs at 0.01 increments between 0 and 1. This can provide a finer grained binning than the default -b setting for some prediction/data sets (and a less fine grained binning for others; when in doubt, try both).
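A rough Python sketch of the two steps in this mode, normalization followed by fixed-step cutoffs, is shown below. The function names are hypothetical, min-max scaling is an assumed reading of "normalizing to the range [0,1]", and whether the upper endpoint is itself used as a cutoff is also an assumption.

```python
def normalize(scores):
    """Min-max scale scores to [0, 1] (assumed normalization method)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def step_cutoffs(minimum, maximum, delta):
    """Cutoffs from minimum to maximum in steps of delta.

    Cutoffs are computed as minimum + i * delta rather than by repeated
    addition, so floating point error does not accumulate.
    """
    steps = int(round((maximum - minimum) / delta))
    return [minimum + i * delta for i in range(steps + 1)]


# Mirrors the values of -m 0 -M 1 -e 0.01 in the command above.
cutoffs = step_cutoffs(0.0, 1.0, 0.01)
print(len(cutoffs))
print(normalize([2, 4, 6, 10]))
```

Fixed-step cutoffs are evenly spaced in score, whereas quantile bins are evenly spaced in rank, which is why one can be finer than the other depending on how the scores are distributed.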
DChecker -w <answers.dab> -i <predictions.dab> -c <context.txt>
Output true/false positive and negative pair counts for the given predictions.dab using only the gene pairs relevant to the given biological function context.txt. This is appropriate for evaluating context-specific functional relationship predictions.
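One way to picture which gene pairs count as "relevant" to a context is through the six context switches in the option list (-q, -Q, -j, -J, -u, -U). The Python sketch below uses their listed defaults; the function name and the exact way the switches combine are assumptions for illustration, not a description of Sleipnir::CDat::FilterGenes.

```python
def keep_pair(gene_a, gene_b, is_positive, context,
              ctxtpos=True, ctxtneg=True,      # -q / -Q: both genes in the context
              bridgepos=False, bridgeneg=True,  # -j / -J: exactly one gene in the context
              outpos=False, outneg=False):      # -u / -U: neither gene in the context
    """Decide whether a gold standard pair is used in a context-specific evaluation.

    The keyword defaults copy the flag defaults from the option list;
    the decision logic itself is an assumed interpretation.
    """
    in_context = (gene_a in context) + (gene_b in context)
    if in_context == 2:
        return ctxtpos if is_positive else ctxtneg
    if in_context == 1:
        return bridgepos if is_positive else bridgeneg
    return outpos if is_positive else outneg


# Hypothetical context gene set, as might be read from context.txt.
context = {"YAL001C", "YAL002W"}
print(keep_pair("YAL001C", "YAL002W", True, context))   # both genes in context
print(keep_pair("YAL001C", "YBR100W", True, context))   # bridging positive
print(keep_pair("YAL001C", "YBR100W", False, context))  # bridging negative
print(keep_pair("YBR100W", "YBR101C", False, context))  # outside the context
```

Under these defaults, positives are kept only when both genes belong to the context, while negatives are kept when at least one gene does.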
package "DChecker"
version "1.0"
purpose "Similarity to answer file checker"
section "Main"
option "input" i "Similarity DAT/DAB file"
string typestr="filename" yes
option "answers" w "Answer DAT/DAB file"
string typestr="filename" yes
section "Miscellaneous"
option "directory" d "Output directory"
string typestr="directory" default="."
option "auc" a "Use alternative AUCn calculation"
float default="0"
option "randomize" R "Calculate specified number of randomized scores"
int default="0"
section "Ranking Method"
option "bins" b "Bins for quantile sorting"
int default="1000"
option "finite" f "Count finitely many bins"
flag off
option "min" m "Minimum correlation to process"
float default="0"
option "max" M "Maximum correlation to process"
float default="1"
option "delta" e "Size of correlation bins"
double default="0.01"
section "Learning/Evaluation"
option "genes" g "Gene inclusion file"
string typestr="filename"
option "genex" G "Gene exclusion file"
string typestr="filename"
option "ubiqg" P "Ubiquitous gene file (-j and -J refer to connections to ubiq instead of all bridging pairs)"
string typestr="filename"
option "genet" c "Term inclusion file"
string typestr="filename"
option "genee" C "Edge inclusion file"
string typestr="filename"
option "genep" l "Gene inclusion file for positives"
string typestr="filename"
option "ctxtpos" q "Use positive edges between context genes"
flag on
option "ctxtneg" Q "Use negative edges between context genes"
flag on
option "bridgepos" j "Use bridging positives between context and non-context genes"
flag off
option "bridgeneg" J "Use bridging negatives between context and non-context genes"
flag on
option "outpos" u "Use positive edges outside the context"
flag off
option "outneg" U "Use negative edges outside the context"
flag off
option "weights" W "Weight file"
string typestr="filename"
option "flipneg" F "Flip weights(one minus original) for negative standards"
flag on
section "Preprocessing"
option "normalize" n "Normalize scores before processing"
flag off
option "invert" t "Invert correlations to distances"
flag off
option "abs" A "Convert input to its absolute values"
float default="0.0"
section "Optional"
option "sse" s "Calculate sum of squared errors"
flag off
option "memmap" p "Memory map input DABs"
flag off
option "verbosity" v "Message verbosity"
int default="5"
Flag | Default | Type | Description |
---|---|---|---|
None | None | Gene text files | If given, contexts in which multiple context-specific evaluations are performed. Each gene set is read, treated as a "term" filter (see Sleipnir::CDat::FilterGenes) on the given answer file, and a context-specific evaluation is saved in the directory given by -d. |
-i | stdin | DAT/DAB file | Input DAT, DAB, DAS, or PCL file. |
-w | None | DAT/DAB file | Functional gold standard for learning. Should consist of gene pairs with scores of 0 (unrelated), 1 (related), or missing (NaN). |
-d | . | Directory | If multiple contexts are being checked, output directory in which individual contexts' score files are placed. |
-b | 1000 | Integer | If nonzero, number of quantile bins into which input scores are sorted. Each bin is then used as a cutoff for predicted positives and negatives. |
-f | off | Flag | If on, assume the input predictions contain a small, finite number of distinct values and bin quantiles appropriately. Bad things will happen if -f is on and there are actually a large number of distinct input values. |
-m | 0 | Float | If -b is zero and -f is off, minimum input score to treat as a positive/negative cutoff. |
-M | 1 | Float | If -b is zero and -f is off, maximum input score to treat as a positive/negative cutoff. |
-e | 0.01 | Double | If -b is zero and -f is off, size of step to take for cutoffs between -m and -M. |
-g | None | Text gene list | If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes. |
-G | None | Text gene list | If given, use only gene pairs for which neither gene is in the list. For details, see Sleipnir::CDat::FilterGenes. |
-c | None | Text gene list | If given, use only gene pairs passing a "term" filter against the list. For details, see Sleipnir::CDat::FilterGenes. |
-C | None | Text gene list | If given, use only gene pairs passing an "edge" filter against the list. For details, see Sleipnir::CDat::FilterGenes. |
-n | off | Flag | If on, normalize input edges to the range [0,1] before processing. |
-t | off | Flag | If on, output one minus the input's values. |
-s | off | Flag | If on, output sum of squared error between input predictions and answer file (assumes a continuous rather than discrete answer file). |
-p | off | Flag | If given, memory map the input files when possible. DAT and PCL inputs cannot be memmapped. |
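For the -s option, the sum of squared errors between predictions and a continuous answer file can be sketched in Python as below. The function name is hypothetical, and skipping pairs whose prediction or answer is missing (NaN) is an assumption rather than documented behavior.

```python
import math


def sum_squared_error(predictions, answers):
    """Sum of (prediction - answer)^2 over gene pairs present in both inputs.

    Assumption: pairs with a missing (NaN) value on either side are skipped.
    """
    total = 0.0
    for p, a in zip(predictions, answers):
        if math.isnan(p) or math.isnan(a):
            continue
        total += (p - a) ** 2
    return total


# Three gene pairs; the third has no gold standard value.
predictions = [0.9, 0.2, 0.5]
answers = [1.0, 0.0, float("nan")]
print(sum_squared_error(predictions, answers))
```

Unlike the cutoff-based counts, this measure compares values directly, which is why it presumes a continuous rather than binary answer file.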