Sleipnir
MEFIT performs all steps necessary to produce context-specific predicted functional relationship networks (DAT/DAB files) from input microarray PCL files, as described in Huttenhower et al. 2006. It essentially combines the work performed by Answerer, Distancer, BNCreator, and BNTruster into a single tool.
MEFIT -r <related_dir> -u <unrelated.txt> -b <bins.quant> -o <learned_dir> -O <learned.xdsl> -p <predictions_dir> -t <trusts.txt> <data.pcl>*
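As a rough illustration, the usage line above can be assembled from Python before being handed to a process launcher. This is only a sketch: the helper name build_mefit_command is hypothetical, the file and directory names are the placeholders from the usage line, and nothing here runs MEFIT itself.

```python
import shlex


def build_mefit_command(related_dir, unrelated, bins, learned_dir,
                        global_net, predictions_dir, trusts, pcls):
    """Return a MEFIT argument list mirroring the usage line.

    Hypothetical helper: it only arranges arguments and does not run MEFIT.
    """
    return [
        "MEFIT",
        "-r", related_dir,        # directory of related gene lists
        "-u", unrelated,          # unrelated gene pairs
        "-b", bins,               # QUANT bin cutoffs
        "-o", learned_dir,        # per-function Bayesian networks
        "-O", global_net,         # global Bayesian network
        "-p", predictions_dir,    # predicted relationship networks
        "-t", trusts,             # trust scores
        *pcls,                    # input PCL files
    ]


if __name__ == "__main__":
    argv = build_mefit_command(
        "related_dir", "unrelated.txt", "bins.quant", "learned_dir",
        "learned.xdsl", "predictions_dir", "trusts.txt",
        ["data1.pcl", "data2.pcl"],  # placeholder PCL names
    )
    # Print a shell-quoted command line for inspection.
    print(shlex.join(argv))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if any path contains spaces.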
First, construct a gold standard by reading related gene set text files from related_dir and unrelated gene pairs from unrelated.txt; any two genes coannotated to some set in related_dir are considered functionally related, and any gene pair listed in unrelated.txt is considered to be unrelated. Next, compute normalized pairwise correlations for all genes in the given data.pcl microarray data files and discretize these scores based on the QUANT file bins.quant. Learn a global Bayesian classifier learned.xdsl and context-specific classifiers for each positive gene set, stored in learned_dir. Finally, save a table of functional activity scores for each dataset in each context in trusts.txt, and save context-specific predicted functional relationship networks in predictions_dir.
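The gold standard rule described above (coannotated pairs are related, listed pairs are unrelated) can be sketched in a few lines of Python. This is a minimal illustration, not MEFIT's own code: the function name build_gold_standard and the gene IDs are hypothetical, and letting a coannotation override an "unrelated" listing is an assumption made here for definiteness.

```python
from itertools import combinations


def build_gold_standard(related_sets, unrelated_pairs):
    """Map each unordered gene pair to 1 (related) or 0 (unrelated).

    related_sets: iterable of gene collections, one per file in related_dir.
    unrelated_pairs: iterable of (gene_a, gene_b) tuples from unrelated.txt.
    """
    answers = {}
    for a, b in unrelated_pairs:
        if a != b:
            answers[frozenset((a, b))] = 0
    # Assumption: a coannotation overrides an "unrelated" listing.
    for genes in related_sets:
        for a, b in combinations(sorted(set(genes)), 2):
            answers[frozenset((a, b))] = 1
    return answers


if __name__ == "__main__":
    gold = build_gold_standard(
        related_sets=[["G1", "G2", "G3"], ["G3", "G4"]],  # placeholder IDs
        unrelated_pairs=[("G1", "G4")],
    )
    for pair, label in sorted((sorted(p), l) for p, l in gold.items()):
        print(pair, label)
```

Pairs that appear in neither input stay out of the dictionary, which matches the idea that they carry no gold-standard label.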
package "MEFIT"
version "1.2"
purpose "Microarray Expression Functional Integration Technique (Huttenhower et al,
Bioinformatics 2006)
MEFIT takes as input:
1. A collection of microarray data sets (PCL files provided on the command line)
2. A collection of known biological functions (lists of related genes provided
using the -r flag)
3. A collection of known unrelated gene pairs (provided using the -u flag)
It produces as output:
1. A global Bayesian network learned by considering all of the data sets
independently of biological function (specified using the -O flag)
2. One Bayesian network per biological function (placed in the directory
specified by the -o flag)
3. Predicted probabilities of functional relationships within each biological
function of interest (placed in the directory specified by the -p flag)
4. Trust scores for each input data set and function indicating how
predictive a data set is within a function (specified by the -t flag)"
section "Inputs"
option "related" r "Directory containing lists of known related genes"
string typestr="directory" yes
option "unrelated" u "List of known unrelated gene pairs"
string typestr="filename" yes
option "distance" d "Similarity measure"
values="pearson","euclidean","kendalls","kolm-smir",
"spearman","pearnorm" default="pearnorm"
option "bins" b "Tab separated QUANT bin cutoffs"
string typestr="filename"
section "Outputs"
option "output" o "Directory to contain learned per-function Bayesian networks"
string typestr="directory" yes
option "global" O "Global learned Bayesian network"
string typestr="filename" yes
option "predictions" p "Directory to contain predicted probabilities of functional relationship"
string typestr="directory" yes
option "trusts" t "Trust scores learned per data set and function"
string typestr="filename" yes
section "Learning/Evaluation/Features"
option "genes" g "Subset of genes to include in evaluation"
string typestr="filename"
option "genex" G "Subset of genes to exclude from evaluation"
string typestr="filename"
option "zero" z "Zero missing values"
flag off
option "cutoff" c "Include only confidences above cutoff"
double default="0"
option "skip" s "Additional columns to skip in input PCLs"
int default="2"
option "xdsl" x "Output XDSL files in place of DSLs"
flag on
option "dab" a "Output DAB files in place of DATs"
flag on
option "random" R "Seed random generator"
int default="0"
option "verbosity" v "Message verbosity"
int default="5"
Flag | Default | Type | Description |
---|---|---|---|
None | None | PCL text files | Microarray datasets which will be integrated by MEFIT. Each dataset will correspond to one node in each of the learned Bayesian classifiers and will be assigned a trust score in each biological context. All input PCLs must have the same number of skip columns (-s). |
-r | None | Directory | Input directory containing related (positive) gene lists. Each gene list is a text file containing one systematic gene ID per line (see Answerer). |
-u | None | Gene pair text file | Input tab-delimited text file containing two columns; each line is a gene pair which is known to be functionally unrelated (e.g. annotated to two different Gene Ontology terms; see Answerer). |
-d | pearnorm | pearnorm, pearson, euclidean, kendalls, kolm-smir, or spearman | Similarity measure to be used for converting microarray data into pairwise similarity scores. pearnorm is the recommended Fisher's z-transformed Pearson correlation. |
-b | None | QUANT text file | Input tab-delimited QUANT file containing exactly one line of bin edges; these are used to discretize pairwise similarity scores. For details, see Sleipnir::CDataPair. |
-o | None | Directory | Output directory in which learned context-specific Bayesian classifiers are saved as (X)DSL files (see BNCreator). |
-O | None | (X)DSL file | Output file in which the learned global (non-context-specific) Bayesian classifier is saved (see BNCreator). |
-p | None | Directory | Directory in which predicted context-specific functional relationships (DAT/DAB files) are saved (see BNCreator). |
-t | None | PCL text file | Output PCL file in which dataset/context functional activity scores are saved (see BNTruster). |
-g | None | Text gene list | If given, use only gene pairs for which both genes are in the list. For details, see Sleipnir::CDat::FilterGenes. |
-G | None | Text gene list | If given, use only gene pairs for which neither gene is in the list. For details, see Sleipnir::CDat::FilterGenes. |
-z | off | Flag | If on, assume that all missing gene pairs in all datasets have a value of 0 (i.e. the first bin). |
-c | None | Double | If given, remove all input edges below the given cutoff (after optional normalization). |
-s | 2 | Integer | Number of columns to skip between the initial ID column and the first experimental (data) column in the input PCL. |
-x | on | Flag | If on, assume XDSL files will be used instead of DSL files. |
-a | on | Flag | If on, output DAB files instead of DAT files. |
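To make the -b and -z rows more concrete, the following Python sketch discretizes similarity scores against one line of QUANT bin edges. It is an illustration under stated assumptions rather than Sleipnir's implementation: the function names and the example edges are hypothetical, and the binning rule (a score goes to the first bin whose edge is greater than or equal to it, with larger scores clamped to the last bin) is assumed here; Sleipnir::CDataPair remains the authoritative description.

```python
from bisect import bisect_left


def parse_quant_line(line):
    """Parse the single tab-separated line of bin edges in a QUANT file."""
    return [float(tok) for tok in line.strip().split("\t")]


def discretize(score, edges, zero_missing=False):
    """Return the bin index for a score, or None if it is missing.

    Assumption: a score falls in the first bin whose edge is >= score,
    and scores above the last edge fall in the last bin.
    """
    if not edges:
        raise ValueError("QUANT line must contain at least one bin edge")
    if score is None:
        # Mirrors the -z idea: treat missing pairs as the first bin.
        return 0 if zero_missing else None
    return min(bisect_left(edges, score), len(edges) - 1)


if __name__ == "__main__":
    edges = parse_quant_line("-1.5\t-0.5\t0.5\t1.5\t2.5\n")  # example edges
    for s in (-2.0, 0.0, 3.0, None):
        print(s, discretize(s, edges), discretize(s, edges, zero_missing=True))
```

With five edges there are five bins, indexed 0 to 4, so each dataset node in the learned classifiers would take one of five discrete values under this sketch.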