Sleipnir
BNTruster evaluates the "influence" each node in a Bayesian network has on the posterior classification probability. This can be used to calculate the weight of each dataset during Bayesian integration.
BNTruster offers a collection of ways of evaluating the weight of each node in a Bayesian network. These methods are all variations on determining how much influence each node has on the classification posterior, i.e. how "different" each dataset's probability distributions are for the different class values. Most commonly, this is used to answer the question, "Given a Bayesian classifier that I've learned to integrate many datasets, how much weight is given to each dataset?" If the classifier(s) being evaluated are context-specific, this provides a measure of the functional activity of each dataset within each biological context.
The most principled trust calculation provides, for each dataset, the average weighted change in posterior probability of functional relationship over all values in that dataset. For a dataset D taking one or more discretized values d, this can be written as:
trust(D) = sum( P(D = d) * |P(FR) - P(FR|D = d)|, d in D )
That is, the trust or "influence" of dataset D is the sum, over all of its values, of the probability of observing that value times the absolute difference between the prior and the posterior given that value.
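The conditional probability tables of a naive Bayesian classifier store P(D = d|FR) rather than the quantities that appear in the trust formula. As a sketch of the missing step, assuming FR is the only parent of each dataset node, P(D = d) follows by marginalizing over FR and P(FR|D = d) follows from Bayes' rule:

```latex
\begin{align*}
\mathrm{trust}(D) &= \sum_{d \in D} P(D = d)\,\bigl|P(FR) - P(FR \mid D = d)\bigr| \\
P(D = d) &= P(D = d \mid FR)\,P(FR) + P(D = d \mid \neg FR)\,P(\neg FR) \\
P(FR \mid D = d) &= \frac{P(D = d \mid FR)\,P(FR)}{P(D = d)}
\end{align*}
```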
Suppose we have two datasets: a microarray dataset MA quantized into five bins and a physical binding dataset PB quantized into two bins. We've learned two context-specific naive Bayesian classifiers for these datasets using BNCreator, one for translation (translation.xdsl) and one for MAPK signaling (mapk.xdsl).
These networks have the following conditional probability tables:
Translation

Value | FR (prior) | MA given FR 0 (No) | MA given FR 1 (Yes) | PB given FR 0 (No) | PB given FR 1 (Yes) |
---|---|---|---|---|---|
0 | 0.9 | 0.1 | 0.05 | 0.4 | 0.3 |
1 | 0.1 | 0.3 | 0.2 | 0.6 | 0.7 |
2 | | 0.4 | 0.3 | | |
3 | | 0.15 | 0.3 | | |
4 | | 0.05 | 0.15 | | |
MAPK Signaling

Value | FR (prior) | MA given FR 0 (No) | MA given FR 1 (Yes) | PB given FR 0 (No) | PB given FR 1 (Yes) |
---|---|---|---|---|---|
0 | 0.95 | 0.1 | 0.1 | 0.9 | 0.2 |
1 | 0.05 | 0.3 | 0.3 | 0.1 | 0.8 |
2 | | 0.35 | 0.3 | | |
3 | | 0.15 | 0.2 | | |
4 | | 0.1 | 0.1 | | |
First, we find the trust of each dataset in the translation-specific network:
trust(MA) = sum( P(MA = d) * |P(FR) - P(FR|MA = d)|, d = 0 to 4 )
          = P(MA = 0) * |P(FR) - P(FR|MA = 0)| + ... + P(MA = 4) * |P(FR) - P(FR|MA = 4)|
          = 0.095*|0.1 - 0.053| + 0.29*|0.1 - 0.069| + 0.39*|0.1 - 0.077| + 0.165*|0.1 - 0.182| + 0.06*|0.1 - 0.25|
          = 0.045
trust(PB) = sum( P(PB = d) * |P(FR) - P(FR|PB = d)|, d = 0 to 1 )
          = P(PB = 0) * |P(FR) - P(FR|PB = 0)| + P(PB = 1) * |P(FR) - P(FR|PB = 1)|
          = 0.39*|0.1 - 0.077| + 0.61*|0.1 - 0.115|
          = 0.018
In the context of translation, the microarray data is somewhat more informative. However, in the MAPK signaling network:
trust(MA) = sum( P(MA = d) * |P(FR) - P(FR|MA = d)|, d = 0 to 4 )
          = P(MA = 0) * |P(FR) - P(FR|MA = 0)| + ... + P(MA = 4) * |P(FR) - P(FR|MA = 4)|
          = 0.1*|0.05 - 0.05| + 0.3*|0.05 - 0.05| + 0.348*|0.05 - 0.043| + 0.153*|0.05 - 0.066| + 0.1*|0.05 - 0.05|
          = 0.0049
trust(PB) = sum( P(PB = d) * |P(FR) - P(FR|PB = d)|, d = 0 to 1 )
          = P(PB = 0) * |P(FR) - P(FR|PB = 0)| + P(PB = 1) * |P(FR) - P(FR|PB = 1)|
          = 0.865*|0.05 - 0.012| + 0.135*|0.05 - 0.296|
          = 0.066
Unsurprisingly (since the example was completely cooked), the process of MAPK signaling is very active in our physical binding dataset.
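The hand calculation above can be checked with a short Python sketch. This is not BNTruster's implementation; the function and variable names are hypothetical, and the conditional probability tables are simply copied from the example. Because the worked example rounds intermediate values, the unrounded results for the MAPK network differ slightly in the last digit (0.00475 and 0.0665 rather than 0.0049 and 0.066).

```python
def posterior_trust(prior_fr, cpt_no, cpt_yes):
    """Return sum over bins d of P(D = d) * |P(FR) - P(FR | D = d)|.

    prior_fr   -- P(FR = 1)
    cpt_no[d]  -- P(D = d | FR = 0)
    cpt_yes[d] -- P(D = d | FR = 1)
    """
    trust = 0.0
    for p_no, p_yes in zip(cpt_no, cpt_yes):
        p_d = (1.0 - prior_fr) * p_no + prior_fr * p_yes  # marginal P(D = d)
        if p_d == 0.0:
            continue  # a bin that is never observed contributes nothing
        posterior = prior_fr * p_yes / p_d  # Bayes' rule: P(FR | D = d)
        trust += p_d * abs(prior_fr - posterior)
    return trust


# Example CPTs: each dataset maps to (P(D | FR = 0), P(D | FR = 1)).
NETWORKS = {
    "translation": {
        "prior": 0.1,
        "MA": ([0.1, 0.3, 0.4, 0.15, 0.05], [0.05, 0.2, 0.3, 0.3, 0.15]),
        "PB": ([0.4, 0.6], [0.3, 0.7]),
    },
    "mapk": {
        "prior": 0.05,
        "MA": ([0.1, 0.3, 0.35, 0.15, 0.1], [0.1, 0.3, 0.3, 0.2, 0.1]),
        "PB": ([0.9, 0.1], [0.2, 0.8]),
    },
}


def example_trusts():
    """Compute the posterior trust of every dataset in every example network."""
    results = {}
    for name, net in NETWORKS.items():
        for dataset in ("MA", "PB"):
            cpt_no, cpt_yes = net[dataset]
            results[(name, dataset)] = posterior_trust(net["prior"], cpt_no, cpt_yes)
    return results


if __name__ == "__main__":
    for (name, dataset), value in example_trusts().items():
        print(f"{name}\t{dataset}\t{value:.4g}")
```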
To have BNTruster do all of this hard work for you, just run:
BNTruster translation.xdsl mapk.xdsl
Other trust calculations include sums, calculated as:
trust(D) = sum( |P(D = d|FR) - P(D = d|~FR)|, d in D )
and ratios, calculated as:
trust(D) = log( prod( max{P(D = d|FR), P(D = d|~FR)} / min{P(D = d|FR), P(D = d|~FR)}, d in D ) )
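As an illustration of these two alternatives, the following Python sketch evaluates both formulas for a single dataset. The function names are hypothetical, the natural logarithm is an assumption (the base is not stated here), and the ratio form is undefined when any conditional probability is zero; the sketch does not attempt to reproduce how BNTruster handles that case.

```python
import math


def sums_trust(cpt_no, cpt_yes):
    """Sum over bins of |P(D = d | FR) - P(D = d | ~FR)|."""
    return sum(abs(p_yes - p_no) for p_no, p_yes in zip(cpt_no, cpt_yes))


def ratios_trust(cpt_no, cpt_yes):
    """Log of the product over bins of max/min of the two conditionals.

    The log of a product is computed as a sum of logs for numerical stability.
    Natural log is assumed; a zero probability raises ZeroDivisionError.
    """
    return sum(
        math.log(max(p_no, p_yes) / min(p_no, p_yes))
        for p_no, p_yes in zip(cpt_no, cpt_yes)
    )


# Microarray dataset MA from the translation example:
# P(MA | FR = 0) and P(MA | FR = 1) over the five bins.
MA_NO = [0.1, 0.3, 0.4, 0.15, 0.05]
MA_YES = [0.05, 0.2, 0.3, 0.3, 0.15]

if __name__ == "__main__":
    print("sums:  ", sums_trust(MA_NO, MA_YES))
    print("ratios:", ratios_trust(MA_NO, MA_YES))
```

For these values the per-bin differences are 0.05, 0.1, 0.1, 0.15, and 0.1, so the sums score is 0.5; the per-bin ratios are 2, 1.5, 4/3, 2, and 3, whose product is 24, so the ratios score is log(24). Unlike the posteriors method, neither score depends on the prior P(FR).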
Finally, trust scores can also be calculated for individual bins (i.e. dataset values). These are only available when using the posteriors trust calculation method, and the value represents the fraction of possible change away from prior. Positive values represent an increase from prior to posterior, and negative values a decrease. That is:
trust(d) = ( P(FR|D = d) - P(FR) ) / ( ( P(FR|D = d) > P(FR) ) ? P(~FR) : P(FR) )
Note that this does not incorporate the probability of observing d, i.e. P(D = d).
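A minimal Python sketch of the per-bin score, using the translation network's microarray dataset from the example. The function name is hypothetical and this is not BNTruster's own code; it only restates the formula. An increase is scaled by P(~FR), the largest possible rise from the prior, and a decrease by P(FR), the largest possible fall, so each score lies between -1 and 1.

```python
def bin_trusts(prior_fr, cpt_no, cpt_yes):
    """Return, for each bin d, the fraction of possible change away from the prior.

    prior_fr   -- P(FR = 1)
    cpt_no[d]  -- P(D = d | FR = 0)
    cpt_yes[d] -- P(D = d | FR = 1)
    Assumes every bin has nonzero probability P(D = d).
    """
    scores = []
    for p_no, p_yes in zip(cpt_no, cpt_yes):
        p_d = (1.0 - prior_fr) * p_no + prior_fr * p_yes  # marginal P(D = d)
        posterior = prior_fr * p_yes / p_d                # P(FR | D = d)
        # Scale by the room available in the direction of the change.
        room = (1.0 - prior_fr) if posterior > prior_fr else prior_fr
        scores.append((posterior - prior_fr) / room)
    return scores


# Translation example: P(FR) = 0.1, microarray dataset MA with five bins.
MA_SCORES = bin_trusts(0.1, [0.1, 0.3, 0.4, 0.15, 0.05], [0.05, 0.2, 0.3, 0.3, 0.15])

if __name__ == "__main__":
    for d, score in enumerate(MA_SCORES):
        print(f"MA = {d}: {score:+.3f}")
```

Here bins 0 through 2 lower the posterior below the prior and receive negative scores, while bins 3 and 4 raise it and receive positive scores.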
BNTruster <bayesnets.xdsl>*
Writes trust scores (by default, weighted average of difference in posterior) to standard output for all of the (X)DSL files given on the command line.
package "BNTruster"
version "1.0"
purpose "Bayes net dataset functional activity evaluation"
section "Optional"
option "type" t "Trust calculation type"
values="posteriors","sums","ratios" default="posteriors"
option "bins" b "Output individual bins"
flag off
option "threads" d "Number of simultaneous threads to use"
int default="1"
option "verbosity" v "Message verbosity"
int default="5"
Flag | Default | Type | Description |
---|---|---|---|
None | None | (X)DSL files | Bayesian networks in which each node's influence on the posterior should be evaluated. |
-t | posteriors | posteriors, sums, or ratios | Type of trust score to calculate, as described above. |
-b | off | Flag | If on, output individual bins' influence scores as described above. |
-d | 1 | Integer | Number of simultaneous threads to use for posterior trust calculations. Threads are per-(X)DSL, so the number of threads actually used is the minimum of -d and the number of input files. |
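
To show how the options combine, here is a small Python sketch that assembles a BNTruster command line from the flags in the table. The helper name `bntruster_command` is hypothetical, and the sketch assumes that a `BNTruster` executable is on the PATH and that options may precede the input files. It only builds and prints the command; it does not run it, and no particular output format is assumed.

```python
import shlex

TRUST_TYPES = ("posteriors", "sums", "ratios")


def bntruster_command(networks, trust_type="posteriors", bins=False, threads=1,
                      executable="BNTruster"):
    """Build an argument list for BNTruster from the documented flags."""
    if trust_type not in TRUST_TYPES:
        raise ValueError(f"trust_type must be one of {TRUST_TYPES}")
    if bins and trust_type != "posteriors":
        # Per-bin scores are only available with the posteriors method.
        raise ValueError("bins requires trust_type='posteriors'")
    command = [executable, "-t", trust_type, "-d", str(threads)]
    if bins:
        command.append("-b")
    command.extend(networks)
    return command


if __name__ == "__main__":
    cmd = bntruster_command(["translation.xdsl", "mapk.xdsl"], bins=True, threads=2)
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd) to actually execute it
```

Since threads are per-(X)DSL, asking for more threads than there are input files has no additional effect.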