A lot of this was covered in high school, but I’d forgotten quite a bit…

Conditional probability P(A|B) means the probability of event A occurring given that event B has occurred: P(A|B) = P(A ∩ B) / P(B).
Bayes’ Theorem

Bayes' theorem provides a way to update the probability of B when new information A is given: P(B|A) = P(A|B) P(B) / P(A). In other words, the conditional probability P(B|A) can be computed from the prior P(B).
Bayes’ Theorem Example

- D: Newly observed data
- Theta: Hypothesis, the event being modeled, the parameter to be estimated
- Posterior distribution P(Theta|D): The probability that Theta holds given that D was observed. Called "posterior" because it comes after observing the data.
- Prior distribution P(Theta): The probability of Theta before D is observed. A pre-assumed parameter or probability distribution.
- Likelihood P(D|Theta): Appears in the numerator of Bayes' theorem; the probability of observing D under the hypothesis Theta.
- Evidence P(D): The denominator of Bayes' theorem; the distribution of the data itself.
Bayes’ Theorem Example (COVID-99)
COVID-99 has an incidence rate of 10%. When actually infected with COVID-99, the detection probability is 99%. When not actually infected with COVID-99, the false detection probability is 1%. Given a positive test result, what is the probability of actually being infected with COVID-99?

Define Theta as the COVID-99 infection event (not directly observable) and D as the positive test result.

From the problem statement, the prior is P(Theta) = 0.1, and the likelihoods are P(D|Theta) = 0.99 and P(D|not-Theta) = 0.01.

To apply Bayes' theorem we first compute the evidence: multiply each likelihood by the probability of its hypothesis and sum, P(D) = P(D|Theta)P(Theta) + P(D|not-Theta)P(not-Theta) = 0.99 × 0.1 + 0.01 × 0.9 = 0.108. The posterior probability of actually being infected given a positive test is then P(Theta|D) = 0.099 / 0.108 ≈ 91.7%.
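The calculation can be sketched in a few lines of Python (the function name `posterior` is just illustrative):

```python
def posterior(prior, sensitivity, false_positive):
    """Bayes' theorem: P(theta|D) given a positive test result D."""
    # Evidence P(D) = P(D|theta)P(theta) + P(D|not-theta)P(not-theta)
    evidence = sensitivity * prior + false_positive * (1 - prior)
    # Posterior = likelihood * prior / evidence
    return sensitivity * prior / evidence

# COVID-99: 10% incidence, 99% detection, 1% false detection
p = posterior(prior=0.1, sensitivity=0.99, false_positive=0.01)
print(round(p, 3))  # 0.917
```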
Visualizing Conditional Probability

- True Positive: Classified as positive and actually positive.
- True Negative: Classified as negative and actually negative.
- False Positive (false alarm, Type I error): Classified as positive but actually negative.
- False Negative (miss, Type II error): Classified as negative but actually positive.
- Precision, the posterior P(Theta|D), is determined in part by the prior probability P(Theta): the same test performance yields a different posterior under a different prior.
- Bayesian statistics can’t be applied without a prior probability.
- If the prior is unknown, it can be set arbitrarily, but credibility drops significantly.

Precision is the posterior probability of the positive class: Precision = P(Theta|D) = TP / (TP + FP).
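As a sketch, the four counts for 1000 hypothetical test subjects under the COVID-99 numbers give precision and recall (the counts and function names are illustrative):

```python
def precision(tp, fp):
    # P(actually positive | classified positive)
    return tp / (tp + fp)

def recall(tp, fn):
    # P(classified positive | actually positive)
    return tp / (tp + fn)

# 1000 people: 100 infected (10% prior), 99 of them detected (99% sensitivity),
# and 9 of the 900 healthy people falsely flagged (1% false alarm rate).
tp, fn, fp, tn = 99, 1, 9, 891
print(round(precision(tp, fp), 3))  # 0.917, which equals the posterior P(Theta|D)
print(recall(tp, fn))               # 0.99, the test's sensitivity
```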
Application of Conditional Probability
For example, consider a cancer detection problem. In this case, reducing Type II errors is critical. A Type II error is when a cancer patient is classified as not having cancer.
So when balancing Type I and Type II errors, Type II errors require more attention.
Updating Information Through Bayes’ Theorem

The posterior probability from the previous step can be used as the prior probability for the next step.
Usage Example

In COVID-99 testing, if the false detection probability is 10% instead of 1%, the first test's posterior probability of infection is only 52.4%. Testing the same person a second time and getting another positive result raises it to 91.7%.
This is an example of using the posterior probability from the previous step (52.4%) as the prior probability for the next step.
Interpreting Causality
Conditional probability alone shouldn't be carelessly used to explain causal relationships. No matter how much data accumulates, conditional probability by itself cannot establish causation: an observed association may happen to reflect a causal mechanism, but there is never a guarantee. Establishing causality requires additional analysis, such as identifying and removing confounding factors.
Robust Models Using Causality
Typical model results look like this:
- Conditional probability-based prediction model (99% accuracy)
    - Existing scenario (95% accuracy)
    - Changed scenario (72% accuracy)
- Causality-based prediction model (85% accuracy)
    - Existing scenario (83% accuracy)
    - Changed scenario (82% accuracy)
Models using only conditional probability typically guarantee high accuracy for expected scenarios. But when data distributions change significantly, accuracy drops sharply.
Models considering only causality don’t guarantee high accuracy. But they’re robust to changes.
Causality
Used when building prediction models robust to changes in data distribution. 
To infer causal effects, the influence of the confounding factor Z, which affects both the treatment T and the result R, must be removed. If Z isn't removed, spurious correlations appear.
Causality Inference Example

For example, consider analyzing kidney stone treatment results for treatments a and b. Within each subgroup (small stones and large stones), treatment a has the higher cure rate, yet treatment b has the higher overall cure rate. This is Simpson's paradox.
This can't be resolved through conditional probability alone. The confounding effect of kidney stone size (the more severe large-stone cases are disproportionately assigned treatment a) must be removed to properly analyze the actual cure rate.
Removing Z’s Influence
The intervention do(T=a) removes Z's influence by averaging the subgroup cure rates over the population distribution of Z rather than over each treatment's own patient mix: P(R|do(T=a)) = Σ_z P(R|T=a, Z=z) P(Z=z).
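As a sketch using the widely cited (cured, total) counts from the classic kidney-stone study (the exact numbers are an assumption here, not given in the text above):

```python
# (cured, total) per (treatment, stone size), classic kidney-stone figures
data = {
    ("a", "small"): (81, 87),   ("a", "large"): (192, 263),
    ("b", "small"): (234, 270), ("b", "large"): (55, 80),
}

def overall_rate(t):
    # Naive conditional probability P(R=1 | T=t): pools each treatment's
    # own patient mix, so the confounder Z (stone size) leaks in.
    cured = sum(c for (tt, _), (c, n) in data.items() if tt == t)
    total = sum(n for (tt, _), (c, n) in data.items() if tt == t)
    return cured / total

def do_rate(t):
    # Adjustment formula P(R=1 | do(T=t)) = sum_z P(R=1 | t, z) P(z):
    # weight each subgroup's cure rate by the population share of z.
    pop = sum(n for _, (c, n) in data.items())
    result = 0.0
    for z in ("small", "large"):
        p_z = sum(n for (_, zz), (c, n) in data.items() if zz == z) / pop
        c, n = data[(t, z)]
        result += (c / n) * p_z
    return result

print(f"{overall_rate('a'):.3f} vs {overall_rate('b'):.3f}")  # b looks better
print(f"{do_rate('a'):.3f} vs {do_rate('b'):.3f}")            # a is actually better
```

The naive comparison favors b, while the do-intervention, which removes Z's influence, reveals that a has the higher cure rate in both subgroups.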