## Why?

Probability theory is very important for modern data science and machine learning applications. In many cases it allows one to "open up the black box", shed some light on the model's inner workings and, with luck, find the ingredients needed to turn a poor model into a great one. Without it, a data scientist is very limited in what they can do.

The probability density function (PDF) is a fundamental building block of probability theory. Along with expectation, variance, prior and posterior, it is absolutely necessary for any sort of probabilistic reasoning.
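
To make the link between a PDF and quantities like expectation and variance concrete, here is a minimal sketch. It integrates an assumed example density, a normal distribution N(2, 0.5²), numerically to recover its total mass, mean and variance. The helper names (`normal_pdf`, `integrate`, `moments`) are my own. The midpoint rule stands in for a proper quadrature routine.

```python
import math
from functools import partial


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def moments(pdf, a: float, b: float) -> tuple[float, float, float]:
    """Total probability mass, expectation and variance of a density on [a, b]."""
    total = integrate(pdf, a, b)  # should be ~1 for a valid PDF
    # E[X] = integral of x * p(x) dx
    expectation = integrate(lambda x: x * pdf(x), a, b)
    # Var[X] = integral of (x - E[X])^2 * p(x) dx
    variance = integrate(lambda x: (x - expectation) ** 2 * pdf(x), a, b)
    return total, expectation, variance


# Assumed example density: N(2, 0.5^2); [-10, 14] covers all but a negligible tail.
pdf = partial(normal_pdf, mu=2.0, sigma=0.5)
total, expectation, variance = moments(pdf, -10.0, 14.0)
print(f"total={total:.4f}  E[X]={expectation:.4f}  Var[X]={variance:.4f}")
```

For this density the sketch recovers a mass of 1, a mean of 2 and a variance of 0.25 to four decimal places. Swapping in another density, for example one fitted to your own data, only requires a different `pdf` function.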

Here are some examples from my own experience on StackOverflow where a practical issue boils down to understanding the data distribution:

- Which loss-function is better than MSE in temperature prediction?
- Binary Image Classification with CNN - best practices for choosing “negative” dataset?
- How do neural networks account for outliers?

## When?

The questions above provide some examples. Here are a few more if you're interested (the list is by no means complete):

- What is the 'fundamental' idea of machine learning for estimating parameters?
- Role of Bias in Neural Networks
- How to find probability distribution and parameters for real data? (Python 3)

I personally try to find a probabilistic interpretation whenever possible (choice of loss function, parameters, regularization, architecture, etc.). This way I can move from blind guessing to making reasoned decisions.
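
Here is one concrete case of such a probabilistic interpretation. Minimizing MSE is equivalent to maximum-likelihood estimation under Gaussian noise with a fixed variance. Minimizing MAE corresponds to Laplace noise in the same way. The sketch below is my own illustration, not taken from the linked questions. The temperature readings are made up, and a brute-force grid search stands in for a real optimizer.

```python
import math
from statistics import mean, median


def gaussian_nll(y, y_hat, sigma: float = 1.0) -> float:
    """Negative log-likelihood of y if y = y_hat + Gaussian noise N(0, sigma^2)."""
    sse = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    return 0.5 * len(y) * math.log(2 * math.pi * sigma ** 2) + sse / (2 * sigma ** 2)


def laplace_nll(y, y_hat, b: float = 1.0) -> float:
    """Negative log-likelihood of y if y = y_hat + Laplace noise with scale b."""
    sae = sum(abs(t - p) for t, p in zip(y, y_hat))
    return len(y) * math.log(2 * b) + sae / b


def mse(y, y_hat) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y)


# Hypothetical temperature readings with one outlier (35.0).
y = [20.1, 19.8, 20.3, 20.0, 35.0]
n = len(y)

# With sigma fixed (here 1), the Gaussian NLL is an affine function of MSE:
#   NLL = n/2 * log(2*pi*sigma^2) + n * MSE / (2*sigma^2)
pred = [21.0] * n
assert math.isclose(gaussian_nll(y, pred), 0.5 * n * math.log(2 * math.pi) + n * mse(y, pred) / 2)

# Best constant prediction under each noise model, via a simple grid search.
grid = [15 + 0.01 * i for i in range(2001)]  # candidates 15.00 .. 35.00
best_gauss = min(grid, key=lambda c: gaussian_nll(y, [c] * n))
best_laplace = min(grid, key=lambda c: laplace_nll(y, [c] * n))

print(f"Gaussian NLL / MSE: {best_gauss:.2f}  (sample mean   {mean(y):.2f})")
print(f"Laplace NLL / MAE:  {best_laplace:.2f}  (sample median {median(y):.2f})")
```

With the single outlier, the Gaussian model settles on the mean (23.04) while the Laplace model stays at the median (20.1). This is the kind of reasoning that turns "which loss should I use?" into a question about the noise distribution of the data.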


## Reading

This is very opinion-based, but at least a few books are really worth mentioning: *The Elements of Statistical Learning*, *An Introduction to Statistical Learning: with Applications in R* or *Pattern Recognition and Machine Learning* (if your primary interest is machine learning). That's just a start; there are dozens of books on more specific topics, like computer vision, natural language processing and reinforcement learning.