The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
Probability theory is very important for modern data-science and machine-learning applications because, in many cases, it allows you to "open up the black box", shed some light on the model's inner workings and, with luck, find the ingredients needed to turn a poor model into a great one. Without it, a data scientist is very limited in what they can do.
A PDF (probability density function) is a fundamental building block of probability theory, absolutely necessary for any sort of probabilistic reasoning, along with expectation, variance, prior and posterior, and so on.
Some examples here on StackOverflow, from my own experience, where a practical issue boils down to understanding the data distribution:
The questions above provide some examples; here are a few more if you're interested, and the list is by no means complete:
I personally try to find a probabilistic interpretation whenever possible (choice of loss function, parameters, regularization, architecture, etc.), because this way I can move from blind guessing to making reasonable decisions.
This is very opinion-based, but at least a few books are really worth mentioning: The Elements of Statistical Learning, An Introduction to Statistical Learning: with Applications in R, or Pattern Recognition and Machine Learning (if your primary interest is machine learning). That's just a start; there are dozens of books on more specific topics, like computer vision, natural language processing and reinforcement learning.
Hosmer et al., Applied Logistic Regression. An exhaustive guide to the perils and pitfalls of logistic regression. Logistic regression is the power tool of interpretable statistical models, but if you don't understand it, it will take your foot off (concretely, your inferences will be wrong and your peers will laugh at you). This book is essential. Graduate level, or perhaps advanced undergraduate, intended for STEM and social science grad students.
Peter Christen's Data Matching. Record linkage is a relatively niche topic, so Christen's book has no right to be as good as it is. But it covers every relevant topic in a clear, even-handed way. If you are working on a record linkage system, there's nothing in this book you can afford not to know. Undergraduate level, but intended for industry practitioners.
Max Kuhn's Applied Predictive Modeling. Even if you don't use R, this is an incredibly good introduction to how predictive modeling is done in practice. Early undergraduate level.
The Elements of Statistical Learning. Probably the single most respected book in machine learning. Exhaustive and essential. Advanced undergraduate level.
Kevin Murphy's Machine Learning: A Probabilistic Perspective. Covers lots of the same ground as Elements but is a little easier. Undergraduate level.
Taboga's Lectures on Probability Theory and Mathematical Statistics. Has the distinction of being available for free in a web-friendly format at https://www.amazon.com/Lectures-Probability-Theory-Mathemati...
Russell and Norvig's book is probably the best introduction to "old-fashioned" AI:
GOFAI (good old-fashioned AI) may not have led directly to true AI, but it produced a ton of useful algorithms such as A* and minimax. Although attention has turned to machine learning algorithms (à la https://www.amazon.com/Elements-Statistical-Learning-Predict...), the hybrid of GOFAI and ML has produced some extraordinary results, such as AlphaZero: