Pattern Recognition and Machine Learning (Information Science and Statistics)

Category: Computer Science
Author: Christopher M. Bishop

About This Book

This is the first text on pattern recognition to present the Bayesian viewpoint, one that has become increasingly popular in the last five years.

It presents approximate inference algorithms that give fast approximate answers in situations where exact answers are not feasible. It is the first text to use graphical models to describe probability distributions; no other book applies graphical models to machine learning. It is also the first four-color book on pattern recognition.

The book is suitable for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. Extensive support is provided for course instructors, including more than 400 exercises, graded according to difficulty. Example solutions for a subset of the exercises are available from the book web site, while solutions for the remainder can be obtained by instructors from the publisher.


by majordyson   2019-11-17

Having done an MEng at Oxford where I dabbled in ML, the 3 key texts that came up as references in a lot of lectures were these:

Pattern Recognition and Machine Learning (Information Science and Statistics)

Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning Series)

(Pretty sure Murphy was one of our lecturers actually?)

Bayesian Reasoning and Machine Learning

There were ofc others, and plenty of other sources and references too, but you can't go buying dozens of text books, not least cuz they would repeat the same things. If you need some general maths reading too then pretty much all the useful (non specialist) maths we used for 4 years is all in this: Advanced Engineering Mathematics

by adomian   2019-11-17

If you're worried about not doing projects and participating in Kaggle competitions, why not do those things? They're pretty low risk and high reward. If you're feeling shaky on the theory, read papers and reference textbooks, take notes, and implement things that interest you. For deep learning stuff there are some good resources here: For more traditional methods you can't go wrong with Chris Bishop's book (try googling it for a cheaper alternative to Amazon ;):

Side projects can really help here, and the key is to use references, but don't just copy-paste. Think of something you'd like to apply machine learning to with a reasonable scope. Search Google Scholar/arXiv for papers that do this or something similar, read them, and learn the techniques.

For reading research papers in an area where you're not extremely knowledgeable, use the references in the text or google things you don't know, and make sure you understand them well enough that you could teach someone else. If you're interested in the topic and have exhausted the references, go up the tree and use Google Scholar to find papers that list the one you're reading as a reference; you usually find interesting applications or improvements on the technique.

You can also often find open source training data in the appendices of papers. Kaggle also has a ton of datasets, including obviously the ones they provide for competitions.

by anonymous   2019-07-21

Short answer: standard (Bayesian) regression methods usually treat all variables on the same footing, and thus don't incorporate information on how the system evolves. For example, a common assumption is that the probability density is separable: p(x_1, ..., x_N) = p(x_1) * ... * p(x_N).

Methods like Hidden Markov Models (and also Gaussian Processes, if I remember my reading of Bishop's Machine Learning correctly) instead use a conditional probability density (a Markov process), for example:

p(x_1, ..., x_N) = p(x_N | x_{N-1}) * ... * p(x_2 | x_1) * p(x_1)

With this it becomes possible to incorporate some information on the evolution of the system.
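To make the difference between the two factorizations concrete, here is a minimal Python sketch for binary sequences. The function names, the Bernoulli parameter `p_one`, and the "probability of staying in the same state" parameter `p_stay` are all assumptions chosen for illustration; they do not come from the answer above.

```python
import math

def independent_log_prob(seq, p_one):
    """log p(x_1, ..., x_N) when every x_i is an independent Bernoulli(p_one)."""
    return sum(math.log(p_one if x == 1 else 1.0 - p_one) for x in seq)

def markov_log_prob(seq, p_first_one, p_stay):
    """log p(x_1) + sum over n of log p(x_n | x_{n-1}) for a two-state chain.

    p_stay is the (assumed) probability that x_n equals x_{n-1}.
    """
    log_p = math.log(p_first_one if seq[0] == 1 else 1.0 - p_first_one)
    for prev, cur in zip(seq, seq[1:]):
        log_p += math.log(p_stay if cur == prev else 1.0 - p_stay)
    return log_p

smooth = [0, 0, 0, 0, 1, 1, 1, 1]  # changes state once
jumpy = [0, 1, 0, 1, 0, 1, 0, 1]   # changes state at every step

# The separable model only counts zeros and ones, so it cannot tell these apart.
indep_smooth = independent_log_prob(smooth, 0.5)
indep_jumpy = independent_log_prob(jumpy, 0.5)

# The Markov model scores each step given the previous one, so order matters.
markov_smooth = markov_log_prob(smooth, 0.5, 0.9)
markov_jumpy = markov_log_prob(jumpy, 0.5, 0.9)
```

Both sequences contain four zeros and four ones, so the separable model assigns them the same probability, while the chain with `p_stay = 0.9` rates the slowly changing sequence as far more likely.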

by anonymous   2019-07-21

In Mathematica the following expression will enumerate all the possible combinations of {0,1} of length 64.

Tuples[{1, 0}, {64}]

But there are 2^64 or 18446744073709551616 of them, so I'm not sure what use that will be to you.

Maybe you just wanted the unique sequences contained in each set; in that case all you need is the Mathematica Union[] function applied to the set. If you have the sets grouped together in a list in Mathematica, say mySets, then you can apply the Union operator to every set in the list by using the map operator.
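For readers without Mathematica, the count and the per-set deduplication can be sketched in Python. This is only a rough analogue under stated assumptions: `my_sets` is a made-up stand-in for mySets, and `sorted(set(...))` is used to mimic Union returning each distinct element once in sorted order.

```python
# Number of distinct 0/1 sequences of length 64.
n_sequences = 2 ** 64

# Hypothetical stand-in for mySets: each inner list may contain repeated sequences.
my_sets = [
    [(0, 1, 1), (0, 1, 1), (1, 0, 0)],
    [(1, 1, 1), (0, 0, 0), (1, 1, 1)],
]

# Keep each distinct sequence once per set, in sorted order.
unique_per_set = [sorted(set(sequences)) for sequences in my_sets]
```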


If you want to do some type of prediction a little more information might be useful.

Thank you for the clarifications.

Machine Learning

The task you want to solve belongs to a discipline known by a variety of names, most commonly Machine Learning or Pattern Recognition. If you know which examples represent the same gestures, your case would be known as supervised learning.

Question: In your case, do you know which gesture each example represents?

You have a series of examples for which you know a label (the form of gesture it is). You want to train a model from them and use that model to assign an unseen example to one of a finite set of classes; in your case, one of a number of gestures. This is typically known as classification.

Learning Resources

There is a very extensive background of research on this topic, but a popular introduction to the subject is Pattern Recognition and Machine Learning by Christopher Bishop. Stanford also has a series of machine learning video lectures available on the web.


You might want to consider how you will determine the accuracy of your system at predicting the type of gesture for an unseen example. Typically you train the model using some of your examples and then test its performance using examples the model has not seen. Two of the most common methods used to do this are 10-fold cross-validation and repeated 50/50 holdout. Having a measure of accuracy enables you to compare one method against another to see which is superior.
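The train-then-test-on-unseen-examples procedure can be sketched in plain Python as follows. This is a minimal illustration, not a substitute for a toolbox implementation: the function names, the `train_fn` / `predict_fn` callback convention, and the toy majority-label "classifier" are all assumptions made for the example.

```python
import random

def k_fold_indices(n_examples, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(examples, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the mean held-out accuracy over k train/test splits (assumes k <= n)."""
    accuracies = []
    for held_out in k_fold_indices(len(examples), k, seed):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(examples)) if i not in held_out_set]
        # Train only on examples outside the held-out fold.
        model = train_fn([examples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # Score only on examples the model has not seen.
        correct = sum(predict_fn(model, examples[i]) == labels[i]
                      for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Toy stand-in for a real classifier: always predict the most common training label.
def train_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)

def predict_majority(model, x):
    return model

toy_x = list(range(20))
toy_y = ["wave"] * 15 + ["point"] * 5
baseline_accuracy = cross_validate(toy_x, toy_y, train_majority,
                                   predict_majority, k=10)
```

A majority-label baseline like this is a useful floor: any real method should beat it on the same folds before its accuracy figure means much.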

Have you thought about what level of accuracy you require in your task? Is 70% accuracy enough, or do you need 85%, 99% or better?

Machine learning methods are typically quite sensitive to the specific type of data you have and to the number of examples you have to train the system with. Generally, the more examples, the better the performance.

You could try the method suggested above and compare it against a variety of well-proven methods, among them Random Forests, Support Vector Machines and Neural Networks. All of these and many more are available to download in a variety of free toolboxes.


Mathematica is a wonderful system, is infinitely flexible and my favourite environment, but out of the box it doesn't have a great deal of support for machine learning.

I suspect you will make a great deal of progress more quickly by using a custom toolbox designed for machine learning. Two of the most popular free toolboxes are WEKA and R; both support more than 50 different methods for solving your task, along with methods for measuring the accuracy of the solutions.

With just a little data reformatting, you can convert your gestures to a simple file format called ARFF, load them into WEKA or R and experiment with dozens of different algorithms to see how each performs on your data. The explorer tool in WEKA is definitely the easiest to use, requiring little more than a few mouse clicks and typing some parameters to get started.

Once you have an idea of how well the established methods perform on your data you have a good starting point to compare a customised approach against should they fail to meet your criteria.

Handwritten Digit Recognition

Your problem is similar to a very well-researched machine learning problem known as handwritten digit recognition. The methods that work well on this public data set of handwritten digits are likely to work well on your gestures.

by anonymous   2019-07-21

I think you should look at this book. It has some good chapters on modeling point distributions.

by anonymous   2019-07-21

There isn't a single, set way to do the feed-forward pass in neural nets; it's a general technique. One popular thing to do is logistic(W*In), where W*In is the dot product of the node's weights and the input nodes' activations, and logistic(x) = 1/(1+e^-x). There are many, many subtleties to applying this method, and the "meat" of the technique is how you determine / train the weights W for each node. I recommend getting a good text on machine learning / neural networks, perhaps - (even if it's not specifically talking about autoencoders, the general techniques used for multi-layer nets will be similar):
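The logistic(W*In) step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up, the weights are hand-picked rather than trained, and bias terms are left out for brevity.

```python
import math

def logistic(x):
    """logistic(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(weights, inputs):
    """One activation per node: logistic of the dot product of its weights and the inputs."""
    return [logistic(sum(w * a for w, a in zip(node_weights, inputs)))
            for node_weights in weights]

def feed_forward(layers, inputs):
    """Pass the activations through each layer in turn."""
    activations = inputs
    for weights in layers:
        activations = layer_forward(weights, activations)
    return activations

# Hypothetical hand-picked weights: 3 inputs -> 2 hidden nodes -> 3 outputs,
# the narrow-middle shape an autoencoder would use.
layers = [
    [[0.5, -0.5, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0], [-1.0, 1.0]],
]
output = feed_forward(layers, [1.0, 1.0, 1.0])
```

Each inner list holds one node's weights, so a layer is just a list of nodes. Training (choosing the weights) is the hard part and is not shown here.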

by selmat   2019-07-12
From my experience, these resources are worth reading:

[1] Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher M. Bishop

Andreas Brandmaier's permutation distribution clustering is a method rooted in the dissimilarities between time series, formalized as the divergence between their permutation distributions. Personally, I think this is your "best" option.

by anonymous   2019-01-13


Probability theory is very important for modern data science and machine learning applications because, in a lot of cases, it lets you "open up a black box", shed some light on the model's inner workings, and with luck find the ingredients needed to transform a poor model into a great one. Without it, a data scientist is very much restricted in what they are able to do.

A PDF (probability density function) is a fundamental building block of probability theory, absolutely necessary for any sort of probabilistic reasoning, along with expectation, variance, prior and posterior, and so on.

Some examples here on StackOverflow, from my own experience, where a practical issue boils down to understanding data distribution:

  • Which loss-function is better than MSE in temperature prediction?
  • Binary Image Classification with CNN - best practices for choosing “negative” dataset?
  • How do neural networks account for outliers?


The questions above provide some examples; here are a few more if you're interested, and the list is by no means complete:

  • What is the 'fundamental' idea of machine learning for estimating parameters?
  • Role of Bias in Neural Networks
  • How to find probability distribution and parameters for real data? (Python 3)

I personally try to find probabilistic interpretation whenever possible (choice of loss function, parameters, regularization, architecture, etc), because this way I can move from blind guessing to making reasonable decisions.


This is very opinion-based, but at least a few books are really worth mentioning: The Elements of Statistical Learning, An Introduction to Statistical Learning: with Applications in R, or Pattern Recognition and Machine Learning (if your primary interest is machine learning). That's just a start; there are dozens of books on more specific topics, like computer vision, natural language processing and reinforcement learning.

by boltzmannbrain   2019-01-03
> study textbooks. Do exercises. Treat it like academic studying

This. Highly recommend Russell & Norvig [1] for high-level intuition and motivation. Then Bishop's "Pattern Recognition and Machine Learning" [2] and Koller's PGM book [3] for the fundamentals.

Avoid MOOCs, but there are useful lecture videos, e.g. Hugo Larochelle on belief propagation [4].

FWIW this is coming from a mechanical engineer by training, but self-taught programmer and AI researcher. I've been working in industry as an AI research engineer for ~6 years.





by Yadi   2018-10-04
In machine learning, hands down these are some of the best related textbooks:

- [0] Pattern Recognition and Machine Learning (Information Science and Statistics)

and also:

- [1] The Elements of Statistical Learning

- [2] Reinforcement Learning: An Introduction by Sutton and Barto

- [3] Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

- [4] Neural Network Methods for Natural Language Processing (Synthesis Lectures on Human Language Technologies) by Yoav Goldberg

Then some math tid-bits:

[5] Introduction to Linear Algebra by Strang


by neel8986   2018-01-20
This is a kind of master's degree course I created for myself to learn Machine Learning from the bottom up.

First, you need a strong mathematical base. Otherwise, you can copy-paste an algorithm or use an API, but you will not get any idea of what is happening inside. The following concepts are essential:

1) Linear Algebra (MIT ) 2) Probability (Harvard

Both the above course and book are super easy to follow. You will get a good idea of the basic concepts, but they lack depth. Now you should move to more intense books and courses.

You can get more in-depth knowledge of Machine Learning from the following sources:

1)Nando machine learning course (

Bishop's book in particular is really deep and covers almost all the basic concepts.

Now for recent advances in Deep Learning, I will suggest two brilliant courses from Stanford:

1) Vision ( )

2) NLP (

The Vision course by Karpathy can be a very good introduction to Deep Learning. Also, the mother book for deep learning ( ) is good.

by anonymous   2017-08-20

What’s the best approach to recognize patterns in data, and what’s the best way to learn more on the topic?

The best approach is to study pattern recognition and machine learning. I would start with Duda's Pattern Classification and use Bishop's Pattern Recognition and Machine Learning as a reference. It will take a good while for the material to sink in, but getting a basic sense of pattern recognition and the major approaches to classification problems should give you direction. I can sit here and make some assumptions about your data, but honestly you probably have the best idea about the data set, since you've been dealing with it more than anyone. Some useful techniques, for instance, could be support vector machines and boosting.

Edit: An interesting application of boosting is real-time face detection. See Viola/Jones's Rapid Object Detection using a Boosted Cascade of Simple Features (pdf). Also, looking at the sample images, I'd say you should try improving the edge detection a bit. Maybe smoothing the image with a Gaussian filter and running more aggressive edge detection can increase detection of smaller cracks.

by anonymous   2017-08-20

I think the best that I know of are:

Stanford's Lectures on Machine Learning

Books: (In decreasing order of ease of understanding - IMHO)

Machine Learning: An Algorithmic Perspective by Stephen Marsland

Pattern Recognition and Machine Learning by Christopher Bishop

Introduction to Machine Learning - Ethem Alpaydin

by anonymous   2017-08-20

In its classical flavour the Support Vector Machine (SVM) is a binary classifier (i.e., it solves classification problems involving two classes). However, it can also be used to solve multi-class classification problems by applying techniques like One versus One, One versus All or Error Correcting Output Codes [Allwein et al.]. More recently, a modification of the classical SVM, the multiclass SVM, makes it possible to solve multi-class classification problems directly [Crammer et al.].
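The One versus All idea can be sketched as follows. To keep the example self-contained, a simple perceptron stands in for the binary SVM; that swap, the function names, and the toy 2-D data are all assumptions made for illustration. The wrapper logic (one binary classifier per class, then pick the highest score) is the part that carries over.

```python
def train_perceptron(points, targets, epochs=50):
    """Binary perceptron; targets are +1 / -1. Returns weights with the bias last."""
    weights = [0.0] * (len(points[0]) + 1)
    for _ in range(epochs):
        for x, t in zip(points, targets):
            features = list(x) + [1.0]  # constant 1.0 acts as the bias input
            activation = sum(w * f for w, f in zip(weights, features))
            if t * activation <= 0:  # misclassified or on the boundary: update
                weights = [w + t * f for w, f in zip(weights, features)]
    return weights

def score(weights, x):
    """Signed score of x under one binary classifier."""
    return sum(w * f for w, f in zip(weights, list(x) + [1.0]))

def train_one_vs_rest(points, labels):
    """Train one binary classifier per class: that class (+1) against all others (-1)."""
    return {c: train_perceptron(points, [1 if y == c else -1 for y in labels])
            for c in sorted(set(labels))}

def predict_one_vs_rest(models, x):
    """Pick the class whose binary classifier gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))

# Toy data: three well-separated clusters, two points each.
points = [(0, 4), (0, 5), (-4, -3), (-5, -3), (4, -3), (5, -3)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(points, labels)
predictions = [predict_one_vs_rest(models, p) for p in points]
```

One versus One would instead train a classifier for every pair of classes and let them vote, which means more but smaller training problems.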

Now, as far as document classification is concerned, your main problem is feature extraction (i.e., how to acquire certain classification features from your documents). This is not a trivial task and there's a large body of literature on the topic (e.g., [Rehman et al.], [Lewis]).

Once you've overcome the obstacle of feature extraction, and have labeled and placed your document samples in a feature space, you can apply any classification algorithm, like SVMs, AdaBoost, etc.

Introductory books on machine learning: [Flach], [Mohri], [Alpaydin], [Bishop], [Hastie]

Books specific to SVMs: [Schölkopf], [Cristianini]

Some specific bibliography on document classification and SVMs: [Miner et al.], [Srivastava et al.], [Weiss et al.], [Pilászy], [Joachims], [Joachims01], [Joachims97], [Sassano]

by todd8   2017-08-19
Depending on your level of programming ability, one algorithm a day, IMHO, is completely doable. A number of comments and suggestions say that one per day is an unrealistic goal (yes, maybe it is) but the idea of setting a goal and working through a list of algorithms is very reasonable.

If you are just learning programming, plan on taking your time with the algorithms but practice coding every day. Find a fun project to attempt that is within your level of skill.

If you are a strong programmer in one language, find a book of algorithms using that language (some of the suggestions here in these comments are excellent). I list some of the books I like at the end of this comment.

If you are an experienced programmer, one algorithm per day is roughly doable. Especially so, because you are trying to learn one algorithm per day, not produce working, production level code for each algorithm each day.

Some algorithms are really families of algorithms and can take more than a day of study; hash-based lookup tables come to mind. First there are the hash functions themselves. That would be day one. Next there are several alternatives for storing entries in the hash table, e.g. open addressing vs chaining, days two and three. Then there are methods for handling collisions, linear probing, secondary hashing, etc.; that's day four. Finally there are important variations, perfect hashing, cuckoo hashing, robin hood hashing, and so forth; maybe another 5 days. Some languages are less appropriate for playing around and can make working with algorithms more difficult; instead of a couple of weeks this could easily take twice as long. After learning other methods of implementing fast lookups, it's time to come back to hashing and understand when it's appropriate and when alternatives are better, and to understand how to combine methods for more sophisticated lookup methods.
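As a taste of what one of those study days might produce, here is a minimal sketch of open addressing with linear probing. It is a learning exercise under stated assumptions, not production code: the class name is made up, the capacity is fixed, and deletion (which needs tombstones or re-insertion) is deliberately left out.

```python
class LinearProbingTable:
    """Fixed-capacity hash table using open addressing with linear probing."""

    _EMPTY = object()  # sentinel marking a slot that has never held a key

    def __init__(self, capacity=8):
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._size = 0

    def _probe(self, key):
        """Yield slot indices starting at the key's home slot, wrapping around once."""
        capacity = len(self._keys)
        start = hash(key) % capacity
        for offset in range(capacity):
            yield (start + offset) % capacity

    def put(self, key, value):
        for slot in self._probe(key):
            if self._keys[slot] is self._EMPTY:
                self._keys[slot] = key
                self._values[slot] = value
                self._size += 1
                return
            if self._keys[slot] == key:
                self._values[slot] = value  # overwrite an existing entry
                return
        raise RuntimeError("table is full; a real table would resize")

    def get(self, key, default=None):
        for slot in self._probe(key):
            if self._keys[slot] is self._EMPTY:
                return default  # reached an empty slot: the key was never stored
            if self._keys[slot] == key:
                return self._values[slot]
        return default

    def __len__(self):
        return self._size

table = LinearProbingTable(capacity=4)
table.put(1, "a")
table.put(5, "b")  # 5 % 4 == 1, so this collides with key 1 and moves one slot on
table.put(9, "c")  # collides again and lands two slots past its home slot
```

Chaining would instead keep a small list of entries in each slot, trading the probing loop for extra allocations.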

I think you will be best served by modifying your goal a bit and saying that you will work on learning about algorithms every day and cover all of the material in a typical undergraduate course on the subject. It really is a fun branch of Computer Science.

A great starting point is Sedgewick's book/course, Algorithms [1]. For more depth and theory try [2], Cormen and Leiserson's excellent Introduction to Algorithms. Alternatively the theory is also covered by another book by Sedgewick, An Introduction to the Analysis of Algorithms [3]. A classic reference that goes far beyond these other books is of course Knuth [4], suitable for serious students of Computer Science, less so as a book of recipes.

After these basics, there are books useful for special circumstances. If your goal is to be broadly and deeply familiar with Algorithms you will need to cover quite a bit of additional material.

Numerical methods -- Numerical Recipes 3rd Edition: The Art of Scientific Computing by Teukolsky and Vetterling. I love this book. [5]

Randomized algorithms -- Randomized Algorithms by Motwani and Raghavan. [6], Probability and Computing: Randomized Algorithms and Probabilistic Analysis by Michael Mitzenmacher, [7]

Hard problems (like NP) -- Approximation Algorithms by Vazirani [8]. How to Solve It: Modern Heuristics by Michalewicz and Fogel. [9]

Data structures -- Advanced Data Structures by Brass. [10]

Functional programming -- Pearls of Functional Algorithm Design by Bird [11] and Purely Functional Data Structures by Okasaki [12].

Bit twiddling -- Hacker's Delight by Warren [13].

Distributed and parallel programming -- this material gets very hard so perhaps Distributed Algorithms by Lynch [14].

Machine learning and AI related algorithms -- Bishop's Pattern Recognition and Machine Learning [15] and Norvig's Artificial Intelligence: A Modern Approach [16]

These books will cover most of what a Ph.D. in CS might be expected to understand about algorithms. It will take years of study to work through all of them. After that, you will be reading about algorithms in journal publications (ACM and IEEE memberships are useful). For example, a recent, practical, and important development in hashing methods is called cuckoo hashing, and I don't believe that it appears in any of the books I've listed.

[1] Sedgewick, Algorithms, 2015.

[2] Cormen, et al., Introduction to Algorithms, 2009.

[4] Knuth, The Art of Computer Programming, 2011.

[5] Teukolsky and Vetterling, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007.



[9] Michalewicz and Fogel,

[10] Brass,

[11] Bird,

[12] Okasaki,

[13] Warren,

[14] Lynch,

[15] Bishop,

[16] Norvig,

by tfh   2017-08-19
The author is Chris Bishop... who wrote one of the "essential" machine learning books: