# Pattern Recognition and Machine Learning (Information Science and Statistics)

## About This Book

This is the first text on pattern recognition to present the Bayesian viewpoint, one that has become increasing popular in the last five years.

It presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It provides the first text to use graphical models to describe probability distributions when there are no other books that apply graphical models to machine learning. It is also the first four-color book on pattern recognition.

The book is suitable for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. Extensive support is provided for course instructors, including more than 400 exercises, graded according to difficulty. Example solutions for a subset of the exercises are available from the book web site, while solutions for the remainder can be obtained by instructors from the publisher.

https://www.amazon.com/Pattern-Recognition-Learning-Informat...

Having done an MEng at Oxford where I dabbled in ML, the 3 key texts that came up as references in a lot of lectures were these:

Pattern Recognition and Machine Learning (Information Science and Statistics) (Information Science and Statistics) https://www.amazon.co.uk/dp/0387310738/ref=cm_sw_r_cp_apa_i_TZGnDb24TFV9M

Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning Series) https://www.amazon.co.uk/dp/0262018020/ref=cm_sw_r_cp_apa_i_g1GnDb5VTRRP9

(Pretty sure Murphy was one of our lecturers actually?)

Bayesian Reasoning and Machine Learning https://www.amazon.co.uk/dp/0521518148/ref=cm_sw_r_cp_apa_i_81GnDbV7YQ2WJ

There were ofc others, and plenty of other sources and references too, but you can't go buying dozens of text books, not least cuz they would repeat the same things. If you need some general maths reading too then pretty much all the useful (non specialist) maths we used for 4 years is all in this: Advanced Engineering Mathematics https://www.amazon.co.uk/dp/0470646136/ref=cm_sw_r_cp_apa_i_B5GnDbNST8HZR

If you're worried about not doing projects and participating in Kaggle competitions, why not do those things? They're pretty low risk and high reward. If you're feeling shaky on the theory, read papers and reference textbooks, take notes, and implement things that interest you. For deep learning stuff there are some good resources here: https://github.com/ChristosChristofidis/awesome-deep-learning. For more traditional methods you can't go wrong with Chris Bishop's book (try googling it for a cheaper alternative to Amazon ;): https://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738. Side projects can really help here, and the key is to use references, but don't just copy-paste. Think of something you'd like to apply machine learning to with a reasonable scope. Search google scholar/arxiv for papers that do this or something similar, read them, and learn the techniques. For reading research papers in an area where you're not extremely knowledgeable, use the references in the text or google things you don't know and make sure you understand so you could teach someone else. If you're interested in the topic and exhausted the references, go up the tree and use google scholar to find papers that list the one you're reading as a reference - you usually find interesting applications or improvements on the technique. You can also often find open source training data in the appendices of papers. Kaggle also has a ton of datasets, including obviously the ones they provide for competitions.

Short answer: standard (Bayesian) regression methods usually treat all variables on the same footing, and thus don't incorporate information on the advance of the system. For example, a common assumption is that the probability density is separable

`p(x_1, ... x_N) = p(x_1) * ... * p(x_N)`

.Methods like Hidden Markov Models (and also Gaussian Processes, if I remember right my reading of Bishop's Machine Learning), instead use a conditioned probability density (a Markov process), for example:

With this it becomes possible to incorporate some information on the evolution of the system.

In Mathematica the following expression will enumerate all the possible combinations of {0,1} of length 64.

But there are 2^62 or 18446744073709551616 of them, so I'm not sure what use that will be to you.

Maybe you just wanted the unique sequences contained in each set, in that case all you need is the Mathematica Union[] function applied to the set. If you have a the sets grouped together in a list in Mathematica, say mySets, then you can apply the Union operator to every set in the list my using the map operator.

If you want to do some type of prediction a little more information might be useful.

Thanks you for the clarifications.

Machine LearningThe task you want to solve falls under the disciplines known by a variety of names, but probably most commonly as Machine Learning or Pattern Recognition and if you know which examples represent the same gestures, your case would be known as supervised learning.

Question: In your case do you know which gesture each example represents ?You have a series of examples for which you know a label ( the form of gesture it is ) from which you want to train a model and use that model to label an unseen example to one of a finite set of classes. In your case, one of a number of gestures. This is typically known as classification.

Learning ResourcesThere is a very extensive background of research on this topic, but a popular introduction to the subject is machine learning by Christopher Bishop. Stanford have a series of machine learning video lectures Standford ML available on the web.

AccuracyYou might want to consider how you will determine the accuracy of your system at predicting the type of gesture for an unseen example. Typically you train the model using some of your examples and then test its performance using examples the model has not seen. The two of the most common methods used to do this are 10 fold Cross Validation or repeated 50/50 holdout. Having a measure of accuracy enables you to compare one method against another to see which is superior.

Have you thought about what level of accuracy you require in your task, is 70% accuracy enough, 85%, 99% or better?

Machine learning methods are typically quite sensitive to the specific type of data you have and the amount of examples you have to train the system with, the more examples, generally the better the performance.

You could try the method suggested above and compare it against a variety of well proven methods, amongst which would be Random Forests, support vector machines and Neural Networks. All of which and many more are available to download in a variety of free toolboxes.

ToolboxesMathematica is a wonderful system, is infinitely flexible and my favourite environment, but out of the box it doesn't have a great deal of support for machine learning.

I suspect you will make a great deal of progress more quickly by using a custom toolbox designed for machine learning. Two of the most popular free toolboxes are WEKA and R both support more than 50 different methods for solving your task along with methods for measuring the accuracy of the solutions.

With just a little data reformatting, you can convert your gestures to a simple file format called ARFF, load them into WEKA or R and experiment with dozens of different algorithms to see how each performs on your data. The explorer tool in WEKA is definitely the easiest to use, requiring little more than a few mouse clicks and typing some parameters to get started.

Once you have an idea of how well the established methods perform on your data you have a good starting point to compare a customised approach against should they fail to meet your criteria.

Handwritten Digit RecognitionYour problem is similar to a very well researched machine learning problem known as hand written digit recognition. The methods that work well on this public data set of handwritten digits are likely to work well on your gestures.

I think you should look at book: http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738 There is a good chapters on modeling point distributions.

There isn't a set, single, way to feed-forward neural nets - it's a general technique. One popular thing to do is

`logistic(W*In)`

, where`W*In`

is the dot product of the node's weights and the input nodes' activations, and`logistic(x) = 1/(1+e^-x)`

. There are many, many subtleties to applying this method, and the "meat" of the technique is how you determine / train the weights`W`

for each node. I recommend getting a good text on machine learning / neural networks, perhaps - (even if it's not specifically talking about autoencoders, the general techniques used for multi-layer nets will be similar): http://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738/ref=zg_bs_3894_3[1] Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher M. Bishop

Andreas Brandmaier's permutation distribution clustering is a method rooted in the dissimilarities between time series, formalized as the divergence between their permutation distributions. Personally, I think this is your "best" option https://www.amazon.com/Pattern-Recognition-Learning-Informat...

## Why?

Probability theory is very important for modern data-science and machine-learning applications, because (in a lot of cases) it allows to "open up a black box" and shed some light into the model's inner workings, and with luck find necessary ingredients to transform a poor model into a great model. Without it, data scientist's work is very much restricted in what they are able to do.

A PDF is a fundamental building block of the probability theory, absolutely necessary to do any sort of probability reasoning, along with expectation, variance, prior and posterior, and so on.

Some examples here on StackOverflow, from my own experience, where a practical issue boils down to understanding data distribution:

## When?

The questions above provide some examples, here're a few more if you're interested, and the list is by no means complete:

I personally try to find probabilistic interpretation whenever possible (choice of loss function, parameters, regularization, architecture, etc), because this way I can move from blind guessing to making reasonable decisions.

## Reading

This is very opinion-based, but at least few books are really worth mentioning: The Elements of Statistical Learning, An Introduction to Statistical Learning: with Applications in R or Pattern Recognition and Machine Learning (if your primary interest is machine learning). That's just a start, there dozens of books on more specific topics, like computer vision, natural language processing and reinforcement learning.

This. Highly recommend Russel & Norvig [1] for high-level intuition and motivation. Then Bishop's "Pattern Recognition and Machine Learning" [2] and Koller's PGM book [3] for the fundamentals.

Avoid MOOCs, but there are useful lecture videos, e.g. Hugo Larochelle on belief propagation [4].

FWIW this is coming from a mechanical engineer by training, but self-taught programmer and AI researcher. I've been working in industry as an AI research engineer for ~6 years.

[1] https://www.amazon.com/Artificial-Intelligence-Modern-Approa...

[2] https://www.amazon.com/Pattern-Recognition-Learning-Informat...

[3] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...

[4] https://youtu.be/-z5lKPHcumo

- [0] Pattern Recognition and Machine Learning (Information Science and Statistics)

and also:

- [1] The Elements of Statistical Learning

- [2] Reinforcement Learning: An Introduction by Barto and Sutton

- [3] The Deep Learning by Aaron Courville, Ian Goodfellow, and Yoshua Bengio

- [4] Neural Network Methods for Natural Language Processing (Synthesis Lectures on Human Language Technologies) by Yoav Goldberg

Then some math tid-bits:

[5] Introduction to Linear Algebra by Strang

----------- links:

- [0] [PDF](https://www.amazon.com/Pattern-Recognition-Learning-Informat...)

- [2] [amz](https://www.amazon.com/Reinforcement-Learning-Introduction-A...)

- [2] [site](https://www.amazon.com/Deep-Learning-Adaptive-Computation-Ma...)

- [3] [pdf](https://www.amazon.com/Language-Processing-Synthesis-Lecture...)

- [5] [amz](https://www.amazon.com/Introduction-Linear-Algebra-Gilbert-S...)

First, you need a strong mathematical base. Otherwise, you can copy paste an algorithm or use an API but you will not get any idea of what is happening inside Following concepts are very essential

1) Linear Algebra (MIT https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra... ) 2) Probability (Harvard https://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/00...)

Both the above course and book are super easy to follow. You will get a good idea of basic concepts but they lack in depth. Now you should move to more intense books and courses

You can get more in-depth knowledge of Machine learning from following sources

1)Nando machine learning course ( https://www.amazon.in/Pattern-Recognition-Learning-Informati...)

Especially Bishops book is really deep and covers almost all basic concepts.

Now for recent advances in Deep learning. I will suggest two brilliant courses from Stanford

1) Vision ( https://www.youtube.com/watch?v=NfnWJUyUJYU )

2) NLP ( https://www.youtube.com/watch?v=OQQ-W_63UgQ)

The Vision course by Karparthy can be a very good introduction to Deep learning. Also, the mother book for deep learning ( http://www.deeplearningbook.org/ )is good

Some people might prefer seeing things from another approach. http://pgm.stanford.edu/

Good luck.

The best approach is to study pattern recognition and machine learning. I would start with Duda's Pattern Classification and use Bishop's Pattern Recognition and Machine Learning as reference. It would take a good while for the material to sink in, but getting basic sense of pattern recognition and major approaches of classification problem should give you the direction. I can sit here and make some assumptions about your data, but honestly you probably have the best idea about the data set since you've been dealing with it more than anyone. Some of the useful technique for instance could be support vector machine and boosting.

Edit: An interesting application of boosting is real-time face detection. See Viola/Jones's Rapid Object Detection using a Boosted Cascade of Simple Features (pdf). Also, looking at the sample images, I'd say you should try improving the edge detection a bit. Maybe smoothing the image with Gaussian and running more aggressive edge detection can increase detection of smaller cracks.I think the best that I know of are:

Stanford's Lectures on Machine Learning

Books: (In decreasing order of ease of understanding - IMHO)

Machine Learning: An Algorithmic Perspective by Stephen Marsland

Pattern Recognition and Machine Learning by Christopher Bishop

Introduction to Machine Learning - Ethem Alpaydin

In its classical flavour the Support Vector Machine (SVM) is a binary classifier (i.e., it solves classification problems involving two classes). However, it can be also used to solve multi-class classification problems by applying techniques likes One versus One, One Versus All or Error Correcting Output Codes [Alwein et al.]. Also recently, a new modification of the classical SVM the multiclass-SVM allows to solve directly multi-class classification problems [Crammer et al.].

Now as far as it concerns document classification, your main problem is feature extraction (i.e., how to acquire certain classification features from your documents). This is not a trivial task and there's a batch of bibliography on the topic (e.g., [Rehman et al.], [Lewis]).

Once you've overcome the obstacle of feature extraction, and have labeled and placed your document samples in a feature space you can apply any classification algorithm like SVMs, AdaBoost e.t.c.

Introductory books on machine learning: [Flach], [Mohri], [Alpaydin], [Bishop], [Hastie]

Books specific for SVMs: [Schlkopf], [Cristianini]

Some specific bibliography on document classification and SVMs: [Miner et al.], [Srivastava et al.], [Weiss et al.], [Pilászy], [Joachims], [Joachims01], [Joachims97], [Sassano]

If you are just learning programming, plan on taking your time with the algorithms but practice coding every day. Find a fun project to attempt that is within your level of skill.

If you are a strong programmer in one language, find a book of algorithms using that language (some of the suggestions here in these comments are excellent). I list some of the books I like at the end of this comment.

If you are an experienced programmer, one algorithm per day is roughly doable. Especially so, because you are trying to learn one algorithm per day, not produce working, production level code for each algorithm each day.

Some algorithms are really families of algorithms and can take more than a day of study, hash based look up tables come to mind. First there are the hash functions themselves. That would be day one. Next there are several alternatives for storing entries in the hash table, e.g. open addressing vs chaining, days two and three. Then there are methods for handling collisions, linear probing, secondary hashing, etc.; that's day four. Finally there are important variations, perfect hashing, cuckoo hashing, robin hood hashing, and so forth; maybe another 5 days. Some languages are less appropriate for playing around and can make working with algorithms more difficult, instead of a couple of weeks this could easily take twice as long. After learning other methods of implementing fast lookups, its time to come back to hashing and understand when its appropriate and when alternatives are better and to understand how to combine methods for more sophisticated lookup methods.

I think you will be best served by modifying your goal a bit and saying that you will work on learning about algorithms every day and cover all of the material in a typical undergraduate course on the subject. It really is a fun branch of Computer Science.

A great starting point is Sedgewick's book/course,

Algorithms[1]. For more depth and theory try [2], Cormen and Leiserson's excellentIntroduction to Algorithms. Alternatively the theory is also covered by another book by Sedgewick,An Introduction to the Analysis of Algorithms[3]. A classic reference that goes far beyond these other books is of course Knuth [4], suitable for serious students of Computer Science less so as a book of recipes.After these basics, there are books useful for special circumstances. If your goal is to be broadly and deeply familiar with Algorithms you will need to cover quite a bit of additional material.

Numerical methods --

Numerical Recipes 3rd Edition: The Art of Scientific Computingby Tuekolsky and Vetterling. I love this book. [5]Randomized algorithms --

Randomized Algorithmsby Motwani and Raghavan. [6],Probability and Computing: Randomized Algorithms and Probabilistic Analysisby Michael Mitzenmacher, [7]Hard problems (like NP) --

Approximation Algorithmsby Vazirani [8].How to Solve It: Modern Heuristicsby Michalewicz and Fogel. [9]Data structures --

Advanced Data Structuresby Brass. [10]Functional programming --

Pearls of Functional Algorithm Designby Bird [11] andPurely Functional Data Structuresby Okasaki [12].Bit twiddling --

Hacker's Delightby Warren [13].Distributed and parallel programming -- this material gets very hard so perhaps

Distributed Algorithmsby Lynch [14].Machine learning and AI related algorithms -- Bishop's

Pattern Recognition and Machine Learning[15] and Norvig'sArtificial Intelligence: A Modern Approach[16]These books will cover most of what a Ph.D. in CS might be expected to understand about algorithms. It will take years of study to work though all of them. After that, you will be reading about algorithms in journal publications (ACM and IEEE memberships are useful). For example, a recent, practical, and important development in hashing methods is called cuckoo hashing, and I don't believe that it appears in any of the books I've listed.

[1] Sedgewick, Algorithms, 2015. https://www.amazon.com/Algorithms-Fourth-Deluxe-24-Part-Lect...

[2] Cormen, et al., Introduction to Algorithms, 2009. https://www.amazon.com/Introduction-Analysis-Algorithms-2nd/...

[4] Knuth, The Art of Computer Programming, 2011. https://www.amazon.com/Computer-Programming-Volumes-1-4A-Box...

[5] Tuekolsky and Vetterling, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007. https://www.amazon.com/Numerical-Recipes-3rd-Scientific-Comp...

[6] https://www.amazon.com/Randomized-Algorithms-Rajeev-Motwani/...

[7]https://www.amazon.com/Approximation-Algorithms-Vijay-V-Vazi...

[9] Michalewicz and Fogel, https://www.amazon.com/How-Solve-Heuristics-Zbigniew-Michale...

[10] Brass, https://www.amazon.com/Advanced-Data-Structures-Peter-Brass/...

[11] Bird, https://www.amazon.com/Pearls-Functional-Algorithm-Design-Ri...

[12] Okasaki, https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...

[13] Warren, https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...

[14] Lynch, https://www.amazon.com/Distributed-Algorithms-Kaufmann-Manag...

[15] Bishop, https://www.amazon.com/Pattern-Recognition-Learning-Informat...

[16] Norvig, https://www.amazon.com/Artificial-Intelligence-Modern-Approa...

http://www.amazon.com/Pattern-Recognition-Learning-Informati... http://www.amazon.com/The-Elements-Statistical-Learning-Pred...

http://www.amazon.com/Pattern-Recognition-Learning-Informati...