# The Grammar of Graphics (Statistics and Computing)

Personally, my main encounter with plotting was in Python. I'm not a big fan of matplotlib: I got the impression that code complexity grew exponentially with plot complexity. Then there's bokeh [0], which I preferred to matplotlib because it is more declarative. HoloViews [1] is more declarative than both matplotlib and bokeh, and boasts that "usually [you can] express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting". I've not used HoloViews yet.

Then I heard of R's ggplot [2], which is based on (or inspired by?) The Grammar of Graphics [3]. This book is definitely something I want to check out.

Vega [4], an “assembly language” for visualization, is neither here nor there as far as this discussion goes, but nonetheless I just stumbled upon it and I'm quite optimistic about the initiative. Maybe someone will not have heard of it.

[0] https://bokeh.pydata.org/en/latest/docs/user_guide/concepts....

[1] http://holoviews.org/

[2] http://r4ds.had.co.nz/data-visualisation.htm

[3] https://www.amazon.com/Grammar-Graphics-Statistics-Computing...

[4] https://vega.github.io/vega/about/

> To be honest, matplotlib seems a good contender to me (http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis...

[2] http://www.amazon.com/The-Grammar-Graphics-Statistics-Comput...

[3] http://vita.had.co.nz/papers/layered-grammar.html

The book is kinda expensive https://smile.amazon.com/Grammar-Graphics-Statistics-Computi... but that's because it is in full color and it has some of the nicest looking and most instructive graphics I've ever seen, even for things that I understand, such as the Central Limit Theorem. It makes sense that the best graphics would be in the book written by the guy who wrote a book on how to do visualizations mathematically.

The book is also interesting if you are designing any sort of UI, because UIs are definitely just a subset of graphical visualizations.

# Elements of Programming

https://www.amazon.com/Elements-Programming-Alexander-Stepan...

This book proposes a mathematical way of writing C++-ish code that makes all your code terse. In this talk, Sean Parent, at that time working on Adobe Photoshop, estimated that the PS codebase could be reduced from 3,000,000 LOC to 30,000 LOC (=100x!!) if they followed ideas from the book.

# The Grammar of Graphics

https://www.amazon.com/Grammar-Graphics-Statistics-Computing...

This book changed my perception of creativity, aesthetics and mathematics and their relationships. Fundamentally, the book provides all the diverse tools to give you confidence that your graphics are mathematically sound and visually pleasing. After reading this, Tufte just doesn't cut it anymore. It's such a weird book because it talks about topics as disparate as Bayes' rule, OOP, color theory, SQL, chaotic models of time (lolwut), style-sheet language design and a bjillion other topics, but somehow all of these are always very relevant. It's like if Bret Victor was a book, a tour de force of polymathical insanity.

The book is in full color and it has some of the nicest looking and most instructive graphics I've ever seen, even for things that I understand, such as the Central Limit Theorem. It makes sense that the best graphics would be in the book written by the guy who wrote a book on how to do visualizations mathematically. The book is also interesting if you are designing any sort of UI, because UIs are definitely just a subset of graphical visualizations.

# Scala for Machine Learning

https://www.amazon.com/Scala-Machine-Learning-Patrick-Nicola...

This book almost never gets mentioned but it's a superb intro to machine learning if you dig types, scalable back-ends or the JVM.

It’s the only ML book that I’ve seen that contains the word monad so if you sometimes get a hankering for some monading (esp. in the context of ML pipelines), look no further.

It discusses setting up actual large-scale ML pipelines with modern concurrency primitives, such as actors via the Akka framework.

# Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques for Building Intelligent Systems

https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-T...

Not released yet but I've been reading the drafts and it's a nice intro to machine learning using modern ML frameworks, TensorFlow and Scikit-Learn.

# Markov Logic: An Interface Layer for Artificial Intelligence

https://www.amazon.com/Markov-Logic-Interface-Artificial-Int...

Have you ever wondered what the relationship between machine learning and logic is? If so, look no further.

# Designing for Scalability with Erlang/OTP

https://www.amazon.com/Designing-Scalability-Erlang-OTP-Faul...

Even though this is an Erlang book (I don't really know Erlang), 1/3 of the book is devoted to designing scalable and robust distributed systems in a general setting, which on its own made the book worth it for me.

# A First Course in Network Theory

https://www.amazon.com/First-Course-Network-Theory/dp/019872...

Up until recently I didn't know the difference between graphs and networks. But look at me now, I still don't but at least I have a book on it.

and Hadley Wickham wrote about it in http://vita.had.co.nz/papers/layered-grammar.pdf.

I'm no expert, but I think that one of the main ideas is to separate the elements of making a plot from the way that the data is presented. For example, in ggplot2, you have the data that will go into the graph, the type of plot (or "geometry") that defines how the data are presented (scatterplot, bar plot, etc.), and then various "layers" that can be added that affect style.
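To make that separation concrete, here is a rough sketch in plain Python. It is not ggplot2's API: `Plot`, `geom_point`, `geom_bar`, `theme` and `render` are hypothetical names made up for illustration, and `render` only returns text instructions instead of drawing anything. The point is just that the data, the mapping of columns to axes, the geometry and the style layers are separate pieces that get combined with `+`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plot:
    """A declarative plot description (hypothetical, not ggplot2's real API)."""
    data: list          # rows as dicts
    mapping: dict       # aesthetic -> column name, e.g. {"x": "hp", "y": "mpg"}
    layers: tuple = ()  # geoms and style layers, in the order they were added

    def __add__(self, layer):
        # Adding a layer returns a new description; nothing is drawn yet.
        return Plot(self.data, self.mapping, self.layers + (layer,))


def geom_point():
    return ("geom", "point")


def geom_bar():
    return ("geom", "bar")


def theme(name):
    return ("theme", name)


def render(plot):
    """Stand-in renderer: resolve the description into drawing instructions."""
    geoms = [value for kind, value in plot.layers if kind == "geom"]
    themes = [value for kind, value in plot.layers if kind == "theme"]
    instructions = []
    for geom in geoms:
        for row in plot.data:
            x = row[plot.mapping["x"]]
            y = row[plot.mapping["y"]]
            instructions.append(f"{geom} at ({x}, {y})")
    style = themes[-1] if themes else "default"
    return style, instructions


cars = [{"hp": 110, "mpg": 21.0}, {"hp": 93, "mpg": 22.8}]
base = Plot(cars, {"x": "hp", "y": "mpg"})

# Same data and mapping, different geometry and style.
scatter = base + geom_point() + theme("minimal")
bars = base + geom_bar()

print(render(scatter))
print(render(bars))
```

Switching from a scatterplot to a bar plot changes one layer and leaves the data and mapping untouched, which is the property the paragraph above describes.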

In order to split a plot into subplots, you simply define how it is to be faceted (what column should be used to define groups). Grammar-of-graphics moves plotting away from the "turtle graphics" model and lets you specify what should be done. Then ggplot figures out how to do it, kind of like SQL vs. writing for loops to retrieve information.
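A small sketch of both halves of that comparison, using only the Python standard library. The `facet` helper and the `cars` rows are made up for illustration (this is not how ggplot2 implements faceting); the SQL part uses the built-in `sqlite3` module. In both cases the caller only names the grouping column, and the grouping work happens elsewhere.

```python
import sqlite3
from collections import defaultdict

# Made-up example rows.
cars = [
    {"car_class": "compact", "mpg": 29},
    {"car_class": "compact", "mpg": 27},
    {"car_class": "suv", "mpg": 18},
    {"car_class": "suv", "mpg": 20},
    {"car_class": "pickup", "mpg": 17},
]


def facet(rows, by):
    """Split rows into one panel per distinct value of column `by`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


# "What": one subplot per car class. The loop lives inside facet().
panels = facet(cars, by="car_class")

# Imperative version of a per-group summary: spell out how to compute it.
manual_means = {
    name: sum(row["mpg"] for row in group) / len(group)
    for name, group in panels.items()
}

# Declarative version: state what is wanted and let the engine plan the work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (car_class TEXT, mpg REAL)")
conn.executemany("INSERT INTO cars VALUES (:car_class, :mpg)", cars)
sql_means = dict(
    conn.execute("SELECT car_class, AVG(mpg) FROM cars GROUP BY car_class")
)
conn.close()

print(sorted(panels))
print(manual_means == sql_means)
```

The `GROUP BY car_class` clause plays the same role as the faceting column: a single declaration replaces the hand-written grouping loop.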