Data Science from Scratch: First Principles with Python

Category: Programming
Author: Joel Grus

Comments

by melling   2020-03-04
No answers. Any recommendations on books or blogs to learn data science?

I’m working through this book, which I do like:

https://www.amazon.com/Data-Science-Scratch-Principles-Pytho...

I’m rewriting the examples in Swift to help me learn:

https://github.com/melling/data-science-from-scratch-swift

Something with a little more theory might be good. Lots of the questions seem to require more theoretical knowledge.

by melling   2020-02-27
Data science seems like a gateway drug to doing more math.

I’ve been working through Joel Grus’ Data Science from Scratch,

https://www.amazon.com/Data-Science-Scratch-Principles-Pytho...

rewriting the Python examples in Swift:

https://github.com/melling/data-science-from-scratch-swift

by melling   2020-02-19
With Data Science and Machine Learning, my interest in math has greatly increased too. There’s definitely more of a need.

I’m working my way through this book:

https://www.amazon.com/Data-Science-Scratch-Principles-Pytho...

by rewriting the examples in Swift:

https://github.com/melling/data-science-from-scratch-swift

I do like the author’s idea of creating his Simple Statistics library:

https://macwright.org/2012/06/26/simple-statistics.html

You definitely learn much more by building something, even if it’s less than perfect.
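
As a rough illustration of the build-it-yourself approach, here is a minimal Python sketch of a few descriptive statistics written without any libraries. The function names are hypothetical; they are not taken from the book or from the Simple Statistics library.

```python
from math import sqrt
from typing import List


def mean(xs: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def median(xs: List[float]) -> float:
    """Middle value of the sorted data; for an even count, the average of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def variance(xs: List[float]) -> float:
    """Sample variance (divides by n - 1); needs at least two values."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def standard_deviation(xs: List[float]) -> float:
    """Square root of the sample variance."""
    return sqrt(variance(xs))
```

Writing these by hand forces decisions a library normally hides, such as whether variance divides by n or n - 1.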

by commentsrus   2019-11-17

I work with econ/stat people who are great at running and interpreting models and thinking about causality issues, but don't know much about programming. They've specialized, I get it, but in the future teams would benefit from everyone knowing some basics. It'll also make stats people more productive and help prevent errors. Also also, econ, other sciences, and the policy world really should embrace open source, open science, open access, etc.

But anyway, here's how to do it.

Below are a bunch of random resources. If you're looking for free courses, Software Carpentry has a bunch on the topics listed below and more: the terminal and Bash, Python, R, Matlab, Git, SQL, GNU Make, continuous integration, and data visualization. Data Carpentry has lessons for some of these topics, geared more toward social scientists. Apparently they're developing a course for doing econ with Bash(?). If you're into macro or computational stuff and want to learn Python, you can't go wrong with QuantEcon.

I'll echo what the other guy said. If you have a Mac, cool. If not, consider dual booting with linux. It has a reputation for being difficult to use, but Ubuntu, Mint, and ElementaryOS are all very simple and work just like what you're used to in Proprietary World. It's possible to do the following with Windows, but it requires more setup work.

Learn to use the terminal (this is the point of using Mac or Linux, they come with a terminal and unix tools). Here's a decent book on the basics. Learn to navigate around your filesystem, run programs from the terminal, and use a bit of Bash. You can probably skip the chapters on actually programming with Bash. Bash as a programming language is cool, but not super necessary, and kinda quirky. It wouldn't be a waste of time though, since you can do certain things in Bash very quickly and easily. And you'll be a master haxxer.

Check out Data Science at the Command Line for a decent overview of stats programming in a linux environment. Goes over basic Python and R, and other tools to make life simple. There's also The Plain Person's Guide to Plain Text Social Science, geared toward people who do science but may not do programming atm. Covers more useful tools.

Learn Python or R or both. If Python, here. If R, here. If you're into ML, here for Python and possibly here for R but the code may be dated. Still, that book is The intro book for ML.

Learn Git. You should be in the habit of tracking changes you make to your code and the data/results it produces, especially if your data is being shared with anyone. If you use R, here's a great intro to Git and RStudio's fantastic Git integration.

Learn SQL. This one's harder to pick up on your own, at home, since you need a database set up to query. Look at the software/data carpentry courses.
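
One low-setup way to practice queries, assuming Python is installed, is the standard-library sqlite3 module with an in-memory database. This is only a sketch: the table, column names, and rows below are made-up toy data, and SQLite's dialect differs in places from server databases such as PostgreSQL.

```python
import sqlite3

# In-memory database: nothing to install or configure, and it disappears when the script ends.
conn = sqlite3.connect(":memory:")

# Hypothetical toy table, for illustration only.
conn.execute("CREATE TABLE wages (worker TEXT, sector TEXT, hourly REAL)")
conn.executemany(
    "INSERT INTO wages VALUES (?, ?, ?)",
    [("a", "retail", 12.0), ("b", "retail", 14.0), ("c", "tech", 40.0)],
)

# Average hourly wage per sector.
rows = conn.execute(
    "SELECT sector, AVG(hourly) FROM wages GROUP BY sector ORDER BY sector"
).fetchall()
print(rows)

conn.close()
```

Replacing ":memory:" with a file path keeps the data between runs, which is handy for working through course exercises.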

Learn Docker. It makes your analyses/projects more shareable and--gasp--more reproducible (though I've gotten shit in the past for this, so let's compromise and say it helps but doesn't GUARANTEE reproducibility). This one is more optional than the others.

Once you have the basics down, you can do what interests you and learn best practices. Perhaps you want to know about Efficient R Programming (and general best practices). Or best practices in Python and more comprehensive coverage. Or how to make reports and papers with RMarkdown (want to make a paper that looks like it's published in AER? there's a template for that in Rmd).

by Aidtor   2019-11-17

If you want to be valuable to companies post graduation you should learn more about programming (design templates, how to write tests, how to go from a paper to code). I recommend this book as a good starting place. Once you're comfortable with how the different methods work, pick up this book.

by HuShifang   2019-05-12
A new edition of Grus comes out next week actually...

https://www.amazon.com/Data-Science-Scratch-Principles-Pytho...