Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Author: Martin Kleppmann

Comments

by deepakkarki   2020-11-22
Source and more info : https://martin.kleppmann.com/2020/11/18/distributed-systems-...

This recorded series is from Kleppmann's Concurrent and Distributed Systems course, which he teaches at the University of Cambridge. In case the name seems familiar, Kleppmann is the author of perhaps HN's favourite book, "Designing Data-Intensive Applications": https://www.amazon.com/dp/1449373321

by snicky   2020-09-28
If you are interested in computer science in general, I highly recommend:

1. Structure and Interpretation of Computer Programs (available for free, e.g. here: http://sarabander.github.io/sicp/html/index.xhtml)

2. https://computationbook.com/

Also, I haven't read it yet, but this book has been praised here a lot recently: https://www.amazon.com/Designing-Data-Intensive-Applications...

by aespinoza   2020-09-17
In the case of technical books, I have found that pricing is not that different from digital to physical books.

This is a good example: https://www.amazon.com/dp/1449373321/ref=cm_sw_r_tw_dp_x_qQ8...

It is $29.99 for Kindle and Physical is $34.91.

by pqb   2020-08-28
I would say exactly the opposite. I regret buying a Kindle edition of a book from Amazon [0], because it is DRM-protected and I am forced to use the "Amazon Kindle" application; otherwise I cannot open it. I am usually okay with DRM, but I regret not buying it elsewhere with less annoying protection.

[0]: https://www.amazon.com/Designing-Data-Intensive-Applications...

Psst, "Designing Data-Intensive Applications" was a very good read. Do you know of similar books that focus on distributed systems?

by avremel   2020-01-03
How would you compare Database Internals to Designing Data Intensive Applications?

[1] https://www.amazon.com/Designing-Data-Intensive-Applications...

by PM_me_goat_gifs   2019-11-17

You say you didn’t do any SQL, but then your CV says you optimized queries. Are you selling yourself short in your post?

One path you could take is to double down on database expertise: take some time and heavily study databases and infrastructure development. Definitely read Designing Data-Intensive Applications and maybe also read the Google Site Reliability Engineering book. Then, read deeply about how a particular database of your choice works. Personally, I would go with Postgres because of the quality of the official documentation, the availability of high-quality unofficial documentation and the open-source community around it. Here’s a neat blog post: https://gocardless.com/blog/debugging-the-postgres-query-planner/

Do a bit to learn about backend development as well and study how data structures work...especially B-trees and hashes. I guess do some leetcode because that seems necessary these days.

Then, pitch yourself as a backend/infrastructure engineer who can complement others’ OOP experience with your experience in databases. In the present, you can rewrite some of the lines of your CV with an eye toward the business impact, so that you look like you’d slot right into a scale-up or a medium-sized team that is starting to run into problems scaling its application, because the DB is often the bottleneck.

An advantage of this is that it makes your future job searches easier by making your first job look like the start to your path, whether you continue on as a backend SWE, go into infra/devops engineering, or go into data engineering.

by PM_me_goat_gifs   2019-11-17

If you are looking for a book recommendation, Designing Data Intensive Applications is great. But yes, you'll likely want to hire someone with experience in this. Sadly, I don't have experience in this hiring that I can point you towards.

by PM_me_goat_gifs   2019-11-17

For system design interviews, read the book https://www.amazon.co.uk/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321 and practice explaining concepts from it on a whiteboard.

by malisper   2019-08-25
Referencing my copy of Designing Data-Intensive Applications [0], here are some approaches mentioned:

1) The naive approach is to assign all writes to a chunk randomly. This makes reads a lot more expensive as now a read for a particular key (e.g. device) will have to touch every chunk.

2) If you know a particular key is hot, you can spread writes for that particular key to random chunks. You need some extra bookkeeping to keep track of which keys you are doing this for.

3) Splitting hot chunks into smaller chunks. You will wind up with varying sized chunks, but each chunk will now have a roughly equal write volume.
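
Approach 2 can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the book: the chunk count, the number of salt buckets, the `HOT_KEYS` set (the "extra bookkeeping") and the in-memory `Store` are all hypothetical names made up for the example.

```python
import random
import zlib
from collections import defaultdict

NUM_CHUNKS = 8             # hypothetical number of chunks/partitions
SALT_BUCKETS = 4           # hypothetical number of sub-keys per hot key
HOT_KEYS = {"device-42"}   # the bookkeeping: keys known to be hot


def chunk_for(routing_key: str) -> int:
    """Map a routing key to a chunk with a stable hash."""
    return zlib.crc32(routing_key.encode()) % NUM_CHUNKS


def write_routing_key(key: str) -> str:
    """Hot keys get a random suffix so their writes spread out."""
    if key in HOT_KEYS:
        return f"{key}#{random.randrange(SALT_BUCKETS)}"
    return key


def read_routing_keys(key: str) -> list[str]:
    """Reads of a hot key must look at every salted sub-key."""
    if key in HOT_KEYS:
        return [f"{key}#{i}" for i in range(SALT_BUCKETS)]
    return [key]


class Store:
    """Toy in-memory store: one dict of lists per chunk."""

    def __init__(self) -> None:
        self.chunks = [defaultdict(list) for _ in range(NUM_CHUNKS)]

    def write(self, key: str, value) -> None:
        rk = write_routing_key(key)
        self.chunks[chunk_for(rk)][rk].append(value)

    def read(self, key: str) -> list:
        values = []
        for rk in read_routing_keys(key):
            values.extend(self.chunks[chunk_for(rk)].get(rk, []))
        return values
```

The trade-off is visible in `read`: a cold key touches one routing key, while a hot key has to gather results from `SALT_BUCKETS` of them.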

One more approach I would like to add is rate-limiting. If the reads or writes for a particular key cross some threshold, you can drop any additional operations. Of course, this is only fine if you are OK with operations on hot keys often failing.
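
A minimal sketch of the rate-limiting idea, assuming a simple fixed-window counter per key. The class name, the parameters and the injectable clock are hypothetical choices for illustration; a real system might prefer a token bucket or a sliding window.

```python
import time


class PerKeyRateLimiter:
    """Allow at most max_ops operations per key in each fixed time window."""

    def __init__(self, max_ops: int, window_seconds: float, clock=time.monotonic):
        self.max_ops = max_ops
        self.window_seconds = window_seconds
        self.clock = clock            # injectable so the limiter can be tested
        self._window_start = {}       # key -> start time of its current window
        self._count = {}              # key -> operations seen in that window

    def allow(self, key: str) -> bool:
        now = self.clock()
        start = self._window_start.get(key)
        # Start a fresh window on first use or once the old one has expired.
        if start is None or now - start >= self.window_seconds:
            self._window_start[key] = now
            self._count[key] = 0
        if self._count[key] >= self.max_ops:
            return False              # over the threshold: drop the operation
        self._count[key] += 1
        return True
```

A caller would check `limiter.allow(key)` before each read or write and reject the operation when it returns `False`, so only the hot key pays the cost while other keys are unaffected.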

[0] https://www.amazon.com/Designing-Data-Intensive-Applications...

by meritt   2019-08-01
For anyone eager to read something now, Designing Data-Intensive Applications [1] is an excellent and completed book that covers nearly all of the same material with significant depth.

[1] https://www.amazon.com/Designing-Data-Intensive-Applications...

by healydorf   2019-07-21

I've read a handful of books on general design/architectural stuff involving large pots of data. Designing Data Intensive Applications is my favorite.

Also 3 different management books. The Manager's Path is my favorite in that camp.

by healydorf   2019-07-21

I am a fan of Designing Data-Intensive Applications.

by CowboyFromSmell   2019-07-21

Designing Data-Intensive Applications by Martin Kleppmann is a solid overview of the field and gives you plenty more references for further investigation. It starts with single-host databases and expands out to all kinds of distributed systems. Starting with single-host systems is important because it helps you appreciate the designs of the distributed systems that replaced them.

Edit: markdown is hard

by vira28   2019-07-21

On a side note, I am currently reading https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321. Loving it so far. The author clearly explains the difference between the relational and document models.

Highly recommended.

by Mofo_Turtles   2019-07-21

This book is very good for distributed systems at a high level.

by fernandotakai   2019-07-21

i've been reading Designing Data-Intensive Applications by Martin Kleppmann and i would recommend it to all backend developers out there who want to step up their game.

(i also love that it's a language agnostic book)

by nw__dataeng   2019-07-15
I'd highly recommend reading [Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications...). The book gives you a great overview of designing data systems - foundational knowledge you'll need in any DE role.

The reason you can't find data engineering materials online is because real data engineering really only happens at a handful of companies - and those companies maintain this knowledge base internally and do not share it.

I noticed that you listed tools / frameworks to learn, as well as languages. Another piece of advice would be to not focus on those because they come and go (for example, Hadoop is pretty much deprecated in any DE-heavy company). What lasts is an understanding of distributed systems, distributed query engines, storage technologies, and algorithms & data structures. If you have a firm grasp on those, you won't have to start from scratch every time a new framework is introduced. You'll immediately recognize what problems the tech is solving and how they're solving it, and based on your knowledge you can connect the dots and know if that solution is what you need.

Another thing to do is watch CS186 from Berkeley in its entirety. This course is about relational databases, but will give you the foundation you need to speak the DE language.

Source: I work as a data engineer at what some would call a big company :)

by tracer4201   2019-07-12
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

https://www.amazon.com/Designing-Data-Intensive-Applications...

I read through this book last year when I saw it recommended on HN. I recommended it to engineers on my team at work.

I’m reading it for a second time now, and just finished chapter 2 today. It’s dense but an amazingly detailed and thorough text.

by karolist   2019-07-12
I'll structure this in "current/future/recent_past" format if I may.

Currently:

* The Go Programming Language

https://www.amazon.com/Programming-Language-Addison-Wesley-P...

* Building Microservices

https://www.amazon.com/Building-Microservices-Designing-Fine...

Plan to do next:

* Designing Data-Intensive Applications

https://www.amazon.com/Designing-Data-Intensive-Applications...

* Designing Distributed Systems

https://www.amazon.com/Designing-Distributed-Systems-Pattern...

* Unix and Linux System Administration 5th ed, but I'll probably just skip around and read the chapters of interest, i.e. I wanna get a better understanding of systemd.

https://www.amazon.com/UNIX-Linux-System-Administration-Hand...

Read last month:

* Learning React

Good for a quick intro but I probably wouldn't read cover-to-cover again, some sections are old, but overall an OK book.

https://www.amazon.com/Learning-React-Functional-Development...

* React Design Patterns and Best Practices

Really liked this one, picked up a tonne of new ideas and approaches that are otherwise hard to find for a newbie in the JS scene. These two books, some time spent reading up on webpack and lots of github/practice code made me not scared of JS anymore and not feeling the fatigue. I mean, I was one of the people who dismissed everything frontend related, big node_modules, electron, complicated build systems etc. But now I sort of understand why and am on the other side of the fence.

https://www.amazon.com/React-Design-Patterns-Best-Practices/...

* Flexbox in CSS

Wanted to understand what the new flexbox layout is about, since it's been a while since I've done any serious CSS work. Long story short, I made it about halfway through and dropped it - it's not any more useful than the MDN docs, and actually playing with someone's codepen gave me a better understanding in 5 minutes than 3 hours spent with this book.

https://www.amazon.com/Flexbox-CSS-Estelle-Weyl-ebook/dp/B07...

by sambroner   2019-07-12
I haven't read Designing Distributed Systems, but I have read Designing Data-Intensive Applications [0] and it was fantastic.

An overview of databases (what and why, but also a lot of how) plus distributed concepts and modern architectures.

[0] https://www.amazon.com/Designing-Data-Intensive-Applications...