Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Author: Martin Kleppmann
4.7

Comments

by yla92   2022-06-30
Great post. Also highly recommend Designing Data-Intensive Applications by Martin Kleppmann (https://www.amazon.com/Designing-Data-Intensive-Applications...). The sections on "Storage and Retrieval", "Replication", "Partitioning" and "Transactions" really opened up my eyes!
by tacon   2022-05-30
This book has an excellent reputation for the foundations of data-intensive software:

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

https://www.amazon.com/Designing-Data-Intensive-Applications...

by fbftethrowaway   2022-01-04
I’m a Midwest dev w/ 8 YOE at non big tech who got multiple FAANG+ offers last year. I wrote the below guide for friends interested in following the same path so I’ll just post it here.

Took about four months of studying ~2 hours daily.

0. Total Compensation (TC)

Compensation data: https://www.levels.fyi/

Get the app Blind and start browsing it daily. People regularly post their offers, and it is the most up to date info on the market. It’s an anonymous forum where your company email is verified. You can DM employees of target companies for referrals or information about roles.

1. Leetcode (LC)

Buy a yearlong Leetcode premium subscription and do all the modules listed here, in no particular order, but skip decision trees and machine learning: https://leetcode.com/explore/learn/

When you are done with that, do all the problems on this list: https://www.teamblind.com/post/New-Year-Gift---Curated-List-...

A lot of these problems are in the modules linked previously, so you will only have 30-40 new problems here.

Next, do random problems until you "see through the matrix."

Focus on medium level problems. Try to do something like 35% easy, 50% medium, 15% hard.

If you can't find the optimal solution to a problem, "upsolve" by reading a bit of the solution and trying again. If you still can't get it, copy the code of the solution and study it. Then erase it and try to solve it from memory.

Periodically go back over solved problems and re-solve them while taking notes.

Your goal should be to solve two random LC mediums in ~35 minutes. Consider using Python as your interview language if you are comfortable enough with it. It's faster than Java for writing.

Some places will have you run the code, others it will be a glorified whiteboard, so don't use the run button as a crutch.

Around two weeks before your interview, start doing company tagged problems like: https://leetcode.com/company/doordash/

Start doing this part first and grind it hard. It might take 3 months, it might take a year. It takes as long as it takes until you think you can crush it. I spent around 2 hrs each day in the morning on LC.

2. System Design

If you are being considered for senior level roles, this will be by far the most important part of your interview as far as leveling. If you are shaky, they will downlevel.

Buy DDIA: https://www.amazon.com/Designing-Data-Intensive-Applications...

Read it more than once.

These courses on educative.io are useful: https://www.educative.io/courses/grokking-the-system-design-... https://www.educative.io/courses/grokking-adv-system-design-...

These videos are also really good: https://www.codekarle.com/

Tech talks on Cassandra/Kafka and stuff like that are good.

Videos are the best last minute prep before interviews for design.

3. Companies

Amazon tends to be easier in terms of LC problems but asks more behavioral questions. Amazon also has a reputation for being stressful, and pay is not at the level of Meta/Google, though that might be changing. I would do this interview first since it’s good practice for getting behavioral stories real sharp.

Google is way slower than these other companies, so if you wanna consider them, get the process started as early as you can.

If you are interested in remote, also consider Zoom, Square, Twitter, and Coinbase.

4. Applying

Get referrals wherever you can. Most places will ignore you unless you have them. I applied to probably 25+ companies and got rejected or ignored by all but Uber and AirBnB. Places I had referrals to, I scored onsites 100% of the time, including places that rejected me before a referral.

You can get referrals off Blind. I didn’t do this, but I guess it happens! You probably also have people somewhere in your network in FAANG and top tier companies if you look. If people think you have a chance of passing they’ll be happy to refer. Referral bonuses are several thousand dollars. Ask them for mock interviews as well.

5. Interviewing

The process is recruiter call -> "phone screen" (do an LC problem on Hackerrank while on a zoom call) -> "onsite" which is 5 hours of zoom...usually 2 coding, 1 behavioral (maybe a small coding question as well), 1 design.

Do mock interviews with friends/colleagues for LC problems. I had 3 different people give me a total of 6 mock interviews. You can also pay for this with different companies like interviewing.io or randoms off Blind.

Getting mock interviews for system design is harder, and you might have to pay for it. I did and it was the best money I spent that year.

Also for interviews you can interview over 2-3 days after 3pm PST to avoid taking time off work if you’re not in PST.

Recruiters will let you push back interviews for any reason multiple times, especially if it's for more interview prep, so if you aren't where you want to be before one, it's totally fine to ask for more time.

6. Negotiating

You should try to get all your interviews lined up very close together to get competing offers, especially if you want Google, who tends to lowball candidates that do not have competing offers.

by PM_me_goat_gifs   2021-12-10

> scalability was a rare issue

Designing Data-Intensive Applications is a great book. Get yourself into some good personal habits, learn to cook efficiently, find a good gym near your new job, and spend some time sitting in the park reading that book.

by PM_ME_YOUR_DOOTFILES   2021-12-10

> Data Intensive systems book

Are you referring to this book? Seems like a good book according to Amazon.

by adampk   2021-11-25
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems https://www.amazon.com/Designing-Data-Intensive-Applications...

I surprisingly really enjoyed it. Well written and it pulled back the veil on a lot of concepts that I thought were too complex for me to understand/enjoy.

by jacke   2021-11-23
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann https://www.amazon.com/Designing-Data-Intensive-Applications...

You can learn a lot of algorithms. It's useless unless you start to create architecture and use them in practice.

by master_yoda_1   2021-01-15
Coding and building teach you more than taking a course or watching a video. If you don't have any programming background, you can enroll in some Coursera or Udacity courses to start with. Then go through this book https://www.amazon.com/Designing-Data-Intensive-Applications.... Also learn some SQL. Take some data, feed it into a SQLite db, ask a question, and convert the question into a query. Becoming good at this takes some time. Be patient. The learning curve is like a hockey stick: the initial phase might have a dip, but it accelerates in the later phase. BY ANY CHANCE DO NOT JOIN A BOOTCAMP.
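
As a rough illustration of the "take some data, ask a question, turn it into a query" loop, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its rows, and the `top_spender` helper are all made up for the example; the question being asked is "which customer spent the most in total?"

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical sample data: one row per order.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 45.0)],
)
conn.commit()


def top_spender(connection):
    """Question: which customer spent the most in total?

    Translated into SQL: group orders by customer, sum the amounts,
    sort by the sum descending, keep the first row.
    """
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
        LIMIT 1
    """
    return connection.execute(query).fetchone()


print(top_spender(conn))
```

Swapping in a different question (average order size, customers with more than one order, and so on) and rewriting the query is the kind of practice the comment is describing.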
by deepakkarki   2020-11-22
Source and more info : https://martin.kleppmann.com/2020/11/18/distributed-systems-...

This recorded series is from Kleppmann's Concurrent and Distributed Systems course which he teaches at University of Cambridge. In case the name seems familiar, Kleppmann is the author of perhaps HN's favourite book "Designing Data-Intensive Applications" https://www.amazon.com/dp/1449373321

by snicky   2020-09-28
If you are interested in computer science in general I highly recommend:

1. Structure and Interpretation of Computer Programs (available for free, e.g. here http://sarabander.github.io/sicp/html/index.xhtml )

2. https://computationbook.com/

Also, I haven't read it yet, but this book has been praised here a lot recently: https://www.amazon.com/Designing-Data-Intensive-Applications...

by aespinoza   2020-09-17
In the case of technical books, I have found that pricing is not that different from digital to physical books.

This is a good example: https://www.amazon.com/dp/1449373321/ref=cm_sw_r_tw_dp_x_qQ8...

It is $29.99 for Kindle and Physical is $34.91.

by pqb   2020-08-28
I would say exactly the opposite. I regret buying a Kindle edition of a book from Amazon [0], because it is DRM protected and I am forced to use the "Amazon Kindle" application, otherwise I cannot open it. I am usually okay with DRM, but I regret not buying it elsewhere with less annoying protection.

[0]: https://www.amazon.com/Designing-Data-Intensive-Applications...

Psst, "Designing Data Intensive Applications" was a very good read. Do you know similar books that focus on distributed systems?

by avremel   2020-01-03
How would you compare Database Internals to Designing Data Intensive Applications?

[1] https://www.amazon.com/Designing-Data-Intensive-Applications...

by PM_me_goat_gifs   2019-11-17

You say you didn’t do any SQL, but then your CV says you optimized queries. Are you selling yourself short in your post?

One path you could take is to double down on database expertise: take some time and heavily study databases and infrastructure development. Definitely read Designing Data-Intensive Applications and maybe also read the Google Site Reliability Engineering book. Then, read deeply about how a particular database of your choice works. Personally, I would go with Postgres because of the quality of the official documentation, the availability of high-quality unofficial documentation and the open-source community around it. Here’s a neat blog post: https://gocardless.com/blog/debugging-the-postgres-query-planner/

Do a bit to learn about backend development as well and study how data structures work...especially B-trees and hashes. I guess do some leetcode because that seems necessary these days.

Then, pitch yourself as a backend/infrastructure engineer who can complement others’ OOP experience with your experience in databases. In the present, you can rewrite some of the lines of your CV with an eye toward the business impact and look like you’d slot right into a scale-up or a medium size team who is starting to run into problems of scaling their application, because the DB is often the bottleneck.

An advantage of this is that it makes your future job searches easier by making your first job look like the start to your path, whether you continue on as a backend SWE, go into infra/devops engineering, or go into data engineering.

by PM_me_goat_gifs   2019-11-17

If you are looking for a book recommendation, Designing Data Intensive Applications is great. But yes, you'll likely want to hire someone with experience in this. Sadly, I don't have experience in this hiring that I can point you towards.

by PM_me_goat_gifs   2019-11-17

For system design interviews, read the book https://www.amazon.co.uk/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321 and practice explaining concepts from it on a whiteboard.

by malisper   2019-08-25
Referencing my copy of Designing Data-Intensive Applications[0], here are some approaches mentioned:

1) The naive approach is to assign all writes to a chunk randomly. This makes reads a lot more expensive as now a read for a particular key (e.g. device) will have to touch every chunk.

2) If you know a particular key is hot, you can spread writes for that particular key to random chunks. You need some extra bookkeeping to keep track of which keys you are doing this for.

3) Splitting hot chunks into smaller chunks. You will wind up with varying sized chunks, but each chunk will now have a roughly equal write volume.
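
To make approach 2 concrete, here is a minimal sketch of spreading a known-hot key across several chunks by appending a random suffix on write and fanning out on read. This is an illustration under assumptions, not code from the book: `NUM_CHUNKS`, `SALT_BUCKETS`, `HOT_KEYS`, the `device-42` key, and the use of CRC32 as the chunk hash are all hypothetical choices.

```python
import random
import zlib
from collections import defaultdict

NUM_CHUNKS = 8             # hypothetical number of chunks
SALT_BUCKETS = 4           # how many sub-keys each hot key is spread over
HOT_KEYS = {"device-42"}   # the "extra bookkeeping": keys known to be hot

# chunk id -> list of (key, value) pairs; stands in for real storage
chunks = defaultdict(list)


def chunk_for(routing_key):
    """Deterministically map a routing key to a chunk (CRC32 is an assumption)."""
    return zlib.crc32(routing_key.encode("utf-8")) % NUM_CHUNKS


def write_routing_key(key):
    """Hot keys get a random suffix so their writes land on different chunks."""
    if key in HOT_KEYS:
        return f"{key}#{random.randrange(SALT_BUCKETS)}"
    return key


def read_routing_keys(key):
    """A read for a hot key has to consider every possible suffix."""
    if key in HOT_KEYS:
        return [f"{key}#{i}" for i in range(SALT_BUCKETS)]
    return [key]


def write(key, value):
    chunks[chunk_for(write_routing_key(key))].append((key, value))


def read(key):
    # Several suffixes may hash to the same chunk, so visit each chunk once.
    chunk_ids = {chunk_for(rk) for rk in read_routing_keys(key)}
    results = []
    for chunk_id in chunk_ids:
        results.extend(v for k, v in chunks[chunk_id] if k == key)
    return results
```

The trade-off is the same one described in approach 1, just limited to the hot keys: writes for `device-42` are spread out, but every read for it touches up to `SALT_BUCKETS` chunks, while ordinary keys still read from a single chunk.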

One more approach I would like to add is rate-limiting. If the reads or writes for a particular key cross some threshold, you can drop any additional operations. Of course this is only fine if you are ok with operations on hot keys often failing.
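
A per-key rate limiter along these lines could look like the sketch below. It uses a simple fixed-window counter, which is just one possible choice; the class name, parameters, and the injectable `clock` are assumptions made for the example.

```python
import time


class PerKeyRateLimiter:
    """Fixed-window limiter: at most max_ops operations per key per window."""

    def __init__(self, max_ops, window_seconds, clock=time.monotonic):
        self.max_ops = max_ops
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self._windows = {}      # key -> (window_start, ops_in_window)

    def allow(self, key):
        """Return True if the operation may proceed, False if it is dropped."""
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.max_ops:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


limiter = PerKeyRateLimiter(max_ops=1000, window_seconds=1.0)
if not limiter.allow("device-42"):
    pass  # drop or reject the operation for this hot key
```

Callers check `allow(key)` before each read or write and drop the operation when it returns False, so only the hot key is affected and other keys keep their full budget.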

[0] https://www.amazon.com/Designing-Data-Intensive-Applications...

by meritt   2019-08-01
For anyone eager to read something now, Designing Data-Intensive Applications [1] is an excellent and completed book that covers nearly all of the same material with significant depth.

[1] https://www.amazon.com/Designing-Data-Intensive-Applications...

by healydorf   2019-07-21

I've read a handful of books on general design/architectural stuff involving large pots of data. Designing Data Intensive Applications is my favorite.

Also 3 different management books. The Manager's Path is my favorite in that camp.

by healydorf   2019-07-21

I am a fan of Designing Data-Intensive Applications.