The ultimate interview preparation framework. Part 4: System design

← Back to list
Posted on 20.05.2024
Image by AI on Midjourney

A system design interview is ...

The next part of the Ultimate interview preparation framework demystifies the system design interviews.

So far in my career I've met three types of interviews they call "System design":

  • A candidate is only given a task that demands to design a distributed system, and then it's required to infer constraints and negotiate a possible solution. This is the good old "FAANG style" system design.
  • The candidate is given a snippet of code and asked to improve it to the best degree possible. Typically that best degree is a fully-fledged microservice.
  • The candidate is given a certain database structure and asked to fix it. So they must fix obvious things like absence of primary and foreign keys, de-normalisation-without-apparent-reason, and so on.

An interviewer typically says the interview is open-ended, so it potentially can go on and on. However, the way they say this is a bit deceiving, because the amount of time given is limited, and typically there is always a certain checklist of things they expect to hear and cover. Failing to do so will most likely lead to rejection. That's why:

  • Always know how much time you have. Normally it is one hour, sometimes 1.5 hours. Don't loose track of time, plan your interview.
  • Don't spend too much time on the introduction round. In most of the cases this is just a certain kind of a ritual that typically doesn't let you earn any score.
  • Always know what the expectations are. Is it to build a distributed system? Or just fix the code? It is better to discuss this with the recruiter beforehand.
  • Don't spend time on chit-chat and idle talks, trying to show good personality of yours. Good personality won't get you the offer. Be polite and positive, but focus on the task.
  • Always listen to the interviewer, as they may steer the discussion to the desired direction, in case if you are a bit off the road.
  • Drive the interview process, as this is always expected.

About the technical part

From the technical point of view, there is a set of theoretical concepts about the system design as such, that are typically assumed and not discussed in depth, but it is expected for the candidate to have good understanding of.

This section is heavily inspired by an amazing book System Design Interview – An insider's guide, Vol. I that I've recently read. I highly recommend it.

CAP theorem

In every distributed system there are three parameters:

  • Consistency - each part of the system must have the most recent data.
  • Availability - the system must still function even if some part of it is down.
  • Network Partition tolerance - the system must tolerate temporary or permanent disruptions of connectivity between its parts.

The CAP theorem basically states, that any system always sacrifices either one of three properties, and keeps the rest two, so it's always a trade-off. It sounds intimidating, but in real life it boils down to the following:

  • Since there is no quantum data transmitters yet in our WIFI-routers, the connectivity will always suffer from partitioning. It means, that literary any system must be ready to cope with it.
  • Keeping the previous statement in mind, there are two possibilities left:
    • CP systems - consistency-first oriented. Such systems have a lot of blocking and sync operations in order to be consistent, I don't think a candidate will be asked to build one of those. A good example of such a system is a banking application.
    • AP systems - can be inconsistent, but have high availability. This is 99% of all web applications.

Eventual consistency

When building an AP system, one rule must be set: the system is allowed to be inconsistent, but only for a certain period of time (the shorter, the better). Eventually, the system must become consistent again, until the next change happens. This is called eventual consistency. A good example of an eventually consistent system is two microservices A and B talking to each other using Kafka. The microservice A received an update, and the update was communicated to the microservice B, but the message is not yet consumed and remains in a queue. It is said, that the system is currently inconsistent. But give it time, it will become consistent, eventually.

Vertical Scaling vs Horizontal Scaling

When load on one of the system's element grows, and it starts choking up, there are two ways to deal with it.

  • Vertical scaling - giving a instance more CPU power and memory amount to cope with increased load.
    • 👍 Rather cheap to do, especially in the cloud.
    • 👎 There is always a limit to what the amount of resources can be increased.
  • Horizontal scaling - add more copies of the element, and spread the load between them.
    • 👍 In theory, thanks to consistent hashing, horizontal scaling could grow the system to any amount of copies / partitions.
    • 👎 Additional infrastructure is needed to manage traffic between the copies / partitions.

When it is said, that the system should be able to process high load and hundreds of thousands of requests per second, it usually means the system should effectively scale horizontally.

Trading space for time and the other way around

It ain't possible to cheat laws of physics. The system you build always makes a tradeoff between time of execution and memory consumption. The algorithm in use can be either fast, but hungry on memory, or the other way around, or something balanced.

This is why every algorithm typically has two parameters of effectiveness: time complexity and space complexity. And that's exactly the reason why you shouldn't claim that the Bubble Sort algo sucks and Quicksort doesn't. It simply depends on what circumstances every specific algorithm is used under: the Bubble Sort has space complexity of O(1), so it is perfect for micro-controllers that only have as little as 2kb of RAM and low performance is generally expected.

Sync and async actions

There are two types of processes that usually take place inside any system.

  • Synchronous - an action is performed right away, and the user or another element of the system, which triggered the action, must wait. Examples:
    • a user gets a list of books via REST to see it in the UI,
    • one service calls another via gRPC to get a list of products.
  • Asynchronous - an action is triggered and then enqueued, thus postponed. The user or element moves on with their business and gets notified when the action is completed. Examples:
    • generating image previews after upload,
    • generate CSV files containing exported data and upload to a bucket.

Before committing to either one, the following question must be asked and answered: "Is the result needed now (in real time) or later?".

In order for the system to successfully scale horizontally to handle high amount of load, all heavy processes in the system must be switched over to the async mode.

In-memory vs disk operations

Every system that stores the data must decide where to store it. There are two options:

  • in-memory storage (e.g. Redis)
  • on-disk storage (file system, relational databases)

As usual, there is no right or wrong approach, it all depends on the system requirements and the type of data the system deals with. Large files can't be stored in memory (at least entirely), frequently read data shouldn't be stored on disk due to high access latency.

Rule of a thumb: in-memory is faster, disk operations are slower.

Stateful vs stateless API

The term stateless means that at any point of time any instance of the application can serve any request from any user identically successfully. Simply put, a user session isn't stored on the server.

However, being stateful isn't necessary a bad thing. Websockets are stateful, because a permanent connection between a client and a server is preserved and thus every user is "bound" to a specific instance of the system.

Streaming can also be both stateless and stateful, depending on the concrete task.

If you want your system to effectively scale horizontally, you generally want your API stateless.

Write-optimised vs read-optimised systems

It's always a good idea to understand whether the data in your system will be more frequently read or more frequently written. This is important to know, because then there is room for some optimisations. Most of the web applications are read-optimised, but there are exceptions such as metric and log aggregators, where there is constant influx of new data.

Consistent hashing

Consistent hashing is a technique that keeps that whole idea of horizontal scaling afloat. Imagine we have N instances that process user requests. Basically, thanks to the consistent hashing we can take any incoming ID and then map it to one of the instances. When the same ID comes next time, it is mapped to the same instance once again. Furthermore, new additional or replacement instances can be added to the pool, old instances - removed, and the system can also keep track of the unhealthy instances and promptly redirect the requests.

A good application of a consistent hashing is database partitioning. Since every record has an ID, it can be unequivocally mapped to a certain instance now and later.

In order for a database, message broker or in-memory store to scale horizontally, it should support partitioning.

Application tiers

When designing the application, we must clearly understand the request flow, and what data we can trust, and which can't be trusted. Also, most of the time data is sensitive and must be protected by authorization and authentication. When the data is a subject to a change, we must also record who introduced those changes.

Typically, there are these tiers:

  • Public tier (or web tier) - untrusted tier, must be handled with caution and ideally should be protected with authentication,
  • VPC tier - trusted, because in this tier all communications between microservices happen,
  • VPN tier - half-trusted I would say. In case of a VPN, we know that the tier is not entirely public, but we still might want to check who does what,
  • Database tier - here the data itself is trusted, but the authentication should still happen to protect the data from unauthorized access.


We shouldn't neglect observability. Handling an un-observed service is like flying a plane without any cockpit devices: all you know is the engine makes sound and the earth is below, not above, which means the plain is still in the air and hasn't crushed (yet). But, obviously, this is not enough.

Observability (aka O11y) implements cockpit instrumentation for your application.

Here is a list of standard metrics that are typically of interest:

  • Request per second (RPS), per endpoint and total
  • Request duration, per endpoint
  • P99, per endpoint - shows the slowest endpoints
  • CPU & Memory consumption
  • Daily active users (DAU)

In order to observe how effectively your system scales, you need to have observability instrumentation in place.

The contract

One of the most important thing is to describe how the microservice communicates with the user and other parts of the cloud native application.

There are several transports that are good to know:

  • REST
  • RPC (gRPC)
  • Websockets
  • Streaming
  • Event-based (Kafka, Google Pub/Sub, RabbitMQ, ...)

A bonus would be to mention ways to mitigate DDoS attack and prevention of resource bottlenecks by introducing:

  • API rate limiter - when a client sends too many requests, after reaching a certain threshold the requests are denied. Then there is cool down period.
  • Consumer throttling for the event based communication - when messages start coming in big numbers, you typically want to start making short pauses between acknowledgments, otherwise the CPU and DB CPU will start spiking and the events will start piling up.

Key system components

Each system always consists of a set of smaller components, that can be considered as "building blocks". When building a cloud native application, you typically want to re-use these blocks.

Load balancing

A load balancer is a special kind of software that distributes requests between multiple instances of the application. In basic situations, the round-robin algorithm is used, but there may be variations. Typically the balancer is a built-in feature of the Cloud Platform or K8s, but on a C4 diagram it must be clearly highlighted.

Databases: SQL vs NoSQL

When it comes to structured keeping of data, databases come into play. There are two main cohorts of databases:

  • traditional SQL (MySQL/MariaDB/Percona, Postgres, AWS Aurora, RDS, GCP BigQuery, etc.)
  • NoSQL (MongoDB, Redis, AWS DynamoDB, graph databases such as AWS Neptune, etc.)

As everything mentioned above, every type of a database comes with own tradeoffs. To make a choice, it ultimately boils down to answering certain questions, including, but not limited to:

  • Are flexible reports needed? If yes, proceed with SQL.
  • Is your data mostly flat or can be considered as a unit? If yes, proceed with document-based NoSQL.
  • Is the data rather simple and low read-latency crucial? If yes, proceed with in-memory NoSQL.
  • Is the data graph-like? If yes, it is actually tricky, because both SQL and NoSQL can handle tree-structured data.
  • Will the amount of data grow indefinitely? If yes, then either go with NoSQL and partitioning or with SQL and periodical dumping to a cold storage.
  • ...

Keep in mind, that SQL databases due to the nature of JOINs typically don't scale well horizontally. There could be some optimisations made though, such as master node and read replicas. When using replication, the database an becomes eventually consistent AP system.

Partitioning is also possible, but it comes with a price.

On the other hand, NoSQL allows better partitioning, due to the flat nature of data, but JOINs are obviously not available natively.

There are also hassle-free cloud relational databases, such as GCP BigQuery, but they ain't cheap.

There sometimes may be a combination of databases implemented. For example, a service manages data using MongoDB, but once in every N hours the data is dumped to BigQuery to populate intricate analytics down the road.


Cache is needed, when the data managed by a service is more frequently read than written.

Two things to keep in mind:

  • Cache invalidation strategy
    • keep the data for too long and you'll end up with stale data.
    • keep the data for too short, and you'll face frequent cache miss.
  • Maximum cache size and an eviction algorithm

There are some good caching techniques, such as LRU, LFU and Tagged caching.


CronJobs are periodical tasks used to keep the system up to date. The frequency of execution depends on the concrete task.


When it comes to security, it makes sense to at least mention JWT: the way it works and when it is typically used.

Algorithms & data structures

We should be aware of the most commonly used data structures and algorithms on those. It's not typically asked to implement any of these, as it's not a coding challenge, but there must be understanding of the basic principles of each and when to use what.

The FAANG script

As always, it's expected that the candidate drives the interview process. Imagine this being a real job task, when you are asked to design an application, and you make a presentation for your colleagues.

For better results, there is a certain script that should be followed.

1. Read the task and ask questions

Imagine you receive a task from your project manager, and it says: "Build me a system X". The very first and reasonable reaction from your side would be "What the heck, dude, give me more info! Where are the requirements?". This is exactly the same with this kind of interviews. The first thing to do is to start asking clarifying questions to define some acceptance criteria.

If nothing comes up, start asking the standard questions, such as:

  • What are the most important system requirements (features)? - a broadly scoped question that can help you get many useful insights.

  • If the system to be designed is well-known (such as Google Drive or Twitter) and enormous, it makes sense to ask instead What features are important? to narrow down the scope a little.

  • How many DAU (daily active users) should the system handle? - a good question for the back of the envelope estimation (see below). It is also quite generic and is asked more out of politeness, because they will most likely answer "ten thousands" or so.

  • For how long the data must be stored?

  • What is the average size of one data item? - another good question for the back of the envelope estimation.

  • ...

2. Propose a high-level architecture, get the buy-in

Here we can talk about tiers (web tier, VPC tier), very high level parts of the app, such as the BE app and the dashboard. Also talk about the way of communicating between these two.

3. Dive into details, layer by layer, and address all the aspects

Outline the contract between two parts. Dive into every component layer by layer, not too deep. Ask if some deeper explanation is needed.

Back of the envelope estimation

Basically, the Back of the envelope estimation is a technique that allows very-very-very rough estimation of average amount resources that the system will probably consume.

This must be well understood, as it could be asked during the interview.

Typically, what attracts interest is the following:

  1. Query Per Second and peak Query Per Second
  2. Storage size for N years
  3. Bandwidth per second

In order to get these values, we need to know some indications:

  1. Average active users (AAU) per a day (or daily active users (DAU))
  2. Percentage of requests that save something
  3. Average data size

Then we can easily make the calculations. Consider the example:

Let's say, that
1. AAU per day = 500 // this amount comes and does something on the platform
2. Data write requests = 50% // half of the users posts a message
3. Average message size = 300 kB
1. Requests per an hour = 500 / 24 =~ 21
2. Users per a minute = 21 / 60 = 0.35
3. QPS = 0.35 / 60 = 0.006
4. Peak QPS = 2 * 0.006 = 0.012
5. Amount of requests that posts a message = 500 * 0.5 = 250
6. Message volume per a day = 250 * 300 kB = 73 mB
7. Bandwidth of new messages per second = ((75000 kB / 24) / 60) / 60) = 0.9 kB
The code is licensed under the MIT license

Bonus: the checklist for making a distributed system

  1. Outline high level components
  2. Establish a contract between them
  3. Go layer by layer, talking about each system component
  4. Talk about the database, caching
  5. Talk about security
  6. Mention observability

Wow, we've covered a lot. As before, this article is a work in progress, I will enrich and expand it when I have new experience to share.


Sergei Gannochenko

Business-oriented fullstack engineer, in ❤️ with Tech.
Golang, React, TypeScript, Docker, AWS, Jamstack.
15+ years in dev.