
The ultimate interview preparation framework. Part 4: System design


Last updated: August 17, 2024


Articles in this series

👉 The ultimate interview preparation framework. Part 1: Before it all starts

👉 The ultimate interview preparation framework. Part 2: Initial screening

👉 The ultimate interview preparation framework. Part 3: Live coding (aka Data Structures and Algorithms)

🧐 The ultimate interview preparation framework. Part 4: System design

👉 The ultimate interview preparation framework. Part 5: Cultural fit

👉 The ultimate interview preparation framework. Part 6: Accepting the offer

***

In the previous article, we uncovered ways to pass the DSA interview. This part of the Ultimate interview preparation framework demystifies system design interviews.

# What to expect

So far in my career, I've encountered three types of interviews that are called "System design":

The candidate is given only a task that requires designing a distributed system, and then they must infer the constraints and negotiate a possible solution. This is the classic "FAANG style" system design, such as "Design Twitter", "Design Google Drive", "Design a Payment System" and so on. This kind of interview can be learned and done by the book.
The candidate is given a snippet of code and asked to improve it as much as possible. Typically, "as much as possible" means a fully-fledged microservice.
The candidate is given a database structure and asked to fix it. They must fix obvious issues like missing primary and foreign keys, denormalisation without an apparent reason, and so on.

Don't come unarmed: scout the battleground. It is always a good idea to call your recruiter and ask them for pointers on what to expect. Recruiters are sincerely interested in your success, so they cooperate willingly.

# How to behave

An interviewer typically says the interview is open-ended, so it can potentially go on and on. However, this is a bit deceptive: the time is limited, and there is almost always a checklist of things they expect to hear and cover. Failing to cover them will most likely lead to rejection. That's why:

Always know how much time you have. Normally it is one hour, sometimes 1.5 hours. Don't lose track of time; plan your interview.
Don't spend too much time on the introduction round. In most cases, this is just a ritual that doesn't earn you any points.
Always know what the expectations are. Is it to build a distributed system? Or just to fix the code? It is better to discuss this with the recruiter beforehand.
Don't spend time on chit-chat and idle talk trying to show off your personality. A good personality won't get you the offer. Be polite and positive, but focus on the task.
Always listen to the interviewer, as they may steer the discussion in the desired direction if you drift off course.
Drive the interview process, as this is always expected.

# The FAANG script

Once again, the candidate is expected to drive the interview process. Imagine this is a real job task: you are asked to design an application and present it to your colleagues.

For better results, there is a certain script to follow. I call it "The FAANG script", because this is how interviews go at Facebook, for instance.

# 1. Read the task and ask questions

Imagine you receive a task from your project manager, and it says: "Build me a system X". The very first and most reasonable reaction from your side would be "Where are the requirements?" or "Should we outline the requirements together?". It is exactly the same with this kind of interview. The first thing to do is to ask clarifying questions to define some acceptance criteria.

If nothing comes to mind, start with the standard questions, such as:

What are the most important system requirements (features)? - a broadly scoped question that can give you many useful insights.
If the system to be designed is well-known (such as Google Drive or Twitter) and enormous, it makes sense to ask "What features are important?" instead, to narrow down the scope a little.
How many DAU (daily active users) should the system handle? - a good question for the back of the envelope estimation (see below). It is also quite generic and is asked more out of politeness, because they will most likely answer "ten thousand" or so.
How long must the data be stored?
What is the average size of one data item? - another good question for the back of the envelope estimation.
...

Jumping straight to drawing blocks and endpoints while skipping the questions can lead to failure. Just as you can't skip the discovery phase when working on a real feature, you can't proceed with the design if you poorly understand what must be designed. If you skip this step, it's a clear sign that in your current position you are mostly a doer, not a researcher.

# 2. Make a list of must-have features and requirements

Write down a list of basic features, unless one is given. Without a well-scoped list of mandatory features it's not possible to proceed, because it would be hard to outline the limits and the scope of the solution itself.

# 3. Propose a high-level architecture, get the buy-in

Here we can talk about tiers (web tier, VPC tier) and the very high-level parts of the app, such as the backend (BE) app and the dashboard. Also talk about how these two communicate.

There is no need to care about performance at this stage, and no need to introduce caching or multi-threaded processing. What must be defined are the main components of the system and the way they interact with each other.

At this point don't even try talking about specific technologies or frameworks, it's too early for that!

# 4. Dive into details, layer by layer, and address all the aspects

Outline the contract between the parts. Dive into every component layer by layer, but not too deep. Ask whether a deeper explanation is needed.

A bonus would be talking about security, observability, performance, etc.

If time allows, you can talk about specific technologies and frameworks that could be used to implement the system.

Here is an example of a system design, similar to one that once helped me land an offer.

# AI assisted interview

At this stage, limited AI assistance may be allowed, but only with explicit permission. The company is interested in assessing your take on and your approach to solving a real-world problem, so they will also observe which tools you choose to use to complete the task.

# About the technical part

From the technical point of view, there is a set of theoretical system design concepts that are typically assumed rather than discussed in depth, but the candidate is expected to understand them well.

This section is heavily inspired by the amazing book System Design Interview – An insider's guide, Vol. I, which I read recently. I highly recommend it.

# CAP theorem

In every distributed system there are three parameters:

Consistency - every read returns the most recent write, i.e. each part of the system has the most recent data.
Availability - the system still responds to every request, even if some part of it is down.
Network Partition tolerance - the system must tolerate temporary or permanent disruptions of connectivity between its parts.

The CAP theorem states that a distributed system can guarantee at most two of these three properties at the same time, so it's always a trade-off. Since network partition tolerance is a must for every distributed system, in practice the choice boils down to whether the system favours consistency or availability when a partition happens:

CP systems - consistency-first. Such systems rely on many blocking, synchronous operations to stay consistent. I don't think a candidate will be asked to build one of those. A good example of such a system is a banking/financial application.
AP systems - can be inconsistent, but have high availability. This is 99% of all web applications.

# Eventual consistency

When building an AP system, one rule must be set: the system is allowed to be inconsistent, but only for a limited period of time (the shorter, the better). Eventually the system must become consistent again, until the next change happens. This is called eventual consistency. A good example of an eventually consistent system is two microservices, A and B, talking to each other via Kafka. Microservice A receives an update and publishes it for microservice B, but the message has not been consumed yet and remains in the queue. At this moment, the system is inconsistent. But give it time, and it will become consistent, eventually.
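To make the Kafka example a bit more tangible, here is a minimal TypeScript simulation. It is not real Kafka: a hypothetical InMemoryQueue class stands in for the topic, and two maps stand in for the data owned by microservices A and B. The only point is to show the window between publishing and consuming, during which the two copies disagree.

```typescript
// Hypothetical message shape; a real system would define it in its event contract.
type PriceUpdate = { productId: string; price: number };

// In-memory stand-in for a Kafka topic: messages wait here until consumed.
class InMemoryQueue<T> {
  private messages: T[] = [];

  publish(message: T): void {
    this.messages.push(message);
  }

  // Hands every pending message to the handler, oldest first.
  drain(handler: (message: T) => void): void {
    while (this.messages.length > 0) {
      handler(this.messages.shift() as T);
    }
  }

  pendingCount(): number {
    return this.messages.length;
  }
}

// Each microservice owns its own copy of the data.
const serviceA = new Map<string, number>();
const serviceB = new Map<string, number>();
const topic = new InMemoryQueue<PriceUpdate>();

// Microservice A applies the update locally and publishes it for B.
function updatePriceInA(update: PriceUpdate): void {
  serviceA.set(update.productId, update.price);
  topic.publish(update);
}

// Microservice B's consumer applies updates whenever it gets to them.
function consumeInB(): void {
  topic.drain((update) => {
    serviceB.set(update.productId, update.price);
  });
}

function isConsistent(productId: string): boolean {
  return serviceA.get(productId) === serviceB.get(productId);
}

updatePriceInA({ productId: "book-1", price: 42 });
console.log(isConsistent("book-1")); // false: the message is still in the queue
consumeInB();
console.log(isConsistent("book-1")); // true: the system has converged
```

In a real deployment the consumer runs continuously, so the inconsistency window is usually short, but it is never zero.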

# Vertical Scaling vs Horizontal Scaling

When the load on one of the system's elements grows and it starts choking, there are two ways to deal with it.

Vertical scaling - giving an instance more CPU power and memory to cope with the increased load.
- 👍 Rather cheap to do, especially in the cloud.
- 👎 There is always a limit to how far the resources of a single instance can be increased.
Horizontal scaling - add more copies of the element, and spread the load between them.
- 👍 In theory, thanks to consistent hashing, horizontal scaling can grow the system to any number of copies / partitions.
- 👎 Additional infrastructure is needed to manage traffic between the copies / partitions.

When the requirement is that the system must handle high load and hundreds of thousands of requests per second, it usually means the system must scale horizontally, and do so effectively.

In the amazing Node.js Design Patterns book I saw a good diagram that unfolds the horizontal scaling concept. At the origin (0,0,0) of a 3-dimensional coordinate system stands a monolith running as a single instance, connected to a single database instance. From there, you can scale horizontally in 3 orthogonal directions:

X-axis: add more instances of the application itself, to balance the load between them.
Y-axis: separate the responsibilities and domains by splitting the application into parts (microservices and CQRS).
Z-axis: shard the database, so each instance serves a subset of the data.

# Trading space for time and the other way around

It ain't possible to cheat the laws of physics. The system you build always makes a trade-off between execution time and memory consumption. An algorithm can be fast but memory-hungry, the other way around, or somewhere in between.

This is why every algorithm typically has two measures of efficiency: time complexity and space complexity. And that's exactly why you shouldn't claim that Bubble Sort sucks and Quicksort doesn't. It simply depends on the circumstances a specific algorithm is used under: Bubble Sort has O(1) space complexity (it sorts in place, while Quicksort needs O(log n) stack space on average), so it suits microcontrollers with as little as 2 KB of RAM, where low performance is generally expected anyway.
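A short TypeScript version of Bubble Sort shows where the O(1) space figure comes from: the array is sorted in place, and apart from the input only a handful of scalar variables are used, at the cost of O(n²) comparisons in the worst case.

```typescript
// In-place Bubble Sort: O(n^2) time in the worst case, O(1) extra space.
function bubbleSort(values: number[]): number[] {
  for (let end = values.length - 1; end > 0; end--) {
    let swapped = false;
    for (let i = 0; i < end; i++) {
      if (values[i] > values[i + 1]) {
        // Swap neighbours without allocating anything new.
        const tmp = values[i];
        values[i] = values[i + 1];
        values[i + 1] = tmp;
        swapped = true;
      }
    }
    // A full pass without swaps means the rest is already sorted.
    if (!swapped) break;
  }
  return values; // the same array instance, now sorted
}

const data = [5, 1, 4, 2, 8];
bubbleSort(data);
console.log(data.join(", ")); // 1, 2, 4, 5, 8
```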

# Sync and async actions

There are two types of processes that usually take place inside any system.

Synchronous - an action is performed right away, and the user or another element of the system that triggered the action must wait. Examples:
- a user gets a list of books via REST to see it in the UI,
- one service calls another via gRPC to get a list of products.
Asynchronous - an action is triggered and then enqueued, thus postponed. The user or element moves on with their business and gets notified when the action is completed. Examples:
- generating image previews after upload,
- generating CSV files with exported data and uploading them to a bucket.

Before committing to either one, the following question must be asked and answered: "Is the result needed now (in real time) or later?".

For the system to scale horizontally and successfully handle a high load, all heavy processes in the system must be switched over to async mode.
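Here is a minimal TypeScript sketch of the difference, assuming an in-process array as the queue (in production it would be a message broker or a job queue service) and a hypothetical PreviewJob type. The synchronous call returns the data directly; the asynchronous one only enqueues the heavy work and hands back a job ID, and a worker processes the job later and notifies the caller.

```typescript
// Hypothetical job type for generating image previews after an upload.
type PreviewJob = { jobId: number; imageId: string };

const jobQueue: PreviewJob[] = [];
const completed = new Set<number>();
let nextJobId = 1;

// Synchronous: the caller waits for the result (here, a cheap in-memory lookup).
function listBooks(): string[] {
  return ["Dune", "Solaris"];
}

// Asynchronous: enqueue the heavy work and return a job ID immediately.
function requestPreview(imageId: string): number {
  const job: PreviewJob = { jobId: nextJobId++, imageId };
  jobQueue.push(job);
  return job.jobId;
}

// A background worker picks jobs up later and notifies the caller when each is done.
function runWorker(notify: (jobId: number) => void): void {
  while (jobQueue.length > 0) {
    const job = jobQueue.shift() as PreviewJob;
    // ...the expensive preview generation for job.imageId would happen here...
    completed.add(job.jobId);
    notify(job.jobId);
  }
}

console.log(listBooks()); // the caller had to wait for this result
const previewId = requestPreview("img-123");
console.log(completed.has(previewId)); // false: the request returned before the work was done
runWorker((jobId) => console.log(`preview job ${jobId} is ready`));
console.log(completed.has(previewId)); // true
```

Because the queue decouples the request handlers from the workers, more workers can be added independently when the load grows.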

# In-memory vs disk operations

Every system that stores the data must decide where to store it. There are two options:

in-memory storage (e.g. Redis)
on-disk storage (file system, relational databases)

As usual, there is no right or wrong approach; it all depends on the system requirements and the type of data the system deals with. Large files can't be kept in memory (at least not entirely), while frequently read data shouldn't be read from disk every time due to the high access latency.

Rule of thumb: in-memory is faster, disk operations are slower.

# Stateful vs stateless API

The term stateless means that at any point in time, any instance of the application can successfully serve any request from any user in the same way. Simply put, user sessions aren't stored in the memory of a particular server instance.

However, being stateful isn't necessarily a bad thing. WebSockets are stateful, because a persistent connection between a client and a server is kept open, and thus every user is "bound" to a specific instance of the system.

Streaming can also be both stateless and stateful, depending on the concrete task.

If you want your system to effectively scale horizontally, you generally want your API stateless.
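A minimal TypeScript sketch of a stateless API, under the assumption that sessions live in a shared store (Redis would be a typical choice; here a hypothetical in-memory class stands in for it). Because no instance keeps user state in its own memory, a follow-up request can land on any instance.

```typescript
// Shared session storage; in production this could be e.g. Redis.
interface SessionStore {
  get(sessionId: string): string | undefined;
  set(sessionId: string, userId: string): void;
}

class InMemorySessionStore implements SessionStore {
  private sessions = new Map<string, string>();

  get(sessionId: string): string | undefined {
    return this.sessions.get(sessionId);
  }

  set(sessionId: string, userId: string): void {
    this.sessions.set(sessionId, userId);
  }
}

// An app instance keeps no per-user state of its own:
// everything it needs comes from the request and the shared store.
class AppInstance {
  readonly name: string;
  private readonly store: SessionStore;

  constructor(name: string, store: SessionStore) {
    this.name = name;
    this.store = store;
  }

  login(sessionId: string, userId: string): void {
    this.store.set(sessionId, userId);
  }

  whoAmI(sessionId: string): string {
    const userId = this.store.get(sessionId);
    return userId ? `${this.name} serves ${userId}` : `${this.name}: unauthorized`;
  }
}

const store = new InMemorySessionStore();
const instanceA = new AppInstance("instance-a", store);
const instanceB = new AppInstance("instance-b", store);

instanceA.login("session-1", "alice");
// The load balancer may route the next request anywhere; instance B can still serve it.
console.log(instanceB.whoAmI("session-1")); // instance-b serves alice
```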

# Write-optimised vs read-optimised systems

It's always a good idea to understand whether the data in your system will be read more frequently or written more frequently, because this opens up room for optimisations. Most web applications are read-heavy and therefore read-optimised, but there are exceptions, such as metric and log aggregators, which deal with a constant influx of new data and must be write-optimised.

# Consistent hashing

Consistent hashing is a technique that keeps the whole idea of horizontal scaling afloat. Imagine we have N instances that process user requests. Thanks to consistent hashing, we can take any incoming ID and map it to one of the instances. When the same ID comes next time, it is mapped to the same instance once again. Furthermore, new or replacement instances can be added to the pool and old ones removed, and only a small fraction of IDs (roughly 1/N) gets remapped, whereas a naive hash(id) mod N would remap almost all of them. The system can also keep track of unhealthy instances and promptly redirect their requests.

A good application of consistent hashing is database partitioning. Since every record has an ID, it can be unambiguously mapped to a certain instance, now and later.
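Below is a minimal TypeScript sketch of a hash ring with virtual nodes. The hash function (FNV-1a followed by a MurmurHash3-style finalizer) and the default of 100 virtual nodes per instance are my own illustrative choices, not requirements of the technique; real systems pick their own hash functions and tuning, and health checks are left out here.

```typescript
// FNV-1a followed by a MurmurHash3-style finalizer; fast and dependency-free, not cryptographic.
// The finalizer spreads similar keys such as "user-1" and "user-2" across the ring.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit position on the ring
}

class HashRing {
  private ring: { point: number; node: string }[] = [];
  private readonly virtualNodes: number;

  constructor(virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
  }

  // Each node is placed on the ring many times ("virtual nodes") to even out the load.
  addNode(node: string): void {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ point: hash32(`${node}#${v}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  removeNode(node: string): void {
    this.ring = this.ring.filter((entry) => entry.node !== node);
  }

  // Binary-search the first point clockwise from the key's hash, wrapping around to the start.
  getNode(key: string): string {
    if (this.ring.length === 0) throw new Error("ring is empty");
    const h = hash32(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].node;
  }
}

const ring = new HashRing();
["db-1", "db-2", "db-3"].forEach((node) => ring.addNode(node));

const keys = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const before = keys.map((key) => ring.getNode(key));

ring.addNode("db-4");
const moved = keys.filter((key, i) => ring.getNode(key) !== before[i]);
console.log(`${moved.length} of ${keys.length} keys moved`);
```

Going from 3 to 4 instances, only the keys that now fall onto the new instance's points move (about a quarter of them), and all of them move to db-4. With a naive hash(id) mod N, a key keeps its instance only if hash mod 3 equals hash mod 4, so about three quarters of the keys would move.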

More information is here

In order for a database, message broker or in-memory store to scale horizontally, it should support partitioning.

# Application tiers

When designing the application, we must clearly understand the request flow and which data can be trusted and which can't. Also, most of the time the data is sensitive and must be protected by authentication and authorization. When the data is subject to change, we must also record who introduced the changes.

Typically, there are these tiers:

Public tier (or web tier) - untrusted tier, must be handled with caution and ideally should be protected with authentication,
VPC tier - trusted, because this is where all communication between microservices happens,
VPN tier - half-trusted, I would say. With a VPN, we know that the tier is not entirely public, but we still need to be careful with the data.
Database tier - here the data itself is trusted, but the authentication should still happen to protect the data from unauthorized access.

# Observability

We shouldn't neglect observability. Running an unobserved service is like flying a plane without any cockpit instruments: all you know is that the engine makes noise and the earth is below, not above, which means the plane is still in the air and hasn't crashed (yet). But, obviously, this is not enough.

Observability (aka O11y) provides the cockpit instruments for your application.

Here is a list of standard metrics that are typically of interest:

Requests per second (RPS), per endpoint and total
Request duration, per endpoint
P99 latency (the 99th percentile of request duration), per endpoint - shows the slowest endpoints
CPU & Memory consumption
Daily active users (DAU)

In order to observe how effectively your system scales, you need to have observability instrumentation in place.
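To make the P99 metric less abstract, here is a TypeScript sketch that computes it with the nearest-rank method over raw samples. In practice, metrics systems such as Prometheus approximate percentiles from histograms instead of keeping every sample; the per-endpoint map and the numbers below are purely illustrative.

```typescript
// Nearest-rank percentile: sort the samples, take the value at rank ceil(p / 100 * n).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Request durations in milliseconds, grouped per endpoint (illustrative in-memory collector).
const durations = new Map<string, number[]>();

function recordRequest(endpoint: string, ms: number): void {
  const samples = durations.get(endpoint) ?? [];
  samples.push(ms);
  durations.set(endpoint, samples);
}

// 98 fast requests and 2 slow ones: the average (37.6 ms) looks fine, the P99 does not.
for (let i = 0; i < 98; i++) recordRequest("GET /books", 20);
recordRequest("GET /books", 900);
recordRequest("GET /books", 900);

const bookSamples = durations.get("GET /books") ?? [];
console.log(percentile(bookSamples, 50)); // 20
console.log(percentile(bookSamples, 99)); // 900
```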

# The contract

One of the most important things is to describe how the microservice communicates with the user and with other parts of the cloud native application.

There are several transports that are good to know:

REST (OpenAPI standard)
GraphQL
RPC (gRPC, ...)
Streaming (Websockets, Server-Sent Events, WebRTC, ...)
Event-based (Kafka, Google Pub/Sub, RabbitMQ, ...)

# Microservice architecture

Knowing the concept of microservice architecture and the related design patterns is compulsory. Be ready to talk about it. Also be ready to explain when microservices are not needed (small projects, PoCs, systems that can't tolerate eventual consistency, etc.)

Here is a good website dedicated to the microservice architecture.

The microservice architecture is a design pattern where an application is broken down into independent, smaller services.

Concepts:

Single responsibility
Autonomy
Data isolation
Failure isolation

Advantages:

Scalability (each service can be scaled independently) and distribution
Faster deployment cycle
Technology agnostic

Disadvantages:

Higher maintenance cost (observability, orchestration, tracing, contracts, security)
Data consistency (eventual consistency, distributed transactions)
Network latency and communication overhead

# Data propagation patterns

There are two main patterns for data propagation that are used to keep the system consistent:

2 phase commit - a coordinator is used to coordinate the actions between the parties. In the first (prepare) phase, the coordinator asks all parties to prepare the transaction and waits for their votes. If all parties vote "yes", the coordinator sends a final message to all parties to commit the transaction. If any party votes "no" (or doesn't respond), the coordinator tells all parties to roll back. This provides strong consistency, but is blocking. Good for financial systems.
Saga - either a coordinator (orchestration) or events (choreography) are used to coordinate the actions between the parties. Each party commits its own local change. If some party fails to make its change, the others learn about it and execute compensating (undo) actions on their end. This provides eventual consistency, but is non-blocking. Good for order processing.
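The two flows can be contrasted in a simplified, synchronous TypeScript sketch. It leaves out everything that makes these protocols hard in practice (timeouts, coordinator crashes, durable logs, retries), and the Participant and SagaStep interfaces are hypothetical. The saga shown is the orchestrated variant, where one coordinator runs the steps and triggers the compensations.

```typescript
// --- Two-phase commit (no timeouts, durable logs or coordinator recovery) ---
interface Participant {
  name: string;
  prepare(): boolean; // phase 1: vote "yes" (true) or "no" (false)
  commit(): void;
  rollback(): void;
}

function twoPhaseCommit(participants: Participant[]): boolean {
  // Phase 1: everyone prepares and votes; nothing is committed yet.
  const votes = participants.map((p) => p.prepare());
  // Phase 2: commit only if all votes are "yes", otherwise roll everyone back.
  if (votes.every((vote) => vote)) {
    participants.forEach((p) => p.commit());
    return true;
  }
  participants.forEach((p) => p.rollback());
  return false;
}

// --- Orchestrated saga: local commits plus compensating actions ---
interface SagaStep {
  name: string;
  execute(): boolean; // local transaction; true on success
  compensate(): void; // undo for a step that already succeeded
}

function runSaga(steps: SagaStep[]): boolean {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    if (!step.execute()) {
      // Undo the already completed steps in reverse order.
      for (let i = completed.length - 1; i >= 0; i--) completed[i].compensate();
      return false;
    }
    completed.push(step);
  }
  return true;
}

// Hypothetical order-processing saga in which the payment step fails.
const log: string[] = [];
function makeStep(name: string, succeeds: boolean): SagaStep {
  return {
    name,
    execute: () => {
      log.push(`execute ${name}`);
      return succeeds;
    },
    compensate: () => {
      log.push(`compensate ${name}`);
    },
  };
}

runSaga([makeStep("reserve-stock", true), makeStep("charge-payment", false), makeStep("ship-order", true)]);
console.log(log.join(" -> "));
// execute reserve-stock -> execute charge-payment -> compensate reserve-stock
```

Note the difference in blocking: in 2PC nobody commits until every vote is in, while in the saga reserve-stock was already committed and had to be undone afterwards.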

# Architecture patterns

Some of the most common architecture patterns:

API Gateway - a single entry point for all clients to access backend services.
Backend for Frontend - provides a custom API layer tailored for a specific frontend.
Service Mesh - an infrastructure layer (typically sidecar proxies) that handles service-to-service communication: routing requests to the appropriate service, retries, mTLS and telemetry.
Peer-to-peer communication - services call each other directly, without an intermediary.
Event-driven communication - services communicate by publishing and consuming events.
Load balancing - distributes requests between multiple instances of the application.
Caching - keeps frequently read data in fast storage to reduce latency and the load on the underlying database.
Circuit breaker/Retry - handles failures in a distributed system: retries transient errors and stops calling a failing dependency for a while.
Private database - each service owns its data, which is accessible only to the service itself.
Sidecar - adds functionality to an existing service without modifying it, by running a helper process alongside it.
Command Query Responsibility Segregation (CQRS) - separates the read and write operations of a system, to let them scale independently.
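Circuit breaker/Retry is easiest to grasp in code, so here is a minimal TypeScript sketch. The three states (closed, open, half-open) follow the common description of the pattern; the thresholds and the injectable clock are illustrative assumptions, and a production retry would also wait with exponential backoff and jitter between attempts, which is omitted here.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// After failureThreshold consecutive failures the breaker opens and fails fast;
// after resetTimeoutMs it lets one trial call through (half-open).
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  call<T>(action: () => T): T {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        // Fail fast instead of hammering a dependency that is already struggling.
        throw new Error("circuit open");
      }
      this.state = "half-open"; // allow a trial call
    }
    try {
      const result = action();
      this.state = "closed"; // success closes the circuit again
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Retry: call the action up to `attempts` times, rethrowing the last error (no backoff here).
function retry<T>(action: () => T, attempts: number): T {
  let lastError: unknown = new Error("no attempts made");
  for (let i = 0; i < attempts; i++) {
    try {
      return action();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Hypothetical usage: retries absorb short blips, the breaker stops calls during an outage.
const paymentsBreaker = new CircuitBreaker(5, 30000);
const charged = paymentsBreaker.call(() => retry(() => 40 + 2, 3));
console.log(charged); // 42
```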

Here is a list of all patterns.

Be ready to talk about the anti-patterns as well. Here is a good set of slides about the whole topic.

# Key system components

Every system consists of a set of smaller components that can be considered "building blocks". When building a cloud native application, you typically want to re-use these blocks.

# Load balancing

A load balancer is a special kind of software that distributes requests between multiple instances of the application. In simple cases the round-robin algorithm is used, but there are variations, such as least connections or weighted round robin. The balancer is typically a built-in feature of the cloud platform or K8s, but it must still be clearly shown on a C4 diagram.
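Round robin is simple enough to sketch in a few lines of TypeScript. This version assumes a fixed, healthy list of instances (the addresses are made up); a real balancer would also run health checks and drop unhealthy instances from the rotation.

```typescript
// Round-robin balancer: hands out instances in a fixed rotation.
class RoundRobinBalancer {
  private index = 0;
  private readonly instances: string[];

  constructor(instances: string[]) {
    if (instances.length === 0) throw new Error("need at least one instance");
    this.instances = [...instances];
  }

  next(): string {
    const instance = this.instances[this.index];
    this.index = (this.index + 1) % this.instances.length;
    return instance;
  }
}

// Hypothetical instance addresses.
const balancer = new RoundRobinBalancer(["app-1:8080", "app-2:8080", "app-3:8080"]);
for (let i = 0; i < 4; i++) {
  console.log(balancer.next()); // app-1, app-2, app-3, then app-1 again
}
```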

# Databases and storage

When it comes to storing structured data, databases come into play. There are several main families of databases/storage:

SQL
- row-based (MySQL/MariaDB/Percona, Postgres, AWS Aurora/RDS, etc.)
- column-based (ClickHouse, GCP BigQuery, etc.)
NoSQL
- document-based (MongoDB, AWS DynamoDB, etc.)
- key-value-based (Redis, AWS DynamoDB, etc.)
- graph-based (AWS Neptune, etc.)
- time-series-based (InfluxDB, Grafana Mimir, etc.)
- search-based (Elasticsearch, etc.)
Storage
- object-based (AWS S3, etc.)
- remote filesystem (AWS EFS, etc.)

Like everything mentioned above, every type of database comes with its own trade-offs. The choice ultimately boils down to answering certain questions, including, but not limited to:

Are flexible reports needed? If yes, proceed with SQL.
Is your data mostly flat, or can each record be treated as a self-contained unit? If yes, proceed with document-based NoSQL.
Is the data rather simple, and is low read latency crucial? If yes, proceed with in-memory NoSQL.
Is the data graph-like? If yes, it is actually tricky, because both SQL and NoSQL (including graph databases) can handle tree- and graph-structured data.
Will the amount of data grow indefinitely? If yes, then either go with NoSQL and partitioning, or with SQL and periodic dumps to cold storage.

Keep in mind that SQL databases typically don't scale well horizontally due to the nature of JOINs. Some optimisations can be made though, such as a master node with read replicas. With asynchronous replication, the database becomes an eventually consistent AP system.

Partitioning (sharding) is also possible, but it comes at a price: JOINs and transactions across partitions become hard.

On the other hand, NoSQL allows easier partitioning due to the flat nature of the data, but JOINs are usually not available natively.

There are also hassle-free managed cloud SQL data warehouses, such as GCP BigQuery, but they ain't cheap.

Sometimes a combination of databases is used. For example, a service manages data using MongoDB, but every N hours the data is dumped to BigQuery to power intricate analytics down the road.

# Caching

A cache is needed when the data managed by a service is read more frequently than it is written. In that case, it's good practice to put a cache in front of the long-term storage, such as a database.

Two things to keep in mind:

Cache invalidation strategy
- Keep the data for too long, and you'll end up with stale data.
- Keep the data for too short a time, and you'll face frequent cache misses.
Maximum cache size and an eviction algorithm

Good techniques to know are the LRU and LFU eviction algorithms and tagged caching (invalidating a whole group of entries by a shared tag).
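Here is a minimal TypeScript sketch that covers both concerns above: a TTL as the invalidation strategy and LRU as the eviction algorithm. It relies on the fact that a JavaScript Map iterates in insertion order, so re-inserting a key on every read keeps the least recently used key first. The sizes, the TTL and the injectable clock are illustrative assumptions.

```typescript
// LRU cache with a per-entry TTL. A JavaScript Map iterates in insertion order,
// so re-inserting a key on each access keeps the least recently used key first.
class LruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined; // cache miss
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // stale: invalidate and treat as a miss
      return undefined;
    }
    this.entries.delete(key); // move to the "most recently used" end
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry: the first key in iteration order.
      const oldestKey = this.entries.keys().next().value;
      if (oldestKey !== undefined) this.entries.delete(oldestKey);
    }
  }
}

// Hypothetical usage: cache up to 1000 book titles for 60 seconds each.
const bookCache = new LruCache<string>(1000, 60000);
bookCache.set("book-1", "Dune");
console.log(bookCache.get("book-1")); // Dune
```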

# CronJobs

CronJobs are periodic tasks used to keep the system up to date. The frequency of execution depends on the specific task.

# Security

Security is crucial between untrusted parties, such as the VPC and the web tier. Be ready to talk about:

IAM
- API keys
- Login & Password
- OAuth2 - authorization framework
  - grant types:
    - authorization_code - the user allows an application, via a UI, to access a resource they own
    - client_credentials - used for machine to machine communication
    - refresh_token - used to refresh the access token
- OpenID Connect - an authentication layer on top of OAuth2
- Role-based access control
- Attribute-based access control
Digital signatures
- JWT - JSON Web Token
- JWKS - JSON Web Key Set, a set of public keys (usually exposed via an endpoint) used to verify token signatures elsewhere
- JWE - JSON Web Encryption, allows encrypting the token
- Signing algorithms (HS256, RS256, ES256)
TLS
- Handshake
- Asymmetric cryptography (RSA, Diffie-Hellman key exchange)
- Symmetric encryption (AES)
mTLS and zero trust architecture and when it may be needed (banking, military, government, healthcare, etc.)
VPN
Vault for secrets management
Cookies

A bonus would be to mention ways to mitigate DDoS attacks and prevent resource bottlenecks by introducing:

API rate limiter - when a client sends too many requests, then after a certain threshold is reached, further requests are denied until a cool-down period passes. There are two main algorithms:
- leaky bucket - an incoming request adds a drop into a bucket. The request is denied when the bucket is full, but it constantly leaks at the bottom, creating more space.
- token bucket - new tokens are added to the bucket at a fixed rate, and each request takes one token. The request is denied when the bucket is empty.
Consumer throttling for event-based communication - when messages start arriving in large numbers, you typically want to make short pauses between acknowledgments; otherwise the service's CPU and the database CPU will be overwhelmed.
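Both rate limiting algorithms fit in a short TypeScript sketch. The leaky bucket below is the "meter" variant that matches the description above (a request that would overflow the bucket is denied); the queue-based variant, which delays requests instead of rejecting them, is not shown. Capacities, rates and the injectable clock are illustrative assumptions.

```typescript
// Token bucket: tokens are added at a fixed rate up to `capacity`; each request spends one.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full, so short bursts are allowed
    this.lastRefill = now();
  }

  tryAcquire(): boolean {
    const t = this.now();
    const refill = ((t - this.lastRefill) / 1000) * this.refillPerSecond;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.lastRefill = t;
    if (this.tokens < 1) return false; // bucket is empty: deny
    this.tokens -= 1;
    return true;
  }
}

// Leaky bucket ("meter" variant): each request adds a drop, the bucket leaks at a fixed rate,
// and a request is denied if its drop would overflow the bucket.
class LeakyBucket {
  private level = 0;
  private lastLeak: number;
  private readonly capacity: number;
  private readonly leakPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, leakPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.leakPerSecond = leakPerSecond;
    this.now = now;
    this.lastLeak = now();
  }

  tryAdd(): boolean {
    const t = this.now();
    const leaked = ((t - this.lastLeak) / 1000) * this.leakPerSecond;
    this.level = Math.max(0, this.level - leaked);
    this.lastLeak = t;
    if (this.level + 1 > this.capacity) return false; // bucket is full: deny
    this.level += 1;
    return true;
  }
}

// Hypothetical limits: bursts of up to 10 requests, then 5 requests per second.
const limiter = new TokenBucket(10, 5);
console.log(limiter.tryAcquire()); // true: the bucket starts full
```

When tryAcquire returns false, an HTTP API would typically respond with 429 Too Many Requests.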

# Algorithms & data structures

We should be aware of the most commonly used data structures and the algorithms that operate on them. Implementing any of these is not typically required, as it's not a DSA interview, but you must understand the basic principles of each and when to use which.

For system design, it's crucial to understand the job queue algorithm and the advantages it has over the traditional synchronous approach to data processing.
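There are many job queue variants; the TypeScript sketch below shows one simple shape, assuming a FIFO queue, a single worker loop, a fixed number of attempts, and a dead-letter list for jobs that keep failing. Compared to doing the work inside the request, the producer returns immediately, failed work can be retried without the user noticing, and more workers can be added when the queue grows.

```typescript
type Job = { id: number; payload: string; attempts: number };

class JobQueue {
  private pending: Job[] = [];
  readonly deadLetter: Job[] = [];
  private nextId = 1;
  private readonly maxAttempts: number;

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
  }

  // Producers only enqueue and return immediately; nobody waits for the work itself.
  enqueue(payload: string): number {
    const id = this.nextId++;
    this.pending.push({ id, payload, attempts: 0 });
    return id;
  }

  // A worker takes jobs one by one; a failed job goes to the back of the queue
  // until it runs out of attempts, then it is parked in the dead-letter list.
  work(handler: (payload: string) => void): number {
    let succeeded = 0;
    while (this.pending.length > 0) {
      const job = this.pending.shift() as Job;
      job.attempts++;
      try {
        handler(job.payload);
        succeeded++;
      } catch {
        if (job.attempts < this.maxAttempts) this.pending.push(job);
        else this.deadLetter.push(job);
      }
    }
    return succeeded;
  }
}

// Hypothetical payload format.
const queue = new JobQueue(3);
const exportId = queue.enqueue("export-csv:report-42");
console.log(`enqueued job ${exportId}`); // the API can respond right here
queue.work((payload) => console.log(`processing ${payload}`)); // later, in a worker
```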

# Back of the envelope estimation

Basically, back of the envelope estimation is a technique for making a very, very rough estimate of the average amount of resources the system will probably consume.

This must be well understood, as it could be asked during the interview.

Typically, what attracts interest is the following:

Queries per second (QPS) and peak QPS
Storage size for N years
Bandwidth per second

To get these values, we need to know some inputs:

Average active users (AAU) per day (or daily active users (DAU))
Percentage of requests that save something
Average data size

Then we can easily make the calculations. Consider the example:

Let's say, that

1. AAU per day = 500 // this many users come and do something on the platform
2. Data write requests = 50% // half of the users post a message
3. Average message size = 300 kB

Then

1. Requests per hour = 500 / 24 ≈ 21 // assuming one request per user per day
2. Requests per minute = 21 / 60 ≈ 0.35
3. QPS = 0.35 / 60 ≈ 0.006
4. Peak QPS = 2 * 0.006 = 0.012 // peak assumed to be twice the average
5. Requests that post a message = 500 * 0.5 = 250
6. Message volume per day = 250 * 300 kB = 75,000 kB ≈ 75 MB
7. Bandwidth of new messages per second = 75,000 kB / (24 * 60 * 60) ≈ 0.87 kB/s
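The same calculation can be kept in a small TypeScript function, which makes it easy to replay the numbers with different inputs. The one request per user per day and the 2x peak factor mirror the assumptions of the example above; the 5-year retention is a hypothetical value added to cover the "storage size for N years" item. Since the function does not round intermediate steps, the QPS comes out as about 0.0058 instead of 0.006.

```typescript
// Inputs for a back-of-the-envelope estimate.
interface EstimateInput {
  dailyActiveUsers: number;
  requestsPerUserPerDay: number; // the example above implies 1
  writeRatio: number; // share of requests that store something
  avgItemKB: number;
  peakFactor: number; // peak QPS as a multiple of the average
  retentionYears: number;
}

const SECONDS_PER_DAY = 24 * 60 * 60;

function estimate(input: EstimateInput) {
  const requestsPerDay = input.dailyActiveUsers * input.requestsPerUserPerDay;
  const qps = requestsPerDay / SECONDS_PER_DAY;
  const writesPerDay = requestsPerDay * input.writeRatio;
  const storageKBPerDay = writesPerDay * input.avgItemKB;
  return {
    qps,
    peakQps: qps * input.peakFactor,
    writesPerDay,
    storageKBPerDay,
    writeBandwidthKBps: storageKBPerDay / SECONDS_PER_DAY,
    storageKBTotal: storageKBPerDay * 365 * input.retentionYears,
  };
}

const result = estimate({
  dailyActiveUsers: 500,
  requestsPerUserPerDay: 1,
  writeRatio: 0.5,
  avgItemKB: 300,
  peakFactor: 2,
  retentionYears: 5, // hypothetical retention period
});

console.log(result.qps.toFixed(4)); // 0.0058
console.log(result.peakQps.toFixed(4)); // 0.0116
console.log(result.writeBandwidthKBps.toFixed(2)); // 0.87 (kB/s)
console.log((result.storageKBTotal / 1e6).toFixed(1)); // 136.9 (GB over 5 years)
```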


# Links

Some useful links to read more about the topics covered in this article:

Microservices.io
Microservices anti-patterns in Melbourne - a true classic never gets old
A pattern language for microservices

***

Well, that was a long post. As before, this article is a work in progress: I will enrich and expand it when I have new experience to share.

Sergei Gannochenko

Business-focused product engineer, in ❤️ with tech and making customers happy.

AI, Golang/Node, React, TypeScript, Docker/K8s, AWS/GCP, NextJS

20+ years in dev