Why system design matters

System design questions dominate most senior engineering interviews. Unlike algorithm questions, which usually have a correct answer, system design is about trade-offs, and knowing which trade-offs exist requires a solid foundation of core concepts. This list isn't exhaustive, but these are the 20 concepts that come up most consistently, both in interviews and in real architecture decisions.

1. Scalability — Horizontal vs Vertical

Vertical scaling means upgrading the machine (more CPU, more RAM). Horizontal scaling means adding more machines. Vertical has a ceiling and a single point of failure. Horizontal scales theoretically without limit but introduces distributed systems complexity — you now have to think about how multiple instances coordinate, share state, and handle failures. Most modern systems scale horizontally.

2. Load Balancing

A load balancer distributes traffic across multiple servers. Common algorithms: round-robin (cycle through servers in order), least connections (send to the server with the fewest active connections), and IP hash (same client always hits the same server — useful for session stickiness). Load balancers operate at Layer 4 (TCP) or Layer 7 (HTTP) — Layer 7 balancers can route based on URL path, headers, and cookies.
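The three algorithms above can be sketched in a few lines each. This is a minimal in-process illustration, not how a production balancer is built; the class names (RoundRobin, LeastConnections, IPHash) and the pick/release methods are hypothetical names chosen for the example.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through the servers in order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when a connection to the server closes.
        self.active[server] -= 1


class IPHash:
    """Map a client IP to a fixed server so the same client keeps returning to it."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

Note that the IPHash sketch uses a plain modulo, so changing the number of servers remaps most clients; that is the problem consistent hashing addresses.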

3. Caching

Caching stores expensive computation or database results in fast memory so subsequent requests don't repeat the work. Cache-aside (application checks cache first, fetches from DB on miss and populates cache), write-through (write to cache and DB simultaneously), and write-back (write to cache, sync to DB asynchronously) are the main strategies. Redis and Memcached are the standard implementations. Cache invalidation — knowing when to expire stale data — is the hard part.
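As a rough sketch of the cache-aside strategy, assuming an in-memory dict stands in for Redis or Memcached and that load_from_db is a hypothetical function supplied by the application:

```python
import time


class CacheAside:
    """Check the cache first; on a miss, load from the DB and populate the cache."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical DB loader: key -> value
        self._ttl = ttl_seconds        # how long an entry counts as fresh
        self._clock = clock            # injectable so expiry can be tested
        self._store = {}               # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]            # cache hit
        value = self._load(key)        # cache miss or expired entry
        self._store[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        # Call after writing to the DB so the next read refetches.
        self._store.pop(key, None)
```

The TTL and the explicit invalidate call are the two simplest answers to the invalidation problem: the first bounds how stale a value can get, the second removes it as soon as the application knows it changed.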

4. CDN — Content Delivery Network

A CDN is a geographically distributed network of servers that caches static assets (images, CSS, JS) close to users. Instead of a request from Sydney hitting your server in Virginia, it hits a CDN edge node in Sydney, and latency drops dramatically. CloudFront (AWS), Cloudflare, and Fastly are common choices. Important: CDNs help most with static or cacheable content; uncacheable dynamic API responses still have to come from your origin.

5. CAP Theorem

In a distributed system, you can only guarantee two of three properties: Consistency (every read returns the most recent write), Availability (every request gets a response), and Partition tolerance (the system works despite network partitions). Since network partitions are unavoidable in distributed systems, you're really choosing between CP (consistent but may be unavailable during a partition) and AP (always responds but might return stale data). Cassandra is typically run as AP (its consistency level is tunable per query); HBase is CP.

6. Database Sharding

Sharding splits a database horizontally: different rows go to different database servers (shards) based on a shard key. A user table sharded by user_id range might put IDs 1 to 1,000,000 on shard 1 and 1,000,001 to 2,000,000 on shard 2. Benefits: each shard handles a fraction of the total load. Problems: cross-shard queries are expensive, rebalancing shards when one gets hot is painful, and joins across shards are often impossible. Shard key choice is critical: a bad key creates hotspots.
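Range-based routing like this usually comes down to a sorted list of boundaries and a binary search. A minimal sketch, assuming three shards of one million IDs each; the shard names and the shard_for function are hypothetical:

```python
import bisect

# Inclusive upper bound of each shard's user_id range (assumed layout).
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARD_NAMES = ["shard-1", "shard-2", "shard-3"]


def shard_for(user_id: int) -> str:
    """Return the name of the shard that owns this user_id."""
    if user_id < 1:
        raise ValueError("user_id must be positive")
    # Index of the first upper bound that is >= user_id.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if index == len(SHARD_UPPER_BOUNDS):
        raise ValueError(f"no shard configured for user_id {user_id}")
    return SHARD_NAMES[index]
```

Range sharding keeps neighbouring IDs together, which helps range scans but means that newly created users all land on the last shard. Hashing the shard key spreads writes more evenly at the cost of efficient range queries.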

7. Database Replication

Replication keeps copies of your database on multiple servers. The most common pattern: one primary accepts writes, multiple replicas receive changes asynchronously and serve reads. This improves read throughput and gives you a failover target if the primary goes down. Lag between primary and replica is the trade-off — reads from replicas might be slightly stale. For financial transactions, always read from primary.

8. Consistent Hashing

Consistent hashing is used in distributed caches and load balancers to assign requests to servers in a way that minimises redistribution when servers are added or removed. With plain modulo hashing (hash(key) % n), adding a server changes n and therefore almost every assignment. Consistent hashing maps both keys and servers onto a ring, and each key goes to the nearest server clockwise on the ring. Adding a server only reassigns the keys that fall between it and its predecessor on the ring.
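The ring can be sketched with a sorted list of positions and a binary search. This is a simplified illustration: the HashRing class is a hypothetical name, each server gets a single point on the ring, and the virtual nodes that real implementations usually add to even out load are left out.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers=()):
        self._points = []   # sorted ring positions of the servers
        self._owner = {}    # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        position = _point(server)
        bisect.insort(self._points, position)
        self._owner[position] = server

    def remove(self, server):
        position = _point(server)
        self._points.remove(position)
        del self._owner[position]

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring has no servers")
        # First server at or after the key's position, wrapping past the end.
        index = bisect.bisect_left(self._points, _point(key))
        return self._owner[self._points[index % len(self._points)]]
```

One way to check the property is to record where a set of keys maps, add a server, and compare: every key whose owner changed should now belong to the new server, and all other keys should be untouched.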

9. SQL vs NoSQL

SQL databases (PostgreSQL, MySQL) are relational, schema-enforced, ACID-compliant, and excellent for complex queries and strong consistency. NoSQL databases trade some of these properties for scale and flexibility. Document stores (MongoDB) store JSON-like documents: flexible schema, good for hierarchical data. Key-value stores (Redis, DynamoDB) are extremely fast for simple lookups. Wide-column stores, also called column-family stores (Cassandra, HBase, Bigtable), optimise for write-heavy workloads and time-series data at massive scale. Pick based on access pattern, not hype.

10. Indexing

Indexes make queries fast by providing a pre-sorted lookup structure instead of scanning the full table. A B-tree index on user_id turns an O(n) scan into an O(log n) lookup. A composite index on (user_id, created_at) accelerates queries that filter on user_id alone or on both columns, but not queries that filter only on created_at, because the index is sorted by its leftmost column first. Too many indexes slow down writes because every write must also update every index. Index the columns in your WHERE and JOIN clauses, not everything.
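The column-order rule for composite indexes can be observed with SQLite's query planner, which ships with Python. A small sketch, assuming a hypothetical events table and index name; other databases have their own EXPLAIN output and may plan differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, created_at INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_user_created ON events (user_id, created_at)")


def plan(sql: str) -> str:
    """Return SQLite's query plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Filters on the leading column, so the composite index applies.
uses_prefix = plan(
    "SELECT payload FROM events WHERE user_id = 42 AND created_at > 1000"
)

# Skips the leading column, so the planner falls back to scanning the table.
skips_prefix = plan("SELECT payload FROM events WHERE created_at > 1000")
```

Printing uses_prefix and skips_prefix should show the index named in the first plan and a table scan in the second.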

11. API Gateway

An API gateway is the single entry point for clients into your microservices. It handles cross-cutting concerns: authentication and authorisation (so individual services don't each implement auth), rate limiting, request routing, SSL termination, logging, and sometimes request/response transformation. AWS API Gateway, Kong, and nginx (in proxy mode) are common implementations.

12. Message Queues

A message queue decouples producers from consumers. Instead of Service A calling Service B synchronously and waiting for a response, A drops a message on a queue and B processes it when ready. If B is slow or down, messages queue up and are processed once B recovers. This improves reliability (no lost requests during outages) and throughput (producers don't block on consumer speed). Kafka is the standard for high-throughput event streaming; RabbitMQ and SQS for task queues.
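The decoupling can be illustrated inside a single process with the standard library's thread-safe queue. This is only a sketch of the idea: a real broker such as Kafka, RabbitMQ, or SQS adds durability and delivery across machines, and the message names here are made up.

```python
import queue
import threading

tasks = queue.Queue()
processed = []


def consumer():
    """Process messages one at a time until the None sentinel arrives."""
    while True:
        message = tasks.get()
        if message is None:
            tasks.task_done()
            break
        processed.append(message.upper())  # stand-in for real work
        tasks.task_done()


# The producer enqueues while the consumer is still "down" (not started yet).
for message in ["order-1", "order-2", "order-3"]:
    tasks.put(message)

# Once the consumer comes up, it drains the backlog in order.
worker = threading.Thread(target=consumer)
worker.start()
tasks.put(None)   # sentinel: tell the consumer to stop
tasks.join()      # wait until every message has been handled
worker.join()
```

The producer loop never waits on the consumer, which is the point: the queue absorbs the difference in speed and availability between the two sides.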

13. Rate Limiting

Rate limiting prevents abuse and protects services from traffic spikes. Algorithms: token bucket (bucket refills at a fixed rate; each request costs a token — allows bursting up to bucket capacity), sliding window (counts requests in a rolling time window — more precise than fixed windows), and leaky bucket (requests queue and process at a fixed rate — smooths spiky traffic). Rate limits are typically enforced at the API gateway using Redis to store per-user counters.
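A token bucket is small enough to sketch in full. This version keeps state in memory for a single process; the class and parameter names are illustrative, and a shared limiter would hold the same two values (token count and last refill time) per user in a store such as Redis.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock              # injectable so tests can control time
        self._tokens = float(capacity)   # start with a full bucket
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, else False."""
        now = self._clock()
        elapsed = now - self._last
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(
            self.capacity, self._tokens + elapsed * self.refill_per_second
        )
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With capacity 3 and a refill rate of 1 token per second, a client can send three requests at once, is then rejected, and regains one request per second after that.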

14. Microservices vs Monolith

A monolith is a single deployable unit with all functionality bundled together. Easy to develop and test initially, but scaling requires scaling the entire application, and a bug anywhere can take everything down. Microservices split functionality into independent services that communicate over the network. Each service can be deployed, scaled, and updated independently. The cost: distributed systems complexity, network latency between services, and significantly harder debugging. Monolith first is usually the right call until you have clear scaling bottlenecks.

15. Reverse Proxy

A reverse proxy sits in front of backend servers and forwards client requests to them. Benefits: hides internal server addresses, handles SSL termination, serves static files without hitting application servers, and enables caching. nginx is the canonical reverse proxy. The difference from a forward proxy: a forward proxy sits in front of clients and fetches on their behalf (VPN-like); a reverse proxy sits in front of servers.

16. Service Discovery

In a microservices architecture, services need to find each other dynamically: hardcoding IPs doesn't work when containers start and stop constantly. Service discovery maintains a registry of available service instances and their addresses. There are two models: client-side discovery (the calling service queries the registry and picks an instance itself) and server-side discovery (the request goes to a router or load balancer that queries the registry on the caller's behalf). Consul, etcd, and Kubernetes DNS are common implementations.

17. Circuit Breaker

The circuit breaker pattern prevents cascading failures in distributed systems. Without one, if Service A calls Service B and B starts failing, A keeps hammering the failing service, backs up its own request queue, and eventually fails too. The circuit breaker wraps the call to B and tracks the failure rate. When failures exceed a threshold, the circuit opens: calls to B fail immediately without attempting the connection. After a timeout, the circuit goes half-open and lets a test request through; if it succeeds, the circuit closes, and if it fails, the circuit opens again. This prevents one failing service from bringing down the whole system.
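The open, trial, and close cycle can be sketched as a small wrapper. As a simplification, this version counts consecutive failures rather than tracking a failure rate over a window, and the names CircuitBreaker and CircuitOpenError are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and the call is not attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0        # consecutive failures seen so far
        self._opened_at = None    # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: let this one call through as a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            trial_failed = self._opened_at is not None
            if trial_failed or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()   # open, or re-open
            raise
        # Success: reset the count and close the circuit.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example breaker.call(fetch_inventory, order_id) where fetch_inventory is whatever function talks to Service B, and treat CircuitOpenError as a signal to use a fallback instead of waiting on a dead dependency.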

18. Event-Driven Architecture

In event-driven architecture, services communicate by publishing and subscribing to events rather than calling each other directly. Service A publishes an OrderPlaced event. Services B (inventory), C (email), and D (analytics) all subscribe and react independently. The publisher doesn't know who's listening, so the services are fully decoupled. This enables independent scaling, easy addition of new consumers, and natural audit logs. The cost: data is only eventually consistent, and debugging event flows across services is non-trivial.
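The OrderPlaced flow can be mimicked with a tiny in-process event bus. This is a sketch of the publish/subscribe shape only: the EventBus class is hypothetical, handlers run synchronously here, and a real system would put a broker between the services.

```python
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)   # event type -> handlers

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher has no knowledge of who (if anyone) is listening.
        for handler in list(self._subscribers.get(event_type, [])):
            handler(payload)


bus = EventBus()
log = []

# Three independent consumers react to the same event.
bus.subscribe("OrderPlaced", lambda e: log.append(("inventory", e["order_id"])))
bus.subscribe("OrderPlaced", lambda e: log.append(("email", e["order_id"])))
bus.subscribe("OrderPlaced", lambda e: log.append(("analytics", e["order_id"])))

bus.publish("OrderPlaced", {"order_id": 17})
```

Adding a fourth consumer is one more subscribe call; the publishing code does not change.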

19. Containerisation and Orchestration

Containers (Docker) package an application and its dependencies into a portable, isolated unit. They start faster and use fewer resources than VMs. Orchestration (Kubernetes, ECS) handles deploying containers at scale: scheduling them across a cluster, restarting failed containers, scaling based on load, rolling updates, and service discovery. The deployment unit shifts from a server configuration to a container image — infrastructure becomes reproducible and environment-agnostic.

20. Observability — Logs, Metrics, Traces

The three pillars of observability: Logs (timestamped records of discrete events — what happened), Metrics (numerical measurements over time — CPU, request rate, error rate, latency percentiles), and Traces (records of a request's path across multiple services — where time was spent). Without all three, debugging production issues in a distributed system is guesswork. The standard stack: ELK (Elasticsearch, Logstash, Kibana) or CloudWatch for logs; Prometheus + Grafana for metrics; Jaeger or Datadog for distributed tracing.

The pattern across all of these

Almost every system design concept is a response to the same three problems: scale (handle more load), reliability (handle failures gracefully), and latency (respond faster). Understanding which problem each concept solves is more valuable than memorising implementation details. When you're designing a system, start with the constraints — expected load, consistency requirements, budget — then choose the patterns that address those specific constraints.