Load Balancing
Mar 26, 2026
What is Load Balancing?
A load balancer distributes incoming requests across multiple servers to prevent any single server from being overwhelmed.
┌─ Server A
Client → Load Balancer ─┼─ Server B
└─ Server CWithout load balancing, all traffic hits one server — adding more servers doesn't help.
Load Balancing Algorithms
Round Robin
Distribute requests in order: A → B → C → A → B → C…
Simple and even, but doesn't account for server load or request complexity.
Least Connections
Route to the server with the fewest active connections.
Better for requests with variable processing time (some requests take 1ms, others 500ms).
IP Hashing
Hash the client's IP address to always route them to the same server.
Useful for session affinity — the same user always hits the same server.
Weighted Round Robin
Assign weights to servers. A server with weight 3 gets 3x more requests than one with weight 1.
Useful when servers have different capacities.
Layer 4 vs Layer 7 Load Balancing
| L4 (Transport) | L7 (Application) | |
|---|---|---|
| Works at | TCP/UDP level | HTTP level |
| Routing based on | IP + port | URL, headers, cookies |
| Speed | Faster | Slower |
| Smart routing | No | Yes |
| Example | AWS NLB | AWS ALB, Nginx |
L4 : blindly forwards packets — fast but dumb.
L7 : reads the request — can route /api/* to one cluster and /static/* to another, or route based on Authorization headers.
Health Checks
Load balancers continuously ping servers to check they're alive. If a server fails the health check, it's removed from the pool.
Load Balancer → GET /health → Server A (200 OK) ✓
→ GET /health → Server B (timeout) ✗ removed
→ GET /health → Server C (200 OK) ✓Sticky Sessions
Some apps store session state on the server. If a user hits a different server each time, they lose their session.
Sticky sessions (session affinity) ensure the same user always routes to the same server.
Better solution: store sessions externally in Redis so any server can handle any user.
High Availability with Load Balancers
The load balancer itself can be a single point of failure. Solutions:
- Active-passive : one load balancer is active, another is standby
- Active-active : multiple load balancers share traffic (DNS round-robin)
- Managed services : AWS ALB, GCP Load Balancer handle HA for you
Key Questions
Q: What is a load balancer and why do we need one?
A load balancer distributes incoming traffic across multiple servers to prevent overload, improve availability, and enable horizontal scaling. Without it, adding more servers provides no benefit — all traffic still hits one machine. It also improves fault tolerance: if one server goes down, the load balancer stops routing to it.
Q: What is the difference between L4 and L7 load balancing?
L4 load balancing operates at the transport layer — it forwards TCP/UDP packets based on IP and port without inspecting the content. L7 load balancing operates at the application layer — it can inspect HTTP headers, URLs, and cookies to make smarter routing decisions, like routing API traffic to one cluster and static assets to another.
Q: What are sticky sessions and when are they problematic?
Sticky sessions route a user to the same server on every request, typically using IP hashing or a session cookie. They're needed when session state is stored locally on the server. They're problematic because they reduce load balancing effectiveness — if one server is slow, its "sticky" users are stuck waiting. The better solution is to store session state externally (e.g., Redis).
Q: What is the difference between round-robin and least-connections load balancing?
Round-robin distributes requests evenly in order, regardless of server load. Least-connections routes to the server with the fewest active connections, which is more adaptive when request processing times vary significantly. Use round-robin for uniform workloads; use least-connections when some requests are much heavier than others.