Cloud & DevOps
Load balancing is the process of distributing incoming network traffic across multiple servers or instances to prevent any single server from becoming a bottleneck, improving application availability, throughput, and fault tolerance.
A load balancer sits between clients and a pool of backend servers and forwards each incoming request to one of them according to an algorithm: round-robin cycles through the servers in order, least-connections picks the server with the fewest active connections, and IP hash maps each client IP address to the same server so that sessions stay "sticky".
Beyond distributing traffic, load balancers run periodic health checks against backend instances and automatically remove unhealthy servers from rotation, so traffic reaches only healthy endpoints; this supports zero-downtime rolling deployments and automatic recovery from server failures. Application-layer (Layer 7) load balancers can also route on URL path, HTTP headers, or cookies, for example sending /api requests to a different server pool than /static requests. Cloud providers offer managed load balancers (AWS ALB, GCP Cloud Load Balancing, Azure Application Gateway) that integrate with auto-scaling to absorb variable traffic without manual capacity management.
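The selection strategies, health checks, and path-based routing described above can be illustrated with a small in-process sketch. It is a toy model, not how any particular product (ALB, NGINX, etc.) is implemented: the `Backend` and `LoadBalancer` classes, the `probe` callable standing in for a real HTTP/TCP health check, and the `route_by_path` helper are all hypothetical names chosen for illustration.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    active_connections: int = 0
    healthy: bool = True


class LoadBalancer:
    """Toy balancer: picks a backend from the currently healthy pool."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._rr_counter = itertools.count()  # shared position for round-robin

    def run_health_checks(self, probe):
        # `probe` is a hypothetical callable standing in for a real health check;
        # any backend it reports as failing is taken out of rotation.
        for backend in self.backends:
            backend.healthy = bool(probe(backend))

    def healthy_backends(self):
        pool = [b for b in self.backends if b.healthy]
        if not pool:
            raise RuntimeError("no healthy backends available")
        return pool

    def round_robin(self):
        # Cycle through healthy backends in order.
        pool = self.healthy_backends()
        return pool[next(self._rr_counter) % len(pool)]

    def least_connections(self):
        # Pick the healthy backend with the fewest active connections.
        return min(self.healthy_backends(), key=lambda b: b.active_connections)

    def ip_hash(self, client_ip):
        # Hash the client IP so the same client keeps landing on the same backend
        # (as long as the healthy pool does not change).
        pool = self.healthy_backends()
        digest = hashlib.sha256(client_ip.encode()).digest()
        return pool[int.from_bytes(digest[:8], "big") % len(pool)]


def route_by_path(path, pools):
    """Layer 7 style routing: longest matching path prefix selects the pool."""
    for prefix in sorted(pools, key=len, reverse=True):
        if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
            return pools[prefix]
    raise LookupError(f"no pool configured for {path}")


if __name__ == "__main__":
    lb = LoadBalancer([Backend("web-1"), Backend("web-2"), Backend("web-3")])
    print([lb.round_robin().name for _ in range(4)])  # ['web-1', 'web-2', 'web-3', 'web-1']
    for backend, conns in zip(lb.backends, (5, 2, 9)):
        backend.active_connections = conns
    print(lb.least_connections().name)  # web-2
    lb.run_health_checks(lambda b: b.name != "web-2")  # simulate web-2 failing
    print(lb.least_connections().name)  # web-1
    print(route_by_path("/api/orders", {"/api": "api-pool", "/static": "static-pool"}))  # api-pool
```

One design note: plain modulo hashing, as used in `ip_hash` here, remaps many clients whenever a backend is added or removed; production systems often use consistent hashing or cookie-based affinity to limit that churn.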
Example
During a Diwali sale, an e-commerce site serves 50,000 concurrent users; an AWS Application Load Balancer distributes their requests across 40 EC2 instances, and an auto-scaling policy adds 20 more instances when average CPU utilization exceeds 70%.
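The arithmetic behind this example can be sketched as follows, assuming the load balancer spreads users evenly across instances (real traffic only approximates this). The `scale_out` function is a simplified, hypothetical step-scaling rule, not the AWS Auto Scaling API; the 78% CPU reading and the `max_size` cap of 100 instances are illustrative assumptions, not figures from the example.

```python
def users_per_instance(concurrent_users: int, instances: int) -> float:
    """Average users per instance, assuming perfectly even distribution."""
    return concurrent_users / instances


def scale_out(current: int, avg_cpu_percent: float,
              threshold: float = 70.0, step: int = 20, max_size: int = 100) -> int:
    """Hypothetical step rule: add `step` instances when average CPU exceeds `threshold`.

    `max_size` is an assumed upper bound on the group size.
    """
    if avg_cpu_percent > threshold:
        return min(current + step, max_size)
    return current


if __name__ == "__main__":
    print(users_per_instance(50_000, 40))  # 1250.0
    new_size = scale_out(40, avg_cpu_percent=78.0)  # assumed CPU reading
    print(new_size)  # 60
    print(round(users_per_instance(50_000, new_size)))  # 833
```

Under these assumptions, scaling from 40 to 60 instances cuts the average load from 1,250 to roughly 833 users per instance.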
Related terms
Scalability
Scalability is the ability of a software system to handle increasing workloads — more users, data, transactions, or requests — by adding resources without requiring fundamental redesign of the architecture.
Kubernetes
Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, load balancing, self-healing, and management of containerised applications across clusters of machines.
CDN (Content Delivery Network)
A CDN is a globally distributed network of servers that caches and delivers web content — images, CSS, JavaScript, and HTML — from locations geographically close to each user, reducing latency and improving load times.
Serverless
Serverless is a cloud execution model in which developers deploy individual functions or applications without provisioning or managing servers — the cloud provider automatically allocates compute resources on demand and charges only for actual execution time.