Rate Limiting vs Load Shedding
Ever wonder what the difference between rate-limiting and load-shedding is?
Rate Limiting is all about controlling the rate of traffic sent or received by the network. In its simplest form, it consists of rejecting any request that exceeds a specified count per time unit - think "If we get more than 10 customers per hour, we tell them we're busy".
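A minimal sketch of that "10 customers per hour" rule is a fixed-window counter: reset the count at the start of each window, and reject anything past the limit. The class and parameter names here are illustrative, not from any particular library.

```python
import time

class FixedWindowRateLimiter:
    """Reject any request beyond `limit` per `window_seconds`."""

    def __init__(self, limit, window_seconds=3600):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window_seconds:
            # A new window has started: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # "we're busy"

# "More than 10 customers per hour" -> limit=10, window=3600s
limiter = FixedWindowRateLimiter(limit=10, window_seconds=3600)
results = [limiter.allow() for _ in range(12)]
# First 10 requests are allowed, the last 2 are rejected.
```

Real systems often prefer sliding windows or token buckets to avoid the burst that a fixed window permits right at the boundary between two windows.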
A slight variant of this would be concurrent rate limiting, which defines the maximum number of simultaneous requests that can be in flight at any given moment - think "We can only handle 3 customers at a time".
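The "3 customers at a time" variant maps naturally onto a semaphore: acquire a slot on entry, release it on exit, and reject (rather than queue) when no slot is free. Again, the names here are just for illustration.

```python
import threading

class ConcurrencyLimiter:
    """Allow at most `max_concurrent` in-flight requests."""

    def __init__(self, max_concurrent):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self):
        # Non-blocking: reject immediately instead of queueing.
        return self._sem.acquire(blocking=False)

    def release(self):
        self._sem.release()

# "We can only handle 3 customers at a time."
limiter = ConcurrencyLimiter(max_concurrent=3)
grants = [limiter.try_acquire() for _ in range(4)]
# The first 3 acquisitions succeed; the 4th is rejected.
limiter.release()  # one customer leaves, freeing a slot
```

Note that, unlike the count-per-window limiter, this one has no notion of time at all - capacity frees up only when a request finishes.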
Load Shedding is all about preventing the system from getting overloaded in the first place. The idea here is that it is better to ignore some requests, rather than having the system crater and not be able to serve any requests - think "Let 911 calls through, and ignore the rest". This, of course, implies that decisions around load shedding - i.e., which requests to drop - are made based on the state of the system as a whole.
The process can be coarse-grained (based on the system as a whole), or fine-grained (individual workers can drop requests based on their load), depending on the way the system is structured, the level of instrumentation and observability, etc.
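The "let 911 calls through" idea can be sketched as a priority-aware shedder that consults a system-wide load signal. How that signal is measured (CPU, queue depth, database saturation) depends on the instrumentation available; the threshold and priority tiers below are hypothetical.

```python
# Two illustrative priority tiers: critical traffic ("911 calls")
# and everything else.
CRITICAL, NORMAL = 0, 1

def should_shed(priority, utilization, threshold=0.9):
    """Shed NORMAL traffic once system utilization crosses the
    threshold; CRITICAL requests always get through."""
    if priority == CRITICAL:
        return False
    return utilization >= threshold

# Under heavy load (95% utilization), only critical requests pass;
# under light load (50%), everything passes.
decisions = [
    should_shed(NORMAL, 0.95),    # shed
    should_shed(CRITICAL, 0.95),  # let through
    should_shed(NORMAL, 0.50),    # let through
]
```

The key point the sketch makes concrete: the decision depends on the state of the system (`utilization`), not on properties of the individual caller.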
So, what are the differences between the two?
Well, in common parlance, rate limiting refers to rejecting traffic based on properties of individual requests (too many from a given client/IP/node), and load-shedding refers to rejecting requests based on the overall state of the system (database at capacity, workers slammed…).
In a more technical context, load-shedding is used as a type of rate-limiting. For example, Stripe has implemented four different types of rate-limiting for their APIs, two of which are based on limiting inbound requests (what you'd typically term rate-limiting), and two of which do load-shedding based on system state.
The bottom line: stick with the common parlance, unless you know that the other party understands the distinction…