Burst Throttling on AWS API Gateway Explained
By Pete Freitag
One nice feature of AWS API Gateway is that you can throttle requests using two settings: the Burst (requests) and the Rate (requests per second). The first time I looked at those settings it was not really clear to me how the Burst (requests) throttling works. So here is an explanation of what the Burst and the Rate are, and how they work together.
What is the Burst?
The Burst limit is quite simply the maximum number of concurrent requests that API Gateway will serve at any given moment. So it is effectively your maximum concurrency for the API.
What is the API Gateway Rate (requests per second)?
The Rate is a little easier to understand: it is the maximum number of requests that can occur within one second.
How do the Rate and Burst Throttle work together?
The Burst setting and the Rate setting work together to control how many requests can be processed by your API. Let's assume you set the throttle to Rate = 100 (requests per second) and Burst = 50 (requests). With those settings, if 100 concurrent requests are sent at the exact same millisecond, only 50 would be processed due to the burst setting; the remaining 50 requests would get a 429 Too Many Requests response. Assuming the first 50 requests completed in 100ms each, your client could then retry the remaining 50 requests.
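AWS does not publish the exact throttling algorithm, but the behavior above is commonly described as a token bucket, where the Burst is the bucket size and the Rate is how fast the bucket refills. The sketch below is a rough simulation of that model (my own code, not AWS's) for the Rate = 100, Burst = 50 example; the class and function names are just illustrative.

```python
import time


class TokenBucket:
    """Rough token-bucket model of API Gateway throttling (an assumption, not AWS code)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate            # tokens (requests) added back per second = Rate setting
        self.capacity = burst       # maximum tokens in the bucket = Burst setting
        self.tokens = float(burst)  # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if a request may proceed, False if it would get a 429."""
        now = time.monotonic()
        # Refill tokens for the elapsed time, but never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=100, burst=50)

# 100 requests arriving at effectively the same instant:
results = [bucket.allow() for _ in range(100)]
print(results.count(True), "accepted,", results.count(False), "throttled (429)")
# -> roughly "50 accepted, 50 throttled (429)"

# After ~0.5 seconds the bucket has refilled (100 tokens/sec * 0.5s = 50 tokens),
# so the client could retry the 50 throttled requests successfully.
time.sleep(0.5)
retries = [bucket.allow() for _ in range(50)]
print(retries.count(True), "retries accepted")
```

This also shows why a Burst lower than the Rate is not a contradiction: the Rate caps sustained throughput over a second, while the Burst caps how many requests can land at once before the refill catches up.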
What does this mean for API Gateways that invoke Lambda?
AWS Lambda Functions have a default maximum concurrency of 1000 (you can request to have this increased if you need to), but the default burst limit on AWS API Gateway is much higher than that, so if you are using API Gateway with Lambda you will want to make sure you set a Burst throttle value that makes sense for your Lambda concurrency level.
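If you wanted to set the stage-wide default throttling so it stays below your Lambda concurrency, one way is with the SDK. Here is a minimal boto3 sketch for a REST API; the restApiId and stageName values are placeholders, and the Burst and Rate values are just example numbers you would tune to your own Lambda limits.

```python
import boto3

apigateway = boto3.client("apigateway")

apigateway.update_stage(
    restApiId="a1b2c3d4e5",   # placeholder: your REST API id
    stageName="prod",         # placeholder: your stage name
    patchOperations=[
        # Stage-wide default throttling for all resources and methods (/*/*).
        # Keep the burst below your Lambda concurrency limit (default 1000).
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": "500"},
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": "1000"},
    ],
)
```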
Burst Throttling on AWS API Gateway Explained was first published on December 07, 2018.
Comments
"The Rate is a little easier to understand, it is the maximum number of requests that can occurs within one second."
^This sentence seems a bit confusing to me. My understanding is that if the burst is 200 and the rate is 100, there can be 200 requests occurring within the first second if the service had not been taking any request for sometime.