AWS EC2 pricing seems complicated, and I have tried to figure it out many times. Recently I was at it again, this time looking at it from the RDS perspective (RDS uses the same pricing model for the underlying EC2 instances), so here we go.
Amazon/AWS calls their virtual machine service Elastic Compute Cloud (EC2). I used to think about the "elastic" part in terms of being able to scale your infrastructure by adding and removing VMs as needed. But I guess scaling is also relevant in terms of allocating more or less compute on the same VM, or in terms of how many compute tasks a single hardware host in AWS can handle at the same time. Let’s see why…
Basic Terminology
What is compute in EC2? AWS used to use a term called Elastic Compute Unit (ECU). As far as I can tell, this has been largely phased out. Now they measure their VM performance in terms of virtual CPU (vCPU) units.
So what is a vCPU? At the time of writing this, it is defined as "a thread of either an Intel Xeon core or an AMD EPYC core, except for M6g instances, A1 instances, T2 instances, and m3.medium." A1 and M6g use AWS Graviton and Graviton 2 (ARM) processors, which I guess is a different architecture (no hyperthreading?). T2 is not described in as much detail, except as using Intel 3.0 GHz or 3.3 GHz processors (older, no hyperthreading?). Anyway, I go with vCPU meaning a (hyper)thread allocated on an actual host CPU. Usually this would not even be a full core but a hyperthread.
There are different types of vCPUs on the instances, as they use different physical CPUs. But that is a minor detail. The instance types are more relevant here:
- burstable standard,
- burstable unlimited, and
- fixed performance.
OK then, what are they?
Fixed Performance
The fixed performance instance type is the simplest. It is always allocated its vCPUs in full. A fixed performance instance with 2 vCPUs can run those 2 vCPUs (hyperthreads) at up to 100% CPU load at all times, with no extra charge. The price is always fixed. If you don’t need the full 100% CPU power at all times, a burstable instance can be cheaper, but only if you don’t "burst" too much, in which case the burstable type becomes more expensive.
Burstable Standard
The concept of a burstable instance is what I find a bit complex. There is something called the baseline performance. This is what you always get, and is included in the price.
On top of the baseline performance, burstable instances have something called CPU credits. Different instance types get different numbers of credits. Here are a few examples (at the time of writing this):
Instance type | Credits / hour | Max. credits | vCPUs | Memory | Baseline perf. |
---|---|---|---|---|---|
T2.micro | 6 | 144 | 1 | 1GB | 10% |
T2.small | 12 | 288 | 1 | 2GB | 20% |
T2.large | 36 | 864 | 2 | 8GB | 20% * 2 |
T3.micro | 12 | 288 | 2 | 1GB | 10% * 2 |
T3.small | 24 | 576 | 2 | 2GB | 20% * 2 |
T3.large | 36 | 864 | 2 | 8GB | 30% * 2 |
M4.large | – | – | 2 | 8GB | 200% |
Baseline performance
I will use the T2.micro from the above table as an example. The same concepts apply to the other instance types as well, just with different numbers. The T2.micro baseline performance is 10%, and it is allocated a single vCPU, meaning a single hyperthread. The 10% baseline means you get to use 10% of the maximum performance of that hyperthread (vCPU).
CPU credits
Every hour, a T2.micro gets 6 CPU credits. If the instance runs at or below the baseline performance (10% here), it saves these credits for later use, up to a maximum of 144 saved credits for a T2.micro.

These credits are always awarded, but if your application load is such that the instance could use more than the 10% baseline performance, it will spike to that higher load as soon as a CPU credit is available, and consume the credit immediately.

A credit is used up in full if the instance runs at 100%, and only in part if it runs above the baseline but below the maximum 100% performance. If multiple vCPUs are allocated to an instance and they all run above the baseline, they consume correspondingly more CPU credits.
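To make this concrete for myself, here is a rough sketch of the credit accounting in Python, as I understand it: one CPU credit equals one vCPU running at 100% for one minute, and credits accrue at a fixed hourly rate up to a cap. The defaults are the T2.micro numbers from the table above; the model itself is my own simplification, not an official AWS formula.

```python
def simulate_credits(utilization_per_minute, credits_per_hour=6,
                     max_credits=144, vcpus=1, start_credits=0.0):
    """Track the CPU credit balance for a per-vCPU utilization trace
    (one value per minute, 0.0-1.0). A negative balance means a standard
    burstable instance would have been throttled to its baseline by then."""
    earn_per_minute = credits_per_hour / 60.0   # 0.1/min for a T2.micro = its 10% baseline
    balance = start_credits
    history = []
    for util in utilization_per_minute:
        spent = util * vcpus                    # credits burned this minute
        balance = min(max_credits, balance + earn_per_minute - spent)
        history.append(balance)
    return history

# A fully rested T2.micro (144 banked credits) running flat out:
trace = simulate_credits([1.0] * 240, start_credits=144)
exhausted = next(i for i, b in enumerate(trace) if b <= 0) + 1
print(f"credits run out after ~{exhausted} minutes")
# ~160 minutes in this model, a bit more than the plain 144, because
# new credits keep trickling in during the burst.
```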
Well, that is what the sites I linked above say.
But here is an example, where I ran a task on a T2.micro instance after it had been practically idle for more than 24 hours. So it should have had the full 144 CPU credits at that point.

In the above chart, the initial spike around midnight lasts about 144 minutes, although the chart timeline is too coarse to show it exactly. The chart is from an RDS T2.micro instance under heavy write load (I was writing as much as I could, all the time, from another EC2 T2.micro instance). So the 144-minute duration seems consistent with the credit numbers. But the CPU percentage shown is not, since 10% should be the baseline.. uh. It could also be that the EC2 instance loading the data into the RDS instance hits the same CPU credit limit, which would also limit the amount of data it can send for writing. I will have to investigate more later, but the shape illustrates the performance throttling and CPU credit concepts.
Considering the baseline, the T2.micro is practically an instance running at 10% of the single-thread performance of a modern server processor. That does not seem like much. To me, the 1 vCPU definition actually seems rather misleading, as you don’t really get a vCPU but rather 10% of one.
Given 60 minutes in an hour and 6 CPU credits awarded to a T2.micro per hour, you get one credit about every 60/6 = 10 minutes. If you save up and run at a low load for 24 hours (144 * 10 = 1440 minutes = 24 hours), you can then run for 144 minutes (2 hours 24 minutes) at 100% CPU load. In spikes of about 10 minutes, you can run for the equivalent of one minute at 100% load.
T2.micro instances are described as having "High frequency Intel Xeon processors", with an "up to 3.3 GHz Intel Scalable Processor". So at baseline, an EC2 T2.micro instance actually gets 10% of a single hyperthread on a 3.3 GHz processor, roughly equal to a single 330 MHz hyperthread.
The bigger instances can have multiple vCPUs allocated, as shown in the table above. They also get a bit more credits and have a higher baseline performance percentage. The performance percentage is per vCPU, so an instance with 2 vCPUs and a baseline performance of 20% actually has a total baseline performance of 2 * 20%. In this case, you are getting two hyperthreads at 20% of their maximum capacity each.
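To put those percentages into something more tangible, here is the back-of-the-envelope conversion I use in the comparison table further down: treat the baseline as a fraction of one hyperthread's clock cycles. The clock speeds are the ones AWS quotes for each family, and the "effective MHz" framing is my own simplification (a hyperthread is not a full core, and MHz is a crude proxy for performance).

```python
def baseline_mhz(clock_ghz, baseline_pct, vcpus=1):
    """Effective sustained MHz per vCPU at the baseline, and the total
    across all vCPUs."""
    per_vcpu = clock_ghz * 1000 * baseline_pct / 100
    return per_vcpu, per_vcpu * vcpus

print(baseline_mhz(3.3, 10, 1))   # T2.micro: about 330 MHz per vCPU and in total
print(baseline_mhz(3.1, 20, 2))   # T3.small: about 620 MHz per vCPU, 1240 MHz in total
```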
I still have various questions about this type, such as: do you use just a fraction of a CPU credit when going over the baseline, or do you use it in full? Can the different threads (over multiple vCPUs) share the total of 2 * 20% = 40%, or is it 20% per vCPU, with anything above that counting as over the baseline regardless of whether the other thread is idling? But I guess I have to settle for this: burstable is complicated, fixed is simpler to use. Moving on.
Burstable Unlimited
The burstable instances can also be set to unlimited burstable mode.
In this mode, the instance can run (burst) at full performance all the time, not limited by the accumulated CPU credits. You still earn CPU credits as with standard burstable instances, but in contrast to the standard bursting type, if you use more CPU credits than you have, in unlimited mode you are simply billed extra for them. You are not throttled by the available credits; instead you can rack up nice extra bills.
If the average utilization is higher than what the baseline plus the available CPU credits cover, over a rolling 24-hour window (or over the instance lifetime, if it is less than 24 hours), you will be billed for each vCPU-hour used above that threshold (baseline average plus CPU credits).
Each vCPU-hour above that threshold costs $0.05 (5 US cents). Considering the price differences between the instance types, this can get quite expensive. Let's see why.
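As a quick example of what that rule can mean in practice, here is my reading of it in code. The $0.05 per vCPU-hour figure is the one quoted above; the function is my own simplification and ignores any credits banked before the measurement window, so treat the result as an upper bound.

```python
def surplus_cost(avg_utilization_pct, hours=24, vcpus=2,
                 baseline_pct=10, price_per_vcpu_hour=0.05):
    """Extra charge for running `vcpus` at an average of `avg_utilization_pct`
    (percent per vCPU) for `hours`, above a `baseline_pct` baseline."""
    over_pct = max(0.0, avg_utilization_pct - baseline_pct)
    surplus_vcpu_hours = over_pct / 100 * vcpus * hours
    return surplus_vcpu_hours * price_per_vcpu_hour

# Both vCPUs of a small burstable instance (10% baseline) pinned at 100% for a day:
print(f"${surplus_cost(100):.2f} extra for that day")   # $2.16, which can easily dwarf a small instance's base price
```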
Comparing Prices
What do you actually get for the different instances? I used the following as the basis for my calculations:
- T2: 3.0/3.3 GHz Xeon. AWS describes the T2.small and T2.medium as an "Intel Scalable (Xeon) Processor running at 3.3 GHz", and the T2.large at 3.0 GHz. Slightly odd numbers, but I guess there is some legacy there (more cores at a lower clock?).
- T3: 3.1 GHz Xeon. AWS describes this as "1st or 2nd generation Intel Xeon Platinum 8000", with a "sustained all core Turbo CPU clock speed of up to 3.1 GHz". My interpretation of 3.1 GHz might be a bit high, as the description says "Turbo" and "up to", but I don’t have anything better to go with.
- M5: 3.1 GHz Xeon. Described the same as the T3: "1st or 2nd generation Intel Xeon Platinum 8000", and "up to 3.1 GHz".
Instance type | CPU GHz | Baseline perf. | Instance MHz | vCPUs | Memory | Price / h |
---|---|---|---|---|---|---|
T2.micro | 3.3 | 10% | 330 MHz | 1 | 1GB | $0.0116 |
T2.small | 3.3 | 20% | 660 MHz | 1 | 2GB | $0.0230 |
T2.large | 3.0 | 20% * 2 | 600 MHz * 2 | 2 | 8GB | $0.0928 |
T2.large.unl | 3.0 | 200% | 3000 MHz * 2 | 2 | 8GB | $0.1428 |
T3.micro | 3.1 | 10% * 2 | 310 MHz * 2 | 2 | 1GB | $0.0104 |
T3.small | 3.1 | 20% * 2 | 620 MHz * 2 | 2 | 2GB | $0.0208 |
T3.large | 3.1 | 30% * 2 | 930 MHz * 2 | 2 | 8GB | $0.0832 |
T3.large.unl | 3.1 | 200% | 3100 MHz * 2 | 2 | 8GB | $0.1332 |
M5.large | 3.1 | 200% | 3100 MHz * 2 | 2 | 8GB | $0.0960 |
I took the above prices from the AWS EC2 pricing page at the time of writing this. Interestingly, the AWS pricing seems so complicated that they cannot keep track of it themselves. For example, T3 has one price on that page and another on the T3 instance page. The latter lists the T3.micro price at $0.0209 / hour as opposed to the $0.0208 above. Yes, it is a minimal difference, but it just shows how complicated this gets.
The table above represents the worst-case scenario where you run your instance at 100% performance as much as possible. It also does not account for the burstable instances being able to run at up to 100% CPU load for short periods when they have accumulated CPU credits. And with the unlimited burstable types, you can get by with less if you run at or under the baseline. But, as the AWS docs note, the unlimited burstable is about 1.5 times more expensive than the fixed performance instance (T3 vs M5).
Strangely, T2 is more expensive than T3, even though the T3 is more powerful. So other than free tier use, I guess there should be absolutely no reason to ever use T2, unless maybe for some legacy dependency or limited availability.
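To make that comparison a bit more concrete, here is the quick calculation I did for dollars per sustained GHz-hour, i.e. the hourly price divided by the compute you get without burning credits. All numbers are copied from my table above, so they inherit its assumptions about clock speeds and baselines.

```python
instances = {
    # name: (price per hour in USD, sustained MHz across both vCPUs)
    "T2.large":     (0.0928, 2 * 600),
    "T3.large":     (0.0832, 2 * 930),
    "T3.large.unl": (0.1332, 2 * 3100),   # worst case, bursting nonstop
    "M5.large":     (0.0960, 2 * 3100),
}

for name, (price, mhz) in instances.items():
    print(f"{name:12s} ${price / (mhz / 1000):.4f} per sustained GHz-hour")

print(f"M5.large is {0.0960 / 0.0832 - 1:.0%} pricier per hour than T3.large")
```

Per unit of sustained compute, the fixed M5.large comes out by far the cheapest of these, which is essentially the point of the conclusions below.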
Conclusions
I always thought it was so nice of AWS to offer a free tier. How could they afford to give everyone a CPU to play with?
Well, it turns out they don’t. They just give you one tenth of a single thread on a hyperthreaded CPU.
This is what a T2.micro
is in practice.
I guess it can be useful for playing around and getting familiar with AWS, but yeah the marketing is a bit of.. marketing? Cute.
Still, the price difference per hour from a T2.large ($0.0928) or a T3.large ($0.0832) to an M5.large ($0.0960) seems small. Especially the difference between the T2 and the M5 is so small it seems to make no sense. So why go bursty, ever? With the T3 you save about 13% (the M5 is about 15% pricier per hour).
If you have bursty workloads and need to be able to handle large spikes, across a really large fleet of servers, maybe it makes sense. Or if your load is very low, you can get smaller (fractions of a CPU) instances using the bursty mode. But it seems to me that it requires a lot of effort to profile your loads, make predictions, and monitor and manage it all.
In most cases, I would actually expect something like Lambda functions to be the best fit for those kinds of workloads. Scaling according to need, clear pricing (which seems like a miracle in AWS), and a simple operational model. Sounds great to me.
In the end, comparing the burstable vs fixed performance instances, it just seems silly to me to pay almost the same price for such a complicated burstable model with seemingly much worse sustained performance. But like I said, for big shops and big projects, maybe it makes more sense. I would be really interested to hear some concrete, practical experiences and examples of why to use one over the other (especially the bursty instances).