WLID AWS Demo Site

Burstable CPU - T2 instances

What happened here?

You are looking at three CloudWatch graphs taken from a t2.micro instance. This instance ran a special tool that performs calculations at a chosen percentage of CPU usage and reports the amount of work done to CloudWatch with a granularity of one minute. By varying the workload (CPU usage) of this instance you can see the effect of CPU credits.

The baseline performance value for a t2.micro instance is 10%. That is the average CPU utilization that Amazon expects you to use. Every time you use less, you earn credits; every time you use more, you lose credits. As one credit equals one minute of 100% CPU utilization, a t2.micro can earn at most 60 * 10% = 6 credits per hour if it's fully idle, and it loses a net 60 * (100% - 10%) = 54 credits per hour if it's fully busy. To facilitate quick booting of a T2 instance, it gets half an hour's worth of credits upon (re)start.
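
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the simple model described in this text (net change per hour = (baseline - utilization) * 60); the function names are made up for this example and are not part of any AWS API.

```python
BASELINE = 0.10  # t2.micro baseline utilization, as stated in the text


def net_credits_per_hour(utilization, baseline=BASELINE):
    """Net credit change per hour at a constant CPU utilization (0.0 to 1.0).

    Positive means credits are gained, negative means they are lost.
    """
    return (baseline - utilization) * 60


def minutes_until_empty(balance, utilization, baseline=BASELINE):
    """Minutes a credit balance lasts at a constant utilization.

    Returns None when the balance does not shrink at that utilization.
    """
    rate = net_credits_per_hour(utilization, baseline)
    if rate >= 0:
        return None
    return balance / -rate * 60


if __name__ == "__main__":
    print(net_credits_per_hour(0.0))      # fully idle
    print(net_credits_per_hour(1.0))      # fully busy
    print(minutes_until_empty(18, 1.0))   # how long 18 credits last at 100%
```

Under this model a fully idle instance gains about 6 credits per hour, a fully busy one loses about 54, and 18 credits last roughly 20 minutes at full load.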

In order to demonstrate this, I started a t2.micro instance. The following happened to this instance:

The instance was booted at 7:30 and started with 30 credits (half an hour's worth of CPU utilization).
Between 7:30 and 8:30 the instance was almost 100% idle. Therefore it accumulated 10% * 60 = 6 credits. So at 8:30 the instance had 36 credits.
Between 8:30 and 9:30 the instance was approximately 40% busy, processing about 360 WorkUnits per minute. During this period the system lost (40% - 10%) * 60 = 18 credits. At 9:30 the credit balance was therefore 18.
At 9:30 the workload was increased to 100% CPU utilization. The system now needs (100% - 10%) * 60 = 54 credits per hour to sustain this. However, only 18 credits were available, so the load could be sustained for only about 20 minutes. Between 9:30 and 9:45 the system was therefore able to run at full speed, processing about 900 WorkUnits per minute.
At about 9:45 the credits were close to being exhausted and the system was gradually throttled back by Amazon (over a 15-minute interval) to the nominal 10% CPU baseline. This is done by stopping the CPU for, eventually, 90% of the time, so the system appears to be only 10% busy. The amount of work the CPU could perform in this condition was drastically reduced: from about 900 WorkUnits per minute to about 90 WorkUnits per minute.
The busycpu program was stopped at 10:10, after which the CPU utilization dropped to 0% and the system started accumulating CPU credits again.
The demo was stopped at 10:30 when the instance had accumulated 2 CPU credits.
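
The timeline above can be replayed with a small minute-by-minute simulation. This is a rough sketch under simplifying assumptions that are mine, not Amazon's: the instance is treated as exactly 0%, 40% or 100% busy, and it is capped at the baseline the moment the balance reaches zero, whereas the real throttling was gradual. The phase list and function name are made up for this example.

```python
BASELINE = 0.10         # t2.micro baseline utilization from the text
INITIAL_CREDITS = 30.0  # launch credits from the text

# (label, duration in minutes, requested utilization), taken from the timeline
PHASES = [
    ("07:30-08:30", 60, 0.00),
    ("08:30-09:30", 60, 0.40),
    ("09:30-10:10", 40, 1.00),
    ("10:10-10:30", 20, 0.00),
]


def simulate(phases, balance=INITIAL_CREDITS, baseline=BASELINE):
    """Return the credit balance at the end of each phase."""
    history = []
    for label, minutes, requested in phases:
        for _ in range(minutes):
            # Assumption: with no credits left, usage is capped at the baseline.
            used = requested if balance > 0 else min(requested, baseline)
            # Each minute earns `baseline` credits and spends `used` credits.
            balance = max(0.0, balance + baseline - used)
        history.append((label, round(balance, 2)))
    return history


if __name__ == "__main__":
    for label, credits in simulate(PHASES):
        print(label, credits)
```

In this model the balance ends the four phases at 36, 18, 0 and 2 credits, which matches the values described above. The balance reaches zero about 20 minutes into the 100% phase, close to the point where the graphs show the throttling taking effect.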

Some further notes:

Although CloudWatch can handle a granularity of one minute, EC2 metrics like CPU Utilization and CPU Credits are only reported every five minutes. That's why some of the events mentioned above do not seem to start at the same time.
The tool used to generate the CPU load was written specifically for this purpose. It repeatedly calculates the value of Pi to 500 digits. One such calculation takes about 0.065 s and is called, for the purpose of this demo, a "WorkUnit". Every minute the tool prints the number of calculations done and reports this to CloudWatch. By inserting a specific amount of idle time after each calculation, I can control the average amount of CPU time used for the whole process.
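
The idle-time trick can be sketched as follows. This is not the original busycpu tool, just a minimal illustration of the idea: if one calculation takes t seconds and the target utilization is u, sleeping for t * (1 - u) / u after each calculation gives an average utilization of about u. The names idle_time, run_for and work_unit are hypothetical, and the CloudWatch reporting is left out.

```python
import time


def idle_time(work_seconds, target_utilization):
    """Seconds to sleep after a calculation that took work_seconds."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return work_seconds * (1 - target_utilization) / target_utilization


def run_for(duration_seconds, target_utilization, work_unit):
    """Call work_unit repeatedly for duration_seconds at the target utilization.

    Returns the number of completed calls (the "WorkUnits" done).
    """
    done = 0
    deadline = time.monotonic() + duration_seconds
    while time.monotonic() < deadline:
        started = time.perf_counter()
        work_unit()  # stand-in for the Pi calculation
        elapsed = time.perf_counter() - started
        done += 1
        time.sleep(idle_time(elapsed, target_utilization))
    return done


if __name__ == "__main__":
    # Placeholder workload; the real tool computes Pi to 500 digits.
    count = run_for(1.0, 0.4, lambda: sum(range(100_000)))
    print("WorkUnits in one second:", count)
```

With 0.065 s per WorkUnit and a 40% target, this formula gives about 0.0975 s of idle time per calculation, or roughly 369 WorkUnits per minute, which is in line with the approximately 360 seen in the demo.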