WLID AWS Demo Site

Automatic starting and stopping of EC2 instances

Amazon EC2 doesn't offer a scheduling service for EC2 instances. So if your EC2 instances only need to run, for example, during business hours, you need to start and stop them manually. You could of course write a script that runs inside an OS somewhere, but a neater solution is to run a serverless (AWS Lambda) script. And instead of supplying a fixed list of EC2 instances that need to start and stop, why not use the EC2 instance tags to define their start/stop times? That way you can set the start/stop parameters while creating the EC2 instance.

The AWS Lambda script below first pulls a list of regions from AWS, and then lists the EC2 instances in each region. For each instance it looks for a tag with the name "AutoStartStop" and uses this tag to determine whether to start or stop that instance.

The AutoStartStop tag has the following format (in regex form): "(\#)?([0-9]{4})?-([0-9]{4})?(T)?". In human-readable form, this means:

The tag starts with an optional hash mark ('#') which "comments out" (deactivates) the tag.
Then comes an optional start time, written as four digits which represent HHMM. All time references are in UTC.
Then comes a mandatory dash to separate start and end times.
Then comes an optional stop time, again written as four digits HHMM.
Then comes an optional T, meaning "terminate on stop": if this indicator is present, the EC2 instance will be terminated instead of stopped. This is useful for throwaway instances that you used for test purposes, for example.

Examples of valid AutoStartStop tags are "#0800-1800", "0800-1800", "0800-" and "-1800T". Any EC2 instance with an invalid tag, or with a tag that contains neither a start nor an end time, will be ignored.
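To make the tag format concrete, here is a minimal sketch of a standalone parser for it. The names parse_tag and Schedule are made up for this illustration and do not appear in the Lambda script; the pattern is the one given above, applied with fullmatch so that anything before or after it makes the tag invalid.

```python
import re
from typing import NamedTuple, Optional

# The pattern from the text. fullmatch() below requires the whole tag to fit it.
TAG_PATTERN = re.compile(r"(#)?([0-9]{4})?-([0-9]{4})?(T)?")


class Schedule(NamedTuple):  # hypothetical helper type, for illustration only
    disabled: bool           # True when the tag starts with '#'
    start: Optional[str]     # HHMM in UTC, or None
    stop: Optional[str]      # HHMM in UTC, or None
    terminate: bool          # True when the tag ends with 'T'


def parse_tag(value: str) -> Optional[Schedule]:
    """Return the parsed schedule, or None when the tag should be ignored."""
    m = TAG_PATTERN.fullmatch(value)
    if m is None:
        return None  # does not follow the format at all
    if m.group(2) is None and m.group(3) is None:
        return None  # neither a start nor a stop time
    return Schedule(m.group(1) == "#", m.group(2), m.group(3), m.group(4) == "T")


for example in ["#0800-1800", "0800-1800", "0800-", "-1800T", "-", "8am-6pm"]:
    print(example, "->", parse_tag(example))
```

The last two examples come back as None: "-" carries no time at all, and "8am-6pm" does not follow the format.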

Make sure the script runs at least every 15 minutes, for example by triggering it through a CloudWatch Events schedule.
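If you prefer to create the schedule from code rather than from the console, the following is a rough sketch using the boto3 "events" client. The function name schedule_lambda, the rule name and the ARN in the usage comment are placeholders invented for this example. The client is passed in as a parameter so the function can be tried out without AWS credentials.

```python
def schedule_lambda(events_client, rule_name, lambda_arn, minutes=15):
    """Create or update a rate-based rule and attach the Lambda function to it.

    events_client is expected to be boto3.client('events').
    Returns the ARN of the rule.
    """
    # Rate expressions use the singular unit for a value of 1
    unit = "minute" if minutes == 1 else "minutes"
    expression = "rate({} {})".format(minutes, unit)

    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=expression,
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]


# Usage with real credentials (placeholder names, not executed here):
#   import boto3
#   schedule_lambda(boto3.client("events"),
#                   "auto-start-stop-every-15-min",
#                   "arn:aws:lambda:eu-west-1:123456789012:function:AutoStartStop")
```

Note that the function must also allow the events service to invoke it. The console adds that permission when you configure the trigger there; when scripting it yourself you would need to add it separately.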

When defining the Lambda script, note the following:

The Lambda script needs an IAM role that allows it to perform EC2 manipulations. The easiest way is to create a role with the AmazonEC2FullAccess policy. Furthermore, the IAM role needs the AWSLambdaBasicExecutionRole policy, and a Trust Relationship so that the trusted entity lambda.amazonaws.com can assume this role. But that's standard Lambda stuff and will be set up automatically.
Due to the number of API calls involved and their latency, the script takes about 15 seconds or more to complete when it autodetects the regions. With a larger number of EC2 instances to start or stop, you may even be looking at several minutes. Make sure that the timeout on the Lambda script is set to allow for this.
For the same reason, the latency of the API calls, you can't really bring the execution time down by increasing the amount of memory. At 128 MB memory the script will need around 15 seconds, while at 1024 MB execution time drops to about 10 seconds. The most economical option therefore is to run it at 128 MB. (And the script really only needs about 64 MB memory, so 128 MB is sufficient.)
You can decrease the run time of the script dramatically by commenting out the autodetection of the regions, and supplying a fixed list of regions. See the commented-out "regions" assignments near the top of the code. With just two regions the script finishes in less than two seconds, even at only 128 MB memory.
An alternative approach would be to provision the script in each and every region where you run EC2 instances, and let each copy of the script handle only the EC2 instances in its own region. This will bring the execution time down even further. It also allows you to remove the outer loop of the script.
Executing this Lambda script every 15 minutes, and assuming a run time of 15 seconds each time, will cost about 12 cents per month according to the Lambda pricing calculator.
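The arithmetic behind such an estimate can be sketched as follows. The two price defaults are assumptions made for this example (a per-GB-second compute rate and a per-million-requests rate), the free tier is ignored, and the function name is made up, so check the current Lambda price list before relying on the result.

```python
def monthly_lambda_cost(interval_minutes, run_seconds, memory_mb,
                        price_per_gb_second=0.0000166667,   # assumed rate
                        price_per_million_requests=0.20,    # assumed rate
                        days_per_month=30):
    """Estimate the monthly cost in USD of a periodically invoked function."""
    invocations = days_per_month * 24 * 60 / interval_minutes
    # Compute is billed on duration multiplied by configured memory
    gb_seconds = invocations * run_seconds * (memory_mb / 1024)
    compute = gb_seconds * price_per_gb_second
    requests = invocations / 1000000 * price_per_million_requests
    return compute + requests


# The two configurations mentioned above: 128 MB at 15 s and 1024 MB at 10 s
print("128 MB:  {:.2f} USD/month".format(monthly_lambda_cost(15, 15, 128)))
print("1024 MB: {:.2f} USD/month".format(monthly_lambda_cost(15, 10, 1024)))
```

With these assumed rates the 128 MB configuration lands in the same ballpark as the calculator figure quoted above, and the 1024 MB configuration comes out several times more expensive despite the shorter run time.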

Here is the Lambda script itself:

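import os
import re
from time import gmtime, strftime

import boto3

theTag = 'AutoStartStop'

def lambda_handler(event, context):
    # Current UTC time as a zero-padded HHMM string, so that it can be
    # compared as a string against the HHMM values in the tag
    now = strftime("%H%M", gmtime())
    print("Now is " + now)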
    
    # Use the line below to automatically detect regions
    regions = boto3.client('ec2').describe_regions()['Regions']

    # Or use the line below for a fixed set of regions - this is considerably faster if you only
    # have EC2 instances in those regions
    #regions = [ { "RegionName": "eu-west-1" }, { "RegionName": "eu-central-1" } ]

    # Or use the line below to only use the current region - you then need to provision this
    # script in every region where you want to auto start/stop EC2 instances. (You can then also
    # remove the outer loop in this script.)
    #regions = [ { "RegionName": os.environ['AWS_DEFAULT_REGION'] } ]

    for region in regions:
        print('Region: ' + region['RegionName'])
        ec2 = boto3.resource('ec2', region_name=region['RegionName'])
        
        for instance in ec2.instances.all():
            state = instance.state['Name']
            if state in ("pending", "rebooting", "stopping", "shutting-down"):
                # Transient states. Will be picked up in the next invocation, if applicable.
                continue

            if state == "terminated":
                # Terminated instances can no longer be started or stopped.
                continue
            
            if instance.tags is None:
                # No tags at all.
                continue
            
            for tag in instance.tags:
                if tag['Key'] == theTag:
                    print(instance.id + " " + tag['Value'] + " " + instance.state['Name'])

                    # Separate the AutoStartStop tag into four groups. The pattern is anchored
                    # so that malformed values are rejected instead of partially matched.
                    m = re.match(r"^(\#)?([0-9]{4})?-([0-9]{4})?(T)?$", tag['Value'])
                    if m is None:
                        print("Failed evaluation for value " + tag['Value'])
                        continue

                    # Detect commented out tags
                    if m.group(1) == '#':
                        print("Commented out. Ignoring.")
                        continue

                    starttime = m.group(2)
                    endtime = m.group(3)
                    terminate = m.group(4)

                    if starttime is None and endtime is None:
                        # No timing information given. Ignore
                        print("No starttime and no endtime given. Ignoring.")
                        continue
                    
                    isActive = instance.state['Name'] == 'running'
                    
                    if starttime is not None and endtime is not None:
                        # Both start and endtime given. We can both start and stop the instance.
                        # Note: this assumes starttime <= endtime. A window that crosses
                        # midnight (e.g. 2200-0600) is never considered active.
                        shouldBeActive = starttime <= now <= endtime
                        if shouldBeActive and not isActive:
                            print("Instance is not active but should be active. Start.")
                            instance.start()
                        if not shouldBeActive and isActive:
                            if terminate == "T":
                                print("Instance is active but should be terminated. Terminate.")
                                instance.terminate()
                            else:
                                print("Instance is active but should be stopped. Stop.")
                                instance.stop()

                    if starttime is None and endtime is not None:
                        # Only an endtime given. We will never start, but only stop the instance.
                        shouldBeActive = now <= endtime
                        if not shouldBeActive and isActive:
                            if terminate == "T":
                                print("Instance is active but should be terminated. Terminate.")
                                instance.terminate()
                            else:
                                print("Instance is active but should be stopped. Stop.")
                                instance.stop()

                    if starttime is not None and endtime is None:
                        # Only a starttime given. We will never stop, but only start the instance.
                        shouldBeActive = starttime <= now
                        if shouldBeActive and not isActive:
                            print("Instance is not active but should be active. Start.")
                            instance.start()
    
    return ''
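
If you want to check the start/stop rules without touching any real instances, the decision logic of the script can be isolated in a small pure function. The sketch below is an illustration only and is not part of the script above: the name decide and its return values are made up, and it assumes the tag has already been split into its parts.

```python
def decide(state, starttime, endtime, terminate, now):
    """Return 'start', 'stop', 'terminate' or None for a single instance.

    starttime, endtime and now are zero-padded HHMM strings in UTC;
    starttime and endtime may be None. terminate is a boolean.
    """
    if state not in ("running", "stopped"):
        return None  # simplification: skip transient and terminated states
    if starttime is None and endtime is None:
        return None  # no timing information given

    is_active = state == "running"
    after_start = starttime is None or starttime <= now
    before_end = endtime is None or now <= endtime
    should_be_active = after_start and before_end

    # Only start when a start time was given, only stop when an end time was given
    if should_be_active and not is_active and starttime is not None:
        return "start"
    if not should_be_active and is_active and endtime is not None:
        return "terminate" if terminate else "stop"
    return None


print(decide("stopped", "0800", "1800", False, "0900"))
print(decide("running", "0800", "1800", True, "1900"))
print(decide("running", "2200", "0600", False, "2300"))
```

The last call shows a limitation the function shares with the script: the times are compared as plain strings within one day, so a window that crosses midnight is never treated as active and a running instance with such a tag gets stopped.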