WLID AWS Demo Site

Alexa Demo

What's this?

Alexa is Amazon's speech-to-text and text-to-speech interpreter, used by the Echo range of devices (Amazon Tap, Echo Dot and others). By speaking to Alexa, you can do basic things like playing music, managing timers and alarms, and controlling your smart home. However, Alexa can also be extended through "skills". Skills are applications that are controlled through Alexa and use a backend, such as AWS Lambda, to implement their logic. This is what you see in the demo: A sample skill, running on AWS Lambda, that implements an aircraft checklist.

How does this work?

A full Alexa solution consists of a number of components.

The first component is an Alexa-enabled device: An Amazon Tap, an Echo Dot or another Alexa-enabled device. This device communicates with Alexa via the internet.

Alexa itself is hosted by Amazon in the cloud. It performs the speech-to-text and text-to-speech translation. It also holds the configuration for all your Alexa-enabled devices. In this context, the most important configuration item is the list of enabled skills: Alexa only matches what you say to your Echo against the skills that are enabled for your device.

A skill is much like a mobile app. Developers build Alexa skills and make them available via the Alexa portal, and users then choose which skills they want to enable on their Echo devices. When you say the name of the skill out loud, Alexa starts a session with that skill. In this particular example, the "Aircraft Checklist" skill is activated by saying "Alexa, Aircraft Checklist".

When developing a skill, you have to supply some meta-information, such as the name of the skill, the "invocation name" and where the backend logic (the Lambda script) is hosted. Most importantly, you have to supply the "interaction model", which defines how the user can interact with the skill through voice commands.

Interaction Model

An interaction model consists of at least two parts. The first part is the list of intents: The things the user can ask the skill to do. Based on the phrases the user utters, Alexa determines the user's intent and forwards it to the backend logic. This demo has only a few intents: The custom "CheckIntent" and the built-in "AMAZON.RepeatIntent", "AMAZON.StartOverIntent" and "AMAZON.CancelIntent", which are the four intents handled by the Lambda script below.

The second part is the list of phrases (the "sample utterances") that the user can say to convey each intent.

For example, when the user says either "check" or "done", Alexa sends the "CheckIntent" intent to the backend logic.

The AMAZON.* intents are built-in intents: Amazon has already mapped a set of standard phrases to them. You can extend that list with your own phrases, in the same way as for a custom intent.
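To make the phrase-to-intent mapping concrete, here is a toy JavaScript sketch of the lookup Alexa effectively performs for this demo. It is an illustration only: Alexa's real speech recognition and intent matching are far more flexible than an exact-string table. The phrases "check" and "done" come from the text above; the phrases listed for the AMAZON.* intents are assumed examples, and `sampleUtterances` and `matchIntent` are hypothetical names.

```javascript
// Toy model of this demo's sample utterances. "check" and "done" come from
// the article; the AMAZON.* phrases are assumed examples of built-in phrasing.
const sampleUtterances = {
  CheckIntent: ["check", "done"],
  "AMAZON.RepeatIntent": ["repeat"],
  "AMAZON.StartOverIntent": ["start over"],
  "AMAZON.CancelIntent": ["cancel"],
};

// Return the name of the intent whose phrase list contains the normalized
// phrase, or null when nothing matches exactly.
function matchIntent(phrase) {
  const normalized = phrase.trim().toLowerCase();
  for (const [intentName, phrases] of Object.entries(sampleUtterances)) {
    if (phrases.includes(normalized)) {
      return intentName;
    }
  }
  return null;
}

console.log(matchIntent("Done"));       // prints CheckIntent
console.log(matchIntent("start over")); // prints AMAZON.StartOverIntent
console.log(matchIntent("next"));       // prints null
```

Only the intent name reaches the Lambda script; which of the phrases the user actually said is not part of the request.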

Variable data

In this example, there is no variable data: The user can only say the exact phrases that are listed in the interaction model. More advanced skills let the user supply variable data. In that case, you also need to define "slots": Named placeholders in a phrase, each of which accepts one value from a defined list. For example, a more advanced implementation of the Aircraft Checklist demo would support multiple checklists. It would then have a "ChecklistType" slot accepting the values "Preflight", "Departure", "Approach" and so forth, plus a "SelectChecklistIntent" that is invoked with the phrase "Start {ChecklistType} Checklist".

Supporting this takes three definitions: A slot type that lists the checklist names, an intent description that declares the "ChecklistType" slot for "SelectChecklistIntent", and the phrases for that intent.
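The example definitions themselves are not reproduced on this page. As a hedged sketch of what they describe, the snippet below writes the hypothetical "SelectChecklistIntent" as a JavaScript object that loosely follows the shape of Amazon's JSON interaction model (the slot type name `CHECKLIST_TYPE` is invented here), and adds a toy matcher, `matchWithSlots`, showing how a phrase such as "start approach checklist" fills the {ChecklistType} slot. Real Alexa matching is far more tolerant than this regular expression.

```javascript
// Hypothetical interaction model fragment for the multi-checklist variant.
// CHECKLIST_TYPE is an invented slot type name.
const languageModel = {
  intents: [
    {
      name: "SelectChecklistIntent",
      slots: [{ name: "ChecklistType", type: "CHECKLIST_TYPE" }],
      samples: ["start {ChecklistType} checklist"],
    },
  ],
  types: [
    {
      name: "CHECKLIST_TYPE",
      values: ["Preflight", "Departure", "Approach"].map((v) => ({ name: { value: v } })),
    },
  ],
};

// Toy matcher: turn each sample into a regular expression with one capture
// group per slot (assumes samples contain no regex metacharacters), accept
// only values listed for the slot's type, and return an object shaped like
// the "intent" element of an IntentRequest.
function matchWithSlots(model, phrase) {
  const text = phrase.trim().toLowerCase();
  for (const intent of model.intents) {
    for (const sample of intent.samples) {
      const slotNames = [];
      const pattern = sample.replace(/\{(\w+)\}/g, (_, slotName) => {
        slotNames.push(slotName);
        return "(.+)";
      });
      const match = new RegExp("^" + pattern + "$").exec(text);
      if (!match) continue;
      const slots = {};
      let valid = true;
      slotNames.forEach((slotName, i) => {
        const spoken = match[i + 1];
        const slotDef = intent.slots.find((s) => s.name === slotName);
        const type = model.types.find((t) => t.name === slotDef.type);
        const known = type.values.some((v) => v.name.value.toLowerCase() === spoken);
        if (!known) valid = false;
        slots[slotName] = { name: slotName, value: spoken };
      });
      if (valid) return { name: intent.name, slots: slots };
    }
  }
  return null;
}

console.log(JSON.stringify(matchWithSlots(languageModel, "Start approach checklist")));
// prints {"name":"SelectChecklistIntent","slots":{"ChecklistType":{"name":"ChecklistType","value":"approach"}}}
```

A phrase with an unlisted value, such as "start landing checklist", returns null in this sketch.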

When "SelectChecklistIntent" is passed to your backend logic, the intent JSON structure also contains a "slots" element that holds, for each slot defined for the intent, the value the user said.
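On the backend, reading a slot is plain property access on `event.request.intent.slots`. The following is a minimal sketch for the hypothetical multi-checklist variant; `checklists` and `selectChecklist` are illustrative names, and only the "departure" items come from the demo script. It assumes a slot may arrive without a "value" when the user did not fill it, so that case is handled too.

```javascript
// Hypothetical checklists for the multi-checklist variant. Only the
// "departure" items come from the demo script; the others are made up.
const checklists = {
  preflight: ["Fuel quantity checked", "Flight controls free"],
  departure: ["Lights on", "Fuel pump on", "Transponder on", "Mixture full rich"],
  approach: ["Landing light on", "Seat belts fastened"],
};

// Read the ChecklistType slot from an IntentRequest's "intent" element.
// The slot may be missing, or present without a "value", so both cases
// fall back to asking which checklist to run.
function selectChecklist(intent) {
  const slot = intent.slots && intent.slots.ChecklistType;
  const key = slot && slot.value ? slot.value.toLowerCase() : null;
  if (!key || !Object.prototype.hasOwnProperty.call(checklists, key)) {
    return { speech: "Which checklist? Preflight, departure or approach?", sessionAttributes: {} };
  }
  const title = key.charAt(0).toUpperCase() + key.slice(1);
  return {
    speech: title + " checklist. " + checklists[key][0],
    // Remember both the checklist and the position for the next request.
    sessionAttributes: { currentChecklist: key, currentChecklistItem: 0 },
  };
}

const result = selectChecklist({
  name: "SelectChecklistIntent",
  slots: { ChecklistType: { name: "ChecklistType", value: "Departure" } },
});
console.log(result.speech); // prints Departure checklist. Lights on
```

Keeping the checklist name in the session attributes lets the later "check" and "repeat" handlers look up the right list.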

Intent/Response structure

The last component of the solution is the AWS Lambda script that handles the application logic. See the complete script below.

The Lambda script is called with two parameters. The "event" parameter contains everything that was known to Alexa: The device that was used, the skill ID, the intent, any variable data that was part of the intent, and so forth. This comes in the form of a complex JSON structure. An example of such a structure is this:

{
  "session": {
    "sessionId": "SessionId.e5810520-e381-4d38-a64b-08231aa52f69",
    "application": {
      "applicationId": "amzn1.ask.skill.24bbe4d0-0213-4053-b26b-ec7cda76f067"
    },
    "attributes": {},
    "user": {
      "userId": "amzn1.ask.account.AFIOT732NPBMCWKZSD475..."
    },
    "new": true
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "EdwRequestId.29d9df46-9d12-4efe-a0e6-ab56d33d898b",
    "locale": "en-US",
    "timestamp": "2016-11-16T13:54:42Z",
    "intent": {
      "name": "AMAZON.StartOverIntent",
      "slots": {}
    }
  },
  "version": "1.0"
}
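Pulling the useful fields out of this structure is plain property access. As a minimal sketch (the function name `parseAlexaEvent` is hypothetical, not part of any SDK), the snippet below extracts the request type, intent name, slots, session flag and attributes, with defaults for parts that may be absent, such as "intent" on a LaunchRequest or "attributes" on a new session.

```javascript
// Pull the fields the skill logic cares about out of an Alexa request,
// supplying defaults for parts that may be absent.
function parseAlexaEvent(event) {
  const request = event.request || {};
  const session = event.session || {};
  const intent = request.intent || null;
  return {
    requestType: request.type,
    intentName: intent ? intent.name : null,
    slots: intent && intent.slots ? intent.slots : {},
    isNewSession: session.new === true,
    attributes: session.attributes || {},
    applicationId: session.application ? session.application.applicationId : null,
  };
}

// Trimmed copy of the example request above.
const sampleEvent = {
  session: {
    new: true,
    attributes: {},
    application: { applicationId: "amzn1.ask.skill.24bbe4d0-0213-4053-b26b-ec7cda76f067" },
  },
  request: {
    type: "IntentRequest",
    intent: { name: "AMAZON.StartOverIntent", slots: {} },
  },
  version: "1.0",
};

const parsed = parseAlexaEvent(sampleEvent);
console.log(parsed.requestType, parsed.intentName); // prints IntentRequest AMAZON.StartOverIntent
```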

The application needs to parse this request and eventually provide a response. Such a response will look like this:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Departure checklist. Lights on"
    },
    "card": {
      "content": "Departure checklist. Lights on",
      "title": "Checklist",
      "type": "Simple"
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": "Lights on"
      }
    },
    "shouldEndSession": false
  },
  "sessionAttributes": {
    "currentChecklistItem": 0
  }
}

In this response, the "outputSpeech" element is what Alexa will read to you. The "reprompt" is spoken if you do not answer within a few seconds while the session is still open. The "card" information is shown in the Alexa app (or web interface) for later reference. It is also where you can confirm whether Alexa heard you correctly, which helps Alexa improve its speech-to-text recognition. "shouldEndSession" should be set to true if this is the last message in the conversation and the skill is finished. The Echo device then stops listening and the skill session ends.
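The demo script below always includes a reprompt, even in responses that end the session. A slightly tidier variant, sketched here as a hypothetical `buildAlexaResponse` helper (not part of any SDK), distinguishes a "keep listening" response from a "finish" response and leaves the reprompt out of the latter, since a reprompt only matters while the session stays open.

```javascript
// Hypothetical helper: build a full Alexa response envelope.
function buildAlexaResponse({ speech, reprompt, endSession, attributes }) {
  const response = {
    outputSpeech: { type: "PlainText", text: speech },
    card: { type: "Simple", title: "Checklist", content: speech },
    shouldEndSession: endSession,
  };
  // Only attach a reprompt when the session stays open.
  if (!endSession && reprompt) {
    response.reprompt = { outputSpeech: { type: "PlainText", text: reprompt } };
  }
  return { version: "1.0", sessionAttributes: attributes || {}, response: response };
}

// Keep listening and carry the checklist position forward.
const ask = buildAlexaResponse({
  speech: "Departure checklist. Lights on",
  reprompt: "Lights on",
  endSession: false,
  attributes: { currentChecklistItem: 0 },
});

// Say the final line and end the session.
const tell = buildAlexaResponse({
  speech: "Departure checklist complete.",
  reprompt: "Departure checklist complete.",
  endSession: true,
});

console.log("reprompt" in ask.response, "reprompt" in tell.response); // prints true false
```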

Once you have analyzed the request and generated the response, you need to call "context.succeed( response )" so that your response is forwarded to Alexa. It is not necessary to "return" any data.
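`context.succeed` belongs to the callback style of the original Node.js Lambda runtime. Newer Node.js runtimes also accept an async handler whose resolved return value becomes the response. A minimal sketch, assuming such a runtime: the stub below always answers with the first checklist item, and it is assigned to a plain constant so the file runs outside Lambda (in Lambda you would export it as `exports.handler`).

```javascript
// Async-handler sketch: the returned object is the response sent to Alexa.
// This stub always answers with the first checklist item.
const handler = async (event) => {
  if (event.request.type === "SessionEndedRequest") {
    return; // no response body is needed for SessionEndedRequest
  }
  const speech = "Departure checklist. Lights on";
  return {
    version: "1.0",
    sessionAttributes: { currentChecklistItem: 0 },
    response: {
      outputSpeech: { type: "PlainText", text: speech },
      shouldEndSession: false,
    },
  };
};

handler({ request: { type: "LaunchRequest" }, session: {} }).then((response) => {
  console.log(response.response.outputSpeech.text); // prints Departure checklist. Lights on
});
```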

Maintaining state

An AWS Lambda script is stateless. This means you cannot rely on, for instance, a global variable to keep data between invocations. However, most applications need to keep some state. This is done by including "sessionAttributes" in your response. Alexa stores these attributes and includes them verbatim, as "session.attributes", in the next request of the same session. You can put multiple keys in your sessionAttributes and thereby keep the whole application state there. Alternatively, you can keep the application state in, for instance, DynamoDB, and only include some sort of session identifier in the sessionAttributes, similar to how web browsers and web servers use cookies. In this particular demo, the only state I need to keep track of is the current checklist item.
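The round trip can be simulated locally. This sketch uses the demo's departure checklist and a hypothetical `checkTurn` function that mirrors the "check" handler: each turn reads the index from the incoming attributes (defaulting to 0) and returns the attributes that Alexa would hand back on the next request. The loop stands in for Alexa passing "sessionAttributes" back as "session.attributes".

```javascript
const checklist = ["Lights on", "Fuel pump on", "Transponder on", "Mixture full rich"];

// Handle one "check" turn: read the index from the incoming session
// attributes (defaulting to 0) and return the speech plus the attributes
// for the next request of this session.
function checkTurn(sessionAttributes) {
  const current = Number.isInteger(sessionAttributes.currentChecklistItem)
    ? sessionAttributes.currentChecklistItem
    : 0;
  const next = current + 1;
  if (next >= checklist.length) {
    return { speech: "Departure checklist complete.", attributes: {}, endSession: true };
  }
  return { speech: checklist[next], attributes: { currentChecklistItem: next }, endSession: false };
}

// Simulate Alexa echoing sessionAttributes back as session.attributes.
let attributes = { currentChecklistItem: 0 }; // set by the welcome response
const spoken = [];
for (;;) {
  const turn = checkTurn(attributes);
  spoken.push(turn.speech);
  if (turn.endSession) break;
  attributes = turn.attributes; // what the next request would carry
}
console.log(spoken.join(" | "));
// prints Fuel pump on | Transponder on | Mixture full rich | Departure checklist complete.
```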

Backend logic

Here is the full Lambda script:

/* AWS Lambda (Node.JS) script that implements the Alexa "Aircraft Checklist" skill */

'use strict';

var checklist = [
    "Lights on",
    "Fuel pump on",
    "Transponder on",
    "Mixture full rich" ];
    
var CARD_TITLE = "Checklist";
    
/* Main routine */
exports.handler = function( event, context ) {
    try {
        console.log("event.session.application.applicationId=" + event.session.application.applicationId);
        
        if (event.session.application.applicationId !== "amzn1.ask.skill.24bbe4d0-0213-4053-b26b-ec7cda76f067") {
            context.fail("Invalid Application ID");
            return; // stop here: do not process requests meant for another skill
        }
    
        if( event.session.new ) {
            onSessionStarted( {requestId: event.request.requestId}, event.session);
        }
        
        if (event.request.type === "LaunchRequest") {
            onLaunch(event.request,
                event.session,
                function callback(sessionAttributes, speechletResponse) {
                    context.succeed(buildResponse(sessionAttributes, speechletResponse));
                });
                
        } else if (event.request.type === "IntentRequest") {
            onIntent(event.request,
                event.session,
                function callback(sessionAttributes, speechletResponse) {
                    context.succeed(buildResponse(sessionAttributes, speechletResponse));
                });
                
        } else if (event.request.type === "SessionEndedRequest") {
            onSessionEnded(event.request, event.session);
            context.succeed();
        }

    
    } catch (e) {
        context.fail("Exception: " + e );
    }
}

/**
 * Called when the session starts.
 */
function onSessionStarted(sessionStartedRequest, session) {
    console.log("onSessionStarted requestId=" + sessionStartedRequest.requestId
        + ", sessionId=" + session.sessionId);

    // add any session init logic here
}

/**
 * Called when the user invokes the skill without specifying what they want.
 */
function onLaunch(launchRequest, session, callback) {
    console.log("onLaunch requestId=" + launchRequest.requestId
        + ", sessionId=" + session.sessionId);

    getWelcomeResponse(callback);
}

/**
 * Called when the user specifies an intent for this skill.
 */
function onIntent(intentRequest, session, callback) {
    console.log("onIntent requestId=" + intentRequest.requestId
        + ", sessionId=" + session.sessionId);

    var intent = intentRequest.intent,
        intentName = intentRequest.intent.name;

    if( intentName === "CheckIntent" )
    {
        handleCheckRequest( intent, session, callback );
    } else if ( intentName === "AMAZON.RepeatIntent" ) {
        handleRepeatRequest( intent, session, callback );
    } else if ( intentName === "AMAZON.StartOverIntent" ) {
        handleStartOverRequest( intent, session, callback );
    } else if ( intentName === "AMAZON.CancelIntent" ) {
        handleCancelRequest( intent, session, callback );
    } else {
        throw new Error("Invalid intent: " + intentName);
    } 
}

/**
 * Called when the user ends the session.
 * Is not called when the skill returns shouldEndSession=true.
 */
function onSessionEnded(sessionEndedRequest, session) {
    console.log("onSessionEnded requestId=" + sessionEndedRequest.requestId
        + ", sessionId=" + session.sessionId);

    // Add any cleanup logic here
}

/* Called when the skill is started */
function getWelcomeResponse(callback) {
    var currentChecklistItem = 0;
    var speechOutput = "Departure checklist. " + checklist[currentChecklistItem];
    var repromptText = checklist[currentChecklistItem];
    var sessionAttributes = { "currentChecklistItem": currentChecklistItem };
    var shouldEndSession = false;
 
    callback(sessionAttributes,
        buildSpeechletResponse(CARD_TITLE, speechOutput, repromptText, shouldEndSession));
}

/* Called when the user says "check" - go to the next checklist item */
function handleCheckRequest( intent, session, callback ) {
    var speechOutput = "";
    var repromptText = "";
    var sessionAttributes = {};
    var shouldEndSession = false;
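    // Default to the first item when the attribute is missing (e.g. on a new session)
    var attributes = session.attributes || {};
    var currentChecklistItem = parseInt( attributes.currentChecklistItem, 10 ) || 0;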
    
    currentChecklistItem++;
    
    if( currentChecklistItem >= checklist.length ) {
        speechOutput = "Departure checklist complete.";
        repromptText = "Departure checklist complete.";
        sessionAttributes = {};
        shouldEndSession = true;
        
        callback(sessionAttributes,
            buildSpeechletResponse(CARD_TITLE, speechOutput, repromptText, shouldEndSession));
    } else {
        speechOutput = checklist[currentChecklistItem];
        repromptText = checklist[currentChecklistItem];
        sessionAttributes = { "currentChecklistItem": currentChecklistItem };
        
        callback(sessionAttributes,
            buildSpeechletResponse(CARD_TITLE, speechOutput, repromptText, shouldEndSession));
    }
}

/* Called when the user says "repeat" */
function handleRepeatRequest( intent, session, callback ) {
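    // Default to the first item when the attribute is missing (e.g. on a new session)
    var attributes = session.attributes || {};
    var currentChecklistItem = parseInt( attributes.currentChecklistItem, 10 ) || 0;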
    var speechOutput = checklist[currentChecklistItem];
    var repromptText = checklist[currentChecklistItem];
    var sessionAttributes = { "currentChecklistItem": currentChecklistItem };
    var shouldEndSession = false;
 
    callback(sessionAttributes,
        buildSpeechletResponse(CARD_TITLE, speechOutput, repromptText, shouldEndSession));
}

/* Called when the user says "start over" */
function handleStartOverRequest( intent, session, callback ) {
    var currentChecklistItem = 0;
    var speechOutput = "Departure checklist. " + checklist[currentChecklistItem];
    var repromptText = checklist[currentChecklistItem];
    var sessionAttributes = { "currentChecklistItem": currentChecklistItem };
    var shouldEndSession = false;
 
    callback(sessionAttributes,
        buildSpeechletResponse(CARD_TITLE, speechOutput, repromptText, shouldEndSession));
}

/* Called when the user says "cancel" */
function handleCancelRequest( intent, session, callback ) {
    var speechOutput = "Departure checklist cancelled.";
    var repromptText = "Departure checklist cancelled.";
    var sessionAttributes = {};
    var shouldEndSession = true;

    callback(sessionAttributes,
        buildSpeechletResponse(CARD_TITLE, speechOutput, repromptText, shouldEndSession));
}


/**
 * Helper function to build the speechlet response 
 */
function buildSpeechletResponse(title, output, repromptText, shouldEndSession) {
    return {
        outputSpeech: {
            type: "PlainText",
            text: output
        },
        card: {
            type: "Simple",
            title: title,
            content: output
        },
        reprompt: {
            outputSpeech: {
                type: "PlainText",
                text: repromptText
            }
        },
        shouldEndSession: shouldEndSession
    };
}

/**
 * Helper function to build the response
 */
function buildResponse(sessionAttributes, speechletResponse) {
    return {
        version: "1.0",
        sessionAttributes: sessionAttributes,
        response: speechletResponse
    };
}