Building A Custom Alexa Skill – Part 6 (Part 2)

brokerandy25 · October 26, 2022, 2:34pm

Building A Custom Alexa Skill – Part 6 – Yet another 2 steps back but this time 4 forward

In my last post, I shared with you the usage of “magic strings” and why I chose to go that route. So . . . let's talk about how they are actually used. We will “deep dive” into the following utterances:

turn {action} the {entity}
turn {action} the {entity} lights
turn {action} the {entity} please
turn {action} the {entity} lights please

When a user utters any of the above phrases, Alexa will call my Lambda function and trigger my HouseLightsOnOffIntent method. (I’m only going to show some of my code, just enough to get the point across.)
So my HouseLightsOnOffIntent looks like this:

async handle(handlerInput) {
        var jsonData = {};
        var response = {};
        var requestedLightsEntity = '';
        var requestedLightsAction = '';
        var speakOutput = 'There was an error talking to Node Red, please try again';
        var alexaObj = handlerInput.requestEnvelope.request.intent.slots;

        //Set up Session to record entity
        const sessionAttributes = handlerInput.attributesManager.getSessionAttributes();

        // Set Up Session Defaults using magic strings if the value isn't already defined
        if (!sessionAttributes.NextAction) {
            sessionAttributes.NextAction = START;
        }
        if (!sessionAttributes.LastCalledIntent) {
            sessionAttributes.LastCalledIntent = HouseLightsOnOff_Intent;
        }
        if (!sessionAttributes.requestedLightsEntity) {
            sessionAttributes.requestedLightsEntity = DEFAULT;
        }
        if (!sessionAttributes.requestedLightsAction) {
            sessionAttributes.requestedLightsAction = DEFAULT;
        }
        if (!sessionAttributes.State) {
            sessionAttributes.State = START;
        }

        //check for a filled slot and populate "things"
        if (alexaObj.entity.hasOwnProperty('value')) {
            requestedLightsEntity = alexaObj.entity.value;
            jsonData.entity = requestedLightsEntity;
            sessionAttributes.requestedLightsEntity = requestedLightsEntity;
        }
        if (alexaObj.action.hasOwnProperty('value')) {
            requestedLightsAction = alexaObj.action.value;
            jsonData.action = requestedLightsAction;
            sessionAttributes.requestedLightsAction = requestedLightsAction;
        }
        //Save session . . . why?  Just because we bounce through a lot of methods and I wanted to be sure that things "persist"
        handlerInput.attributesManager.setSessionAttributes(sessionAttributes);

        //Do we have an action (Code is Triggered by utterance: Alexa, tell house jarvis Kitchen lights)
        if (sessionAttributes.requestedLightsAction===DEFAULT) {
            //getSlotFill will return a "speech string" for Alexa to respond with
            speakOutput = getSlotFill(HouseLightsOnOff_Intent, ACTION).replace('{light}', sessionAttributes.requestedLightsEntity);
            return handlerInput.responseBuilder
                .speak(speakOutput)
                .reprompt(speakOutput)
                .getResponse();
        }
        //Do we have an entity (Code is triggered by utterance such as: Alexa, tell house jarvis to turn on lights)
        if (sessionAttributes.requestedLightsEntity===DEFAULT) {
            speakOutput = getSlotFill(HouseLightsOnOff_Intent, LIGHT).replace('{action}', sessionAttributes.requestedLightsAction);
            return handlerInput.responseBuilder
                .speak(speakOutput)
                .reprompt(speakOutput)
                .getResponse();
        }
        //Track where we are in our conversations (used to get response model messages)
        sessionAttributes.LastCalledIntent = HouseLightsOnOff_Intent;

        //resave session values . . . cuz . . . why not?
        handlerInput.attributesManager.setSessionAttributes(sessionAttributes);
        //write to logs
        console.log("HouseLightIntent: " + JSON.stringify(handlerInput.requestEnvelope));
        try {
            //response = await fetchURLwithJSON(houseLightsBrightnessURL, jsonData);
            //response.success = false;
            //If we have everything . . . and NodeRed Call has been successful . . . talk back to me
            if (((sessionAttributes.requestedLightsEntity !== DEFAULT) && (sessionAttributes.requestedLightsAction !==DEFAULT)) && response.success===true) {

                speakOutput = await getSuccessMessage(sessionAttributes);
                sessionAttributes.NextAction = DIM_BRIGHTEN;
                handlerInput.attributesManager.setSessionAttributes(sessionAttributes);
                var reprompt = await getReprompt(sessionAttributes);
                var followup = await getFollowUp();
                speakOutput = speakOutput + " " + reprompt + ' ' + followup;
                sessionAttributes.State = DIM_OR_BRIGHTEN_QUESTION;
                sessionAttributes.FollowUpFailure = false;
            }
            else
            {
                sessionAttributes.FollowUpFailure = true;
                speakOutput = await getReprompt(sessionAttributes);
            }
        }

Well . . . that is all fine. But HOW do we make it “conversational” . . . where are the smarts? Well, the smarts are wrapped up in 4 methods and a JSON object, which is defined here (promise . . . this will all make sense at the end). First, a couple of things of note. Remember all those constants I defined in my last post? Yup, they are KEY here. Each of the keys in my JSON response model is a variable. You will see where this becomes powerful in a bit:


const IntentResponseMappings = {
    [FOLLOWUPFAILURE]:{
        "0": "I'm sorry, but I had a problem doing what you asked.",
        "1": "Oops, there was an issue doing what you asked.",
        "2": "Stupid Home Assistant couldn't do what you wanted."
    },
    [FOLLOWUPREPROMPT]:{
        "0": "Would you like me to try again?",
        "1": "Shall I try to perform your action again?",
    },
    [REINITIATE]:{
        "0": "Happy to help, what can I do?",
        "1": "Sure thing, what else can I do for you?",
        "2": "My Pleasure, what can I help you with?"
    },
    [HouseLightsOnOff_Intent]:
    {
        [MISSING_SLOT]: {
            [LIGHT]: {
                "0": "Which light would you like to turn {action}?",
                "1": "I'm sorry, it appears you didn't tell me what light you would like to turn {action}.",
                "2": "Please tell me which light you would like to turn {action}."
            },
            [ACTION]: {
                "0": "What would you like to do with the {light}?",
                "1": "What can I do for the {light}?",
                "2": "Would you like to turn on or off the {light}?"
            }
        },
        [DIM_BRIGHTEN]: {
            "0": "Would you like to dim or brighten the {light}?",
            "1": "I think you may want to change the brightness of the {light}, is this correct?",
            "2": "Would you like to change the brightness of the {light}?"
        },
        [SUCCESS]: {
            "0": "I have turned {action} the {light}.",
            "1": "The {light} is now {action}.",
            "2": "Your wish is my command.  The {light} is now {action}."
        },
        [DONE]: {
            "0": "I have turned off the {light}, is there anything else I can do for you?",
            "1": "The {light} is now off, would you like to do anything else?",
            "2": "Your wish is my command.  The {light} is now off."
        },
        [DEFAULT]: {
            [ON]: {
                "0": "I can help with that. Which Light would you like to turn on?",
                "1": "Sure thing, what light?"
            },
            [OFF]: {
                "0": "I can help with that. Which Light would you like to turn off?",
                "1": "Sure thing, what light?"
            }
        }
    },
    [HouseLightsBrightness_Intent]: {
        [MISSING_SLOT]: {
            [LIGHT]: {
                "0": "Which light would you like to {action}?",
                "1": "I'm sorry, it appears you didn't tell me what light you would like to {action}.",
                "2": "Please tell me which light you would like to {action}."
            },
            [ACTION]: {
                "0": "What would you like to do with the {light}?",
                "1": "What can I do for the {light}?",
                "2": "Would you like to dim or brighten the {light}?"
            }
        },

        [DIM_OR_BRIGHTEN_QUESTION]: {
            "0": "Would you like to dim or brighten the {light}?",
            "1": "Would you like to change the brightness of the {light}?"
        },
        [SUCCESS]: {
            [DIM_SUCCESS]: {
                "0": "I have dimmed the {light}.",
                "1": "The {light} has been dimmed.",
                "2": "Your wish is my command.  The {light} has been dimmed."
            },
            [BRIGHTEN_SUCCESS]: {
                "0": "I have brightened the {light}.",
                "1": "The {light} has been brightened.",
                "2": "Your wish is my command.  The {light} is now brighter."
            },
            [PERCENTAGE_SUCCESS]: {
                "0": "I have set the {light} to {percent} percent.",
                "1": "The {light} is now set to {percent} percent."
            }
        },
        [DEFAULT]: {
            [ON]: {
                "0": "I can help with that. Which Light would you like to turn on?",
                "1": "Sure thing, what light?"
            },
            [OFF]: {
                "0": "I can help with that. Which Light would you like to turn off?",
                "1": "Sure thing, what light?"
            },
            [TIMER]: {
                "0": "Do you want to turn off the {light} light after a period of time?",
                "1": "Do you want to leave the {light} light {action} for a period of time?"
            }
        }
    },
    [Automations_Intent]: {
        [SUCCESS]: {
            [START]: {
                "0": "I have started the {automation}.",
                "1": "The {automation} has been started.",
                "2": "I have triggered the {automation}"
            },
            [STOP]: {
                "0": "I have stopped the {automation}.",
                "1": "The {automation} has been stopped.",
                "2": "I have told the {automation} to stop."
            }
        },
        [ERROR_MESSAGE]: {
            "0": "I'm sorry, but I don't know that automation."
        }
    },
    [REPROMPT_ERROR]: {
        "0": "There has been an error, could you please restate what you would like to do?",
        "1": "It appears that I don't understand what you are asking, could you please restate, and file a bug with hunky hubby?",
        "2": "Oops! Things are borked, please try again."
    },
    [REPROMPT_SUCCESS]: {
        "0": "I have completed your task, would you like to do anything else?",
        "1": "Your wish was my command, can I do anything else for you?",
        "2": "Done. What else would you like?"
    },
    [FOLLOWUPS]: {
        "0": "When you are finished, please tell me goodbye.",
        "1": "If you are done, please tell me goodbye.",
        "2": "Tell me goodbye if you are done"
    },
    [RESTART]: {
        "0": "What can I do for you?",
        "1": "What else can I do for you?",
        "2": "What would you like me to do?"
    },
    [GOODBYE]: {
        "0": "Ok then, have a great day.",
        "1": "Goodbye",
        "2": "Alright.  Thank you, and goodbye.",
        "3": "Have a great day."
    },
    [ANYTHINGELSE]: {
        "0": "Ok then, is there anything else I can help you with?",
        "1": "Alright.  Are you finished? If so, please tell me goodbye."
    },
    [TIMER]: {
        [DEFAULT]: {},
        [SUCCESS]: {},
        [DONE]: {}
    }
}
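
To see why using the constants as keys matters, here is a minimal sketch of computed property names. The constant values and the trimmed-down `mappings` object are assumptions for illustration only; the real values come from the constants defined in the previous post.

```javascript
// Hypothetical constant values; the real ones are defined in the previous post.
const SUCCESS = 'success';
const HouseLightsOnOff_Intent = 'HouseLightsOnOffIntent';

// [CONSTANT]: ... is a computed property name: the key is the constant's value.
const mappings = {
    [HouseLightsOnOff_Intent]: {
        [SUCCESS]: {
            "0": "I have turned {action} the {light}.",
            "1": "The {light} is now {action}."
        }
    }
};

// The same constants are used to read the phrases back. Misspelling a
// constant name that was never declared throws a ReferenceError, whereas a
// misspelled string key would quietly return undefined.
const phrases = mappings[HouseLightsOnOff_Intent][SUCCESS];
console.log(Object.keys(mappings));
console.log(phrases["0"]);
```

Because both the writer and the reader of the object go through the same constants, renaming a key means changing one definition instead of hunting for strings.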

All the above JSON object does . . . is provide a “holding mechanism” for the things Alexa says back to you. So . . . for example, if you tell Alexa to “turn on kitchen lights”, then the following methods are called in the “success portion” of my house lights intent. This is the code inside the if block that starts with this line:

if (((sessionAttributes.requestedLightsEntity !== DEFAULT) && (sessionAttributes.requestedLightsAction !==DEFAULT)) && response.success===true)

you will see a couple of method calls:

speakOutput = await getSuccessMessage(sessionAttributes);
//Code removed for brevity
var reprompt = await getReprompt(sessionAttributes);
var followup = await getFollowUp();

with an entire “response” built like this:

speakOutput = speakOutput + " " + reprompt + ' ' + followup;

So what do getSuccessMessage(), getReprompt(), and getFollowUp() do? Well . . . they get “random” sentences from my IntentResponseMappings document in the following manner:

const getSuccessMessage = (sessionAttributes) => {
    var toReturn = 'default text';
    var obj;
    switch (sessionAttributes.LastCalledIntent) {
        case HouseLightsOnOff_Intent:
            var lightEntity = sessionAttributes.requestedLightsEntity;
            var actionEntity = sessionAttributes.requestedLightsAction;

            obj = IntentResponseMappings[HouseLightsOnOff_Intent][SUCCESS];
            toReturn = obj[getRandomInt(countKeys(obj) - 1).toString()].replace("{light}", lightEntity).replace("{action}", actionEntity);
            break;
//More code here, but you should get the point

const getReprompt = async (sessionAttributes) => {
    var response = {};
    var responseText = 'default';
    var obj;
    switch (sessionAttributes.LastCalledIntent) {
        case HouseLightsBrightness_Intent:
            if (sessionAttributes.NextAction === START) {
                //response = 
            }
            if (sessionAttributes.FollowUpFailure === true) {
                obj = IntentResponseMappings[FOLLOWUPREPROMPT];
                var obj2 = IntentResponseMappings[FOLLOWUPFAILURE];
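                //failure message (obj2) first, then the "try again?" question (obj)
                responseText = obj2[getRandomInt(countKeys(obj2)-1).toString()] +' '+obj[getRandomInt(countKeys(obj)-1).toString()];
            } else {
//More code here, but you should get the point
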
const getFollowUp = () => {
    var obj = IntentResponseMappings[FOLLOWUPS];
    return obj[getRandomInt(countKeys(obj) - 1).toString()];
}

Supporting “Methods”

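function getRandomInt(max) {
    //Math.random() never returns 1, so add 1 to make max itself a possible result (0 through max inclusive)
    return Math.floor(Math.random() * (max + 1));
}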

function countKeys(obj) {
    return Object.keys(obj).length;
}

But what is going on in the code?!?!?!? It is actually rather simple. Really.
If you look at IntentResponseMappings[HouseLightsOnOff_Intent][SUCCESS] you will see 3 “possible” phrases within that key. We count the number of keys (allowing us to just add new phrases to the mapping doc without rewriting a bunch of logic). We then get a random number between 0 and the count of keys minus 1. We do this because counting starts at 1, but the phrase keys, like “array” indexes, start at 0. I might have 3 optional responses, but their keys are 0, 1, 2, which is the same number of elements as 1, 2, 3. Hope this makes sense . . . it’s a pretty common appdev problem.
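
The pick-and-fill logic described above can be sketched in one small helper. This is a minimal sketch, not the skill's actual code: `pickPhrase`, its `rng` parameter, and the `successPhrases` sample object are hypothetical names introduced here.

```javascript
// Hypothetical helper: pick one phrase at random and fill in its placeholders.
// `rng` defaults to Math.random and is injectable so the pick can be tested.
function pickPhrase(phrases, values, rng = Math.random) {
    const keys = Object.keys(phrases);               // e.g. ["0", "1", "2"]
    // rng() is in [0, 1), so the index lands in 0 .. keys.length - 1.
    const index = Math.floor(rng() * keys.length);
    let text = phrases[keys[index]];
    for (const [name, value] of Object.entries(values)) {
        // split/join replaces every occurrence of the placeholder
        text = text.split('{' + name + '}').join(value);
    }
    return text;
}

const successPhrases = {
    "0": "I have turned {action} the {light}.",
    "1": "The {light} is now {action}.",
    "2": "Your wish is my command.  The {light} is now {action}."
};

console.log(pickPhrase(successPhrases, { light: 'kitchen lights', action: 'on' }));
```

In this sketch the random number is multiplied by the number of keys directly, so the highest index it can produce is already the count minus 1.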

So . . . when I turn on my house lights, Alexa will respond with 1 of 27 possible combinations as shown above. I get this because I have 3 “sentences” that fill the response: a confirmation of the action, a follow-up question, and a “reminder to tell me goodbye”. Each of these 3 “things” has 3 possible values . . . 3x3x3 = 27. So in reality . . . I would have to ask “House Jarvis” to do something to my lights 28 times before I am guaranteed to hear the same response twice. (Random picks will usually repeat sooner than that . . . but the math holds.)
SO YAY!!! We are on our way to being “conversational” (or at least having some variety in response).
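
The 3x3x3 arithmetic can be checked by enumerating every combination. This is a minimal sketch using three of the phrase groups from the mapping above, with the placeholders left unfilled; the array and variable names are hypothetical.

```javascript
const success = [
    "I have turned {action} the {light}.",
    "The {light} is now {action}.",
    "Your wish is my command.  The {light} is now {action}."
];
const reprompts = [
    "Would you like to dim or brighten the {light}?",
    "I think you may want to change the brightness of the {light}, is this correct?",
    "Would you like to change the brightness of the {light}?"
];
const followUps = [
    "When you are finished, please tell me goodbye.",
    "If you are done, please tell me goodbye.",
    "Tell me goodbye if you are done"
];

// Build every confirmation + question + reminder combination.
const combinations = [];
for (const s of success) {
    for (const r of reprompts) {
        for (const f of followUps) {
            combinations.push(s + ' ' + r + ' ' + f);
        }
    }
}

// A Set drops duplicates, so its size is the number of distinct responses.
console.log(new Set(combinations).size); // 27
```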

But what about being predictive? We as humans are able to carry on a conversation because we are aware of the CONTEXT of the conversation, what has been said, and what are a short list of possible “responses”.

And this is where it gets HARD.

So here is the situation . . . you have just asked Alexa to turn on your kitchen lights . . . THEN WHAT?!?!?! Should she ask you if you want to change their brightness? Their color? Turn them off after a period of time? Should she ask you ANYTHING at all?

So . . . since I have several automations that change light brightness / color temp, etc., based upon many factors . . . I have chosen to ask about brightness.

What does this mean? This means when I turn on the lights . . . my house Jarvis is gonna ask me if the lights are at a correct level and if I want to change them.

What are the possible responses to a question such as “would you like to change the brightness of the kitchen lights?” That is a binary question. You only have two possible responses: Yes or No.
So I say “yes” . . . (sidebar here: this actually triggers the built-in AMAZON.YesIntent). So my “voice flow” looks like this:

ME: “Alexa, tell house Jarvis to turn on Kitchen Lights” (Triggers HouseLightsOnOffIntent)
RESPONSE: I have turned on the kitchen lights. Would you like to change the brightness of the kitchen lights? If you are done, please tell me goodbye.
ME: Yes (Triggers AMAZON.YesIntent)
RESPONSE: Please tell me if you would like to brighten or dim the kitchen lights
ME: Brighten (Triggers HouseLightsBrightnessIntent)
RESPONSE: I have brightened the kitchen lights. Would you like to turn off the lights after a period of time? If you are done, please tell me goodbye
ME: No (Triggers AMAZON.NoIntent)
RESPONSE: OK, is there anything else I can do for you? If you are done, please tell me goodbye
ME: Goodbye (Triggers Amazon.StopCancelIntent)

And if you have missed the complexity in there . . . by nature, we have interacted with 5 different Intents for just one “voice flow”. This is where magic strings for keeping state inside of the session object, and the voice model JSON object become hyper powerful (in my opinion).
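
To make the role of session state concrete, here is a minimal sketch of how a “yes” handler could use the saved state to decide what to say next. It is not the skill's actual handler: `handleYes`, the constant values, and the plain `session` object (standing in for what `handlerInput.attributesManager.getSessionAttributes()` returns) are all assumptions.

```javascript
// Hypothetical constant values standing in for the magic strings.
const DIM_OR_BRIGHTEN_QUESTION = 'dim_or_brighten_question';
const DEFAULT = 'default';

// Hypothetical yes handler: "yes" only means something in the context of
// the question that was asked last, which is recorded in session.State.
function handleYes(session) {
    if (session.State === DIM_OR_BRIGHTEN_QUESTION
            && session.requestedLightsEntity !== DEFAULT) {
        // The user agreed to change brightness of the light we remembered.
        return 'Please tell me if you would like to brighten or dim the '
            + session.requestedLightsEntity;
    }
    // No pending question we recognize: fall back to a generic prompt.
    return 'What can I do for you?';
}

const session = {
    State: DIM_OR_BRIGHTEN_QUESTION,
    requestedLightsEntity: 'kitchen lights'
};
console.log(handleYes(session));
```

The point of the sketch is that the yes handler carries no knowledge of lights on its own; everything it needs was left in the session by the intent that ran before it.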

How does session fit into this? So let's look at the above “voice flow”. When I ask “turn on Kitchen Lights”, I fill 2 session variables: entity and action (technically I have different names, but simplicity is king here). After a successful call to Node Red, this block of code executes:

speakOutput = await getSuccessMessage(sessionAttributes);
sessionAttributes.NextAction = DIM_BRIGHTEN;
handlerInput.attributesManager.setSessionAttributes(sessionAttributes);
var reprompt = await getReprompt(sessionAttributes);

The last line (var reprompt) is where we get our “follow up ask”. If you look, you will see I have set sessionAttributes.NextAction to DIM_BRIGHTEN. In my conversation model’s current implementation, I have chosen that once I turn on lights, I MAY want to set their brightness. By setting “NextAction” to DIM_BRIGHTEN, my code for getReprompt will return an appropriate “reprompt”. (NOTE: I will be adding more “smarts” to this, as Dim/Brighten makes more sense during a certain period of the day, whereas a timer may make more sense at other times, like towards the end of the day / overnight.)
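
The time-of-day idea in the note could look something like the sketch below. Everything here is an assumption: `chooseNextAction`, the constant values, and the 9pm-to-6am window are illustrative and not part of the skill.

```javascript
// Hypothetical constant values standing in for the magic strings.
const DIM_BRIGHTEN = 'dim_brighten';
const TIMER = 'timer';

// Hypothetical rule: late evening and overnight, offer a turn-off timer;
// during the rest of the day, offer to dim or brighten.
function chooseNextAction(hour) {
    const isLate = hour >= 21 || hour < 6;   // assumed window: 9pm to 6am
    return isLate ? TIMER : DIM_BRIGHTEN;
}

// Usage: pass the current hour (0-23).
console.log(chooseNextAction(new Date().getHours()));
```

The result would be assigned to sessionAttributes.NextAction before getReprompt is called. One thing to watch: a Lambda function's clock typically runs in UTC, so the hour would likely need converting to the home's time zone first.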

I hope this is all making sense for anyone who is reading these.