Building A Custom Alexa Skill – Part 6 – Yet another 2 steps back but this time 4 forward

Prior Post: Building A Custom Alexa Skill – Part 5

As I have undergone this journey, there has been a lot of “just in time” design work. It has definitely been a case of a couple of steps forward and one back. As such, I have chosen to refactor “YET AGAIN” (sigh).

As I was working through the interaction model, I realized that my goals were to be conversational, not to just execute commands. Grabbing slot values from utterances is EASY. Making it “smart” is HARD. With that said, be forewarned . . . I’m going to be sharing a hot mess of issues, and how I have chosen to work through them.

What I have found while undergoing this exercise is NOT that building your voice interaction model is hard, or designing your interactions with NodeRed, or any of that stuff . . . the hard part is: how do I make this smart? Case in point: an utterance will be something along the lines of “Alexa, tell House Jarvis to turn on the kitchen lights”. If you remember from one of my previous posts, there are 2 slot values in the utterance “turn on the kitchen lights”: an Action (“on”) and an Entity (“kitchen”). Easy peasy, right? Create an Intent with a couple of utterances such as:

  • turn {action} the {entity}
  • turn {action} the {entity} lights
  • turn {action} the {entity} please
  • turn {action} the {entity} lights please
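
For reference, those sample utterances live in the skill’s interaction model alongside the slot definitions. A minimal sketch of that intent, written here as a JavaScript object mirroring the model JSON (the slot type names are placeholders I made up, not the skill’s actual types):

```javascript
// Sketch of the intent definition as it would appear in the interaction
// model. The slot types (ACTION_TYPE, ENTITY_TYPE) are hypothetical
// placeholders; the real skill may use custom or built-in Amazon types.
const houseLightsOnOffIntent = {
  name: 'HouseLightsOnOffIntent',
  slots: [
    { name: 'action', type: 'ACTION_TYPE' },
    { name: 'entity', type: 'ENTITY_TYPE' }
  ],
  samples: [
    'turn {action} the {entity}',
    'turn {action} the {entity} lights',
    'turn {action} the {entity} please',
    'turn {action} the {entity} lights please'
  ]
};
```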

So I talk to my Alexa device, submit the “utterance”, capture the slot values, send them to NodeRed, and do some validation, etc. within NodeRed.
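
As a rough sketch of that capture step (the helper name and the trimmed request shape below are mine for illustration, not the skill’s actual code), pulling the two slot values out of the Alexa request envelope before handing them off to NodeRed might look like:

```javascript
// Sketch: extract slot values from an Alexa request envelope.
// The envelope shape follows the Alexa custom-skill request JSON;
// this is a hypothetical helper, not the original skill code.
function getSlotValues(requestEnvelope) {
  const slots = requestEnvelope.request.intent.slots || {};
  const values = {};
  for (const name of Object.keys(slots)) {
    values[name] = slots[name].value; // undefined if the slot was not filled
  }
  return values;
}

// Example: a trimmed envelope for "turn on the kitchen lights"
const envelope = {
  request: {
    intent: {
      name: 'HouseLightsOnOffIntent',
      slots: {
        action: { name: 'action', value: 'on' },
        entity: { name: 'entity', value: 'kitchen' }
      }
    }
  }
};

const { action, entity } = getSlotValues(envelope);
// These values would then be sent on to the NodeRed flow for validation.
```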

BUT THEN WHAT? (To me, even with the addition of Dialogs and the like, Alexa is still a “command and response” application.) AKA: I ask her to do something, and she does it . . . DONE.

So I realized that the true power in this skill was the interaction model: how do I talk to Alexa, and how does she respond (beyond just doing what I asked)? And this is where I have been spending most of my time. Again . . . making a REST call to Node Red, capturing the msg object, and doing my thing is EASY. Conversations are hard.

So here is what I currently have for my voice “model”, and I will deep dive into it. (Be warned . . . a LOT of code coming your way)

const HouseLightsOnOff_Intent = 'HouseLightsOnOffIntent';
const HouseLightsBrightness_Intent = 'HouseLightsBrightnessIntent';
const Automations_Intent = 'AutomationsIntent';
const START = 'START';
const STOP = 'STOP';
const TOGGLE = 'TOGGLE';
const DIM_BRIGHTEN = 'DIM_BRIGHTEN';
const DIM_SUCCESS = 'DIM_SUCCESS';
const BRIGHTEN_SUCCESS = 'BRIGHTEN_SUCCESS';
const PERCENTAGE_SUCCESS = 'PERCENTAGE_SUCCESS';
const SUCCESS = 'SUCCESS';
const DONE = 'DONE';
const ERROR_MESSAGE = 'ERROR_MESSAGE';
const COLOR = 'LIGHT_COLOR';
const TIMER = 'SET_TIMER';
const LIGHTS_ON_OFF = 'LIGHTS_ON_OFF';
const NONE = 'None';
const STATE_START = "STATE_START";
const DIM_OR_BRIGHTEN_QUESTION = "DIM_OR_BRIGHTEN_QUESTION";
const REPROMPT_ERROR = 'REPROMPT_ERROR';
const REPROMPT_SUCCESS = 'REPROMPT_SUCCESS';
const FOLLOWUPS = 'FOLLOW_UPS';
const GOODBYE = 'GOODBYE';
const ANYTHINGELSE = 'ANYTHINGELSE';
const LOOKFORGOODBYE = 'LOOKFORGOODBYE';
const THERMOSTAT = 'THERMOSTAT';
const RESTART = 'RESTART';
const DEFAULT = 'Default';
const ON = 'ON';
const OFF = 'OFF';
const MISSING_SLOT = 'MISSINGSLOT';
const LIGHT = 'LIGHT';
const ACTION = 'ACTION';
const FOLLOWUPFAILURE = 'FOLLOWUPFAILURE';
const FOLLOWUPREPROMPT = 'FOLLOWUPREPROMPT';
const REINITIATE = 'REINITIATE';

So . . . let’s break down what is going on with JUST the code above.

First, the use of constants to replace “magic strings”. I am defining “everything” via these constants. This allows me to do things such as:

  1. Set variables to a “known string” and eliminate typos.
  2. Perform logic operations against a “known string”.
  3. Dynamically build response objects.
  4. Dynamically build my JSON “speech model” (more on that in my next post).

Code Examples tied to above:
#1

        if (!sessionAttributes.LastCalledIntent) {
            sessionAttributes.LastCalledIntent = Automations_Intent;
        }

#2

if (sessionAttributes.requestedLightsAction === DEFAULT) {
    speakOutput = getSlotFill(HouseLightsOnOff_Intent, ACTION)
        .replace('{light}', sessionAttributes.requestedLightsEntity);
    return handlerInput.responseBuilder
        .speak(speakOutput)
        .reprompt(speakOutput)
        .getResponse();
}
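
A note on `getSlotFill` in example #2: it is not shown in this post (it belongs to the “speech model” covered next time), so the version below is purely my hedged guess at its shape — a lookup of a re-prompt string keyed by intent name and the missing slot constant:

```javascript
// Hypothetical sketch of getSlotFill -- NOT the original code.
// Assumed: a nested map keyed first by intent name, then by the
// constant naming the slot that still needs to be filled.
const SlotFillPrompts = {
  HouseLightsOnOffIntent: {
    ACTION: 'Did you want the {light} turned on or off?'
  }
};

function getSlotFill(intentName, missingSlot) {
  return SlotFillPrompts[intentName][missingSlot];
}
```

The returned string still contains the `{light}` placeholder, which is why example #2 runs `.replace('{light}', ...)` on it before speaking.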

#3

if (sessionAttributes.NextAction === DIM_OR_BRIGHTEN_QUESTION) {
    obj = IntentResponseMappings[HouseLightsBrightness_Intent][DIM_OR_BRIGHTEN_QUESTION];
    responseText = obj[getRandomInt(countKeys(obj) - 1).toString()]
        .replace('{light}', sessionAttributes.requestedLightsEntity)
        .replace('{action}', sessionAttributes.requestedLightsAction);
}
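
Example #3 leans on two helpers, `countKeys` and `getRandomInt`, that are not shown in this post. Minimal versions consistent with how they are used (these are my assumed implementations, not the original code) would be:

```javascript
// Assumed sketches of the two helpers used in example #3.

// Number of keys in a plain object -- e.g. how many response
// variants exist under a given mapping.
function countKeys(obj) {
  return Object.keys(obj).length;
}

// Random integer from 0 to max, inclusive -- used to index a
// variant, since the mapping's keys are "0", "1", "2", ...
function getRandomInt(max) {
  return Math.floor(Math.random() * (max + 1));
}
```

With variants keyed `"0"` through `"N-1"`, `getRandomInt(countKeys(obj) - 1).toString()` picks one at random, which keeps Alexa from answering with the same canned phrase every time.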

So ultimately, these constants (in place of “magic strings”) allow for a significant amount of flexibility within this application. In addition, they have saved me HOURS of chasing down typos caused by misspelled words. At worst, if I have spelled something wrong in a constant, then it will always be “wrong” but right at the same time.

In my next post . . . I will talk about the actual “response model”, as I’m calling it, and how these constants “empower” my conversations.