Smart speakers such as the Amazon Echo and Google Home have enabled us to do wonderful things with speech. Beyond plain commands, they can hold a smooth conversation with you and do tasks for you, such as ordering food, making payments, reading the news, playing music, reading books, telling jokes, entertaining your kids and a lot more.
Just a basic “How does it work?” –
Echo is a smart speaker and a very smart listener 🙂 . With some commands you can invoke Alexa skills and start a conversation. What is an Alexa skill? Well, Amazon provides a platform to build applications (aka skills) which enable Alexa to do a lot more. Any developer can use this platform after registering with an Amazon account.
To understand what a conversation looks like, let’s take an example:
You: Alexa, start “Food Recipe App” (this is the invocation command)
Alexa: Welcome to the Food Recipe App, what would you like to cook?
You: Today I would like to cook pasta.
Alexa: That’s a great choice! Tell me which vegetables you have.
You: I have broccoli, spinach, zucchini, mushrooms and some lettuce.
Alexa: Awesome, let’s start with boiling the pasta ……
and the recipe goes on….
A more complex conversation could look like this:
You: Alexa, check if I have enough milk in my fridge.
Alexa: Sure, I see that you only have enough milk for today. Would you like me to order milk?
You: No, just add it to my shopping list.
Alexa: Sure, milk is now on your list.
…after some time…
You: Alexa, can you order my shopping list from Amazon?
Alexa: Sure, let me start the order.
Alexa: I have placed an order for you. Your order will arrive within 2-3 days. Would you like to erase the shopping list?
You: Yes
Alexa: Your shopping list is erased. Have a nice day!
What exactly happened behind the scenes?
In simple words – Alexa used its skills to perform some tasks for you.
Behind the scenes – Echo converts your speech into commands that the invoked Alexa skill understands. The application behind the skill processes these commands and sends back a response. This response is converted to speech (the wording is programmed by the skill developer) and Echo reads it out to you. This is a very simple explanation of what happens when you talk to Alexa.
So basically, if you want to enable speech invocation for your APIs, you can integrate with Alexa and build a skill to make them speech enabled 🙂 .
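The request/response flow described above can be sketched roughly as follows. This is only a minimal illustration, not how any particular skill is built: the handler name `handle_request` and the intent name `CookPastaIntent` are made up for this example, and a real skill would normally use Amazon's SDK rather than hand-built dictionaries. The dictionaries follow the general shape of the JSON that Alexa sends to and expects from a skill.

```python
def build_response(text: str, end_session: bool) -> dict:
    """Wrap plain text in the JSON shape Alexa reads out to the user."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event: dict) -> dict:
    """Hypothetical skill backend: map an incoming request to a spoken reply."""
    request = event["request"]

    # "Alexa, start Food Recipe App" arrives as a launch request.
    if request["type"] == "LaunchRequest":
        return build_response(
            "Welcome to the Food Recipe App, what would you like to cook?", False
        )

    # Anything the user says afterwards arrives as an intent request.
    if request["type"] == "IntentRequest":
        # "CookPastaIntent" is an assumed intent name, not a built-in one.
        if request["intent"]["name"] == "CookPastaIntent":
            return build_response(
                "That's a great choice! Tell me which vegetables you have.", False
            )

    # Fallback: the skill did not recognise the request, so end the session.
    return build_response("Sorry, I did not understand that.", True)


if __name__ == "__main__":
    launch = {"request": {"type": "LaunchRequest"}}
    print(handle_request(launch)["response"]["outputSpeech"]["text"])
```

Keeping `shouldEndSession` set to false is what lets the conversation continue after Alexa has spoken.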
– My experience with Alexa –
1. Alexa is stupid at times 🙂
- I have observed that despite being able to process complex sentences, it fails to understand some simple words, so you might need to change your skill’s conversation to work around this.
2. Alexa closes the session if the user doesn’t respond within a specific time.
- For security reasons Alexa cannot keep your session open forever, otherwise some people would create skills that keep the session open and listen to all your secret conversations 🙂
- As per my findings, Alexa waits 8 seconds for the user to respond. If the user doesn’t, it simply closes the session and the blue light goes off. To wait longer, you can add a re-prompt to your skill, in which case Alexa will speak the re-prompt message to the user and wait for another 8 seconds.
- However, you can manage the session on the application side. This gives you more control over the conversation and lets you build a better conversational flow, which is a great experience for the customer.
3. Utterances – The more utterances you provide for your skill the better.
- Utterances are the sentences you want your skill to understand for a specific task (aka intent).
- Try to keep the utterances of different intents clearly distinct from each other. Otherwise Alexa can mix up two intents and your conversation will switch to a different intent altogether.
4. Slots – Date-time slots are not completely supported in Alexa
- Slots are the important attributes you want to extract from the user’s speech. For example: the user says to Alexa “Set a reminder for me to take medicines”. As a developer, you want to extract “take medicines” from the user’s speech so that you can store it somewhere and set a reminder. You can do this using slots. The utterance will look something like “Set a reminder for me to {reminderSlot}” or “Set a reminder to {reminderSlot}”. You have to define your own slots if you don’t use the standard ones. Note that the two sentences above are two utterances for the same intent, so the user can say either of them to invoke your intent (task).
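To make the points above about re-prompts, application-side session state and slots a bit more concrete, here is a rough sketch. The slot name `reminderSlot` and the two sample utterances come from the example above; the intent name `SetReminderIntent`, the slot type `REMINDER_TEXT`, the `reminders` session key and the helper names are assumptions made up for this illustration. The dictionaries only follow the general shape of Alexa's interaction model and request/response JSON.

```python
# Assumed intent definition: two utterances that both map to one intent.
REMINDER_INTENT = {
    "name": "SetReminderIntent",  # hypothetical intent name
    "slots": [{"name": "reminderSlot", "type": "REMINDER_TEXT"}],  # assumed custom type
    "samples": [
        "Set a reminder for me to {reminderSlot}",
        "Set a reminder to {reminderSlot}",
    ],
}


def respond(text: str, reprompt: str, attributes: dict) -> dict:
    """Build a response that keeps the session open and carries a re-prompt."""
    return {
        "version": "1.0",
        # State the application wants to get back on the next request.
        "sessionAttributes": attributes,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            # Spoken if the user stays silent after the first prompt.
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": reprompt}},
            "shouldEndSession": False,
        },
    }


def handle_reminder(event: dict) -> dict:
    """Hypothetical handler: pull the slot value out and remember it."""
    slots = event["request"]["intent"].get("slots", {})
    reminder = slots.get("reminderSlot", {}).get("value")

    # Session attributes can be missing or null at the start of a session.
    attributes = dict(event.get("session", {}).get("attributes") or {})

    if reminder is None:
        # Alexa matched the intent but could not fill the slot: ask again.
        return respond(
            "What should I remind you about?",
            "You can say, set a reminder to take medicines.",
            attributes,
        )

    attributes["reminders"] = attributes.get("reminders", []) + [reminder]
    return respond(
        f"Okay, I will remind you to {reminder}. Anything else?",
        "Would you like to add another reminder?",
        attributes,
    )
```

Storing the reminders in the session attributes is just one option; a real skill would more likely write them to a database so they survive after the session closes.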
Overall, I liked the experience I had with Alexa and will hopefully try out some new skills on Alexa, and on Google Home as well. Alexa has an additional benefit: you can leverage AWS to do a lot more, easily and fast.
Alexa definitely has good potential to provide a new dimension to conversational tasks. As more and more developers build skills on Alexa, it keeps improving and will provide many more cool features in the future.
Note – You can build skills or try them out just for fun, even if you don’t have an Echo. Amazon provides simulators to test your apps on Alexa. So before you buy an Echo, try things out on the simulator.
Hope you like the blog.
Important links : alexa.amazon.com and developer.amazon.com
Thanks,
Kaivalya Apte
From a developer’s perspective, very nice details. Great, Kaivalya.