Why Voice User Interface is Hard
- Posted by John Ackerman
- On April 3, 2018
- Smart Speaker, User Interface, Voice
Voice, they tell us, is the next big thing in user interface (UI) design. Record sales of voice assistants, including devices like Amazon’s Echo, Google Home, and Apple’s HomePod, indicate a clear trend toward more voice engagement. Even so, voice has not taken off in the way that was predicted two or three years ago. This post takes a look at some possible reasons why, and makes some suggestions for improvement.
Currently, most voice user interfaces are limited in scope to simple query/response patterns. An example might be asking Google Home to “play me a song”. In this post, we look at the voice user interface from the standpoint of a deeper conversation meant to achieve one or more goals.
Problem 1 – Where am I?
When using a web interface, the question “where am I?” is easy to answer. The user can always see what they have done, as well as what is currently expected of them. Progress bars make it quick and easy to know where you are in a process and where you are headed. In a pure voice user interface, you are entirely reliant on the user to remember what they’ve done and to know where they are, which tends to make users extremely uncomfortable.
Users can be made more comfortable with your voice system by giving them a roadmap and continuous updates. The user should be informed early on about what to expect from the conversation they are about to have. Along the way, the user should be told how far they are into the process and what is coming up next. Giving the user keywords at transition points, which let them return to specific parts of the conversation, can also help resolve these kinds of questions.
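The roadmap-and-updates pattern above can be sketched in code. The step names, prompts, and phrasing below are all illustrative assumptions, not part of any particular voice platform’s API:

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str    # keyword the user can say to jump back to this step
    prompt: str  # question asked at this step

# Hypothetical three-step conversation
STEPS = [
    Step("account", "First, what is your account number?"),
    Step("address", "Next, what is your street address?"),
    Step("confirm", "Finally, shall I submit these changes?"),
]

def roadmap_intro(steps):
    """Tell the user up front what the conversation will cover."""
    names = ", ".join(s.name for s in steps)
    return (f"We'll go through {len(steps)} steps: {names}. "
            "Say a step name at any time to return to it.")

def progress_update(steps, index):
    """Announce where the user is now and what comes next."""
    update = f"Step {index + 1} of {len(steps)}. {steps[index].prompt}"
    if index + 1 < len(steps):
        update += f" After this, we'll move on to {steps[index + 1].name}."
    return update

def jump_to(steps, utterance):
    """If the utterance contains a step keyword, return that step's index."""
    for i, step in enumerate(steps):
        if step.name in utterance.lower():
            return i
    return None
```

Announcing the step count and offering keyword navigation answers “where am I?” the same way a progress bar does on a web page.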
Problem 2 – Did you understand me?
When typing into a form, or interacting with a page using a mouse, a user gets immediate feedback that their intent has been understood. With voice user interfaces, users are much less sure of this. You can tell that a user is uncomfortable because their speech patterns change: they start to speak in a slow, deliberate, staccato voice to help the machine understand them, or begin to shout, or at least speak more loudly, so that the computer can hear them better.
The voice system should always give feedback, especially at critical junctures, that it understood what the user was trying to say. This can take the form of questions like “did you say __________?” or “I heard you say ____________, is that right?”. This gives the user confidence that the system understands them. Further, your voice system should be resilient enough that, even if it cannot immediately understand the user, they still have a path forward. This can include directing them to a different interface or, in the worst cases, transferring the discussion to a live human operator.
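One way to sketch this confirm-or-escalate logic is as a per-turn decision function. The confidence thresholds and the two-strike escalation rule below are illustrative assumptions; a real system would tune them against its own recognizer:

```python
def confirm_prompt(heard: str) -> str:
    """Echo back what the recognizer heard and ask for confirmation."""
    return f"I heard you say {heard}. Is that right?"

def handle_turn(recognized: str, confidence: float, failures: int):
    """Decide the next action for one conversational turn.

    The thresholds (0.85, 0.5) and the two-failure escalation
    are hypothetical values, not platform defaults.
    """
    if failures >= 2:
        # Worst case: hand off to a different interface or a human
        return ("escalate", "Let me connect you with a person who can help.")
    if confidence >= 0.85:
        return ("accept", f"Got it: {recognized}.")
    if confidence >= 0.5:
        return ("confirm", confirm_prompt(recognized))
    return ("reprompt", "Sorry, I didn't catch that. Could you repeat it?")
```

The key design point is that every branch returns a spoken response, so the user always hears whether they were understood and always has a path forward.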
Problem 3 – What do you want me to say?
On a web form, control options are generally well understood. Text boxes and text areas are used for general responses. Drop-down lists, checkboxes, and radio buttons are used when specific responses are expected from the user. In a voice system, however, there is only a question and an answer. If, for example, we want to obtain a user’s title (Mr, Mrs, Doctor, etc.), on a web form we just present the options in a drop-down list. Using voice, however, the user may not know what specifically to say or how your system expects them to say it. This can result in unexpected entries and odd behavior from your system.
A tiered approach is the best way to resolve these kinds of questions. The voice system can begin with an open-ended query. If the response is understood and matches the system’s expectations, the system continues as usual. If the response is not understood, the voice system should then present the user with a list of spoken options, which the user should be able to interrupt at any point. Artificial intelligence and machine learning should be used wherever possible to develop intents, so that users do not have to respond to an inquiry verbatim to get their desired result (“yes”, “yeah”, “yup”, for example). Finally, if the inquiry is still not understood, you can revert to numeric entry to get a match and keep going.
Bonus – Understanding how your user is doing.
One additional step you can take in resolving all of the problems listed above, is developing an understanding of how your user is feeling. Machine Learning and AI advancements have created a number of services that can analyze user sentiment based on the tone of their voice and the words they are using. Intelligent voice systems can detect when a user is becoming frustrated with a conversation, and route that to either a more simplified conversation, or to a human interaction instead.
Voice truly is the next user interface, but there are many steps that should be taken in order to make the voice experience the best it can be for your users. By helping them know where they are, reassuring them that you understand them, helping them to know how to respond and being sensitive to their experience through the conversation we can create a truly delightful user experience.
Looking for help developing a voice user interface for your customers? Need help from an expert? Contact us today and we’ll be more than happy to discuss your problems and how best to resolve them!