Designing the User Experience of Voice

04 April 2021

Over the last year, voice-based interfaces have become increasingly popular. Virtual assistants like Siri and Cortana have been around for a while, and recently Amazon and Google have made their own contributions to the industry’s ever-expanding technologies. It’s estimated that over 60 million Americans use a voice-enabled virtual assistant at least once every month, and more than 35 million Americans use a voice-enabled virtual assistant home hub like Amazon Echo on a monthly basis.

Clearly, audiences are highly receptive to voice-activated devices and are becoming increasingly reliant on them. Voice-based interfaces may have been little more than an entertaining but largely useless novelty a few years ago, but today they are time-saving, multitasking tools that allow users to set timers, order takeout, make shopping lists, adjust the thermostat, and more, all through natural voice commands. Keep reading to learn how to ensure that your voice-activated devices or apps provide a great user experience.

What Are Voice Skills?

Intelligent voice-enabled personal assistants like Alexa and Cortana provide users with an intuitive, personalized experience through built-in capabilities, or skills. Developers can expand on these capabilities by building and publishing their own voice experiences for their own devices and apps. These skills are then available to users on devices like Amazon Echo and Google Home.

What Does a Good User Experience Sound Like?

Creating a new voice skill requires more than technical know-how; it also requires a good conversational UI. Consider this: a website needs to provide a user-friendly experience. After all, if a visitor has to work too hard to find the information they’re looking for on a website, they’ll simply look elsewhere. Likewise, a voice skill should be designed around providing a frictionless, positive experience.

A good user experience follows a pattern like this:

  1. The user says the activation word or phrase
  2. The app or device immediately detects the activation word or phrase
  3. A command or question is stated
  4. Prompt, relevant feedback is given
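
The four steps above can be sketched in a few lines of Python. This is only an illustration that works on text transcripts rather than audio; the activation word, the command phrases, and the names `WAKE_WORD`, `HANDLERS`, and `handle_utterance` are all assumptions made for the example, not part of any real assistant's API.

```python
from typing import Optional

WAKE_WORD = "alexa"  # assumption: any activation word would work here

# Hypothetical command handlers; a real skill would call its own backend.
HANDLERS = {
    "set a timer": lambda: "Timer set.",
    "add milk to my list": lambda: "Added milk to your shopping list.",
}


def handle_utterance(transcript: str) -> Optional[str]:
    """Return spoken feedback, or None if the activation word is absent."""
    words = transcript.strip().lower()
    # Steps 1-2: the user says the activation word and the device detects it.
    if not words.startswith(WAKE_WORD):
        return None
    # Step 3: everything after the activation word is the command or question.
    command = words[len(WAKE_WORD):].strip(" ,.?!")
    # Step 4: give prompt, relevant feedback, even when the command is unknown.
    handler = HANDLERS.get(command)
    if handler is None:
        return "Sorry, I didn't catch that. Could you rephrase?"
    return handler()
```

Note that the unknown-command branch still answers the user, so the interaction never ends in silence.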

While voice interaction is quite different from graphical user interfaces, the quality of the user experience still hinges on usability, flexibility, and efficiency. A good user experience should be seamless and intuitive, providing the desired outcome in as few steps as possible.

Creating a Positive User Experience

When creating a voice skill, consider how you can make accommodations for users’ patterns of speech and regional accents, common speech impediments, and varied word usages. After a command is given, how does the skill confirm the command? If the skill goes down the wrong path, is it easy for the user to course-correct?
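
One possible way to tolerate varied wording and still let the user course-correct is to match the recognized text loosely against the commands the skill knows, and to ask for confirmation whenever the match is not exact. The sketch below uses `difflib.get_close_matches` from Python's standard library; the command list, the 0.6 similarity cutoff, and the function name `interpret` are assumptions chosen for illustration.

```python
import difflib
from typing import Optional, Tuple

# Hypothetical commands this skill understands.
KNOWN_COMMANDS = ["set a timer", "order takeout", "adjust the thermostat"]


def interpret(command: str) -> Tuple[Optional[str], str]:
    """Return (command to run or None, spoken reply)."""
    normalized = command.strip().lower()
    # Exact match: confirm the command back to the user and run it.
    if normalized in KNOWN_COMMANDS:
        return normalized, f"OK, I'll {normalized}."
    # Near match: do not act yet; ask so the user can correct a wrong guess.
    matches = difflib.get_close_matches(normalized, KNOWN_COMMANDS, n=1, cutoff=0.6)
    if matches:
        return None, f"Did you mean '{matches[0]}'?"
    # No match: say what the skill can do instead of failing silently.
    options = ", ".join(KNOWN_COMMANDS)
    return None, f"Sorry, I didn't understand. You can say: {options}."
```

Asking before acting on a near match costs one extra turn, but it keeps the skill from heading down the wrong path on a misheard command.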

As with website design, one of the most important elements in creating a positive user experience is error prevention. This is the idea that it’s not just important to help users recover from an error, but that the system should be designed to prevent errors from occurring in the first place.

Common Pitfalls to Avoid When Creating Voice Skills

Here are three of the most common mistakes that voice application developers and publishers make, each of which undermines the quality and success of their apps.

Failing to consider whether a voice interface is the best medium for the task. Is the task truly simple enough for a voice command, or would the user have difficulty understanding the feedback or achieving their goal without visual aids?

Failing to properly deal with interruptions. If a user is interacting with a skill that has a longer response time, they may become impatient and say the activation word or phrase again. For example, say that a user asks, “Alexa, what time does Flight 8922 land?” If they don’t receive a prompt response, they may blurt out “Alexa” again, which terminates the previous process and leads to greater frustration. To avoid this, consider building in functionality that lets the app ask whether the user still wants to complete the previous process.

Making responses too wordy. Unlike text, a spoken response can’t be skimmed; it has to be listened to in its entirety. If the response is too verbose, the user may abort the process or simply fail to glean the important information. Make sure that responses are as concise as possible.
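
The interruption pitfall lends itself to a small sketch. The class below is a hypothetical illustration, not how any particular assistant works: it remembers the request that is still in progress, and when the activation word is heard again it asks whether the user still wants that request instead of discarding it. The class name `Session`, its methods, and the accepted "yes" phrases are all assumptions.

```python
from typing import Optional


class Session:
    """Tracks one in-flight request so a repeated activation word need not discard it."""

    def __init__(self) -> None:
        self.pending: Optional[str] = None  # request still being processed
        self.awaiting_confirmation = False

    def start(self, request: str) -> None:
        """Record a request whose answer has not been delivered yet."""
        self.pending = request
        self.awaiting_confirmation = False

    def on_activation_word(self) -> str:
        """Handle the activation word, which may arrive mid-request."""
        if self.pending is not None:
            # Assumption: ask rather than silently terminating the request.
            self.awaiting_confirmation = True
            return f"I'm still working on '{self.pending}'. Do you still want that?"
        return "Yes?"

    def on_answer(self, answer: str) -> str:
        """Handle the user's reply to the 'do you still want that?' question."""
        if not self.awaiting_confirmation:
            return "Sorry, I wasn't expecting an answer."
        self.awaiting_confirmation = False
        if answer.strip().lower() in {"yes", "yeah", "sure"}:
            return f"OK, continuing with '{self.pending}'."
        self.pending = None
        return "OK, cancelled. What would you like instead?"
```

Each reply here is a single short sentence, which also keeps the skill clear of the wordiness pitfall.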

Adding Value for Users

As the landscape of skills broadens, voice will become our primary way of interacting with machines. The most successful voice-controlled apps add value to users’ lives by providing prompt, concise feedback and by avoiding common pain points like misunderstood commands and irrelevant responses. By ensuring that your voice-activated devices and apps follow human interaction design principles, you can create effective conversational UI experiences.