Microsoft's Cortana on a Nokia Lumia 635

Introduction

The famous expression "Can you hear me now?" was once reserved for the frustration of trying to talk with another person over a bad connection. Now it has been repurposed for our devices. Advances in technology and computing power have given birth to a variety of new applications for speech recognition and Natural Language Understanding (NLU), and projects big and small are changing the way that millions of users interact with their devices. Here's how.

What is Natural Language Understanding?

Speech recognition is the process of parsing speech to text without any context. It is now often more efficient than typing: 20% of all Google searches are now made by voice, and Google's speech recognition error rate dropped from 24% to 8% in 2015. NLU, meanwhile, is the combination of speech recognition and artificial intelligence to understand the context of a conversation. Although NLU technology is still not perfect, it has advanced enormously from how you may remember it. Let's take a look at how these two technologies are reshaping our day-to-day lives.
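To make that distinction concrete, here is a minimal sketch of the two steps in Python. It assumes the third-party SpeechRecognition package (and PyAudio for microphone access); the toy intent-matching at the end is a stand-in for what a real NLU service would do.

```python
# Assumes: pip install SpeechRecognition pyaudio
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)

# Step 1: speech recognition -- audio in, raw text out, no context.
text = recognizer.recognize_google(audio)  # e.g. "what's the weather in boston"

# Step 2: a toy stand-in for NLU -- mapping that text to a structured intent.
# Real NLU services return intents and entities with far more sophistication.
if "weather" in text.lower():
    intent = {"intent": "get_weather", "location": text.split()[-1]}
    print(intent)
```

The first step only transcribes; it is the second step, turning "what's the weather in boston" into a structured request, that NLU is concerned with.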

How Natural Language Understanding Is Used Today

Speech recognition has existed for some time. However, it's only in recent years that we have reached the levels of computing power and accumulated the mass of data needed to use it in a "human" way. The most common applications of this technology have been chatbots and digital assistants, where users engage with a "bot" in a familiar, conversational way. Both serve the same purpose: to communicate and react to queries like a person would, and either respond to or assist the user with their request.

Chatbots are services powered by a set of rules, with the occasional addition of some artificial intelligence. While they can be created as standalone projects, they are often integrated with existing chat platforms like Facebook Messenger, Slack, and even Skype. They have been successfully used for customer service, activism, and fighting parking tickets. Bots are typically limited to their platform and can only perform a simple set of tasks; however, they can be very powerful within those limitations (e.g., chatbots can be your friends).
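A rule-based bot can be surprisingly small. Here is a toy sketch of the approach in Python; the rules and responses are purely illustrative.

```python
import re

# Each rule is a pattern plus a canned response; first match wins.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help?"),
    (re.compile(r"\bhours?\b", re.I), "We're open 9am-5pm, Monday to Friday."),
    (re.compile(r"\brefund\b", re.I), "I can start a refund. What's your order number?"),
]

def reply(message: str) -> str:
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    # Many production bots hand off to a human at this point.
    return "Sorry, I didn't understand. A human will follow up shortly."

print(reply("What are your hours?"))  # -> "We're open 9am-5pm, Monday to Friday."
```

Everything beyond this, such as fuzzy matching and learned intents, is where the "occasional addition of some artificial intelligence" comes in.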

Digital assistants are the more powerful siblings of chatbots; however, they are often platforms in their own right (rather than relying on an existing one). The most well known of these is Apple's Siri, but Amazon Alexa, Google Assistant, and Microsoft's Cortana have joined the team, with Facebook's Jarvis and Samsung's Viv coming in 2017. These all support the same conversational interaction as the more primitive chatbots, and assistants are further enhanced by integration with other services and platforms. Most importantly, assistants can retain the context of the current conversation along with your personal information (this varies by product). For example, I may ask "What movies are playing near me this weekend?" and, after receiving my options, change my request with "hmm, what's showing by mom's place?". This simple line of questioning requires the assistant to understand the date range, location, and type of my request; determine who my mother is and where she lives; and then update the existing query with that information. While that may sound simple to us, it is a lot to ask of a computer program. With a traditional UI, it would have been a multi-step process, and "mom's place" would need to be entered as an address by the user.
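Stripped down to its core, that context retention is a matter of carrying slots forward between turns. Here is a simplified sketch of the movie example; the slot names are hypothetical, and a real assistant would of course resolve "mom's place" to an actual address.

```python
# The conversation's remembered state, shared across turns.
context = {}

def handle_turn(slots: dict) -> dict:
    """Merge this turn's slots over whatever the conversation remembered."""
    context.update({k: v for k, v in slots.items() if v is not None})
    return dict(context)

# Turn 1: "What movies are playing near me this weekend?"
print(handle_turn({"intent": "find_movies",
                   "date_range": "this weekend",
                   "location": "near me"}))
# {'intent': 'find_movies', 'date_range': 'this weekend', 'location': 'near me'}

# Turn 2: "hmm, what's showing by mom's place?"
# Only a new location is extracted; the intent and date range carry over.
print(handle_turn({"location": "mom's place"}))
# {'intent': 'find_movies', 'date_range': 'this weekend', 'location': "mom's place"}
```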

The ability to parse incredibly complex questions down to their base elements while retaining the context of the conversation is what makes digital assistants so powerful. Queries with more than a few steps are still better left to a human, and some bots intermix human responses with automated ones for more complex tasks. Samsung's Viv has begun to explore solutions to this complexity by automatically generating a sort of program within the assistant to answer a complex question or line of questioning.

Keep in mind, these are just two of the current applications for speech recognition and NLU. Others of note include spam filters, which have been using the technology since 1996, and Google's translation service, which now translates languages based on their meaning rather than word for word. Perhaps the most obvious are spelling and grammar checkers, which have also been improved by the technology (although not as quickly as some of us would like).
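The classic technique behind those early spam filters is a naive Bayes classifier over word counts. Here is a minimal sketch assuming scikit-learn is installed; the four-message training set is purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative training set: two spam messages, two legitimate ones.
messages = ["win a free prize now", "claim your free money",
            "meeting moved to 3pm", "lunch tomorrow?"]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into a vector of word counts.
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(messages)

# Fit a naive Bayes model and classify a new message.
classifier = MultinomialNB()
classifier.fit(features, labels)

print(classifier.predict(vectorizer.transform(["free prize inside"])))  # ['spam']
```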

Using Natural Language Understanding in Your Projects

Ready to get started on your very own project? Well, good news! All of the major speech recognition and assistant services offer ways of adding support for third-party applications. Depending on the platform, this ranges from adding your app to the list of solutions for a query to creating an entire suite of custom actions and conversations. Below is a rough breakdown of what is currently available, which platforms each runs on, and what support they offer to developers.

Amazon Alexa is an assistant that runs on the Echo family and a variety of other devices, as well as a web version for demo purposes. While there are companion apps for iOS and Android, they require one of the devices mentioned above. Alexa is unusual in that it has no UI (with the exception of the companion apps); interactions are done entirely with speech. With support for IoT devices and custom interactions through learned "Skills," Alexa is one of the most powerful assistants available.
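To give a feel for what building a Skill involves, here is a minimal sketch of a custom Skill backend as an AWS Lambda handler, working with the raw request/response JSON rather than an SDK. The "HelloIntent" name is hypothetical; the response shape follows the documented Alexa Skills Kit format.

```python
def lambda_handler(event, context):
    request = event["request"]

    # Alexa sends a LaunchRequest when the user opens the skill,
    # and an IntentRequest when it matches one of your intents.
    if request["type"] == "LaunchRequest":
        speech = "Welcome! Ask me to say hello."
    elif (request["type"] == "IntentRequest"
          and request["intent"]["name"] == "HelloIntent"):
        speech = "Hello from your first skill!"
    else:
        speech = "Sorry, I didn't catch that."

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```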

Apple’s Siri is probably the most well known of the assistants; however, it is also the most limited in terms of developer support. SiriKit was recently released to allow limited support for your app within Siri, and only for actions that fall into its seven pre-approved “domains.” Siri is exclusive to the Apple ecosystem and is available on iOS, watchOS, tvOS, and macOS Sierra or newer.

Google Assistant is the new conversational replacement for Google Now, and it is currently exclusive to Google Home, the Google Pixel, and limited interactions within Google Allo. It is unknown whether Google plans to bring the assistant to other devices. Google Now offers similar features, but only a limited feature set is available on iOS, and it lacks integration with Google Home and Chromecast. Google provides support and tools for creating assistant interactions with Actions on Google, and if you only want to leverage Google’s speech recognition, Voice Actions can be implemented in any Android app.
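At a high level, Actions on Google follows the webhook pattern: the platform parses the user's speech and POSTs the result to your server, which answers with text for the assistant to speak. The sketch below shows only that pattern; the field names are simplified placeholders, not the real schema, so consult the Actions on Google documentation for the exact request and response format. Flask is assumed to be installed.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json()
    user_query = body.get("query", "")  # placeholder field name

    # Your application logic goes here.
    answer = f"You said: {user_query}"

    return jsonify({"speech": answer})  # placeholder response shape

if __name__ == "__main__":
    app.run(port=8080)
```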

Microsoft’s Cortana is now used by over 100 million people a month, proving to be a huge success for Microsoft. In addition to all Windows 10 devices (including mobile), Cortana is available for iOS, Android, Xbox One, and the Microsoft Band. Sometime in early 2017, Microsoft plans to expand Cortana into the Internet of Things, and it has partnered with Harman Kardon to create a smart home device. The Cortana Dev Center covers the many ways that it can be integrated into and support third-party applications.

Facebook Messenger is a messaging platform with support for chatbots created and used within it; well over 11,000 have already been made. Facebook’s official assistant, “Facebook M,” is also available through Messenger and looks to be the precursor to “Jarvis,” the standalone assistant Facebook is releasing sometime in 2017. Developers can take advantage of Facebook’s NLU with the wit.ai Bot Engine.
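Wit.ai can also be called directly over HTTP to extract structured meaning from raw text. Here is a sketch assuming the requests package and a server access token from your wit.ai app (the token below is a placeholder).

```python
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder: from your wit.ai app settings

def parse(text: str) -> dict:
    """Send raw text to wit.ai and get back structured intents/entities."""
    response = requests.get(
        "https://api.wit.ai/message",
        params={"q": text},
        headers={"Authorization": f"Bearer {WIT_TOKEN}"},
    )
    response.raise_for_status()
    return response.json()

print(parse("What movies are playing near me this weekend?"))
```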

Twitter and Skype also offer support for chatbots within their platforms; however, they are the most limited in this list. Twitter currently supports the creation of automated experiences in Direct Messages, and Skype has full support for chatbots with the Microsoft Bot Framework. The real value of these two platforms comes from their support for the vast majority of devices, and both are currently expanding their support.

Don’t feel that you are limited to leveraging an existing service, although doing so may be a good idea depending on your use case and need for easy access. If you do see value in going it alone, there are many APIs that can be leveraged to create your own chatbot, assistant, or something entirely new, many of which are used or supported by the assistants and chatbots mentioned above.

What Does the Future Sound Like?

If there is any concern that digital assistants and other services powered by NLU will not succeed, look to the technologies on the horizon before dismissing them. The emergence of virtual reality brings a need for new and unique input methods, with speech at the top of that list as the most intuitive. Our homes are continuing to get smarter, and many of these devices do not need, or have room for, a visual interface, making them perfect for speech recognition. Self-driving cars are just around the corner, and other traditionally human jobs are being automated alongside them. It will only feel natural to speak with these human replacements, rather than navigate a complex UI to build a set of instructions for them. Even medical and emergency services could be drastically improved by an “always listening/watching” presence in your home, alerting 911 if you fall, choke, or worse.

Conclusion

It’s time to get excited, because the future is here. Alexa is already being used to solve crimes, and invisible relationships are on the rise. The spoken word is the most natural way for humans to communicate, with the written word being a relatively modern creation. This technology will liberate us from being tethered to our devices, perhaps at the cost of them listening to us at all times. But one thing is for sure: the opportunity to innovate is there for the taking!

Image from Flickr user n.bhupinder.