One of the apparent issues in understanding natural language is that the structure and syntax of spoken language, which we all grasp intuitively, often need to be broken down into many sub-components before machines can “understand” them. As a result, the evolution of machine intelligence has been slower than many hoped, because of the need to figure out the incremental steps required to truly make sense of a given request. Even today, some of the most sophisticated natural language AI models run into walls on simple reasoning tasks that demand the kind of independent thinking a young child can manage. On top of this, when it comes to smart home-focused devices, where voice assistant-powered machines continue to make their mark, a frustrating wealth of incompatible standards has made it genuinely challenging to get devices to work together.
The Alexa Voice Service (AVS) SDK 3.0 debuted, combining core Alexa voice capabilities with the previously separate Alexa Smart Screen SDK for generating smart screen-based responses. Using it, companies could potentially do things like pair a voice-based interface with visual confirmations on screen, or create multi-modal interfaces that leverage both at the same time.

Unfortunately, browsing through multi-level on-screen menus, pushing numerous combinations of buttons, and trying to figure out the mindset of the engineers who designed the user interfaces is still the reality of many gadgets today. I, for one, look forward to being able to plug in a new device, tell it to connect to my other devices, have it speak to me through a connected speaker to confirm that it did so (or, if it didn’t, what needs to be done to fix that), answer questions about what it can and can’t do and how I can control it, and, finally, keep me verbally up to date about any problems that arise or new capabilities it acquires. As these new tools and capabilities start to get deployed, the potential for significantly easier, voice-based control of a multitude of digital devices is getting tantalizingly closer.

Bob O’Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech.