Nivedit Majumdar

Voice + AI = The new frontier in user interaction?

For a long time, voice has been the go-to medium for getting digital assistants to work for us. From the basic Google voice-powered searches that caused quite a stir some years back, to devices that are always listening for a keyword before acting on your commands – voice-enabled user interaction has definitely come a long way. Which raises an important question: given that artificial intelligence and context awareness are evolving at a tremendous pace, what will be the next big things in voice-enabled user interaction? Add to that the heavy investments that Google and Apple are making in this area, and we’ve got ourselves a booming space for connected devices.


A couple of weeks back, I went for a Sunday lunch with a few friends of mine, one of whom had brought his four-year-old daughter along. While the grown-ups were doing their own thing, she was getting bored and wanted to watch some cartoons.

What did she do? She asked her dad for his phone (which he unlocked for her), said “OK Google” and asked the phone to download a media streaming application. Within a few seconds, the Play Store page for the app opened up automatically, and she had already begun downloading it.

Obviously, she needed her father to finish setting up the application later. But the point I’m trying to make is this: a four-year-old child can now perform a task that would otherwise have required going to the Play Store, manually typing in the app’s name and downloading it – all because the child knows the verbal keywords to get the job done.

And that’s really the power of voice-enabled interactions. They’re designed to be smooth, simple, seamless and effective. It therefore shouldn’t come as a surprise that Amazon Echo sales are growing steadily, that more people are adopting voice integrations on platforms such as IFTTT, and that the total number of voice-enabled speaker users is estimated to grow by about 130% this year compared with 2016.

Needless to say, the numbers paint a very good picture of where things are headed in the voice-enabled assistant space.


Back in June, Adobe released some very informative data based on e-commerce sales, which suggests that the market for voice-enabled devices is certainly growing. For instance, online sales of voice devices grew by 39% year-over-year, with a peak seen during the 2016 holiday season.

Another interesting statistic was the number of active connections to voice assistants on IFTTT: the figure stood at 778,000 as of June, after growing 6.4 times during 2016 and 4.3 times in 2017 through June.

eMarketer states that while we are still far away from widespread adoption, the growth is impressive to say the least, with 35.6 million Americans expected to use a voice-activated assistant device at least once a month this year – a 128.9% increase over 2016. The lion’s share will go to Amazon Echo devices (accounting for 70.6%), followed by Google Home and other players such as Lenovo, LG, Harman Kardon and Mattel.

The growth will mostly be driven by millennials, who will account for about 30% of the user base in 2017. That share is estimated to grow to about 40% by 2019.

Finally, Gartner predicts that end-user spending on assistant-enabled speaker devices will reach $2.1 billion by 2020, up from $360 million in 2015. Major factors behind this include the maturing of speech-to-text conversion engines, smaller form factors, falling prices and potential subsidization models.

All in all, the consumer market for voice-enabled devices is certainly up and coming, and consumer interest in voice assistants and applications might just be the impetus the IoT needs to become more interesting.


While we’re on the subject of voice-enabled devices, can we really ignore Apple and Google?

Apple’s AirPods might be seen as an expensive, unnecessary accessory, but the truth is that they offer the foundation for enhanced voice-enabled interactions. Siri is now two taps away, the music pauses when you take the pods out of your ears, and there is seamless integration with Apple devices.

I’ll quote David Pierce here, who states in his review of the AirPods for Wired:

“[The AirPods] have the potential to be the kind of project that goes from accessory or hobby to critical piece of Apple’s future. The AirPods, above all else, are Siri machines.”

Even more impressive, in my opinion, is the rival to Apple’s AirPods: Google’s Pixel Buds. The obvious winning feature in this case is real-time translation, delivered directly into the user’s ears.

Think of the interpreters at the United Nations, but in a handier package. According to various reviews, Google Translate now sounds more human, which lets people communicate across languages without having to stare at a phone screen. And isn’t that the final purpose of user interaction: to make communication as human as possible?

Finally, worthy of mention is the Google Home, which, although it raised a few privacy red flags right at the time of its initial sales, seems to be a promising device to counter the likes of Amazon’s Echo lineup.


I think that voice-based interactions are only going to get better in the near future. Devices are already becoming cheaper, and the technology is improving like never before – a good example of which is the real-time translation in Google’s Pixel Buds.

With the inclusion of context-based inputs and more unobtrusive form factors (a number of jokes were cracked about the AirPods’ design, but people did come back for the useful features), we are looking at technology for better wearables and more powerful accessories – including Echo devices in cars.