Introduction: Past and Present of Speech Tech

Perhaps the most common misconception regarding speech recognition technology concerns its age. The introduction of Siri in 2011, quickly followed by her competitors Cortana, Alexa and the less glamorously named Google Assistant, catapulted speech technology into the mainstream consciousness, leading many to assume it is new when it is anything but nascent.

For that, you can thank the steadily increasing power of modern GPUs, which has given tech giants such as Amazon, Apple, Microsoft, Samsung and Google the foundation to build their own virtual personal assistants.

While the tech giants’ investment in voice interaction and command-based speech recognition has made certain aspects of our day-to-day lives, from navigation to organising our calendars, more convenient, the true disruptor of the consumer market will be conversational speech technology that adapts to its users and works across different devices.

Google’s recent, impressive demonstration of its AI’s ability to accurately mimic and adapt to human conversation, umming and ahhing included, offers a glimpse into our speech-driven future.

Whether it’s having an AI make calls on your behalf or dictating an email in the car and finishing it at home whilst cooking, speech technology is primed to disrupt every industry, saving businesses and everyday consumers time and money while adding a layer of convenience to our daily lives. Much like the introduction of the smartphone or the World Wide Web, this will mark a point of no return in how we live our lives.

While speech tech will impact a multitude of industries, there are two that will perhaps see the most immediate and significant change in the coming years: retail and health.


While Google continues its work behind the scenes, speech recognition technology has already become ubiquitous: 65% of US smartphone owners used voice assistants in 2017, up 20% on the previous year, and it’s predicted that by 2020, 50% of all searches will be done by voice or through image search.

This shift in how we search is arguably the more important trend, as it holds the greatest potential to disrupt multiple industries, especially retail.

With a third of computing due to be screenless by 2020, visibility, shelf placement, design, packaging and branding will all need to be swiftly reexamined.

With the exception of the minority of shoppers who already have brand loyalty, online product searches will be driven by product type rather than by brand. This will prompt businesses to completely rethink their marketing and advertising strategies, all while monitoring and anticipating the most popular voice searches to ensure that their products appear at the top of the results.

SEO professionals will need to evolve quickly as the influence and prevalence of voice technology in search continue to grow. We could be a decade away from Google’s search box becoming subservient to voice and in-app searches.

The extreme granularity of voice searches will make it significantly harder for SEO marketers to ensure that their businesses remain relevant and attractive to search engines.

For the customer, it means that selecting multiple filters to find a white T-shirt in a medium size, priced between $20 and $30, with next-day delivery, is replaced by a single voice command.
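Behind that single command, the assistant has to turn free-form speech into the same structured filters a shopper would otherwise click through, a step usually called slot filling. Real assistants use trained language-understanding models; the toy, rule-based Python below is only a sketch of the input and output of that step. It assumes the speech has already been transcribed, and the `parse_shopping_command` function and its small hard-coded vocabularies are hypothetical.

```python
import re
from typing import Dict, Union

# Hypothetical, hard-coded vocabularies; a real system would learn these from data.
COLOURS = {"white", "black", "blue", "red", "grey"}
SIZES = {"small", "medium", "large"}
CATEGORIES = {"t-shirt", "jeans", "jacket"}

Filter = Union[str, int, bool]


def parse_shopping_command(command: str) -> Dict[str, Filter]:
    """Map a transcribed voice command onto structured search filters (toy slot filler)."""
    text = command.lower()
    filters: Dict[str, Filter] = {}

    # Single-word slots: colour, size and product category.
    for word in re.findall(r"[a-z-]+", text):
        if word in COLOURS:
            filters["colour"] = word
        elif word in SIZES:
            filters["size"] = word
        elif word in CATEGORIES:
            filters["category"] = word

    # Price range phrased as "$20 to $30", "$20 and $30" or "$20-$30".
    price = re.search(r"\$(\d+)\s*(?:to|and|-)\s*\$(\d+)", text)
    if price:
        filters["min_price"] = int(price.group(1))
        filters["max_price"] = int(price.group(2))

    # Delivery option, accepting "next-day delivery" or "next day delivery".
    filters["next_day_delivery"] = bool(re.search(r"next[- ]day delivery", text))
    return filters
```

Given "Find me a white T-shirt in medium between $20 and $30 with next-day delivery", this sketch returns a colour of white, a size of medium, a category of t-shirt, a $20 to $30 price range and next-day delivery set to True: the same filters, from one sentence.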

Speaking of the customer, the online shopper base is set to grow even larger. While smartphones, tablets and online shopping have become ubiquitous, there is an entire generation that technology forgot: only 42% of Americans over 65 own a smartphone, and just 32% own a tablet.

While the socioeconomic impact of these devices is unquestionable, they require a considerable amount of learning to use.

However, we don’t need to be trained to speak, hold a conversation or ask for what we want; voice interfaces demand nothing beyond what we already know. This means that anyone, irrespective of their age or understanding of modern technology, can use voice technology to its fullest capacity.

With the technology continuing to develop at this rate, we’ll soon see a future without brands, something that was almost unimaginable even as recently as ten years ago.

While that reality is still some years away, forward-thinking companies would be wise to start planning how they will implement voice tech for shopping, which will likely involve relying on third parties to provide the technology.


Arguably the most important industry to benefit from advances in speech technology is health. Through what’s known as ‘sentiment analysis’, we could potentially measure the emotional state of individuals by analysing the volume and tone of their voice and, from that, deduce whether a person needs support with mental health issues or is in danger.
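The ‘volume and tone’ of a voice correspond to measurable acoustic features. As a minimal sketch (our own illustration, not any particular product’s method), the Python below computes two of them for a short frame of audio: loudness as a root-mean-square level in dBFS, and a crude pitch estimate via autocorrelation. It assumes mono samples already scaled to [-1, 1], and the 75 to 400 Hz search range is an assumed speaking range. It does not perform sentiment analysis itself; a real system would feed many such features into a trained classifier, which is not shown.

```python
import math
from typing import List


def rms_dbfs(frame: List[float]) -> float:
    """Loudness of one frame as root-mean-square level in dB relative to full scale.
    Assumes samples are already scaled to the range [-1, 1]."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def estimate_pitch_hz(frame: List[float], sample_rate: int,
                      fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Crude pitch (fundamental frequency) estimate: the lag, within an assumed
    fmin..fmax Hz range, at which the frame best matches a shifted copy of itself."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag.
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # 0.0 signals that no periodic signal was found (e.g. silence).
    return sample_rate / best_lag if best_lag else 0.0
```

For 40 ms of a 200 Hz test tone with amplitude 0.5, sampled at 16 kHz, `estimate_pitch_hz` returns 200.0 and `rms_dbfs` returns about -9 dBFS. Tracking how such values shift over a conversation (a raised voice, a flattened pitch) is the kind of raw signal sentiment models build on.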

Another future application of speech technology is tracking the progression of Alzheimer’s in real time by measuring vocabulary complexity, allowing physicians to adjust medication dosages accordingly.
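‘Vocabulary complexity’ can be quantified in many ways. One of the simplest is the type-token ratio (TTR): unique words divided by total words. Because TTR falls as a transcript gets longer, a moving-average variant (MATTR) averages it over fixed-size windows instead. The sketch below assumes the speech has already been transcribed to text, and its 50-word default window is an arbitrary choice; it is an illustration of the measure only, not WinterLight Labs’ method and not a diagnostic tool.

```python
import re
from typing import List


def tokenize(transcript: str) -> List[str]:
    """Lowercase word tokens; apostrophes are kept so "don't" stays one word."""
    return re.findall(r"[a-z']+", transcript.lower())


def type_token_ratio(tokens: List[str]) -> float:
    """Unique words divided by total words (0.0 for an empty transcript)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def moving_average_ttr(tokens: List[str], window: int = 50) -> float:
    """Mean TTR over every run of `window` consecutive words, which makes scores
    from transcripts of different lengths easier to compare."""
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [type_token_ratio(tokens[i:i + window])
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)
```

For example, "The cat sat on the mat" has six words but only five distinct ones, giving a TTR of 5/6, or about 0.83. A sustained fall in such scores across repeated recordings is the kind of trend the paragraph above describes.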

WinterLight Labs, for example, are working on ground-breaking technology that is able to detect whether you’re suffering from Alzheimer’s in as little as 45 seconds, with an accuracy that’s currently over 80%.

Speech tech will greatly help medical professionals detect the signs of invisible illnesses, such as brain trauma and PTSD, earlier than ever before.


For example, individuals with impaired cognitive function may elongate certain sounds and struggle with pronunciation that involves specific muscular movements. Medical professionals have already begun small-scale research into speech tech’s benefits in this field: Charles Marmar, a psychiatrist of over 40 years, collaborated with the nonprofit research institute SRI, and their research showed that they could distinguish between patients with PTSD and healthy individuals with an accuracy of 77%.

Stumbling Blocks

While the aforementioned efforts of Google et al. have provided an exciting glimpse into the future, speech technology still has hurdles to overcome.

Maintaining accuracy across various languages and dialects, and ensuring ease of integration and operation, are immediate challenges, but convincing users that speech technology is accurate and reliable remains its biggest challenge.

A recent report by PwC found that 8% of 18- to 24-year-olds said that they used a voice assistant only a few times per year. Naturally, older demographics fare slightly worse, with only 6% of those aged 25 to 49 and 3% of those over 50 using a voice assistant a few times a year, while 23% of the youngest demographic said that they use their voice assistant less often than when they first started.

That said, we are, without question, hurtling toward a speech revolution.
Pioneers in the field of speech tech can achieve a word error rate of under 10%, and the technology is on the cusp of disrupting a multitude of industries and opening doors we didn’t previously think possible.
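Word error rate (WER) is the standard accuracy measure for speech recognition: the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn the recogniser’s output into a reference transcript, divided by the number of words in the reference (N), so WER = (S + D + I) / N. Below is a minimal Python sketch using word-level edit distance; it assumes both transcripts are already normalised (same casing, no punctuation) and split on whitespace, whereas real scoring tools also handle text normalisation.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, one row at a time: prev[j] is the edit
    # distance between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                  # deletion: reference word missed
                curr[j - 1] + 1,              # insertion: extra word in output
                prev[j - 1] + substitution,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Dropping one word from the six-word reference "the cat sat on the mat" gives a WER of 1/6, about 17%; a system with a WER under 10% gets fewer than one word in ten wrong on average. Because insertions count too, WER can exceed 100% on badly garbled output.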

