To understand your customers, focus on tone, not just words


Your contact center agent just informed a customer that their order will be delivered in 10 to 12 business days. “Oh, that’s great,” responds the customer. “Thanks for the help.”

Reading the text alone, this would seem to most of us a snapshot of a pleasant encounter, one in which the customer had an overall positive experience. We may even allow our interpretation of this encounter (and many like it) to justify critical business decisions, such as signing a new contract with the same logistics and shipping company that provided that 10-to-12-day delivery time. 

Now read the response again, this time focusing on different parts of the customer’s response. Imagine that “great” is long and drawn out, infused with sarcasm. Or that “thanks for the help” is stated quickly and curtly. Running through this exercise, we start to question whether the experience was, in fact, a positive one or whether our expected delivery time truly meets the customer’s expectations. We begin to realize that knowing what the customer said is only helpful if we also know how they said it.

Recognizing emotion in speech has posed major challenges for years. And in today’s customer experience context, getting it right is more consequential than ever: one in three consumers now say that they would likely abandon their favorite brand after a negative customer service interaction.



This has companies responding with a renewed sense of urgency, creating complex maps of customer touchpoints and rethinking their brands to maximize the customer experience at each opportunity.

In fact, a Deloitte study found that at Fortune 500 companies, customer service has become the fourth most popular core company value. Similarly, McKinsey’s 2022 State of Customer Care Survey found that customer care has become a strategic focus for most companies.


As companies increasingly audit their customer experience, one of the key touchpoints they keep coming back to is that call to customer service. Most often handled by agents in a contact center, these calls represent one of the most common — and arguably the most make-or-break — interactions between companies and their customers. In fact, a recent Qualtrics report found that 60% of consumers prefer a phone call for difficult queries, often bypassing other channels to reach a real person.

With 58% of customer care leaders anticipating that call volumes will increase over the next 18 months, they must begin to think about voice interactions beyond what customers are saying. They must begin to think about how they are saying it.

Measuring the customer experience

In the contact center industry, there are dozens, if not hundreds, of tools and KPIs to measure the customer experience.

While some metrics, such as average handle time (AHT) or first-call resolution (FCR), are based on quantitative data tracked by digital systems, others, such as net promoter scores (NPS) and customer satisfaction scores (CSAT), are most often the result of qualitative, post-hoc surveys.
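To make the distinction concrete, here is a minimal sketch of how these four KPIs are commonly computed. The `Call` record, the function names and the CSAT threshold are our own illustrative assumptions, not the API of any particular contact center platform; the NPS calculation follows the standard definition (share of 9-10 ratings minus share of 0-6 ratings).

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical call record, as a telephony system might log it."""
    handle_seconds: float          # assumed to cover talk, hold and wrap-up time
    resolved_first_contact: bool   # True if no follow-up call was needed


def average_handle_time(calls):
    """AHT: mean handling time per call, in seconds."""
    return sum(c.handle_seconds for c in calls) / len(calls)


def first_call_resolution(calls):
    """FCR: fraction of calls resolved on the first contact."""
    return sum(c.resolved_first_contact for c in calls) / len(calls)


def net_promoter_score(ratings):
    """NPS from 0-10 survey ratings: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100.0 * (promoters - detractors) / len(ratings)


def csat(ratings, satisfied_from=4):
    """CSAT from 1-5 survey ratings: % of ratings at or above a threshold.

    The threshold of 4 is a common convention, assumed here for illustration.
    """
    return 100.0 * sum(r >= satisfied_from for r in ratings) / len(ratings)


calls = [Call(300, True), Call(500, False)]
aht = average_handle_time(calls)
fcr = first_call_resolution(calls)
nps = net_promoter_score([10, 9, 8, 6])
csat_score = csat([5, 4, 3, 2])
```

Note that the first two functions run on every logged call, while the last two only ever see the ratings of customers who chose to answer a survey.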

Historically, these surveys have been used as the basis for making larger strategic decisions about how the organization interacts with customers, even influencing marketing, sales and broader business development initiatives.

But like any survey, these metrics are often inherently flawed, as they only tell us what the customer actively decides to tell us about their experience. Similarly, they are biased by how we decide to word questions, by the order in which we ask those questions, or by whether we offer customers an incentive to take the survey. Operationally, they are also often plagued by low response rates.

Combined, all of this can give a distorted picture of what the customer thinks about the business.

The benefits and pitfalls of text analytics

One way contact centers have responded to the weaknesses of survey-based customer insight is by implementing speech-to-text technology to transcribe call content automatically, then applying text analytics algorithms to better analyze the content of calls.

Speech-to-text technology is powered by artificial intelligence (AI) algorithms. Trained on a database of hundreds of thousands of hours of speech recordings and corresponding text, these algorithms have “learned” how to convert any speech recording (beyond the database they learned from) into a text transcript. Text analytics are often also powered by AI that has learned from millions of sentences to assign topic or sentiment labels to them.

Speech-to-text and text analytics can be valuable tools to uncover trends in how agents interact with customers and what customers themselves are saying about their issues with your product or service.

This kind of technology, however, while innovative in its own right, also suffers from disadvantages and inconsistencies. Apart from imperfect accuracy (inherent in virtually all AI models), it is often language- and accent-dependent: it is very difficult (and expensive) to scale across languages, and it struggles to accurately understand many accents.

Furthermore, the algorithms need to be trained on the custom vocabulary and terms used by your institution, such as product and brand names, a process that often requires a very high initial investment. When the aim is to learn more about your customers and how they express opinions and experiences on calls, these limitations can produce wildly inaccurate, not to mention unfair, results.

Perhaps the biggest weakness of speech-to-text technologies is that they are incapable of inferring that crucial element of speech showcased by our opening example: tone. In the process of being transcribed for linguistic analysis, the conversation has had some of the richest and most evocative components completely stripped out.

We like to think about this in terms of Albert Mehrabian’s 7-38-55 model of communication. Mehrabian’s study found that when listeners interpreted expressions of people talking about their feelings (i.e., trying to understand another person’s true emotional intent), the listeners would only rely 7% on the words, but 38% on the tone of the voice, and 55% on the body language, including facial cues.

While it might not be generalizable to say that tone is five times more important than words in every context, it does highlight the importance of the tone of the voice over the words themselves when decoding the true emotional intent of a message. We saw exactly this with the expression “Oh, that’s great” in our opening example.
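A toy example illustrates the blind spot. The sketch below scores a transcript with a tiny, made-up sentiment lexicon; real text analytics tools are far more sophisticated, and the `LEXICON` and `text_sentiment` names are purely hypothetical. The point is only that the input is text, so a sincere and a sarcastic delivery of the same words necessarily receive the same score.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"great": 1, "thanks": 1, "terrible": -1, "late": -1}


def text_sentiment(transcript: str) -> int:
    """Sum the lexicon scores of the words in a transcript.

    Only the words are available here; pitch, pace and emphasis
    were discarded when the call was transcribed.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    return sum(LEXICON.get(word, 0) for word in words)


# The customer's reply from the opening example, as a transcript.
transcript = "Oh, that's great. Thanks for the help."
score = text_sentiment(transcript)  # positive, however it was actually said
```

Whether “great” was warm or dripping with sarcasm, this scorer sees two positive words and nothing else.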

What is tone?

Tone is the vocal medium through which we express ourselves in conversation. In other words, tone is the vehicle for emotion; it regulates how we converse with one another and how we communicate what we feel inside to the world outside. The difference between tone and “traditional” sentiment is that tone indicates how you’re saying something, while the sentiment relates to the opinion you have about a certain topic — that is, whether you feel positively or negatively towards it. 

Of course, there are other ways that we express how we feel, such as through body language and facial expressions. And while these can be rich indicators of sentiment and add depth and dimension to a conversation, several studies have shown that when compared to other modalities of communication, tone is actually the most accurate indicator of emotion and affect. 

Why does tone matter in the customer experience?  

While digital means of communication have been supercharged by the pandemic, companies have been communicating with their customers via phone for decades. And data shows that this trend will not let up anytime soon: In a recent survey, 61% of customer care leaders reported a growth in total calls, with increased contacts per customer and a growing customer base as the key drivers. 

When we communicate with one another in person, we combine all our various modes of expression (body language, facial expressions, tone and so on) to get one comprehensive and holistic understanding of the conversation. This is obviously the ideal environment in which to communicate.

However, this ideal form of communication is not always possible, especially for companies responding to thousands of customer inquiries each day from all over the world.

Since phone calls are often the only direct interaction that companies have with their customers, it is crucial that they understand as much as they can from customers’ voices. That means companies employing speech-to-text technologies as their primary means of interaction analysis are missing out on a veritable treasure trove of knowledge about the service they provide to customers.

Tone analytics can reveal much of the underlying meaning of what customers are saying. For example, a case study of St. Clair Communications — a contact center providing support for patients with hearing loss — showed that agents using an AI-powered tone analysis tool experienced a 20% improvement in customer satisfaction, a 25% decrease in average talk time, and a 31% increase in conversion rate compared to agents not using the tool.

Taken as a whole, these numbers show that the ability to understand and respond to tone in real time provided benefits both for St. Clair Communications and for its customers.

Three ways leaders can use tone in the contact center

There are many ways that tone can impact the efficacy and authenticity of a business’s interactions with customers. These can either be direct (i.e., agents actively thinking about how their tone impacts conversations with customers) or indirect (i.e., embedding tone into training and hiring processes). Here, we suggest a few recommendations for leaders to use the power of tone for a stronger customer experience and healthier workplace culture.

1. Hire agents and supervisors for empathy and emotional intelligence

The tone someone uses when speaking with us can often be very overt; we know immediately whether they are happy or sad, frustrated or grateful. But tone can also be very subtle, requiring active listening and deliberate effort to derive meaning from it. While these traits may come naturally to some, they are certainly not the easiest skills to teach.

As Jonathan Brummel, Director of Enterprise Support at Zendesk, says: “Hire the smile, train the skills. I can train technical customer service skills all day long. But how do you handle a livid customer? What are you going to do when you have to get another team to understand a customer’s problem? Those skills take longer to train. They take intent, openness, and heart. If you don’t have the people skills, you can be right all day long, but the customer isn’t going to hear you.”

In a job consisting largely of human interaction and solving problems in real time for others, natural empathy and emotional intelligence are key.

Of course, this hiring practice should expand beyond agents alone. Supervisors can have a major impact on the way employees experience their work environment.

If agents come into a healthy, positive work environment where they feel they are cared for, that positivity will spill over into their calls with customers.

But the same goes for agents who feel stressed, tired, or burnt out. In recent months, workplaces have seen a rise in “quiet quitting,” the culmination of a years-long trend of decreased motivation, engagement and commitment at work. This is likely part of the reason why three in five contact center leaders list talent attraction, training, and retention as a top priority.

Just as leaders need to hire agents for soft skills such as empathy and emotional intelligence, they should also hire supervisors with these traits, since a supervisor’s influence spills over into the customer experience, workplace culture, and the ability to retain agents.

2. Practice and embed tone in your contact center

Just because these soft skills can take longer to develop does not mean they should not be worked on; as with anything, practice makes perfect. The more agents are reminded of the impact of tone, the more it will remain top-of-mind in their interactions with customers.

Similarly, the more practice they receive — through training calls, real-time feedback and best-practice examples — the more natural it will become to keep an ear open for customer tone, as well as to regulate their own.

In his book Advice from a Call Center Geek, Thomas Laird recognizes the value of embedding tone in agent training from the start.

“If you have major issues with quality and delight,” he writes, “I suggest being more specific with what an associate can say (more scripted) and make them focus solely on their tone. The better they get with tone, the more freedom they get to go off script.”

Of course, going off script can create a much more authentic and personalized experience for the customer when executed well.

In this regard, keeping both customer and agent tone central to training will lead to more empathetic and authentic human interaction.

3. Think of speech analytics in the context of other measures

As we have explored above, technology is still a central pillar of the modern contact center. That being said, we know that relying on text analytics — one of the most common tools currently deployed in the industry — will create a rather shallow picture of how a customer-agent interaction actually played out.

Sure, we will be able to see what was said, but we will get very little insight into whether the customer had a positive experience (think back to our opening example). 

In an article entitled “From speech to insights: The value of the human voice,” McKinsey experts argue that without a comprehensive and holistic analysis of speech data, that data cannot be turned into actionable insights. The solution, they suggest, is “combining speech data with other customer or telephony data [to show] the full context of the call, which is often crucial to its meaning.”

Employing this strategy can help leaders make better sense of the qualitative insights drawn from text analytics. For example, deploying AI-driven tone analysis tools alongside text analytics will help businesses decode the true meaning of the words while still providing the rich data captured through speech-to-text and text analysis tools. To get a full understanding of the customer experience, both qualitative and quantitative measures must be collected, analyzed and cross-referenced.
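One simple way to cross-reference the two signals is to flag the moments where words and tone disagree. The sketch below assumes each utterance already carries a text sentiment score and a tone score, both on a scale from -1 to 1; the `Utterance` fields, the `flag_mismatches` function, the threshold and the example scores are all hypothetical, chosen for illustration rather than taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    text_sentiment: float  # -1..1, assumed output of a text analytics tool
    tone_valence: float    # -1..1, assumed output of a tone analysis tool


def flag_mismatches(utterances, gap=1.0):
    """Return utterances whose words and tone point in opposite directions.

    An utterance is flagged when the two scores have opposite signs and
    differ by at least `gap` (an arbitrary threshold for this sketch).
    """
    flagged = []
    for u in utterances:
        opposite = u.text_sentiment * u.tone_valence < 0
        if opposite and abs(u.text_sentiment - u.tone_valence) >= gap:
            flagged.append(u)
    return flagged


# Made-up scores for the opening example: positive words, negative tone.
call = [
    Utterance("Oh, that's great.", text_sentiment=0.8, tone_valence=-0.6),
    Utterance("Thanks for the help.", text_sentiment=0.5, tone_valence=0.4),
]
flagged = flag_mismatches(call)
```

Flagged utterances are candidates for human review: they are exactly the cases where a transcript alone would have painted too rosy a picture.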

The future of the customer experience

Remaining agile in the midst of global economic and geopolitical uncertainty is challenging almost every business, and driving transformation at scale on top of this is a massive undertaking. However, the opportunity is now ripe to gain market share through elevated customer experiences. Customers are looking to brands to deliver experiences as a key differentiator, and the role that a contact center plays in the overall customer experience is more important than it has ever been. 

Some 72% of customers say it is harder to reach a real person now than it was at the beginning of the pandemic. So when they finally do reach that real person, it is crucial that their issues aren’t falling on deaf ears.

Leaders need to ensure that in their hiring practices, they are prioritizing candidates — both for agent and supervisor positions — who demonstrate exemplary interpersonal skills, with a penchant for empathetic communication. This will help to create a healthy work environment, which benefits both employee well-being and customer interactions.

Similarly, they must conduct regular training on both customer and agent tone to ensure that it remains top-of-mind for agents when fielding calls from customers.

And with tone being one of the most accurate predictors of human emotion and experience, AI technologies that can accurately make sense of it could offer a major competitive advantage.

The main takeaway: Leaders need to avoid relying too heavily on any single metric, and should use every tool for drawing insights about the customer experience in concert with other industry-proven methods.

There is no silver bullet for securing long-term customer loyalty. But having the people strategies and intelligent technologies in place to understand tone of voice just may have helped us realize that a 10-to-12-day delivery time was not, in fact, “great.”

Anders Hvelplund is SVP, call centric BU and global services at Jabra.

Björn W. Schuller is CSO and cofounder at audEERING.

Florian Eyben is CTO and cofounder at audEERING.

