Chatbot provider as voicebot provider? 5 reasons why it might not be the best idea

Voice is the first means of communication we learn to use as humans, and it remains one of the most important despite the growth of text channels such as SMS and WhatsApp. This need for speech is also reflected in the conversational AI industry: more and more companies, including chatbot providers, are investing in voicebots.

Despite this trend, the question remains how building voicebots on top of chatbots affects the quality of human-bot interactions, and what advantages and disadvantages such voicebots have compared with those built by dedicated voicebot providers.

In the following sections we will shed light on these questions and highlight the top 5 reasons why choosing a chatbot provider as your voicebot vendor might not be the best idea.

This article is structured as follows:

  1. Overview: chatbot providers entering the voicebot business
  2. Why choose a voicebot over a chatbot?
  3. How to change a chatbot into a voicebot?
  4. Why is a voicebot based directly on a chatbot not an ideal solution?
  5. ChatGPT-based voicebots – ins and outs
  6. Conclusion: chatbot vs. voicebot providers

Overview: chatbot providers entering the voicebot business

The phone is still our ally when it comes to contacting customer support, because it is perceived as an effective channel for solving issues. But there is a catch: in practice, callers are likely to end up waiting in a queue, especially during peak seasons.

The wait can lead to frustration, which worsens both the caller’s emotional state and their perception of the brand. To avoid this situation, voicebots can be used to automate inbound and outbound calls, offering 24/7 availability and immediate responses.

It is no secret that chatbots paved the way for the integration of voicebots in business communication. Chatbots have been around longer and have gained the trust of customers who want 24/7 service availability and fast answers to their queries. Their simple integration into any device that supports chat, such as smartphones, laptops and tablets, has been one of their many perks for both users and companies.

However, voicebots are becoming increasingly popular in the conversational AI industry, and key players from the chatbot industry are stepping into this field.

Why choose a voicebot over a chatbot?

So why do chatbot providers tend to turn their solutions into voicebots?

Some of the reasons for adopting this technology are the following:

  • Phone as one of the main communication channels: Most businesses are contacted by telephone, and according to a Salesforce study around 59% of customers use the phone to contact call center agents.
  • Faster interaction: Unlike text messages, calls are synchronous, which allows users to resolve urgent matters in no time. Speaking is also more intuitive and usually faster than typing, which makes it more user-friendly.
  • Hands-free conversation: Some people need to communicate with a company without using their hands, either because they are multitasking or because of a disability. By incorporating a voicebot into their communication process, companies can address the needs of those audiences too and be more inclusive.
  • More natural and human-like communication: Hearing a voice at the other end of the call gives the interaction a human touch, which consumers don’t get when writing to a chatbot.

The demand for such solutions is increasing: between 2022 and 2027, the voicebot market is expected to grow fastest, at a rate of 21.30%, in customer service communication, which is the most requested area for conversational AI tools.

How to change a chatbot into a voicebot?

But how are chatbot providers changing their modus operandi? At first glance, chatbots and voicebots may not seem all that different, but in reality there are more implications to take into consideration.

Companies that have dedicated their expertise to text-based solutions over the last several years tend to build voice assistants on the same textual foundations, converting text into speech and relying on artificial intelligence models such as ChatGPT. This approach may be problematic, as we explain in the upcoming sections.

Why is a voicebot based directly on a chatbot not an ideal solution?

Let’s proceed to the core topic: what are the ins and outs of building a voicebot directly on top of a chatbot, and what are the top reasons to reconsider this idea? Let’s go through them!

Reason #1: Response time

Writing to a chatbot is typically asynchronous communication: we don’t expect the machine to answer immediately, and we are willing to wait a moment for the reply. The situation is completely different with a voicebot, where any noticeable pause before the answer makes the caller feel uncertain or impatient. And while chatbot users can quickly scan and scroll through text, voicebot interactions demand concise responses because of the natural pace of spoken communication.

It is estimated that a voicebot that starts answering less than 0.7 seconds after the question ends sounds like a human. If the delay is between 1 and 1.5 seconds, it sounds like a competent machine, and above 2 seconds like a slightly less competent bot.
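The thresholds above can be expressed as a small helper for checking measured response delays. This is only a sketch: the function name and labels are our own, and because the estimate says nothing about delays between 0.7 and 1 second or between 1.5 and 2 seconds, the sketch reports those ranges as unspecified rather than guessing.

```python
def perceived_quality(delay_seconds: float) -> str:
    """Map a response delay (in seconds) to the impression quoted in the text."""
    if delay_seconds < 0.7:
        return "human-like"
    if 1.0 <= delay_seconds <= 1.5:
        return "competent machine"
    if delay_seconds > 2.0:
        return "less competent bot"
    # The quoted estimate does not cover 0.7-1.0 s or 1.5-2.0 s.
    return "unspecified"


# Hypothetical measured delays, for illustration only.
for delay in (0.4, 1.2, 1.8, 2.5):
    print(f"{delay:.1f} s -> {perceived_quality(delay)}")
```

A helper like this could be run over call logs to see how many answers fall outside the "human-like" range.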

When designing for voicebots, it’s crucial to deliver clear and succinct answers, because users process spoken information differently than written text. This difference underscores the importance of specialized conversation designers who understand the nuances of human speech and listening patterns.

If we use chatbot software directly to create a voicebot, the problem of long response times persists, because chatbot technology isn’t designed for synchronous communication. Simply connecting a chatbot to a voice channel therefore produces interactions that feel far from human.

Reason #2: Paraverbal components

A voice conversation differs significantly from a text chat not only in response time but also in paraverbal communication. A chatbot offers only text, at best sprinkled with icons.

A voicebot, on the other hand, has a whole range of additional means of expression at its disposal, which not only can but should be used to make communication complete and natural. These include appropriate intonation (so that the speech doesn’t sound “mechanical”), pauses at the right moments, the pace of speech, and a tone of voice that expresses emotion (for example, when talking about something serious, a voicebot should use a different tone than when announcing happy news).
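One common way to control pauses, pace and pitch in synthesized speech is SSML, the W3C Speech Synthesis Markup Language. The sketch below builds SSML strings for a serious and a cheerful message. The helper name and the example sentences are hypothetical, and text-to-speech engines differ in which SSML features they support, so treat this as an illustration rather than a recipe for any particular product.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str, rate: str = "medium", pitch: str = "medium",
            pause_ms: int = 0) -> str:
    """Wrap text in SSML prosody markup, optionally preceded by a pause."""
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms > 0 else ""
    return (
        "<speak>"
        f"{pause}"
        f'<prosody rate="{rate}" pitch="{pitch}">{escape(text)}</prosody>'
        "</speak>"
    )


# Serious news: short pause first, then slower and lower speech.
serious = to_ssml("Your payment is overdue.", rate="slow", pitch="low",
                  pause_ms=400)
# Happy news: slightly faster and higher speech, no pause.
cheerful = to_ssml("Your refund has been approved!", rate="fast", pitch="high")

print(serious)
print(cheerful)
```

The same words can thus be delivered in different ways, which is exactly the layer that a text-only chatbot never had to deal with.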

Modern voicebots have very natural voices, which can even include such means of expression as breaths or affirmative murmurs. Not to mention that a voicebot that speaks different languages must also adapt to local accents!

This also works the other way: a voicebot has to understand a human who speaks rather than writes, often indistinctly, quietly or with an unusual accent. (It is also worth mentioning that technology allowing voicebots to recognize human emotions already exists, although for a number of reasons, including ethical ones, it is not widely used.)

What is the conclusion from this? A really good voicebot needs to be equipped with properly functioning paraverbal means of expression, and providing them is a very demanding and complex task, usually handled by voicebot designers.

Reason #3: Complexity of the script

A chatbot – just like a voicebot – works on the basis of a scenario, i.e. a framework plan for a conversation, taking into account all likely questions and answers, as well as many different ramifications of that path. 

Here the fundamental advantage of a voicebot designed as a voicebot from the very beginning becomes apparent: because a voice conversation is more dynamic, its scenario is usually much more complex and less linear, and it takes more potential contingencies into account.

People express themselves differently in writing than in speech. This means that even a flawless, LLM-powered chatbot scenario will sound unnatural if it is simply converted to voice “as is”. A voicebot built directly on a chatbot scenario will therefore be limited and won’t be able to resolve many customer issues.

As in the previous case, creating a good scenario for a voicebot is resource-intensive – requiring many hours of work done by experienced conversation designers. 

Reason #4: Control over the conversation

When talking to a chatbot, a human asks it a question, and the chatbot starts generating an answer, which is presented in full after a while. In the case of a voicebot, the logic is a bit different: the machine starts talking almost immediately, but the caller – if they decide that this isn’t what they meant – can interrupt without waiting for the machine to finish speaking. 

And the voicebot will respond: it will stop speaking and listen to the customer. As a result, the customer has more control over the course of the conversation and can settle their matters more quickly. When creating a voicebot directly from a chatbot, the provider must additionally teach it to listen to the customer’s words throughout the conversation and respond immediately.
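The interruption logic described above is often called barge-in. Here is a minimal simulation of the idea: the bot plays its answer in small chunks and checks before each chunk whether the caller has started speaking. The function name and the `caller_is_speaking` callback are hypothetical stand-ins. In a real system the signal would come from voice activity detection on the live audio stream, which this sketch does not implement.

```python
from typing import Callable, List


def speak_with_barge_in(chunks: List[str],
                        caller_is_speaking: Callable[[], bool]) -> List[str]:
    """Play chunks in order and stop as soon as the caller starts speaking.

    Returns the chunks that were actually played.
    """
    played = []
    for chunk in chunks:
        if caller_is_speaking():
            break  # cut the speech and hand the turn back to the caller
        played.append(chunk)
    return played


answer = ["Your order", "was shipped", "on Monday", "and will arrive", "on Friday."]
# Simulated detector: the caller interrupts before the third chunk.
signals = iter([False, False, True])
played = speak_with_barge_in(answer, lambda: next(signals, False))
print(played)  # ['Your order', 'was shipped']
```

A chatbot that delivers its answer in one piece has no equivalent of this loop, which is why the behavior has to be added when a chatbot is turned into a voicebot.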

Reason #5: Telecommunications experience

Lastly, another good reason to choose a voicebot built from scratch is the chance to work with a provider that has experience in the telecom industry.

A voicebot works in close cooperation with telephony service providers and relies on them. A company that has specialized in voicebots from the very beginning therefore has a certain advantage: it can offer customers competitive rates as well as technological and business experience in working with telecoms.

ChatGPT-based voicebots – ins and outs

ChatGPT is one of the most well-known chatbots, and it seems to know the answer to any question a user asks. It is built on so-called large language models (LLMs), which are trained on huge text datasets and generate answers to queries on that basis. A chatbot provider can also use ChatGPT to create a voicebot: speech-to-text technology converts the caller’s speech into text, and text-to-speech technology converts the generated answer into voice.
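The architecture described above can be sketched as three stages chained together. All names below are hypothetical, and the three stage functions are trivial placeholders rather than real speech or language models. The point is only to show the order in which the stages are called.

```python
from typing import Callable


def handle_turn(audio_in: bytes,
                speech_to_text: Callable[[bytes], str],
                generate_reply: Callable[[str], str],
                text_to_speech: Callable[[str], bytes]) -> bytes:
    """Run one conversational turn: caller audio in, bot audio out."""
    question = speech_to_text(audio_in)   # 1. transcribe the caller
    reply = generate_reply(question)      # 2. let the language model answer
    return text_to_speech(reply)          # 3. synthesize the spoken answer


# Placeholder stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(question: str) -> str:
    return f"You asked: {question}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


audio_out = handle_turn(b"where is my order", fake_stt, fake_llm, fake_tts)
print(audio_out)  # b'You asked: where is my order'
```

Because the stages run one after another, their delays add up, which matters for the response time discussed under Reason #1.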

A voicebot created this way can hold a very casual conversation with the user, and can take into account many different intentions. However, there are three major pitfalls hidden here. What are they?

First, a voicebot based on ChatGPT knows the answers to many different questions from virtually any field, and its ingenuity is not limited by anything. As a result, the conversation has a high chance of running into a dead end, with its purpose blurring into numerous digressions.

And yet a good voicebot is designed to solve the caller’s specific problem (saving both the user’s and the company’s time), not to chat endlessly with the caller, tempting though it may be. 

The second issue is the reliability of the answers provided – since ChatGPT is fed with huge amounts of data from various sources, it is virtually impossible to control this information and errors can occur. 

The answer to these two issues is to limit ChatGPT’s unbridled creativity wisely: force it to return to the topic of the conversation and add a database of correct answers, an approach commonly used in ChatGPT-based voicebots.
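A very simplified sketch of this idea follows. The answers, topics and fallback wording are invented for illustration, and plain keyword matching stands in for whatever retrieval mechanism a real voicebot would combine with the language model.

```python
# Hypothetical database of approved answers, keyed by topic.
APPROVED_ANSWERS = {
    "opening hours": "We are open Monday to Friday, 9 am to 5 pm.",
    "delivery": "Standard delivery takes three working days.",
}

# Used to steer the caller back to supported topics.
FALLBACK = ("I can help with opening hours or delivery. "
            "Which would you like to know about?")


def grounded_reply(question: str) -> str:
    """Answer only from the approved database; otherwise steer back on topic."""
    normalized = question.lower()
    for topic, answer in APPROVED_ANSWERS.items():
        if topic in normalized:
            return answer
    return FALLBACK


print(grounded_reply("What are your opening hours?"))
print(grounded_reply("Tell me a joke about penguins."))
```

The first question is answered from the database, while the second, off-topic one gets the fallback that brings the conversation back to its purpose.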

The third problem, which currently has no good technological solution, is the delay with which ChatGPT provides answers, which stems from the time a model of this size needs to generate each response. While this doesn’t matter much in a text-based solution, in a voice conversation it can significantly disrupt the flow and annoy the caller in the long run.

Conclusion: chatbot vs. voicebot providers

Investing in a fast-growing automation solution such as a voicebot requires extensive research, which takes time and energy. It may seem appealing to choose a chatbot provider with a lot of experience and a good reputation in the conversational AI market. However, it is important to keep in mind that chatbots communicate in a significantly different way from the voice assistants that voicebot vendors create from scratch.

Some advantages of voicebot vendors stand out and are decisive for creating top-notch products: expertise in spoken human-bot interaction, command of paraverbal factors, and telecom connectivity.

Co-author of the article: Sofia Carvalho e Pereira.

