Speech Recognition
Speaker-independent speech recognition over the telephone is now a commercially viable solution for automating routine call centre transactions. This technology can result in significant cost savings while also improving service levels, and can be used by anyone with access to a telephone, anywhere, without any special training or equipment.
To make sense of the technology it is helpful to make two distinctions:
- Speaker-dependent vs. speaker-independent recognition.
- Basic Interactive Voice Response (IVR) vs. more sophisticated Automated Speech Recognition (ASR) and Natural Language (NL) systems.
Speaker dependence vs. speaker independence
Speaker-dependent systems recognise only a particular individual's voice, and only after fairly extensive training. The most familiar applications of this technology are dictation products from suppliers such as Dragon and IBM. Dictation products are gradually becoming more acceptable but are unlikely to become widely used until it is possible to use continuous speech without prior training. Another application is voice verification, where the unique physical characteristics of a particular voice are used to authenticate an individual. This is highly relevant to banks and is the subject of another white paper on this website.
Speaker-independent speech recognition technology allows any person, with any accent or dialect, to communicate with a computer using continuous speech, a large vocabulary, and increasingly natural language patterns. This technology can be used with high quality microphones in PCs or kiosks, but its most exciting application for banks and other financial institutions is to enable routine transactions over the telephone.
IVR vs. ASR/NL
Interactive Voice Response (IVR) has been around for some time and can be regarded as the "plumbing" of a modern speech recognition system. Vendors such as Periphonics, Syntellect and Intervoice supply industrial-strength systems for call centres. These systems handle advanced telephony and CTI (computer/telephony integration) features such as call direction, load balancing, and screen popping, as well as touchtone (or DTMF) selection from a menu of options, rudimentary recognition of a few words such as single digits and yes/no, and automated speech generation. IVR technology is used by many banks (e.g. NatWest's Actionline) and works reasonably well, but it leaves a lot to be desired and will never be accepted by the large majority of the population.
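The touchtone menu selection described above can be pictured as a simple lookup from key presses to transactions. The following is a minimal sketch in Python, not any vendor's implementation: the key assignments, the option names, the `route_call` function and the three-attempt limit are all assumptions made for illustration.

```python
# Hypothetical touchtone menu: key assignments and option names are
# illustrative assumptions, not taken from any real IVR product.
MENU = {
    "1": "balance_enquiry",
    "2": "funds_transfer",
    "3": "pay_bill",
    "0": "operator",
}


def route_call(key_presses, menu=MENU, max_attempts=3):
    """Return the first valid menu selection from a sequence of key presses.

    If the caller presses max_attempts invalid keys, or hangs on without
    pressing anything valid, the call falls back to a human operator.
    """
    for attempt, key in enumerate(key_presses, start=1):
        if key in menu:
            return menu[key]
        if attempt >= max_attempts:
            break
    return "operator"
```

A caller who presses an invalid key and then "2" is routed to the funds transfer option, while a caller who presses three invalid keys in a row is handed to an operator. The rigidity of this kind of fixed menu is part of what limits the appeal of basic IVR.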
Automated Speech Recognition / Natural Language (ASR/NL) technology from vendors such as Nuance is far more sophisticated. Because it operates at the level of phonemes rather than whole words, it can recognise huge vocabularies: any new word can be described as a sequence of the same small set of sounds. Sophisticated algorithms now allow customers to ask questions, give commands, and engage in a dialogue using increasingly natural, continuous speech, albeit in a limited subject domain.
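To see why working with phonemes makes large vocabularies practical, consider a pronunciation lexicon that spells each word as a sequence of phonemes. The sketch below is a toy illustration under stated assumptions: the phoneme symbols, the three-word lexicon and the function names are hypothetical, and a real recogniser scores acoustic likelihoods against many candidate sequences rather than looking for an exact match.

```python
# Hypothetical pronunciation lexicon: the words and phoneme symbols are
# illustrative assumptions, not the inventory of any real recogniser.
LEXICON = {
    "balance": ("B", "AE", "L", "AH", "N", "S"),
    "transfer": ("T", "R", "AE", "N", "S", "F", "ER"),
    "payment": ("P", "EY", "M", "AH", "N", "T"),
}


def phoneme_inventory(lexicon):
    """Return the set of distinct phonemes used across the whole lexicon."""
    return {p for pronunciation in lexicon.values() for p in pronunciation}


def lookup(phonemes, lexicon):
    """Return the words whose pronunciation exactly matches the sequence."""
    target = tuple(phonemes)
    return [word for word, pron in lexicon.items() if pron == target]


def add_word(lexicon, word, pronunciation):
    """Return a new lexicon extended with one more word.

    Only the word list grows; the units being recognised stay the same.
    """
    extended = dict(lexicon)
    extended[word] = tuple(pronunciation)
    return extended
```

Adding the word "plan" with the pronunciation P, L, AE, N enlarges the vocabulary without introducing any new phoneme, since all four sounds already occur in the existing words. A word-level system, by contrast, would need a new model for every word added.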
The significance of speaker-independent automated speech recognition
Most banks, already saddled with costly branch networks, have built large call centres to handle telephone enquiries and transactions. These call centres are proving to be victims of their own success: call volumes are growing rapidly, and because good telephone operators are expensive, this means growing costs on top of the branch costs. But up to 80% of call centre transactions are simple, routine transactions such as balance enquiries, funds transfers or pre-authorised payments, which can easily be automated using the new generation of speech technology. Moreover, even complex transactions start with up to a minute's worth of routine identification and authentication of the customer, which can also be automated. The bottom line is massive potential cost savings for banks.
Of course, this raises the question: will customers accept talking to a computer rather than a human being? The answer seems to be definitely yes, provided the dialogue is well designed. In fact, many customers actually prefer automated facilities: the service is available 24 hours a day, there is no waiting for a call to be answered, and the transaction can be accomplished quickly, in a streamlined manner, with no potential embarrassment.
The important point about this technology is that it can be used by anyone (or at least anyone with access to a telephone), anywhere, without any prior training or special equipment.
A word of warning, however. As with any new technology, it is the human factors surrounding the systems that ultimately determine success or failure. Good dialogue design is critical, and many other issues need to be carefully considered, such as customer authentication, interaction with other delivery channels, and links to host systems.