What’s in a voice? It’s not just noise, but emotion, personality, intention and so much more. For this reason, experts across the country are developing and perfecting voice-banking technology, which aims to replace the standard voices in communication software with a person’s own for people who have temporarily lost the ability to speak.
One such person is John Costello, director of the Augmentative Communication Program at Boston Children’s Hospital, who drives the hospital’s message-banking work with children and adults. He defines message banking as a way to digitally record and store words, phrases, sentences, personally meaningful sounds and/or stories using the natural voice, inflection and intonation.
Voice banking, on the other hand, is a process of recording a large inventory of speech that is then used to create a synthetic voice that approximates a person’s natural speaking voice.
Pre-Surgical Planning Personalizes Voice Banks
Costello started working with pediatric patients in 1991, after learning that it was difficult for the hospital’s intensive care unit nurses to understand patients who had temporarily lost the use of their voices after surgery.
By strategizing ahead of surgery, Costello can help these patients who need short-term access to a method of communicating their wants and needs.
“These patients don’t burst through the door of the emergency department and need urgent surgery,” says Costello. Thus, their care teams have time to work with them to “bank” phrases that express their short-term needs after surgery.
Patients want to do more with their voices than express pain or the need to be repositioned in their beds, he says. For example, one pediatric patient expressed concern about who was playing with her toys at home, so they recorded the message: “Is my sister playing with my toys?”
In addition to expressing needs and concerns, adult and pediatric patients also want to use their message-banked voices to thank care teams or to entertain friends and family members, Costello adds.
VocaliD Customizes Patient Voices
In nearby Belmont, Mass., Rupal Patel, CEO and founder of VocaliD, also works with clients to create customized voices. After witnessing a young girl and an adult man communicating via the same voice from a speech-generating device, she decided to put her graduate studies on hold to help patients speak in their own voices again.
She and her company VocaliD create customized voices for patients using one of two methods.
The first method involves recording a client saying 3,500 specific sentences, which the VocaliD team uses to create a synthesized voice.
The alternative, an option for clients with far more limited vocal ability, is to record as much sound as the client can produce; that sound is then blended with the voice of a donor of approximately the same age and from the same region of the country to approximate the patient’s natural voice.
How Patients Interact with Voice Technology
Once the voice has been created and banked, how do patients use it?
Depending on their fine motor skills, some patients can produce speech by typing on a tablet or similar device running a speech-generating application, says Rebecca Lulai, a speech-language pathologist at the University of Minnesota.
While speech-generating apps come with a default voice, patients using a customized voice developed by Lulai, VocaliD or another provider can upload that voice file into the application. The recordings of the client’s voice are converted into a synthetic voice with software that uses concatenative synthesis, which divides and then rearranges pieces of speech.
“We’re cutting up every vowel and consonant, dividing them up into three or more pieces, and we’re actually able to manipulate each of those subphonetic pieces in doing the concatenation,” says Timothy Bunnell, principal research scientist and director of the Center for Pediatric Auditory and Speech Sciences, who developed the synthetic voice software alongside his colleagues at the Nemours Speech Research Laboratory in Wilmington, Del. “The resulting output sounds much more natural than the old automated systems [that cut up sounds at the word level].”
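The idea Bunnell describes can be sketched in a few lines of Python: recorded speech units are divided into three or more subphonetic pieces, and selected pieces are rejoined with a short crossfade to smooth the seams. This is a minimal illustration of the general technique, not the actual Nemours or VocaliD software; the function names, piece counts, and toy sample lists are assumptions for the example.

```python
# A toy sketch of concatenative synthesis: divide recorded units into
# subphonetic pieces, then rejoin chosen pieces with a short crossfade.
# All names and data here are illustrative, not a real TTS pipeline.

def split_unit(samples, n_pieces=3):
    """Divide one recorded speech unit into n roughly equal subphonetic pieces."""
    size = len(samples) // n_pieces
    pieces = [samples[i * size:(i + 1) * size] for i in range(n_pieces - 1)]
    pieces.append(samples[(n_pieces - 1) * size:])  # last piece takes the remainder
    return pieces

def crossfade_concat(a, b, overlap=2):
    """Join two pieces, linearly crossfading over `overlap` samples to smooth the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    faded = [
        a[len(a) - overlap + i] * (1 - (i + 1) / (overlap + 1))
        + b[i] * ((i + 1) / (overlap + 1))
        for i in range(overlap)
    ]
    return a[:-overlap] + faded + b[overlap:]

def synthesize(pieces):
    """Concatenate an ordered list of subphonetic pieces into one waveform."""
    out = pieces[0]
    for piece in pieces[1:]:
        out = crossfade_concat(out, piece)
    return out

if __name__ == "__main__":
    # Two toy "recorded units" standing in for banked audio samples.
    unit_ah = [0.1 * i for i in range(12)]
    unit_oo = [1.0 - 0.05 * i for i in range(12)]
    # Split each unit into three subphonetic pieces, then recombine a new sequence.
    ah_pieces = split_unit(unit_ah)
    oo_pieces = split_unit(unit_oo)
    wave = synthesize([ah_pieces[0], oo_pieces[1], ah_pieces[2]])
    print(len(wave))
```

Operating on subphonetic pieces rather than whole words is what lets the system manipulate each fragment independently, which is why, as Bunnell notes, the output sounds more natural than word-level concatenation.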