This is one of the coolest inventions I’ve heard of, if it works:
Imagine sending a birthday song recording to a friend "sung" by you without the embarrassment of recording yourself singing. Imagine a movie in which the characters speak Hebrew, with the voices of Arnold Schwarzenegger or Robin Williams, without these superstars having learned the language. Imagine entering an audio Net chat room where you can choose not only your own persona, but also a unique voice for that conversation. A man could use a child's voice, a woman a man's voice, and you could say whatever you wanted in your own natural voice - while on the other side your words would sound as if spoken by someone else entirely.
According to the vision of Shlomo Baruck, founder and CEO of startup VIR (Voice Imitation and Recognition), we won't have to wait long for such a reality. VIR has developed technology that enables these scenarios. And given that it recently signed agreements with several cellular content providers, we're not too far from the day when these services are widely available.
… The processing technology makes it possible to combine what one man said with another man's voice, thereby producing a perfect imitation of the person whose voice parameters are being used. Therefore, Schwarzenegger could speak perfect Hebrew, or any other language for that matter. All that's necessary is to combine all of Schwarzenegger's voice components with text spoken by someone else speaking Hebrew. The system requires only a four-second sample of Schwarzenegger's voice, which is analyzed within a few minutes. From that point on, the system could immediately combine Schwarzenegger's voice with any text.
I don’t believe that it’s “perfect” – I’m sure a careful analysis could tell them apart – but it doesn’t need to be. And the applications aren’t just cosmetic:
VIR's technology also has another application - reducing surrounding noise heard when speaking on a cellular telephone. For example, when there is background noise from a passing truck or even a car's air conditioner during a cell phone conversation, a component in the phone will recognize the sound as noise, not voice. Within three seconds, and for the rest of the call, the noise will be shut out and will not be transmitted.
Reducing noise, in general, is a huge engineering problem in many applications. The problem is that it’s often difficult to distinguish signal from noise. If this technology can figure out which tones go with a particular voice, it might be able to make that distinction.
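The article doesn't explain how VIR's noise component works. As a rough illustration of the general idea (listen to the background for a few seconds, then stop transmitting anything that sounds like it), here is a minimal energy-based noise gate in Python. This is a generic, well-known technique swapped in for illustration, not VIR's method; the function names, frame size, `margin` factor, and the assumption that the opening frames contain only noise are all hypothetical.

```python
import math

def frame_energy(frame):
    # Mean squared amplitude of one frame of audio samples.
    return sum(s * s for s in frame) / len(frame)

def noise_gate(frames, calibration_frames=3, margin=4.0):
    """Mute frames whose energy is near an estimated background-noise floor.

    Hypothetical assumption: the first `calibration_frames` frames hold only
    background noise (a stand-in for the article's "within three seconds").
    `margin` is an illustrative threshold factor, not a tuned value.
    """
    calibration = frames[:calibration_frames]
    if not calibration:
        raise ValueError("need at least one frame to estimate the noise floor")
    noise_floor = sum(frame_energy(f) for f in calibration) / len(calibration)
    threshold = noise_floor * margin

    gated = []
    for f in frames:
        if frame_energy(f) > threshold:
            gated.append(list(f))          # louder than the noise: treat as voice, pass through
        else:
            gated.append([0.0] * len(f))   # near the noise floor: don't transmit
    return gated

# Usage: five frames of quiet "hum", one frame of a loud 440 Hz tone, one more hum frame.
RATE, FRAME = 8000, 160  # 20 ms frames at 8 kHz (illustrative values)
hum = [0.01 * (-1) ** n for n in range(FRAME)]
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(FRAME)]
frames = [hum] * 5 + [tone] + [hum]

gated = noise_gate(frames)
print([any(s != 0.0 for s in f) for f in gated])
# -> [False, False, False, False, False, True, False]
```

A gate like this only mutes stretches where nobody is talking; when the truck and the speaker overlap in the same frame, it passes both through. Separating them within a frame is the harder signal-versus-noise problem, which is where a model of which tones belong to a particular voice would help.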
UPDATE: It occurs to me that this will enable all kinds of mischief through impersonation, so I updated the title of this post.
Posted by David Boxenhorn at August 6, 2004 02:28 PM