Language Weaver: fast in translation

How one firm quickly translates reams of data.

John Kehe/Staff

October 1, 2008

If you want to text message your Spanish-speaking neighbor, but don’t know how to say “Please turn down the radio” in that language, you could find a quick translation online at any number of websites. But if you are, say, a large semiconductor company with customers around the globe, you are in a pickle when all your support data is written only in English.

Enter Language Weaver, a Los Angeles-based firm on the cutting edge of a rapidly growing field known as machine translation (MT). The firm took one chipmaker’s extensive database and translated it overnight into Spanish, the No. 1 tongue in demand by that company’s customers. This task, says Language Weaver CEO Mark Tapling, would have taken weeks to accomplish not too long ago. Instead, the firm’s software made short work of a gargantuan job.

The $100 million MT industry has the potential to grow to more than 50 times that size, some analysts estimate. “Language Weaver is a leader in this field,” says Don DePalma, chief research officer with Common Sense Advisory Inc., who specializes in the somewhat arcane world of computerized translation services.

This may seem like a yawn-producing competition among geeks, one that transpires beyond the purview of most people’s concerns. But in fact, say industry watchers, making swift, high-volume, global communication possible is quickly moving up the to-do list of those who conduct international business deals. For instance, what happens to a nuclear power firm doing business in remote parts of India with no ability to hand over documents in the proper local dialect?

“The ability to translate lots of information quickly is becoming one of the important concerns of a global economy,” says Mark Przybocki, computer technologist and MT team coordinator with the National Institute of Standards and Technology, in Gaithersburg, Md. “Especially when you consider the huge amounts of information accumulating on the Internet.... Effective machine translation is becoming more important every day.”

Just what constitutes “effective” MT is a source of lively debate among a small but growing number of linguists, mathematicians, and computer specialists who dominate the field. Since the 1980s, the MT field has consisted of three approaches: rules-based, in which programmers enter up to 20,000 grammatical rules to direct the translation; example-based, in which discrete examples serve as guides; and statistical, in which “smart” computer algorithms “learn” from previous translations and develop their own guidelines.

The first two approaches were dominant until the turn of the century because the statistical method required so much data from which to “learn,” as well as massive amounts of processing power to search and cull its protocols, and enough memory to retain the information. But the statistical approach became more viable as computing power began to accelerate and memory capacity grew more affordable.
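To make the statistical idea concrete, here is a minimal sketch of how a program can “learn” word translations from previous translations alone, with no grammatical rules entered by hand. It is a toy version of a classic word-alignment technique (in the spirit of IBM Model 1, simplified by leaving out the usual empty “NULL” word), not Language Weaver’s software. The four sentence pairs and the names `CORPUS`, `train_word_table`, and `best_translation` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical "previous translations": English phrases with Spanish renderings.
CORPUS = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]


def train_word_table(pairs, iterations=20):
    """Estimate t(target_word | source_word) with a few rounds of EM."""
    pairs = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start with no preference: every target word is equally likely.
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected (source, target) pairings
        total = defaultdict(float)  # expected pairings per source word
        for src, tgt in pairs:
            for tw in tgt:
                # Split one unit of credit for tw among the source words,
                # in proportion to the current estimates.
                norm = sum(t[(sw, tw)] for sw in src)
                for sw in src:
                    share = t[(sw, tw)] / norm
                    count[(sw, tw)] += share
                    total[sw] += share
        # Re-estimate the table from the expected counts.
        t = defaultdict(float, {k: c / total[k[0]] for k, c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest learned probability."""
    candidates = {tw: p for (sw, tw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


table = train_word_table(CORPUS)
print(best_translation(table, "house"))
print(best_translation(table, "book"))
```

Because “house” appears alongside “casa” in every example that contains it, the estimates shift toward that pairing with each round, and the same happens for “book” and “libro.” Scaling this from four phrases to millions of sentences is what demands the data, processing power, and memory described above.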

Language Weaver grew out of what Kevin Knight, one of the company’s cofounders, calls a “watershed workshop” in 1999. His team discovered that the translation protocols developed for one language could move seamlessly to another without having to start over from scratch with each new tongue. The group’s work enabled it to nab all-important research funds, and within two years, the commercial venture began. Today, Mr. Knight sits in front of his computer looking at a translation program for Chinese that is capable of processing some 100 million directives.

This would not be cutting-edge technology, however, without some disputes. Chief technology officer and cofounder Daniel Marcu has T-shirts to prove it. One reads, “I lost the syntax bet”; another says, “I won.” He alternates them depending on how the arguments go. The shirts refer to a wager between his team and a former colleague who now runs the free translation service at Google. Mr. Marcu has maintained that a statistical system will still need grammatical rules no matter how much it is able to learn from previous translations, while the other side believes that statistics alone will provide all the necessary guidance.

Friendly wagers aside, Marcu says that in the end, it won’t matter. “There is so much information on the Internet ... that these systems will absorb grammatical rules without pausing to articulate them.”

The biggest challenge MT may face is human expectation. “People think machines should be able to act like the computer on the bridge of Star Trek’s Enterprise, or C-3PO. That would be nice,” says Mr. DePalma, “but while everyone would like that fabled Babel fish in the ear [the universal translator from the sci-fi classic ‘The Hitchhiker’s Guide to the Galaxy’], we are still a ways off from that.”