Translation memory (TM), computer-assisted translation (CAT) and machine translation (MT) tools are widely used for text-based applications, but spoken language communication is largely neglected. Although much business communication takes place through speech, current localization business models and technologies remain limited to written applications. However, emerging technologies enable spoken inter-language communication through TM leverage, terminology databases and computer-assisted interpretation (CAI), all of which point to a move toward machine interpretation (MI).
The orality of communication
Of the 6,912 known living human languages, 2,261 have writing systems. By contrast, all languages have either an oral or a manual (signed) tradition. Oral and manual systems of communication define in part what it means to be human. Writing systems have resulted from attempts to catalog and capture spoken language, producing a portrait of a natural language at a given moment in time. Like all portraits, written systems are an imperfect rendition of the reality, richness and myriad dimensions of language.
Even Shakespeare’s plays, deemed some of the world’s greatest written works, were originally a combination of visual and oral media, intended not to be read but to be seen and heard. We need only consider that the human ear can perceive between 300,000 and 400,000 distinct emotional states through tone of voice to understand the limits of written communication. Partly because of these limitations, written language cannot keep pace with its spoken counterpart, often retaining elements that are outdated and no longer reflect the speech used by the masses. In contrast, spoken language is dynamic, interactive and constantly evolving.
Text-based language is easier to control, while spoken language has a greater degree of variation and spontaneity. Written language has historically offered more ease and consistency for tracking, organizing, repurposing and distributing the information expressed in a language. Due in part to these qualities and to a historical lack of low-cost technologies for audio and video recording, linguists and businesses alike tended to rely heavily on written language.
One of the founders of modern linguistics, Ferdinand de Saussure, believed that writing was merely a complement to oral speech. He wrote of the “usefulness, shortcomings and dangers” of writing and noted that oral speech underpins all verbal communication. Despite being derived from speech, written language sometimes wields more power than its spoken counterpart. In fact, languages are sometimes not even regarded as “legitimate” unless they have a writing system. Until recent decades, scholars focused their attention on written language, which was easier to document, analyze and write about than spoken language. With changes in technology, academics in linguistics, sociology and anthropology are turning an ear toward language in its most natural form and paying closer attention to the differences between writing and orality.
Limits of literacy
Despite renewed attention from academics, at a practical level, individuals who cannot access written communication are often at a severe disadvantage in societies that depend on their members’ ability to read and write. According to United Nations estimates, one in five adults aged 15 or older worldwide was illiterate in 2000; most of these individuals come from countries with limited economic resources. Yet economically developed countries are affected by global literacy rates too, not least through international migration.
According to the International Organization for Migration, there were nearly 175 million international migrants worldwide in 2005, meaning that one in every 35 people was an international migrant. In Europe, migrants accounted for 7.7% of the population, while in North America they made up 12.9%. Australia had the highest percentage of migrants at 18.7%.
Moreover, even in highly developed economies, a literate population is not a given. The 2003 National Assessment of Adult Literacy found that roughly 22% of American adults had “below basic” quantitative literacy skills. While these individuals may be able to sign their names or locate the expiration date on a milk carton, they cannot successfully perform basic tasks such as using a television guide to find out what programs are on at a specific time or comparing the ticket prices for two events.
Another 33% had only basic skills: they could find the television program and compare ticket prices, but could not use food labels to determine which food contains a particular vitamin or total up an office supply order form.
In short, anyone who wants to communicate with large groups of people, be they government bodies, organizations or businesses, cannot reach the entire audience through written language alone. In the United States, companies that rely solely on text-based communication to reach consumers overlook more than a fifth of their potential market. Those who produce material requiring more than “basic” literacy cannot reach 55% of their audience (the 22% below basic plus the 33% at the basic level).
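The 55% figure is simply the sum of the two lowest literacy levels cited above. A minimal Python sketch of that arithmetic, assuming (as the text does) that material pitched above a given level excludes every adult below it; the function name and level labels are illustrative only:

```python
# Shares of US adults at the two lowest quantitative literacy levels,
# as cited in the text (2003 National Assessment of Adult Literacy).
BELOW_BASIC = 0.22
BASIC = 0.33


def unreachable_share(required_level: str) -> float:
    """Share of adults who cannot use material pitched at `required_level`.

    Hypothetical labels: "basic" excludes readers below basic;
    "above_basic" excludes readers at or below the basic level.
    """
    if required_level == "basic":
        return BELOW_BASIC
    if required_level == "above_basic":
        return BELOW_BASIC + BASIC
    raise ValueError(f"unknown level: {required_level!r}")


print(f"{unreachable_share('basic'):.0%}")        # 22%
print(f"{unreachable_share('above_basic'):.0%}")  # 55%
```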
Automation’s shift toward spoken language
The language services industry, along with most economically developed societies, has mirrored academia by devoting most of its energies toward written language. There’s no shortage of tools, processes and technologies that promise to automate nearly every aspect of translation production and management, but they tend to focus on text-based communication.
When you bring voice-over and speech technologies into localization and internationalization discussions, you find interpretation aids mentioned only in the context of specific applications or sectors such as health care, public safety and the military. A search will reveal that few technologists pay much attention to what spoken language automation means for the field at large.
Meanwhile, interpreters of spoken language are increasing in both importance and number throughout the world. Speaking at 150 words per minute, an interpreter who spends four hours per day rendering spoken language converts 36,000 words per day without automation. Compare that to a prolific translator who, for most language pairs, would be lucky to manually convert 1,000 written words in the same time period.
As we have observed with most societal advances, innovations are at first available only to a select and privileged few until technology improvements enable the masses to benefit. For example, family portraits were once available only to those with the money and the time to hold still for hours. Today, this technology is affordable to the average person, as evidenced by announcements earlier this year from Sony and Canon that each company had shipped more than 100 million compact digital cameras. Camera phones outsell digital cameras by a ratio of roughly four to one, meaning that a greater share of the world’s population can benefit from this technology than in the past.
Where oral communication is concerned, we see a similar movement toward broader market availability of automating technologies. In centuries past, spoken language interpreting services were provided primarily to royalty, dignitaries and the affluent. Today, community interpreting makes spoken language services available to the masses. With remote language mediation, such as telephone interpretation (TI) and video interpretation (VI), it does not matter whether the individual’s language is a high-demand language (such as Spanish in the United States or Polish in Ireland) or a low-demand language (such as Papiamento in Japan). Interpreters for hundreds of languages are available within seconds and at a relatively small cost.
CAI
Total automation of interpretation is an ambitious goal and will not happen overnight. As with developments in MT, the movement toward MI will be incremental. Yet while oral communication is unfamiliar territory for most who concentrate on written language, some promising efforts to automate conversion in this realm have already taken place.
CAI is a growing phenomenon. When working as a telephone interpreter in the mid-1990s, I experienced CAI firsthand. Because I provided interpreting services remotely, I was able to use machine-based support to assist with many interpretation tasks. As an on-site interpreter, I rarely had the chance to use reference materials in the midst of a live interpretation. However, as a remote interpreter seated in front of a PC, suddenly I had the ability to access resources in real time to facilitate part of my work.
Searchable electronic glossaries were available when I needed to find the perfect equivalent for a term in the regional variety most suitable for the person on the other end of the line. Definitions for unfamiliar terms were at my fingertips, reducing the need to request clarification. Pre-translated scripts were even available, so I did not have to think about how to render lengthy questions on life insurance applications.
I simply needed to read them aloud from the document and only had to interpret the answers from the other party, thereby eliminating the need for me to convert in one direction. Half of the words that were to be spoken were already translated, so all I had to do was pronounce them, much like an in-language insurance agent would read the questions from a script.
I could even talk to fellow remote interpreters in chat rooms if I needed help with a term or phrase, and we frequently supported each other this way. We were already engaging in an early form of collaboration to facilitate our interpreting work. Little did I know at the time that, more than a decade later, Facebook would use a similar technique, collaborative translation, to localize its website.
Today, telephone interpreters can still access resources, both human and electronic, to support their work, improve quality and deliver renditions faster with fewer clarification requests. VI is also available for some spoken and sign languages; however, because the interpreter must remain on screen, it can be more difficult to access web resources and search for terms, let alone read a pre-translated script while looking into the camera. Still, technology promises solutions to these limitations of VI.
For example, pre-translated scripts could be displayed to the interpreter on a split screen, or the interpreter could control whether to display a video feed. VI is also an important means of providing access to communication for deaf and hard-of-hearing people. In the United States alone, the National Center for Health Statistics estimates that approximately 20 million people, or 8.6% of people aged 3 or older, have hearing problems. Around the world, many countries report shortages of sign language interpreters, and there are at least 103 different sign languages.
MI
Among remote interpreting service providers, little has happened over the past decade to advance CAI or MI. While the automation of interpreting holds a great deal of potential, it remains largely unexplored. For example, some TI companies record millions of minutes of interpreted calls. These recordings represent enormous stores of potential “interpretation memory”: units of interpreted speech that could be leveraged by replaying them. Sound familiar?
The same concepts that drive TM apply to interpretation. Because a great deal of TI involves repeated speech, the same utterances are spoken over and over, day after day, year after year. Rather than have humans interpret these segments thousands upon thousands of times, computers could automate much of this effort. That could free the limited cadres of human interpreters for more valuable tasks involving less predictable communication.
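To make the TM analogy concrete, here is a minimal Python sketch of an interpretation memory lookup. It assumes each recorded rendition has already been transcribed, so incoming speech can be matched as text (a real system would need speech recognition for that step). The class, file names and 0.85 fuzzy-match threshold are hypothetical, and the standard library's `difflib.SequenceMatcher` is a simple stand-in for the fuzzy matching in commercial TM tools:

```python
from __future__ import annotations

from difflib import SequenceMatcher


class InterpretationMemory:
    """Hypothetical store mapping transcribed source utterances to
    recordings of their interpreted renditions (here, file names)."""

    def __init__(self) -> None:
        self._segments: dict[str, str] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Ignore case and extra whitespace when comparing utterances.
        return " ".join(text.lower().split())

    def add(self, source: str, recording: str) -> None:
        self._segments[self._normalize(source)] = recording

    def lookup(self, source: str, threshold: float = 0.85) -> tuple[str, float] | None:
        """Return (recording, score) for the best match scoring at least
        `threshold`, or None when a human interpreter is needed."""
        key = self._normalize(source)
        if key in self._segments:  # exact match
            return self._segments[key], 1.0
        best: tuple[str, float] | None = None
        for stored, recording in self._segments.items():
            score = SequenceMatcher(None, key, stored).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (recording, score)
        return best


memory = InterpretationMemory()
memory.add("Do you have any allergies to medication?", "es/allergies.wav")
print(memory.lookup("do you have any allergies to medication?"))   # ('es/allergies.wav', 1.0)
print(memory.lookup("Do you have any allergies to medications?"))  # fuzzy match, score below 1.0
print(memory.lookup("Where does it hurt?"))                        # None
```

As in TM tools, exact matches are checked first; anything below the threshold returns `None`, so the unpredictable segments still go to a human interpreter.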
Government organizations — some of the biggest spenders on interpreting services — would be among the greatest beneficiaries of harnessing the power of these repositories of interpretation memory, either by partnering with existing providers that record interpreted sessions or by establishing their own services and creating recordings. Eventually, bodies such as the US Department of Health and Human Services and the European Commission (EC) could begin sharing audio-format interpretation memory files, much as the EC is already harvesting TM.
MI is already happening in some settings. Devices such as IBM’s MASTOR and the Voxtec Phraselator automate unidirectional interpretation for military settings. Companies such as Polyglot Systems already facilitate interpretation in one direction in some health care settings. The company’s ProLingua software enables hospital staff to convey more than 7,000 medical questions and instructions in a variety of languages: staff click on the phrase they wish to communicate, and the system speaks it in the patient’s preferred language. Solutions such as these are extremely promising.
However, they represent only a tiny fraction of the possibilities for automating interpretation. Technologies such as those offered by SpeakLike may provide a glimpse of the future. With this web-based service, a person can type the information he or she wants to convey into a chat window. Then MT generates translations, which are in turn corrected by human linguists in real time so that the person who speaks the other language can see the translation.
This enables two people to “talk” to one another via web chat with ease. It is not hard to imagine a live linguist receiving the text and speaking the phrase in the other language instead of typing it.
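The workflow described above (a machine draft followed by real-time human correction) can be sketched as a two-stage pipeline. This is a hypothetical illustration, not SpeakLike's implementation: `relay` and both stage functions are invented, and the toy stages stand in for a real MT engine and a human linguist.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChatMessage:
    text: str
    source_lang: str
    target_lang: str


# Hypothetical stage signatures (not a real SpeakLike or MT API).
Translator = Callable[[str, str, str], str]  # (text, src, tgt) -> draft
Reviewer = Callable[[str, str], str]         # (source, draft) -> final


def relay(msg: ChatMessage, translate: Translator, review: Reviewer) -> str:
    """Machine-translate first, then let a human linguist correct the
    draft before the other party sees it."""
    draft = translate(msg.text, msg.source_lang, msg.target_lang)
    return review(msg.text, draft)


def toy_mt(text: str, src: str, tgt: str) -> str:
    # Stand-in MT engine that produces a flawed draft.
    return {"Good morning": "Buenos día"}.get(text, text)


def toy_review(source: str, draft: str) -> str:
    # Stand-in for the linguist's real-time correction.
    return draft.replace("Buenos día", "Buenos días")


print(relay(ChatMessage("Good morning", "en", "es"), toy_mt, toy_review))
# Buenos días
```

Because the review step is just a pluggable stage, the spoken variant imagined above would replace `toy_review` with a step in which the linguist speaks the corrected rendition rather than typing it.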
These MI offerings represent important advances in the quest to automate spoken language transfer. Unfettered bidirectional speech-to-speech communication is still the Holy Grail in the automation space. In the cases of Phraselator and ProLingua, one-way communication is possible, and this certainly fills an important gap.
However, with these technologies, if the party who is the recipient of the unidirectional communication wishes to respond or ask a question outside of the program’s parameters, this is not possible without requesting additional language assistance. ProLingua had the foresight to build in a solution to this problem. The system can instantly connect an interpreter through a dialer that routes the call to a TI provider.
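The one-way design and its escape hatch can be sketched as follows. This is a hypothetical model, not how ProLingua or the Phraselator actually work: the catalog, phrase IDs, file names and `Result` type are invented, and returning a "human" result stands in for the dialer that routes the call to a TI provider.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    kind: str    # "playback" or "human"
    detail: str  # audio file to play, or the reason for escalating


# Hypothetical catalog: phrase ID -> {language code: pre-recorded audio}.
CATALOG: dict[str, dict[str, str]] = {
    "pain_scale": {"es": "es/pain_scale.wav", "vi": "vi/pain_scale.wav"},
    "fasting": {"es": "es/fasting.wav"},
}


def convey(phrase_id: str, patient_lang: str) -> Result:
    """Play a pre-translated phrase if one exists; otherwise escalate
    to a human telephone interpreter."""
    renditions = CATALOG.get(phrase_id)
    if renditions is None:
        return Result("human", f"no scripted phrase {phrase_id!r}")
    audio = renditions.get(patient_lang)
    if audio is None:
        return Result("human", f"{phrase_id!r} not recorded in {patient_lang!r}")
    return Result("playback", audio)


def handle_patient_reply(utterance: str) -> Result:
    """Replies fall outside the scripted, one-way channel, so they
    always need a human interpreter."""
    return Result("human", "patient reply requires an interpreter")


print(convey("pain_scale", "vi"))  # Result(kind='playback', detail='vi/pain_scale.wav')
print(convey("fasting", "vi").kind)  # human
print(handle_patient_reply("¿Cuánto tiempo?").kind)  # human
```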
With SpeakLike, bidirectional communication is possible, and adding a spoken language element would be a wonderful way of bridging the literacy gap. Individuals who cannot read or write, let alone type, could simply speak their utterance via Skype or another computer telephony product.
Then the remote linguist could either speak or type the rendition to the other party. While it would still require human intervention, a hybrid of this nature holds potential for overcoming barriers of language differences as well as disparities in literacy. One solution, JAJAH, already exists for this purpose.
Just as we have seen with efforts to automate translation, human linguists still play important roles, especially in clarifying ambiguities and perfecting the output to make sure it can be easily understood. However, in the realm of interpreting, the linguist’s role is even more complex and varied. Human interpreters often take on roles that are unrelated or distantly connected to language transfer tasks.
Depending on the professional standards of practice, the customs within a country or community, and the setting in question, an interpreter may do everything from walking a customer to the parking lot to consoling a parent who has learned that surgery did not save the child’s life. Such tasks are non-linguistic in nature, and there is great debate in the field of interpreting about whether interpreters should assume such roles.
The non-language-related elements of interpretation are important considerations, but technology will continue to evolve nonetheless and is likely to address the linguistic elements first and foremost. Expanding access to spoken and sign language services will have a tremendous impact on large groups of people. Countries require interpreting services to communicate with trade partners and to participate in the global political scene. Multilingual societies are especially reliant on spoken language interpretation.
They simply cannot guarantee health care, public safety, education and legal services to their populations without it. Of the 193 recognized countries, 192 are multilingual; only politically isolated North Korea is considered monolingual. All of these nations need interpreting services.
de Saussure, Ferdinand. Course in General Linguistics. Trans. Wade Baskin. 1st French ed. New York: The Philosophical Library, 1959.
Gordon, Raymond G., Jr. Ethnologue: Languages of the World. 15th ed. Dallas, Texas: SIL International, 2005.
International Organization for Migration (IOM). World Migration 2005: Costs and Benefits of International Migration. Geneva, Switzerland: IOM, 2005.
Karpf, Anne. The Human Voice: How This Extraordinary Instrument Reveals Essential Clues About Who We Are. New York: Bloomsbury Publishing, 2006.
National Center for Health Statistics. Data from the National Health Interview Survey, Series 10, Number 188, 1994.
White, Sheida, and Sally Dillow. Key Concepts and Features of the 2003 National Assessment of Adult Literacy. National Center for Education Statistics (NCES) 2006-471. Washington, D.C.: US Department of Education, 2005.
Nataly Kelly is a senior analyst at Common Sense Advisory.