Final Project: Describing Communication Technologies

by markpepe

The Development and Use of Machine Translation: Past and Present

Introduction

I am writing on machine translation (MT) because I started teaching modern languages this year and found that all of my students were using Google Translate, both successfully and unsuccessfully, for reading and writing. I am curious to know why that is, and how I can incorporate it into the pedagogy of language acquisition. In this project I will give a brief timeline of machine translation, summarize how the three main types of MT systems work, and discuss where MT fits in modern education. It must be said that MT is a tool for transferring information, not for language acquisition.

What is machine translation? It is “translation of natural languages by machine” and “the application of computer and language sciences to the development of systems answering practical needs” (Hutchins, 1995, p. 431). Ideally, the end goal of MT is to create a high-level translation between two languages that takes into consideration context, colloquialisms, formalities, and many other linguistic nuances. Realistically, the aim of machine translation “is to give the most accurate translation of everyday texts” (Poibeau, 2017, p. 4).

A Brief Timeline of Machine Translation

Hutchins (1995) and Poibeau (2017) give a thorough timeline of MT. Here are the main points in its history:

  • 1930s – George Artsouni, a French-Armenian, and Smirnov-Trojanskij, a Russian, independently patented designs for mechanical translating machines
  • 1949 – Warren Weaver of the Rockefeller Foundation outlined the prospects of MT through cryptography, statistical methods, information theory, and the exploration of logic and the universal features of language
  • 1954 – Georgetown University and IBM held the first public demonstration of MT using 49 Russian sentences translated into English using 250 words and 6 grammar rules.
  • 1966 – the Automatic Language Processing Advisory Committee (ALPAC) reported that human translators were more cost-effective; funding for MT research had been in decline since the late 1950s
  • 1970s – the University of Montreal developed a syntactic transfer system for English-French translation, TAUM, which achieved the creation of a computational metalanguage and set the foundations for a programming language widely used in natural language processing
  • 1980s – Japanese companies developed computer-aided Japanese-English translation software for personal computers and text-processing software.
  • 1989 – up until this year, MT used a rule-based approach, but IBM experimented with a statistical, corpus-based system, which inspired further experimentation and research
  • 1990s – Personal computers had access to MT software such as Babel Fish
  • 2006 – Google launched Google Translate using statistical machine translation
  • 2016 – Google Translate shifted from statistical machine translation to neural machine translation, which uses neural networks along with artificial intelligence

The Three Main Types of MT Systems: Rule-Based, Statistical, and Neural

The workings of machine translation are extremely complicated. Almost everything in this section is a summary of Poibeau’s excellent book Machine Translation (2017); where it is not, other authors are directly referenced.

Rule-based MT systems are highly sophisticated: they use a bilingual dictionary together with thousands of rules that change word order to fit the specificities of the target language. Research on rule-based MT came out of World War II cryptology, and then out of the Cold War period, which demanded Russian-English translation (Hutchins, 1995; Poibeau, 2017). This type of MT factors in morphology, semantics, and syntax to make a translation. Rules needed to be established because a word-for-word translation can be too vague and ambiguous. Poibeau describes Warren Weaver’s four principles for avoiding the basic errors of word-for-word translation: one, the context of each word needs to be considered according to the topic and genre of the text to be translated, if known; two, it should be possible to determine a set of logical and recursive rules; three, Shannon’s model of communication (a message starting with a sender, passing through a transmitter, or medium, to a receiver that reconstructs the sender’s message (Dubberly, 2011)), which was useful for cryptography, would be useful for translation; four, universal elements make up language, and these can be used in the translation process to avoid ambiguity. Even so, rule-based MT could not solve semantic ambiguities such as this example: “Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy” (Poibeau, 2017, p. 71).
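As a toy illustration of the rule-based approach, the sketch below applies dictionary lookup plus a single syntactic transfer rule for English-to-Italian. The three-word lexicon, part-of-speech tags, and the one rule are all invented for this example; real systems encode thousands of such rules.

```python
# Hypothetical three-word lexicon and POS tags (invented for illustration).
LEXICON = {"the": "la", "red": "rossa", "car": "macchina"}
POS = {"the": "DET", "red": "ADJ", "car": "NOUN"}

def translate(sentence):
    tokens = sentence.lower().split()
    # Transfer rule: English ADJ NOUN -> Italian NOUN ADJ.
    i = 0
    while i < len(tokens) - 1:
        if POS.get(tokens[i]) == "ADJ" and POS.get(tokens[i + 1]) == "NOUN":
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2
        else:
            i += 1
    # Word-by-word dictionary lookup after reordering.
    return " ".join(LEXICON.get(t, t) for t in tokens)

print(translate("the red car"))  # -> la macchina rossa
```

Even this tiny sketch shows why the rule count explodes in practice: every construction where the two languages diverge needs its own rule, and none of the rules resolve semantic ambiguity like the “pen” example above.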

Statistical MT systems “calculate probabilities of various translations of a phrase being correct, rather than aiming for a word-for-word translation” (Groves & Mundt, 2015, p. 113). A giant body of multilingual text, called a bilingual text corpus, is used as a reference by the MT system. This is the technology that was eventually used for Babel Fish and Google Translate. The statistical MT process uses two steps: one, using the bilingual corpus to automatically acquire information about the translations of words, and the alignment of words at the sentence level, in order to encode, or learn, the information; two, the information and knowledge extracted during encoding is then decoded to translate new sentences. In the early 1990s, IBM developed a fundamental equation for MT, and Poibeau describes the three-step process: “[one], determine the length of the target sentence depending on the length of the source sentence; [two], identify the best possible alignment between the source sentence and the target sentence; [three], find correspondences at the word level” (Poibeau, 2017, pp. 128–129). The heart of IBM’s model is the choice of words for translation into the target language, and it uses a two-step approach: one, extract as much information as possible from the bilingual corpus; two, use this information to translate new sentences.
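The decoding step above can be sketched in miniature. IBM’s formulation chooses the target sentence e that maximizes P(e) × P(f | e): a language model scores how fluent e is, and a translation model scores how well e explains the source f. The French sentence, the English candidates (echoing Poibeau’s ambiguous “pen”), and every probability below are invented for illustration; a real system estimates these from millions of corpus sentences.

```python
SOURCE = "la boîte est dans l'enclos"   # hypothetical French source f

language_model = {                       # P(e): fluency of the target sentence
    "the box is in the pen": 0.6,
    "the box is in the playpen": 0.3,
}
translation_model = {                    # P(f | e): how well e explains f
    (SOURCE, "the box is in the pen"): 0.2,
    (SOURCE, "the box is in the playpen"): 0.7,
}

def decode(f):
    # Pick the candidate maximizing P(e) * P(f | e).
    candidates = [e for (src, e) in translation_model if src == f]
    return max(candidates,
               key=lambda e: language_model[e] * translation_model[(f, e)])

print(decode(SOURCE))  # -> the box is in the playpen
```

Note how the product of the two scores (0.3 × 0.7 = 0.21 versus 0.6 × 0.2 = 0.12) lets corpus statistics, rather than hand-written rules, settle the choice between candidates.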

Neural MT systems, like statistical ones, consist of an encoder and a decoder that produce a translation from a given sentence: the encoder analyzes the source language so that the decoder can generate a translation in the target language. Poibeau describes three characteristics of neural MT systems: one, the system tries to identify and group words appearing in similar translational contexts in what are called ‘word embeddings’; two, the models continuously improve as users interact with the system, because the neural approach analyzes words, sentences, or groups of words so they can be compared and identified; three, it is hierarchical in nature, since it can discover structure inside a sentence, and it does this based on observations made during system training. Even though neural MT systems have come a long way since the rule-based systems of the 1950s, this technology still fails to recognize overall sentence structure. As with the other two systems, ambiguity remains an issue. In 2016 Google introduced GNMT, Google’s Neural Machine Translation system, and Google Translate began to support over 10,000 language pairs (Wu et al., 2016).
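The idea of word embeddings can be made concrete with a toy sketch. The three-dimensional vectors below are invented; in a trained system, vectors have hundreds of dimensions and are learned so that words used in similar contexts end up close together, which is what lets the encoder group them. Closeness is measured here by cosine similarity.

```python
import math

# Invented toy embeddings: "cat" and "dog" occur in similar contexts,
# "bank" does not, so its vector points in a different direction.
EMBEDDINGS = {
    "cat":  [0.9, 0.1, 0.0],
    "dog":  [0.8, 0.2, 0.1],
    "bank": [0.1, 0.9, 0.3],
}

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# In this toy space, "cat" sits much closer to "dog" than to "bank".
print(cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"]) >
      cosine(EMBEDDINGS["cat"], EMBEDDINGS["bank"]))  # -> True
```

This geometric view of meaning is what separates the neural approach from the symbol-matching of rule-based and statistical systems, though, as noted above, it still does not resolve every ambiguity.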

Machine Translation in Modern Education

Students often have their smartphones on their desks in the high school classroom. “It seems inevitable that students will also utilize applications that will allow them to not only understand the single meaning of a lexical items but to render entire stretches of text” (Groves & Mundt, 2015, p. 114). While teaching Italian I have seen my students use Google Translate for single words, sentences, and even entire pages (via the Google Lens feature) for written work and reading comprehension. The issue is that students do not know how to properly use Google Translate, because one has to understand how its neural MT system works. Oftentimes, verb tenses that have not been learned yet appear in the translation, or the translation does not make sense. “The most common error types [are] morphological and in word sense” (Groves & Mundt, 2015, p. 114). In my experience, this is because students use colloquial sayings or do not articulate what they want to say in English clearly enough for a good translation. Groves & Mundt (2015) say that Google Translate “is unable to help students align their writing to the norms and expectations of the wider discourse community” (p. 114). How is MT implemented in education? How can it be used honestly? Does it have a pedagogical place in language acquisition?

Implications in Education

Language classes are meant for language acquisition. Groves & Mundt (2015) suggest that “the sanctioned use of translation tools may undermine the actual language acquisition process or even the need to learn another language in the first place, potentially leaving examinations that forbid the use of such devices the main incentive for students to actually learn the language” (p. 119). MT tools are meant to support students in their learning, and can be paralleled with the calculator. “The calculator did not remove the need for teaching maths – instead it allowed students to go further, quicker” (p. 120). That being said, if the student’s work has gone through a machine where spelling errors are corrected and the output is often a readable translation, is that output still the student’s voice? Groves & Mundt “argue that a translated text is no longer the student’s own work” if it is to be assessed (p. 119). Furthermore, “Google Translate is so good at conjugating that it often allows lower-level students to produce complicated verb tenses that have not yet been studied” (Ducar & Schocket, 2018, p. 783). A conscientious student would input the text, receive the output, and then make some adjustments post-translation.

What about entrance into university as a second language learner? “The output of Google Translate approaches, if not actually exceeds, the minimum language requirement for a large number of English-speaking universities” (Mundt & Groves, 2016, p. 389). Groves & Mundt (2015) are concerned with students learning English for academic purposes: is that community to deny or embrace MT? Those students are already coping with an English-language environment, and since academic work is done in a community, it is unlikely that those learners would place blind faith in MT (Groves & Mundt, 2015).

MT Systems as a Pedagogical Tool

If personal devices are important in everyone’s lives, they should be used for educational purposes. Ducar & Schocket (2018) provide five recommendations for teachers implementing MT technologies in the classroom: “[one], they must evaluate their own knowledge of the available and emerging tools; [two], directly teach learners how to use appropriate technology responsibly; [three], review their beliefs about students’ use of supportive technologies; [four] familiarize themselves with their institution’s policies on academic honesty; and [five], decide how they intend to act and react when such policies are violated, all while offering engaging and motivating instruction and assignments” (p. 793). With knowledge of how MT technologies work and a conscientious implementation, students and teachers will find success.

Successful inclusion of MT in the classroom is possible. A study by Niño (2020) found that students “were aware of [MT’s] limitations at the sentence or text level; however they [thought] it works pretty well as a quick reference for words in context and with verb conjugations…the students agreed that [MT] output still needs human input to bring it to an acceptable level of accuracy” (p. 19). In Niño’s study, students did note that MT’s use is “questionable for medical or legal interpretation purposes because of the ambiguity and consequent misunderstandings that can arise” (p. 19). Furthermore, Niño (2020) found that the use of MT expanded digital literacy, reinforced previous learning, prompted discussion of intercultural, subject-related, and linguistic questions, and enhanced metalinguistic reflection.

Conclusion

Neural MT technologies are continually improving as users interact with them, but they remain far from the idealistic definition given in the Introduction of this work. I would like to close with this quote by Poibeau (2017): “we should remember that the world chess champion was beaten by a computer in 1997, the world Go champion was beaten by a computer in 2016, but no computer is able to translate accurately between two languages” (p. 195).

References