Introduction
The introduction of Speech-to-Text (STT) and Text-to-Speech (TTS) technologies marks a significant milestone in the evolution of communication, literacy, and education. STT is software that recognizes spoken language and converts it into text using computational linguistics, whereas TTS is software that reads written text and converts it into human speech (Oriserve, 2024). These technologies have transformed the way people interact with written and spoken language, offering new possibilities for individuals with disabilities and enhancing communication for all. By examining the reciprocal relationships between communication needs, technological invention, and evolving practices, I hope to explore the development and potential of STT and TTS technologies and their implications for literacy and education.
Historical and Cultural Context
Early Communication Technologies and Accessibility Needs
The need for alternative communication methods has always been present, particularly for individuals with disabilities. Before the invention of STT and TTS technologies, communication tools for people with visual or hearing impairments were limited. Braille, developed in the 19th century, is a code built from a system of raised dots (American Foundation for the Blind, n.d.). People who are blind or have low vision can read Braille with their fingers. However, it is labor-intensive and limited in scope. For those with speech or motor impairments, communication was often facilitated through sign language, which also presented significant limitations in accessibility and autonomy, for example when physical mobility is limited or visibility is poor (Educational Wave, 2024).
The rapid advancement of digital technology in the late 20th century set the stage for the development of STT and TTS technologies. As computers became more powerful, the possibility of creating assistive technologies that could convert speech to text and text to speech became a reality (Xiong, 2022). The rise of personal computing and the internet further accelerated the need for inclusive communication tools that could accommodate diverse user needs.
The Development of Speech-to-Text and Text-to-Speech Technologies
STT and TTS technologies evolved from the broader fields of speech recognition and speech synthesis. Early research in these areas began in the 1950s at Bell Laboratories, where an early speech recognition system, “Audrey,” was created. It could recognize spoken digits. Over the following decades, advancements in machine learning, natural language processing, and computational power led to more sophisticated systems capable of understanding and generating human speech with greater accuracy (Xiong, 2022).
By the 1990s, artificial neural networks and machine learning had brought breakthroughs in STT technology, and hidden Markov models had become the preferred framework for modeling properties of speech (Xiong, 2022). These early systems were primarily used for dictation and transcription, offering an alternative to typing for users with physical disabilities or those who needed to increase their productivity. Concurrently, as techniques for storing and manipulating speech improved, TTS was also being developed at Bell Labs and IBM, benefiting users with visual impairments or reading disabilities (Hossain, 2023).
The early 21st century saw the integration of these technologies into a wide range of devices and applications, from smartphones and computers to home assistants and accessibility tools. The proliferation of mobile devices and cloud computing further democratized access to STT and TTS technologies, making them more user-friendly and widely available (Hossain, 2023).
Communication Needs and Existing Practices
Addressing Learning Accessibility and Inclusion
One of the primary communication needs addressed by STT and TTS technologies is learning accessibility. For students with learning disabilities such as dyslexia, these technologies can create more opportunities to focus on organization and argumentation in their written communication, and provide critical tools for engaging with written and spoken language (Matre & Cameron, 2024). STT technology also enables students with physical impairments to input text without a keyboard, making writing and communication more accessible. Additionally, it improves foreign students’ learning experience (Shadiev et al., 2014). TTS technology, on the other hand, offers individuals with visual impairments or reading disabilities the ability to access written content audibly, breaking down barriers to information and learning (Wood et al., 2018).
Before the advent of these technologies, individuals with disabilities often faced significant challenges in accessing education and participating fully in society. Traditional literacy practices were heavily reliant on print and manual writing, which excluded those who could not engage with these mediums. STT and TTS technologies have helped bridge this gap, empowering individuals with disabilities to engage with language in ways that were previously impossible (Matre, 2022).
Enhancing Communication Efficiency and Multitasking
Beyond accessibility in teaching and learning, STT and TTS technologies have also addressed broader communication needs, such as enhancing efficiency and enabling multitasking. For example, we frequently use digital assistants such as Siri and Alexa in mobile applications to help with daily tasks, and dictation software built directly into our phones and computers transforms spoken words into text in real time (eDist, 2023). For professionals, students, and everyday users, STT technology is particularly valuable in scenarios where speed and accuracy are crucial, such as during meetings, lectures, or while driving.
TTS technology, meanwhile, has become an essential tool for multitasking. Users can listen to written content while performing other tasks, such as commuting or exercising, making it easier to consume information in today’s fast-paced world. This flexibility has transformed how people interact with written content, enabling them to integrate reading into their daily routines in new and convenient ways (Garris, 2024).
Implications for Literacy
Redefining Literacy in the Digital Age
The introduction of STT and TTS technologies has had profound implications for the concept of literacy. Traditionally, literacy has been defined as the ability to read and write. However, with the advent of these technologies, literacy is increasingly being redefined to include the ability to engage with text and language through digital tools. As Christina Haas (1996) pointed out long ago, each literacy technology augments the relationship between written text and the social and cognitive processes differently.
STT technology challenges the traditional idea of writing. The act of writing becomes an oral process, where speech is transcribed into text. Classrooms that use STT to phonetically draft ideas are learning to re-write. Instead of learning how to write the traditional way, students become better writers by correcting their own phonetically recorded work (Smith, 2021). This shift raises important questions about how literacy is taught and assessed in educational settings, as traditional methods may not fully capture the diverse ways in which individuals now produce and interact with text.
Similarly, TTS technology expands the concept of reading beyond the visual decoding of text, so that reading can also be an auditory experience. This has implications for how literacy is assessed and understood, as it broadens the scope of what it means to be literate in a digital world.
Promoting Digital Literacy
Digital literacy is increasingly recognized as a fundamental skill in the 21st century. It is essential for success in both education and the workforce (Martínez-Bravo et al., 2022). As STT and TTS technologies become more integrated into educational practices, they also play a crucial role in promoting digital literacy. Students must learn how to use these tools effectively, understand their limitations, and develop strategies for integrating them into their learning routines. This means teachers have to start educating themselves on these tools, not only to learn how to use them but also to develop critical understandings of the technologies and their impacts on literacy (Bradbury, 2014). This involves not only technical skills but also critical thinking and adaptability, as both teachers and students must learn to navigate a digital landscape that is constantly evolving.
Implications for Education
Enhancing Learning and Supporting Diverse Learning Styles
As mentioned previously, STT and TTS technologies have made it possible to create more inclusive classrooms, where students with disabilities can participate on equal footing with their peers. Beyond supporting students with learning disabilities, the integration of STT and TTS technologies into education has the potential to improve and enhance learning experiences for all. These technologies allow students to interact with their learning materials through multimodal and multisensory experiences, enhancing comprehension and knowledge retention (MedRec Technologies, 2023).
STT and TTS technologies also support diverse learning styles, adapting to different preferences and needs. Some students may find it easier to express their ideas orally rather than through writing, making STT a valuable tool for capturing their thoughts and improving their writing skills. Others may benefit from listening to content rather than reading it, making TTS an essential tool for auditory learners (MedRec Technologies, 2023).
Implications for Educational Policy and Practice
As STT and TTS technologies continue to evolve, they will also have implications for educational policy and practice. Educators and policymakers will need to consider how best to integrate technology into the curriculum. In the new digital world, students will need new foundational skills, and teachers will need to be trained to understand the technology and to teach students these skills. Policy on equal opportunity is also needed to ensure that all students have access to the necessary resources (OECD, 2015). New instructional design should incorporate technology to complement teaching and learning and to support students’ diverse needs (Guppy et al., 2022).
Conclusion
Speech-to-Text and Text-to-Speech technologies represent a significant advancement in the evolution of communication, literacy, and education. By addressing the needs of individuals with disabilities, enhancing communication efficiency, and supporting diverse learning styles, these technologies have redefined what it means to be literate in the 21st century. As they continue to develop, they will play an increasingly important role in promoting digital literacy, creating inclusive educational environments, and shaping the future of learning. However, it is crucial to address the challenges of accessibility, equity, and accuracy to ensure that the benefits of these technologies are realized by all. Through thoughtful integration and ongoing innovation, STT and TTS technologies will continue to transform literacy and education in ways that are both profound and far-reaching.
References
American Foundation for the Blind. (n.d.). What is Braille. https://www.afb.org/blindness-and-low-vision/braille/what-braille#:~:text=Braille%20is%20a%20system%20of,Braille%20is%20not%20a%20language.
Bradbury, K. S. (2014). Teaching Writing in the Context of a National Digital Literacy Narrative. Computers and Composition, 32, 54–70. https://doi.org/10.1016/j.compcom.2014.04.003
eDist. (2023, August 16). The future of speech recognition: seamless communication. LinkedIn. https://www.linkedin.com/pulse/future-speech-recognition-seamless-communication-edist/
Educational Wave. (2024, March 14). Pros and cons of sign language. https://www.educationalwave.com/pros-and-cons-of-sign-language/#Accessibility_Issues
Garris, M. (2024, May 24). Improving research productivity: maximizing the potential of free text-to-speech services. The Academic. https://theacademic.com/research-productivity-free-text-to-speech-services/
Guppy, N., Verpoorten, D., Boud, D., Lin, L., Tai, J., & Bartolic, S. (2022). The post‐COVID‐19 future of digital learning in higher education: Views from educators, students, and other professionals in six countries. British Journal of Educational Technology, 53(6), 1750–1765. https://doi.org/10.1111/bjet.13212
Haas, C. (1996). Writing technology: Studies on the materiality of literacy (1st ed.). L. Erlbaum Associates. https://doi.org/10.4324/9780203811238
Hossain, A. (2023, June 25). The history and improvements of Text-to-Speech technology. LinkedIn. https://www.linkedin.com/pulse/history-improvements-text-to-speech-technology-altaf-hossain-limon/
Martínez-Bravo, M. C., Sádaba Chalezquer, C., & Serrano-Puche, J. (2022). Dimensions of Digital Literacy in the 21st Century Competency Frameworks. Sustainability, 14(3), 1867. https://doi.org/10.3390/su14031867
Matre, M. E. (2022). Speech-to-Text Technology as an Inclusive Approach: Lower Secondary Teachers’ Experiences. Nordisk tidsskrift for pedagogikk & kritikk, 8, 233–247. https://doi.org/10.23865/ntpk.v8.3436
Matre, M. E., & Cameron, D. L. (2024). A scoping review on the use of speech-to-text technology for adolescents with learning difficulties in secondary education. Disability and Rehabilitation: Assistive Technology, 19(3), 1103–1116. https://doi.org/10.1080/17483107.2022.2149865
MedRec Technologies. (2023, September 7). Technology is revolutionizing education. LinkedIn. https://www.linkedin.com/pulse/power-voice-how-text-to-speech-technology-revolutionizing/
OECD. (2015). Implications of Digital Technology for Education Policy and Practice. In Students, Computers and Learning (pp. 185–193). OECD Publishing. https://doi.org/10.1787/9789264239555-11-en
Oriserve. (2024, February 29). Text to speech vs. speech to text: what’s the difference? LinkedIn. https://www.linkedin.com/pulse/text-speech-vs-whats-difference-oriserve-805ec/
Shadiev, R., Hwang, W.-Y., Chen, N.-S., & Huang, Y.-M. (2014). Review of Speech-to-Text Recognition Technology for Enhancing Learning. Educational Technology & Society, 17(4), 65–84.
Smith, C. (2021, June 9). The benefit of speech-to-text technology in all classrooms. KQED. https://www.kqed.org/mindshift/57786/the-benefits-of-speech-to-text-technology-in-all-classrooms
Wood, S. G., Moxley, J. H., Tighe, E. L., & Wagner, R. K. (2018). Does Use of Text-to-Speech and Related Read-Aloud Tools Improve Reading Comprehension for Students With Reading Disabilities? A Meta-Analysis. Journal of Learning Disabilities, 51(1), 73–84. https://doi.org/10.1177/0022219416688170
Xiong, X. (2022). A Summary of the Development of Speech Recognition Technology. Proceedings of the 2022 4th International Conference on Robotics, Intelligent Control and Artificial Intelligence, 768–773. https://doi.org/10.1145/3584376.3584513