For my final project I have researched Speech-to-Text (STT) technology and the implications it has for culture, literacy, and education. I chose this technology because it provides an opportunity to reflect on many of the concepts covered in this course. It touches on oral versus written language, the purpose of writing systems, how we adopt technologies, how education integrates them, the concerns and optimism around such technologies, what drives technological change, and how using certain technologies for writing potentially changes how we think. I also chose this technology because my youngest daughter is one of many children for whom handwriting and, to a lesser extent, spelling pose specific challenges. While she does well in many aspects of education, her struggle with handwriting negatively impacts her self-esteem and enjoyment of school. To help her with her writing we have been getting her to use STT. I was curious to find out what the academic literature says about providing such educational technology supports in the classroom.
STT is made possible by advances in speech recognition technology. It is a combination of highly advanced speech recognition software and relatively simple word processing. The development of the technology was largely driven by nation states’ security interests and corporations’ desire to automate services: powerful governments listening in at home and abroad needed to transcribe all of that content and quickly highlight specific utterances, while corporations sought to replace manual labor with machines in pursuit of greater efficiency and larger profits. These forces are like those that drove the development of early writing, which, as Haas points out, “… was used in state administration and bureaucracy, in trade and commerce, and in religion” (1996). Although it is an extremely advanced technology in comparison to early forms of writing, STT is used for many of the same purposes: exerting influence, amplifying conformity, recording transgressions, and increasing efficiency.
Speech recognition technology has improved significantly in the past few years thanks to the application of deep learning and its use of big data. Deep learning is a subset of machine learning, a type of artificial intelligence built on complex neural networks that, when trained on sufficiently large amounts of data, can perform tasks we once thought only humans could do, and can do so with incredible speed and accuracy. When a person converses with a virtual assistant like ‘Alexa’ or ‘Siri’ and has what feels like a seamless interaction, it is in large part thanks to the deep learning algorithms that support these technologies. While the performance of these technologies is impressive, it is not without its problems. Biases prevalent in, for example, ‘Western’ societies, which disadvantage certain groups in those societies, are sometimes replicated in the deep learning algorithms, as is the case with the racial disparities identified in studies of automated speech recognition carried out in the USA (Koenecke et al., 2020). There appears to be little financial incentive for the few large corporations that dominate this technology to address these issues, given the often greater buying power of the dominant groups in those societies. The ubiquity of these popular speech recognition technologies may well lead to even greater homogenization of language and the loss of vernaculars across the world. Given that, as Boroditsky points out, ‘the languages we speak shape our perceptions of the world’ (2011), the loss of vernaculars brought about by the ever-greater influence of speech recognition technology looks likely to reinforce the existing hegemonic state of the world.
An additional concern with the use of speech recognition technology is that, given the enormous computing power required to facilitate these human-to-machine ‘conversations’, using it almost always means interacting with the Cloud. Otter, for example, a highly regarded STT tool and the one my daughter and I have been trialing, uses AWS services. The privacy policies provided by companies like Otter serve mostly to remind users that they are in essence giving up their right to privacy by using the product, listing as they do the many circumstances under which consumer data may be shared (Otter.ai, 2020). Otter is also the tool used to provide transcription services in Zoom meetings. The ability to provide closed captioning and transcription is highly beneficial to businesses in terms of the efficiencies it provides, as well as to individuals who face physical or mental challenges and might otherwise find participating in online meetings more difficult. However, it also, for better or worse, changes the way people communicate in meetings, making people more circumspect in what they say, and it of course does not capture what is intimated by body language and pauses. It is also worth noting that one of the main uses of STT is compliance management, particularly in the financial sector, which must monitor the calls and trades made by its employees (Fortune Business Insights, 2020). Continuous monitoring and scrutiny influence how people communicate. Moderating what we say in such contexts is not new, nor is it simply good or bad, but what is clear is that STT technology applied in these contexts is shaping how we use language and influencing our thoughts.
Education
In their review of speech-to-text recognition technology for enhancing learning, Shadiev et al. note that STT has been applied to enhance learning in some areas of education since at least the late 1990s (2014). Initially it was mostly used with learners who had cognitive or physical disabilities and with learners receiving instruction in a second language. Studies of those learners and their use of STT have shown modest positive results in relation to increasing the quality and quantity of written text (MacArthur, 2009; Peterson-Karlan, 2011). It is worth noting that at the time some of these studies were carried out the technology performed significantly less well than it does currently on measures such as Word Error Rate and Command Success Rate, and it required mastering a much less intuitive user interface than current versions do. Additionally, as is the case with the implementation of any education technology in a classroom, any benefits gained from its use are the result of several factors (e.g., the teacher’s pedagogy, parent support, school culture, availability of resources), and as such it can be challenging to isolate the impact of the technology itself.
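To make the Word Error Rate measure mentioned above a little more concrete, the short Python sketch below shows one common way it is calculated: the minimum number of word substitutions, deletions, and insertions needed to turn the STT output into the reference transcript, divided by the number of words in the reference. The function name and the sample sentences are my own illustrative assumptions and are not taken from any of the studies cited; real evaluations typically also normalize punctuation and spelling before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what was said versus what the STT tool transcribed.
spoken = "my hand cannot keep up with my brain"
transcribed = "my hand can not keep up with my brain"
print(word_error_rate(spoken, transcribed))  # 0.25
```

In this made-up example the tool splits "cannot" into two words, which counts as one substitution and one insertion against an eight-word reference, giving a rate of 0.25. A lower figure means a more accurate transcription.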
STT in education is most often framed as an aid that helps learners overcome certain barriers, whether they be physical, mental, or other (Svensson et al., 2019). It is rarely discussed as simply a preference, which is probably reflective of how it is perceived outside of the classroom, in society in general. Most people still think of writing as something done by hand, at least for any kind of meaningful writing. We do not imagine novelists composing their books aloud, although some do, with Agatha Christie being a famous example of an author who dictated many of her novels (Daily Writing Tips, 2021). Framing STT as a workaround in education rather than a positive choice limits its adoption. Studies such as the one carried out by Haug and Klein, in which they investigated whether STT can be used to learn a writing strategy, help make the case that STT is a viable choice for all students (2018). Cost is still a prohibitive factor for many schools when it comes to using STT. However, any classroom that provides learners with the opportunity to type on screens could just as easily provide the option to use STT.
There is also still a good deal of support for the notion that teaching handwriting skills to young learners is important for reasons beyond simply mastering the ability to transcribe, with some suggesting that the act of handwriting prepares the brain for learning (Ose Askvik et al., 2020). Such arguments, when carried over to the public domain, are often framed as a zero-sum game, as if educators were being made to choose between something like STT and handwriting, or as if learners who can compose with ease using STT will themselves give up on handwriting. However, it is better to think of STT as working alongside other forms of writing, much as the New London Group describes multiliteracies as creating “a different kind of pedagogy, one in which language and other modes of meaning are dynamic representational resources, constantly being remade by their users as they work to achieve their various cultural purposes” (Cazden et al., 1996/1999, p. 72). Writing using STT has the potential to change how individuals express themselves in writing in ways that we have probably not yet begun to see.
Anecdotally, my daughter’s teachers will often refer to the notion that her hand cannot keep up with her brain and that this is why she struggles with handwriting. However, even allowing for the fact that there is probably room for improvement in her fine motor skills, handwriting exercises to improve this skill, while perhaps of some limited benefit, will not provide the solution. Of most concern is the fact that laboring over handwriting exercises is a drain on her time in return for small gain, and it detracts from the joy of learning in general and, more specifically, from the joy of composing a piece of writing. A favorite soundbite in non-scholarly articles on the topic of STT is some approximation of the following: the average human can speak 150 words per minute, but the average person can write only 40 in the same time (Boyd, 2018). The suggestion is that STT will free us from such physical limitations. For some people, notably those with physical disabilities, it may to an extent do that, but that is only one part of what it provides. Compared with handwriting or typing, using STT reduces some of the cognitive load placed on a learner’s working memory when they are trying to compose and transcribe at the same time (Arcon et al., 2017). This appears to be the case for my daughter. There are signs that she has challenges with her visual working memory, which taxes her ability to transcribe and compose at the same time more than it does for many of her peers. It is early days for my daughter’s use of STT. As with any technology, there is a learning curve and there are frustrations. What is evident, though, is her delight at being able to quickly put her ideas into text. Tasks that would exhaust her when attempted with pencil, and would quickly lead to her giving up, she now engages with for longer and more fruitfully.
From our current viewpoint it is difficult to imagine STT becoming as widely used as the more traditional forms of writing, but it is also not hard to imagine that people used to handwriting felt the same way before typing became commonplace. Many office workers around the world have recently begun to experience more flexible working arrangements. No longer tied to an office desk, could the next step be freedom from a keyboard and the ability to write using STT while walking? In education, providing greater opportunities for all learners to use STT would be a good example of Universal Design for Learning, helping to ensure that learners are provided with multiple options for expression and communication (CAST, 2021). I am a long way from being ready to switch from typing to STT, but I am trying to at least use it some of the time in an effort to show my daughter that it is as viable a text technology as any other.
References
Arcon, N., Klein, P. D., & Dombroski, J. D. (2017). Effects of dictation, speech to text, and handwriting on the written composition of elementary school English language learners. Reading & Writing Quarterly, 33(6), 533-548. https://doi.org/10.1080/10573569.2016.1253513
Boroditsky, L. (2011). How language shapes thought. Scientific American, 304(2), 62-65. https://doi.org/10.1038/scientificamerican0211-62
Cazden, C., Cope, B., Kalantzis, M., Luke, A., Luke, C., Nakata, M., & New London Group. (1996/1999). A pedagogy of multiliteracies: Designing social futures. In B. Cope & M. Kalantzis (Eds.), (pp. 60-92). Routledge. https://doi.org/10.4324/9780203979402-6
Daily Writing Tips. (2021). Can you write a book or a novel with speech recognition software? Retrieved from: https://www.dailywritingtips.com/book-speech-recognition/
Fortune Business Insights. (2020). Speech-to-Text API Market Size. Retrieved from: https://www.fortunebusinessinsights.com/speech-to-text-api-market-102781
Haas, C. (1996). Writing technology: Studies on the materiality of literacy. L. Erlbaum Associates.
Haug, K. N., & Klein, P. D. (2018). The effect of speech-to-text technology on learning a writing strategy. Reading & Writing Quarterly, 34(1), 47-62. https://doi.org/10.1080/10573569.2017.1326014
Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J. R., Jurafsky, D., & Goel, S. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences – PNAS, 117(14), 7684-7689. https://doi.org/10.1073/pnas.1915768117
MacArthur, C. A., & Cavalier, A. R. (2004). Dictation and speech recognition technology as test accommodations. Exceptional Children, 71, 43–58. doi:10.1177/001440290407100103
Ose Askvik, E., van der Weel, F. R. (Ruud), & van der Meer, Audrey L. H. (2020). The importance of cursive handwriting over typewriting for learning in the classroom: A high-density EEG study of 12-year-old children and young adults. Frontiers in Psychology, 11, 1810-1810. https://doi.org/10.3389/fpsyg.2020.01810
Otter.ai. (2020). Privacy Policy. Retrieved from: https://otter.ai/privacy
Peterson-Karlan, G. R. (2011). Technology to support writing by students with learning and academic disabilities: Recent research trends and findings. Assistive Technology Outcomes and Benefits, 7, 39–62.
Shadiev, R., Hwang, W., Chen, N., & Huang, Y. (2014). Review of speech-to-text recognition technology for enhancing learning. Educational Technology & Society, 17(4), 65-84.
Speech recognition (2021, December 3). In Wikipedia. https://en.wikipedia.org/wiki/Speech_recognition
CAST (2021, December 5). The UDL Guidelines. https://udlguidelines.cast.org/