Task 3: Voice to text


Week 3 Assignment

Transcripts of 5-minute Narration

Listen to an audio recording of the original oral narration:

And, just for fun, listen to Microsoft’s Read Aloud version of the transcript:

To preface, I wanted to know how Microsoft Word’s Dictation tool would perform compared with Voice Typing in Google Docs, so I recorded my five minutes of talking as a voice memo and played it back to dictate into each. The results were similar, but with some notable differences. To begin with, there is no punctuation in the entire text, because the speaker must name each punctuation mark aloud for the software to insert it. While it is possible to identify sentence and phrase structures in the writing, there are instances where it isn’t clear where to punctuate. The verb tenses are not consistent. In one passage, the text reads: “early on when we moved there my parents wanted to get a boat so they can go fishing and we can go exploring”. There were also many occasions when the verbs and nouns did not fit the context of the sentence: “we were trolling which means you’re going at a slow speed dragging lines behind you in hopes that official bite and then when one bites you put the boat in neutral and play the fish breed it”. One might wonder what exactly was meant by “play the fish breed it”. The intended idiom, “play the fish”, refers to reeling it in on a fishing rod. The erroneously transcribed “breed it” leaves the reader mystified (among other things) as to what is going on. In another passage, the writing careens toward nonsense: “and then you would get to the bow and then well get pee off about and in calm water was fine”.

Most remarkable is how much is omitted, altered, or replaced in the dictation. The most obvious elements to disappear are the cadence and expression of the oral recording. Gnanadesikan (2011) discusses the loss of “intonation and emotional content” in written language (p. 9). In this text, not only is it challenging to decipher the writer’s tone, but there are also numerous phrases that only make sense as phonetic approximations of what was spoken. As an experiment, I played the text back using Microsoft’s Read Aloud software. Only then did phrases resemble something that contextually made sense. The phrase “official bite” sounds like “a fish will bite” phonetically, but makes little sense in the context of the larger phrase. These words, spoken in English, are transliterated and transformed at once. A person would likely need to read the passage aloud to identify what was meant; unless it was clear that the text was generated by speech-to-text software, that would not be an obvious action to take. Given the differences between passages that were accurately transcribed and those that were not, this technology would influence me to change my enunciation, pace, and fluency of speech for the purpose of ensuring accuracy. Creating text with it would likely disrupt my accustomed way of speaking (Schmandt-Besserat, 2009).

Using speech-to-text software to transcribe what is meant to be oral language is problematic. This is because it is not designed to capture a useful transcript of what is said; rather, it is a technology that needs to be instructed. Most fascinating was watching the writing process take place while I played my recording. The voice typing tool would write what had been spoken, only to replace it with a phrase that the software determined to be more ‘correct’, especially when my pace and pronunciation were not ideal for it to transcribe. These approximations of meaning illustrate Haas’ (2013) comments about how writing trades off by decontextualizing on the one hand, and allowing for increased precision and complexity of ideas on the other. My “instructions” would be for a writing process, not an oral one. Stating the necessary punctuation marks, for instance, would alter how I sounded significantly. Had I scripted this story and read it, I would likely be focused primarily on the text being produced on the screen as immediate feedback on the quality of my dictation. Inflection, pace, and tone are irrelevant to the program. As useful as this tool is, it does not presently threaten to take over the practice of typing or even handwriting. Haas (2013) also notes that changes in technology don’t always result in a corresponding change in people, and when they do, that change can sometimes be “paradoxical” (p. 18). When I offer speech-to-text software to students with learning disabilities, for instance, I need to be sure that they become practiced enough in its use if they are to successfully “write” with it.

Oral storytelling is distinct from written storytelling because of its dependence on human memory, skill in delivery, and a familiar audience. Ong (2002) observes that oration to the ancient Greeks was a “crafting” or “weaving” of words in a way that would impress an audience that came with a set of agreed-upon expectations. His idea that it is a form of “apprenticeship” speaks to mastering memory, delivery, and the ability to read an audience for feedback (p. 8). Were I to write my narrative, I would draw from what I have studied in other texts to make decisions about the details, their arrangement, and the diction I would use to express each part. I would probably make partial reference to other written texts, perhaps even without realizing it, to make my narrative more engaging. The diction of my writing would transmit my education, cultural background, and social class. Once written, the story exists as an entity separate from my memory of the events themselves. Conversely, it exists as an oral story only when I tell it. Another family member’s version is not necessarily a reliable substitute or copy. A (crafted) written account would transform family members into characters, the ocean into a setting, and the events into a plot that relies on familiar structures. It may interfere with their memory of the event, or it may invite them to revisit it in a new way (Ong, 2002).

References

Gnanadesikan, A. E. (2011). The first IT revolution. In The writing revolution: Cuneiform to the internet (Vol. 25, pp. 1-10). John Wiley & Sons.

Haas, C. (2013). The technology question. In Writing technology: Studies on the materiality of literacy (pp. 3-23). Routledge.

Ong, W. (2002). Orality and literacy: The technologizing of the word. New York; London: Routledge. https://www.taylorfrancis.com/books/mono/10.4324/9780203426258/orality-literacy-walter-ong

Schmandt-Besserat, D. (2009). Origins and forms of writing. In C. Bazerman (Ed.), Handbook of research on writing: History, society, school, individual, text. New York, NY: Routledge.