Task 3 – Playing with speech-to-text technology

Below is a five-minute unscripted story of my family first processing the COVID-19 pandemic, as captured by the speech-to-text technology in the Notes application of my iPhone. I’ve annotated one of the texts to classify the ways in which it deviates from the conventions of written English (scroll to the end to compare it to the second text). I’ve sorted the differences into two categories: 1) errors made by the speech-to-text technology (represented in green) and 2) differences between spontaneous speech and written text (represented in purple). As an additional exercise, I retold the same story using the speech-to-text technology on my Microsoft Surface. Telling the story twice was interesting for two reasons: 1) I could see whether one speech-to-text application was more accurate than the other, and 2) I could see how my story changed, that is, the details I did or didn’t include in the second retelling, even though I told the story back to back.

Unscripted oral story captured by iPhone speech-to-text technology

Both speech-to-text technologies deviated from the conventions of written English in similar ways. They didn’t make exactly the same errors, but they made the same types of errors. Overall, the text reads like a stream of consciousness, and I think this is due both to how spontaneous speech occurs and to errors in the speech-to-text technology; for that reason, I did not ascribe the lack of punctuation or lack of written structure specifically to either category. It is difficult to determine what the most common errors are, as I don’t have an exact copy of my spontaneous story and therefore do not know precisely the nature of all the errors. Because there was a 24-hour lag between when I used the technology and when I began to review its output, I cannot recall what may be missing from my story or what the speech-to-text technology has added. The most salient errors appear to be recognition errors. Interestingly, as Gnanadesikan (2009) notes, writing was invented to solve the problems of memory. Had my story been scripted and written down, I’d have an exact copy of what I said to compare against the speech-to-text output; instead, I’m relying on my own memory, which is susceptible to error (Shaw, 2016).

I’ve never used speech-to-text technology before and have never spent time considering it. What has become very apparent to me is how poor it is as a stand-alone form of communication. In the absence of an accompanying voice recording, we lose the sense of tone, pacing, volume, and other conventions of storytelling. In the absence of a video recording, we further lose gestures and facial expressions. In the absence of a written script, we lose punctuation and flow. The rich details of communication are lost if we rely only on the raw transcriptions of speech-to-text technology.

If we’re just using the speech-to-text technology to analyze orality, which I think is the point of this task, the resulting text reveals to us the spontaneous nature of orality, which, in my case, differs quite significantly from my written text. Specifically, my written text would omit instances of repeated words, self-corrections, filler words, and informal pronunciations/spellings. I would also have included much more detail in my written work. As Boroditsky (SAR, 2017) notes, when we speak we convey only a small portion of information and hope that the listener fills in the gaps. This is certainly true of my spontaneous oral story: I conveyed only a small portion of the information, whereas I would have been more descriptive in my writing.

In chapter one of Orality and Literacy: The Technologizing of the Word, Ong (2002) explains that as the technology of writing progressed, it moved from the transcription of oral speeches to being produced specifically as written text. In other words, it went from being a method of recording an instance of oral utterance to being used strictly to produce an idea in written form. The act of using language to produce a piece of written text changes the way we use language compared to using it to tell an oral story. To return to the work of Gnanadesikan (2009), she describes writing as a time machine, and to that end I can copy and paste my writing exactly, ad infinitum. Having used the speech-to-text technology twice by retelling the same story, however, I can see that oral transmission of information differs from one telling to the next. Perhaps this quality of orality could be positioned as a problem that needs to be solved, yet it might also be a benefit of orality.

Iseke and Moore (2011) discuss the transformation of oral stories into video, specifically in the context of Indigenous elders and traditional storytelling. Though their discussion focuses on video recordings, I think it can be extended to the idea of transforming a story from the oral tradition into written form. According to Iseke and Moore (2011), when we restrict a story to video or text, we lose the ability to modulate the story for the needs of our audience. Children, for example, may require a different version of a story than adults. Recording a story necessitates a more generic version of it and eliminates the nuance and complexity that come from being able to make adjustments for a variety of audiences (Iseke & Moore, 2011). Indigenous storytellers have developed the skills needed to determine what an audience knows and to decide the context for their stories; this is quite distinct from what happens when we freeze a story in time by recording it.

Orality differs from the written word in numerous ways, from the language we adopt, to the formality of the story we tell, to the details we choose to share, and everything in between. This is only a small discussion of those differences, and certainly there is more to be said. I’d be interested in knowing about the oral traditions in other families, the stories we pass down from generation to generation, and the need we have to preserve them. As I write this in the bathroom while my five-year-old takes a bath, she is asking me to tell the story of when she was born. I’ve told this story to her before and she loves listening to it. I’ve never written it down.

Unscripted oral story captured by Microsoft Surface speech-to-text technology

References

Gnanadesikan, A. E. (2009). The first IT revolution. In The writing revolution: Cuneiform to the internet (Vol. 25, pp. 1–12). John Wiley & Sons.

Iseke, J., & Moore, S. (2011). Community-based Indigenous digital storytelling with elders and youth. American Indian Culture and Research Journal, 35(4), 19–38. https://doi.org/10.17953/aicr.35.4.4588445552858866

Ong, W. J. (2002). The orality of language. In Orality and literacy: The technologizing of the word (pp. 5–15). Routledge.

SAR School for Advanced Research. (2017, June 7). Lera Boroditsky, how the languages we speak shape the ways we think [Video]. YouTube. https://www.youtube.com/watch?v=iGuuHwbuQOg&t=516s

Shaw, J. (2016, August 8). What experts wish you knew about false memories. Scientific American. https://blogs.scientificamerican.com/mind-guest-blog/what-experts-wish-you-knew-about-false-memories/

 


2 thoughts on “Task 3 – Playing with speech-to-text technology”

  1. marwa kotb says:

    Deirdre, you raise very thoughtful ideas in your post. I like the statement, “the act of using language to produce a piece of written text changes the way we use language compared to using it to tell an oral story,” because I also thought that the experience would be very different if I weren’t keeping an eye on the software outcome. I might have been more at ease with my pronunciation; I might have sounded more natural. Clearly, I was cautious as I was narrating to make the tool recognize the words; that’s not how I act when telling a story. I appreciate that you took this exercise further and experimented with different tools. I think that some tools may be more attuned to a variety of dialects than others; perhaps this might help in my case.

    • DeirdreDagar says:

      I hadn’t thought about accents or dialects. Is there speech-to-text technology that is specific to different languages? It would make sense if there was. Did you try using the technology in different languages?
