Task 3: Voice to Text Task

Image source: completemusicupdate.com

I was very intrigued coming into this task. In my professional life, I have done a lot of work testing different speech-to-text programs for their quality and features (e.g. speaker identification, support for multiple languages, etc.). So I am well-versed in the use of speech-to-text, but in this exercise I looked forward to exploring the questions around oral versus written communication and how effectively speech-to-text captures this nuance.

I will be using the following questions to prompt my reflection:

  1. How does the text deviate from conventions of written English?
  2. What is “wrong” in the text? What is “right”?
  3. What are the most common “mistakes” in the text and why do you consider them “mistakes”?
  4. What if you had “scripted” the story? What difference might that have made?
  5. In what ways does oral storytelling differ from written storytelling?

Here is the story in audio form:

And here is the link to my Otter.ai transcript: https://otter.ai/u/Yb5vML19QyAKTVVEBlD0k_2lfCg

Looking at the speech-to-text version, a few types of errors become immediately apparent. The overall accuracy of the text is quite good; however, a percentage of words are misheard or repeated. These ranged from small errors like “I myself, I’m Italian” (supposed to be “I myself, am Italian”) to less familiar concepts like the Italian “Aperitivo” being misheard as the English-sounding “Appa TiVo” and “app in a TiVo”. Similarly, names were sometimes misheard. Occasionally Turin became “Turn”. My friend Chandhi’s name became (comically) “Chad knees”. So, while this software was relatively effective, it struggled when it came to concepts and names outside of basic English.

Otter also tries to add appropriate punctuation to the text, which in some cases works quite well, but in other cases is a little heavy-handed. For instance, we see this sentence: “I was in my second Co Op work semester, in 2017.” Here, the additional comma detracts from the flow of the text and would not be added in written English. Importantly, in a story like this one, quotation marks would be used throughout, but this software did not have this capacity, so the reader is left to guess based on context clues. For example, when I explain the interaction with Ed Sheeran, Otter produced the following: “So I just tapped him on the arm. I said, Hello, I’m a huge fan. I hope you have a really nice night tonight. Can I shake your hand and he shook my hand […]”. Without the quotation marks we would use in proper written English, the text comes out clunky and unclear.

While the speech-to-text technology has the aforementioned flaws, especially when it comes to transcribing a casual narrative, there were some things that it got ‘right’. In my oral story, I use filler words like “um” and “uh” quite often. These words were automatically taken out of the text, showing how “smart” this technology is becoming. Surely, if those words had remained in the transcript, the reading experience would have been much more jumbled.

Overall, I think the clunkiness of the resulting text has more to do with my style of oral storytelling than it does with errors in transcription. Even in areas where all of the right words were picked up, the text still does not read as a proper written story should. When I speak aloud, I will often jump around chronologically or repeat things I have said before. For instance, I say a few times how important and exciting this experience in Italy was for me. And in one case, I interrupt my story of the evening in question to go back and relay the information that, earlier that day, I had mentioned to a friend that Ed Sheeran was performing a concert in the city. It is not uncommon in oral storytelling to have these small issues of repetition and chronology, whereas in written storytelling these issues are generally corrected during the writing process to create a better logical flow. I imagine that, in this way, my six-minute story could have been written more clearly and concisely, resulting in a much shorter text. Similarly, if I had scripted the story ahead of recording, I am sure the result would have been clearer and more concise.

 
