Link 1

Graeme Baerg’s Task 3 (Voice to Text)

My Task 3 link.

Task 3: Voice to text

Below I delve a bit deeper into reimagining Task 3, but first I wanted to mention Graeme’s blog layout. The website is incredibly easy to navigate and I really enjoyed that there were three main pages to look through: main tasks, final project, and linking assignment. The simplicity is calming and I found the font choice really satisfying to read. I sometimes get lost in adding pictures or making things stimulating, but there is definitely a beauty in simplicity that I should embrace.

For this linking assignment, I wanted to take on this task once again, but with a little spin. Below you will find two recordings and two texts. All of these have the same intention – to share my thoughts on my first link in this linking assignment.

There is no need to listen to and read everything, as they all serve the same purpose; this was more to show the variety and to put further thought into this task. (The first audio is quite lengthy and maybe not the most coherent, as I was expressing my thoughts using bullet points as a guide rather than a full text – I struggle with this.) At the bottom of this link, you will find some final thoughts.

  1. First, you will find a recording of me expressing myself orally, with no script, only some bullet points as a guide.
  2. Second, there is a recording made with the purpose of having voice-to-text turn it into a coherent text. I am reading from a script, but I have not scripted in the punctuation – I am trying to remember to add it as I go along.
  3. Third, the text that has come from the voice-to-text recording. I used Microsoft Word, the same voice-to-text software that I used for my original task and one of the ones that Graeme used. (A rough sketch of how this kind of transcription can be done outside of Word follows this list.)
  4. Fourth, my written text for this assignment.
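Item 3 was produced with Microsoft Word’s built-in dictation, which happens inside Word itself. For anyone curious what the same general process looks like outside of Word, below is a minimal sketch in Python using the SpeechRecognition package and Google’s free Web Speech API. This is only an illustration of the idea, not the tool I actually used, and the file name is a placeholder.

# A minimal, hypothetical sketch of programmatic voice-to-text.
# Not the tool used in this task (I used Microsoft Word's dictation);
# "recording.wav" is a placeholder file name.
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load the audio file and capture its contents.
with sr.AudioFile("recording.wav") as source:
    audio = recognizer.record(source)

try:
    # Send the audio to Google's free Web Speech API for transcription.
    transcript = recognizer.recognize_google(audio)
    # Like Word's dictation, the raw transcript typically comes back without
    # punctuation, leaving the structure up to the speaker.
    print(transcript)
except sr.UnknownValueError:
    print("The audio could not be understood.")
except sr.RequestError as err:
    print(f"Could not reach the speech recognition service: {err}")

Either way, the recognition service does the transcribing, but the speaker is still the one responsible for supplying the punctuation and structure that the written page needs.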

1. Recording 1:

I could not upload this recording, so I am sharing it via my Google Drive.

2. Recording 2:

 

 

3. Speech-to-text:

I decided to look more closely at Graham’s post as they used both Microsoft Word and Google documents apostrophe speech to text software. For my task, I had only used Microsoft Word, but by using two different technologies for the same task, Graham was able to see the strengths and weaknesses of both. By seeing both texts side by side while listening to the recording, I was able to see the mistakes they both made. Surprisingly, both Microsoft Word and Google documents detected the same mistakes, but wrote down different interpretations.

One of the most visible challenges was the lack of punctuation. Grand stated how “while it is possible to identify sentence and phrase structures in the writing, there are instances where it isn’t clear where to punctuate.” this is a challenge when it comes to interpreting at times, as we need to “provide visual rather than auditory cues for information structure in written English” [more, 2016, P. One]. Graham also noted the lack of consistency with the tenses and, when the interpretation is incorrect, the text “careens toward nonsense”.

In my task, I had said that, when reading a text, the author must select words carefully and choose their punctuation wisely seeing as they are not able to use their voice to show the changes in emotion or energy. Graham also notice that “most obvious elements to disappear are the cadence and expression of the aural recording.” intonation and emotion are mostly last in written language [can an I guess he can, 2013].

When speaking with the intent of having a speech to text software write down the text, it feels quite unnatural. Graham shares how we need to change the way we speak, like our renunciations, the pace, and the fluency of our speech open brackets schmandt besserat, 2009]. This is not something we tend to focus on when communicating orally. Once we see our text written hour after using text to speech, we automatically see where we need to enunciate better, add in punctuation, etc. We can make our talk sound more like writing only after we see writingopen brackets Abe abboud, 2014 close brackets.

Finally, Graham mentions how, when giving this software to students with physical or learning difficulties, one would need to consider all these complications. During my first task, I miss the linking speech to text to its potential uses. Those that thrive in spoken word might struggle in writing, so speech to text holds a lot of potential. However, it would perhaps be too challenging to use as one has to think clearly about all these extra challenges like correcting misinterpretations and including punctuation. I also missed linking this back to all the oral traditions we may have lost in the transcription to written text. Oral storytelling gets disrupted when being turned into writing. Humans may be able to distinguish meaning and pick out mistakes, but there is still space for human error to change the original intonation and meaning of a story.

4. Written link:

I decided to look more closely at Graeme’s post as they used both Microsoft Word and Google Documents’ speech-to-text software. For my task, I had only used Microsoft Word, but by using two different technologies for the same task, Graeme was able to see the strengths and weaknesses of both. By seeing both texts side by side while listening to the recording, I was able to see the mistakes they both made. Surprisingly, Microsoft Word and Google Documents both stumbled in the same spots, but wrote down different interpretations.

One of the most visible challenges was the lack of punctuation. Graeme stated that “While it is possible to identify sentence and phrase structures in the writing, there are instances where it isn’t clear where to punctuate.” This becomes a challenge for interpretation at times, as we need to “provide visual rather than auditory cues for information structure in written English” (Moore, 2016, p. 1). Graeme also noted the lack of consistency with the tenses and that, when the interpretation is incorrect, the text “careens toward nonsense.”

In my task, I had said that, when reading a text, the author must select their words carefully and choose their punctuation wisely, seeing as they are not able to use their voice to show changes in emotion or energy. Graeme also noticed that the “most obvious elements to disappear are the cadence and expression of the oral recording.” Intonation and emotion are mostly lost in written language (Gnanadesikan, 2011).

When speaking with the intent of having speech-to-text software write down the text, it feels quite unnatural. Graeme shares how we need to change the way we speak, such as our enunciation, the pace, and the fluency of our speech (Schmandt-Besserat, 2009). This is not something we tend to focus on when communicating orally. Once we see our text written out after using speech-to-text, we automatically see where we need to enunciate better, add in punctuation, etc. We can make our talk sound more like writing only after we see writing (Abe Aboud, 2014).

Finally, Graeme mentions how, when giving this software to students with physical or learning difficulties, one would need to consider all of these complications. During my first task, I missed linking speech-to-text to its potential uses. Those who thrive in the spoken word might struggle in writing, so speech-to-text holds a lot of potential. However, it would perhaps be too challenging to use, as one has to think clearly about all of these extra challenges, like correcting misinterpretations and including punctuation. I also missed linking this back to all of the oral traditions we may have lost in the transcription to written text. Oral storytelling gets disrupted when it is turned into writing. Humans may be able to distinguish meaning and pick out mistakes, but there is still space for human error to change the original intonation and meaning of a story.

Final Thoughts:

I enjoyed creating this first part of my linking assignment. Originally, I wanted to simply write it out, but then I thought I could create an audio recording in the spirit of changing up my delivery. From that, and after reflecting on Graeme’s task, I felt it would be interesting to see the difference between something shared orally with the intention of having an audience listen to the recording, and an audio recorded with the purpose of having it read as a written text. The result is what you see in the linking assignment above: the intent behind how your text will be taken in (listened to or read) completely changes the tone of the delivery and the quality of the text itself.

My first audio is quite relaxed and shows emotion, rhythm, tone, and mistakes. It is not very concise and I lose my train of thought a few times. The second seems almost robotic; I am trying hard to pronounce words correctly, include punctuation, etc., all without the added complication of my original task, where I included French names. The resulting speech-to-text transcript still includes some mistakes, as with my and Graeme’s original tasks, even though it was done with care. Finally, the written text, although including very similar information, has a very different feel than the first recording. I was able to concisely share the information I wanted to share, since I could edit it before including it in this post. It is formal and no emotion is shared. Without adding a lot of extra text, it would be very difficult to replicate the emotions and intonations of the first recording.

My very final thought is something I want to touch on, especially after the feedback I was given on my original task. The whole changeover from oral tradition to written text seems more understandable to me after trying these out. Granted, voice-to-text uses a computer, and my story for this assignment is not the same as a story carrying traditions, history, or legends. However, I can see how my version of oral storytelling was completely changed and fully disrupted once I knew that the content would be received as legible text. How would you transcribe movement, dramatic pauses, intonation changes, noise levels, etc.? Even though a human may have detected more mistakes than speech-to-text software, we must also leave room for human error, which leads us to wonder exactly how much was taken away when transitioning from oral storytelling to written stories.

References:

Abe Aboud. (2014, September 8). Walter Ong – Oral cultures and early writing [Video]. YouTube. https://www.youtube.com/watch?v=uvF30zFImuo

Gnanadesikan, A. E. (2011). The First IT Revolution. In The writing revolution: Cuneiform to the internet (pp. 1-10). Wiley-Blackwell.

Moore, N. (2016). What’s the point? The role of punctuation in realising information structure in written English. Functional Linguistics, 3(6). https://doi.org/10.1186/s40554-016-0029-x

Schmandt-Besserat, D., & Erard, M. (2009). Origins and Forms of Writing. In C. Bazerman (Ed.), Handbook of research on writing: History, society, school, individual, text (pp. 7-26). Routledge. https://doi.org/10.4324/9781410616470
