Task 3 – Voice-to-Text

Voice-to-text

I used https://dictation.io/speech

for this assignment I’m going to talk about the day I had yesterday I went to a zumba jam it was the first in person Zumba March 2020 or before that ever since the pandemic started they have been virtual and so it was really nice to get back in person Sam is a kind of Workshop or training for Zumba instructors ego and learn choreography that’s created by other Zumba instructors so it’s three hours we start with a warm-up the instructor who we call a Zumba Jammer basically leaves us through four or five songs that they’ve choreographed get down into chunks we learn each chunk and we repeat repeat and then put it all together for the song and we move on to the next song and then at the end we kind of repeat all the songs and finish with a cool down so it’s three hours of exercise of our body but also our mind as we’re learning new choreography and new moves sometimes be back in person as I said it’s been quite some time so faces I hadn’t seen in awhile and you know we did discuss the difference between being in person and being being virtually no people can have people are muted so you know when the answer to says how’s everyone doing and yes but they can’t hear so was nice to definitely be back in person we did have to follow some email guidelines as as for our local decisions in regulation but overall it was a it was a great day and you know I learned five new choreographies to bring back to my own Zumba classes and I’ll probably put in one or two new ones this week and save the others for later on I don’t like to put too many new songs in all at once because it’s learn all the new songs if I put them all in at the same time so I like to find the balance between putting a new songs but keeping the old ones so that people bored but they also can feel sick and yesterday I went out for ice cream and got myself my favourite that I always order which is ice cream dipped in chocolate and then it has little pretzels and caramel sauce drizzled all over it so it’s quite 
decadent but I figured after 3 hours of working out I deserved a little bit of extravagant yesterday

 

Analysis

When I look at the text created from this speech-to-text activity, I cringe a little. It is challenging for me to leave the text untouched, full as it is of mistakes. I think part of the reason I feel this way is the semi-permanence of this text, which now exists and is attached to my name. Haas (2013) and Gnanadesikan (2011) both wrote that writing has the ability to endure through time and space. What I actually said is now lost; all that remains is this poorly rendered transcription. If a person were to look at the text above with no other content and without reading this explanation, they would likely question my intelligence or ability.

The most obvious error is the complete lack of punctuation. From a quick glance and without even reading any words, one can easily spot that there are no periods and very few capital letters. The capital letters that are present do not mark sentence beginnings. Instead, they identify proper nouns. I find it interesting that the program realized that Zumba was a word that should be capitalized. I also find it kind of random that ‘workshop’ was capitalized, and I cannot seem to understand why that happened.

Beyond the lack of punctuation, there are also a number of words that are incorrect. The program seems to have misinterpreted what I said and has included a number of words I did not speak. Whether this is due to my speaking unclearly or a failing in the program, I am unsure. Either way, I imagine that a person reading this text will struggle to understand some of what is now written.

Creating this voice-to-text assignment was interesting because I felt myself being analytical right from the get-go. In some ways, knowing that I was going to be looking more deeply at the text that was created influenced my speaking by distracting me from the topic about which I was speaking. I distinctly remember two thoughts that came to me as I was speaking. First, I recognized how many times I said ‘um.’ The second distraction came from seeing the words, especially the incorrect ones, appear on the screen after I spoke. I noticed that sometimes these incorrect words were fixed, I believe because the program tried to make more sense of what was being recorded based on the other words I spoke. These distractions caused me to pause while I was speaking, but those pauses do not appear in the text, nor do the multiple ‘um’s I muttered.

The absence of the pauses and ‘um’s is significant because it exemplifies some of what is lost when speech is converted to text. Gnanadesikan (2011) identified this failing of writing: “it does not record the identifying details of any individual utterance of those words. It records language, but not actual speech” (p. 9). When I think of speech as it is recorded in novels, the author imbues the speech with emotion by adding small details about how the character spoke. These details act as clues that help the reader better understand the tone or emotion of the character. Adverbs such as quietly and excitedly or verbs such as whisper and scream provide readers with important insight that is lost when verbal speech is recorded by writing. Without this additional information, my speech-to-text creation lacks emotion. Additionally, there is no punctuation that would help a reader gain a better understanding of me as a writer. Had I typed this text rather than spoken it, I would have thought carefully about my use of punctuation, which would provide a reader with more information.

Had I recorded this story or spoken it aloud to an audience, there would have been a lot more information available to the consumers of this narrative. As it stands, the text lacks basic elements like punctuation and sentences, but it also misses some of the more personal components like feelings. A reader of this text has no clues as to the speed with which I spoke or the emotion I held in my voice. While these elements can be added in by skillful authors, it stands to reason that the process of transforming speech to text dilutes some of the power of our words. While writing has the advantage of communicating through time and space, spoken word has the advantage of emotion and passion.

 

References

Gnanadesikan, A. E. (2011). “The First IT Revolution.” In The writing revolution: Cuneiform to the internet (Vol. 25, pp. 1-10). John Wiley & Sons.

Haas, C. (2013). “The Technology Question.” In Writing technology: Studies on the materiality of literacy (pp. 3-23). Routledge.

 
