The following is my voice to text story. I used my iPhone, opened a text to myself and used the microphone button to record it:
So for this activity, I had a bit of a challenging time trying to think about what to dictate to my phone, but I figured I’d share a bit about a festival that my partner and I are volunteering for next weekend. It’s called Heart Of The City arts and music festival and it gives an opportunity to intercity artists to showcase their arts on a large stage for some of them very first time there will be musicians of all types. Visual artists who are also doing workshops. And there’s also a beat poetry stage, which I usually like hanging out around The festival takes place in Giovanni Goto Park, which is in the Mick neighbourhood and Edmonton. It’s generally a rougher area of town but for this weekend it brightens up and is filled with festivities that all people are welcome to enjoy. There’s also opportunities for people in the neighbourhood to be vendors selling their wears art writing and creative expressions. This will be the festivals 20th year which is really cool to think about.
We’ve been volunteering with this festival since about 2014 and it’s really need to see how it’s evolved grown and changed over the years in previous years. I helped with the art direction and organization of the art tent and that was a really fulfilling position. I remember that year we had an indigenous beading workshop we made peace flags did record painting and made scenes. We had a local Macaulay resident teach yoga in the field during the music Acts and we also had an artist from the neighbourhood who makes large scale sculptures create a wire frame that community members would come and add paper mesh to. He ended up creating a large dragon piece that was about 8 to 10 feet long. It was so neat to see all the different people come along, and add their own piece to the dragon sculpture and eventually by the end of the festival, everybody was adding their own paint to the skin of the dragon each year. There’s a different theme. This year‘s theme is reboot and it should be interesting to see what kind of interpretations there are of that in the music art and poetry that takes the stage.
This task was interesting for me. I generally do not use voice to text technologies, so this felt a little awkward to do. The first thing I noticed is that whenever there was a natural pause in my speech, sometimes the technology would note it as the end of a sentence, and sometimes it wouldn’t. It would sometimes incorrectly indicate the end of a sentence based on the words I used, or maybe based on where I took a breath in my speech. For instance, in this passage, the thought should have ended after the last “dragon”: “It was so neat to see all the different people come along, and add their own piece to the dragon sculpture and eventually by the end of the festival, everybody was adding their own paint to the skin of the dragon each year. There’s a different theme.” In all honesty, there were awkward pauses as I completed this assignment because I was a little distracted watching the text appear as I spoke.
The text deviates from written English in a few ways. Many of the proper names of places were misspelled, and this likely has to do with the way I pronounce them (or the fact that the voice to text technology has not heard these terms before). For instance, Giovanni Goto Park should be Giovanni Caboto Park. Mick/Macaulay is a neighbourhood in Edmonton, and it should be McCauley. If another local person was listening to me talk about this festival, I don’t think there would be any question about which park or neighbourhood I was talking about. However, having it captured in text incorrectly, due to my pronunciation, could misdirect and confuse a reader who’s unfamiliar with these places.
Another way the text deviates from written English is in the punctuation (or lack thereof). What one might identify as a ‘run-on sentence’ in the written form is far more acceptable and understood in oral speech. I noticed that the voice to text functionality missed some periods and commas, but still capitalized a word because it is usually used at the beginning of a sentence. For instance, in this passage: “And there’s also a beat poetry stage, which I usually like hanging out around The festival takes place in Giovanni Goto Park, which is in the Mick neighbourhood and Edmonton.” This could have been related to the cadence of my speech; perhaps I paused less between the end of the first thought and the beginning of the second thought.
A few words were captured wrong, such as intercity instead of inner city, scenes instead of zines, and need instead of neat. Again, this likely has to do with the clarity of my pronunciation and the ability of the technology to identify and transcribe words that are perhaps less commonly used in particular contexts or orders. But it underscores the differences in expression in oral texts versus written texts based on each individual’s linguistic qualities.
A scripted story would probably have followed more of a sequence, with a beginning, middle and end. It would also include more formal writing, and have fewer of the informal “ums” and “uhs” that people naturally interject into their stream-of-consciousness speech. Unscripted stories allow the speaker to jump around in time, follow intuition and interest, and spend more time on particular details.
Oral storytelling allows for more personality, emotion, and animation by the individual telling it. An oral story is dynamic and may not be exactly the same each time it is told. It can also be influenced by the listeners (in the live environment). The listeners may ask questions, or have reactions to the story, which can impact the way the story is told. An oral story’s effectiveness depends a lot on the person who is telling it. Written storytelling is a static capture of what the author wishes to convey. It’s sequential, and lacks animation beyond the words written on the page. Each time it’s read, and regardless of who is reading it, it’s exactly the same. A written story has an existence beyond the person who wrote it, and lives well beyond the life of that person.
In oral storytelling, dramatic pauses, cadence of speech, emphasis on words, and the emotive quality of a voice is integral to how a story is received. This can sometimes be captured in text by using formatting. I am thinking of poetry that uses font style changes, spacing on the page, and bold or italicized words. However, text really cannot capture the nuances of a poem being spoken on a slam poetry stage: intonation, volume, rhythm, and even facial expressions that express the irony of a line in the poem are lost if one only experiences the poem in the written form.
Here is a great example of how spoken word lands differently than simply reading text. Kae Tempest, a notable poet/musician who I admire, recites a poem called “Getting On”. Listening to them speak their words, I truly feel something would be missing if one were to simply read this poem from a page in a book.
Ong (1982, as cited in Haas, 2013) states that “sight-based text (written or printed texts) fosters contemplation, analysis and critique, whereas the sound-based temporal world of speech is totalizing; it pours into and envelops the listener.” I might argue that with the advent of social media/video technologies, we are getting the best of both of these worlds. A recorded text like the one above incites contemplation and analysis, while also pouring into the listener/viewer.
Haas, C. (2013). The technology question. In Writing technology: Studies on the materiality of literacy (pp. 3-23). Routledge.
Tempest, K. [@kaetempest]. (2023, April 23). Divisible by Itself and One is out today! [Kae Tempest reciting Getting On]. Instagram. https://www.instagram.com/reel/CriSadjImGA/?igsh=MXl0OXRxM3dyd3Jy
Lachelle, your experience with voice-to-text technology and its limitations has commonalities with my own. I also found that natural pauses in speech often led to misplaced punctuation and sentence endings (in my case it didn’t add any periods), which disrupted the flow of my narrative. For example, when my students informed me about the kindergarten students, the continuous dialogue was transcribed as one long, uninterrupted sentence, similar to your experience with the dragon sculpture passage. This issue highlights how voice-to-text software struggles to accurately capture the nuances of spoken language, such as where sentences should logically begin and end based on context rather than mere pauses or breaths.
Moreover, the misinterpretation of proper names and specific terms is another area where our experiences align. In my story, the software occasionally struggled with the context of the names and actions, leading to potential confusion for readers unfamiliar with the situation. This mirrors your mention of Giovanni Caboto Park being miswritten as Giovanni Goto Park, which can mislead someone who doesn’t have local knowledge.
Thanks for sharing your experience!
Hi Joti,
Thanks for your comments! I read your story, and I also couldn’t help but chuckle about the visual of freeing a kid from his jacket, zipped around the pole! Kids just do hilarious things!
In reading your story, I noticed there is almost no punctuation! It’s wild that Microsoft Word’s voice to text technology seems designed to capture sentences in this way. Perhaps this is because a Word document’s inherent use is to create and revise drafts, whereas messaging software like iMessage or WhatsApp is intended to send a message to a reader right away. I’m just speculating as to why this is the case, but regardless, you’re definitely spot on to suggest the improvements needed in these tools to accurately capture the nuance of speech. We definitely had a similar experience that way.
Thanks for sharing your thoughts!
Hi Lachelle,
Thanks for a great post. I think that you have echoed what many of us noted in our voice-to-text experience: no punctuation and incorrect spelling.
Thank you for sharing the poem “Getting On”, by Kae Tempest. As you noted it “is a great example of how spoken word lands differently than simply reading text.”
This got me thinking about another popular example of the spoken word, TED Talks, which highlight your point that “dramatic pauses, cadence of speech, emphasis on words, and the emotive quality of a voice is integral to how a story is received.”
Here are two of my favourite TED Talks.
~ “The danger of a single story” by Chimamanda Ngozi Adichie (https://www.ted.com/talks/chimamanda_ngozi_adichie_the_danger_of_a_single_story?language=en)
~ “Every kid needs a champion” by Rita Pierson (https://www.youtube.com/watch?v=SFnMTHhKdkw)