Our task this week was to record a five-minute oral story using a speech-to-text program and then to analyze the results.
This activity was a different experience for me. Normally, for an assignment, a presentation, or a lesson with my students, I have at the very least outlined what I want to say, to make sure that I cover everything I deem important to get across. Speaking unscripted is not uncommon, as I have conversations every day, but in conversation there is rarely the intent to make a single statement that will stand on its own without explanation, elaboration, or the chance for follow-up.
I was alone when I recorded this story and used the speech-to-text function of the Notes app on my phone. In keeping with what I believe to be the spirit of the activity, I intentionally did not look at the screen to check the accuracy of what was interpreted from my speech, nor did I dictate the punctuation that I normally include when using speech-to-text for text messaging. When I reviewed the resulting text later in the day, I was disappointed in myself because, to me, key details of the story were missing; the things that add flavour and depth to the events described weren’t there. Had I been writing out the story, I would likely have caught those omissions during a review or proofreading and added them in. Similarly, if this had been an actual conversation, or if I had been telling this story to other people, I could have judged the need for additional information from the reactions of my listeners, or the missing details would have come out through questions.
As a teacher I am used to sharing information orally, but like many who have grown up in a literate culture, I rely on written text as a memory aid: a note of ideas that come up in the middle of the night, or a voice memo while driving, serves as a way to store information that may be needed in the future. As I mentioned earlier, I usually have notes, whether on the board or on my desk, when teaching. They help to keep me on track (my students can speak to my propensity to go off on tangents) and highlight key elements that need to be covered. I also tend to vary my ‘stories’ based on the reactions and responses of my students. I have not undergone the training, repetition, or apprenticeship of orators in cultures without a written language, who have trained their memories to recount history and cultural elements precisely, with minimal variation, time after time (Ong, 2002, p. 8). As a result, I need notes or practice for my ‘story’, which I probably should have done in this scenario.
Looking at the generated text, two things immediately stood out to me. One was the length of the text; I would have expected much more of it, though I was speaking more slowly to give the software a better chance of success. The other was the lack of paragraphs: it is all one body of text, in which many separate thoughts that the normal conventions of English grammar would keep apart are seemingly combined into one. On closer review, while some punctuation exists, in many cases it is missing, creating run-on sentences and, like the lack of line spacing, effectively combining separate thoughts into one. Proper nouns are capitalized in some cases but not in others. In general, the generated text, to me at least, lacks emphasis and inflection. It seems very flat, something that punctuation and grammar could have alleviated or, in the case of a truly oral story, tone, inflection, and body language.
The software missed some words completely, and in many cases included a different word altogether. To me, as someone who is not trained in phonology, the words generated don’t match the phonemes of the words I spoke; for someone who is trained in phonology, or who has learned English more recently, these replaced words may make more sense (Gnanadesikan, 2009, pp. 7-9). I have what I believe to be a neutral accent and am a native English speaker. I wonder what the results would be for someone with a strong regional or dialectal accent, or for someone for whom English is not their first language. The text also looks much like the typed work submitted by my students. I had always assumed that the improper grammar and word usage were the result of them using their phones’ keyboards to record their thoughts. Now I wonder how much is generated by speech-to-text. Unlike many of my students, though, if it weren’t for our instructions not to do so, I would have used the generated text as a starting point and then gone through it to correct the grammar, word usage, etc.
It is easy to stand back and point out all the flaws in the software’s attempt to transcribe the spoken words, but to look at the forest instead of the trees, the progress that has occurred even in the last decade is amazing. When I was studying in university, any oral information, whether given as a lecture or in a recording, had to be transcribed by hand; there wasn’t software that could do it. To see speech-to-text in such a common device as a cellphone, or in the live captions of a YouTube or TikTok video, is amazing, a literal example of what Ong (2002) has to say about the relationship of oral expression and writing: writing can’t exist without orality (p. 8). Most of the words I spoke were correctly identified and spelled. The software, in places, used the correct punctuation and grammar. When the recording was complete, the software recognized that there were likely errors and made suggestions as to what I might have said instead, suggestions that I ignored for the purposes of this assignment.
In comparing oral and written storytelling there are a few key differences.
In oral storytelling the teller can set the mood and convey emotions with the tone and volume of their voice, and also by changing their body language and proximity to the audience if it is an in-person or filmed telling. Changing the pace of their speech can draw in the audience and/or add an element of suspense to the story. The audience is immersed in the experience and can, in some cases, mentally visualize the events of the story better, as they don’t have the cognitive load of translating the words on the page into sounds in their head and then again into a mental picture, as with a written story (Gnanadesikan, 2009, p. 8; Ong, 2002, p. 8). Unless it is written down, an oral story will only last as long as there are people who remember it and pass it along to the next generation. As we are seeing with the attempts to record indigenous languages around the world, in both oral and written form, if there are no longer people who speak a language, the stories and the elements of culture they contain will be lost as well, at least in part, as there is no absolute translation between languages (Gnanadesikan, 2009; Hadley, 2019; SAR School for Advanced Research, 2017).
With written storytelling there is a longevity to stories that a purely oral story does not have; they are preserved on the page/tablet/internet for as long as those mediums and all of their copies exist. The story is also more static, as it is not reliant on the memory of the teller to tell it the same way time after time, though different versions and retellings may emerge as time goes on, especially if the story is copied by hand. Elements that represent intonation, emotion, volume, and pacing in a written story need to be intentionally included by the author because, as Gnanadesikan states, written stories are a recording of language, not speech, and much of the information about speech is lost (2009, p. 9). Written storytelling can allow for a much more fragmented experience: it gives the reader the flexibility to consume it in small sittings of a few minutes or to binge the whole story in one sitting, something not possible in live oral storytelling. Written storytelling also reduces the need for the reader to remember as many key details, as they can always flip back a few pages or chapters to find a needed piece of information. Whether this reduced need for a trained memory is a good or a bad thing is very debatable, and it can have wide-reaching impacts on society.
Speech-to-Text Results
I was having a conversation yesterday with one of the ladies in our school office and the topic shifted to today being Friday. I know she was excited because it was payday for our keeping staff. The first teachers payday was last week and I was commenting it didn’t really have a paycheque this week because it came in and went straight back out. I had a tire flat tire on the way to work last week and it was so bad that it could not be repaired and because the spare tire already had damage. It meant that I was buying a new set a tires for my vehicle, I drive a jeep so it meant not for new tires but five new tires and it turned into a whole hassle and that the wrong tires were ordered and then the proper tires had to be ordered and installed for two appointments and two sets of tires later I got the right set of tires for the vehicle and looking at it with the wrong set a tires on it. It really reaffirmed to me just how much I wanted the tires that I wanted, a witch from entering’s and one of the reasons why it caused me to reflect I’m just all the adventures I’ve had with the vehicle and one in particular came to mind where most of my adventures were planned out and I prepared and I had all the proper equipment and all the Safety equipment in the extra parts and tools and etc. 
but this time I’ve been up for a hike past Whistler and Erin Falls, and on my way back as I was getting back to the vehicle I saw the firetrucks go by I saw an ambulance go by on the highway in the direction I needed to go when I went shoot And Shernoff as I got going, I got the radio on TuneIn to hear that there was an accident logging truck and caught fire inside of the highway and the highway was closed and as I hit the end as I hit the closure, the end the lineup I look to my right into my right was a logging road and forestry road that I had wanted to explore for a couple of years and driven by it Probably 100 times and that one of these days I’m gonna do that road and I decided that was the day that I was gonna do that Road so turn off turn the breaker on turned off the road went up explore the road for about two hours and when I got back Back to the highway the lineup was still there so I rejoin the lineup and I was talking to people in the vehicle came limping out of that road I just saw them pulled over but I hadn’t realize it they Donely made it up a few hundred metres before they got a flat tire and the tires I had the Monterey they had on the truck on the jeep And had no problem so even though people have told me that they’re there no good the toddler expensive they really allow me to have the adventures that I have without the concerns of getting a flat tire and on some of these back roads and the irony of it is an all of my time driving this vehicle thousands of kilometres of gravel and back roads and decommission roads. I’ve never had a flat tire in fact in all the years I drove from BC rail onto those back roads and on the track this was the first tire the first tire I had where it was flat within 10 minutes. It was that bad. I’ve had a couple of other little nails and whatever that Cause the tires need to be patched or pumped up never one word it was this bad and the irony is it wasn’t on the back road. 
It was on my five minute commute to work on the main road. It’s kind of like the windshields I’ve had to replace on the vehicle that is not the back roads. It’s not the gravel roads That I’ve gotten them I’ve had windshields destroyed. It’s been major highways every single time so when people say the back road is so dangerous all of my major damage is occurred on the main roads and on the main highways
References:
Gnanadesikan, A.E. (2009). The first IT revolution. In The writing revolution: Cuneiform to the Internet (pp. 1-12). John Wiley & Sons. https://onlinelibrary.wiley.com/doi/book/10.1002/9781444304671
Hadley, H. (2019, January 11). New Indigenous language app targets ’21st century’ learners. CBC News. https://www.cbc.ca/news/canada/thunder-bay/indigenous-language-app-1.4970376
SAR School for Advanced Research. (2017, June 7). Lera Boroditsky, how the languages we speak shape the way we think [Video]. YouTube. https://youtu.be/iGuuHwbuQOg
Really fun (and difficult) story to read and follow, Mike. Your thoughts on the experience of speaking without a script and having speech-to-text software transcribe your words are really interesting. It’s not something we do every day, so the challenges are expected. Speaking off the cuff is a different beast, which is perhaps why so many people are terrified of public speaking.
You mentioned how you didn’t check the transcription while speaking, which is something most of us wouldn’t do either. It’s a bit like having a spontaneous conversation where you don’t plan every sentence in advance. But when you looked at the transcribed text later, you felt like some important details were missing. It makes you realize how much we rely on written text to remember things. I’ve told various stories hundreds of times over the years, with each retelling highlighting something else or forgetting an important detail.
You mentioned that you’re not like those orators in cultures without a written language who can precisely recount stories without any notes. They’ve trained their memories to remember every detail. For most of us, notes are a must. I’d like to know if you have a memory or a story that you can basically retell verbatim? I have a few up my sleeve!
When I read and looked at the generated text, I saw the same issues you did. It all ran together, which isn’t how we usually write in English. Punctuation was missing, which made sentences run on. I noticed this in my generated text story as well.
You brought up a great point about accents and non-native English speakers. How would the software perform with strong accents or different languages? It’s something worth exploring. Do you think we will get to a point where the software will be so good that it will perfectly transcribe English even through a thick foreign accent? I think so, but it won’t be for a while.
You also noted that the text resembled what your students sometimes submit. This is such a sad commentary on the state of education! What I read from students tends to include run-on sentences and multiple strands of thought.
Nice comparison of oral and written storytelling. I agree that oral storytelling is incredibly powerful, and can convey emotion through tone, body language, and interaction with the audience. Written storytelling has permanence but relies on the author to express emotions and pacing explicitly. Quality versus longevity.
Overall, your reflection offers a deep dive into the intersection of spoken and written communication, the influence of technology, and the unique aspects of each storytelling medium. I’m encouraged to think about how technology is changing our traditions of storytelling and language preservation. Thanks for the entertaining read, Mike.
Simon,
Thank you for your reply and the extending questions.
You are absolutely right, there are stories that I have told time and again and, like yours, they vary slightly from time to time: sometimes with more detail, sometimes with less; sometimes the story is longer, sometimes shorter, but the essence of the story is the same.
I have a few stories that I can retell verbatim time after time. As a teaser… one involves the time I was being stalked by a grizzly bear in northern BC while working for BC Rail.
As to recognizing people with stronger non-Western North American English accents, I can’t help but think of the AI models for facial recognition that were trained using data sets of white male faces and had issues identifying women and people of colour. In some cases they couldn’t even identify them as human! I wonder about the data sets that were used to train the voice recognition software. I think you are correct that we will reach a point where speech-to-text software will be able to properly recognize English and other languages without errors, but that it will not be for a while yet.
I want to be clear that I am not opposed to my students using speech-to-text software in their work. My issue with them using it, or any other digital tool, to complete their work is that they aren’t using the software in the way it is designed to be used. I used to be amazed that students didn’t understand what the red or blue underlining of a word or section of their work meant. They assumed that the software was autocorrecting anything that was wrong for them. These tools are very powerful and offer a great advantage that didn’t exist when I was in high school: they not only spellcheck but also grammar check students’ work and offer suggestions on how to improve their writing. The “Editor” function in the latest version of Word has been a great asset for me in the MET program in helping me to communicate my thoughts more clearly, much like Dr. Pena mentioned in the Zoom session this morning about international students using ChatGPT to improve the readability of their writing. What is missing is that students, mine at least, are not taking the time to go back over what they have written to see if what is on the screen is what they meant to say, and then to use the tools to improve it.
Hi Mike!
It was lovely to hear from you in my voice-to-text submission and to be in another course with you. Actually, we had some discussions in our last course together about some covid-related topics that have stuck with me and I have never forgotten our conversations. You really made me appreciate a few things at the time that I hadn’t been particularly appreciating and I wanted to say thank you for that!
I enjoyed reading your story despite it being slightly difficult to understand in parts. It reminded me so much of my own, and I could certainly relate to many of the things you mention in your follow-up notes. It sounds like a really rough day and I can definitely empathise there! It does seem easier to remember (sometimes verbatim) the more difficult days rather than the good or non-challenging ones. For some reason, bad experiences are often easier to remember and stick with us for longer. I wonder why this is? I too felt disappointed when I read through my script because, to me, it sounded spacey and not that intelligent. It almost reminded me of a script from the ’90s movie Clueless! And I wondered, do I sound like this in real life when I speak? I can also agree wholeheartedly that the lack of paragraphs was a point of frustration, and I know the reason that I appreciate carefully organised paragraphs is that it is easier, cognitively and visually, to follow a format with spacing and a clear start and end point for each idea; much like when we use bullet points in a visual presentation, large sections of pure text can be quite tiring for both the brain and the eyes (forgive this very long-winded sentence btw!).
Another interesting point that you mention is that we are not trained orators, nor have we grown up in traditional storytelling cultures that teach and train us to remember things without notes or prompts. As someone who relies heavily on notes, memos, sticky notes, you name it, I often wonder whether, if I had lived in a pre-text age, I would have a sharper memory and/or ability to remember stories or even songs. (Twinkle, Twinkle is about the best I can do, in several languages, but that is really not that impressive considering it’s only about 25 easy words and everything rhymes.) This applies to numbers as well: I wonder if I would be better with them if I had grown up in a time without calculators (my guess is no!).
Your story did also make me very nostalgic for Canada, the mountains, and having a Jeep, all of which I used to have in my pre-Japan life in Canada! Here, I do not have a license or a Jeep, and listening to your day, albeit a very frustrating one for you I can imagine, makes me miss Canada and all the wonderful nature, great outdoors, and freedom to explore that you have. I guess with all that wonderful privilege, we just have to take the bad days as they come and try to appreciate the days when there are no major disasters/frustrations. As far as voice-to-text technology goes, there is still obviously a lot of room for improvement, but perhaps we can use it as a tool to start training our brains to remember things better. Thank you for sharing your story with us and for allowing us a peek into your outdoor adventures!