540 Task 3: Voice to Text Task

Platform used: Apple Notes.

“For this five minute unscripted speech to text exercise. I’m going to tell a little story about a road trip. This is a road trip that I took with three friends 13 years ago. My best friend from Germany and his friend that was visiting him from Germany. And another friend from Korea, who later became my wife and is now the mother of my children, and we’ve been happily married for the last 12 years we took a road trip from Calgary to Vancouver Island, and for anybody who hasn’t done that. It is a wonderful experience and a highly recommend it. so there’s two instances that are a bit funny but sometimes my wife and I look back and laugh about on this road trip. The first was when my German friend rented the car in Calgary they were trying to set up the Bluetooth through voice activation in the rental car. However the voice recognition software had no idea what these two German guys were saying despite the fact that their English was basically fluent. So these two German guys are cursing this Bluetooth voice recognition thing, and as I was the only native speaker there I had to do it, and it could recognize my speech commands. we all had a good laugh over it. The second thing that happened on the road trip in a somewhat related situation was that we pulled into a Tim Hortons somewhere in a small town very late at night almost midnight. We went in and the only other people in there was, a gaggle of teenagers who were kind of being silly and frustrating. The old lady that was working there. I say old lady in my memory she was like 90, of course she can’t of been 90 but she was probably 70. It seemed like she was too old to be working kind of sad. She probably couldn’t afford to retire, but here she was at maybe 70 years old working the night shift in a Tim Hortons in some little town in the middle of nowhere and dealing with a bunch of teenagers putting in an order. 
So the four of us went up there and my comment now and my two German friends, whom keep in mind of them are native, English speakers and well I have to say at that time my wife’s English wasn’t that good, and the two German guy had a very strong accent And this old lady was so flustered with them trying to order. I remember her saying to us “are you guys just messing with me? Poor old lady had no idea, probably no exposure to foreigners at that time and just had no idea what they were trying to order. Once again, Mi, as the native English speaker had to step in and put their order in and assure her that no, they weren’t trying to mess with her. It was just really a combination of her being Very old and from a very rural area with little to no exposure to anybody who’s not a native language speaker. And for anybody who’s had that experience you might have noticed that people in the city understand people with accents a lot more than people in the countryside just for the fact that they are exposed to a broader diversity of of accents and maybe just the fact that they’re expecting it, and that they’ve learned from a different data set of Accents and texts and styles of speaking, etc. And I wonder if that same thing happened again now 13 years later if the Bluetooth voice recognition would recognize that German accent, on the very fact that these types of AI learning, language learning models are exposed to such greater data sets. Kind of the way people in a more diverse setting like a city or an urban area are exposed to a more diverse data set that kind of updates their language Model. So that’s kind of a funny story in my relationship with my wife that we look back on. But I thought it kind of related for this purpose and this course in the way speech relates to text in terms of language, Model and the way an accent can give or deny us access. 
And then in the follow up questions to this story, I’ll be analyzing the difference between text and speech what the difference would be if I wrote this out instead of said it.”

-end of voice to text-

  • How does the text deviate from conventions of written English?

What a wonderful exercise it is to have your extended speech laid bare in front of you, transcribed verbatim. Wonderful, that is, once you get past the embarrassment of it. The interesting difference is that there is no editorial process for our speech. We usually have one real-time filter from mind to mouth. Once those sounds escape your lips, there is no do-over. Oration, therefore, is also an art, and one that needs to be practiced. What is lost when a story is written as text is emphasis, tone, and cadence. Telling a story like this aloud, I would also impersonate the German accents and do the voice of an old lady. Oral communication itself varies greatly: having a conversation and working through a concept together, ‘thinking aloud’, is very different from telling a story, giving a speech, or giving instruction. The conciseness varies greatly. Written language can also vary, from story to instruction, but it is usually less organic and more polished than speech.

  • What is “wrong” in the text? What is “right”?

There are a few words that were ‘misheard’ by the voice recognition software, as well as incorrect punctuation. Interestingly, wherever I paused in my speech, the software decided I had finished my sentence, regardless of whether that made sense. For example: “and for anybody who hasn’t done that. It is a wonderful experience and a highly recommend it.” I also notice another error in that very same sentence, one that is all on me as the speaker: “and a highly recommend it”. The ‘a’ should clearly read ‘I’. It is so embarrassing; when I read it aloud, it sounds like a southern US drawl: a highly recommend it. It is quite funny. On one hand the voice recognition makes some mistakes, but on the other hand it reveals things about your own speech and pronunciation. I do not, by the way, have a southern US drawl, but the AI doesn’t lie, y’all.

Another mistake of mine that the voice recognition picked up is “of course she can’t of been 90”. ‘Of been’? How cringeworthy. Of course it should read ‘have been’. Again, I can’t blame that on the software; that is lazy speech that I would not have been aware of had I not seen it transcribed. I make different mistakes when I am typing, and different ones again when I am writing with a pen, most of which I notice and correct. You don’t often get to see your extended speech transcribed like this, however, and analyzing your own speech is a very interesting exercise.

  • What are the most common “mistakes” in the text and why do you consider them “mistakes”?

When I skim over the whole body of text, I see far too many instances of ‘and’ and ‘so’ joining sentences. These linking words allow a speaker to stall for a moment to compose the next thought, while letting the listener know that the speaker has not yet conceded the speaking space. A practiced orator may feel confident enough to leave a pause instead, which, if done skillfully (reading the audience for the appropriate moment), can have a powerful effect. Either way, these devices are difficult to convey in written text. They either look redundant, as when writing ‘and so’, or are hard to express, as when using the limited ‘…’ for a pause.

  • What if you had “scripted” the story? What difference might that have made?

Had I scripted this story, I would have put time into using descriptive, flowing phrases while also being more concise. I would have taken the time to make sure my story had a defined beginning and end. Perhaps we take writing more seriously because, while people listen to each other all the time, they tend to read by choice only what is quite interesting or well written. We never write something to fill the silence. In writing, there is a need to ‘make up’ for the inability to express through sound and gesture by using expressive and accurate language. It is a fascinating exercise to go back and forth between writing and oration, analyzing each. We often read aloud what we have written to see if it ‘sounds good’. Seldom, however, do we give ourselves the opportunity to really examine and improve our speech by seeing it transcribed. I have to admit that if I had scripted the story, I might have had more difficulty starting, something like writer’s block. Somehow it seems easier to start telling something without knowing exactly how it will come together.

  • In what ways does oral storytelling differ from written storytelling? 

Oral storytelling can be tailored to the audience. The orator can adjust the story on the fly as they read the audience. The orator also breathes life into the story with their own expression of character and style. The exact same story told by two different people may feel quite different indeed. If you read a stand-up comic’s script, it may not be that funny without the performer breathing context into it. Publishing a story in text, however, is like giving birth to a new life. That written story is now independent of the writer, out in the world to be interpreted at will across the ages. It is up to the reader to choose if and when to consume it, and how to interpret it. Reading is a more personal journey than listening. Speaking and writing are two very different art forms and skill sets. One of the beautiful things about text is that it will almost certainly long outlive the storyteller. It can be said that to love to read is to journey into time and make friends with the dead.

By Richard Payne
