Task 3: Voice to Text
Transcript:
Let me tell you the story of the best night of my life. It all began with my love of historical fashion the extreme desire to attend a masquerade ball like Marie Antoinette. I started researching if any parties or masquerade balls like this existed in Europe, at the time I was living in London. I found that palace of Versailles does indeed host a masquerade ball Ella Marie Antoinette once a year on the third Saturday of June. I then set out to attend this ball in the grandest fashion I could afford. I found my gown on Etsy and ordered one to buy rather than to rent because I knew that this would be something I wanted to keep forever. The Gown arrived and it was perfect so then I booked a weekend in Paris with my best friend. We did research to find a hotel that would let us stay as close as possible to the Palace and so we booked a room at the Hilton which is actually on the premises of versailles. The room was incredible with a balcony that overlooks the grounds of Versailles and we I spent the whole day getting ready. In the morning we recreated a scene from Sophie a couple has film Marie Antoinette eye spying an enormous array of pastries and Delicacies to put out on a table so we could enjoy these delicious all dressed up in. I did the hairstyles for both myself and my friend which took about an hour and a half each. They turned out incredible. After the hair then we did some. Whip makeup and once this was finished it took about half an hour each to get ourselves into the gowns. We of course had appropriate undergarments including corsets stalking and pam yeas. Once we were all dressed up we enjoyed the pastries, add a mini photo shoot, and then at sunset. To The Gardens of Versailles for a fireworks display. After the fireworks were over the masquerade ball began at midnight. It was magical and everything I hoped it could be.
Questions to Consider:
- How does the text deviate from conventions of written English?
- What is “wrong” in the text? What is “right”?
- What are the most common “mistakes” in the text and why do you consider them “mistakes”?
- What if you had “scripted” the story? What difference might that have made?
- In what ways does oral storytelling differ from written storytelling?
My Response:
Having used Siri to reply to text messages while driving quite often, I knew that this task would not feel like simply telling a friend an anecdote about my life. I chose to tell a ‘story’ about one of my favorite life experiences – the first time I attended an elaborate masquerade ball in France (which I have since attended several times). This experience is so burned into my memory that I knew I could talk unscripted for far longer than 5 minutes about it, which I thought would make the task easier for me in a sense. However, even though I went into it knowing how stilted my speech would need to be for the voice-to-text to ‘work’ properly, the result was worse than I was prepared for.
All the magic and excitement that I normally have while sharing this story was completely absent from the final transcript. To ensure that the software ‘heard’ me correctly, I spoke very slowly and over-enunciated as much as I could. There was no natural cadence or rhythm to my speech, no excitement, and no variation in my intonation. I feel that this comes across in the transcript, as it reads more like an emotionless list of statements rather than what its authentic telling would have conveyed.
Where the transcript deviates:
The first half of the transcript is fairly accurate to what I said, but the latter half has several mistakes, a few of which completely obscure the original meaning. Some of the words I said simply were not picked up correctly by the software. Examples include my reference to the film Marie Antoinette by director Sofia Coppola and the word panniers, a French word for the historically accurate side hoops worn under gowns during the 18th century. I’m not surprised by these mistakes, though. The first is a person’s name and therefore unlikely to appear in a dictionary, which I assume is what the voice-to-text software draws upon. The second is a word from a language other than English, which, again, I assume is the language of the dictionary the software references. Some of the sentences near the end of the transcript were also punctuated in odd places; if I had to guess, I would attribute this to unconsciously speaking faster the longer I told the story.
If I had scripted this story, I think I could have used language to my advantage, and the transcript would have conveyed a lot more joy, excitement, and passion about this experience. However, if it were scripted, it would not have needed to be transcribed in the first place. That step would be removed, and the story could be edited and revised until I was happy with the final product. Still, this would remain a fundamentally different recounting of my experience than if I were to tell the story orally – and unscripted.
Telling the story orally would have the advantage of all the nuances of speech, which are a powerful accompaniment to any story. Ong (2002) says that “[h]uman beings communicate in countless ways, making use of all their senses, touch, taste, smell, and especially sight, as well as hearing” (p. 6). Telling it aloud, I would infuse my words with my mood and tone, and, if I were telling it to someone in person, with my physical expressions as well. Even in an audio recording, you can still hear the smile behind someone’s words, so to speak.
Of course, Ong also says that “[w]hen an often told oral story is not actually being told, all that exists of it is the potential in certain human beings to tell it” (p. 11). This is true of this experience. Whenever I am not telling the story of my first masquerade ball, it doesn’t exist anywhere except in my mind. If I stop telling it one day, the story is lost forever, unless I choose to write it down at some point. Even then, the versions I told orally would still be lost. I am absolutely certain that my unscripted oral version is always slightly different, depending on who I am telling it to and under what circumstances.
Scripted or unscripted, written or oral, stories carry the weight and intent of the method by which they are captured and presented.
Recreating the Sofia Coppola-inspired pastry scene
References
Ong, W. J. (2002). Orality and literacy: The technologizing of the word (Chapter 1). New York; London: Routledge. Taylor & Francis eBooks – CRKN, CRKN MiL Collection.
What an incredibly cool experience!
On the technical side, I am impressed by how well your transcription software was able to infer where periods should be placed, compared with mine and others I have seen that didn’t try to include any punctuation at all. This got me thinking about the choices a human transcriber would need to make when converting someone’s spoken words to written text. We don’t always speak with obvious commas or periods in our sentence construction and delivery. I wonder whether there are any recognized “best practices” for transcribing punctuation, or whether this is always left to the transcriber’s discretion.
After asking that question, I did a quick search. It looks like the standards vary by context, but this one is very in-depth and does a great job of highlighting many of the other idiosyncrasies of spoken English that need to be addressed during transcription: http://support.onespace.com/training-resources/transcription-style-guide
– James
Hi James, to follow up on what you discovered about transcription standards, I am pretty certain that sign language interpreters have to be skilled enough to convey the meaning behind someone’s words in real time, rather than giving a literal word-for-word translation. It’s really interesting to consider how one would learn that!
Hi Angela!
First off, nice to see you in another course together!
Secondly, this experience sounds so magical! I have heard of parties such as these, and it’s incredible that you were able to attend! I completely understand what you meant by “it reads more like an emotionless list of statements rather than what its authentic telling would have conveyed.” I had a similar experience when completing this task. In my attempt to accurately “tell” the software a story, the over-enunciation and slow speech made it feel much stiffer.
You described the first half as more accurate, and I got that sense as well. In the beginning of your transcript, I was able to “feel” your excitement and the fantastical sense of attending this party. Perhaps this is due to how often you have retold this story; maybe you usually begin it in a certain way, so your excitement always translates – even across computer software!
Do you think that if you spoke a little more freely, and did not worry about the software transcribing your speech accurately, you would get similar results in your transcript? I was thinking about your use of Siri to text in your car. I use the same feature, and I sometimes dictate the punctuation to get my point across. Did you use this, or consider using it, for the task? (I did not; even though I also use Siri to dictate texts, it did not occur to me to try it for this task.)
Hi Clarissa!! Yes, I absolutely think that if I had spoken more freely and hadn’t been so focused on grammatical accuracy, I would have been able to include more in my story. However, I still think that it would lack a certain je ne sais quoi 🙂
Super interesting story and result. I can’t believe how different it is from what I produced, which has many spelling and capitalization errors and no punctuation at all. Yours is surprisingly accurate. I was wondering which app you used to create the text. I used Notes. I know I tried to speak normally, while you made an effort to go very slowly and enunciate extra clearly, but I’m still really surprised at the difference.
Hi Nick, I’m glad you saw some accuracy in my story! Of course, grammatical accuracy is still different from emotional accuracy in this instance 🙂 I can’t remember offhand which program I used; I think I just googled “speech to text website” and used whatever came up first!