Task 3 | Voice-to-text using Speechnotes


Introduction

Telling a story isn’t one of my skills, and telling an “unscripted” one in a language I did not grow up with seemed a daunting task, even though I have been speaking English daily for more than twenty years. As I started the task, I wondered whether the voice-to-text tool (not yet chosen at that point) had been trained to handle the accents, speech impediments, etc., of users like myself. To the best of my knowledge, it may have been, provided that such speakers are represented in sufficient numbers in the training data. With these concerns in mind, I thought it would be more suitable to share something that I am at ease talking about and have talked about several times: “my coding journey”. I organized my thoughts before recording: I would talk about my beginnings as a teenager, my college experience, and my career path, and I would use short sentences and simple vocabulary, as that is easier for me.

The outcome

The text below was generated by “Speechtexter”. I bolded the parts that the tool failed to capture and added corrections in parentheses (in total, the corrections amount to 128 words out of 701). I would like to note that I recorded myself while talking to the screen, as I thought this would be helpful in the analysis section (the recording ran 6 minutes and 9 seconds, more than one minute longer than the task requirement).

I don’t know when I fell in love with coding I know that that I just did the first time I program 2022 (programmed was in) high school I would look (bought) booklets for teaching basic (BASIC) programming language which was well known to the young scooters (coders) and computer hobbyists interested in microcomputers these booklets help me to write the sample (simple) loops use conditional(s) and print (to the screen). But I had never figured out how to do something useful of an (or fun) letter (later) in school I was introduced to logo (Logo) Turtle microburst (Microworld) the technology was widely used in mathematics classes to extend geometry and coding for (it was) straight forward to learn and used (use) by (it has a) small object the on-screen (on the screen) shaped like a triangle and to make it turbo petrol use pipe (move and draw, you just type) simple comments (commands) on their (the) keyboard so for example if you write down for what 16 the Church (forward 50, the turtle) will move in the direction (it) is facing and draw a line of 50 pixels for 15000 (or 50 turtle) steps the logo turtle the microarray it (microworld) was fun though (although) it was no more educational (than a video) game at (at) it had a profound impact on my college choice that opened a whole new world to me I joined the (a) computer engineering college in the mid 90s in early (in the early) years I spent a fair amount of time doing Cup (cup) popular computer games such as Tic Tac Toe and that was (Tetris) games and AdSense (I learned advanced) program (programming languages) Sach Kasam (such as) Pascal C + + and Java I also spend long hours in the library reading about topics of interest such as algorithmic thinking and high-performance microprocess (microprocessor) circuits then tonight (the internet) that was relatively young at this time and the library had the Richard (richer) resources Xolo (although) most of my college projects were incomplete and imperfect they made me feel 
happy and capable to the point that I call myself Invincible the five years and I will say (passed and I was) good academically so I joined the teaching staff and started them (them) as a programming lab assistant my years in the teaching profession were terrific I don’t (gained) experience I grow (grew) academically and completed my Masters in computer engineering I also got recognition is there(the) several Institutions I would the four (I worked for) as a role model to other female students pursuing their dreams in the computing field which is pretty much known as the men’s field however something was missing out for me teaching even in a dynamic area as confuting (computing) is still monotonous I want (wanted) to go out (code) and I want (wanted) to be portrait (part of) of real life project(s) in (and) complex computing system(s) after a decades virtue (a decode or so) of being and (an) educator I resigned (I) press(ed) the refresh button and started the (the) working towards another chapter in my life so I turned it (joined) some networking and the intelligence (intelligent) security courses looked (I looked) into AI and machine language areas however the material or what’s up (was) quite Complex for (targeting) professionals in this area and honesty I stumbled across them I also tried doing a mobile app I took a long time trying on that but it wasn’t of good quality I guess if (that) the technical & size (and design sides) required in this area or (are) not Ameen (in me) I think that text (attained tech) certifications to shop in (sharpen) my skills they help (helped) me to stay in sync with (the) technological graph (growth) and finally after two and a half years I got a job opportunity in a big organization the old having (hiring a) specialist in network security you might be surprised the that it took this long. 
For someone with a graduate degree in computer field is (but the) academic field is very different than the industry was that (world) the transition (period). So I call my job this (jobless) time has helped me to see the many processes (paths) in computing skills (field) I tried and sales (failed) multiple times and tell (till) I figured out the pause (path) for me it was also useful in developing the (meta) skills that I think of as more available (valuable) than any fundamental technical skills simply because we our main thought in completing (task in the computing) field is to solve problems (that) was (we have) never encountered before every project is a new start it’s been almost 7 years since I switch (have switched) to my current (career) if I have my doubts (I have my ups) and downs but still I feel excited and charged after all the change has allowed me to follow my passion for colleague (coding).

The analysis of the experience and outcome

Even with my prior concerns, I thought I had a well-crafted, logical plot in mind before starting this exercise. I also felt that I had an ultimate message: “we need to do what we love and want to do; being patient and embracing our failures are core to success.” However, as I read the unscripted text, I found the message vague; there were many irrelevant details (such as the description of “Logo Microworld”) that detracted from it. As I listened to the recording, I was disappointed; the story lacks the emotional core necessary to capture the listener’s attention. My voice was monotonous, as if I were dictating words to be typed; I spoke very unamusingly, with one tone; there was no enthusiasm or emotion, and my voice was obviously agitated at some points. Indeed, it was difficult to voice the thoughts and ideas in my mind and turn them into coherent words and phrases at the same time. Additionally, I lost the tempo of what I was saying and got distracted. I started focusing on avoiding linguistic or grammatical mistakes so that the tool could recognize my speech reliably, and I was intimidated every time I saw errors pop up as I talked. Towards the end, I had to rush my words because I had exceeded the allotted time.

The conventions of written English include semantics (using meaningful terms and sentences) and syntax (grammar, spelling, punctuation, and capitalization). With both, the writer can convey the intended message. The generated text deviates significantly from these conventions. Although most of the words were correctly recognized by the tool (the recognition accuracy is approximately 82%, given 128 corrected words out of 701), the flaws, whether terms that were incorrectly identified or terms that weren’t recognized at all, made the story difficult to understand and follow. Furthermore, the generated text lacks paragraph structure and punctuation (except for the two misplaced periods). There are also capitalization errors in abbreviations (e.g., BASIC) and names (Logo), wrong tenses, and missing articles. In short, the groundwork required for effective writing is missing from the generated text; the reader will likely be bewildered and fail to decipher the intent and content of the story without reading through the corrections.
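The 82% figure can be checked with a quick calculation. The sketch below is only an illustration: it assumes accuracy is measured as the share of words that needed no correction, using the counts reported in the outcome section (128 corrected words out of 701). The function name is hypothetical, and formal word-error-rate metrics count errors differently.

```python
def recognition_accuracy(total_words: int, corrected_words: int) -> float:
    """Share of words that needed no correction (assumed definition)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 1 - corrected_words / total_words


# Counts taken from the transcript above: 128 corrected words out of 701.
accuracy = recognition_accuracy(total_words=701, corrected_words=128)
print(f"{accuracy:.1%}")  # prints 81.7%, i.e. roughly 82%
```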

According to Gnanadesikan (2011), writing takes words and turns them into a visual or tangible object; with its spatial quality, the written form of language makes it possible to contemplate, manipulate, and analyze the message in a way that isn’t possible for spoken language (Haas, 2013). Therefore, if I had had the chance to produce a written form of the story before the narration, I believe there would have been a significant difference in the outcome. I would have been able to draft my thoughts and read them out loud to be sure the story flowed smoothly and fit into the allotted time. I would also have had the chance to rephrase and rearrange words, sentences, and paragraphs; use more varied vocabulary; replace unclear or overused words; remove any unnecessary or off-topic sections; and correct spelling, capitalization, and punctuation.

Keeping a guide in hand (i.e., a script) might have positively impacted the narration, as I would have felt less intimidated knowing that I was well prepared. The script might also have helped me pause at appropriate spots to add a period, a comma, or a new paragraph, thus enhancing readability and understanding. Besides, with a complete picture in my mind, it would have been easier to emphasize the essential milestones in the story and to convey the emotions that accompany what is being narrated. As a result, the audio version might have been more engaging.

Oral storytelling differs from written storytelling in several aspects. Firstly, oral storytelling is spontaneous, lengthy, and tolerant, while written storytelling is more abstract, concise, and accurate in representation (adhering to writing conventions and grammar rules). Horowitz and Newman (1964) demonstrated this idea in their study. They asked undergraduates to talk and write about the same story and reported that the spoken story is more facile, producing longer stretches of language and more repetition. Secondly, human memory is far from infallible; even for people with a good memory, retaining data is never guaranteed (Gnanadesikan, 2011). Thus, every time a spoken story is told, it will be different (in details, order, etc.). Gnanadesikan (2011) gave an interesting example: in the party game “telephone,” each participant alters the message as they pass it around the circle. In contrast, the written version of the story will remain unchanged; it is meant to be “immortal” (Haas, 2013). Thirdly, the oral narrative is more ambiguous than the written one. I recall the well-known phrase “A woman: without her, man is nothing.” Listeners can misinterpret this message when it is spoken; it may make them angry, upset, and hurt if the narrator does not pause at the appropriate points. Lastly, emotion is crucial in comparing the discourse in both genres. Oral storytelling reveals the emotional aspect more prominently than written storytelling, even if the topic endorses scientific rationality. In last week’s assigned video, you can sense Dr. Lera Boroditsky’s intense excitement, curiosity, interest, and wonder when she speaks about her work and discoveries (Boroditsky, 2017). In contrast, these feelings are entirely hidden in her scientific writing on the same topic, “How Language Shapes Thought” (Boroditsky, 2011).

Overall, this activity has taught me that when we tell stories, we should rely not only on a series of logical events but also on our experiences, feelings, emotions, and backgrounds; instead, I kept an eye on the generated script throughout, focusing on the recognition accuracy rather than on telling my story. The experience may also suggest that voice-to-text software is not yet attuned to the dialects of non-native English speakers.

References

  • Boroditsky, L. (2011). How language shapes thought. Scientific American, 304(2), 62-65.
  • Boroditsky, L. (2017, June). How the languages we speak shape the ways we think [Video]. YouTube.
  • Gnanadesikan, A. E. (2011). “The First IT Revolution.” In The writing revolution: Cuneiform to the internet (Vol. 25, pp. 1-10). John Wiley & Sons.
  • Haas, C. (2013). “The Technology Question.” In Writing technology: Studies on the materiality of literacy (pp. 3-23). Routledge.
  • Horowitz, M. W., & Newman, J. B. (1964). Spoken and written expression: An experimental analysis. Journal of Abnormal and Social Psychology, 68(6), 640-647. doi:10.1037/h0048589

2 thoughts on “Task 3 | Voice-to-text using Speechnotes”

  1. Marwa, I really appreciate your systematic and organized approach to studying. I always learn so much from you! For example, recording yourself was such a great idea! I regretted not doing that while working on my analysis, especially when trying to decipher words that the tool made completely unrecognizable. Also, I can totally relate to your concerns about doing this task in English. Did you also wonder how much difference it would make if you told the same story in your first language?
    In your analysis, you write that you “spoke very unamusingly, with one tone”. I had the exact same issue when I talked – being aware of the fact that my words were being scripted further affected the way I talked. This is why I prefer oral communication where body language, voice, and the contact with the audience can compensate for my imperfect English.

    • Hello Olga, thank you for your kind comments. Telling unscripted stories, even in your own language, can still be a confusing and intimidating process. Indeed, my mother tongue might have eased things; however, without preparation and a script, there is always the pressure of coming up with something in the moment, selecting meaningful vocabulary, delivering consistent information, etc. It is important to consider whether attaching a written outcome affects the way we think of oral storytelling in the first place, and also how the audience affects our spoken stories. Let me put this in question form: Why would standing up and telling a story in front of a roomful of people feel like a qualitatively different experience from more immediate events such as informal conversations and voice messaging?
