“This sentence is a time machine…the marvelous technology that allows the past to speak directly to the future…it is writing” (Gnanadesikan, 2011, p. 1).
This week’s task takes me back in time as I tell, and subsequently “write,” the story of a childhood habit I referred to as “telling stories.” The story is “written” through two voice-to-text technologies, and an analysis of the results follows.
Gnanadesikan (2011) states that “…writing…records language, but not actual speech” (p.9). This week’s Voice-to-Text task asked us to record our speech using tools that purportedly translate speech into written form. Some might not consider the product generated by voice-to-text as “writing,” as it follows a process of thought-to-speech-to-documentation, as opposed to the more direct, traditional process of thought-to-documentation. However, in a world of ever-increasing automation, where teenagers like my daughter no longer know how to perform cursive handwriting, it could be argued that voice-to-text could be the “writing” of the future.
In what ways does oral storytelling differ from written storytelling?
Written storytelling demands more attention to the intricacies of grammatical correctness than oral storytelling. It is not able to rely on contextual cues provided through oral storytelling’s variations in tone, pace, volume, etc. Compared with in-person, oral storytelling, written storytelling is restricted in its ability to adapt its messaging in response to the real-time reactions of an audience. The oral storyteller might also be considered to be more present than a written storyteller, which in turn can influence the experience of the story listener.
The materiality of written storytelling lends itself to quality control when a story is widely disseminated. As the children’s game of “telephone” illustrates, an oral message loses more and more of its original content as it is repeated. Written storytelling assures a storyteller that the same words will be used to convey the story, no matter where it is read. Despite this increased sense of control over the message, written storytelling may afford greater interpretation for a reader. With the reduced physical presence of the written storyteller, a reader may be freer to create their own constructs in response to the story.
For this Voice-to-Text Task, I played the role of oral storyteller and then editor of the resulting voice-to-text output. If I were to try to tell the same story in written form within the five-minute limit, I don’t think I would have reached the end of the story. The time required for word selection and grammatical correctness is greater than the time required to verbalize a story. The material permanence of the written word adds a layer of cautious care as we anticipate our work being interpreted and critiqued by others. The voice-to-text task demonstrated that such care might also enhance my oral storytelling. The efficiency of oral storytelling can be compromised if we are soft-spoken or sloppy with our diction. Words that were missed or misunderstood by the voice-to-text technology, but were perceived as having been communicated by me as the storyteller, provided evidence of this potential downside of oral communication. Additional care could have also resulted in more effective word choice and organization, a process that is more easily achieved with the written process. I would hope that a written version of the story I shared would not include a series of run-on sentences and informal, conversational phrases such as “pretty much.” As I write this, however, I realize that I am judging the “writing” of my voice-to-text story based on the indoctrinated standards from my education. Is it fair to say that those standards always apply? Who deems a sentence a “run-on” or a phrase “informal”? This leads to consideration of what might be considered “wrong” or “mistakes” in the output of voice-to-text technology. It also invites reflection regarding the relationship between language and power.
What is “wrong” in the text? What is “right”?
In contemplating “right” versus “wrong,” the judicial system comes to mind. In courtrooms where matters of “truth” are intended to prevail, accurate documentation is critical. Yet, even with the concept of voice-to-text as practiced through “…dictation and courtroom stenography, much information about the actual speech is lost, such as intonation and emotional content” (Gnanadesikan, 2011, p. 9).
For something to be considered “wrong” or a “mistake,” there has to be some manner of evaluation or judgement. This can involve a power dynamic where authority is given to deem something as “wrong” or a “mistake.” For the purpose of this exercise, this power resides most with education. Indeed, Gnanadesikan (2011) notes, “writing is associated with education, and education with wealth and power” (p. 5). Further, Goody (1977, 1987) views writing as a “technology of the intellect” (Schmandt-Besserat, 2009, p. 20).
Based on the power (and limitations) of my education, I will review the mistakes and deviations from conventions of written English made by two voice-to-text technologies that I used to record two unscripted renditions of my story. I first told the story to Otter.ai, a technology that I have used in the past. When it appeared that the Otter.ai recording hadn’t worked, I re-told the story to Google Docs’ Voice Typing. Later, I discovered the Otter.ai recording had indeed worked, so I have included both documentations for comparative purposes. Otter.ai includes punctuation and grammatical capabilities beyond those offered by Google Docs’ Voice Typing feature.
“TELLING STORIES”
As documented by Otter.ai
This is the story of a little secret. I had when I was a child. Actually, it lasted into my adolescence, actually, if I’m, to be honest. I used to have a favorite pastime that I referred to as telling stories.
I would
walk around a room in circles, waving pencil about. As I talked about imaginary people, and relationships that were inspired by images of people in the Sears catalog, which I’m now dating myself to to reference. And it was important that the pencil be unsharpened, basically unused. And there was never any writing involved in this storytelling. It was just me talking and creating and talking. I usually did this in the privacy of my home, and my parents and younger sibling would know to leave me alone when I was telling my stories. But there were some times that I would talk to complete strangers about it. I recall one particular moment when I was about three or four sitting outside of my home on the sidewalk. And I was talking away about our neighbor’s business. He was self employed electrician of some sort and I had taken my inspiration in that story from the, the logos and pictures on his work truck. So this is all really interesting
and
interesting because I’ve always liked to talk but I’ve also always liked to write, but this part of my creating never had a written component to it. I love to write so much that there was a time in about grade four that my teacher had asked me to answer some question on the blackboard. And I proceeded to cover every inch of Blackboard that was in that classroom which I believe spanned three quarters of the wall space. So writing was not a foreign topic for me, but there was something really precious and personal and special about this world that I would inhabit. Again, inspired by characters in a, in a catalogue, like magazine images, where I would make up fantasy lives and an act, the roles of every character. Within those fantasy lives. I don’t really know why the pencil was an important piece of it but it really was there was something about the weight of it in my hand I can remember how different it would feel if I didn’t have that unsharpened pencil. And it’s not like I used it as a microphone, it was just some extension of me as I would wave and about talking, developing dreaming and creating. Now there is one time though I, I should confess to, there was a large family I was creating in my storytelling and I was get losing track of some of the character names so sorry mom and dad but there was a time where I actually wrote the characters names on the wallpaper in our formal front room. Fortunately I did use pencil for that purpose. So, I must have had a sharpened pencil for that storytelling session. So there you go, I have been a storyteller, for as long as I can remember, I’ve never been able to get back to that wandering around the room creation kind of space I have tried it. But I will always hold it as a special part of my childhood.
“TELLING STORIES”
As documented by Google Docs’ Voice Typing
I’m going to share secret from my childhood secret about a favourite pastime of mine that truth be told actually extended into my adolescence the secret past time I called telling stories come from a very young age as long as I can remember really I would find myself a quiet room Sears catalogue which dates me little bit I know and an unsharpened pencil one of those that had an eraser on it yellow with a red eraser and had to be one like that and it needed to be on sharpened because I would wave it around in my hands as I talked and told stories I would create imaginary worlds inspired by photos of people models in Sears catalogues I would create characters and lives and jobs and relationships families everything was fair game and and I would tell stories about them for hours on end my family members my parents and my younger sibling with no to give me space when I was telling my stories fortunately they didn’t shame me for it ever which I appreciate and while I did spend most of my storytelling time in the privacy of my own home I do recall a time when I was about three or four and decided to tell one of my stories to complete stranger walking by on the sidewalk I was sitting on the sidewalk outside of my home and I was talking away about the electric company that our neighbour had his work truck I guess he was an electrician was in his driveway and I was making up all kinds of exciting stories about what his business was all about so there were Times Obviously where I would diverge from the Sears catalogue source of inspiration but most of the time that was my good old standby and I would leave all kinds of tales of an interesting lives from that thick collection of pages and it’s not like I didn’t like writing none of these stories were ever written by me except for the time that I was having a hard time keeping track of all the family members for this one story and so sorry Mom and Dad light actually wrote on the good wallpaper of our formal dining room so I 
could remember but fortunately I have a broken pencil so that was okay yeah I never wrote any of these stories I love to write in fact I love to write so much that one time in about grade for when my teacher asked me to answer a question on the Blackboard I proceeded to cover every inch of Blackboard in that classroom which has ever called pretty much constituted Three-quarters of the wall space in that classroom so I enjoyed writing but this special kind of telling stories was exempt from that particular activity there was something special and unique about the freestyling of dreaming and creating true words in the moment with just the inspiration of some photos an unsharpened pencil I have tried to recreate that special creativity in my adult hears in moments when I was struggling to solve a complex problem when I thought that creativity might help me to Muppet solution and unfortunately it has never returned so it will just remain a special part of my childhood but a time that is well connected to my Lifelong Love of words and particularly communication in the context of South development and relationships .
What are the most common “mistakes” and why are they considered “mistakes”?
The “writing” demonstrated in this voice-to-text task exemplified the notion of “…writing as once-removed, a derivative of speech…” as suggested in the perspectives of Ong, Goody, Havelock, Plato and Socrates (Haas, 2013, p. 12). While one of the voice-to-text tools was superior in capturing grammatical elements and including fewer errors, neither tool could convey the story as effectively as told in oral form. The experience of this task contradicts Ong’s (2002) observation that “…writing from the beginning did not reduce orality but enhanced it…” (p. 9).
A focus on phonology instead of word context might have led to errors in documentation. For example, there was confusion with homophones, with the word “for” being recorded when it should have been the number “four,” and the word “no” being recorded when it should have been “know.” In the latter case, this error might have resulted from an error in the preceding word as well (e.g., the software captured “with no” when it should have been “would know”). The voice-to-text technology also misheard some words. For example, it heard “true” instead of “through,” “hears” instead of “years,” and “South” instead of “self.” These errors are considered to be mistakes because they do not accurately match the words that were spoken. As such, they could create confusion and misunderstanding for the reader.
Google Docs’ Voice Typing software missed capitalizations for new sentences, and added odd capitalizations for some words (e.g., “Times Obviously,” “Blackboard” and “Lifelong Love”). It is possible that these capitalizations were the result of a pause in my speech, suggesting a new sentence. Some odd capitalizations occurred with the Otter.ai technology as well.
The Otter.ai software seemed to capture “disjointed” moments in my “thought-to-speech” process, placing one word on a separate line, likely as I processed and paused. For example:
…inspiration in that story from the, the logos and pictures on his work truck. So this is all really interesting
and
interesting because I’ve always liked to talk but I’ve also always liked to write, but this part of my creating…
Despite its ability to add punctuation, the Otter.ai voice-to-text technology sometimes “over-punctuated,” turning one sentence into two and resulting in grammatical errors. While Otter.ai separated run-on sentences, this created multiple sentences that began with the word “and.”
Rules were sometimes inconsistent between the two technologies. For example, Google Docs’ Voice Typing hyphenated “three-quarters,” while Otter.ai did not. Google Docs used the Canadian/British spelling of words such as “neighbour” and “catalogue,” while Otter.ai used American spelling.
I found it interesting to notice that I was less compelled to edit the punctuated version provided by Otter.ai, even though I winced at the clunkiness of my spoken word. My need to edit and “sanitize” was much stronger when reviewing the punctuation-less version provided by Google Docs’ Voice Typing.
In what I assume were especially “mumbly moments,” Google Docs’ Voice Typing software generated interesting results. For example, when I said “…creativity might help me to come up with a solution,” the software recorded: “…creativity might help me to Muppet solution.” This may be indicative of the cultural biases inherent in AI solutions, as I don’t expect that Jim Henson’s Muppets are widely recognized in all corners of the world.
When language is materialized through writing, it can be vetted and controlled in accordance with the norms and standards of existing power structures. For example, Schmandt-Besserat (2009) describes how, until the early 20th century, the Chinese civil service influenced writing standards such that “…all dialects of Chinese were written in a literacy dialect that dated to the late Old Chinese period, about 1100 BCE to 100 CE. This meant that literate Chinese wrote in a way they themselves did not speak” (p. 18.)
Voice-to-text technologies make predictions based on the vast volumes of data extracted from other technologies such as Google Analytics. These predictions are inherently biased as they draw from data that is already subject to biases such as racism and sexism. Further, not all cultures contribute equally to these data sets, as there is unequal access to Internet technologies.
The influence of power structures in language is even found in the animal world. For example, researchers in Berlin recently discovered different dialects among naked mole-rats (McDonald, 2021). Dr. Alison Barker and her team identified 20 different vocalizations in their analysis of the rats’ speech sounds. They also interpreted that the naked mole-rats function under authoritative rule with limited freedom of expression. Specifically, Barker and her team found that when the queen of a colony was “murdered,” the cohesiveness of the colony’s dialect was lost. The translating technology the team had used to decipher the rats’ communications was no longer able to discern dialect. Once a new queen’s rule was established, and “order was restored,” the dialect returned.
What would’ve been different if it were scripted?
The unscripted, voice-to-text speech documentation revealed through this task highlights the importance of punctuation. Punctuation helps to ensure written works are easy to read and understand. In addition to the inclusion of punctuation and grammatical correctness, a scripted approach would have considered the pre-existing knowledge level and interests of the intended audience. It would have been better organized to clearly convey relevant points in an easy-to-follow format. It would have also eliminated unnecessary words and details that could cause a listener to lose interest.
A scripted version would have lost an aspect of authenticity, though. There is something compelling about the raw vulnerability of unrehearsed and unscripted content.
Ong (2002) writes “…without writing, human consciousness cannot achieve its fuller potentials, cannot produce other beautiful and powerful creations” (p. 14). If I had documented the “telling stories” experiences that I shared in this exercise during my childhood, would I be better able to access that “special” time again? Would it help me to reclaim creativity that could lead to achieving fuller potential? Without writing, I’m not able to answer this. I cannot benefit from the “time machine” of writing as so eloquently described by Gnanadesikan (2011).
References:
Gnanadesikan, A. E. (2009). The First IT Revolution. In The writing revolution: Cuneiform to the internet (pp. 1-12). John Wiley & Sons. doi: 10.1002/9781444304671
Haas, C. (2013). The Technology Question. In Writing technology: Studies on the materiality of literacy (pp. 3-23). Routledge. doi: 10.4324/9780203811238
McDonald, B. (2021, January 30). Naked mole rats learn their ‘language’ from their queens and speak in dialect [Radio broadcast]. CBC. https://cbc.ca/listen/live-radio/1-51/clip/15821769
Ong, W. J. (2002). Orality and literacy: The technologizing of the word. Routledge.
Schmandt-Besserat, D. (2009). Origins and Forms of Writing. In Bazerman, C. (Ed.) Handbook of research on writing: History, society, school, individual, text. Routledge. doi:10.4324/9781410616470