Task 3: Voice to Text Task

ETEC 540 Text Technologies: The Changing Spaces of Reading and Writing  (June 3, 2022)   

Transcription file 

 

How does the text deviate from conventions of written English? 

Outside of missed or misinterpreted words, punctuation and syntax are the most significant ways in which the speech-to-text results deviate from the conventions of written English. In a brief literature review, I found several articles, including one from 1978 (Levinson et al., 1978) that described a system that could formulate complete sentences using a vocabulary of 127 words; it also included programmed grammar and a syntax analyzer. Today, the average speech recognition software can recognize about 15-,000 words (Gartee, n.d.).

 

What is “wrong” in the text? What is “right”?

As mentioned above, there are several examples within the speech-to-text results of missed or misinterpreted words that affect the structure of sentences. In one instance, instead of “nice complement,” the transcript reads “ex complements.” Another “wrong” attribute of the text is its inclination to use the American spelling of some words (I would guess that this could be adjusted in the software's settings, but it is not an obviously available adjustment).

Much of the dictation is correct. Considering that the recording occurred outside (and beside a busy highway), the accuracy was pretty good. When the punctuation was correct, especially with periods, the software capitalized the first letter of each new sentence. It did capitalize its own software name, but only when I stated it in full. When I used only the program name without the company name, it did not capitalize it.

 

What are the most common “mistakes” in the text and why do you consider them “mistakes”?

As the readings this week point out, oration differs from written communication, and as such the “mistakes” that I encountered relate to spelling or sentence structure. Reflecting on this, I recognize this technology as an evolution of what Denise Schmandt-Besserat and Michael Erard called “sound of speech emulated,” where visual representations were made of sounds (Schmandt-Besserat, 2009). Instead of images that represent sounds, the text is being generated based on my sounds.

The subtle and not-so-subtle mistakes in the transcription affect the way the reader responds to the information. Apart from the instances of wrong or missing words, the transcript is an accurate reflection of what I talk about in the video. I consider these mistakes, however, because they hinder the reader's understanding of the content.

 

What if you had “scripted” the story? What difference might that have made?

If the video had been scripted, it may have encouraged me to enunciate and focus on my diction, enabling the software to transcribe the words and punctuation more accurately. Other factors, such as the quality of the software and microphone and the noise of the surroundings, can also affect the quality and accuracy of the transcription.

 

In what ways does oral storytelling differ from written storytelling? 

This is an interesting question… In fact, I was just doing some research related to Indigenous storytelling, and a major takeaway was the evolution of stories through this type of communication. As Haas (2013) says, “writing is made material through the use of technologies, and writing is technological in the sense and to the extent that it is material.” So, what differentiates oral storytelling from written storytelling? I would suggest the difference lies in punctuation and syntax. Ironically, that is precisely the challenge that voice-to-text software faces. While a stenographer can ask for clarification or make an educated assumption about meaning, the transcription software bases its results strictly on how it was programmed, its vocabulary, and the speaker’s inflection.

 

References

Gartee, R. (n.d.). How speech recognition software works. Retrieved June 3, 2022, from https://wpscms.pearsoncmg.com/wps/media/objects/11505/11782012/gartee_speech_recognition.html

Haas, C. (2013). “The Technology Question.” In Writing technology: Studies on the materiality of literacy.

Levinson, S. E., Rosenberg, A. E., & Flanagan, J. L. (1978). Evaluation of a word recognition system using syntax analysis. Bell System Technical Journal, 57(5), 1619-1626. https://doi.org/10.1002/j.1538-7305.1978.tb02114.x

Schmandt-Besserat, D. (2009). “Origins and Forms of Writing.” In Bazerman, C. (Ed.). Handbook of research on writing: History, society, school, individual, text. New York, NY: Routledge.