Task 3: Speech to Text

I enjoyed doing this exercise because I am fascinated by the act of using my voice to create form. This year I have been practicing singing and oratory on a daily basis, recording myself for music and narration projects. In the context of speaking, I’ve developed a passion for effective pronunciation and vocal vibration, recognizing it as an art form that can transmit beauty and emotional healing. While doing this week’s readings, I found Haas’s (2013) description of writing as a technology compelling, and I think the same applies to speech.

With this in mind, while doing the exercise of speech to text, I had some technical advantages and experience that kept the transcription from having many flaws. I must admit, of course, that I was more mindful and careful in my speech. However, because of my practice, this is normal to me now.

While looking at the text, the major deviation I find from written English is that there is no punctuation. I have a big paragraph that looks like an endlessly flowing, very (very) long sentence. When I read it, I find myself slipping through sentences that sound nonsensical. This makes me think about the importance of punctuation and how this must have been an issue that the inventors of the technology of writing came across while creating it. Solving it must have required analyzing how we use silence and inflection while speaking and finding a way of symbolizing that through colons and commas. It’s interesting to think how we have assigned a form (a colon or a comma) to something formless (silence). This makes me think about Gnanadesikan’s (2011) statement about writing being “only a means of expressing language, not language itself” (p. 4).

In this week’s readings, Schmandt-Besserat (2009) made me reflect on the connections between graphic symbols and language. While exploring my experience of speaking in detail, even in an improvised context such as this, I’ve noticed that words first appear in my mind, and then I read them. Looking more closely, it seems to me that, before a word is formed in consciousness, there is an image of an experience. In that sense, the text I am seeing is a description of images and experiences. What feels “wrong” in the text is when this description becomes nonsensical through a transcription error. For example, one was “nobody is really perfect, and if that is true Bend I have no reason if you ashamed”. It seems to me that sentences like this are considered wrong or mistaken because they can’t be properly referenced to images and experiences in our memory.

It’s interesting to think about how this might have turned out if it had been scripted. As a musician, I have experience with improvisation, so the text came out fluent and structured. However, it is not as tight as it would have been if I had taken the time to think more carefully about what to say. As noted by Haas (2013), “written texts foster contemplation, analysis, and critique” (p. 9). Speaking is similar to singing in the sense that there’s almost no time to pause, reflect, go back, and edit lines of thought. Writing, on the other hand, is more like sculpting, in the sense that the thought objects we create can be molded and refined. Spoken words come into existence and quickly fade out of existence – there’s no turning back; while written words come into existence and are situated as an object in space. As noted by Gnanadesikan (2011), “writing is generally done more deliberately than speaking, so finished written pieces are much more carefully crafted than a typical spoken sentence” (p. 5).

 

References

Gnanadesikan, A. E. (2011). “The First IT Revolution.” In The writing revolution: Cuneiform to the internet (Vol. 25, pp. 1-10). John Wiley & Sons.

Haas, C. (2013). “The Technology Question.” In Writing technology: Studies on the materiality of literacy (pp. 3-23). Routledge.

Schmandt-Besserat, D. (2009). “Origins and Forms of Writing.” In Bazerman, C. (Ed.), Handbook of research on writing: History, society, school, individual, text. New York, NY: Routledge.

2 Thoughts.

  1. I think your unedited text is probably one of the best among all the Task 3 posts. Like you, I noticed the obvious lack of punctuation. I wonder whether there is voice-to-text software that can insert the necessary punctuation based on the length of each pause. If there is, would the unedited text be perfect, or what kinds of deviation might we still find?

    The most interesting thing I read in your post is how words first appear in your head before you read them. I had actually never thought about it. The example you gave reminded me of the first task I did in this course, in which I described how a Chinese character (山) developed from a simple drawing of a mountain into the actual character that people use today, in contrast to how English evolved from its Greek origins based on meaning (Schmandt-Besserat, 2009). I don’t know if my understanding is correct, but the relationship between symbols and language in your example seems to represent abstract vs. concrete. Something you have experienced or have an image of is more on the concrete side and feels right, while something that deviates too much from our lives can feel wrong. Might this apply more to culture and values than to the use of language itself? Something considered normal in a small tribe may seem very strange to us if we do not know the culture and the historical reasons behind it.

    I have included a written version of my story after I had done the voice-to-text. Exactly as you quoted from Gnanadesikan (2011), “writing is generally done more deliberately than speaking, so finished written pieces are much more carefully crafted than a typical spoken sentence” (p.5). Even though my written piece was done quickly without thinking too much about it intentionally, it was still significantly better than the voice-to-text one. It is more concise, focused (on the main point), and more structured.

    On one final note, my observations of people around me align with your experience that your text is more fluent and structured thanks to your experience as a musician. I have noticed that my friends who sing perfectly in tune also speak other languages better (with less of an accent). There is definitely no scientific proof of this, just my observations of a group of friends. I just thought it would be interesting to share!

    • Hi Ping,

      Thanks for your kind comments!

      I think that having voice-to-text software that can insert punctuation based on the length of pauses would certainly be useful. In some cases, though (particularly when unscripted), we might stop in our speech without intending to finish a sentence, just to think about what to say. Another feature that indicates sentence boundaries, which I did not discuss in my reflection, is tonality. Usually, we modulate our voice in very particular ways when forming sentences (for example, raising our pitch at the end of a sentence to signal a question). The voice-to-text software could also analyze tone to infer punctuation. In the end, it seems like what we are trying to achieve is software that mimics our ability to understand speech!

      By the way, I did one of my linking assignments on your Twine Task, which I thought was phenomenal. If interested, you can read it here: https://blogs.ubc.ca/eduardo540/2021/10/16/linking-assignment-2/
