Speech to Text Lessons Learned – My “Quakers” Trip

My Rev.com experience shaped how I approached this challenge. Has anyone else spent any time on Rev.com? It is a site that pays people a minimal amount of money to edit generated “Speech to Text.” I made only about $30 before I stopped; the time spent trying to correct generated speech to text was not worth it.

In this challenge, I wanted to share the story of my travels to Ecuador. Attached is the generated Speech to Text result.

The text deviates far from the conventions of written English: there is no formatting, and the entire story is lost. My favorite mistakes were:

  • “Quito” to “Keatings”
  • “highs and the lows” to “high school bowls”
  • “approximately 20 hours of travel” to “proximately 2020 hours of troubled”
  • “Equator” to “Quakers”
  • “Or the kind of equivalent” to “a Buddy Kenneth of the quibble it”
  • “Guest house within their…” to “gets tense week that bear”
  • “Tena” to “Kenya”
  • “Um, no wifi, no nothing” to “bomb life I know nothing”

Overall, the text is unintelligible. For the sake of comparison, here is the audio used:

There were not many areas that turned out “right,” but I was surprised that, near the end, it captured the city of Quito correctly. In addition, a few lines were generated almost verbatim: “of course there were cars back logged on the other side as well and that the very last..” and “do to make its satellite phone or something so that way if there was any issues during these periods of time that they’ve got to be able to contact somebody.”

I think the biggest lesson here is that I need to enunciate all of my words clearly and precisely. Another lesson is that audio and speech to text will always turn out better when you have a script. This has always been my experience: it is very difficult to enunciate clearly, tell a story, and keep to time without one.

If I had scripted the story, the words might have come out more clearly. I also would have created the transcript/captions differently, because I would already have had a script that I could use. The script would have been formatted correctly, in full sentences. However, there would have been fewer laughs along the way. I’m sorry, but “bomb life I know nothing” is pure gold.

Oral story-telling is not scripted; in many ways it is therefore imperfect, and it can change with the story-teller. However, I like to think of oral story-telling as alive. A story can fluctuate from teller to teller (if there is no written copy), but that gives people the opportunity to share their own feelings and passions. Each story-teller can emphasize the details that are important to them, and listeners can feel the heart in the story. If my travel companion were to tell this story, it would come out completely differently. She was there for all of these experiences, but her experience may have been completely different from mine.

2 Thoughts.

  1. I also have some experience on Rev.com! I stopped after making about $10… I gave up because I found a lot of “inaudibles,” and getting a transcript accurate always took longer than I expected. I did not expect to run into a Revver here at all!

    Anyways…

    Wow, what a story! The audio you attached was VERY clear and the way you recounted your experience was VERY engaging! But the transcript came out really weird. The formatting is totally off, with sentences all broken up. I am curious to know what software you used to go from voice to text. When I used Speechnotes.co for mine, which was the one listed as an example in the assignment description, the formatting turned out alright. However, there was no punctuation included, and I found the missing punctuation really annoying. I actually dictate my writing often because I am pretty lazy, so I got used to saying the punctuation out loud. This exercise was weird for me because I had to intentionally talk to a machine without saying any punctuation.

    In your reflection, you mentioned that if you had scripted the story, your words would have come out clearer, which I assume means more clearly enunciated. Perhaps it’s just me, but again, the audio track that you attached was REALLY clear. I wonder how much more clearly you could have pronounced the words, and whether that would really make a difference in the accuracy of the transcript.

    • Thanks, Christopher, fellow Revver!

      To answer your question, I used Camtasia. It has a voice-to-text component, which was very easy to use.
      To expand on my clearer/script comment: the areas I noticed were the most unintelligible were the ones where I used filler words (“uhms,” “ahs”) or repeated myself.
      In those cases, a script would definitely have helped me pace myself, slow down, enunciate better, and use fewer filler words, as I would not have had to recount my story on the spot.
