Final Project: Describing Communication Technologies

The PDF version: Text and Speech Recognition Technologies.



OCR (Optical Character Recognition)

History (Chaudhuri et al., 2016)

The inception of character recognition technology occurred in the mid-1940s with the advent of digital computers. Initial efforts in automating character recognition focused on machine-printed text or a limited set of clearly defined handwritten text and symbols.

By the mid-1950s, Optical Character Recognition (OCR) machines became commercially available.

The OCR systems introduced between 1960 and 1965 were commonly called first-generation OCR, characterized by constrained letter shapes specifically designed for machine reading.

In the mid-1960s and early 1970s, second-generation reading machines emerged. These systems recognized regular machine-printed characters and possessed capabilities for recognizing hand-printed characters.

The mid-1970s saw the introduction of third-generation OCR systems, marked by significant advances in hardware technology that achieved low-cost and high-performance objectives. This led to the development of sophisticated OCR machines catering to a broader user base.

Despite the commercial availability of OCR machines in the 1950s, only a few thousand systems had been sold by 1986, primarily due to their high cost.

Substantial progress in OCR systems occurred during the 1990s, leveraging new development tools and methodologies empowered by the continuous growth of information technologies. In the early 1990s, combining image processing, pattern recognition techniques, and artificial intelligence methodologies significantly enhanced OCR capabilities.

Today, advancements continue with more powerful computers and precise electronic equipment, including scanners, cameras, and tablets.

Applications

Assistance for the visually impaired: OCR, coupled with speech synthesis systems, empowers individuals with visual impairments to comprehend printed documents (Chaudhuri et al., 2016).

Automated license plate recognition: Several systems for automatically scanning car number plates are available. Unlike other OCR applications, the input image is not a naturally bilevel image and must be captured by an extremely fast camera (Chaudhuri et al., 2016).

Automated cartography: Automated cartography uses computer technology and algorithms to create, analyze, and interpret maps. Recognizing characters from maps poses unique challenges because symbols and text are intertwined and appear at varying angles and in varying fonts.

Language translation: This application facilitates the translation of printed or handwritten text into different languages.

Banking applications: A primary application of OCR is found in the banking industry. Systems verify customers’ identities by comparing their signatures to patterns stored in a reference database (Chaudhuri et al., 2016). Moreover, OCR proves highly beneficial in ATMs and mobile banking, where customers can scan and deposit checks using their mobile phones (Sarika et al., 2021).

Document digitization: Document digitization transforms document images into machine-readable digital formats through OCR. According to Sarika et al. (2021), document digitization is primarily employed to modernize libraries and provide online services.

Challenges (Awel & Abidi, 2019)

Many OCR techniques face accuracy problems for the following reasons:

Complex scenes: Separating text from non-textual content (buildings, paintings, and more) in the input data complicates preprocessing, thereby impacting character recognition.

Varying lighting conditions: Images captured by cameras are susceptible to the influence of varying light conditions and shadows, complicating the task of detecting and segmenting characters.

Skewness and rotation: Photographs taken with a camera are often captured at incorrect angles, leading to inaccurate results.

Blurring and degradation: Blurring and degradation occur when pictures are taken from a distance, capture moving subjects, or lack proper focus.

Diverse fonts and styles: Connected or overlapping characters, such as those in Arabic script or italic typefaces, make it challenging to accurately detect and separate words into individual characters.

Multilingual settings: Text that mixes multiple scripts, or languages with very large character sets such as Chinese, presents unique challenges.

Damaged documents: When dealing with input documents that are extremely aged and damaged, the presence of extensive noise often results in the unintentional loss of essential content or characters.
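The noise problems above can be made concrete with a toy sketch of template matching, the approach behind early constrained-font OCR: a glyph is classified as whichever stored template it mismatches in the fewest pixels. The 3×3 bitmaps below are invented for illustration, not taken from any real OCR system:

```python
# Toy template matcher in the spirit of early constrained-font OCR:
# each character is a fixed 3x3 bitmap, and an input glyph is classified
# as the template with the fewest mismatching pixels (Hamming distance).
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}

def hamming(a, b):
    """Count mismatching pixels between two 3x3 bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def classify(glyph):
    return min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], glyph))

lightly_noised_L = ("100", "110", "111")  # one flipped pixel
heavily_noised_L = ("111", "010", "011")  # several flipped pixels
print(classify(lightly_noised_L), classify(heavily_noised_L))  # L T
```

A little noise still lands on the correct template, but heavier damage pushes the glyph closer to a different character, which is exactly how degraded documents lose content.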

Educational implications

Text digitization: OCR transforms printed or handwritten text into machine-readable digital content. This process facilitates the establishment of digital libraries, providing students and educators with effortless access to extensive information.

Accessibility and inclusivity: OCR plays a pivotal role in the utilization of text-to-speech technologies and various assistive tools, enhancing the accessibility of educational materials for individuals with visual or learning disabilities.

Document administration: OCR simplifies the management of extensive document collections within educational institutions, improving administrative efficiency and minimizing the time dedicated to manual document processing.

Automated grading: OCR can streamline the grading and assessment procedures, saving time for educators while facilitating quicker feedback to students.

Language learning: Utilizing OCR can enhance student language acquisition by facilitating the translation of printed or handwritten text into various languages.


ASR (Automatic Speech Recognition)

History (Wang et al., 2019)

In 1952, Bell Labs in the United States achieved a groundbreaking milestone by developing the first true speech recognizer, which could recognize isolated digits.

A rudimentary voice-activated typewriter and a speaker-independent vowel recognizer were developed in the following several years. During this period, speech recognition systems were limited to recognizing single words or vowels.

The 1960s witnessed the emergence of Japanese laboratories showcasing their ability to construct specialized hardware for speech recognition tasks. Noteworthy examples included “the vowel recognizer of Suzuki and Nakata…, the phoneme recognizer of Sakai and Doshita…, and the digit recognizer of NEC Laboratories” (p.2). Kyoto University’s efforts laid the groundwork for future continuous speech recognition systems.

Around the 1970s, the development of linear prediction technology, dynamic programming technology, and Linear Predictive Coding (LPC) cepstrum fueled the rapid evolution of speech recognition for speaker-specific tasks with isolated words and small vocabulary.
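The best-known application of dynamic programming in that era was dynamic time warping (DTW), which aligns a spoken word with a stored template even when the two are uttered at different speeds. A minimal sketch, using invented one-dimensional "features" in place of real acoustic frames:

```python
# Minimal dynamic time warping (DTW), the dynamic-programming technique
# used by early isolated-word recognizers. Real systems compare frames of
# acoustic features; the sequences here are made-up 1-D stand-ins.
def dtw(a, b):
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allow stretching either sequence (insertion, deletion, match).
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

template = [1, 3, 4, 3, 1]
slower   = [1, 1, 3, 3, 4, 4, 3, 3, 1]  # same "word" spoken more slowly
other    = [4, 1, 0, 1, 4]              # a different "word"
print(dtw(template, slower), dtw(template, other))
```

Because DTW stretches the time axis, the slowly spoken version aligns with the template at zero cost, while the different word stays far away; picking the template with the lowest DTW cost is isolated-word recognition in miniature.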

Researchers began to expand speech recognition to non-specific-speaker tasks but met serious difficulties with existing technologies. In the mid-1980s, statistical hidden Markov model (HMM) technology gained widespread attention and application in speech recognition, marking significant progress. The SPHINX system, in particular, achieved a breakthrough in Large Vocabulary Continuous Speech Recognition (LVCSR), standing as a milestone.

During the 1990s and early 2000s, extensive research was conducted on the HMM-GMM framework, which dominated the field of speech recognition until the application of deep learning techniques.

More recently, deep learning has brought notable improvements and new developments in speech recognition. In 2011, a research team from Microsoft Research introduced the CD-DNN-HMM system, demonstrating significant performance gains compared to the traditional frameworks.

Applications

Voice command systems: These technologies enable users to interact with electronic devices or software using spoken commands, providing a hands-free way to control and operate devices, applications, or services. Examples include intelligent virtual assistants such as Siri, Google Assistant, and Alexa (Vadwala et al., 2017).

Dictation systems: ASR is used in dictation applications to convert spoken words into written text. Google’s speech-to-text service and Apple’s dictation feature are good examples of dictation systems.

Accessibility services: ASR contributes to accessibility features, making technology more inclusive for individuals with disabilities who may struggle with traditional text input methods (Fendji et al., 2022).

Telecommunications: ASR is integral to Interactive Voice Response (IVR) systems, commonly used in customer service and support. Ibrahim and Varol (2020) claimed that ASR allows verbal interaction between users and automated systems and directs users to appropriate operators based on their needs.

Transcription services: ASR is widely used in transcription services, automating the process of converting spoken content into written form. For example, in the healthcare industry, medical transcriptionists can capture reports verbally instead of hand typing (Ibrahim & Varol, 2020).

Language learning: ASR in language learning apps helps users improve their pronunciation by analyzing their spoken words and providing instant feedback. Simulated conversations with virtual characters or AI chatbots also allow learners to practice speaking and listening skills.

Challenges (Vadwala et al., 2017)

In order to attain high accuracy, efficient speech recognition systems must cope with challenges associated with:

Vocabulary: The size of a speech recognition system’s vocabulary influences its complexity, processing demands, and accuracy. Larger vocabularies enlarge the search space and introduce more confusable words, so applications requiring very large dictionaries are harder to build to high accuracy.

Channel variability: One aspect of variability concerns the channel through which the sound wave reaches the system. Challenges arise from noise that changes over time, diverse types of microphones, and other factors that alter the content of the sound wave.

Utterance approach: How words are articulated, individually or in a connected fashion, holds significance. For example, an isolated-word ASR system will perform very poorly on multiple-word inputs.

Utterance style: All humans speak differently due to personal terminologies, unique ways to emphasize, or emotions. Natural speech, whether spontaneous or extemporaneously generated, includes disfluencies and poses a greater recognition challenge than continuous speech.

Speaker model: Every speaker has a distinctive voice. Speaker-independent systems offer greater flexibility but are more challenging to develop and less accurate than speaker-dependent systems, which are designed for a specific speaker.

Educational implications

Remote learning: ASR empowers seamless real-time communication and feedback between students and teachers, enhancing virtual classrooms and fostering interactive learning environments.

Accessibility and inclusivity: ASR plays a crucial role in improving accessibility for students facing disabilities, especially those with challenges related to speech or language, such as dyslexia (Ibrahim & Varol, 2020).

Multilingual education: ASR systems can facilitate multilingual education by offering language assistance and feedback across various languages. This is especially advantageous in educational environments characterized by linguistic diversity.

Automated grading: ASR can be utilized to automate the evaluation of spoken assignments or presentations. This frees up time for educators and ensures students receive prompt and consistent feedback.

Language learning: Utilizing ASR offers students real-time feedback on pronunciation and fluency, elevating language acquisition and improving speaking skills.


References

Awel, M. A., & Abidi, A. I. (2019). Review on optical character recognition. International Research Journal of Engineering and Technology (IRJET), 6(6), 3666-3669.

Chaudhuri, A., Mandaviya, K., Badelia, P., & Ghosh, S. K. (2016). Optical character recognition systems. In Studies in fuzziness and soft computing (pp. 9–41). https://doi.org/10.1007/978-3-319-50252-6_2

Fendji, J. L. K. E., Tala, D. C., Yenke, B. O., & Atemkeng, M. (2022). Automatic speech recognition using limited vocabulary: A survey. Applied Artificial Intelligence, 36(1), 2095039.

Ibrahim, H., & Varol, A. (2020, June). A study on automatic speech recognition systems. In 2020 8th International Symposium on Digital Forensics and Security (ISDFS) (pp. 1-5). IEEE.

OCRology. (2021, December 10). A quick history of OCR. Medium. https://medium.com/ocrology/a-quick-history-of-optical-character-recognition-ocr-c916d58e2170

Sarika, N., Sirisala, N., & Velpuru, M. S. (2021, January). CNN based optical character recognition and applications. In 2021 6th International conference on inventive computation technologies (ICICT) (pp. 666-672). IEEE.

Vadwala, A. Y., Suthar, K. A., Karmakar, Y. A., Pandya, N., & Patel, B. (2017). Survey paper on different speech recognition algorithms: Challenges and techniques. International Journal of Computer Applications, 175(1), 31-36.

Wang, D., Wang, X., & Lv, S. (2019). An overview of end-to-end automatic speech recognition. Symmetry, 11(8), 1018. https://doi.org/10.3390/sym11081018

Task 9: Network Assignment Using Golden Record Curation Quiz Data

Initial observations

Please click on the interactive elements to view the screenshots of data visualization.


Specific analysis

Among the other 21 participants, Didy Huang and Garth von Buchholz had the choices most similar to mine: seven of the same songs out of the ten pieces.

Didy Huang: Tracks 3, 6, 7, 9, 14, 18, 24.

Garth von Buchholz: Tracks 3, 7, 14, 18, 20, 23, 24.

Songs we all chose: Tracks 3, 7, 14, 18, 24.

On the other hand, Richard Payne had the choices most different from mine. We shared only one song out of the ten pieces we each selected: track 9.
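These overlap counts are straightforward set intersections. A short sketch using the track numbers listed above (my tenth pick appears only in the interactive visualization, so track 27 here is a placeholder):

```python
# Shared tracks come straight from the lists above; 27 is a placeholder
# for my tenth pick, which appears only in the interactive visualization.
mine  = {3, 6, 7, 9, 14, 18, 20, 23, 24, 27}
didy  = {3, 6, 7, 9, 14, 18, 24}     # Didy's tracks that overlap with mine
garth = {3, 7, 14, 18, 20, 23, 24}   # Garth's tracks that overlap with mine

print(len(mine & didy), len(mine & garth))  # 7 7
print(sorted(mine & didy & garth))          # [3, 7, 14, 18, 24]
```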

I was curious about the rationale behind our choices, so I looked into their task 8 posts and summarized our criteria. I noticed that Richard’s song selections in task 8 differed greatly from his choices in the dataset; therefore, I chose to compare Michaelle Haughian’s criteria instead, since her list was the second most different from mine (we shared only three songs).

The comparison table makes it evident that I had more parameters in common with Didy and Garth than with Michaelle, which explains at least part of why our choices were similar or different.


Reflection

Reasons behind the similar responses

Similar parameters: As discussed above, the similar criteria we made in task 8 were one of the reasons why we chose many similar songs in our lists.

Cultural influences: People who grow up in the same geographical area may have similar music preferences because of early exposure to similar musical influences.

Emotional resonance: People may connect with music on an emotional level. If the genre, melody, rhythm, or lyrics resonate with their emotions and experiences, they may choose similar music. For example, Michaelle emphasized her “own emotional relations to the sounds” when choosing the pieces (Haughian, 2023).

Personal preference: Music choices are undoubtedly influenced by personal preferences due to personality, values, and other factors. For example, Laura mentioned in her post that she chose “songs familiar and upbeat” based on her bias (Orlowski, 2023).

Network visualization

One of the advantages of network visualization is that it explicitly reveals patterns and clusters within data. Networks represent relationships and connections in a visually comprehensible manner, making complex data intuitive to analyze. Network visualization also allows users to explore data interactively (zooming in and out, repositioning nodes, and filtering data), fostering a deeper understanding of the network.

However, in large and densely connected networks (like the one we have here), overlapping edges can hinder readability and create confusion. Network visualization also tends to emphasize the structural aspects of a network (connections, patterns, relationships, etc.), thereby overlooking other attributes (e.g., the reasons behind the choices) that could be important for understanding the data. Regarding null choices, network visualization may not accurately represent the full scope of the dataset when null choices are not addressed, leading to incomplete or misleading conclusions. Even if we created a new visualization with “songs you didn’t choose” as the input data, the reasons behind these null choices still could not be shown explicitly.
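Concretely, the graph behind such a visualization boils each pair of curators down to a single edge weight (the number of shared tracks), which is exactly where the attribute information disappears. A toy sketch with invented choices:

```python
from itertools import combinations

# Invented toy data: curator -> set of chosen track numbers.
choices = {
    "A": {3, 7, 14, 18, 24},
    "B": {3, 6, 7, 14, 18},
    "C": {1, 2, 5, 10, 26},
}

# Edge weight = number of shared tracks; this is all the graph keeps.
edges = {
    (u, v): len(choices[u] & choices[v])
    for u, v in combinations(sorted(choices), 2)
    if choices[u] & choices[v]
}
print(edges)  # {('A', 'B'): 4}
```

Curator C shares nothing, so C simply has no edges, and nothing in `edges` records why A and B agree: both the null choices and the rationales vanish from the structure.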


References

Haughian, M. (2023, October 29). Task 8: Golden Record Curation. Text Technologies: The Changing Spaces of Reading and Writing. https://blogs.ubc.ca/auntiesocial/2023/10/29/task-8-golden-record-curation/

Huang, D. (2023, October 25). Task 8: Golden Record Curation. Didy’s Webspace. https://blogs.ubc.ca/etec540ddhng/2023/10/25/task-8-golden-record-curation/

Orlowski, L. (2023, October 29). Task 8: Golden Record Curation. ETEC 540 Blog by Laura Orlowski. https://blogs.ubc.ca/etec540lauraorlowski/2023/10/29/task-8-golden-record-curation/

Von Buchholz, G. (2023, October 28). Task 8: Golden Record curation assignment. ConText | Garth von Buchholz. https://blogs.ubc.ca/garthvb/2023/10/28/task-8-golden-record-curation-assignment/

Wang, B. (2023, October 18). Task 8: Golden Record curation. Bingying (Iris) Wang-ETEC540. https://blogs.ubc.ca/etec540bingyingwang/2023/10/18/task-8-golden-record-curation/

Linking task 8: Golden Record curation – Simon Worley

Links: My post for Task 8; Simon’s post for Task 8

I chose to link to Simon’s post because we made similar choices from the 27 songs: among the ten pieces of music we each chose, five were the same. Therefore, I thought it would be a good idea to dive deeper into Simon’s post to find the similarities and differences between our song-choosing criteria.


Tool used

We are both using WordPress on UBC Blogs, so the content-authoring capabilities for our sites should be similar. However, we use different themes, resulting in differences in the end-user interface. For example, his theme’s background and text colors are white and black, respectively, while mine are reversed. Also, the text size for my theme is larger than Simon’s, which means my posts appear longer but are easier to read on devices with smaller screens. Moreover, the “recent posts” and “recent comments” sections in his theme always appear to the right of the blog content regardless of page size, making it more convenient for users to check other posts.

For this task, we both primarily used text to present our criteria for choosing the ten pieces of music. I used a Genially interactive map to show the songs I picked and the countries they are from. Simon mainly used text to show his selections and brief reasons, as well as his reflection on this task.


Song list comparison


Criteria comparison


Similarities

1. Universality

We both included the universality of music in our criteria. Simon believed these songs must feature meaningful and universal themes that transcend language and culture. I thought universally understood music would have a better chance of being understood by a wide range of potential extraterrestrial life forms. Therefore, we chose simple and intuitive music unrelated to specific linguistic contexts.

2. Human values

We both appreciated universal human values. Simon mentioned “love, hope, and unity” (Worley, 2023), while I thought of “love, cooperation, and empathy” (Wang, 2023). I believe that incorporating aspects of human values can help represent the best aspects of humanity.

3. Cultural diversity

Both of us paid attention to revealing humans’ cultural diversity. In order to present a more comprehensive representation of human expression, experiences, creativity, and culture, we were mindful to add recordings from various geographical areas.

4. Balance between instruments and human voices

Simon aimed for “a balance between instrumental and vocal recordings” when choosing songs (Worley, 2023). Similarly, I included five songs with human voices and the rest featuring different instruments. If extraterrestrial life forms have the capacity for audio communication, including both human voices and various instrumental sounds could serve as a connection point.


Differences

Iris – Mathematical foundations

In the case of communicating with creatures that have no hearing or live on different time scales, adding music with mathematical foundations enables an additional channel of communication (Taylor, 2019). Both Bach’s and Beethoven’s music is deeply intertwined with mathematics; I picked only Beethoven’s Fifth Symphony, out of personal preference and to avoid repetition.

Iris – Music genre diversity

To ensure the record carried a highly diversified set of music genres, I included jazz and blues, rock and roll, popular music, classical music, and pieces tied to specific cultural contexts. I believed that adding various types of music would increase the chance of the record being comprehensible to many potential recipients.

Simon – Positive and uplifting

Simon claimed that music with positive and uplifting tones should be prioritized because “these recordings have the potential to bring joy and inspiration to those who encounter them” (Worley, 2023). I agree that expressing peaceful intent might serve as a foundation for building a positive and cooperative relationship.

Simon – Opening message

In addition to the ten pieces of music he chose, Simon suggested that an opening message should be added at the beginning of the record, explaining critical information about Earth and the purpose of sending the record.


References

Taylor, D. (Host). (2019, April). Voyager golden record [Audio podcast episode]. In Twenty thousand hertz. Defacto Sound.

Wang, B. (2023, October 18). Task 8: Golden Record curation. Bingying (Iris) Wang-ETEC540. https://blogs.ubc.ca/etec540bingyingwang/2023/10/18/task-8-golden-record-curation

Worley, S. (2023, October 25). ETEC 540 Task 8 Golden Record Curation Assignment. Simon Worley Blog site ETEC 540. https://blogs.ubc.ca/sworley/2023/10/25/etec-540-task-8-golden-record-curation-assignment/

Task 10: Attention Economy


Screenshot


Reflection

“Deception appears in various guises in user interfaces on the web today” (Brignull, 2011, para.1). I did not realize how disastrous a user interface with many deceptions could be before playing User Inyerface. Now, I have a greater appreciation for well-designed user interfaces.

User Inyerface is an intentionally frustrating and challenging online game that put me through counterintuitive tasks, leaving me feeling helpless and irritated. After playing the game, I could not agree more with Brignull’s claim that it is quite easy to “take our understanding of human psychology and flip it over to the dark side” (para. 4). Humans rely heavily on top-down processing, perceiving things through existing knowledge, experiences, and expectations (Cherry, 2023). User Inyerface exploits humans’ top-down processing by designing tasks that go entirely against common design principles, getting players into trouble.

Nevertheless, I found this game amusing precisely because of its intentional absurdity. It also encouraged my critical thinking and pushed me to think outside the box, preventing me from relying on a single method or metric to solve problems.


Analysis of User Inyerface

User Inyerface is filled with traps or challenges that subvert common user interface conventions. These traps are meant to confuse players and test their patience. Here are some of the “annoying” designs I found in the game:

(Please click on all the interactive elements for full text, images, and GIFs.)



References

Brignull, H. (2011). Dark patterns: Deception vs. honesty in UI design. A List Apart, 338.

Cherry, K. (2023). What is top-down processing? Verywell Mind. https://www.verywellmind.com/what-is-top-down-processing-2795975

Linking task 6: An Emoji Story – Richard Payne

Links: My post for Task 6; Richard’s post for Task 6

I chose to link to Richard’s post because I understood his emoji story immediately after reading it. The fact that we both chose famous stories to complete this emoji task might be a good starting point to compare and contrast our tasks and reflections. Therefore, I thought it would be interesting to dive deeper into Richard’s post to see if the points we made in our posts support one another.


Tool used

We are both using WordPress on UBC Blogs, so the content-authoring capabilities for our sites should be similar. However, we use different themes, resulting in differences in the end-user interface. For example, Richard’s page layout is more centralized than mine, resulting in a more concise end-user interface. His theme’s background and text colors are white and black, respectively, while mine are reversed. Also, the text size for my theme is larger than Richard’s, which means my posts appear longer but are easier to read on devices with smaller screens. Moreover, the “recent posts” section in Richard’s theme always appears to the right of the blog content regardless of page size, making it more convenient for users to check other posts.

For this task, we both primarily used text to present our reflections on the emoji story-making. We also both used outside emoji makers to complete the story and inserted a screenshot into our posts. I believe Richard used emojis from a system other than iOS because his emojis appeared simplified, with a highly uniform style.


Content


Themes we both discussed

1. Using top-down processing

Top-down processing is a cognitive and perceptual process that involves using existing knowledge, experiences, and expectations to interpret new sensory information (Cherry, 2023). Instead of letting the readers interpret the emoji stories based solely on their sensory information, Richard and I made use of top-down processing by choosing popular stories that most people are familiar with. Since readers already knew the storyline, the possibility for them to understand our emoji stories was maximized.

2. Limitations to emojis

We both discussed the limitations of emojis during the emoji-story-making process. Richard mentioned that “symbols, while efficient for instantly conveying a whole meaning [are] however, inevitably ineffective building out and pointing to specific complexities” (Payne, 2023). I found that emojis, as an intersection between words and image depictions, possessed the downsides of both. For instance, applying emojis to convey the meaning of more complicated or abstract concepts is challenging. Also, there is a limited number of emojis to use.

3. Readers’ interpretations

Both Richard and I believed our peers would find it easy to understand our emoji stories due to familiarity. In addition, Richard believed that “people in many cultures might be able to grasp the meaning quickly” even if they did not know the story (Payne, 2023). Aligning with his opinion, I expected that my peers (with various cultural backgrounds) might explain my emoji task in “various but similar words” (Wang, 2023).


Emoji stories comparison

1. Theme choosing

Both of us chose our story themes based on how easy they would be to visualize and their popularity. By choosing famous stories that most people know, we tried to minimize the chance that our peers would get confused when reading the emoji stories. In addition, Richard noted that the brief storyline made completing this task “incredibly advantageous” (Payne, 2023).

2. Reliance on words and ideas

Both of us relied mainly on words when completing this task. We used directly related emojis for specific concrete words: for example, pig, wolf, and house in Richard’s story, and mermaid, princess, and boat in mine.

For verbs and abstract concepts, we used combinations of emojis to convey the ideas. For example, we both used the “SOS” emoji to represent “in danger.” Moreover, we used right-pointing arrows to represent story progression.

3. Order and position

We both used left-to-right and top-to-bottom sequencing for our emoji stories, and the passage of time in our stories followed those writing directions. Moreover, in each emoji chunk, the first symbol represents the actor of the following actions, aligning with Kress’s (2005) claim that being first might indicate being the cause of a behavior. The difference is that Richard used punctuation (colons, commas, ellipses, and ampersands) to help explain the storyline, while I relied mainly on spaces and right-pointing arrows.


References

Cherry, K. (2023). What is top-down processing? Verywell Mind. https://www.verywellmind.com/what-is-top-down-processing-2795975

Kress, G. (2005). Gains and losses: New forms of texts, knowledge, and learning. Computers and Composition, 22(1), 5-22.

Payne, R. (2023, October 13). Task 6, An Emoji Story. Rich 540 Text Technologies. Retrieved October 20, 2023, from https://blogs.ubc.ca/540rp/2023/10/13/task-6-an-emoji-story/

Wang, B. (2023, October 10). Task 6: An Emoji Story. Bingying (Iris) Wang-ETEC540. Retrieved October 20, 2023, from https://blogs.ubc.ca/etec540bingyingwang/2023/10/09/task-6-an-emoji-story/

Task 8: Golden Record curation

The 10 pieces of music I chose:

Please click on the red interactive icon (left of the title) to view the full music list, and click on the country flags to watch individual videos.



Parameters and criteria

Human voices

Human speech and the human voice are primary forms of communication on Earth. If extraterrestrial life forms have the capacity for vocal communication, including human voices could serve as a point of connection. Also, human voices can express a wide range of emotions and intentions, potentially enabling more efficient message passing. Therefore, I included pieces 1, 2, 4, 5, and 7.

Universality

Universally understood music may have a better chance of being comprehensible to a wide range of potential recipients. This means choosing relatively simple and intuitive music unrelated to specific linguistic contexts, so I added pieces 2, 3, 6, 8, and 10 to my list. Additionally, adding music with universal mathematical foundations enables communication with creatures that have no hearing or live on different time scales (Taylor, 2019). Since mathematics was deeply incorporated into Beethoven’s music (Chang, 2007), I included piece 9.

Human diversity

In order to present a more comprehensive representation of human creativity and culture, I chose to add songs of different types and backgrounds. My list included jazz and blues (piece 3), rock and roll (piece 4), popular music (pieces 4 and 5), and classical music (piece 9). Moreover, I added pieces tied to specific cultural contexts, emphasizing our cultural diversity. For example, the second song shows the culture of the Navajo people, and piece 6 represents Chinese culture.

Human values

Incorporating aspects of human values, such as love, cooperation, and empathy, can help represent the best aspects of humanity and convey the idea that we aspire to live by these principles. Expressing peaceful intent may serve as a foundation for building a positive and cooperative relationship. Therefore, I selected songs that beautifully depict highly abstract concepts. In detail, pieces 1 and 7 are about marriage, love, and memory; piece 2 is about healing; and pieces 6 and 10 depict the nature of the Earth.


References

Chang, C. C. (2007). Fundamentals of piano practicehttps://fundamentals-of-piano-practice.readthedocs.io/about.html

Taylor, D. (Host). (2019, April). Voyager golden record. [Audio podcast episode]. In Twenty thousand hertz. Defacto Sound.

Task 7: Mode-bending

This is my original task 1: What’s in your bag?

Here is the new version:


Reflection


Process

My initial idea was to create a Twine game with audio, sound effects, and images since I had just learned to use Twine for the previous task and enjoyed the game-making process. Also, I believed that this Twine game would align with the New London Group’s (1996) emphasis on multimodal representation by including text, audio, and images. However, I realized that a Twine game was still mainly a visual design supplemented by audio elements. Therefore, I changed my mind and decided to create a production that was primarily audio and supported by visual elements if needed.

According to The New London Group (1996), all meaning-making is multimodal. To align with this claim as much as possible, I decided to add my native language (Mandarin) to my work, as “spoken language is a matter of audio design as much as it is a matter of linguistic design” (The New London Group, 1996, p. 81).

Thinking further about the audio design, I decided to add sounds made by my items to make my task more creative. After deciding which items to include in my work, I recorded audio clips of myself describing the Mandarin names and functions of my items, then recorded how these items sound when I use them. After that, I combined the audio clips (using Veed and Jianying). In addition, I added corresponding images and visual effects to supplement my audio design. To abide by my initial decision that this product be primarily audio, I did not add any captions (visual elements).


Potential benefits of engaging in mode-changing

1. Multimodal content is more engaging.

Multimedia presentations that combine text, images, and audio can create a more immersive and appealing communication experience than text-only ones.

2. Multimodal content aids in comprehension and memory.

Different people absorb information in distinct ways. Combining multiple modes can increase the chances of people understanding and retaining the information. For example, using visual aids alongside spoken explanations in presentations can help clarify and reinforce key points. Moreover, Li (2020) claimed that multimodal content helps learners process information and deepen content knowledge.

3. Mode-changing enables better contextual adaptation.

Certain information is better suited to certain modes. For instance, statistical data is better illustrated by tables or graphs, while abstract concepts are often conveyed better through audio explanations. I believe that combining spoken pronunciation with visual text is a better way to introduce my audience to a new language.

4. Mode-changing facilitates multimodal literacies.

Multimodal literacies, the inevitable forward trajectory of literacy, are the main focus of 21st-century literacy (Albers & Sanders, 2010). The New London Group (1996) also claimed that “multimodal literacies … are increasingly important to all communication, particularly the mass media” (p. 62). According to Kress (2005), the various modes available for delivering information each provide specific potentials for communication yet come with different limitations. Therefore, mode-changing enables better cooperation between various modes, thereby facilitating multimodal literacies.


Potential challenges of engaging in mode-changing

1. Complexity and efficiency

Converting content between different modes can be time-consuming and complicated, requiring additional software or skill. For instance, making a creative audio-based presentation (recording and editing audio clips) takes more time than just typing the text.

2. Inconsistent user experience

People absorb information in various ways. Some individuals are more visual learners, while others prefer auditory learning. Content delivered through different modes may therefore result in an inconsistent user experience. For example, visual learners might find my audio-based work hard to follow and remember, leading to confusion or frustration.


References

Albers, P., & Sanders, J. (2010). Multimodal literacies: An introduction. Literacies, the Arts, and Multimodality, 1–25. https://secure.ncte.org/library/NCTEFiles/Resources/Books/Sample/32142Intro_x.pdf

Kress, G. (2005). Gains and losses: New forms of texts, knowledge, and learning. Computers and Composition, 22(1), 5-22.

Li, M. (2020). Multimodal pedagogy in TESOL teacher education: Students’ perspectives. System, 94, 102337. https://doi.org/10.1016/j.system.2020.102337

The New London Group. (1996). A pedagogy of multiliteracies: Designing social futures. Harvard Educational Review, 66(1), 60-92.

Linking task 5: Twine Task – Louisa Green

Links: My post for Task 5; Louisa’s post for Task 5

I chose to link to Louisa’s post because we both appreciated Bolter’s (2001) idea that traditional books, with their single order of sections and pages, are linear, hierarchical, and static, whereas hypertext provides a story network that is multiple and associative. So I thought it would be interesting to dive deeper into Louisa’s post and find the similarities and differences between our Twine games and game-making experiences.


Tool used

Louisa uses Wix, whereas I use WordPress on UBC Blogs. Based on my experience with Wix, I see some content-authoring differences between these platforms. Wix provides website-building tools and hosting for users’ websites, while WordPress offers users more control over their websites’ hosting. Therefore, people using WordPress are responsible for setting up and managing their hosting environment. While both platforms provide a wide range of templates and themes, WordPress is more flexible and extensible, allowing skilled users to customize websites, add plugins, and modify the code.

Our end-user interfaces appear different, too. Louisa presented all her tasks on one page, while I used separate pages to show specific tasks. Also, her page is equally divided into two sectors with different colors, with the left showing the task title and the right displaying task content. In contrast, my task content follows the title vertically, appearing at the center-left of the page. Our text sizes are similar; both are appropriate for reading on any device.

For this task, we mainly used text to reflect on the game-making process. In addition, I added pictures to help explain my words and inserted an online version of my Twine game for a better user experience.


Content


Themes we both discussed

1. Differences between books and hypertext

We both appreciated Bolter’s (2001) ideas about the differences between traditional books and hypertext. I mentioned that “compared to traditional books with single orders of sections and pages, hypertext provides a story network” (Wang, 2023, para.1). Louisa noted that printed text is hierarchical and static, while hypertext is multiform, connected, and responsive to readers.

2. Interactivity of hypertext

We both talked about the interactivity of hypertext. Louisa said hypertext allows readers to interact with the story and choose different options. I tried to give my readers “the illusion of control” by letting them choose their preferred links (Bolter, 2001, p.43).

3. Twine is user-friendly

We both mentioned that Twine is a user-friendly tool for creating interactive, nonlinear stories. “Twine stories using only text are fairly simple to navigate” (Green, 2023).


Differences between our reflections

1. Experience using Twine

Louisa mentioned in her post that she had used Twine previously in various courses, while this was my first time using Twine to make a text-based game. Louisa also found that adding images alongside text was more complicated than she remembered. I searched for an online tutorial and followed its detailed steps to insert images into my game.

2. How we came up with the storyline

Louisa said that her story was “produced out of a stream of consciousness on [her] part” (Green, 2023). In contrast, I came up with my story topic based on my psychology background and laid out a network of ideas with the aid of textbooks.

3. Opinions about game-playing

Louisa talked about her experience playing The Temple of No and shared her opinion about game playing. She enjoyed the casual language and the “escape from the real world” feeling experienced in the game, which she thought was important for game-playing (Green, 2023).


Twine Game

Please click on the interactive elements below for details (text and images).


References

Bolter, J. D. (2001). Writing space: Computers, hypertext, and the remediation of print. Lawrence Erlbaum Associates.

Green, L. (2023). Tasks | UBC Met ETEC 540 Tex. Ubc Met Etec 540 Tex. Retrieved October 12, 2023, from https://louisaagreen.wixsite.com/ubc-met-etec-540-tex/tasks

Wang, B. (2023, October 4). Task 5: Twine Task – Bingying (Iris) Wang-ETEC540. https://blogs.ubc.ca/etec540bingyingwang/2023/10/04/task-5-twine-task/

Task 6: An Emoji Story


Reflection

According to Kress (2005), “words are highly conventionalised entities, and only exist in that manner” (p. 15). Therefore, words are limited and nearly empty of meaning, and their meanings must be filled in by readers. On the other hand, “there is an infinitely large potential of depictions — precise, specific, and full of meaning” (Kress, 2005, p. 16).

Before completing this task, I thought using emoji to represent plots would be more similar to using depictions than words. However, I found only a limited number of emojis available, and most of them convey specific meanings. Therefore, emojis are more like an intersection between words and image depictions, possessing the upsides and downsides of both.

As Bolter (2001) claimed, different people may interpret the same image message in different words, and people who speak different languages may produce similar picture writing. Emojis, as an intersection of words and images, might evoke mixed reactions among readers. I expect that my peers might explain my emoji task in varied but similar words, since most emojis express particular meanings. Moreover, due to cultural, linguistic, and experiential differences, my peers might produce very different emoji writing from mine for the same story.


I relied mainly on words and ideas

After deciding on the storyline, I broke the story into its essential elements, such as characters, settings, and emotions. To ensure that my peers could follow the storyline, I focused on the most critical aspects of the story and kept emojis concise.

For concrete words, such as movie, love, and castle, I directly searched the word using the Emoji Keyboard. Words with no linked emojis pushed me to think out of the box and convey the meaning using easily recognizable emojis. For words with multiple emojis, I chose the most straightforward one to maintain clarity.


Order and position

For each chunk of my emoji story, the first symbol represents the actor of a particular movement, aligning with Kress’s (2005) claim that “being first may mean … being [the] cause of an action” (p. 12).

I also used left-to-right and top-to-bottom sequencing for my emoji story, the same as my writing sequences. I believe this unconscious choice of writing order was influenced by my languages (English and Mandarin), aligning with Boroditsky’s (2011) claim that languages affect people’s direction of writing. Moreover, the passage of time in my emoji story follows my writing direction. Therefore, I started with the title and completed the task following the unfolding plot.


How did I choose the theme?

Honestly, I chose the work based on how easy it would be to visualize and on its popularity. I initially tried to work on Spider-Man: Across the Spider-Verse, which I had watched most recently, but I couldn’t figure out a way to represent all the different versions of Spider-Man. As I mentioned before, there is a limited number of emojis to use, restricting my ability to describe the storyline. Therefore, I chose a famous story that most people know, minimizing the possibility that my peers would get confused when reading my emoji story.


References

Bolter, J. D. (2001). Writing space: Computers, hypertext, and the remediation of print (2nd ed.). Lawrence Erlbaum Associates.

Boroditsky, L. (2011). How language shapes thought. Scientific American, 304(2), 62-65.

Emoji Keyboard Online – Click to copy emojis. (n.d.). https://emojikeyboard.io/

Kress, G. (2005). Gains and losses: New forms of texts, knowledge, and learning. Computers and Composition, 22(1), 5-22.
