What Can Image Gen-AI Models Teach Us About Image Perceptions?

A Critical Response Post to THE IMAGE: REPRESENTATION, REINCARNATION, REPRODUCTION by Matthias von Loebell, Daniel Schatz, Django Mavis, and Sydney Wilkins.

By Micah Sébastien Zhang


A few days ago, I stumbled across a work by some of my peers — Matthias von Loebell, Daniel Schatz, Django Mavis, and Sydney Wilkins — on the class blog, in which they discussed the significance of images in media and how the manipulation of images can affect people’s perception. The blog article unfolds smoothly: it takes us from early, general forms and definitions of the image, then through the connections between theories, and finally to a summary of how their thesis plays out in the contemporary media landscape.

The article takes a sociological point of view in its analysis of images and their effects, which is a proper move in my opinion; such perspectives and methods of research will never get old as the world shifts forward. What I found particularly agreeable is their view on the essence of images, which they describe as "a visual abstraction." Through this idea, we can reasonably place the concept of the image within the classical frame of media mediation, in which images serve as a mediation of a summary of thought(s).

However, in this critical response post, I would like to take a step back and make my way to a vantage point that grants a holistic, figurative perspective on the conception of images, notably through a rather unusual example: text-to-image generative AI models.

How come? I am proposing this peculiar approach because I find the technical process of text-to-image generative AI similar to the human experience of image perception. Before we can go into the details of that comparison, though, we should first understand how text-to-image generative AI models usually work.

A research guide from the University of Toronto gives a pretty comprehensive overview of the technical process, but for the sake of convenience, a summary is also provided below. To stay concise, I will focus only on the process used by diffusion models.

The diffusion model is a common type of image-generative AI model; both Stability AI’s Stable Diffusion and OpenAI’s DALL·E are categorized as diffusion models. Inspired by thermodynamic diffusion, the technical process of a diffusion model involves two stages. In the first stage — forward diffusion — the pixels of a normal image are gradually scattered (or "diffused," in the guide’s terms) into random noise. The machine then learns the reverse: reconstructing a normal image from its randomized, noisy version. For example, a normal image of an apple will be diffused into random noise and tagged "apple"; from such tagged noisy images, the machine learns to recreate normal images on request from prompts. Each generated image is synthesized from fresh random noise, which is why the same prompt can produce different outputs.
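To make the forward-diffusion stage concrete, here is a minimal sketch in Python. It uses the standard DDPM-style closed-form noising step; the image, the step count, and the beta schedule values are all illustrative assumptions, not the actual Stable Diffusion or DALL·E implementation.

```python
import numpy as np

def forward_diffusion(image, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Gradually mix an image with Gaussian noise (the forward process).

    Uses the closed-form expression for the fully noised sample:
    x_T = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise,
    where alpha_bar is the cumulative product of (1 - beta) over all steps.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.prod(1.0 - betas)  # shrinks toward 0 as steps increase
    noise = np.random.randn(*image.shape)
    return np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * noise

# A tiny grayscale stand-in for an "apple" image: after the full schedule,
# the original signal is scaled down to almost nothing, so the result is
# essentially pure noise -- the starting point the model learns to reverse.
x0 = np.ones((8, 8))
xT = forward_diffusion(x0)
```

The reverse stage, which I will not sketch here, is where the learning happens: a neural network is trained to undo this noising one step at a time, guided by the text tag, so that generation can start from pure noise and "denoise" its way to a new image.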

Through this process, we can partly mirror a general human perception of images, if we consider images as mediations of higher-level information. The creation of actual, real-life images comes from the diffusion of higher-level knowledge in our brains; those pieces of higher-level knowledge are, in my opinion, stored as a culmination of human experiences since one’s birth. Upon perceiving an image, we are essentially transforming a two-dimensional plane of "diffused noise" (this could be any form of visual representation) into pieces of higher-level knowledge in our brains, yet the result can deviate from the original intention and meaning.

On this note, images do compare favourably to pure text. If I simply put the word "apple" here, my readers could perceive the term differently: maybe it’s a red apple; maybe it’s a green one; maybe it’s even Apple Inc., maker of the iPhone. Images provide a more directional rectification in transmitting higher-level thoughts and concepts. Nevertheless, they are still no substitute for the direct transmission of higher-level thoughts, as they remain within the constraints of the diffusion of thought.

Going back to my peers’ article, one of their claims is that the value of images is diminishing along with their mass production. Quoting the Frankfurt School thinker Walter Benjamin, they reflect on his view of artistic labour "as the process by which art is imbued with meaning." Reflecting on my own claim in this article, the mass production of images may symbolize technological advancement in the means of media production and in the media industry itself; yet, from this holistic overview, it may also push the transmission of information into a more chaotic stage, one in which mass-produced images bear incomplete representations of higher-level information.

As new media studies scholars, it is important to note the challenges currently faced by our field of study; yet new perspectives that challenge pre-constructed perceptions may provide more beneficial insights to shape it — and sometimes that means taking a step back and seeing things as a whole to find general patterns.

Works Consulted

“Research Guides: Artificial Intelligence for Image Research: How Generative AI Models Work.” University of Toronto Libraries, guides.library.utoronto.ca/image-gen-ai. Accessed 29 Nov. 2025.

Von Loebell, Matthias, et al. THE IMAGE: REPRESENTATION, REINCARNATION, REPRODUCTION | Approaches to Writing for Media Studies. 29 Sept. 2025, blogs.ubc.ca/mdia300/archives/115. Accessed 29 Nov. 2025.

Image Acknowledgement

The header image was produced by Jonathan Kemper on Unsplash.

2 thoughts on “What Can Image Gen-AI Models Teach Us About Image Perceptions?”

  1. Hi Micah! I really like how you bring text-to-image AI into the discussion as a way of understanding how we perceive images. Your comparison between diffusion models and human interpretation makes the idea of images as “visual abstractions” much easier to grasp. I also found your point about mass-produced images especially interesting. Rather than only seeing them as a sign of technological progress, you show how they can also create more incomplete forms of meaning. This perspective adds a fresh layer to the original blog post and helps deepen the conversation about what images actually do in today’s media environment.

  2. Really good read! Your reflection’s move toward a “zoomed-out” view of images is really interesting, especially when paired with your comparison between diffusion models and human perception. Your point about mass production creating representational chaos makes me wonder whether generative models amplify this problem or simply reveal that it was always there in human perception. If both machines and humans reconstruct images from partial, noisy information, how can we tell whether our interpretations reflect the world or are just the limits of our own diffusion process?
