Task 11: Text-to-Image

Prompt: Podium winners of the all-around gymnastics competition in upcoming Paris 2024 Olympics

Prompt: Border collie swimming

With the Paris 2024 Olympics approaching, and as a long-time follower and fan of the U.S. Women’s Artistic Gymnastics team, I chose this as my first prompt: Podium winners of the all-around gymnastics competition in the upcoming Paris 2024 Olympics.

I assumed that biases and stereotypes would be apparent in the generated image, but I was hopeful that I might be proved wrong. I hoped that the image would contain three gymnasts, including Simone Biles, an African American gymnast who is heavily favoured to win the gold medal in this competition (Macmillan, 2024). Given that Simone has been dominant in her sport for nearly a decade (this will be her third Olympic Games) (Simone Biles, 2024), I expected that images of her would have been included in Craiyon’s training data and would therefore appear in the generated image.

The generated image includes four lighter-skinned gymnasts, none of whom are clearly identifiable and none of whom are Simone Biles. This pointed to three issues: Craiyon’s lack of predictive ability, its lack of associative ability, and the stereotypical representations in its training data.

Unlike ChatGPT, which was able to predict winners (and made good predictions, in my opinion), Craiyon did not produce an image that indicated a clear prediction of the winners, as prompted. To test whether pictures of Simone Biles were included in Craiyon’s training data, I entered a second prompt containing just her name. Several images of Simone Biles appeared, indicating that pictures of her were indeed included. Reflecting on my original prompt, I realized that it did not explicitly include Simone’s name, but it did include several words associated with her: gymnastics, Olympics, and winners. However, Craiyon’s output was based solely on the words used, and not on unwritten associations or inferences. The program cannot generate something that isn’t prompted. This also made me wonder how images are tagged, and whether tagging is based on the distinguishable people and objects in an image rather than on the news, events, or points in time that the image may be associated with.

The four lighter-skinned gymnasts in the generated image also appear slender and long-limbed. This highlights stereotypical representations in Craiyon’s training data, as the sport has historically been dominated by lighter-skinned gymnasts, with slender body types being preferred. It is an example of the first layer of bad algorithms described by O’Neil (2017), which encompasses unintentional problems that reflect cultural biases. The image of lighter-skinned and slender gymnasts perpetuates stereotypes of who can be successful in the sport of gymnastics and the body type that gymnasts ‘should’ have. Note that in my prompt, I did not specify that I wanted an image of women’s gymnastics, and yet only women appeared in the generated image. This reflects unbalanced training data and further perpetuates the stereotype that gymnastics is a sport for women rather than men.

Craiyon was also ineffective at generating realistic images of people, as the gymnasts’ faces and limbs are…far from realistic. It performed worse than I expected in this regard, and the result looks as though different images were cut and pasted together.

Other images generated from this prompt included the Olympic rings and the Eiffel Tower, but it seems my prompt was too complex for Craiyon: it generated pictures of gymnasts, of Paris, or of Olympic icons, but never all three together. This suggests either a lack of understanding or a lack of training data with clear associations between Paris and gymnastics.

My second prompt was: Border collie swimming.

It is evident that Craiyon’s training data included dogs of different breeds (border collies being one of them), as Craiyon successfully returned an image of a border collie. Similar to the stereotypes identified in my first prompt, the returned image depicts a ‘perfect’ border collie with black/brown fur and symmetrical white markings. This could lead to skewed representations of dogs in the world, particularly if people choose dogs that match these ‘picture perfect’ representations. Dogs with asymmetrical markings or unusual colouration may be seen as less desirable.

As my second prompt was less complex, Craiyon was able to understand the association between the words in my prompt and generated an image of a border collie swimming. On closer inspection, however, the dog’s right leg is raised near its head, as if doing a ‘front crawl’, which is a swimming stroke performed by humans. This suggests that Craiyon’s training data likely included more images of humans swimming than of dogs swimming. The image humanizes a swimming dog and doesn’t accurately capture the ‘doggy paddle’, which, as far as I know, is the only stroke dogs know!
