The Golden Record data in Palladio was not initially clear to me; I first needed to learn how to decipher it by reading other Palladio user guides.
I played with the facets (curator vs. track vs. community) to generate tables and graphs that would give me an overview of the most-selected songs and help me find connections. This did not add more insight, as I lacked any context behind the song selection. I then arranged the data to display weighted track nodes (biggest on the bottom) and compared my selection with that of my peers.
I thought I could find a pattern or a link to more than one song. Without information on why curators chose the songs, my idea was that the degree of connectivity of tracks might let me identify those who had used the same criteria as I had. However, that would presume that their rationale was the same as mine, and I had no corroborating information to support this. I based my track selection on feeling, which is itself intangible and immeasurable. I admit that my selection method was flawed, as it was based solely on my perception and had no qualitative data.
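As a rough sketch of the comparison described above, assuming the curator-to-track edges can be exported as a simple mapping, the track weights and my overlap with each peer could be computed as follows. The curator names and track labels are invented for illustration and are not taken from the actual Golden Record dataset.

```python
from collections import Counter

# Hypothetical curator -> selected tracks mapping (invented data, for illustration only).
selections = {
    "me": {"Track 3", "Track 7", "Track 14"},
    "peer_a": {"Track 3", "Track 7", "Track 20"},
    "peer_b": {"Track 7", "Track 18", "Track 25"},
}


def track_weights(selections):
    """Count how many curators chose each track (the weight of a track node)."""
    return Counter(track for tracks in selections.values() for track in tracks)


def shared_with(curator, selections):
    """Count the tracks each other curator has in common with `curator`."""
    mine = selections[curator]
    return {
        other: len(mine & tracks)
        for other, tracks in selections.items()
        if other != curator
    }


weights = track_weights(selections)
overlap = shared_with("me", selections)
print(weights.most_common(3))
print(overlap)
```

Counts like these show which curators overlap with my choices, but, as noted above, they say nothing about whether those curators shared my reasons.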
The main problem with the Golden Record Palladio data is that, beyond node and edge information and a limited ability to play with facets, it revealed the “who” (curators) and the “what” (track selection) but not the “why” (criteria). In the real world, this small sample could not be considered representative, as it lacks values that would permit a proper analysis. Even if used as a predictive model, it would fail in its primary task, which was to choose songs that can, as much as possible, encapsulate the human race.
Two strands of thinking tie together here. One is that the algorithm creators (code writers), even if they strive for inclusiveness, objectivity and neutrality, build into their creations their own perspectives and values. The other is that the datasets to which algorithms are applied have their own limits and deficiencies. Even datasets with billions of pieces of information do not capture the fullness of people’s lives and the diversity of their experiences. Moreover, the datasets themselves are imperfect because they do not contain inputs from everyone or a representative sample of everyone… creating a flawed, logic-driven society and that as the process evolves – that is, as algorithms begin to write the algorithms – (Rainie and Anderson, 2017).
This implies that one cannot justify or prove that the song selection is inclusive of all cultural, socio-economic and political factors, or even whether those factors played a part. Further, assumptions about song choice, and links between choices, cannot be made without a clear profile of the curators themselves, because their biases cannot be identified.
I was intrigued by the possibility of adding more data into Palladio, integrating values like curator gender, class, etc. Additionally, the Golden Record Curation Data Gathering “Quiz” might include multiple-choice questions explaining song selection, whose answers could serve as facets. For example:
Which of the following did you mostly base your song selection on?
- Personal preference
- Cultural representation
- Song popularity
- Other.
Even then, these values would be biased by my own thought process and what I assume are plausible choices.
If algorithmic bias is merely a data problem, the often-touted solution is to de-bias the data pipeline. However, data “fixes” such as re-sampling or re-weighting the training distribution are costly and hinge on (1) knowing a priori what sensitive features are responsible for the undesirable bias and (2) having comprehensive labels for protected attributes and all proxy variables. (Hooker, 2021)
The original tracks for the Golden Record were chosen by renowned astrophysicist Carl Sagan and his team of first-world, successful and educated astronomers, sound engineers, musicologists, record executives, journalists and artists. Were they qualified to make such an important decision on behalf of mankind? Probably. However, much like the Palladio data, the selection was very exclusive from the start.
References
Conroy, M. (2021). Networks, Maps, and Time: Visualizing Historical Networks Using Palladio. Digital Humanities Quarterly, 15(1). http://www.digitalhumanities.org/dhq/vol/15/1/000534/000534.html
Rainie, L., & Anderson, J. (2017, February 8). Code-Dependent: Pros and Cons of the Algorithm Age. Pew Research Center: Internet, Science & Tech. https://www.pewresearch.org/internet/2017/02/08/code-dependent-pros-and-cons-of-the-algorithm-age/
Hooker, S. (2021). Moving beyond “algorithmic bias is a data problem.” Patterns, 2(4), 100241. https://doi.org/10.1016/j.patter.2021.100241
Wikipedia Contributors. (2018, November 18). Voyager Golden Record. Wikipedia; Wikimedia Foundation. https://en.wikipedia.org/wiki/Voyager_Golden_Record