When you first look at the visualizations, the large number of nodes and edges makes it difficult to see the links between the participants (nodes) and the music pieces they chose. Choosing a community “cleans” up the data a bit and makes those links easier to see. The big thing I noticed is that the software generates communities based on the similar music choices made by each node (person). However, the visualization doesn’t capture the reasons behind those choices. It only shows that a link exists between a person and a music piece, not why the link was made (e.g., personal taste, genre, instruments, beat). Hence, a lot of information is missing from this graph. We would need the raw data and the posts made by each person to capture the reasons behind their choices. We can infer why the responses are similar, but without the raw data we cannot truly know. The graph alone shows neither all the data points nor the reasons behind the responses.
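To make the idea of "communities from similar choices" concrete, here is a minimal sketch. It is only an illustration: I don't know which method the visualization software actually uses, so the Jaccard overlap measure, the 0.5 threshold, and all names and track titles below are my own assumptions.

```python
from itertools import combinations

# Hypothetical selections: each person maps to the set of pieces they chose.
selections = {
    "Ana": {"Track A", "Track B", "Track C"},
    "Ben": {"Track B", "Track C", "Track D"},
    "Cy": {"Track E", "Track F"},
}

def jaccard(a, b):
    """Share of pieces two people have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(sel, threshold=0.5):
    """Return pairs of people whose overlap meets the (assumed) threshold."""
    return [
        (p, q)
        for p, q in combinations(sorted(sel), 2)
        if jaccard(sel[p], sel[q]) >= threshold
    ]

print(similar_pairs(selections))
```

Notice that the only input is who chose what. Nothing in the data records why a piece was chosen, so two people can land in the same group for completely different reasons.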
One thing I noticed is that this graph is not a weighted graph. Instead, I would categorize it as something closer to a unidirectional (directed) graph, even though there are no arrows. Each link within a community “points” from a person (one node) to a music piece (the other node). I say it’s unidirectional because the music piece didn’t choose the person; the person chose the music piece.
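The unweighted, one-way structure described above can be sketched as a plain list of (person, piece) pairs. This is my own rough representation with made-up names, not how the software stores its data.

```python
from collections import defaultdict

# Hypothetical edges: each pair means "person chose piece".
# There is no third value, so no weight and no reason is stored.
edges = [
    ("Ana", "Track A"),
    ("Ana", "Track B"),
    ("Ben", "Track B"),
    ("Cy", "Track C"),
]

def build_directed(edge_list):
    """Map each person to the set of pieces they point to."""
    out = defaultdict(set)
    for person, piece in edge_list:
        out[person].add(piece)  # only person -> piece, never the reverse
    return dict(out)

choices = build_directed(edges)

# Pieces never appear as the source of a link.
pieces = {piece for _, piece in edges}
pieces_with_outgoing = [p for p in sorted(pieces) if p in choices]

print(sorted(choices["Ana"]))
print(pieces_with_outgoing)
```

Every link is either present or absent. There is no way to tell from this structure whether Ana's choice of a piece was a strong preference or a coin flip.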
The biggest takeaway I had from the statistics courses I took was how easily data can be manipulated. It is very easy to change parameters so that you get a conclusion that fits what you are looking for. Algorithms are affected by the biases of the people who create and train them. The way someone gathers data, the types of questions asked, the things that are omitted, included, or assumed, and the types of programs used all affect what the graph or data shows. This makes it very easy to manipulate data and cater it to an objective. We often see stat headlines (e.g., 60% of people agree that pineapple shouldn’t be on pizza) that include only the conclusion. Unless we get to read the methods and procedures of the study and the raw data, there could be tons of reasons why that conclusion was reached. In the pineapple on pizza example, it is possible that the type of questions asked forced people to answer or pick a “side.” The program used could also have analyzed the data so that a certain outcome was achieved. This can group people in certain ways and either empower them or divide them.
In a political context, this type of “community”-forming algorithm can have positive and/or negative impacts. If we look at the spread of conspiracy theories, anti-vaxx views, and misinformation, it is because of how the algorithm works. Just as this program used the curated music lists to connect people and form communities, people who make certain searches will be shown websites and groups favoured by others with similar searches (like Google’s search algorithm, which relies on web crawlers, or “spiders”). This can lead to confirmation bias and the spread of misinformation when groups of people validate each other’s thoughts. The algorithm is trained to provide information based on what other people have searched for and clicked on before. It uses the hyperlinks connected to a page to surface things a person might want to see. On the other hand, finding communities that empower and help others is very beneficial. There are tons of online platforms that cater to people looking for certain things. Of course, the concern about manipulation assumes that everyone has an agenda and is trying to bend things to suit their interests. While there are tons of people who do that, there are also tons of others who are just trying to find answers and avoid as much bias as possible.
In terms of “null” data points, unless they are included in the graph, I don’t see them being reflected in the visualization. The limitation of the internet, algorithms, and search engines is that they are unable to make changes without being given actual data points. They can’t reflect on things they have no data about. It is possible for them to use the data they do have to infer possible reasons, but unless concrete data is given that tells them “these are the reasons,” it will be hard for them to interpret the choices accurately. That is why it is important to have data sets or data points that show why something was chosen and why something wasn’t chosen. This allows the graph to give a full picture.