Geographically Weighted Regression

Using Regression Analysis to Understand Children’s Social Scores in Vancouver, BC

Regression analysis models the relationships between variables, including how those relationships vary across space, in order to answer complex questions. Tools such as Ordinary Least Squares (OLS) regression and Geographically Weighted Regression (GWR) help show not only where a phenomenon occurs but also why it may be occurring. In this analysis we try to identify which factors improve or hinder a child’s social skills (our dependent variable), using variables related to the individual child and their neighbourhood (our explanatory variables).

The difference between OLS and GWR is that OLS provides a global model whereas GWR provides a local model. In our analysis, OLS created a single regression equation for the whole study area to represent how the explanatory variables relate to a child’s social score. OLS tells us how well the model predicts a child’s social score and whether each explanatory variable we chose has a negative or positive effect on that score. GWR instead fits a separate regression equation for every feature in the dataset, using that feature and its surrounding neighbours. In this dataset, GWR therefore produced a local coefficient for each chosen explanatory variable (gender, income, and language score) at each child’s location. This allows us to see how the effect of each variable on a child’s social score changes depending on where the child lives. To sum up, OLS tells us how well our explanatory variables together predict a child’s social score, and GWR lets us examine each explanatory variable across the study area to understand how its effect differs spatially. Together, these tools help uncover the underlying factors that may contribute to a child having a high or low social score.
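The global versus local distinction can be illustrated with a minimal sketch. It is not the ArcGIS implementation: the data are synthetic, the Gaussian kernel and fixed `bandwidth` are assumptions chosen for simplicity, and the names `fit_ols` and `fit_gwr` are hypothetical helpers. The synthetic effect of `x` is negative in the west and positive in the east, so the single OLS slope averages out near zero while the local slopes recover the pattern.

```python
import numpy as np

def fit_ols(X, y):
    """Global model: one coefficient vector for the whole study area."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

def fit_gwr(coords, X, y, bandwidth):
    """Local model: one coefficient vector per feature.

    Each feature gets its own weighted least-squares fit in which
    nearby features count more (Gaussian distance-decay kernel).
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    betas = np.empty((len(X), Xd.shape[1]))
    for i, centre in enumerate(coords):
        dist = np.linalg.norm(coords - centre, axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        sw = np.sqrt(w)  # scaling rows by sqrt(w) gives weighted least squares
        betas[i] = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)[0]
    return betas

# Synthetic example (hypothetical data, not the Vancouver dataset).
rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))   # easting, northing
x = rng.normal(size=n)                     # one explanatory variable
true_slope = -1 + 0.2 * coords[:, 0]       # effect changes from west to east
y = 2 + true_slope * x + rng.normal(0, 0.1, n)

global_beta = fit_ols(x, y)                       # shape (2,)
local_betas = fit_gwr(coords, x, y, bandwidth=1.5)  # shape (n, 2)

west = coords[:, 0] < 2.5
east = coords[:, 0] > 7.5
print("global slope:", global_beta[1])
print("mean local slope, west:", local_betas[west, 1].mean())
print("mean local slope, east:", local_betas[east, 1].mean())
```

Mapping the second column of `local_betas` by location is the same idea as the coefficient maps produced from the GWR output.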

Because the bulk of the analysis depends on the explanatory variables, it is crucial to choose the right ones for our regression models. We used a data mining tool in ArcGIS that tried all possible combinations of our candidate explanatory variables to see which combination best described a child’s social score. The best model found used gender, language skills and income. Running these variables through OLS showed that this model explains 37% of the variation in children’s social scores. Gender had a negative coefficient (-5.49). Females were coded as 1 and males as 0, so the negative coefficient tells us that, with language and income held constant, being female is associated with a social score about 5.49 points lower on average. Both language (0.52) and income (0.05) had a weak but positive effect on a child’s social score.
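The idea of trying every combination of candidate variables can be sketched as an exhaustive search ranked by adjusted R². This is an illustration under stated assumptions, not the ArcGIS tool itself, which applies further diagnostics that are not reproduced here. The data are synthetic, and the candidate names `noise_a` and `noise_b` are hypothetical variables added only to give the search something to reject.

```python
from itertools import combinations
import numpy as np

def adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Synthetic candidates (hypothetical data, not the Vancouver dataset).
rng = np.random.default_rng(42)
n = 300
candidates = {
    "gender": rng.integers(0, 2, n).astype(float),  # 1 = female, 0 = male
    "language": rng.normal(50, 10, n),
    "income": rng.normal(60, 20, n),
    "noise_a": rng.normal(size=n),                  # unrelated to the score
    "noise_b": rng.normal(size=n),                  # unrelated to the score
}
y = (40 - 5 * candidates["gender"] + 0.5 * candidates["language"]
     + 0.05 * candidates["income"] + rng.normal(0, 5, n))

# Score every non-empty subset of the candidate variables.
results = []
for size in range(1, len(candidates) + 1):
    for subset in combinations(candidates, size):
        X = np.column_stack([candidates[name] for name in subset])
        results.append((adjusted_r2(X, y), subset))

best_score, best_subset = max(results)
print(len(results), "models tried")
print("best subset:", best_subset, "adjusted R2:", round(best_score, 3))
```

Adjusted R² alone can still let a weak or irrelevant variable into the best subset, which is one reason to check additional diagnostics before settling on a model.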

Results of the GWR analysis (Map 1 & Map 2) visualise the spatial patterns in the factors behind a child’s social score. Red areas show where income or gender has a negative relationship with a child’s social score, and green areas show where the relationship is positive. The values attached to each red or green class indicate how strongly the variable affects a child’s score. Areas where the regression model performed well are highlighted with blue dots. Because females are coded as 1, the areas with the strongest negative relationship on Map 1 indicate that being female in that part of the city is associated with a social score up to 17 points lower. Map 2 shows where income has a negative relationship with a child’s social score: in the areas with the strongest negative relationship, a higher income is associated with a lower social score. The areas where our model performed best line up closely with the spatial patterns on Map 2, which calls for further research into why these patterns occur.
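To make the coefficient readings concrete, the short sketch below turns the reported values into predicted changes in social score. The coefficients come from the text above; the helper `predicted_change` is a hypothetical name, and the 10-point language increase is an arbitrary illustration. Because the intercept is not reported, only differences between two children can be computed, not absolute scores, and the measurement units of language and income are not stated in the source.

```python
# Global OLS coefficients reported in the text.
ols_coef = {"gender": -5.49, "language": 0.52, "income": 0.05}

# Strongest local gender coefficient read from Map 1 (GWR).
local_gender_coef = -17.0

def predicted_change(coef, change):
    """Predicted change in social score for a change in the inputs,
    holding every variable not listed in `change` constant."""
    return sum(coef[name] * delta for name, delta in change.items())

# Female (1) versus male (0) under the global model.
female_vs_male = predicted_change(ols_coef, {"gender": 1})

# A 10-point higher language score under the global model.
language_gain = predicted_change(ols_coef, {"language": 10})

# Female versus male in the area with the strongest local effect.
local_female = predicted_change({"gender": local_gender_coef}, {"gender": 1})

print("global gender effect:", female_vs_male)
print("global effect of +10 language:", language_gain)
print("strongest local gender effect:", local_female)
```

The gap between the global gender effect and the strongest local one is what the single OLS equation hides and the GWR maps reveal.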

Map 1.

Map 2.