Spatial Statistics

This page explains the spatial statistics used in the analysis. All statistical modelling tools were run in ESRI ArcGIS Pro.

Regression Analysis, Dependent and Independent Variables

A regression analysis is a mathematical way of determining which variables have an impact on the issue at hand. It answers several questions: which variables matter most, which matter least, how the variables interact with each other, and how certain one can be about the importance of any one factor (Gallo et al., 2017). A regression involves two kinds of variables: the dependent variable and the independent variables. The dependent variable is the issue or factor you are trying to understand, and the independent variables are factors that are likely to have an effect on it (Gallo et al., 2017). In this case, the dependent variable is the number of COVID-19 cases and the independent variables are drawn from the census data (see an in-depth explanation here).

Exploratory Regression

To determine which independent variables are ‘best’ to include in the OLS analysis, an exploratory regression must first be run to establish which variables perform best. The Exploratory Regression tool evaluates all possible combinations of the candidate independent variables, looking for the model that best explains the geographic spread of, in this case, COVID-19 (Braun & Oswald, 2011). Because no single variable is responsible for COVID-19 cases arising in Montréal, the Exploratory Regression tool acts almost like a stepwise regression: an analysis that, after one independent variable has been assessed, re-checks all other independent variables in the model to see whether their significance has been reduced, in which case they may be removed from the model (NCSS, 2017). The difference is that the Exploratory Regression tool looks not only for models with the highest Adjusted R² values, but also for those with the lowest corrected Akaike Information Criterion (AICc) values. The Adjusted R² value is adjusted for the number of predictors in the model and increases only if a variable improves the model more than would be expected by chance (NCSS, 2017), while AICc estimates the quality of each model relative to the other candidate models and includes a correction for small sample sizes (Braun & Oswald, 2011). It is important to remember that the ‘best’ variables are only correlated with the spread of COVID-19; correlation alone does not show that they cause it.
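The idea of scoring every combination of candidate variables can be sketched in a few lines of Python. This is only an illustration on synthetic data, not the ArcGIS Pro tool itself: the function names, the variable names `x0` to `x3`, and the data are all hypothetical, and the AICc formula is the common least-squares form, which may differ in detail from the one ArcGIS Pro reports.

```python
import itertools
import numpy as np

def fit_rss(X, y):
    """Fit OLS with an intercept and return the residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def adjusted_r2(rss, y, p):
    """R-squared penalised for the number of predictors p."""
    n = len(y)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))

def aicc(rss, n, p):
    """Small-sample corrected AIC for a least-squares fit (lower is better)."""
    k = p + 2  # slopes + intercept + error variance (an assumed convention)
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def exploratory_regression(X, y, names, max_vars=3):
    """Score every subset of up to max_vars predictors; best AICc first."""
    results = []
    for size in range(1, max_vars + 1):
        for combo in itertools.combinations(range(X.shape[1]), size):
            rss = fit_rss(X[:, list(combo)], y)
            results.append({
                "variables": [names[j] for j in combo],
                "adj_r2": adjusted_r2(rss, y, size),
                "aicc": aicc(rss, len(y), size),
            })
    return sorted(results, key=lambda r: r["aicc"])

# Synthetic demonstration (hypothetical data): only x0 and x2 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
ranked = exploratory_regression(X, y, ["x0", "x1", "x2", "x3"])
for model in ranked[:3]:
    print(model["variables"], round(model["adj_r2"], 3), round(model["aicc"], 1))
```

Ranking by AICc while also reporting Adjusted R² mirrors the two criteria described above; a model that ranks well on both is a reasonable candidate to carry forward.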

Generalized Linear Regression (Ordinary Least Squares)

Once the most important variables have been identified, the next step is to use the Generalized Linear Regression tool to model the relationship between the dependent variable and the set of explanatory variables and to determine their associated statistics (Nelder & Wedderburn, 1972). The model type used here is Ordinary Least Squares (OLS), which is the Continuous (Gaussian) model type, meaning the dependent variable can take on a wide range of values rather than a binary or discrete set of values (Mitchell, 2005). OLS creates a global model with a single regression equation. For a well-specified model, the histogram of residuals should form a symmetrical bell curve, showing that the values are approximately normally distributed and clustered near the mean (Mitchell, 2005).
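As a rough illustration of what a global OLS model does, the sketch below fits a single regression equation to synthetic data and checks whether the residuals look symmetric. The function names, coefficients, and data are hypothetical and chosen only for demonstration; the ArcGIS Pro tool reports many more diagnostics than this.

```python
import numpy as np

def ols_fit(X, y):
    """Fit one global equation y = b0 + b1*x1 + ... by least squares."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return beta, fitted, y - fitted

def skewness(values):
    """Sample skewness; values near 0 suggest a symmetric distribution."""
    centred = values - values.mean()
    return float((centred ** 3).mean() / (centred ** 2).mean() ** 1.5)

# Synthetic demonstration (hypothetical coefficients 1.0, 2.0, -0.5).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=500)

beta, fitted, resid = ols_fit(X, y)
print("coefficients:", np.round(beta, 2))
print("residual skewness:", round(skewness(resid), 3))

# Bin the residuals as a text-only stand-in for the histogram.
counts, edges = np.histogram(resid, bins=9)
print("histogram counts:", counts)
```

If the residual histogram is strongly skewed, that is usually a sign the model is missing a variable or that a transformation is needed.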

Geographically Weighted Regression

The final step in this statistical analysis is to bring in the spatial component. The Geographically Weighted Regression (GWR) tool accounts for a localized spatial element, taking into consideration non-stationary factors, like physical topography and land value, to produce a local model for the spread of COVID-19 cases, rather than assuming that all relationships remain constant across space as OLS does (Charlton et al., 2006). GWR creates a separate equation for every feature in the dataset by incorporating the dependent and explanatory variables of the features that fall within the ‘neighbourhood’ of each feature (Mitchell, 2005). The neighbourhood's shape and extent are determined by the Neighbourhood Type and Neighbourhood Selection Method parameters: Neighbourhood Type specifies whether the neighbourhood is constructed as a fixed distance or varies in extent depending on the density of features, and Neighbourhood Selection Method specifies how the neighbourhood size will be determined (Mitchell, 2005). In this study, Neighbourhood Type is defined by Number of Neighbours, so where features are densely concentrated the spatial extent of the neighbourhood is smaller, and where they are sparse it is larger. The Neighbourhood Selection Method is Golden Search, because it automatically finds the value for Number of Neighbours that minimizes the AICc value. Although a preliminary OLS regression must be carried out before the GWR, the GWR is more meaningful because relationships vary across space, and measuring these differences can provide real insight into local dynamics across the whole study area.
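The contrast between one global equation and one equation per feature can be sketched as follows. This is a minimal illustration under stated assumptions, not the ArcGIS Pro implementation: it assumes a bisquare distance-decay kernel, distinct feature coordinates, and a Number of Neighbours value supplied by hand (the Golden Search step that would choose it is not implemented). The function name and the synthetic data are hypothetical.

```python
import numpy as np

def gwr_coefficients(coords, X, y, n_neighbours):
    """Fit one weighted least-squares equation per feature.

    Assumes distinct coordinates and n_neighbours < number of features.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # intercept + explanatory variables
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        # Adaptive bandwidth: distance to the n_neighbours-th nearest other
        # feature (position 0 of the sorted distances is the feature itself).
        bandwidth = np.sort(d)[n_neighbours]
        # Bisquare kernel (an assumption): weight decays to 0 at the bandwidth.
        w = np.where(d < bandwidth, (1.0 - (d / bandwidth) ** 2) ** 2, 0.0)
        sw = np.sqrt(w)
        # Weighted least squares via row scaling by sqrt(weight).
        betas[i], *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return betas

# Synthetic demonstration (hypothetical): the slope grows from west to east.
rng = np.random.default_rng(2)
n = 400
coords = rng.uniform(0.0, 10.0, size=(n, 2))
x = rng.normal(size=n)
true_slope = 1.0 + 0.5 * coords[:, 0]
y = 2.0 + true_slope * x + rng.normal(scale=0.1, size=n)

betas = gwr_coefficients(coords, x[:, None], y, n_neighbours=40)
west = coords[:, 0] < 5.0
print("mean local slope, west half:", round(float(betas[west, 1].mean()), 2))
print("mean local slope, east half:", round(float(betas[~west, 1].mean()), 2))
```

Because the bandwidth is tied to a neighbour count rather than a fixed distance, neighbourhoods shrink where features are dense and grow where they are sparse, which is the behaviour described for Number of Neighbours above.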
