Lecture 4: Statistics Review

In this lecture we reviewed statistics and the strengths of its methods for geographical analysis. In particular, the lecture focused on regression, a powerful tool for describing the relationship between a set of variables. In a regression analysis, each variable in the model is either independent or dependent, with the independent (explanatory) variables used to explain or predict the dependent variable. The relationships between variables can be linear or non-linear, and the sample used to estimate them should ideally contain no fewer than 30 observations.
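As a rough illustration of the linear case, the sketch below fits a single independent variable to a dependent variable with ordinary least squares using NumPy. The function name fit_ols, the synthetic data, and the "true" intercept and slope are assumptions made up for this example and do not come from the lecture.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (intercept) next to the independent variable.
    A = np.column_stack([np.ones(len(x)), x])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs  # array of [intercept, slope]

# Hypothetical sample of 30 observations with a known linear trend plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=30)

intercept, slope = fit_ols(x, y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With only 30 noisy observations the estimates will land near, but not exactly on, the values used to generate the data, which is one reason a sample of roughly that size is usually treated as a floor rather than a target.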

When analyzing the results of a regression, the residuals, which are the differences between the observed and predicted values, should be evenly distributed throughout the study area, with no clusters of over- or under-prediction. Examining the residuals can reveal bias or error within a regression model. While it may seem that increasing the number of variables and samples in a regression would strengthen the model output, too many can obscure important relationships. It is therefore important to strike a careful balance between simplicity and complexity in your model, through research on the topic and exploratory regression in the initial stages of a study.
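One simple way to look for the kind of bias described above is to compute the residuals and average them by zone of the study area. The sketch below is a minimal version of that idea; the helper names ols_residuals and mean_residual_by_zone, the two-zone split, and the synthetic data are all assumptions for illustration, and a formal test of spatial clustering would go further than this.

```python
import numpy as np

def ols_residuals(x, y):
    """Return observed minus predicted values for an OLS fit with an intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def mean_residual_by_zone(residuals, zones):
    """Average residual per zone label; a mean far from zero hints at local bias."""
    residuals = np.asarray(residuals, dtype=float)
    zones = np.asarray(zones)
    return {z.item(): float(residuals[zones == z].mean()) for z in np.unique(zones)}

# Hypothetical study area split into zone 0 (west) and zone 1 (east).
# The eastern zone has an extra offset that a single global model cannot capture.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=40)
zones = np.repeat([0, 1], 20)
y = 1.0 + 0.8 * x + 3.0 * zones + rng.normal(0, 0.5, size=40)

res = ols_residuals(x, y)
print(mean_residual_by_zone(res, zones))
```

In this made-up case the global model tends to over-predict in one zone and under-predict in the other, so the zone means sit on opposite sides of zero even though the residuals average out to zero overall.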

Of particular use to geospatial analysis is Geographically Weighted Regression (GWR). GWR differs from a standard global regression model such as Ordinary Least Squares (OLS) in that it constructs a local model of the relationship between variables, following the logic of Tobler’s First Law, which states that “everything is related to everything else, but near things are more related than distant things”. In this regression model, observations nearer to the location being modeled are given more weight, allowing deeper insight into how the relationships between a set of variables change across space.
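To make the local weighting concrete, here is a minimal sketch of the idea behind GWR: at each location, fit a weighted least squares model in which nearby observations count for more. The Gaussian distance-decay kernel, the fixed bandwidth, and the function name gwr_coefficients are assumptions chosen for this example; dedicated GWR software also handles bandwidth selection and diagnostics, which are left out here.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Fit one weighted least squares model per location.

    Returns an array with one row of [intercept, slope] per observation.
    """
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), x])
    betas = np.empty((len(y), A.shape[1]))
    for i, point in enumerate(coords):
        # Distance from this location to every observation.
        d = np.linalg.norm(coords - point, axis=1)
        # Gaussian kernel (an assumed choice): near observations get more weight.
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
        sw = np.sqrt(w)
        beta_i, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        betas[i] = beta_i
    return betas

# Hypothetical 6 x 6 grid where the slope increases from west to east.
rng = np.random.default_rng(2)
gx, gy = np.meshgrid(np.arange(6.0), np.arange(6.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
x = rng.uniform(0, 10, size=len(coords))
y = 1.0 + (0.5 + 0.3 * coords[:, 0]) * x + rng.normal(0, 0.2, size=len(coords))

local = gwr_coefficients(coords, x, y, bandwidth=1.5)
print(local[:, 1].reshape(6, 6).round(2))  # local slopes laid out on the grid
```

A global OLS fit would report a single slope for the whole area, whereas the local slopes here can be mapped to show where the relationship is stronger or weaker. The bandwidth controls how quickly the weights fall off with distance, so a very large bandwidth makes every local model approach the global one.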