Statistics: A Review

This week’s lecture focussed on reviewing the basics of statistics. Statistics are important because they enable us to summarize data, explore it, look for relations, and make predictions. We focussed on regression, a quantitative approach that allows us to model, examine, and explore spatial relations. It can help us understand the factors behind those relations, and even allows us to make predictions through modelling.

Some key points regarding regression:

1. Variables are either independent (explanatory) or dependent (predicted).

2. Relations amongst variables may be linear or non-linear.

3. As a rule of thumb, a regression analysis should use at least 30 cases.

4. It is possible to have too many variables. With too many variables relative to the number of cases, the statistics become problematic because the model may be underdetermined.

5. Residuals should be approximately normally distributed and homoscedastic. Homoscedasticity means the residuals have equal variance throughout the range of predicted values.

6. When choosing which variables to include in a regression analysis, it is important to find a balance between simplicity and complexity. Sometimes, adding extra variables adds complexity to your analysis without helping to explain the response. It is important to be parsimonious in your selection.
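One common way to put a number on the trade-off in point 6 is adjusted R², which penalizes R² for every extra explanatory variable. This was not part of the lecture notes above; it is a minimal sketch, the function name `adjusted_r_squared` is my own, and the R² values are made up purely for illustration.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R^2: shrinks R^2 as explanatory variables are added."""
    dof = n_cases - n_predictors - 1
    if dof <= 0:
        # More parameters than cases: the model is underdetermined.
        raise ValueError("need more cases than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_cases - 1) / dof


# Hypothetical numbers, for illustration only (not from any real data set).
small_model = adjusted_r_squared(0.80, n_cases=30, n_predictors=2)
large_model = adjusted_r_squared(0.82, n_cases=30, n_predictors=10)
print(round(small_model, 3), round(large_model, 3))
```

In this made-up case, eight extra variables raise R² slightly but lower adjusted R², which is the kind of situation where the simpler model is the more parsimonious choice.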

An Ordinary Least Squares (OLS) model is an example of a simple regression model. It assumes that the data are independent random samples from an underlying population, that the model’s form and statistics are valid and meaningful, and that the errors are independent and identically distributed.
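To make the OLS idea concrete, here is a minimal sketch of a one-variable fit in plain Python, using the closed-form least-squares solution. The function name `ols_fit` and the small data set are my own, invented for illustration. In practice you would use a statistics package, which also reports the diagnostics.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


# Made-up example data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope, residuals = ols_fit(x, y)
print(intercept, slope)
print(residuals)
```

The residuals are what you would inspect (or map) to check the assumptions listed above: they should show no trend, no change in spread, and, for spatial data, no clustering.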

Such a simple model can cause issues in spatial modelling because of positive spatial autocorrelation! We know that because of geography, samples that are close together tend to depend on each other, and that heteroscedasticity is common. There may also be issues with multi-collinearity, when two or more explanatory variables are related to each other, making it difficult to distinguish which one is responsible for which response. Multi-collinearity leads to variance inflation: the variance of the estimated coefficients grows, so the estimates become unstable.
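Both problems can be checked with simple statistics. The sketch below is my own illustration, not something taken from the lecture: it computes Moran's I, a standard measure of spatial autocorrelation, and the variance inflation factor (VIF) for the special case of exactly two explanatory variables, where VIF = 1 / (1 - r²). The function names, the binary neighbour weights, and the data are all assumptions made for the example.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi ** 2 for zi in z)


def vif_two_predictors(x1, x2):
    """VIF when there are exactly two predictors: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


# Four made-up locations along a line; each is a neighbour of the next.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
trend = morans_i([1.0, 2.0, 3.0, 4.0], w)        # similar neighbours
alternating = morans_i([1.0, 4.0, 1.0, 4.0], w)  # dissimilar neighbours
vif = vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(trend, alternating, vif)
```

A positive Moran's I (the `trend` case) means neighbouring values are alike, which is exactly the dependence that breaks the OLS independence assumption. A VIF well above 1 means the explanatory variables overlap in what they explain.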

One solution, like we used in lab assignment 3, is Geographically Weighted Regression (GWR). Rather than fitting a single global set of coefficients as in a simple regression model, GWR allows the coefficients to vary spatially. Local coefficients are estimated from the neighbourhood of each point using a distance decay function, which makes closer points more important and points further away less important. This type of analysis even allows you to map the surface, residuals and parameters as a form of ‘Exploratory Spatial Data Analysis.’
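The core of GWR can be sketched as a weighted least-squares fit repeated at each location. The code below is a simplified illustration and not the tool we used in the lab: it assumes one explanatory variable, a Gaussian distance decay function, and a fixed bandwidth, and the function name `gwr_local_fit` and the data are invented for the example.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Locally weighted fit of y = intercept + slope * x around `target`."""
    # Gaussian distance decay: nearby points get weights close to 1,
    # distant points get weights close to 0.
    weights = []
    for cx, cy in coords:
        d = math.hypot(cx - target[0], cy - target[1])
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: the relation is y = 1 * x in the west and y = 3 * x in the east.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0, 3.0, 6.0, 9.0, 12.0]

west = gwr_local_fit(coords, x, y, target=(1.5, 0.0), bandwidth=2.0)
east = gwr_local_fit(coords, x, y, target=(11.5, 0.0), bandwidth=2.0)
print(west, east)
```

A single global regression would report one compromise slope for this data set, while the local fits recover a slope near 1 in the west and near 3 in the east. Mapping those local slopes is what turns GWR into an exploratory tool. The bandwidth controls how local each fit is, and choosing it is a modelling decision in its own right.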
