Class 4: Statistics

Regression

Regression is the search for explanatory relationships between a dependent variable and one or more independent variables (multiple regression, when there is more than one). Linear regression applies when the model is linear in its coefficients, and is commonly fitted using Ordinary Least Squares (OLS). OLS models can be fitted where the errors are normally distributed and independent of one another, and the data come from independent random samples.
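
As a rough illustration of what fitting by OLS involves, the sketch below estimates an intercept and slope from synthetic data with NumPy. The function name `fit_ols`, the simulated values and the "true" coefficients (2.0 and 0.5) are assumptions made up for this example; they are not taken from any real dataset or from ArcGIS.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares.

    Returns the coefficient vector [b0, b1] and the residuals.
    """
    # Design matrix: a column of ones (intercept) next to the predictor.
    X = np.column_stack([np.ones(len(x)), x])
    # lstsq minimises the sum of squared residuals ||y - X @ beta||^2.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return beta, residuals

# Synthetic data (hypothetical): independent, normally distributed errors.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=200)

beta, residuals = fit_ols(x, y)
print("intercept, slope:", beta)
```

Because the simulated errors are independent and normally distributed, the estimates should land close to the values used to generate the data. Plotting or mapping the residuals is a simple way to check whether those assumptions look reasonable for real data.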

In GIS, regression can be applied spatially: tools such as OLS can be used to map how the strength of the relationship between the dependent and predictor variables varies. Competing models can be compared by looking for the highest R² value (the proportion of the variance in the dependent variable that the predictors explain) and the lowest AIC (Akaike Information Criterion) value. Statistical tools in ArcGIS can also be used to fit a model that accounts for spatial autocorrelation (see Class 2).
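
The comparison of competing models by R² and AIC can be sketched as follows. This is a minimal example on synthetic data, assuming one common form of AIC for least-squares models, n·ln(RSS/n) + 2k, which omits an additive constant; software such as ArcGIS may report a corrected variant, so only differences between models fitted to the same data are meaningful. The helper `r2_and_aic` and the variables `x1` and `x2` are hypothetical.

```python
import numpy as np

def r2_and_aic(X, y):
    """Return (R^2, AIC) for an OLS fit of y on the design matrix X.

    X must already contain an intercept column. AIC is computed as
    n * ln(RSS / n) + 2k, i.e. up to an additive constant.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))    # unexplained variation
    tss = float(np.sum((y - y.mean()) ** 2))    # total variation
    r2 = 1.0 - rss / tss
    aic = n * np.log(rss / n) + 2 * k
    return r2, aic

# Synthetic data (hypothetical): y depends on x1 only; x2 is unrelated noise.
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
ones = np.ones(n)

r2_a, aic_a = r2_and_aic(np.column_stack([ones, x1]), y)      # relevant predictor
r2_b, aic_b = r2_and_aic(np.column_stack([ones, x2]), y)      # irrelevant predictor
r2_c, aic_c = r2_and_aic(np.column_stack([ones, x1, x2]), y)  # both predictors

print("x1 only:  R2 = %.3f, AIC = %.1f" % (r2_a, aic_a))
print("x2 only:  R2 = %.3f, AIC = %.1f" % (r2_b, aic_b))
print("x1 + x2:  R2 = %.3f, AIC = %.1f" % (r2_c, aic_c))
```

R² never decreases when a predictor is added, so the model with both predictors scores at least as high as the model with `x1` alone even though `x2` is noise. AIC charges a penalty for each extra coefficient, which is why the two measures are usually read together.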

Application

If we have spatial data for variables that we suspect might be related, we can use regression methods to investigate the direction and strength of those potential relationships. In landscape ecology, for example, researchers may wish to investigate the relationship between biodiversity loss and processes such as forest fragmentation.

Potential Caveats

While regression is a powerful tool in spatial analysis, several issues can undermine a model that is built without careful consideration. For example:

  1. Multicollinearity: correlation between two or more predictor variables, which should ideally be independent of each other.
  2. Omitted Variables: cases where important predictor variables are excluded from the model.
  3. Endogeneity: cases where the dependent variable actually causes changes in a predictor variable, reversing the direction of causation that the model assumes.
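
The first caveat in the list above can be checked numerically. One common diagnostic, not named in these notes and offered here only as an illustration, is the variance inflation factor (VIF): each predictor is regressed on the others, and VIF = 1 / (1 − R²) of that regression. The function `vif` and the variables `a`, `b` and `c` below are hypothetical, and the data are synthetic.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    X holds the predictors only (no intercept column), shape (n, p).
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        rss = np.sum((target - others @ beta) ** 2)
        tss = np.sum((target - target.mean()) ** 2)
        out[j] = tss / rss  # algebraically equal to 1 / (1 - R^2)
    return out

# Synthetic predictors (hypothetical): b is almost a copy of a; c is independent.
rng = np.random.default_rng(2)
a = rng.normal(size=500)
b = a + rng.normal(scale=0.1, size=500)
c = rng.normal(size=500)

vifs = vif(np.column_stack([a, b, c]))
print("VIF for a, b, c:", vifs)
```

A VIF close to 1 suggests a predictor shares little information with the others, while a large value signals redundancy. The cut-off for "large" is a judgment call, so treat any threshold as a rule of thumb rather than a fixed rule.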

 
