Regression analysis is a statistical technique that facilitates the study of the relationship between a ‘dependent’ or ‘explained’ variable on one hand and a set of ‘independent’ or ‘explanatory’ variables on the other. Depending on the type of explained variable and the nature of the research question (e.g., causality versus correlation), there are specific types of regression analyses.
Ordinary least squares (OLS) is the most basic and widely used regression technique; most other regression techniques are essentially modifications of the OLS set-up in some particular way. The ‘least squares’ name comes from the fact that the parameters of the regression equation are estimated by minimizing the sum of the squared deviations of the dependent variable from the regression function.
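For illustration, here is a minimal sketch of the least-squares idea using NumPy on simulated data (the data and coefficient values are hypothetical, chosen only to make the recovery of the parameters visible):

```python
import numpy as np

# Simulated data: y = 2 + 3*x + noise (true coefficients chosen for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 + 3 * x + rng.normal(size=200)

# Design matrix with a constant column for the intercept
X = np.column_stack([np.ones_like(x), x])

# OLS estimate: the minimizer of sum((y - X @ b)**2), here computed by lstsq
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to the true values [2, 3]
```

With 200 observations the estimates land very near the true intercept and slope, which is the sense in which minimizing squared deviations recovers the regression function.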
Dummy variables, also known as indicator variables, are variables that take the value 0 or 1 to denote mutually exclusive binary categories such as yes/no or absence/presence. When one or more of the explanatory variables is a dummy, the standard OLS regression technique can still be used. However, a categorical dependent variable calls for a different regression technique, e.g., logistic regression (logit) or probit.
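A useful fact about a dummy explanatory variable in OLS is that its coefficient equals the difference in group means. The following NumPy sketch (hypothetical data, groups labelled 0 and 1) demonstrates this:

```python
import numpy as np

# Hypothetical data: outcome for two groups coded by a 0/1 dummy
rng = np.random.default_rng(1)
d = rng.integers(0, 2, size=500)          # dummy: 0 = one category, 1 = the other
y = 5 + 2 * d + rng.normal(size=500)      # true group gap is 2

# OLS of y on a constant and the dummy
X = np.column_stack([np.ones_like(y), d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The dummy's coefficient is exactly the difference in group means
gap = y[d == 1].mean() - y[d == 0].mean()
assert np.isclose(beta[1], gap)
```

Note this only covers a dummy on the right-hand side; if the *dependent* variable were the 0/1 dummy, a logit or probit fit would be used instead, as described above.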
When the dependent variable is a non-negative count variable, e.g., the number of telephone calls made by a call-centre agent in an hour, the number of deaths in a war, the number of victims of smallpox during an epidemic, etc., the standard OLS regression technique should not be used (among other problems, it can predict negative or non-integer counts). Instead one can use a Poisson regression, a negative binomial regression, or a zero-inflated negative binomial regression.
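As a sketch of the first of these options, the following NumPy code fits a Poisson regression with a log link by Newton's method on simulated count data (this is a hand-rolled illustration, not a full GLM implementation; the coefficient values are invented):

```python
import numpy as np

# Simulated counts: y ~ Poisson(exp(0.5 + 0.8*x)), true coefficients (0.5, 0.8)
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=1000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x))

# Newton-Raphson on the Poisson log-likelihood
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)              # conditional mean under the log link
    grad = X.T @ (y - mu)              # score vector
    hess = X.T @ (mu[:, None] * X)     # information matrix
    beta = beta + np.linalg.solve(hess, grad)

print(beta)  # close to the true values [0.5, 0.8]
```

The negative binomial variants follow the same maximum-likelihood logic but add a dispersion parameter to handle counts whose variance exceeds their mean.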
Usually regression coefficients can be thought of as the marginal impact of the explanatory variable on the mean of the dependent variable. However, the researcher might be interested in knowing the marginal impact of the explanatory variable on certain quantiles of the distribution of the dependent variable, e.g., what is the impact of the food stamp program on the consumption of the bottom 10% of the consumption distribution? Such questions are addressed by implementing a quantile regression.
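The defining idea of quantile regression is that it minimizes the ‘check’ (pinball) loss rather than squared error. A minimal NumPy sketch of the intercept-only case shows that the minimizer of the check loss is the sample quantile (a full quantile regression with covariates would minimize the same loss over a linear function of the regressors):

```python
import numpy as np

# The check (pinball) loss that quantile regression minimizes
def pinball(u, tau):
    return np.where(u >= 0, tau * u, (tau - 1) * u)

rng = np.random.default_rng(3)
y = rng.normal(size=2001)

tau = 0.10
# Intercept-only quantile regression: minimize total check loss over a constant
grid = np.linspace(y.min(), y.max(), 4001)
losses = [pinball(y - c, tau).sum() for c in grid]
c_hat = grid[int(np.argmin(losses))]

print(c_hat, np.quantile(y, tau))  # the two agree closely
```

Here the minimizer for tau = 0.10 recovers the 10th percentile of y, which is exactly the sense in which quantile regression targets a chosen quantile rather than the mean.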
When there is reason to believe that the explanatory variable is correlated with the error term in the OLS regression (in other words, the explanatory variable is endogenous), it cannot be claimed that the explanatory variable causes the change in the dependent variable, only that the two are correlated. One way to argue for causality is instrumental variable regression, which uses an exogenous variable (called the instrument) that is uncorrelated with the error term in the OLS but correlated with the endogenous explanatory variable.
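The following NumPy sketch simulates this situation and implements the instrumental variable estimator as two-stage least squares (the data-generating process is invented for illustration): OLS is biased away from the true causal effect, while the IV estimate recovers it.

```python
import numpy as np

# Hypothetical setup: x is endogenous (built from the error e),
# z is an instrument (drives x, unrelated to e)
rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.8 * z + 0.5 * e + rng.normal(size=n)   # endogeneity: x depends on e
y = 1 + 2 * x + e                            # true causal effect of x is 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# Naive OLS (biased because cov(x, e) != 0)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Two-stage least squares: regress x on z, then y on the fitted values
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]       # first stage
X_hat = np.column_stack([np.ones(n), x_hat])
beta_iv, *_ = np.linalg.lstsq(X_hat, y, rcond=None)    # second stage

print(beta_ols[1], beta_iv[1])  # OLS is pushed above 2; IV is close to 2
```

The first stage keeps only the variation in x that comes through the instrument z; since that variation is uncorrelated with the error, the second-stage coefficient has a causal interpretation.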
When data are available on the same individuals over time, a panel regression is run over these two dimensions of cross-sectional and time-series variation. Panel regression is essentially an OLS regression with some added structure and interpretation, such as fixed effects, random effects, and pooled cross-sections.
To argue for the causal impact of a treatment on a dependent variable, the technique of difference-in-differences (or double difference) is used, where the impact of the treatment is defined as the difference in average outcome in the treatment group before and after treatment minus the difference in average outcome in the control group before and after treatment: it is literally a ‘difference of differences’.
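This double difference is simple arithmetic on four group means. The NumPy sketch below simulates a setting with a common time trend shared by both groups plus a treatment effect of 3 (all numbers invented), and shows the double difference isolating the treatment effect:

```python
import numpy as np

# Simulated two-group, two-period data
rng = np.random.default_rng(6)
n = 4000
treated = rng.integers(0, 2, size=n)      # 1 = treatment group
post = rng.integers(0, 2, size=n)         # 1 = after the intervention
effect = 3.0
# Group gap of +2, common post-period trend of +1, treatment effect only for treated & post
y = 10 + 2 * treated + 1 * post + effect * treated * post + rng.normal(size=n)

def mean_y(g, t):
    return y[(treated == g) & (post == t)].mean()

# (after - before) in the treatment group minus (after - before) in the control group
did = (mean_y(1, 1) - mean_y(1, 0)) - (mean_y(0, 1) - mean_y(0, 0))
print(did)  # close to the true effect of 3
```

Differencing within each group removes fixed group differences, and subtracting the control group's difference removes the common time trend, leaving only the treatment effect.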
Regression discontinuity design is useful for estimating the causal effect of a treatment when treatment assignment changes discontinuously as a ‘running’ variable crosses a known cutoff; the treatment effect is then estimated from the jump in the outcome at that cutoff, comparing observations just below and just above it.
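A minimal sharp-RD sketch in NumPy (invented data, cutoff at zero): fit a linear trend on each side of the cutoff within a bandwidth and take the difference of the two intercepts at the cutoff as the estimated effect.

```python
import numpy as np

# Sharp RD: treatment switches on when the running variable crosses 0
rng = np.random.default_rng(7)
n = 4000
r = rng.uniform(-1, 1, size=n)            # running variable
treat = (r >= 0).astype(float)
y = 1 + 0.5 * r + 2.0 * treat + rng.normal(scale=0.3, size=n)  # true jump of 2

# Fit separate linear trends on each side within a bandwidth h
h = 0.5
left = (r < 0) & (r > -h)
right = (r >= 0) & (r < h)
bl, *_ = np.linalg.lstsq(np.column_stack([np.ones(left.sum()), r[left]]), y[left], rcond=None)
br, *_ = np.linalg.lstsq(np.column_stack([np.ones(right.sum()), r[right]]), y[right], rcond=None)

# The estimated effect is the gap between the two fitted lines at the cutoff
jump = br[0] - bl[0]
print(jump)  # close to the true discontinuity of 2
```

The identifying idea is that units just below and just above the cutoff are comparable in every respect except treatment, so the jump in the outcome at the cutoff can be attributed to the treatment.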