A Simple Explanation of Econometrics

Updated: May 28


Because Google Adsense deemed this website to have too little content for ads, I'll be posting my top 25 undergraduate papers. This was a simple paper written for a labour economics class. The great irony is that I could easily ace a paper in which I explain econometrics. But because I got a C- in calculus and statistics, I was deemed not good enough at math to take econometrics at the same school, which then led to me being rejected from the masters of economic development program. Life is hard sometimes.



Dalhousie University

ECON 3315

October 7th, 2020



Econometrics is how economists empirically determine economic relationships using statistical methods and economic data. To explain the basics of econometrics, this essay refers to a hypothetical model for earnings in Canada. The variables in the model are earnings, years of education, age, and two dummy variables.(1) A dummy variable is a variable whose value is either one or zero, indicating whether or not a characteristic applies to the individual. The dummy variables used for this model indicate whether or not the individual resides in Atlantic Canada and whether or not the individual is female.
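To make the idea of a dummy variable concrete, here is a minimal Python sketch. The function name, the province codes, and the "female"/"male" labels are assumptions made for illustration; they do not come from the model's data.

```python
# Hypothetical helper: the name, province codes and sex labels are
# illustrative assumptions, not part of the original model.
ATLANTIC_PROVINCES = {"NB", "NS", "PE", "NL"}


def encode_dummies(province: str, sex: str) -> dict:
    """Turn two categorical answers into 0/1 dummy variables."""
    return {
        "atlantic": 1 if province in ATLANTIC_PROVINCES else 0,
        "female": 1 if sex == "female" else 0,
    }


print(encode_dummies("NS", "female"))  # {'atlantic': 1, 'female': 1}
print(encode_dummies("ON", "male"))    # {'atlantic': 0, 'female': 0}
```

Each dummy simply switches its coefficient on (1) or off (0) for a given individual.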

While the point of the model as a whole is to use variables to explain the variation in earnings in Canada, the fundamental research question being asked is: what is the effect of an individual's years of education on their earnings in Canada? That is why the dependent variable at the head of the equation in figure 1 is earnings and the first independent variable of the equation is years of education. The model controls for other independent variables: age, whether or not the individual resides in Atlantic Canada, and whether or not the individual is female. The “e” at the tail end of the equation represents a random error.

Each coefficient listed in the table tells a different story about the effect of one independent variable on the dependent variable, holding the other variables constant. For example, the coefficient listed beside the variable “years of education” is 0.05, and it multiplies the value of that variable in the equation. It tells us that for every additional year of education, we would expect an increase of about 5% in the dependent variable (earnings). Reading a coefficient as a percentage change in this way presumes that earnings enter the model in logarithms. The coefficient of 0.025 for age tells us that for every additional year of age, we would expect a rise of about 2.5% in earnings. Given that the coefficients for both dummy variables are -0.05, we would expect earnings to be about 5% lower when either dummy variable applies.
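The percentage reading of the coefficients can be checked with a short Python sketch. It assumes that the dependent variable is the natural logarithm of earnings, which the essay does not state outright; the coefficient values are the ones quoted above, while the dictionary keys and function name are made up for illustration.

```python
import math

# Coefficients as quoted in the essay. Treating the dependent variable as
# log earnings is an assumption of this sketch.
coefficients = {
    "years_of_education": 0.05,
    "age": 0.025,
    "atlantic": -0.05,
    "female": -0.05,
}


def percent_effect(b: float) -> tuple:
    """Return (rule-of-thumb %, exact %) change in earnings for a
    one-unit increase in a regressor of a log-earnings model."""
    approx = 100 * b                  # the "coefficient times 100" shortcut
    exact = 100 * (math.exp(b) - 1)   # exact change implied by the log model
    return approx, exact


for name, b in coefficients.items():
    approx, exact = percent_effect(b)
    print(f"{name}: approx {approx:.2f}%, exact {exact:.2f}%")
```

For coefficients this small the shortcut and the exact figure are close, which is why reading 0.05 as "about 5%" is reasonable.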

To determine which independent variables have statistically significant effects, we divide each coefficient by its standard error. The standard error measures how precisely a quantity is estimated. For a sample mean, it is the sample standard deviation divided by the square root of the sample size, and regression output reports an analogous figure for each coefficient.(2) The resulting numbers are called “T-Ratios.” Dividing the coefficient for “years of education” by its standard error gives a T-Ratio of 5. Because that is greater than 2 in absolute value, the effect of that variable is statistically significant (roughly at the 5% level). The T-Ratio for the variable of age works out to 2.5, which is also statistically significant. However, the T-Ratio for the dummy variable of residence in Atlantic Canada is only 0.5 in absolute value, so that variable does not have a statistically significant effect. The T-Ratio for the dummy variable of whether or not the individual is female is 5 in absolute value, which is statistically significant.
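The T-Ratio arithmetic can be sketched in Python as follows. The coefficients are the ones quoted in the essay. The standard errors are assumptions: figure 1 is not reproduced here, so the values below are simply the ones implied by the T-Ratios reported above. The function names are also illustrative.

```python
# (coefficient, standard error) pairs. The standard errors are NOT quoted
# in the text; they are back-calculated from the reported T-Ratios and
# should be treated as assumptions.
estimates = {
    "years_of_education": (0.05, 0.01),
    "age": (0.025, 0.01),
    "atlantic": (-0.05, 0.10),
    "female": (-0.05, 0.01),
}


def t_ratio(coefficient: float, standard_error: float) -> float:
    """T-Ratio: the coefficient divided by its standard error."""
    return coefficient / standard_error


def is_significant(t: float, threshold: float = 2.0) -> bool:
    """Rule of thumb: significant if |t| exceeds roughly 2."""
    return abs(t) > threshold


for name, (b, se) in estimates.items():
    t = t_ratio(b, se)
    print(f"{name}: t = {t:.1f}, significant = {is_significant(t)}")
```

Because the dummy coefficients are negative, their T-Ratios come out negative as well; it is the absolute value that is compared with 2.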

The final piece of numerical information needed to tell the story of this model is the R-squared. The R-squared tells us the overall goodness of fit. To understand goodness of fit, it is best to think of a two-dimensional graph in which many dots represent individual observations and a line of best fit is drawn through the dots. The closer the dots sit to the line, the better the fit. An R-squared of 0.2 tells us that the combined effects of all of the independent variables explain 20% of the variation in the dependent variable of earnings in Canada. Given that R-squared is always a value between 0 (no fit) and 1 (a perfect fit), an R-squared of 0.2 is not a strong overall goodness of fit. To achieve a better overall goodness of fit, more relevant independent variables could be added to better explain the variation in earnings in Canada.
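For readers who want to see how an R-squared is calculated, here is a minimal Python sketch using the standard formula of one minus the ratio of unexplained variation to total variation. The numbers are made up purely for illustration and have nothing to do with the earnings model; the function name is likewise an assumption.

```python
def r_squared(actual: list, predicted: list) -> float:
    """R-squared = 1 - (sum of squared residuals / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


# Made-up values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect_fit = [1.0, 2.0, 3.0, 4.0, 5.0]   # every dot sits on the line
mean_only = [3.0, 3.0, 3.0, 3.0, 3.0]     # predicts the average every time

print(r_squared(actual, perfect_fit))  # 1.0
print(r_squared(actual, mean_only))    # 0.0
```

A model that always predicts the average explains none of the variation, while a model whose predictions match every observation explains all of it. An R-squared of 0.2 sits much closer to the first case than the second.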

However, a high R-squared is not necessarily the goal of this model. The goal of the model is to show the effect of years of education on earnings in Canada. We must also not forget that we have determined the statistical significance of each independent variable, which is very valuable information in the field of economics. The key takeaway from this model is that for each additional year of education an individual has, we expect an increase of about 5% in the earnings of that individual in Canada. The other takeaways are that the variables of “age” and whether or not the individual is female have statistically significant effects, whereas the variable of whether or not the individual resides in Atlantic Canada does not.




Figure 1



Source: Phipps, S, Essay Question 3, lecture notes, Department of Economics, Dalhousie University.







Bibliography


Phipps, S, Essay Question 3, lecture notes, Department of Economics, Dalhousie University.


Kenton, W. (2020, August 28). How Standard Errors Work. Retrieved October 07, 2020, from https://www.investopedia.com/terms/s/standard-error.asp




1 - See figure 1.

2 - Kenton, W. (2020, August 28). How Standard Errors Work.