Tuesday, August 3, 2010

My Understanding about Linear Regression - Part I

What is linear regression?


Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. A two-variable linear regression model is one in which the dependent variable is expressed as a linear function of a single explanatory variable. The most common method for fitting a regression line is the method of least squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies exactly on the fitted line, its vertical deviation is 0).
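To make this concrete, here is a minimal sketch of a two-variable least-squares fit in Python with NumPy. The data are synthetic, and the true intercept and slope (2.0 and 0.5) are illustrative assumptions, not values from any real dataset.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                # explanatory variable
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)  # dependent variable with noise

# Closed-form least-squares estimates: the slope is cov(x, y) / var(x),
# and the intercept makes the line pass through the point of means.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)        # vertical deviations from the fitted line
print(b0, b1, np.sum(residuals**2))  # the fit minimizes this sum of squares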

Multivariate linear regression


Multivariate regression takes several predictor variables into account simultaneously.

The model is now expressed as

yi = b0 + b1x1i + b2x2i + … + bnxni

Where n = the number of independent variables

yi = expected or predicted value of the dependent variable for case i

b0 = the intercept (if all x's are zero, the expected value of y is b0)

bj = the slope (for every one-unit increase in xj, y is expected to change by bj units, given that the other independent variables are held constant)

xji = the value of the jth independent variable for case i
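A minimal sketch of fitting this model with NumPy's least-squares solver follows; the number of cases, the predictors, and the true coefficients are all synthetic assumptions for illustration.

import numpy as np

rng = np.random.default_rng(1)
n_obs, n_vars = 100, 3
X = rng.normal(size=(n_obs, n_vars))  # x1i, x2i, x3i for each case i
y = 1.0 + X @ np.array([0.5, -2.0, 3.0]) + rng.normal(0, 0.5, size=n_obs)

# Prepend a column of ones so the intercept b0 is estimated along with the slopes.
X_design = np.column_stack([np.ones(n_obs), X])
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(b)  # [b0, b1, b2, b3]; each bj holds the other predictors constant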

Assumptions of Linear Regression


1. Homoscedasticity – the variance of the error terms is constant for each value of x.

2. Linearity – the relationship between each x and y is linear. To check this, look at the plot(s) of the residuals versus the X value(s). You don’t want to see a clustering of positive residuals or a clustering of negative residuals.

3. Normally Distributed Error Terms – the error terms follow the normal distribution.

4. Independence of Error Terms – successive residuals are not correlated; if they are, the errors are said to be autocorrelated. If possible, use the Durbin-Watson statistic to check this (a rough sketch of this check appears after the list).
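As a rough sketch of the normality and independence checks (assuming residuals from a fitted model such as the examples above), the following computes a crude skewness measure and the Durbin-Watson statistic by hand; the toy residual values are made up for illustration.

import numpy as np

def residual_checks(residuals):
    # Crude normality check: sample skewness should be near 0 for normally
    # distributed errors (in practice, also inspect a histogram or Q-Q plot).
    skew = np.mean(residuals**3) / np.std(residuals)**3
    # Durbin-Watson: sum of squared successive differences over the residual
    # sum of squares. Values near 2 suggest no autocorrelation; values near
    # 0 or 4 suggest positive or negative autocorrelation.
    dw = np.sum(np.diff(residuals)**2) / np.sum(residuals**2)
    return skew, dw

e = np.array([0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2])  # toy residuals
print(residual_checks(e))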

Frequently used forms of Linear Regression


(i) Log-linear Model

The log-linear model is useful when we need to measure elasticity:

ln Y = β0 + β1(ln X1) + β2(ln X2) + … + βn(ln Xn) + Error

Here, β1 gives the elasticity of Y with respect to X1, i.e., the percentage change in Y for a one-percent change in X1. This equation is also known as the constant elasticity form because, in this equation, the elasticity of Y with respect to changes in Xn, δ ln Y/δ ln Xn = βn, does not vary with Xn. This log-linear form is often used in models of demand and production.
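A sketch of estimating a constant elasticity with one regressor, using synthetic data (the true elasticity of 0.8 and the noise level are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(2)
X1 = rng.uniform(1, 100, size=200)
Y = 5.0 * X1**0.8 * np.exp(rng.normal(0, 0.1, size=200))  # true elasticity 0.8

# Regress ln Y on ln X1; the slope estimates the elasticity directly.
A = np.column_stack([np.ones(200), np.log(X1)])
(b0, b1), *_ = np.linalg.lstsq(A, np.log(Y), rcond=None)
print(b1)  # near 0.8: a 1% increase in X1 raises Y by about 0.8%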

(ii) Semilog Model

ln(Y) = β0 + β1X + Error

Here, β1 gives the relative change in Y for an absolute change in the value of X.

A semilog model is often used to model growth rates.

Derivation: Yt = Y0(1 + r)^t, which is the compound interest formula.

Taking logs, log(Yt) = log(Y0) + t log(1 + r), which has the form β0 + β1X with X = t and β1 = log(1 + r).

β1 = relative change in regressand/absolute change in regressor

Multiplying the relative change in Y by 100 gives the percentage change, or the growth rate, in Y for an absolute change in X, the regressor. That is, 100 times β1 gives the growth rate in Y.
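A sketch of the growth-rate regression on synthetic data (the 3% true growth rate and the 30-period series are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(3)
t = np.arange(30)                                           # time, e.g. years
Y = 100.0 * 1.03**t * np.exp(rng.normal(0, 0.02, size=30))  # ~3% growth

A = np.column_stack([np.ones(30), t])
(b0, b1), *_ = np.linalg.lstsq(A, np.log(Y), rcond=None)

# b1 estimates log(1 + r): 100 * b1 is the growth rate in percent (in log
# points), and exp(b1) - 1 recovers the compound rate r itself.
print(100 * b1, np.exp(b1) - 1)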

Another type of semilog model is:

Y = β0 + β1(ln X) + Error

Unlike the growth model, in which we are interested in finding the percent growth in Y for an absolute change in X, here we want to find the absolute change in Y for a percent change in X.

β1 = change in Y/change in ln X
     = change in Y/relative change in X
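A sketch of this lin-log form on synthetic data (the true coefficient of 50 and the noise level are assumed for illustration):

import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(1, 1000, size=200)
Y = 2.0 + 50.0 * np.log(X) + rng.normal(0, 5, size=200)

A = np.column_stack([np.ones(200), np.log(X)])
(b0, b1), *_ = np.linalg.lstsq(A, Y, rcond=None)

# b1 / 100 approximates the absolute change in Y for a 1% change in X.
print(b1, b1 / 100)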


 
