MULTIPLE REGRESSION ANALYSIS

Introduction

Multiple regression analysis studies the relationship between a dependent variable and a set of independent variables (one or more). It is a statistical technique that allows us to predict a subject's score on one variable on the basis of its scores on several other variables. In multiple regression, the term “independent variables” identifies the variables that are thought to influence the “dependent variable”.

CASE ANALYSIS- 1

PROBLEM

An educational institute wants to build a regression model to decide the salary structure (dependent variable) of its teachers on the basis of the following criteria (independent variables): experience in years, number of books published, number of journals published, number of trainings attended and number of projects handled.

INPUT DATA

Dependent variable

Y = Monthly salary (in 000’s)

Independent variable

X1= experience in years, X2 = number of books published, X3 = number of journals published, X4 = number of trainings attended, X5 = number of projects handled.

The data for 15 different colleges have been collected and shown in the following table.

Table No- 1: Input Data

Performing the Analysis with SPSS

For SPSS Version 11, click on Analyze ⇒ Regression ⇒ Linear.

After clicking Linear, the SPSS dialogue box shown below will appear.

Select the criterion (or dependent) variable and move it into the Dependent box. Select the predictor (or independent) variables and move them into the Independent(s) box. Choose the Method you wish to employ; if in doubt, use the Enter method. Now click on the Statistics button. This will bring up the dialogue box shown below.

Select Estimates and then click the Continue button. This will return you to the Linear Regression dialogue box. Now click on the OK button. The output that will be produced is illustrated on the following pages.
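Outside SPSS, the Enter method amounts to an ordinary least-squares fit of all predictors at once. The sketch below illustrates this in Python with entirely made-up data (the actual Table 1 values are not reproduced here), generating salaries from known coefficients so the fit recovers them exactly.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 15, 5                           # 15 colleges, 5 predictors, as in the case
X = rng.uniform(0, 10, size=(n, k))    # hypothetical predictor values
A = np.column_stack([np.ones(n), X])   # design matrix with intercept column

true_b = np.array([8.0, 0.9, 0.1, 0.4, 0.2, 1.5])  # made-up coefficients
y = A @ true_b                         # exact linear response, no noise

b, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS: minimises ||A b - y||^2
print(np.round(b, 3))
```

Because the response is generated without noise, the recovered coefficients equal the true ones; with real data the fit would instead return estimates with the standard errors SPSS reports in the Coefficients table.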

SPSS Output

Table-2: Variables Entered/Removed

                             a All requested variables entered.

                             b Dependent Variable: salary

This table lists the predictor variables and the method used. Here we can see that all of the predictor variables were entered simultaneously (because we selected the Enter method).

Table-3: Model Summary

                               a Predictors: (Constant), projects, training, journal, books, exp

The multiple correlation coefficient R = 0.972 indicates a strong correlation between the monthly salary of the teachers and the values predicted by the regression model. The Adjusted R Square value tells us that the model accounts for 91.4% of the variance – a very good model.
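The Adjusted R Square figure can be checked by hand: it shrinks R Square to penalise the number of predictors. Using n and k from the case (15 colleges, 5 predictors) and the R value read off Table 3:

```python
# Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); R is from Table 3.
R = 0.972
n, k = 15, 5

r_squared = R ** 2
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print(round(adj_r_squared, 3))   # matches the 91.4% reported by SPSS
```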

Table-4: ANOVA

                             a Predictors: (Constant), projects, training, journal, books, exp

                              b Dependent Variable: salary

The ANOVA table provides an F-test of the null hypothesis that none of the predictor variables is related to monthly salary. Here we can clearly reject this null hypothesis (F(5, 9) = 30.616, p < 0.05) and conclude that at least one of the variables projects, training, journal, books and exp is related to the monthly salary of the teachers.
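The F statistic itself compares explained to unexplained variance and can be recovered from R Square. The check below uses the rounded R from Table 3, so it only approximates the F = 30.616 printed in the ANOVA table; the small gap comes from that rounding.

```python
# F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with df = (k, n - k - 1) = (5, 9).
R = 0.972
n, k = 15, 5

r_squared = R ** 2
F = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(F, 2))   # roughly 30.8, near the reported 30.616
```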

Table-5: Coefficients

                             a Dependent Variable: salary

The t and Sig (p) values give a rough indication of the impact of each predictor variable – a large absolute t value and a small p value suggest that a predictor variable has a large impact on the criterion variable.

The standardized regression coefficients measure the change in the dependent variable in units of its standard deviation when the independent variable increases by one standard deviation.

The unstandardized coefficients (labelled B in SPSS) give a measure of the contribution of each variable to the model in the variables' original units. A large value indicates that a unit change in this predictor variable has a large effect on the criterion variable while the other predictors remain constant.
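The two kinds of coefficient are connected by a simple rescaling. The illustration below uses the reported unstandardized coefficient for ‘projects’ together with hypothetical standard deviations (the SDs are not shown in the output, so these numbers are made up):

```python
# Standardized beta = B * (SD of predictor) / (SD of criterion).
b_projects = 1.523     # unstandardized coefficient from Table 5
s_projects = 1.6       # hypothetical SD of 'projects' (not in the output)
s_salary = 7.5         # hypothetical SD of monthly salary (in 000's)

beta_projects = b_projects * s_projects / s_salary
print(round(beta_projects, 3))
```

Rescaling this way is what makes predictors with different units (years of experience versus number of books) directly comparable.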

Unstandardized regression coefficients are used to estimate the regression line as follows:

            Y = 8.810 + 0.855 X1 + 9.622E-02 X2 + 0.373 X3 + 2.240E-02 X4 + 1.523 X5

i.e., Y = 8.810 + 0.855 X1 (exp) + 0.0962 X2 (books) + 0.373 X3 (journal) + 0.0224 X4 (training) + 1.523 X5 (projects)
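To see the equation in use, consider a hypothetical teacher (these inputs are illustrative, not from the data set) with 10 years' experience, 2 books, 5 journal papers, 3 trainings and 2 projects:

```python
# Coefficients from the fitted Case 1 equation (E-02 terms written as decimals).
b0, b1, b2, b3, b4, b5 = 8.810, 0.855, 0.09622, 0.373, 0.0224, 1.523
exp, books, journal, training, projects = 10, 2, 5, 3, 2  # hypothetical teacher

salary = b0 + b1*exp + b2*books + b3*journal + b4*training + b5*projects
print(round(salary, 2))   # predicted monthly salary in 000's -> 22.53
```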

INTERPRETATION OF RESULT

It is clear from the regression equation that all the variables are positively related to ‘monthly salary’. The coefficient (1.523) indicates that the variable X5 (projects) significantly influences the variable Y (monthly salary), whereas the impact of the other variables is insignificant. This is also evident from the ‘p’ value of ‘projects’ (0.026): if the ‘p’ value is less than or equal to the level of significance (5%), the variable is significant. The higher absolute value of ‘t’ for projects is also an indication of the significance of this variable for monthly salary. The results indicate that if the number of projects handled goes up by one unit, the salary goes up by 1.523 units while the others (training, journal, books and exp) remain constant. The second most important independent variable is ‘exp’, with unstandardized coefficient 0.855.


CASE ANALYSIS- 2


PROBLEM

A TV manufacturing unit is interested in predicting the demand (dependent variable) for its product in 18 selected territories. The independent variables for this purpose are: number of dealers in the market, advertising expenditure, number of existing customers in the territory and index of the competitive brand.

INPUT DATA

Dependent variable

Y = Demand (in 00’s)

Independent variable

X1= Number of dealers in the market, X2 = Advertising expenditure (in 000’s), X3 = Number of existing customers in the territory (in 00’s), X4 = Index of the competitive brand on a 5-point scale (1=low, 5=high)

The data collected for these territories are shown in the following table.

Table-1: Input Data

SPSS Output

Table-2: Variables Entered/Removed

                             a All requested variables entered.

                             b Dependent Variable: Y

Table- 3: Model Summary

                             a Predictors: (Constant), X4, X3, X2, X1

The multiple correlation coefficient R = 0.668 indicates that the relationship between the demand for televisions and the values predicted by the regression model is weak. The Adjusted R Square value tells us that the model accounts for only 27.6% of the variance – not a very good model.

Table-4: ANOVA

                             a Predictors: (Constant), X4, X3, X2, X1

                             b Dependent Variable: Y

Table-5: Coefficients

                              a Dependent Variable: Y

The regression equation so formed is as follows.

Y = 22.744 + 1.078 X1 (dealers) – 0.240 X2 (advertising expenditure) + 7.917E-02 X3 (customers) – 2.992 X4 (index)
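The negative signs mean that, as coded here, higher advertising expenditure and stronger competition reduce predicted demand. The sketch below evaluates the equation for one hypothetical territory (illustrative inputs, not from Table 1):

```python
# Coefficients from the fitted Case 2 equation (7.917E-02 written as a decimal).
b0, b1, b2, b3, b4 = 22.744, 1.078, -0.240, 0.07917, -2.992
dealers, adverts, customers, index = 12, 30, 40, 3  # hypothetical territory

demand = b0 + b1*dealers + b2*adverts + b3*customers + b4*index
print(round(demand, 2))   # predicted demand in 00's -> 22.67
```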

INTERPRETATION

The variables ‘number of customers’ and ‘number of dealers’ are positively related, and the variables ‘advertising expenditure’ and ‘index of the competitive brand’ are negatively related, to the demand for televisions. The demand for TVs is most strongly influenced by the number of existing dealers in the market.

SPSS Command

  1. Click on ANALYZE at the SPSS menu bar (in older versions of SPSS, click on STATISTICS instead of ANALYZE).
  2. Click on REGRESSION followed by LINEAR
  3. Select the dependent variable and move it to the Dependent box. Similarly, select the independent variables and move them to the Independent(s) box.
  4. Choose the method ENTER.
  5. Click on STATISTICS and select ESTIMATES and MODEL FIT, followed by CONTINUE.
  6. Select OK of the main dialogue box.
