**MULTIPLE REGRESSION ANALYSIS**

**Introduction**

Multiple regression analysis studies the relationship between a dependent variable and a set of one or more independent variables. It is a statistical technique that allows us to predict someone’s score on one variable on the basis of their scores on several other variables. In multiple regression, the term “independent variables” identifies the variables that are expected to influence some other “dependent variable”.
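In symbols, a multiple regression model with k independent variables is commonly written as below. The symbols b_0 (intercept), b_j (regression coefficients) and e (error term) are notation introduced here for illustration and do not come from the SPSS output.

$$
Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e
$$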

**CASE ANALYSIS- 1**

**PROBLEM**

An educational institute wants to build a regression model to decide the salary structure (dependent variable) of the teachers on the basis of the following criteria (independent variables): experience in years, number of books published, number of journals published, number of trainings attended and number of projects handled.

**INPUT DATA**

**Dependent variable**

Y = Monthly salary (in 000’s)

**Independent variable**

X_{1}= experience in years, X_{2} = number of books published, X_{3} = number of journals published, X_{4} = number of trainings attended, X_{5} = number of projects handled.

The data for 15 different colleges have been collected and are shown in the following table.

**Table No- 1:** Input Data

**Performing the Analysis with SPSS**

For SPSS Version 11, click on **Analyze ⇒ Regression ⇒ Linear**.

After clicking **Linear**, the **Linear Regression** dialogue box will appear.

Select the criterion (or dependent) variable and move it into the **Dependent** box. Select the predictor (or independent) variables and move them into the **Independent(s)** box. Choose the **Method** you wish to employ. If in doubt, use the **Enter** method. Now click on the **Statistics** button. This will bring up the dialogue box shown below.

Select **Estimates** and then click the **Continue** button. This will return you to the **Linear Regression** dialogue box. Now click on the **OK** button. The output that will be produced is illustrated on the following pages.

**SPSS Output**

**Table-2:** Variables Entered/Removed

*a. All requested variables entered.*

*b. Dependent Variable: salary*

This table tells us about the predictor variables and the method used. Here we can see that all of our predictor variables were entered simultaneously (because we selected the Enter method).

**Table-3:** Model Summary

*a. Predictors: (Constant), projects, training, journal, books, exp*

The multiple correlation coefficient R = 0.972 indicates a strong correlation between the observed monthly salary of the teachers and the salary predicted by the regression model. The Adjusted R Square value tells us that our model accounts for 91.4% of the variance in monthly salary, which indicates a very good model.
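As a rough check on the Model Summary figures, the Adjusted R Square can be recomputed from R, the number of observations and the number of predictors. The sketch below uses the standard adjustment formula; the function name `adjusted_r_squared` is hypothetical, and n = 15 and 5 predictors are taken from the problem description above.

```python
def adjusted_r_squared(r, n_obs, n_predictors):
    """Adjusted R^2 from the multiple correlation coefficient R."""
    r_squared = r ** 2
    # Penalise R^2 for the number of predictors relative to the sample size.
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Case 1: R = 0.972, 15 observations, 5 predictors (values from the text).
case1_adjusted = adjusted_r_squared(0.972, 15, 5)
print(f"Adjusted R Square: {case1_adjusted:.3f}")
```

Because R is reported to only three decimals, the result agrees with the SPSS figure only up to rounding.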

**Table-4:** ANOVA

*a. Predictors: (Constant), projects, training, journal, books, exp*

*b. Dependent Variable: salary*

The ANOVA table provides an F-test of the null hypothesis that none of the predictor variables is related to monthly salary. Here we can clearly reject this null hypothesis (F(5, 9) = 30.616, p < 0.05), and we conclude that at least one of the variables projects, training, journal, books and exp is related to the monthly salary of the teachers.
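The overall F statistic and its degrees of freedom can be approximately reconstructed from R alone. This is a minimal sketch using the standard relation between F and R², with a hypothetical function name; since it starts from the rounded value R = 0.972, it gives roughly 30.8 rather than exactly the 30.616 reported by SPSS.

```python
def f_from_r(r, n_obs, n_predictors):
    """Overall F statistic and its degrees of freedom, computed from R."""
    r_squared = r ** 2
    df_model = n_predictors                  # numerator degrees of freedom
    df_residual = n_obs - n_predictors - 1   # denominator degrees of freedom
    f_value = (r_squared / df_model) / ((1 - r_squared) / df_residual)
    return f_value, df_model, df_residual


# Case 1: R = 0.972, 15 observations, 5 predictors (values from the text).
f_value, df_model, df_residual = f_from_r(0.972, 15, 5)
print(f"F({df_model}, {df_residual}) = {f_value:.2f}")
```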

**Table-5:** Coefficients

*a. Dependent Variable: salary*

The t and Sig (p) values give a rough indication of the impact of each predictor variable: a large absolute t value and a small p value suggest that a predictor variable has a large impact on the criterion variable.
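For reference, the t value for each predictor is the unstandardized coefficient divided by its standard error. The symbols B_j, SE(B_j), n and k are notation assumed here for illustration.

$$
t_j = \frac{B_j}{SE(B_j)}, \qquad df = n - k - 1
$$

For Case 1, with n = 15 observations and k = 5 predictors, this gives df = 9, which matches the residual degrees of freedom in F(5, 9).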

The standardized regression coefficients measure the change in the dependent variable in units of its standard deviation when the independent variable increases by one standard deviation.
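The usual link between the two kinds of coefficient can be written as follows, where s_{X_j} and s_Y denote the sample standard deviations of the predictor and of the dependent variable (notation assumed here, not shown in the SPSS output):

$$
\beta_j = B_j \, \frac{s_{X_j}}{s_Y}
$$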

The unstandardized coefficients (B) give a measure of the contribution of each variable to the model in the variable's original units. A large value indicates that a unit change in this predictor variable has a large effect on the criterion variable while the other predictors remain constant.

Unstandardized regression coefficients are used to estimate the regression line as follows:

Y = 8.810 + 0.855 X_{1} + 9.622E-02 X_{2} + 0.373 X_{3} + 2.240E-02 X_{4} + 1.523 X_{5}

⇒ **Y** = 8.810 + 0.855 X_{1} **(exp)** + 0.0962 X_{2} **(books)** + 0.373 X_{3} **(journal)** + 0.0224 X_{4} **(training)** + 1.523 X_{5} **(projects)**
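To illustrate how the estimated equation is used for prediction, the sketch below plugs one teacher's values into it. The coefficients are the SPSS estimates quoted above (with 9.622E-02 and 2.240E-02 written out); the teacher profile and the function name `predict_salary` are made up for illustration.

```python
# Unstandardized coefficients from the SPSS Coefficients table (Case 1).
INTERCEPT = 8.810
COEFFICIENTS = {
    "exp": 0.855,
    "books": 9.622e-02,
    "journal": 0.373,
    "training": 2.240e-02,
    "projects": 1.523,
}


def predict_salary(profile):
    """Predicted monthly salary (in 000's) for a dict of predictor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * profile[name] for name in COEFFICIENTS)


# Hypothetical teacher: these values are illustrative, not from the data set.
teacher = {"exp": 10, "books": 2, "journal": 5, "training": 3, "projects": 4}
base = predict_salary(teacher)

# One more project, everything else held constant.
one_more_project = dict(teacher, projects=teacher["projects"] + 1)
increase = predict_salary(one_more_project) - base

print(f"Predicted salary: {base:.3f}")
print(f"Increase for one extra project: {increase:.3f}")
```

The second figure equals the coefficient of projects, which is exactly what "a unit change while the other predictors remain constant" means.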

**INTERPRETATION OF RESULT**

It is clear from the regression equation that all the regression coefficients are positive, so each variable is positively related to ‘monthly salary’ when the others are held constant. The coefficient (1.523) indicates that the variable X_{5} **(projects)** significantly influences the variable Y **(monthly salary)**, whereas the impact of all other variables is not statistically significant. This is also evident from the ‘p’ value of ‘projects’ (0.026): if the ‘p’ value is less than or equal to the level of significance (5%), the variable is significant. The higher absolute value of ‘t’ for projects also indicates the significance of this variable for monthly salary. The results indicate that if the number of projects handled goes up by one unit, the salary goes up by 1.523 units while the other variables (training, journal, books and exp) remain constant. Judged by the unstandardized coefficients, the second most important independent variable is ‘exp’ (0.855).

**CASE ANALYSIS- 2**

**PROBLEM**

A TV manufacturing unit wants to predict the demand (dependent variable) for its product in 18 selected territories. The independent variables for this purpose are: number of dealers in the market, advertising expenditure, number of existing customers in the territory and index of the competitive brand.

**INPUT DATA**

**Dependent variable**

Y = Demand (in 00’s)

**Independent variable**

X_{1}= Number of dealers in the market, X_{2} = Advertising expenditure (in 000’s), X_{3} = Number of existing customers in the territory (in 00’s), X_{4} = Index of the competitive brand on a 5-point scale (1=low, 5=high)

The data for 18 territories have been collected and are shown in the following table.

**Table-1:** Input Data

**SPSS Output**

**Table-2:** Variables Entered/Removed

*a. All requested variables entered.*

*b. Dependent Variable: Y*

**Table- 3:** Model Summary

*a. Predictors: (Constant), X4, X3, X2, X1*

The multiple correlation coefficient R = 0.668 indicates only a moderate correlation between the observed demand for televisions and the demand predicted by the regression model. The Adjusted R Square value tells us that our model accounts for 27.6% of the variance, so it is not a very good model.
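The Adjusted R Square can be checked by hand from R. The arithmetic below assumes n = 18 territories, as given in the problem statement, and k = 4 predictors; the symbols n, k and the bar notation are introduced here for illustration.

$$
R^2 = 0.668^2 \approx 0.4462, \qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
= 1 - 0.5538 \times \frac{17}{13} \approx 0.276
$$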

**Table-4:** ANOVA

*a. Predictors: (Constant), X4, X3, X2, X1*

*b. Dependent Variable: Y*

**Table-5:** Coefficients

*a. Dependent Variable: Y*

The regression equation so formed is as follows.

**Y** = 22.744 + 1.078 X_{1} **(dealers)** - 0.240 X_{2} **(expenses on advertisement)** + 7.917E-02 X_{3} **(customers)** - 2.992 X_{4} **(index)**

**INTERPRETATION**

In the regression equation, the variables ‘number of customers’ and ‘number of dealers’ have positive coefficients, while ‘expenses on advertisement’ and ‘index of the competitive brand’ have negative coefficients, so the former are positively and the latter negatively related to the demand for televisions. The demand for TVs is highly influenced by the number of existing dealers in the market.

**SPSS Command**

- Click on ANALYZE at the SPSS menu bar (in older versions of SPSS, click on STATISTICS instead of ANALYZE).
- Click on REGRESSION followed by LINEAR.
- Select the dependent variable and move it to the DEPENDENT box. Similarly, select the independent variables and move them to the INDEPENDENT(S) box.
- Choose the method ENTER.
- Click on STATISTICS and select ESTIMATES and MODEL FIT followed by CONTINUE.
- Click OK in the main dialogue box.
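For readers who want to see what the ENTER method computes behind the dialogue boxes, the sketch below fits an ordinary least squares model with all predictors entered at once, using only the Python standard library. It is not SPSS's implementation: the function names are hypothetical, and the small data set is made up (generated without noise from y = 2 + 3·x1 − 1·x2) so that the recovered coefficients can be compared with known values.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_ols(predictor_rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(row) for row in predictor_rows]  # add constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * target for d, target in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data: y = 2 + 3*x1 - 1*x2.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in rows]

coefficients = fit_ols(rows, y)
print([round(c, 4) for c in coefficients])
```

The first element of the result corresponds to the (Constant) row of the SPSS Coefficients table and the remaining elements to the unstandardized B values of the predictors.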