# Linear multiple regression

The general purpose of multiple regression (the term was first used by Pearson, 1908) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. Since its introduction, linear regression has been applied intensively and remains the most popular technique for estimating such relationships.

#### Steps in Multiple Regression:

The steps in multiple regression are essentially the same as in simple regression.

- State the research hypothesis.
- State the null hypothesis
- Gather the data
- Assess each variable separately first (obtain measures of central tendency and dispersion; frequency distributions; graphs); is the variable normally distributed?
- Assess the relationship of each independent variable, one at a time, with the dependent variable (calculate the correlation coefficient; obtain a scatter plot); are the two variables linearly related?
- Assess the relationships between all of the independent variables with each other (obtain a correlation coefficient matrix for all the independent variables); are the independent variables too highly correlated with one another?
- Calculate the regression equation from the data
- Calculate and examine appropriate measures of association and tests of statistical significance for each coefficient and for the equation as a whole
- Accept or reject the null hypothesis
- Reject or accept the research hypothesis
- Explain the practical implications of the findings

#### Regression Equation

A line in a two-dimensional or two-variable space is defined by the equation Y = a + b*X. In other words, the Y variable can be expressed in terms of a constant (a) plus a slope (b) times the X variable. The constant is also referred to as the intercept, and the slope as the regression coefficient or B coefficient.
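To make the definition concrete, here is a minimal Python sketch (with hypothetical data) of how the intercept a and slope b of Y = a + b*X are obtained by least squares:

```python
# Least-squares fit of Y = a + b*X (hypothetical data lying exactly on Y = 1 + 2*X)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: covariance of X and Y divided by the variance of X
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
# Intercept: the line always passes through the point of means
a = mean_y - b * mean_x

print(a, b)  # 1.0 2.0
```

With real data the points would not fall exactly on the line, and a and b would be the values minimizing the sum of squared residuals.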

The first part of the question requires calculating the regression equation for the items that influence sales.

The results obtained using Minitab are as follows:

With Y as the response and X2 as the predictor, the equation is:

Y = 965 + 105X2

Here sales are influenced by a change in prices.

With Y as the response and X3 as the predictor, the equation is:

Y = 771 + 508X3

Here sales are influenced by a change in marketing spend.

With Y as the response and X4 as the predictor, the equation is:

Y = -3989 + 50.1X4

Here sales are influenced by a change in the index of economic activity.
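The three fitted equations above can be wrapped as simple Python functions for forecasting. The input values below are hypothetical illustrations, not data from the original problem:

```python
# The three simple-regression equations reported above, as functions
def sales_from_price(x2):
    return 965 + 105 * x2      # Y = 965 + 105*X2

def sales_from_marketing(x3):
    return 771 + 508 * x3      # Y = 771 + 508*X3

def sales_from_index(x4):
    return -3989 + 50.1 * x4   # Y = -3989 + 50.1*X4

# Hypothetical predictor values, purely for illustration
print(sales_from_price(10))     # 2015
print(sales_from_marketing(2))  # 1787
print(sales_from_index(100))    # roughly 1021
```

Note that the third equation returns negative predicted sales for small values of X4, because of its large negative intercept.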

Can predictions from the above equations result in a good forecast?

The above equations suggest that marketing spend and prices have a direct relationship with sales. The equation for the index of economic activity also has a positive slope, but its large negative intercept means the fitted line predicts negative sales for small values of X4 and positive sales only for larger ones.

The predictions from the above equations can result in a good forecast provided that the other variables do not change and only the given variable varies. The general process for creating a prediction equation involves gathering relevant data from a large, representative sample of the population. While guidelines for general applications of regression go as low as 50 + 8*(number of predictors) (Tabachnick & Fidell, 1996), guidelines for prediction equations are more stringent because of the need to generalize beyond a given sample. Some authors have suggested that 15 subjects per predictor is sufficient (Park & Dudycha, 1974; Pedhazur, 1997); others have suggested a minimum total sample (e.g., 400; Pedhazur, 1997); still others have suggested a minimum of 40 subjects per predictor (Cohen & Cohen, 1983; Tabachnick & Fidell, 1996). In general, since the goal is a stable regression equation that is representative of the population regression equation, more is better. If one has good estimates of effect sizes, a power analysis can give a good estimate of the required sample size.
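The sample-size rules of thumb just cited are easy to compare side by side; this sketch simply encodes them, with k as a hypothetical number of predictors:

```python
def tabachnick_fidell_minimum(k):
    """General-use guideline: 50 plus 8 per predictor (Tabachnick & Fidell, 1996)."""
    return 50 + 8 * k

def per_predictor_minimum(k, per_predictor):
    """Subjects-per-predictor guidelines (e.g., 15 or 40 per predictor)."""
    return per_predictor * k

k = 3  # hypothetical model with three predictors
print(tabachnick_fidell_minimum(k))            # 74
print(per_predictor_minimum(k, 15))            # 45
print(per_predictor_minimum(k, 40))            # 120
print(max(per_predictor_minimum(k, 40), 400))  # 400, applying Pedhazur's total-sample floor
```

The spread between 74 and 400 for the same three-predictor model shows why "more is better" is the practical advice.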

Regardless of the method ultimately chosen, it is important that the researcher first study the individual variables to ensure that only variables contributing significantly to the variance accounted for by the regression equation are included. Variables not accounting for significant portions of variance should be deleted from the equation, and the equation should be recalculated. Further, the researcher might want to examine excluded variables to see whether their entry would significantly improve prediction (a significant increase in R-squared).
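The idea of checking how much a variable adds to R-squared can be sketched as follows. The miniature solver and the data are hypothetical, and a real analysis would also test the significance of the R-squared increase:

```python
def ols_r2(x_cols, y):
    """R-squared of y regressed on an intercept plus the given predictor
    columns, via the normal equations solved by Gaussian elimination."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in x_cols] for i in range(n)]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(rows[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Forward elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        for r in range(c + 1, p):
            f = a[r][c] / a[c][c]
            for k in range(c, p):
                a[r][k] -= f * a[c][k]
            rhs[r] -= f * rhs[c]
    # Back substitution
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (rhs[r] - sum(a[r][k] * beta[k] for k in range(r + 1, p))) / a[r][r]
    y_hat = [sum(rows[i][k] * beta[k] for k in range(p)) for i in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: y is essentially driven by x1 alone
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]

r2_one = ols_r2([x1], y)        # y on x1 alone
r2_two = ols_r2([x1, x2], y)    # y on x1 and x2
# x2 adds almost nothing, so it is a candidate for deletion
print(round(r2_two - r2_one, 4))
```

Adding a predictor can never lower R-squared, so the question is always whether the increase is large enough to justify keeping the variable.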

__Problems in the results:__

The main problem in interpreting the results was deciding which predictor explains more of the variance in sales. Prices appear to affect sales more strongly, but marketing spend may be the more appropriate variable to describe as the main influence on sales.

To handle this difficulty, the least squares method was applied individually to both data sets to check the viability of the solutions.

Just as with simple regression, multiple regression will not be good at explaining the relationship of the independent variables to the dependent variable if those relationships are not linear.

Ordinary least squares linear multiple regression is used to predict dependent variables measured at the interval or ratio level. If the dependent variable is not measured at this level, then other, more specialized regression techniques must be used.

Ordinary least squares linear multiple regression assumes that the independent (X) variables are measured at the interval or ratio level. If the variables are not, then multiple regression will result in more errors of prediction. When nominal-level variables are used, they are called "dummy" variables. They take the value of 1 to represent the presence of some quality and the value of 0 to indicate the absence of that quality (for example, smoker=1, non-smoker=0). Ordinal coefficients may indicate ranks (for example, staff=1, supervisor=2, manager=3). The interpretation of the coefficients is more problematic with independent variables measured at the nominal or ordinal level.
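Dummy and ordinal coding as described above is straightforward; a minimal sketch with hypothetical data:

```python
# Dummy-coding a nominal variable: 1 = quality present, 0 = absent
people = [("Ann", "smoker"), ("Ben", "non-smoker"), ("Cara", "smoker")]
smoker_dummy = [1 if status == "smoker" else 0 for _, status in people]
print(smoker_dummy)  # [1, 0, 1]

# Ordinal coding by rank, as in staff=1, supervisor=2, manager=3
rank = {"staff": 1, "supervisor": 2, "manager": 3}
print(rank["manager"])  # 3
```

The dummy column can then enter the regression like any other predictor, though its coefficient is interpreted as a shift between groups rather than a slope.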

Regression with only one dependent and one independent variable normally requires a minimum of 30 observations. A good rule of thumb is to add at least an additional 10 observations for each additional independent variable added to the equation.

The number of independent variables in the equation should be limited by two factors. First, the independent variables should be included in the equation only if they are based on the researcher's theory about what factors influence the dependent variable. Second, variables that do not contribute very much to explaining the variance in the dependent variable (i.e., to the total R2), should be eliminated.

Many difficulties tend to arise when there are more than five independent variables in a multiple regression equation. One of the most frequent is that two or more of the independent variables are highly correlated with one another. This is called multicollinearity. If a correlation coefficient matrix of all the independent variables indicates correlations of .75 or higher, then there may be a problem with multicollinearity.
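The screening step described above — a correlation matrix over the predictors with a .75 cutoff — can be sketched in Python. The predictor values are hypothetical, with "marketing" built as an exact linear function of "price" so that the pair gets flagged:

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

# Hypothetical predictors; marketing = 2*price, so they are perfectly collinear
predictors = {
    "price": [10, 11, 12, 13, 14],
    "marketing": [20, 22, 24, 26, 28],
    "index": [5, 3, 8, 2, 7],
}

# Flag every pair of predictors whose correlation is .75 or higher
flagged = [(u, v) for u, v in combinations(predictors, 2)
           if abs(pearson(predictors[u], predictors[v])) >= 0.75]
print(flagged)  # [('price', 'marketing')]
```

A flagged pair signals that the two predictors largely measure the same phenomenon and that one of them may need to be dropped.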

When two variables are highly correlated, they are basically measuring the same phenomenon. When one enters the regression equation, it tends to explain most of the variance in the dependent variable that is related to that phenomenon. This leaves little variance to be explained by the second independent variable.

__Activity Two:__

__Multiple Regression:__

The two variables with the highest explanatory power as independent variables are marketing spend and prices. The reason these two influence sales is straightforward.

In this data, the quantity of sales moves directly with prices: as prices go up, so do sales, because the business earns more profit by selling at a higher price.

Marketing spend adds value to the product and therefore helps to increase its sales. This means it also has a direct relationship with sales: the more that is spent on marketing, the higher sales go.

__Major Problems in use of Multiple Regression:__

In spite of the increased use of MR in management research, issues have been raised regarding difficulties associated with the use of multiple regression. Numerous researchers (e.g., Evans, 1985; Morris, Sherman & Mansfield, 1986) argue that tests of hypotheses pertaining to the effects of moderators often have very low statistical power. In the context of MR, power is the probability of rejecting a false null hypothesis of no moderating effect. If power is low, Type II statistical error rates are high and, thus, researchers may erroneously dismiss theoretical models that include moderating effects.

When choosing a predictor variable, you should select one that might be correlated with the criterion variable but that is not strongly correlated with the other predictor variables. However, correlations amongst the predictor variables are not unusual. The term multicollinearity (or collinearity) describes the situation in which a high correlation is detected between two or more predictor variables. Such high correlations cause problems when trying to draw inferences about the relative contribution of each predictor variable to the success of the model.

When the model function is not linear in the parameters, the sum of squares must be minimized by an iterative procedure. This introduces many complications.
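As an illustration of why nonlinearity in the parameters forces an iterative search, here is a hedged sketch for the hypothetical model y = a*e^(b*x) (data generated with a = 2, b = 0.5): no closed form gives b, so the code scans candidate b values, exploiting the fact that a has a closed form once b is fixed:

```python
from math import exp

# Hypothetical data generated from y = 2 * e^(0.5*x)
xs = [0, 1, 2, 3]
ys = [2 * exp(0.5 * x) for x in xs]

def sse_and_a(b):
    """For fixed b, the model y = a*e^(b*x) is linear in a, so the
    least-squares a has a closed form; return (SSE, a)."""
    num = sum(y * exp(b * x) for x, y in zip(xs, ys))
    den = sum(exp(2 * b * x) for x in xs)
    a = num / den
    sse = sum((y - a * exp(b * x)) ** 2 for x, y in zip(xs, ys))
    return sse, a

# Crude iterative search: try b on a grid and keep the smallest SSE
best_b = min((round(i * 0.01, 2) for i in range(101)),
             key=lambda b: sse_and_a(b)[0])
best_a = sse_and_a(best_b)[1]
print(best_b, round(best_a, 6))  # 0.5 2.0
```

A real analysis would use a proper nonlinear least squares algorithm (e.g., Gauss-Newton or Levenberg-Marquardt), but the grid search shows the iterative character of the problem.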

__References:__

- Osborne, Jason W. (2000). "Prediction in Multiple Regression". Practical Assessment, Research & Evaluation.
- SPSS for Psychologists (http://www.palgrave.com/PDFs/0333734718.Pdf)
- Berk, Richard A. (2004). Regression Analysis: A Constructive Critique. Sage Publications.
- Freedman, David A. (2005). Statistical Models: Theory and Practice. Cambridge University Press.
- Cook, R. Dennis; Weisberg, Sanford (1982). "Criticism and Influence Analysis in Regression". Sociological Methodology, Vol. 13.
- Galton, Francis (1989). "Kinship and Correlation (reprinted 1989)". Statistical Science 4(2).
- Aldrich, John (2005). "Fisher and Regression". Statistical Science 20(4).
- http://www.csulb.edu/~msaintg/ppa696/696regmx.htm