# Regression Part 2: Logistic Regression

In the previous post we discussed linear regression.

There, the dependent variable was a continuous variable.

What if the dependent variable is not continuous?

That’s where logistic regression comes in.

In logistic regression, the dependent variable is binary: it has only two levels (dead/alive, sick/healthy, normal/low birth weight, etc.).

As in linear regression, the independent variables could be continuous, categorical, or binary.

By now you must have realized that the same principles apply here, too:

If there is only one independent variable, it is called simple logistic regression;

If there are two or more independent variables, it is termed multiple logistic regression.

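The equation retains the general form of the linear model, with one important change. Because the dependent variable (y) is binary, the equation does not predict y directly. Instead, it predicts the log-odds that y = 1, where p is the probability that y = 1 (for example, that a patient is sick):

ln(p / (1 - p)) = a + b(x)

Solving for p gives p = 1 / (1 + e^(-(a + b(x)))), which always lies between 0 and 1, as a probability must.

As in multiple linear regression, the equation for multiple logistic regression expands to accommodate the additional independent variables:

ln(p / (1 - p)) = a + b1(x1) + b2(x2) + ... + bn(xn)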
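To make the equations concrete, here is a minimal Python sketch with coefficients invented purely for illustration. It treats the right-hand side of the equation (the intercept plus each coefficient times its independent variable) as the log-odds that y = 1, converts it to a probability with the logistic function, and checks one consequence of that reading: raising one independent variable by one unit, with the others held fixed, multiplies the odds by e raised to that variable's coefficient. The names `log_odds`, `probability` and `odds` are hypothetical helpers, not part of any library.

```python
import math

def log_odds(a, coefficients, xs):
    """a + b1*x1 + ... + bn*xn: the right-hand side of the logistic equation."""
    if len(coefficients) != len(xs):
        raise ValueError("need exactly one coefficient per independent variable")
    return a + sum(b * x for b, x in zip(coefficients, xs))

def probability(a, coefficients, xs):
    """Invert ln(p / (1 - p)) = log-odds to get p = P(y = 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds(a, coefficients, xs)))

def odds(p):
    """Odds p / (1 - p) corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients, invented for illustration only.
# Simple logistic regression: one independent variable.
p_simple = probability(-4.0, [0.1], [40.0])

# Multiple logistic regression: three independent variables.
a = -1.5
coefficients = [0.8, -0.3, 1.2]
base = [1.0, 2.0, 0.0]
bumped = [2.0, 2.0, 0.0]  # x1 raised by one unit, x2 and x3 held fixed

ratio = odds(probability(a, coefficients, bumped)) / odds(probability(a, coefficients, base))
print(f"P(y = 1), simple model at x = 40: {p_simple:.3f}")
print(f"Odds ratio for +1 in x1: {ratio:.4f}; exp(b1) = {math.exp(coefficients[0]):.4f}")
```

With a single coefficient the same functions cover simple logistic regression: at x = 40 the hypothetical model's log-odds are exactly 0, so it predicts p = 0.5. The computed odds ratio matches exp(b1) whatever values x2 and x3 take.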

The natural question at this stage would be, “Why do we need to know all this, anyway?”

The answer to this question is, “Because we need to use regression techniques whenever we wish to make a prediction about something”.

Let me give a few examples to illustrate this point:

We need regression techniques if we wish to predict:

1. The weight of a child, given a particular age or height (a continuous outcome, so linear regression)

2. Whether a student will pass or fail, given performance in internal exams, attendance, etc. (a binary outcome, so logistic regression)
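The second example can be sketched as a multiple logistic regression with two independent variables. This is only an illustration under stated assumptions: the students are invented, and the coefficients are estimated by maximum likelihood using plain gradient descent on the mean log-loss, a simple stand-in for the iterative solvers that statistical software normally uses. `fit_logistic` and `predict` are hypothetical names.

```python
import math

def sigmoid(z):
    """Logistic function: maps log-odds onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=1.0, iterations=10000):
    """Estimate a and b1..bn by gradient descent on the mean log-loss.

    Each predictor is standardised (mean 0, SD 1) first so that one learning
    rate suits all of them; the returned coefficients are therefore per
    standard deviation, not per raw unit.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(k)]
    z_rows = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

    a, b = 0.0, [0.0] * k
    for _ in range(iterations):
        grad_a, grad_b = 0.0, [0.0] * k
        for x, y in zip(z_rows, labels):
            # Gradient of the log-loss for one student: (predicted p - observed y) * x
            error = sigmoid(a + sum(bj * xj for bj, xj in zip(b, x))) - y
            grad_a += error
            for j in range(k):
                grad_b[j] += error * x[j]
        a -= learning_rate * grad_a / n
        b = [bj - learning_rate * gj / n for bj, gj in zip(b, grad_b)]
    return a, b, means, sds

def predict(model, row):
    """Predicted probability of passing for one student's raw values."""
    a, b, means, sds = model
    z = [(v - m) / s for v, m, s in zip(row, means, sds)]
    return sigmoid(a + sum(bj * zj for bj, zj in zip(b, z)))

# Hypothetical data, invented for illustration:
# (internal exam score out of 100, attendance %) -> 1 = pass, 0 = fail
rows = [(35, 60), (40, 75), (45, 55), (50, 80), (55, 65), (60, 85),
        (65, 70), (70, 90), (75, 60), (80, 95), (52, 88), (58, 62)]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1]

model = fit_logistic(rows, labels)
print(f"P(pass) for exam = 62, attendance = 78: {predict(model, (62, 78)):.2f}")
```

Because the predictors are standardised inside `fit_logistic`, each fitted coefficient describes the effect of a one-standard-deviation change in that variable rather than one mark or one percentage point.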

However, when we perform regression analysis, the end result is a ‘model’ (another way of describing the regression equation) that best predicts the behavior of the dependent variable. This model is by no means perfect; many models may have to be generated and compared before settling upon a ‘final model’. Each ‘model’ will have to be tested extensively before acceptance. Models that work well in a given situation (population) may not work as well (may not predict well enough) in another situation.
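One way to picture "testing" and "comparing" models is to score each candidate on students it was not built from. The sketch below assumes two hypothetical models and two small groups of students from different settings, all invented purely to illustrate; accuracy at a 0.5 cut-off and mean log-loss are just two common yardsticks, not the only way to test a model. `probability` and `evaluate` are hypothetical helper names.

```python
import math

def probability(model, xs):
    """P(y = 1) from a model given as (a, [b1, ..., bn])."""
    a, coefficients = model
    z = a + sum(b * x for b, x in zip(coefficients, xs))
    return 1.0 / (1.0 + math.exp(-z))

def evaluate(model, data, threshold=0.5):
    """Accuracy and mean log-loss (lower is better) of a model on (xs, y) pairs."""
    correct, total_loss = 0, 0.0
    for xs, y in data:
        p = probability(model, xs)
        predicted = 1 if p >= threshold else 0
        correct += int(predicted == y)
        total_loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return correct / len(data), total_loss / len(data)

# Two hypothetical candidate models, (a, [b_exam, b_attendance]);
# all numbers are invented for illustration.
model_1 = (-6.0, [0.1, 0.0])    # exam score only
model_2 = (-10.0, [0.1, 0.05])  # exam score and attendance

# Hypothetical held-out students from two different settings:
# ((exam score, attendance %), 1 = pass / 0 = fail)
setting_a = [((70, 90), 1), ((45, 60), 0), ((62, 85), 1), ((58, 50), 0)]
setting_b = [((62, 40), 0), ((55, 95), 1), ((66, 45), 0), ((50, 70), 0)]

for name, model in (("model 1", model_1), ("model 2", model_2)):
    for setting, data in (("A", setting_a), ("B", setting_b)):
        accuracy, log_loss = evaluate(model, data)
        print(f"{name}, setting {setting}: accuracy {accuracy:.2f}, log-loss {log_loss:.3f}")
```

In this invented example both models classify every student in setting A correctly, but only model 2 holds up in setting B, which is the kind of drop the paragraph above warns about.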