Logistic regression is used to estimate the odds, and from them the probability, that a specific event occurs rather than the alternative. This machine learning method is used in problems that require estimating the probability that a dependent variable takes a specific value. Its advantage is that it relates a dependent variable to one or more independent variables by modeling the log(odds) of the outcome as a linear function of those variables. The logistic model’s output is bounded: a probability always lies between 0 and 1.
Data scientists who need to classify a dependent variable assume that the sample data follow a Bernoulli distribution in order to predict the probability of occurrence in a binary problem (pass/fail, yes/no).
How Coefficients Are Calculated in Linear Regression and Logistic Regression
Logistic regression is similar to linear regression in that both use a linear combination of the independent variables to relate them to the dependent variable. The difference is that in logistic regression the line describes the log(odds) of the outcome, so its coefficients are expressed on the log(odds) scale.
How to Find the Best Model in Logistic Regression?
Another difference lies in the way the model is fitted to the data. The coefficients of the logistic regression curve are adjusted iteratively until the likelihood converges towards its maximum. The likelihood of a multivariate logistic regression model is the product of the probabilities that the curve (the squiggle) assigns to the observed outcome of each observation in the data. Because the logarithm of a product is the sum of the logarithms, adding the logs of those probabilities gives the log-likelihood, which reaches its maximum at the same coefficients. For six observations, where Po1 to Po6 denote the probabilities assigned to observations 1 to 6:
Likelihood of data given the squiggle=Po1xPo2xPo3xPo4xPo5xPo6
Log-likelihood of data given the squiggle=log(Po1)+log(Po2)+log(Po3)+log(Po4)+log(Po5)+log(Po6)
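The relationship between the two expressions can be checked with a few lines of Python. This is only a sketch: the six probabilities are made-up values standing in for Po1 to Po6, and the function names are hypothetical. For an observation where the event did not occur, the probability that enters the product is 1 minus the predicted probability of the event.

```python
import math

def likelihood(probs):
    """Product of the probabilities assigned to the observed outcomes."""
    return math.prod(probs)

def log_likelihood(probs):
    """Sum of the logs of the same probabilities."""
    return sum(math.log(p) for p in probs)

# Made-up probabilities for six observations (stand-ins for Po1..Po6).
observed_probs = [0.9, 0.8, 0.7, 0.6, 0.85, 0.95]

lik = likelihood(observed_probs)
log_lik = log_likelihood(observed_probs)

# exp(log-likelihood) recovers the likelihood, up to rounding error.
print(lik, log_lik, math.exp(log_lik))
```

In practice the log-likelihood is usually the one that is maximized, because a product of many small probabilities quickly becomes too small to represent accurately.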
Linear regression, on the other hand, is fitted by least squares: its coefficients are chosen so that the sum of the squared residuals reaches its minimum value.
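For comparison, here is a minimal sketch of a least-squares fit for a line with one predictor, using the standard closed-form solution. The data points and function names are invented for illustration only.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

intercept, slope = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
# Any other line gives a larger sum of squared residuals.
worse = sum_squared_residuals(xs, ys, intercept, slope + 0.1)
print(intercept, slope, best, worse)
```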
How Does Logistic Regression Calculate the Probability of an Event Occurring?
First, the parameters are estimated by fitting the model to the data. Then the probability is calculated from the fitted equation using the known values of the independent variables.
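The two steps can be sketched in plain Python. The assumptions here are loud ones: the data set is made up, the function names are hypothetical, and the fitting method is simple gradient ascent on the log-likelihood with a hand-picked step size, which is only one of several ways to maximize it (statistical packages typically use faster methods).

```python
import math

# Made-up data for illustration: hours studied and exam result (1 = pass).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

def predict(b0, b1, x):
    """Probability of passing given by the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1):
    """Sum of the log-probabilities of the observed outcomes."""
    total = 0.0
    for x, y in zip(hours, passed):
        p = predict(b0, b1, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit(learning_rate=0.05, steps=5000):
    """Step 1: estimate the parameters by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood with respect to each parameter.
        grad_b0 = sum(y - predict(b0, b1, x) for x, y in zip(hours, passed))
        grad_b1 = sum((y - predict(b0, b1, x)) * x for x, y in zip(hours, passed))
        b0 += learning_rate * grad_b0
        b1 += learning_rate * grad_b1
    return b0, b1

b0, b1 = fit()
# Step 2: use the fitted equation with a known value of the predictor.
print(b0, b1, predict(b0, b1, 3.0))
```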
In general, the log(odds) predicted by the model are turned into a probability by applying this equation:
p=exp(log(odds))/(1+exp(log(odds)))
When plotted against the independent variables or features in our model, these probabilities form an S-shaped curve (the squiggle) that stays between 0 and 1.
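The conversion from log(odds) to probability, and its inverse, can be written as two small Python functions. The function names are hypothetical and the log(odds) values are arbitrary; the sketch only illustrates the equation above.

```python
import math

def log_odds_to_probability(log_odds):
    """p = exp(log(odds)) / (1 + exp(log(odds)))"""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))

def probability_to_log_odds(p):
    """Inverse conversion: log(odds) = log(p / (1 - p))"""
    return math.log(p / (1.0 - p))

# Arbitrary log(odds) values, from very unlikely to very likely.
log_odds_values = [-4.0, -2.0, 0.0, 2.0, 4.0]
probabilities = [log_odds_to_probability(v) for v in log_odds_values]

# The probabilities rise from near 0 to near 1 and pass through 0.5
# at log(odds) = 0, which is what gives the curve its S shape.
print(probabilities)
```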
Here is an example that shows how logistic regression is used to find the probability of passing an exam based on the number of hours studied. A logistic regression fitted to such data could be represented by a model such as this one:
Probability of passing=1/(1+exp(-(1.5046*hours-4.0777)))
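To see what this model predicts, the equation can be evaluated for a few values of hours. The coefficients are the ones given above; the function name and the chosen hours are only for illustration.

```python
import math

def probability_of_passing(hours):
    """Evaluate the example model for a given number of hours studied."""
    return 1.0 / (1.0 + math.exp(-(1.5046 * hours - 4.0777)))

for h in [1, 2, 3, 4, 5]:
    print(h, round(probability_of_passing(h), 2))

# Hours at which the log(odds) are zero, so passing and failing are equally likely.
break_even_hours = 4.0777 / 1.5046
print(break_even_hours)
```

With these coefficients, a student who studies 2 hours has a probability of passing of about 0.26, and one who studies 4 hours has a probability of about 0.87. The break-even point is at roughly 2.7 hours.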
The equation that represents the relationship between dependent and independent variables could have this form:
p=1/(1+b^(-(Bo+B1*x1+B2*x2)))
where:
p=probability of passing an exam
b=base of the logarithm used for the log(odds), usually e (the exponential function, exp), or sometimes 2 or 10
B1,B2=parameters
Bo=intercept (the value of the log(odds) when all predictors are 0)
x1,x2=predictors
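The general form with two predictors and a selectable base can be sketched as follows. The function name, the parameter values and the predictor values are all hypothetical; only the shape of the equation comes from the text above.

```python
import math

def logistic_probability(x1, x2, b0, b1, b2, base=math.e):
    """p = 1 / (1 + base^(-(b0 + b1*x1 + b2*x2)))"""
    linear_part = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + base ** (-linear_part))

# Hypothetical parameters and predictor values.
p_base_e = logistic_probability(x1=2.0, x2=1.0, b0=-3.0, b1=1.0, b2=0.5)
p_base_10 = logistic_probability(x1=2.0, x2=1.0, b0=-3.0, b1=1.0, b2=0.5, base=10)

# When the linear part is zero, the probability is 0.5 whatever the base.
p_zero = logistic_probability(x1=0.0, x2=0.0, b0=0.0, b1=1.0, b2=1.0, base=2)
print(p_base_e, p_base_10, p_zero)
```

Changing the base does not change the family of curves, only the scale of the parameters: the same curve can be written in any base by multiplying Bo, B1 and B2 by a constant.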
Such a model could be used, for example, to predict the probability that a person passes with a grade higher than 3.1.