05.logistic regression
-
Classification is to predict categorical variables.
-
Logistic regression is one of the basic ways to perform classification.
-
Logistic regression allows us to solve classification problems, where we are trying to predict discrete categories.
- The convention for binary classification is to have two classes 0 and 1.
The result of the sigmoid or logistic function is between 0 and 1.
Evaluation
We train a logistic regression model on training data and we evaluate the model's performance on testing data.
We can use confusion matrix to evaluate classification models.
What constitues the 'good' matrics, will really depends on the situation.
In some cases, accuracy is really important. In some cases, precision and recall are important.
Accuracy = (TP + TN) / Total Accuracy tells how often the model is correct?
Misclassification Rate (Error Rate) = (FP + FN) / Total Error rate tells how often the model is wrong?
Type I Error -- False Positive
Type II Error -- False Negative
- Binary Classification has some of its own classification metrics.
- These include visualizations of metrics from the confusion matrix.
- The Receiver Operator Curve (ROC) curve was developed during World War II to help analyze radar data.