Logistic Regression Cost Function
What is this about?
This section explains the cost function used in logistic regression, which is a method for predicting binary outcomes (e.g., yes/no, spam/not spam).
In logistic regression, we are trying to predict the probability that a given example belongs to one of two classes, based on its features (input data). The cost function helps us measure how well our model is performing.
Key Terms:
Predicted Output ($\hat{y}$):
- The model predicts the probability that a given input belongs to the positive class (usually class 1).
- This is calculated using the sigmoid function ($\sigma$), which outputs a probability between 0 and 1:
$$\hat{y} = \sigma(z)$$
Where $z = w^T x + b$ is the weighted sum of the features (input data).
True Output ($y$):
- This is the actual value from the data. For binary classification, $y$ is either 0 or 1.
Loss Function:
- The loss function calculates how far the predicted value ($\hat{y}$) is from the actual value ($y$).
- The loss function for logistic regression is log loss, also called binary cross-entropy. The formula (with $\log$ denoting the natural logarithm) is:
$$L(\hat{y}, y) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$$
- If $y = 1$ (the true class is 1), we want to maximize $\hat{y}$ (i.e., make the model confident in predicting 1). So, the loss is calculated as $-\log(\hat{y})$.
- If $y = 0$ (the true class is 0), we want to minimize $\hat{y}$ (i.e., make the model confident in predicting 0). So, the loss is calculated as $-\log(1 - \hat{y})$.
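As a minimal sketch of the log loss for a single example, the function below takes a predicted probability and a true label of 0 or 1. The function name `log_loss` and the clipping constant `eps` are illustrative assumptions: the clipping is a common numerical safeguard and is not part of the formula itself.

```python
import math

def log_loss(y_hat: float, y: int, eps: float = 1e-15) -> float:
    """Binary cross-entropy for one example, using the natural logarithm."""
    # Clip the prediction away from exactly 0 or 1 so log() is always defined.
    # eps is an assumed safeguard, not part of the mathematical definition.
    p = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

print(log_loss(0.9, 1))  # confident and right: about 0.105
print(log_loss(0.9, 0))  # confident and wrong: about 2.303
```

Because the label is either 0 or 1, only one of the two terms inside the brackets is ever non-zero, which is why the single formula reduces to the two cases described above.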
Cost Function:
- The cost function is the average loss over all the examples in the dataset. It tells us how well the model is performing over the entire dataset.
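- The cost function is:
$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$$
Where:
- $m$ is the total number of examples.
- $L(\hat{y}^{(i)}, y^{(i)})$ is the loss for each example $i$.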
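The averaging step can be sketched in a few lines of Python. The function names and the three predictions and labels below are made up for illustration, and the sketch assumes every predicted probability lies strictly between 0 and 1.

```python
import math

def log_loss(y_hat, y):
    # Per-example binary cross-entropy (natural log); assumes 0 < y_hat < 1.
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def cost(y_hats, ys):
    # Average the per-example losses over all m examples.
    m = len(ys)
    return sum(log_loss(p, y) for p, y in zip(y_hats, ys)) / m

# Hypothetical predictions and labels, for illustration only.
y_hats = [0.7, 0.4, 0.9]
ys = [1, 0, 1]
J = cost(y_hats, ys)
print(f"cost = {J:.4f}")
```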
Breaking it Down:
Predicted Output ($\hat{y}$) is calculated using the sigmoid function:
- The sigmoid function squashes any number into a value between 0 and 1, which is perfect for probabilities. It’s written as:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Where $z = w^T x + b$ is the weighted sum of the features.
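To make the squashing behaviour concrete, here is a small sketch of the sigmoid applied to a weighted sum for a one-feature model. The weight, bias, and feature values are hypothetical, and branching on the sign of the input is an assumed implementation choice that keeps `math.exp` from overflowing on very negative inputs.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical parameters and feature value (illustration only).
w, b = 0.5, -1.0
x = 4.0
z = w * x + b        # weighted sum: 0.5 * 4.0 - 1.0 = 1.0
y_hat = sigmoid(z)   # about 0.731
print(y_hat)
```

A weighted sum of 0 maps to exactly 0.5, large positive sums approach 1, and large negative sums approach 0.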
Loss Function:
When $y = 1$ (the true label is 1):
- We want $\hat{y}$ to be large (close to 1).
- The loss is $-\log(\hat{y})$. If $\hat{y}$ is close to 1, the loss will be small, meaning the prediction is correct.
When $y = 0$ (the true label is 0):
- We want $\hat{y}$ to be small (close to 0).
- The loss is $-\log(1 - \hat{y})$. If $\hat{y}$ is close to 0, the loss will be small, meaning the prediction is correct.
Cost Function:
- The cost function averages the loss over all examples. The goal is to minimize the cost, meaning we want to reduce the difference between the predicted values ($\hat{y}$) and the actual values ($y$).
Intuition:
- If the model predicts the correct outcome (e.g., predicts 1 for a positive example), the loss will be small.
- If the model predicts the wrong outcome (e.g., predicts 0 for a positive example), the loss will be large.
- The cost function is the average of these losses, and the goal of training is to minimize the cost, improving the model’s performance.
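A quick way to see this intuition is to hold the true label at 1 and let the predicted probability fall. In the sketch below the list of probabilities is chosen purely for illustration. The loss rises from about 0.01 at a prediction of 0.99 to about 4.6 at a prediction of 0.01.

```python
import math

# Loss for a positive example (true label 1) as the predicted probability falls.
predictions = [0.99, 0.9, 0.5, 0.1, 0.01]
losses = [-math.log(p) for p in predictions]

for p, loss in zip(predictions, losses):
    print(f"predicted = {p:<4}  loss = {loss:.4f}")
```

The growth is not linear: each step toward a confidently wrong prediction costs more than the last, which is what pushes the model away from confident mistakes.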
Summary:
- The cost function measures how well the model’s predictions match the actual outcomes.
- Logistic regression uses log loss (binary cross-entropy) to calculate how far off the predicted probabilities ($\hat{y}$) are from the actual labels ($y$).
- The cost function helps guide the optimization process (e.g., using gradient descent) to find the optimal model parameters ($w$ and $b$) that minimize the error.
- This is the mathematical foundation for how logistic regression learns from data and improves its predictions.
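To illustrate how minimizing the cost drives learning, here is a rough sketch of batch gradient descent on a one-feature model. The toy data, the learning rate, and the iteration count are all assumptions chosen for illustration. The updates use the standard gradients of the average log loss: the mean of (prediction minus label) times the feature for the weight, and the mean of (prediction minus label) for the bias.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, xs, ys):
    # Average log loss over the dataset for parameters w and b.
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

# Made-up toy data: a single feature value per example and a 0/1 label.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]

w, b = 0.0, 0.0          # start from zero parameters
learning_rate = 0.1      # assumed step size
initial_cost = cost(w, b, xs, ys)

for _ in range(1000):    # assumed number of iterations
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_cost = cost(w, b, xs, ys)
print(f"cost before: {initial_cost:.4f}, after: {final_cost:.4f}")
```

With both parameters at zero every prediction is 0.5, so the starting cost is the natural log of 2 (about 0.693). Each update then moves the parameters in the direction that lowers the cost.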
Example 1: Email Spam Detection (Binary Classification)
Imagine you are building a logistic regression model to predict whether an email is spam (1) or not spam (0).
Let’s assume the model has a single feature: the number of times the word "offer" appears in the email.
Step 1: Predicted Probability ($\hat{y}$)
For each email, the model will give a probability that the email is spam. For example, let’s say:
- For email 1, the model predicts the probability of being spam is 0.8 (80% chance).
- For email 2, the model predicts the probability of being spam is 0.2 (20% chance).
Step 2: True Labels ($y$)
Next, we look at the actual labels (what we know to be true about the emails):
- Email 1: True label is 1 (spam).
- Email 2: True label is 0 (not spam).
Step 3: Loss Function
Now, let’s calculate the loss for each email using the log loss formula:
For Email 1 (True label $y = 1$):
$$L = -\log(0.8) \approx 0.2231$$
So, the loss for email 1 is approximately 0.2231.
For Email 2 (True label $y = 0$):
$$L = -\log(1 - 0.2) = -\log(0.8) \approx 0.2231$$
So, the loss for email 2 is also 0.2231.
Step 4: Cost Function (Average Loss)
The cost function is the average loss across all examples. In this case, we have two emails, so we calculate the average of the two losses:
$$J = \frac{1}{2}(0.2231 + 0.2231) = 0.2231$$
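The arithmetic for this example can be checked with a short script. The variable names are illustrative; the probabilities and labels are the ones given for the two emails.

```python
import math

y_hats = [0.8, 0.2]   # predicted spam probabilities for email 1 and email 2
ys = [1, 0]           # true labels: spam, not spam

# Per-email log loss (natural log), then the average over both emails.
losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
          for p, y in zip(y_hats, ys)]
cost = sum(losses) / len(losses)

print([round(loss, 4) for loss in losses])  # [0.2231, 0.2231]
print(round(cost, 4))                       # 0.2231
```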
Example 2: A Clear Misclassification
Let’s consider another example where the model performs poorly.
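- Email 1: Predicted probability $\hat{y} = 0.9$ (predicted as spam with 90% certainty), but the true label is 0 (not spam).
- Email 2: Predicted probability $\hat{y} = 0.1$ (only a 10% chance of spam, i.e., predicted as not spam with 90% certainty), but the true label is 1 (spam).
For Email 1 (True label $y = 0$):
$$L = -\log(1 - 0.9) = -\log(0.1) \approx 2.3026$$
For Email 2 (True label $y = 1$):
$$L = -\log(0.1) \approx 2.3026$$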
Step 5: Cost Function (Average Loss)
The cost function for these two emails is:
$$J = \frac{1}{2}(2.3026 + 2.3026) = 2.3026$$
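As a check on the numbers, the sketch below computes the average log loss for this misclassified pair and compares it with the well-classified pair from Example 1. The helper name `average_log_loss` is an illustrative assumption.

```python
import math

def average_log_loss(y_hats, ys):
    # Mean binary cross-entropy (natural log); assumes 0 < y_hat < 1.
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for p, y in zip(y_hats, ys)]
    return sum(losses) / len(losses)

cost_good = average_log_loss([0.8, 0.2], [1, 0])  # Example 1: confident and right
cost_bad = average_log_loss([0.9, 0.1], [0, 1])   # Example 2: confident and wrong

print(round(cost_good, 4))  # 0.2231
print(round(cost_bad, 4))   # 2.3026
```

The cost for the misclassified pair is roughly ten times the cost for the well-classified pair.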
Key Takeaways:
- Log Loss penalizes the model when it is confident and wrong (e.g., predicting spam with 90% confidence when it's actually not spam).
- The cost function is the average of these losses, and the goal during training is to minimize the cost to improve the model's accuracy.
- In the first example, the model did well, with a relatively low cost (about 0.22). In the second example, the model made confident but incorrect predictions, giving a much higher cost (about 2.30).
This is how the logistic regression cost function works to guide the model towards making better predictions.