How to plot Predicted vs Actual Graphs and Residual Plots

Exploring the Iris dataset

Aug 17, 2023

Hey flower enthusiasts and data lovers! 🌺

In the previous article, we briefly explored techniques for evaluating whether your regression model is performing well or poorly.

Previous article: Knowing Your Regression Model: Good or Bad? (dooinnkim.medium.com)

Today we’ll explore the relationship between what a model predicts and what actually happened, using two incredible plots: Predicted vs Actual graphs and Residual plots.

A Predicted vs Actual plot is a scatter plot that helps you visualize the performance of a regression model. The x-axis represents the actual values, and the y-axis represents the predicted values. If the predictions were perfect, every point would lie on the diagonal line y = x (a straight line through the origin with a slope of 1).
A residual plot is another valuable diagnostic tool in regression analysis. It helps to visualize the residuals, which are the differences between the observed values and the values predicted by the model (observed minus predicted). The residuals are plotted against the predicted values or the independent variables. A good regression model will show residuals randomly scattered around the horizontal axis (y = 0). If there are patterns or trends in the plot, they may indicate that the model has not captured some underlying structure in the data.
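To make the definition of a residual concrete, here is a tiny sketch. The numbers are made up purely for illustration and are not taken from the Iris data.

```python
import numpy as np

# Hypothetical values, invented for illustration only
actual = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.5, 2.0, 2.5, 4.0])

# Residual = observed value minus predicted value
residuals = actual - predicted
print(residuals)
```

A negative residual means the model predicted too high, a positive one means it predicted too low, and zero means it was spot on.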

In this article, we will use the Iris dataset, which I would call “the Bouquet of the data world”.

The Iris dataset contains measurements of sepal and petal lengths and widths for three species of iris flowers. Since it is primarily used for classification tasks, we’ll take a slightly different approach and use one feature to predict another. Let’s see how we can use petal length to predict petal width.

Ready? Let’s bloom into action!

Step 1: Import Libraries and Load Data

from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split  # needed for the split in Step 2
import matplotlib.pyplot as plt

# Load the Iris data
iris = load_iris()
X = iris.data[:, 2].reshape(-1, 1) # Petal length
y = iris.data[:, 3]                # Petal width
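If you want to double-check that columns 2 and 3 really are petal length and petal width, a quick sketch like this prints each column index next to its name, using the `feature_names` attribute of the loaded dataset:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Print each column index next to its feature name
for index, name in enumerate(iris.feature_names):
    print(index, name)
```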

Step 2: Split the Data

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
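As a sanity check on the split, here is a small self-contained sketch that prints the shapes of the two sets. With 150 flowers and `test_size=0.2`, you should see 120 training rows and 30 testing rows.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, 2].reshape(-1, 1)  # Petal length
y = iris.data[:, 3]                 # Petal width

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 150 flowers in total: 80% go to training, 20% to testing
print(X_train.shape, X_test.shape)
```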

Step 3: Create and Train the Model

# Create a Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)
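Once the model is trained, you can peek at the line it learned. This is a minimal sketch, not part of the original walkthrough: the names `slope` and `intercept` are my own, read from the fitted model’s `coef_` and `intercept_` attributes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, 2].reshape(-1, 1)  # Petal length
y = iris.data[:, 3]                 # Petal width

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# One feature, so coef_ holds a single slope value
slope = model.coef_[0]
intercept = model.intercept_
print(f"petal width = {slope:.3f} * petal length + ({intercept:.3f})")
```

A positive slope here simply says that longer petals tend to be wider.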

Step 4: Make Predictions

# Make predictions
y_pred = model.predict(X_test)
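Before plotting, it can help to have a couple of numbers to compare the pictures against. The sketch below assumes you want mean squared error and R² as the summary metrics (that choice is mine, not something this article prescribes) and computes them with scikit-learn’s `mean_squared_error` and `r2_score`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, 2].reshape(-1, 1)  # Petal length
y = iris.data[:, 3]                 # Petal width

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Lower MSE is better; R^2 closer to 1 is better
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.4f}")
print(f"R^2: {r2:.4f}")
```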

Step 5: Plot Predicted vs Actual Graph

plt.scatter(y_test, y_pred)
plt.xlabel('Actual Petal Width')
plt.ylabel('Predicted Petal Width')
plt.title('Actual vs Predicted Petal Width')
plt.show()
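The plot above is easier to read with the “perfect prediction” line drawn in. Here is one possible sketch of that variation. The dashed red diagonal and the helper names `low` and `high` are my own additions, and the rest repeats the earlier steps so the snippet runs on its own.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, 2].reshape(-1, 1)  # Petal length
y = iris.data[:, 3]                 # Petal width

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

fig, ax = plt.subplots()
ax.scatter(y_test, y_pred)

# Reference line y = x: points on it are perfect predictions
low = min(y_test.min(), y_pred.min())
high = max(y_test.max(), y_pred.max())
ax.plot([low, high], [low, high], color='r', linestyle='--')

ax.set_xlabel('Actual Petal Width')
ax.set_ylabel('Predicted Petal Width')
ax.set_title('Actual vs Predicted Petal Width')
plt.show()
```

Points above the dashed line are over-predictions, and points below it are under-predictions.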

Step 6: Plot Residual Plot

residuals = y_test - y_pred
plt.scatter(y_pred, residuals)
plt.xlabel('Predicted Petal Width')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.axhline(y=0, color='r', linestyle='--')
plt.show()
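As mentioned earlier, residuals can also be plotted against the independent variable instead of the predicted values. Here is a minimal sketch of that version, with petal length on the x-axis. The axis labels and title are my own choices.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, 2].reshape(-1, 1)  # Petal length
y = iris.data[:, 3]                 # Petal width

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Residual = actual minus predicted
residuals = y_test - model.predict(X_test)

fig, ax = plt.subplots()
ax.scatter(X_test[:, 0], residuals)          # x-axis: the input feature
ax.axhline(y=0, color='r', linestyle='--')   # zero-error reference line
ax.set_xlabel('Petal Length')
ax.set_ylabel('Residuals')
ax.set_title('Residuals vs Petal Length')
plt.show()
```

If the points curve or fan out as petal length grows, that hints the straight-line model is missing some structure in the data.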

Here are the graphs for the Iris dataset, using petal length to predict petal width:

1. Actual vs Predicted Petal Width: This scatter plot compares the actual petal widths with the predicted petal widths. The closer the points are to the diagonal line y = x, where predicted equals actual, the better the model’s predictions.

2. Residual Plot: This plot shows the residuals (actual petal widths minus predicted petal widths) against the predicted values. A well-performing model will have residuals scattered randomly around zero (the red dashed line).

Let’s Wrap It Up 🎁

Predicted vs Actual graphs and Residual plots are like friendly guides in your data science journey. They help you understand what’s happening beneath those numbers and algorithms.

So the next time you’re lost in a sea of equations, remember these handy plots. They might just be the lighthouse guiding you to success!

Happy coding, and may your models always find their way! 🚀