Example of Linear Regression
Let's consider a simple machine learning scenario: predicting the price of a house from its size. We can use linear regression to model the relationship between house size and price.
To do this, we need a dataset of house sizes and their corresponding prices. Let's say we have 10 houses, with size measured in square feet and price measured in thousands of dollars:
Size (sq. ft.)    Price ($1000s)
--------------    --------------
1000              200
1200              230
1400              250
1600              275
1800              290
2000              315
2200              330
2400              360
2600              400
2800              450
We can use this data to train a linear regression model that can predict the price of a house based on its size. In linear regression, the goal is to find the line of best fit that minimizes the sum of the squared differences between the predicted values and the actual values.
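The slope and intercept that minimize this sum of squared differences have a simple closed-form solution, so as a sanity check we can compute them directly with NumPy (a minimal sketch; the variable names sizes and prices are just illustrative):
import numpy as np
# House sizes (sq. ft.) and prices ($1000s) from the table above
sizes = np.array([1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800], dtype=float)
prices = np.array([200, 230, 250, 275, 290, 315, 330, 360, 400, 450], dtype=float)
# Closed-form least-squares estimates:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
slope = np.sum((sizes - sizes.mean()) * (prices - prices.mean())) / np.sum((sizes - sizes.mean()) ** 2)
intercept = prices.mean() - slope * sizes.mean()
print(slope, intercept)  # approximately 0.1267 and 69.33
The scikit-learn code below should recover the same slope and intercept.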
To train the model, we can use a machine learning library such as scikit-learn in Python. Here's the code to train a linear regression model on the given dataset:
from sklearn.linear_model import LinearRegression
import numpy as np
# Load the data
X = np.array([1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800]).reshape(-1, 1)
y = np.array([200, 230, 250, 275, 290, 315, 330, 360, 400, 450]).reshape(-1, 1)
# Create the linear regression model and fit it to the data
reg = LinearRegression().fit(X, y)
# Print the coefficients of the linear regression model
print("Coefficients: ", reg.coef_)
print("Intercept: ", reg.intercept_)
The output of this code will be:
Coefficients: [[0.12666667]]
Intercept: [69.33333333]
This means that the line of best fit for the given dataset is:
Price = 0.12667 * Size + 69.3333
We can use this equation to predict the price of a house of any size. For example, if we want to predict the price of a house that is 2000 sq. ft., we can plug this value into the equation:
Price = 0.12667 * 2000 + 69.3333 ≈ 322.67 ($1000s)
So the predicted price for a house of 2000 sq. ft. is about $322,670.
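Equivalently, we can let the fitted model do this arithmetic via its predict method (a quick sketch that reuses the reg object trained above):
# Predict the price of a 2000 sq. ft. house with the fitted model
new_size = np.array([[2000]])            # 2D array: one sample, one feature
predicted_price = reg.predict(new_size)  # price in $1000s
print(predicted_price)                   # approximately [[322.67]], i.e. about $322,670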
To evaluate the performance of the model, we can use metrics such as mean squared error (MSE) and R-squared. These metrics give us an idea of how well the model is able to fit the data and make accurate predictions.
from sklearn.metrics import mean_squared_error, r2_score
# Predict the prices of the houses in the dataset
y_pred = reg.predict(X)
# Compute the mean squared error and R-squared
print("Mean squared error: %.2f" % mean_squared_error(y, y_pred))
print("R-squared: %.2f" % r2_score(y, y_pred))
The output of this code will be:
Mean squared error: 140.33
R-squared: 0.97
The R-squared value of 0.97 indicates that the model explains about 97% of the variance in the prices, which is quite good for a simple linear regression model. The mean squared error of 140.33 corresponds to a root mean squared error (RMSE) of about 11.85, meaning the model's predictions are typically off by roughly $11,850 (sqrt(140.33) * 1000).
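The RMSE can be computed directly from the quantities above; a short sketch continuing from the evaluation code:
# Root mean squared error, in the same units as the prices ($1000s)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print("RMSE: %.2f" % rmse)  # prints about 11.85, i.e. roughly $11,850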
In summary, linear regression is a simple but widely used machine learning algorithm that suits a variety of prediction tasks, including price prediction, sales forecasting, and demand estimation. By training a linear regression model on a dataset, we obtain a straightforward mathematical model that can make predictions on new data.