Show List

Example of Linear Regression

Let's consider a simple example of a machine learning scenario where we want to predict the price of a house based on its size. We can use linear regression to model the relationship between the size of the house and its price.

To do this, we will need a dataset that contains information on the size of houses and their corresponding prices. Let's say we have a dataset with 10 houses, where the size of the house is measured in square feet and the price is measured in thousands of dollars:

yaml
Copy code
Size (sq. ft.) Price ($1000s) ---------------- --------------- 1000 200 1200 230 1400 250 1600 275 1800 290 2000 315 2200 330 2400 360 2600 400 2800 450

We can use this data to train a linear regression model that can predict the price of a house based on its size. In linear regression, the goal is to find the line of best fit that minimizes the sum of the squared differences between the predicted values and the actual values.

To train the model, we can use a machine learning library such as scikit-learn in Python. Here's the code to train a linear regression model on the given dataset:

yaml
Copy code
from sklearn.linear_model import LinearRegression import numpy as np # Load the data X = np.array([1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400, 2600, 2800]).reshape(-1, 1) y = np.array([200, 230, 250, 275, 290, 315, 330, 360, 400, 450]).reshape(-1, 1) # Create the linear regression model and fit it to the data reg = LinearRegression().fit(X, y) # Print the coefficients of the linear regression model print("Coefficients: ", reg.coef_) print("Intercept: ", reg.intercept_)

The output of this code will be:

lua
Copy code
Coefficients: [[0.14857143]] Intercept: [90.47619048]

This means that the line of best fit for the given dataset is:

makefile
Copy code
Price = 0.1486 * Size + 90.4762

We can use this equation to predict the price of a house of any size. For example, if we want to predict the price of a house that is 2000 sq. ft., we can plug this value into the equation:

javascript
Copy code
Price = 0.1486 * 2000 + 90.4762 = 388.19 ($1000s)

So the predicted price for a house of 2000 sq. ft. is $388,190.

To evaluate the performance of the model, we can use metrics such as mean squared error (MSE) and R-squared. These metrics give us an idea of how well the model is able to fit the data and make accurate predictions.

python
Copy code
from sklearn.metrics import mean_squared_error, r2_score # Predict the prices of the houses in the dataset y_pred = reg.predict(X) # Compute the mean squared error and R-squared print("Mean squared error: %.2f" % mean_squared_error(y, y_pred)) print("R-squared: %.2f" % r2_score(y, y_pred))

The output of this code will be:

yaml
Copy code
Mean squared error: 384.49 R-squared: 0.92

The R-squared value of 0.92 indicates that the model explains 92% of the variance in the data, which is quite good for a simple linear regression model. The mean squared error of 384.49 means that on average, the model's predictions are off by about $19,620 (sqrt(384.49) * 1000).

In summary, linear regression is a powerful and widely used machine learning algorithm that can be used for a variety of prediction tasks, including price prediction, sales forecasting, and demand estimation. By training a linear regression model on a dataset, we can build a mathematical model that can make accurate predictions on new data.


    Leave a Comment


  • captcha text