Simple Linear Regression for absolute beginners

Linear Regression is the simplest and most basic model in machine learning. It may seem dull compared to advanced machine learning models, yet it is still a widely used statistical learning method. The importance of having a good understanding of linear regression before studying more complex methods cannot be overstated.
Definition
Linear Regression is a linear model that assumes a linear relationship between the input variable ($x$) and the output variable ($y$). Mathematically, this linear relationship can be represented as

$$y = \beta_0 + \beta_1 x \tag{Eq. 1}$$

In the above equation, $\beta_0$ and $\beta_1$ are two unknown constants that represent the intercept and slope terms of the linear model. Once we estimate the values of $\beta_0$ and $\beta_1$ using our training data, we can predict the output variable for new data by computing

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

where $\hat{y}$ indicates a prediction of $y$ on the basis of $x$. The hat symbol $\hat{\ }$ represents an estimated value of an unknown parameter or coefficient, or the predicted value of the response.
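As a concrete illustration (with purely hypothetical coefficient values), suppose our training data gave us $\hat{\beta}_0 = 2$ and $\hat{\beta}_1 = 0.5$. Then the prediction for a new input $x = 4$ would simply be

$$\hat{y} = 2 + 0.5 \times 4 = 4$$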
Estimating the coefficients
Before we make predictions, we must find the values of $\beta_0$ and $\beta_1$. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ represent $n$ observation pairs. The most common approach to estimating the coefficients is the ordinary least squares criterion, which minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation.
Let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ be the prediction for $y$ based on the $i$th value of $x$. Then $e_i = y_i - \hat{y}_i$ represents the residual, which is the difference between the actual value and the value predicted by our linear regression model. The residual sum of squares (RSS) can be defined as

$$RSS = e_1^2 + e_2^2 + \cdots + e_n^2$$

or equivalently as

$$RSS = \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$$

Our goal here is to choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the RSS. Using calculus, we can show that the minimizing values are

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{Eq. 2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \tag{Eq. 3}$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the sample means. Eq. 2 and Eq. 3 define the least squares coefficient estimates for simple linear regression.
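To see Eq. 2 and Eq. 3 in action, here is a minimal NumPy sketch that computes the least squares estimates from a toy dataset. The data values and the function name `estimate_coefficients` are illustrative choices, not part of the derivation above.

```python
import numpy as np

def estimate_coefficients(x, y):
    """Return the least squares estimates (beta_0_hat, beta_1_hat) from Eq. 2 and Eq. 3."""
    x_bar, y_bar = x.mean(), y.mean()   # sample means
    beta_1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # Eq. 2
    beta_0_hat = y_bar - beta_1_hat * x_bar                                    # Eq. 3
    return beta_0_hat, beta_1_hat

# Toy data roughly following y = 2 + 0.5x with a little noise (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.6, 3.1, 3.4, 4.1, 4.4])

beta_0_hat, beta_1_hat = estimate_coefficients(x, y)
print(f"intercept = {beta_0_hat:.3f}, slope = {beta_1_hat:.3f}")
```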
Cost function
The cost function is the average error over the samples in the data. It can be written as

$$J(\beta_0, \beta_1) = \frac{1}{2n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2$$

(the factor of 2 in the denominator is a common convention that simplifies the derivative). We can obtain the coefficients by minimizing the cost function. This can be done via:
- Closed form solution: differentiating the function and equating it to zero
- Iterative solution:
  - first order: Gradient Descent
  - second order: Newton’s Method
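Before walking through those two approaches, here is a minimal sketch of the cost function itself in NumPy; the toy data and the $\frac{1}{2n}$ convention are illustrative assumptions rather than fixed requirements.

```python
import numpy as np

def cost(beta_0, beta_1, x, y):
    """Average squared error of the line beta_0 + beta_1 * x, with the usual 1/2 factor."""
    y_hat = beta_0 + beta_1 * x               # predictions from the current coefficients
    return np.sum((y_hat - y) ** 2) / (2 * len(x))

# Same toy data as above (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.6, 3.1, 3.4, 4.1, 4.4])

print(cost(2.14, 0.46, x, y))   # small cost near the least squares estimates
print(cost(0.0, 0.0, x, y))     # much larger cost for a poor choice of coefficients
```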
Let’s understand both the solutions with a simple function, $f(x) = x^2$.
Closed form solution
Our approach here is to differentiate the function and equate it to zero. The derivative of $f(x) = x^2$ is $f'(x) = 2x$. Setting $f'(x) = 0$ gives $x = 0$, so for the function $f(x) = x^2$ the minimum is at $x = 0$.
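The same closed form result can be checked symbolically. This is just a quick sketch using SymPy, applied to the illustrative example function $f(x) = x^2$ from above.

```python
import sympy as sp

x = sp.symbols('x')
f = x ** 2                                      # the simple example function
critical_points = sp.solve(sp.diff(f, x), x)    # solve f'(x) = 0
print(critical_points)                          # [0] -> the minimum of f(x) = x**2 is at x = 0
```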
Gradient Descent
Gradient Descent is the most popular optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent.
In this approach, we start at a random initial point, say $x_0 = 1$, and repeatedly compute

$$x_{k+1} = x_k - \alpha f'(x_k)$$

Here $\alpha$ is the learning rate; consider $\alpha = 0.1$. Substituting, we get

$$x_1 = x_0 - \alpha f'(x_0) = 1 - 0.1 \times 2(1) = 0.8$$

Similarly, we can calculate $x_2$, $x_3$ as

$$x_2 = 0.8 - 0.1 \times 2(0.8) = 0.64$$

$$x_3 = 0.64 - 0.1 \times 2(0.64) = 0.512$$

For $k = 4$, $x_4 = 0.512 - 0.1 \times 2(0.512) = 0.4096$, and so on.

As we can see, the value is decreasing at every step. After a few iterations, we approach the minimum of the function, which is at $x = 0$.
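Here is a minimal gradient descent sketch for the same toy setup; the function $f(x) = x^2$, the starting point $x_0 = 1$, and the learning rate $\alpha = 0.1$ are the illustrative values used above, not fixed requirements of the algorithm.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=25):
    """Repeatedly step in the direction opposite the gradient."""
    x = x0
    for step in range(1, n_steps + 1):
        x = x - learning_rate * grad(x)    # x_{k+1} = x_k - alpha * f'(x_k)
        if step <= 4:
            print(f"x_{step} = {x:.4f}")   # matches the hand-computed iterates above
    return x

# f(x) = x**2, so f'(x) = 2x; the minimum is at x = 0
x_min = gradient_descent(grad=lambda x: 2 * x, x0=1.0)
print(f"approximate minimum after 25 steps: {x_min:.6f}")
```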
Predictions and more
Once we have the estimates of the coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$, we can substitute them into Eq. 1 and predict the target variable for any given $x$. Assuming that we have estimated the coefficients, we compute the predicted values $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ for the inputs of interest.
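Putting the pieces together, this sketch fits the toy data with the closed form estimates and then makes a prediction for a new $x$; the comparison against scikit-learn's `LinearRegression` is an optional sanity check, and all data values are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.6, 3.1, 3.4, 4.1, 4.4])

# Closed form estimates (Eq. 2 and Eq. 3)
x_bar, y_bar = x.mean(), y.mean()
beta_1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta_0_hat = y_bar - beta_1_hat * x_bar

# Predict the target for a new input using Eq. 1
x_new = 6.0
print(beta_0_hat + beta_1_hat * x_new)

# Sanity check against scikit-learn's implementation
model = LinearRegression().fit(x.reshape(-1, 1), y)
print(model.intercept_ + model.coef_[0] * x_new)
```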
Looks very simple, doesn’t it?
While basic linear regression involves deriving coefficients, substituting them into a straight-line equation, and using it to predict a target variable, the process becomes more complex when more parameters are involved. Specifically, when we work with a single input variable, it is known as Simple Linear Regression, whereas with multiple input variables we refer to it as Multiple Linear Regression.
My upcoming post will provide a comprehensive analysis of Multiple Linear Regression, including an exploration of the challenges that arise with an increased number of parameters and strategies for effectively managing them.