Gradient Descent Cost Function
Unit 8
Simple Gradient Descent Cost Function Exercise
In this gradient descent exercise, we examine how two key hyperparameters, the number of iterations and the learning rate, affect a simple gradient descent optimization process. Specifically, we fit a linear model to a set of 2-dimensional points, using gradient descent to minimize the cost function.
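Concretely, for a linear model y = m*x + b fitted to n points, the cost minimized in the code below is the mean squared error, J(m, b) = (1/n) * Σ (y_i - (m*x_i + b))^2. Each iteration moves the parameters against the gradient, m ← m - α * ∂J/∂m and b ← b - α * ∂J/∂b, where α is the learning rate, ∂J/∂m = -(2/n) * Σ x_i * (y_i - (m*x_i + b)), and ∂J/∂b = -(2/n) * Σ (y_i - (m*x_i + b)).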
Parameter Observations
Iterations
The number of iterations determines how many times the algorithm updates the model parameters (slope and intercept). Increasing the number of iterations allows the gradient descent algorithm more time to converge towards the global minimum of the cost function. With insufficient iterations, the algorithm may terminate prematurely, leading to an under-fitted model that does not adequately capture the underlying data pattern.
However, excessive iterations may result in redundant computations, particularly when the model parameters have already converged to near-optimal values. Therefore, it is crucial to choose an appropriate number of iterations to balance computational efficiency and model performance.
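As a rough illustration (a sketch, not part of the exercise code), the snippet below reruns the same update rule for a few different iteration counts and prints the final slope, intercept, and cost; run_gd is a hypothetical helper that mirrors the gradient descent loop in the Code section, with the iteration count exposed as an argument.
import numpy as np

def run_gd(x, y, iterations, learning_rate=0.08):
    # Same update rule as the exercise code, with the iteration count exposed
    m = b = 0.0
    n = len(x)
    for _ in range(iterations):
        y_predicted = m * x + b
        m -= learning_rate * (-(2/n) * sum(x * (y - y_predicted)))
        b -= learning_rate * (-(2/n) * sum(y - y_predicted))
    cost = (1/n) * sum((y - (m * x + b)) ** 2)
    return m, b, cost

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
for iters in (10, 100, 1000):
    m, b, cost = run_gd(x, y, iters)
    print(f"iterations={iters:5d}  m={m:.3f}  b={b:.3f}  cost={cost:.6f}")
Too few iterations leave the cost visibly above its minimum, while beyond a certain point extra iterations change the parameters only marginally.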
Learning Rate
The learning rate is a crucial hyperparameter that dictates the size of the steps taken towards minimizing the cost function during each iteration. A small learning rate ensures that updates to the model parameters are gradual and steady, promoting a stable convergence towards the global minimum. However, this may require a large number of iterations, potentially prolonging the optimization process.
Conversely, a large learning rate accelerates convergence by taking larger steps, reducing the number of iterations needed. However, this approach risks overshooting the minimum or causing oscillations around it, which can lead to divergence or suboptimal convergence. Properly tuning the learning rate is vital to achieving a balance between speed and stability.
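A quick way to see this trade-off (again a sketch rather than part of the exercise code) is to run the same loop with a small, a moderate, and a deliberately large learning rate and compare the resulting cost; for this data, an overly large rate such as 0.2 overshoots the minimum and the cost grows rapidly instead of shrinking.
import numpy as np

def final_cost(x, y, learning_rate, iterations=50):
    # Same update rule as the exercise code, with the learning rate exposed
    m = b = 0.0
    n = len(x)
    for _ in range(iterations):
        y_predicted = m * x + b
        m -= learning_rate * (-(2/n) * sum(x * (y - y_predicted)))
        b -= learning_rate * (-(2/n) * sum(y - y_predicted))
    return (1/n) * sum((y - (m * x + b)) ** 2)

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
for lr in (0.01, 0.08, 0.2):
    # Small rate: slow but steady; moderate rate: fast; large rate: divergence
    print(f"learning_rate={lr:<5}  cost after 50 iterations = {final_cost(x, y, lr):.6g}")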
Conclusion
Achieving an optimal balance between the number of iterations and the learning rate is critical for effective gradient descent optimization. Insufficient iterations or an excessively large learning rate can hinder convergence, while too many iterations or an overly small learning rate may lead to inefficient optimization. Careful calibration of these parameters is essential for ensuring both the accuracy and efficiency of the gradient descent algorithm.
Code
import numpy as np
import matplotlib.pyplot as plt
def gradient_descent(x, y):
    m_curr = b_curr = 0
    iterations = 100
    n = len(x)
    learning_rate = 0.08
    m_history = []
    b_history = []
    cost_history = []
    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        # Mean squared error for the current parameters
        cost = (1/n) * sum((y - y_predicted) ** 2)
        # Partial derivatives of the cost with respect to m and b
        md = -(2/n) * sum(x * (y - y_predicted))
        bd = -(2/n) * sum(y - y_predicted)
        # Step against the gradient, scaled by the learning rate
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        # Store history for plotting
        m_history.append(m_curr)
        b_history.append(b_curr)
        cost_history.append(cost)
    return m_curr, b_curr, m_history, b_history, cost_history
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
m_final, b_final, m_history, b_history, cost_history = gradient_descent(x, y)
plt.figure(figsize=(14, 5))
plt.subplot(1, 2, 1)
plt.scatter(x, y, color='blue', label='Data points')
plt.plot(x, m_final * x + b_final, color='red', label=f'Fitted line: y = {m_final:.2f}x + {b_final:.2f}')
plt.title('Data and Fitted Line')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(range(len(cost_history)), cost_history, color='green')
plt.title('Cost Function over Iterations')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.tight_layout()
plt.show()
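Since the sample data lies exactly on the line y = 2x + 3, the fitted slope and intercept should approach 2 and 3 as the number of iterations grows, and the cost curve in the second subplot should decay toward zero.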