Colab Setup
%config InlineBackend.figure_format='retina'

This sets the output resolution for matplotlib and similar libraries to retina.
Noise
n_data = 10000
x_numpy = -3+6*np.random.rand(n_data,1)
# Noise-free version of the target, kept for reference:
# y_numpy = np.exp(-(x_numpy**2))*np.cos(10*x_numpy)
y_numpy = np.exp(-(x_numpy**2))*np.cos(10*x_numpy) + 3e-2*np.random.randn(n_data,1)
plt.figure(figsize=(8,5))
plt.plot(x_numpy,y_numpy,'r.',ms=2)
plt.show()
x_torch = torch.Tensor(x_numpy).to(device)
y_torch = torch.Tensor(y_numpy).to(device)
print("Done.")

The graph above shows the originally intended function. Let's add noise to it. As the code above shows, the noise is the output of np.random.randn() (standard normal samples) multiplied by a small real value, 3e-2, and added to the function values.
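To see how small this noise is, here is a minimal standalone sketch. It uses the same function and scale as above. The fixed seed and the separate `clean`, `noise`, and `noisy` arrays are additions for illustration only.

```python
import numpy as np

np.random.seed(0)  # fixed seed (an assumption, only to make the run repeatable)

n_data = 10000
x = -3 + 6 * np.random.rand(n_data, 1)       # inputs drawn uniformly from [-3, 3)

clean = np.exp(-(x**2)) * np.cos(10 * x)     # noise-free target function
noise = 3e-2 * np.random.randn(n_data, 1)    # Gaussian noise scaled by 3e-2
noisy = clean + noise                        # values the model is trained on

print("noise standard deviation:", noise.std())
print("largest deviation from the clean curve:", np.abs(noisy - clean).max())
```

Because np.random.randn() has unit standard deviation, scaling it by 3e-2 gives noise with a standard deviation of about 0.03, which is small compared with the function's peak value of 1.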
Optimizer Comparison
  
Graphs showing how well each optimizer’s model approximates the function at epochs 500, 3500, and 9999. GT is the target function to approximate.
Adam already approximates the function at the earliest snapshot, so it converges very fast. SGD and Momentum look similar to each other.
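A comparison like this could be set up roughly as follows. This is only a sketch: the network in `make_model`, the learning rates, the momentum value, the reduced data size, and the number of steps are all assumptions chosen to keep the run short, not the settings behind the graphs above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Noisy samples of the target function (fewer points than the post, for speed)
n_data = 1000
x = -3 + 6 * torch.rand(n_data, 1)
y = torch.exp(-x**2) * torch.cos(10 * x) + 3e-2 * torch.randn(n_data, 1)

def make_model():
    # Hypothetical small MLP; the architecture is an assumption
    return nn.Sequential(
        nn.Linear(1, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )

# One factory per optimizer; hyperparameters are assumed values
optimizer_factories = {
    "SGD": lambda params: torch.optim.SGD(params, lr=1e-2),
    "Momentum": lambda params: torch.optim.SGD(params, lr=1e-2, momentum=0.9),
    "Adam": lambda params: torch.optim.Adam(params, lr=1e-2),
}

loss_fn = nn.MSELoss()
final_losses = {}
for name, factory in optimizer_factories.items():
    torch.manual_seed(1)              # same initial weights for every optimizer
    model = make_model()
    optimizer = factory(model.parameters())
    for step in range(200):           # full-batch training steps
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    final_losses[name] = loss.item()  # loss measured at the last training step

for name, value in final_losses.items():
    print(f"{name:>8}: MSE at last step = {value:.4f}")
```

Resetting the seed before building each model gives every optimizer identical initial weights, so differences in the loss come from the update rule rather than from the initialization.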