use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
device
device(type='cuda')

Linear regression model in PyTorch

Datasets and Dataloaders

We'll create a dataset that contains $(x,y)$ pairs sampled from the linear function $y = ax + b + \epsilon$. To do this, we'll create a PyTorch TensorDataset.

A PyTorch tensor is nearly the same thing as a NumPy array. The vast majority of methods and operators supported by NumPy are also supported by PyTorch, but PyTorch tensors have additional capabilities. One major capability is that these structures can live on the GPU, in which case their computation will be optimized for the GPU and can run much faster (given lots of values to work on). In addition, PyTorch can automatically calculate derivatives of these operations, including combinations of operations. These two things are critical for deep learning.
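
As a minimal illustration of both points, a tensor can be created much like a NumPy array, moved to the device we defined above, and asked to track gradients:

import numpy as np

a_np = np.array([1., 2., 3.])
a_pt = torch.tensor([1., 2., 3.])
(a_np * 2).sum(), (a_pt * 2).sum()          # the same operations work on both
a_gpu = a_pt.to(device)                     # the tensor lives on the GPU when one is available
a_tr = torch.tensor([1., 2., 3.], requires_grad=True)  # gradients will be tracked for this one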

linear_function_dataset[source]

linear_function_dataset(a, b, n=100, show_plot=False)

Creates a PyTorch TensorDataset with n random samples of the linear function y = a*x + b. show_plot decides whether or not to plot the dataset.
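
The implementation is hidden behind [source]; as a rough sketch (the input distribution and noise scale are assumptions, not necessarily what the hidden source does), such a function could look like this:

import matplotlib.pyplot as plt
from torch.utils.data import TensorDataset

def linear_function_dataset(a, b, n=100, show_plot=False):
    "Sketch: n noisy samples of y = a*x + b wrapped in a TensorDataset"
    x = torch.randn(n, 1)                      # random inputs
    y = a*x + b + 0.1*torch.randn(n, 1)        # linear function plus noise (epsilon)
    if show_plot:
        plt.scatter(x, y, marker='.')
    return TensorDataset(x, y)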

a = 2
b = 3
n = 100
data = linear_function_dataset(a, b, n, show_plot=True)
test_eq(type(data), TensorDataset)

In every machine/deep learning experiment, we need to have at least two datasets:

  • training: used to train the model
  • validation: used to validate the model after each training step. It allows us to detect overfitting and to adjust the hyperparameters of the model properly
train_ds = linear_function_dataset(a, b, n=100, show_plot=True)
valid_ds = linear_function_dataset(a, b, n=20, show_plot=True)

A dataloader combines a dataset and a sampler that groups the data into batches, and provides an iterable over the given dataset.

bs = 10
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_dl = torch.utils.data.DataLoader(valid_ds, batch_size=bs, shuffle=False)
for i, data in enumerate(train_dl, 1):
    x, y = data
    print(f'batch {i}: x={x.shape} ({x.device}), y={y.shape} ({y.device})')
batch 1: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 2: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 3: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 4: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 5: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 6: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 7: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 8: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 9: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 10: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)

Defining a linear regression model in PyTorch

The class torch.nn.Module is the base structure for all models in PyTorch. It mostly helps to register all the trainable parameters. A module is an object of a class that inherits from the PyTorch nn.Module class.

To implement an nn.Module you just need to:

  • Make sure the superclass init is called first when you initialize it.
  • Define any parameters of the model as attributes with nn.Parameter. To tell Module that we want to treat a tensor as a parameter, we have to wrap it in the nn.Parameter class. All PyTorch modules use nn.Parameter for any trainable parameters. This class doesn't actually add any functionality (other than automatically setting requires_grad); it's only used as a "marker" to show what to include in parameters.
  • Define a forward function that returns the output of your model (see the sketch below).
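
Following these three steps, and given that the parameters printed later are named a and b, a plausible sketch of LinRegModel (whose actual implementation is behind the [source] link below) is:

import torch
import torch.nn as nn

class LinRegModel(nn.Module):
    "Sketch: linear regression y = a*x + b as an nn.Module"
    def __init__(self):
        super().__init__()                      # 1. call the superclass init first
        self.a = nn.Parameter(torch.randn(1))   # 2. trainable parameters wrapped in nn.Parameter
        self.b = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return self.a*x + self.b                # 3. the output of the model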

class LinRegModel[source]

LinRegModel() :: Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing you to nest them in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

training (bool): whether this module is in training or evaluation mode.

model = LinRegModel()
pa, pb = model.parameters()
pa, pa.shape, pb, pb.shape
(Parameter containing:
 tensor([-0.1623], requires_grad=True),
 torch.Size([1]),
 Parameter containing:
 tensor([-0.3689], requires_grad=True),
 torch.Size([1]))

Objects of this class behave identically to standard Python functions, in that you can call them using parentheses and they will return the activations of a model.

x = torch.randn(10, 1)
out = model(x)
x, x.shape, out, out.shape 
(tensor([[ 1.6960],
         [ 0.3434],
         [-0.5049],
         [-0.3151],
         [ 2.5839],
         [ 0.8094],
         [-0.8670],
         [-0.4008],
         [-1.6220],
         [ 1.1903]]),
 torch.Size([10, 1]),
 tensor([[-0.6443],
         [-0.4247],
         [-0.2870],
         [-0.3178],
         [-0.7884],
         [-0.5003],
         [-0.2282],
         [-0.3039],
         [-0.1056],
         [-0.5622]], grad_fn=<AddBackward0>),
 torch.Size([10, 1]))

Loss function and optimizer

The loss is the measure of performance that the machine uses to decide how to update the model parameters. For a regression problem the choice is simple: we'll just use the Mean Squared Error (MSE).

loss_func = nn.MSELoss()
loss_func(x, out)
tensor(2.4990, grad_fn=<MseLossBackward>)
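
nn.MSELoss with its default reduction simply averages the squared differences, $\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$, which we can check by computing it by hand:

((x - out)**2).mean()   # same value as loss_func(x, out) above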

We have data, a model, and a loss function; we only need one more thing before we can fit the model, and that's an optimizer.

opt_func = torch.optim.SGD(model.parameters(), lr = 1e-3)

Training loop

During training, we need to push our model and our batches to the GPU. Calling cuda() on a model or a tensor (or, as below, to(device)) puts all of its parameters and data on the GPU:

model = model.to(device)

To train a model, we will need to compute all the gradients of a given loss with respect to its parameters, which is known as the backward pass. The forward pass is where we compute the output of the model on a given input, based on the matrix products. PyTorch computes all the gradients we need with a magic call to loss.backward. The backward pass is the chain rule applied multiple times, computing the gradients from the output of our model and going back, one layer at a time.

In PyTorch, each basic function we need to differentiate is written as a torch.autograd.Function object that has a forward and a backward method. PyTorch then keeps track of every computation we do, so that it can properly run the backward pass, unless we set the requires_grad attribute of our tensors to False.
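
For example, gradients of a small composition of operations can be obtained with a single call to backward:

w = torch.tensor([3.], requires_grad=True)
z = (w**2 + 2*w).sum()   # forward pass: z = w^2 + 2w
z.backward()             # backward pass: applies the chain rule
w.grad                   # dz/dw = 2w + 2 = tensor([8.])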

For minibatch gradient descent (the usual way of training in deep learning), we calculate gradients on batches. Before moving on to the next batch, we modify our model's parameters based on the gradients. For each iteration through our dataset (which is called an epoch), the optimizer performs as many updates as we have batches.
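
For example, with the 100 training samples and batch size of 10 defined above, each epoch consists of 10 batches, so the optimizer performs 10 parameter updates per epoch.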

There are two important methods in a PyTorch optimizer:

  • zero_grad: In PyTorch, we need to set the gradients to zero before starting backpropagation, because PyTorch accumulates the gradients on subsequent backward passes. zero_grad just loops through the parameters of the model and sets their gradients to zero. It also calls detach_, which removes any history of gradient computation, since it won't be needed after zero_grad.
  • step: performs a single optimization step, i.e. it updates the parameters based on the gradients computed during the backward pass.
n_epochs = 10

train[source]

train(model, device, train_dl, loss_func, opt_func, epoch_idx)

Train model for one epoch, whose index is given in epoch_idx. The training loop will iterate through all the batches of train_dl, using the loss function given in loss_func and the optimizer given in opt_func.
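
The actual implementation is behind [source]; a minimal sketch of what train could look like, assuming it reports the average batch loss at the end of the epoch (the exact print format is an assumption), is:

def train(model, device, train_dl, loss_func, opt_func, epoch_idx):
    "Sketch: train model for one epoch over train_dl"
    model.train()                          # put the model in training mode
    total_loss = 0.
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)  # push the batch to the device
        opt_func.zero_grad()               # reset the gradients
        loss = loss_func(model(x), y)      # forward pass + loss
        loss.backward()                    # backward pass (gradients)
        opt_func.step()                    # update the parameters
        total_loss += loss.item()
    print(f'Train loss [Epoch {epoch_idx}]: {total_loss/len(train_dl):.2f}')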

for epoch in range(1, n_epochs+1):
    train(model, device, train_dl, loss_func, opt_func, epoch)
Train loss [Epoch 1]:  15.59)
Train loss [Epoch 2]:  14.98)
Train loss [Epoch 3]:  14.40)
Train loss [Epoch 4]:  13.84)
Train loss [Epoch 5]:  13.30)
Train loss [Epoch 6]:  12.78)
Train loss [Epoch 7]:  12.28)
Train loss [Epoch 8]:  11.81)
Train loss [Epoch 9]:  11.34)
Train loss [Epoch 10]:  10.90)

We can see how the parameters of the regression model are getting closer to the true values of a and b from the linear function.

L(model.named_parameters())
(#2) [('a', Parameter containing:
tensor([0.2301], device='cuda:0', requires_grad=True)),('b', Parameter containing:
tensor([0.2354], device='cuda:0', requires_grad=True))]

Validating the model

Validating the model requires only a forward pass; it's just inference. Disabling gradient calculation with torch.no_grad() is useful for inference, when you are sure that you will not call Tensor.backward().

validate[source]

validate(model, device, dl)
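
As with train, the body is behind [source]; a sketch, assuming it averages an MSE loss over the batches of dl (the loss function and the print format are assumptions), is:

def validate(model, device, dl):
    "Sketch: average loss of model over dl, without tracking gradients"
    model.eval()                           # put the model in evaluation mode
    total_loss = 0.
    with torch.no_grad():                  # no gradients needed for inference
        for x, y in dl:
            x, y = x.to(device), y.to(device)
            total_loss += nn.MSELoss()(model(x), y).item()
    print(f'Valid loss: {total_loss/len(dl):.2f}')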

validate(model, device, valid_dl)
Valid loss:  11.92

In order to spot overfitting, it is useful to validate the model after each training epoch.

for epoch in range(1, n_epochs +1):
    train(model, device, train_dl, loss_func, opt_func, epoch)
    validate(model, device, valid_dl)
Train loss [Epoch 1]:  10.48)
Valid loss:  11.46
Train loss [Epoch 2]:  10.07)
Valid loss:  11.01
Train loss [Epoch 3]:  9.68)
Valid loss:  10.58
Train loss [Epoch 4]:  9.30)
Valid loss:  10.16
Train loss [Epoch 5]:  8.94)
Valid loss:  9.77
Train loss [Epoch 6]:  8.59)
Valid loss:  9.38
Train loss [Epoch 7]:  8.26)
Valid loss:  9.02
Train loss [Epoch 8]:  7.93)
Valid loss:  8.66
Train loss [Epoch 9]:  7.62)
Valid loss:  8.32
Train loss [Epoch 10]:  7.33)
Valid loss:  8.00

Abstracting the manual training loop: moving from PyTorch to fastai

from fastai.basics import *
from fastai.callback.progress import ProgressCallback

We can entirely replace the custom training loop with fastai's. That means you can get rid of train(), validate(), and the epoch loop in the original code, and replace it all with a couple of lines.

fastai's training loop lives in a Learner. The Learner is the glue that merges everything together (Datasets, DataLoaders, model and optimizer) and enables training by just calling a fit method.

fastai's Learner expects a DataLoaders object rather than a single DataLoader, so let's make that. We could just do dls = DataLoaders(train_dl, valid_dl) to keep the PyTorch DataLoaders. However, by using fastai DataLoaders instead, created directly from the TensorDataset objects, we get some automation, such as automatically pushing the data to the GPU.

dls = DataLoaders.from_dsets(train_ds, valid_ds, bs=10)
learn = Learner(dls, model=LinRegModel(), loss_func=nn.MSELoss(), opt_func=SGD)

Now we have everything needed to do a basic fit:

learn.fit(10, lr=1e-3)
epoch train_loss valid_loss time
0 6.368925 6.652071 00:00
1 6.233027 6.391272 00:00
2 6.081585 6.140684 00:00
3 5.927457 5.899893 00:00
4 5.789435 5.668490 00:00
5 5.640563 5.446169 00:00
6 5.487903 5.232544 00:00
7 5.326789 5.027249 00:00
8 5.176148 4.830004 00:00
9 5.014451 4.640473 00:00

Having a Learner allows us to easily gather the model predictions for the validation set, which we can use for visualisation and analysis.

inputs, preds, outputs = learn.get_preds(with_input=True)
inputs.shape, preds.shape, outputs.shape
(torch.Size([20, 1]), torch.Size([20, 1]), torch.Size([20, 1]))
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')

Building a simple neural network

For the next example, we will create the dataset by sampling values from the non-linear function $y(x) = -\frac{1}{100}x^7 - x^4 - 2x^2 - 4x + 1$.

config = dict(
    ntrain = 1000,
    nvalid = 200,
    nh1 = 200,
    nh2 = 100,
    nh3 = 50
)
run = wandb.init(config=config, project="TS-III")
wandb: Currently logged in as: vrodriguezf (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.10.22 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
Tracking run with wandb version 0.10.21
Syncing run cerulean-sky-3 to Weights & Biases (Documentation).
Project page: https://wandb.ai/vrodriguezf/TS-III
Run page: https://wandb.ai/vrodriguezf/TS-III/runs/3fr7wgq3
Run data is saved locally in /home/victor/work/walk-with-deep-learning/nbs/wandb/run-20210310_164133-3fr7wgq3

nonlinear_function_dataset[source]

nonlinear_function_dataset(n=100, show_plot=False)

Creates a PyTorch TensorDataset with n random samples of the nonlinear function y = (-1/100)*x^7 - x^4 - 2*x^2 - 4*x + 1 with a bit of noise. show_plot decides whether or not to plot the dataset.
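
Again, a sketch of what the hidden implementation might look like (the input range and noise scale are assumptions):

def nonlinear_function_dataset(n=100, show_plot=False):
    "Sketch: n noisy samples of y = (-1/100)*x^7 - x^4 - 2*x^2 - 4*x + 1"
    x = (torch.rand(n, 1) - 0.5)*4                                       # assumed input range
    y = (-1/100)*x**7 - x**4 - 2*x**2 - 4*x + 1 + 0.1*torch.randn(n, 1)  # assumed noise scale
    if show_plot:
        plt.scatter(x, y, marker='.')
    return TensorDataset(x, y)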

n = 100
ds = nonlinear_function_dataset(n, show_plot=True)
x, y = ds.tensors
test_eq(x.shape, y.shape)

We will create the training and validation datasets, and build the DataLoaders from them, this time directly in fastai, using the DataLoaders.from_dsets method.

train_ds = nonlinear_function_dataset(n=1000)
valid_ds = nonlinear_function_dataset(n=200)

Normalization in deep learning is used to make optimization easier by smoothing the loss surface of the network. We will normalize the data based on the mean and standard deviation of the training dataset.

norm_mean = train_ds.tensors[1].mean()
norm_std = train_ds.tensors[1].std()
train_ds_norm = TensorDataset(train_ds.tensors[0],
                              (train_ds.tensors[1] - norm_mean)/norm_std)
valid_ds_norm = TensorDataset(valid_ds.tensors[0],
                              (valid_ds.tensors[1] - norm_mean)/norm_std)
dls = DataLoaders.from_dsets(train_ds_norm, valid_ds_norm, bs = 32)

We will build a Multi Layer Perceptron with 3 hidden layers. These networks are also known as Feed-Forward Neural Networks. The layers of this type of network are known as Fully Connected Layers because, between every subsequent pair of layers, all the neurons are connected to each other.

Neural network architecture

The easiest way of wrapping several layers in PyTorch is using the nn.Sequential module. It creates a module with a forward method that will call each of the listed layers or functions in turn, without us having to write the loop manually in the forward pass.

class MLP3[source]

MLP3(n_in=1, nh1=200, nh2=100, nh3=50, n_out=1) :: Module

Multilayer perceptron with 3 hidden layers, with sizes nh1, nh2 and nh3 respectively.
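
The implementation is hidden behind [source]; given the description of nn.Sequential above, a plausible sketch (the choice of ReLU as the activation function is an assumption) is:

class MLP3(nn.Module):
    "Sketch: multilayer perceptron with 3 hidden layers of sizes nh1, nh2 and nh3"
    def __init__(self, n_in=1, nh1=200, nh2=100, nh3=50, n_out=1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_in, nh1), nn.ReLU(),
            nn.Linear(nh1, nh2), nn.ReLU(),
            nn.Linear(nh2, nh3), nn.ReLU(),
            nn.Linear(nh3, n_out)
        )

    def forward(self, x):
        return self.layers(x)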

x, y = dls.one_batch()
model = MLP3()
output = model(x)
output.shape
torch.Size([32, 1])

In order to have automatic logging of training parameters with Weights & Biases, we add the WandbCallback callback when creating the Learner. This process would be similar in any other framework we use to train a neural network (e.g., Keras).

from fastai.callback.wandb import *
learn = Learner(dls, MLP3(), loss_func=nn.MSELoss(), opt_func=Adam, 
                cbs=[WandbCallback()])
learn.fit(10, lr=1e-3)
WandbCallback requires use of "SaveModelCallback" to log best model
WandbCallback was not able to prepare a DataLoader for logging prediction samples -> 'TensorDataset' object has no attribute 'items'
epoch train_loss valid_loss time
0 0.430214 0.358961 00:00
1 0.389660 0.382922 00:00
2 0.340610 0.254645 00:00
3 0.283868 0.225372 00:00
4 0.240199 0.194012 00:00
5 0.205221 0.199028 00:00
6 0.172748 0.100761 00:00
7 0.155810 0.199445 00:00
8 0.137792 0.080331 00:00
9 0.104510 0.063056 00:00
inputs, preds, outputs = learn.get_preds(with_input = True)
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')

Once we have finished training our model of interest, we can close the experiment tracker (Weights & Biases).

run.finish()

Waiting for W&B process to finish, PID 3043
Program ended successfully.
Find user logs for this run at: /home/victor/work/walk-with-deep-learning/nbs/wandb/run-20210310_164133-3fr7wgq3/logs/debug.log
Find internal logs for this run at: /home/victor/work/walk-with-deep-learning/nbs/wandb/run-20210310_164133-3fr7wgq3/logs/debug-internal.log

Run summary:

epoch: 10
train_loss: 0.10451
raw_loss: 0.01843
wd_0: 0.01
sqr_mom_0: 0.99
lr_0: 0.001
mom_0: 0.9
eps_0: 1e-05
_runtime: 10
_timestamp: 1615394503
_step: 309
valid_loss: 0.06306

Run history: W&B sparkline plots for epoch, train_loss, raw_loss, wd_0, sqr_mom_0, lr_0, mom_0, eps_0, _runtime, _timestamp, _step and valid_loss (train_loss and valid_loss decrease steadily over the run; the hyperparameters wd_0, sqr_mom_0, lr_0, mom_0 and eps_0 stay constant).

Synced 5 W&B file(s), 1 media file(s), 0 artifact file(s) and 0 other file(s)

Let's compare these results with the ones obtained by our previous linear regression model.

learn_lin = Learner(dls, LinRegModel(), loss_func=nn.MSELoss(), opt_func=Adam)
learn_lin.fit(20, lr=1e-3)
epoch train_loss valid_loss time
0 0.812581 0.725550 00:00
1 0.691438 0.584854 00:00
2 0.599790 0.499159 00:00
3 0.529202 0.455940 00:00
4 0.483361 0.433711 00:00
5 0.458033 0.425526 00:00
6 0.443160 0.421114 00:00
7 0.437787 0.419616 00:00
8 0.432641 0.418788 00:00
9 0.424229 0.417838 00:00
10 0.416352 0.418070 00:00
11 0.419004 0.417829 00:00
12 0.419980 0.417453 00:00
13 0.421106 0.417162 00:00
14 0.420523 0.417515 00:00
15 0.417929 0.417003 00:00
16 0.416771 0.417299 00:00
17 0.418802 0.417479 00:00
18 0.418064 0.417004 00:00
19 0.420583 0.417536 00:00
inputs, preds, outputs = learn_lin.get_preds(with_input = True)
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')