use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
device
device(type='cuda')

Linear regression model in PyTorch

Datasets and Dataloaders

We'll create a dataset that contains $(x,y)$ pairs sampled from the linear function $y = ax + b + \epsilon$. To do this, we'll create a PyTorch TensorDataset.

A PyTorch tensor is nearly the same thing as a NumPy array. The vast majority of methods and operators supported by NumPy are also supported by PyTorch, but PyTorch tensors have additional capabilities. One major capability is that these structures can live on the GPU, in which case their computation will be optimized for the GPU and can run much faster (given lots of values to work on). In addition, PyTorch can automatically calculate derivatives of these operations, including combinations of operations. These two things are critical for deep learning.
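
As a minimal illustration of both points, a tensor can be created much like a NumPy array, moved to the device we defined above, and asked to track gradients:

import numpy as np

a_np = np.array([1., 2., 3.])
a_pt = torch.tensor([1., 2., 3.])
(a_np * 2).sum(), (a_pt * 2).sum()          # the same operations work on both
a_gpu = a_pt.to(device)                     # the tensor lives on the GPU when one is available
a_tr = torch.tensor([1., 2., 3.], requires_grad=True)  # gradients will be tracked for this one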

linear_function_dataset[source]

linear_function_dataset(a, b, n=100, show_plot=False)

Creates a PyTorch TensorDataset with n random samples of the linear function y = a*x + b. show_plot decides whether or not to plot the dataset.
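
The implementation is hidden behind [source]; as a rough sketch (the input distribution and noise scale are assumptions, not necessarily what the hidden source does), such a function could look like this:

import matplotlib.pyplot as plt
from torch.utils.data import TensorDataset

def linear_function_dataset(a, b, n=100, show_plot=False):
    "Sketch: n noisy samples of y = a*x + b wrapped in a TensorDataset"
    x = torch.randn(n, 1)                      # random inputs
    y = a*x + b + 0.1*torch.randn(n, 1)        # linear function plus noise (epsilon)
    if show_plot:
        plt.scatter(x, y, marker='.')
    return TensorDataset(x, y)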

a = 2
b = 3
n = 100
data = linear_function_dataset(a, b, n, show_plot=True)
test_eq(type(data), TensorDataset)

In every machine/deep learning experiment, we need to have at least two datasets:

  • training: used to train the model
  • validation: used to validate the model after each training step. It allows us to detect overfitting and to adjust the hyperparameters of the model properly
train_ds = linear_function_dataset(a, b, n=100, show_plot=True)
valid_ds = linear_function_dataset(a, b, n=20, show_plot=True)

A dataloader combines a dataset and a sampler that groups the data into batches, and provides an iterable over the given dataset.

bs = 10
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_dl = torch.utils.data.DataLoader(valid_ds, batch_size=bs, shuffle=False)
for i, data in enumerate(train_dl, 1):
    x, y = data
    print(f'batch {i}: x={x.shape} ({x.device}), y={y.shape} ({y.device})')
batch 1: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 2: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 3: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 4: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 5: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 6: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 7: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 8: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 9: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)
batch 10: x=torch.Size([10, 1]) (cpu), y=torch.Size([10, 1]) (cpu)

Defining a linear regression model in PyTorch

The class torch.nn.Module is the base structure for all models in PyTorch. It mostly helps to register all the trainable parameters. A module is an object of a class that inherits from the PyTorch nn.Module class.

To implement an nn.Module you just need to:

  • Make sure the superclass init is called first when you initialize it.
  • Define any parameters of the model as attributes with nn.Parameter. To tell Module that we want to treat a tensor as a parameter, we have to wrap it in the nn.Parameter class. All PyTorch modules use nn.Parameter for any trainable parameters. This class doesn't actually add any functionality (other than automatically setting requires_grad); it's only used as a "marker" to show what to include in parameters.
  • Define a forward function that returns the output of your model (see the sketch below).
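
Following these three steps, and given that the parameters printed later are named a and b, a plausible sketch of LinRegModel (whose actual implementation is behind the [source] link below) is:

import torch
import torch.nn as nn

class LinRegModel(nn.Module):
    "Sketch: linear regression y = a*x + b as an nn.Module"
    def __init__(self):
        super().__init__()                      # 1. call the superclass init first
        self.a = nn.Parameter(torch.randn(1))   # 2. trainable parameters wrapped in nn.Parameter
        self.b = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return self.a*x + self.b                # 3. the output of the model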

class LinRegModel[source]

LinRegModel() :: Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing you to nest them in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

training (bool): whether this module is in training or evaluation mode.

model = LinRegModel()
pa, pb = model.parameters()
pa, pa.shape, pb, pb.shape
(Parameter containing:
 tensor([-0.1623], requires_grad=True),
 torch.Size([1]),
 Parameter containing:
 tensor([-0.3689], requires_grad=True),
 torch.Size([1]))

Objects of this class behave identically to standard Python functions, in that you can call them using parentheses and they will return the activations of a model.

x = torch.randn(10, 1)
out = model(x)
x, x.shape, out, out.shape 
(tensor([[ 1.6960],
         [ 0.3434],
         [-0.5049],
         [-0.3151],
         [ 2.5839],
         [ 0.8094],
         [-0.8670],
         [-0.4008],
         [-1.6220],
         [ 1.1903]]),
 torch.Size([10, 1]),
 tensor([[-0.6443],
         [-0.4247],
         [-0.2870],
         [-0.3178],
         [-0.7884],
         [-0.5003],
         [-0.2282],
         [-0.3039],
         [-0.1056],
         [-0.5622]], grad_fn=<AddBackward0>),
 torch.Size([10, 1]))

Loss function and optimizer

The loss is the measure of performance that the machine uses to decide how to update the model parameters. For a regression problem the choice is simple: we'll just use the Mean Squared Error (MSE).

loss_func = nn.MSELoss()
loss_func(x, out)
tensor(2.4990, grad_fn=<MseLossBackward>)
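
nn.MSELoss with its default reduction simply averages the squared differences, $\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$, which we can check by computing it by hand:

((x - out)**2).mean()   # same value as loss_func(x, out) above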

We have data, a model, and a loss function; we only need one more thing before we can fit the model, and that's an optimizer.

opt_func = torch.optim.SGD(model.parameters(), lr = 1e-3)

Training loop

During training, we need to push our model and our batches to the GPU. Calling cuda() on a model or a tensor (or, as below, to(device)) puts all of its parameters and data on the GPU:

model = model.to(device)

To train a model, we will need to compute all the gradients of a given loss with respect to its parameters, which is known as the backward pass. The forward pass is where we compute the output of the model on a given input, based on the matrix products. PyTorch computes all the gradients we need with a magic call to loss.backward. The backward pass is the chain rule applied multiple times, computing the gradients from the output of our model and going back, one layer at a time.

In PyTorch, each basic function we need to differentiate is written as a torch.autograd.Function object that has a forward and a backward method. PyTorch then keeps track of every computation we do, so that it can properly run the backward pass, unless we set the requires_grad attribute of our tensors to False.
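
For example, gradients of a small composition of operations can be obtained with a single call to backward:

w = torch.tensor([3.], requires_grad=True)
z = (w**2 + 2*w).sum()   # forward pass: z = w^2 + 2w
z.backward()             # backward pass: applies the chain rule
w.grad                   # dz/dw = 2w + 2 = tensor([8.])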

For minibatch gradient descent (the usual way of training in deep learning), we calculate gradients on batches. Before moving on to the next batch, we modify our model's parameters based on the gradients. For each iteration through our dataset (which is called an epoch), the optimizer performs as many updates as we have batches.
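
For example, with the 100 training samples and batch size of 10 defined above, each epoch consists of 10 batches, so the optimizer performs 10 parameter updates per epoch.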

There are two important methods in a PyTorch optimizer:

  • zero_grad: In PyTorch, we need to set the gradients to zero before starting backpropagation, because PyTorch accumulates the gradients on subsequent backward passes. zero_grad just loops through the parameters of the model and sets their gradients to zero. It also calls detach_, which removes any history of gradient computation, since it won't be needed after zero_grad.
  • step: performs a single optimization step, i.e. it updates the parameters based on the gradients computed during the backward pass.
n_epochs = 10

train[source]

train(model, device, train_dl, loss_func, opt_func, epoch_idx)

Train model for one epoch, whose index is given in epoch_idx. The training loop will iterate through all the batches of train_dl, using the loss function given in loss_func and the optimizer given in opt_func.
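
The actual implementation is behind [source]; a minimal sketch of what train could look like, assuming it reports the average batch loss at the end of the epoch (the exact print format is an assumption), is:

def train(model, device, train_dl, loss_func, opt_func, epoch_idx):
    "Sketch: train model for one epoch over train_dl"
    model.train()                          # put the model in training mode
    total_loss = 0.
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)  # push the batch to the device
        opt_func.zero_grad()               # reset the gradients
        loss = loss_func(model(x), y)      # forward pass + loss
        loss.backward()                    # backward pass (gradients)
        opt_func.step()                    # update the parameters
        total_loss += loss.item()
    print(f'Train loss [Epoch {epoch_idx}]: {total_loss/len(train_dl):.2f}')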

for epoch in range(1, n_epochs+1):
    train(model, device, train_dl, loss_func, opt_func, epoch)
Train loss [Epoch 1]:  15.59)
Train loss [Epoch 2]:  14.98)
Train loss [Epoch 3]:  14.40)
Train loss [Epoch 4]:  13.84)
Train loss [Epoch 5]:  13.30)
Train loss [Epoch 6]:  12.78)
Train loss [Epoch 7]:  12.28)
Train loss [Epoch 8]:  11.81)
Train loss [Epoch 9]:  11.34)
Train loss [Epoch 10]:  10.90)

We can see how the parameters of the regression model are getting closer to the true values of a and b from the linear function.

L(model.named_parameters())
(#2) [('a', Parameter containing:
tensor([0.2301], device='cuda:0', requires_grad=True)),('b', Parameter containing:
tensor([0.2354], device='cuda:0', requires_grad=True))]

Validating the model

Validating the model requires only a forward pass; it's just inference. Disabling gradient calculation with torch.no_grad() is useful for inference, when you are sure that you will not call Tensor.backward().

validate[source]

validate(model, device, dl)
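
As with train, the body is behind [source]; a sketch, assuming it averages an MSE loss over the batches of dl (the loss function and the print format are assumptions), is:

def validate(model, device, dl):
    "Sketch: average loss of model over dl, without tracking gradients"
    model.eval()                           # put the model in evaluation mode
    total_loss = 0.
    with torch.no_grad():                  # no gradients needed for inference
        for x, y in dl:
            x, y = x.to(device), y.to(device)
            total_loss += nn.MSELoss()(model(x), y).item()
    print(f'Valid loss: {total_loss/len(dl):.2f}')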

validate(model, device, valid_dl)
Valid loss:  11.92

In order to spot overfitting, it is useful to validate the model after each training epoch.

for epoch in range(1, n_epochs +1):
    train(model, device, train_dl, loss_func, opt_func, epoch)
    validate(model, device, valid_dl)
Train loss [Epoch 1]:  10.48)
Valid loss:  11.46
Train loss [Epoch 2]:  10.07)
Valid loss:  11.01
Train loss [Epoch 3]:  9.68)
Valid loss:  10.58
Train loss [Epoch 4]:  9.30)
Valid loss:  10.16
Train loss [Epoch 5]:  8.94)
Valid loss:  9.77
Train loss [Epoch 6]:  8.59)
Valid loss:  9.38
Train loss [Epoch 7]:  8.26)
Valid loss:  9.02
Train loss [Epoch 8]:  7.93)
Valid loss:  8.66
Train loss [Epoch 9]:  7.62)
Valid loss:  8.32
Train loss [Epoch 10]:  7.33)
Valid loss:  8.00

Abstracting the manual training loop: moving from PyTorch to fastai

from fastai.basics import *
from fastai.callback.progress import ProgressCallback

We can entirely replace the custom training loop with fastai's. That means you can get rid of train(), validate(), and the epoch loop in the original code, and replace it all with a couple of lines.

fastai's training loop lives in a Learner. The Learner is the glue that merges everything together (Datasets, DataLoaders, model and optimizer) and enables training by just calling a fit method.

fastai's Learner expects a DataLoaders object rather than a single DataLoader, so let's make that. We could just do dls = DataLoaders(train_dl, valid_dl) to keep the PyTorch DataLoaders. However, by using fastai DataLoaders instead, created directly from the TensorDataset objects, we get some automation, such as automatically pushing the data to the GPU.

dls = DataLoaders.from_dsets(train_ds, valid_ds, bs=10)
learn = Learner(dls, model=LinRegModel(), loss_func=nn.MSELoss(), opt_func=SGD)

Now we have everything needed to do a basic fit:

learn.fit(10, lr=1e-3)
epoch train_loss valid_loss time
0 6.368925 6.652071 00:00
1 6.233027 6.391272 00:00
2 6.081585 6.140684 00:00
3 5.927457 5.899893 00:00
4 5.789435 5.668490 00:00
5 5.640563 5.446169 00:00
6 5.487903 5.232544 00:00
7 5.326789 5.027249 00:00
8 5.176148 4.830004 00:00
9 5.014451 4.640473 00:00

Having a Learner allows us to easily gather the model predictions for the validation set, which we can use for visualisation and analysis.

inputs, preds, outputs = learn.get_preds(with_input=True)
inputs.shape, preds.shape, outputs.shape
(torch.Size([20, 1]), torch.Size([20, 1]), torch.Size([20, 1]))
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')

Building a simple neural network

For the next example, we will create the dataset by sampling values from the non-linear function $y(x) = -\frac{1}{100}x^7 - x^4 - 2x^2 - 4x + 1$.

config = dict(
    ntrain = 1000,
    nvalid = 200,
    nh1 = 200,
    nh2 = 100,
    nh3 = 50
)
run = wandb.init(config=config, project="TS-III")
wandb: Currently logged in as: vrodriguezf (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.10.22 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
Tracking run with wandb version 0.10.21
Syncing run cerulean-sky-3 to Weights & Biases (Documentation).
Project page: https://wandb.ai/vrodriguezf/TS-III
Run page: https://wandb.ai/vrodriguezf/TS-III/runs/3fr7wgq3
Run data is saved locally in /home/victor/work/walk-with-deep-learning/nbs/wandb/run-20210310_164133-3fr7wgq3

nonlinear_function_dataset[source]

nonlinear_function_dataset(n=100, show_plot=False)

Creates a PyTorch TensorDataset with n random samples of the nonlinear function y = (-1/100)*x^7 - x^4 - 2*x^2 - 4*x + 1 with a bit of noise. show_plot decides whether or not to plot the dataset.
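
Again, a sketch of what the hidden implementation might look like (the input range and noise scale are assumptions):

def nonlinear_function_dataset(n=100, show_plot=False):
    "Sketch: n noisy samples of y = (-1/100)*x^7 - x^4 - 2*x^2 - 4*x + 1"
    x = (torch.rand(n, 1) - 0.5)*4                                       # assumed input range
    y = (-1/100)*x**7 - x**4 - 2*x**2 - 4*x + 1 + 0.1*torch.randn(n, 1)  # assumed noise scale
    if show_plot:
        plt.scatter(x, y, marker='.')
    return TensorDataset(x, y)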

n = 100
ds = nonlinear_function_dataset(n, show_plot=True)
x, y = ds.tensors
test_eq(x.shape, y.shape)

We will create the training and validation datasets, and build the DataLoaders from them, this time directly in fastai, using the DataLoaders.from_dsets method.

train_ds = nonlinear_function_dataset(n=1000)
valid_ds = nonlinear_function_dataset(n=200)

Normalization in deep learning is used to make optimization easier by smoothing the loss surface of the network. We will normalize the data based on the mean and standard deviation of the training dataset.

norm_mean = train_ds.tensors[1].mean()
norm_std = train_ds.tensors[1].std()
train_ds_norm = TensorDataset(train_ds.tensors[0],
                              (train_ds.tensors[1] - norm_mean)/norm_std)
valid_ds_norm = TensorDataset(valid_ds.tensors[0],
                              (valid_ds.tensors[1] - norm_mean)/norm_std)
dls = DataLoaders.from_dsets(train_ds_norm, valid_ds_norm, bs = 32)

We will build a Multi Layer Perceptron with 3 hidden layers. These networks are also known as Feed-Forward Neural Networks. The layers of this type of network are known as Fully Connected Layers because, between every subsequent pair of layers, all the neurons are connected to each other.

Neural network architecture

The easiest way of wrapping several layers in PyTorch is using the nn.Sequential module. It creates a module with a forward method that will call each of the listed layers or functions in turn, without us having to write the loop manually in the forward pass.

class MLP3[source]

MLP3(n_in=1, nh1=200, nh2=100, nh3=50, n_out=1) :: Module

Multilayer perceptron with 3 hidden layers, with sizes nh1, nh2 and nh3 respectively.
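
The implementation is hidden behind [source]; given the description of nn.Sequential above, a plausible sketch (the choice of ReLU as the activation function is an assumption) is:

class MLP3(nn.Module):
    "Sketch: multilayer perceptron with 3 hidden layers of sizes nh1, nh2 and nh3"
    def __init__(self, n_in=1, nh1=200, nh2=100, nh3=50, n_out=1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_in, nh1), nn.ReLU(),
            nn.Linear(nh1, nh2), nn.ReLU(),
            nn.Linear(nh2, nh3), nn.ReLU(),
            nn.Linear(nh3, n_out)
        )

    def forward(self, x):
        return self.layers(x)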

x, y = dls.one_batch()
model = MLP3()
output = model(x)
output.shape
torch.Size([32, 1])

In order to have automatic logging of training parameters with Weights & Biases, we add the WandbCallback callback when creating the Learner. This process would be similar in any other framework we use to train a neural network (e.g., Keras).

from fastai.callback.wandb import *
learn = Learner(dls, MLP3(), loss_func=nn.MSELoss(), opt_func=Adam, 
                cbs=[WandbCallback()])
learn.fit(10, lr=1e-3)
WandbCallback requires use of "SaveModelCallback" to log best model
WandbCallback was not able to prepare a DataLoader for logging prediction samples -> 'TensorDataset' object has no attribute 'items'
epoch train_loss valid_loss time
0 0.430214 0.358961 00:00
1 0.389660 0.382922 00:00
2 0.340610 0.254645 00:00
3 0.283868 0.225372 00:00
4 0.240199 0.194012 00:00
5 0.205221 0.199028 00:00
6 0.172748 0.100761 00:00
7 0.155810 0.199445 00:00
8 0.137792 0.080331 00:00
9 0.104510 0.063056 00:00
inputs, preds, outputs = learn.get_preds(with_input = True)
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')

Once we have finished training our model of interest, we can close the experiment tracker (Weights & Biases).

run.finish()

Waiting for W&B process to finish, PID 3043
Program ended successfully.
Find user logs for this run at: /home/victor/work/walk-with-deep-learning/nbs/wandb/run-20210310_164133-3fr7wgq3/logs/debug.log
Find internal logs for this run at: /home/victor/work/walk-with-deep-learning/nbs/wandb/run-20210310_164133-3fr7wgq3/logs/debug-internal.log

Run summary:

epoch: 10
train_loss: 0.10451
raw_loss: 0.01843
wd_0: 0.01
sqr_mom_0: 0.99
lr_0: 0.001
mom_0: 0.9
eps_0: 1e-05
_runtime: 10
_timestamp: 1615394503
_step: 309
valid_loss: 0.06306

Run history: W&B sparkline plots for epoch, train_loss, raw_loss, wd_0, sqr_mom_0, lr_0, mom_0, eps_0, _runtime, _timestamp, _step and valid_loss (train_loss and valid_loss decrease steadily over the run; the hyperparameters wd_0, sqr_mom_0, lr_0, mom_0 and eps_0 stay constant).

Synced 5 W&B file(s), 1 media file(s), 0 artifact file(s) and 0 other file(s)

Let's compare these results with the ones obtained by our previous linear regression model.

learn_lin = Learner(dls, LinRegModel(), loss_func=nn.MSELoss(), opt_func=Adam)
learn_lin.fit(20, lr=1e-3)
epoch train_loss valid_loss time
0 0.812581 0.725550 00:00
1 0.691438 0.584854 00:00
2 0.599790 0.499159 00:00
3 0.529202 0.455940 00:00
4 0.483361 0.433711 00:00
5 0.458033 0.425526 00:00
6 0.443160 0.421114 00:00
7 0.437787 0.419616 00:00
8 0.432641 0.418788 00:00
9 0.424229 0.417838 00:00
10 0.416352 0.418070 00:00
11 0.419004 0.417829 00:00
12 0.419980 0.417453 00:00
13 0.421106 0.417162 00:00
14 0.420523 0.417515 00:00
15 0.417929 0.417003 00:00
16 0.416771 0.417299 00:00
17 0.418802 0.417479 00:00
18 0.418064 0.417004 00:00
19 0.420583 0.417536 00:00
inputs, preds, outputs = learn_lin.get_preds(with_input = True)
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')