use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
device
We'll create a dataset that contains $(x,y)$ pairs sampled from the linear function $y = ax + b + \epsilon$. To do this, we'll create a PyTorch TensorDataset.
A PyTorch tensor is nearly the same thing as a NumPy array. The vast majority of methods and operators supported by NumPy on these structures are also supported by PyTorch, but PyTorch tensors have additional capabilities. One major capability is that these structures can live on the GPU, in which case their computation will be optimized for the GPU and can run much faster (given lots of values to work on). In addition, PyTorch can automatically calculate derivatives of these operations, including combinations of operations. These two things are critical for deep learning.
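The helper linear_function_dataset used below is defined elsewhere in the notebook. A minimal sketch consistent with how it is called here (the signature, the noise scale, and the plotting details are assumptions) could be:

import torch
from torch.utils.data import TensorDataset
import matplotlib.pyplot as plt

def linear_function_dataset(a, b, n=100, show_plot=False):
    # Sample n points from y = a*x + b + epsilon and wrap them in a TensorDataset
    x = torch.randn(n, 1)
    y = a*x + b + 0.1*torch.randn(n, 1)  # epsilon: Gaussian noise (scale assumed)
    if show_plot:
        plt.scatter(x, y, marker='.')
        plt.show()
    return TensorDataset(x, y)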
a = 2
b = 3
n = 100
data = linear_function_dataset(a, b, n, show_plot=True)
test_eq(type(data), TensorDataset)
In every machine/deep learning experiment, we need to have at least two datasets:
- training: used to train the model
- validation: used to validate the model after each training step. It allows us to detect overfitting and to adjust the hyperparameters of the model properly
train_ds = linear_function_dataset(a, b, n=100, show_plot=True)
valid_ds = linear_function_dataset(a, b, n=20, show_plot=True)
A dataloader combines a dataset and a sampler that samples data into batches, and provides an iterable over the given dataset.
bs = 10
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_dl = torch.utils.data.DataLoader(valid_ds, batch_size=bs, shuffle=False)
for i, data in enumerate(train_dl, 1):
    x, y = data
    print(f'batch {i}: x={x.shape} ({x.device}), y={y.shape} ({y.device})')
The class torch.nn.Module is the base structure for all models in PyTorch. It mainly helps to register all the trainable parameters. A module is an object of a class that inherits from the PyTorch nn.Module class.
To implement an nn.Module you just need to:
- Make sure the superclass init is called first when you initialize it.
- Define any parameters of the model as attributes with nn.Parameter. To tell Module that we want to treat a tensor as a parameter, we have to wrap it in the nn.Parameter class. All PyTorch modules use nn.Parameter for any trainable parameters. This class doesn't actually add any functionality (other than automatically calling requires_grad). It's only used as a "marker" to show what to include in parameters.
- Define a forward function that returns the output of your model, as shown in the sketch below.
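LinRegModel itself is defined elsewhere in the notebook. Following the steps above, a minimal sketch (the random initialization of the parameters is an assumption) could look like:

from torch import nn

class LinRegModel(nn.Module):
    def __init__(self):
        super().__init__()                      # call the superclass init first
        self.a = nn.Parameter(torch.randn(1))   # trainable slope, wrapped in nn.Parameter
        self.b = nn.Parameter(torch.randn(1))   # trainable intercept

    def forward(self, x):
        return self.a*x + self.b                # forward pass: y_hat = a*x + b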
model = LinRegModel()
pa, pb = model.parameters()
pa, pa.shape, pb, pb.shape
Objects of this class behave identically to standard Python functions, in that you can call them using parentheses and they will return the activations of a model.
x = torch.randn(10, 1)
out = model(x)
x, x.shape, out, out.shape
The loss is what the machine uses as its measure of performance to decide how to update the model parameters. For a regression problem, the loss function is simple: we'll just use the Mean Squared Error (MSE).
loss_func = nn.MSELoss()
loss_func(x, out)
We have data, a model, and a loss function; we only need one more thing before we can fit a model, and that's an optimizer.
opt_func = torch.optim.SGD(model.parameters(), lr = 1e-3)
During training, we need to push our model and our batches to the GPU. Calling cuda() on a model or a tensor puts it, together with all its parameters, on the GPU; here we use to(device) instead, which moves it to whichever device we selected above:
model = model.to(device)
To train a model, we will need to compute all the gradients of a given loss with respect to its parameters, which is known as the backward pass. The forward pass is where we compute the output of the model on a given input, based on the matrix products. PyTorch computes all the gradients we need with a magic call to loss.backward. The backward pass is the chain rule applied multiple times, computing the gradients from the output of our model and going back, one layer at a time.
In PyTorch, each basic function we need to differentiate is written as a torch.autograd.Function object that has a forward and a backward method. PyTorch will then keep track of any computation we do, so that it can properly run the backward pass, unless we set the requires_grad attribute of our tensors to False.
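As a quick, self-contained illustration of this machinery (not part of the training code), we can build a tiny computation on tensors with requires_grad=True, call backward on the resulting loss, and inspect the gradients that PyTorch fills in:

w = torch.randn(1, requires_grad=True)   # PyTorch will track operations on w and b
b = torch.zeros(1, requires_grad=True)
x = torch.randn(5, 1)
y = 2*x + 3
loss = nn.MSELoss()(w*x + b, y)          # forward pass
loss.backward()                          # backward pass: gradients stored in .grad
w.grad, b.grad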
For minibatch gradient descent (the usual way of training in deep learning), we calculate gradients on batches. Before moving onto the next batch, we modify our model's parameters based on the gradients. For each iteration through our dataset (which would be called an epoch), the optimizer would perform as many updates as we have batches.
There are two important methods in a PyTorch optimizer:
- zero_grad: In PyTorch, we need to set the gradients to zero before starting to do backpropagation, because PyTorch accumulates the gradients on subsequent backward passes. zero_grad just loops through the parameters of the model and sets their gradients to zero. It also calls detach_, which removes any history of gradient computation, since it won't be needed after zero_grad.
- step: updates the parameters of the model, using the gradients computed in the backward pass and the optimizer's update rule (plain SGD in our case).
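The train function used in the next cells is defined elsewhere in the notebook. A sketch matching the signature used here (the loss reporting format is an assumption; note that opt_func is the optimizer instance created above) could be:

def train(model, device, train_dl, loss_func, opt_func, epoch):
    # One training epoch: forward pass, loss, backward pass, parameter update
    model.train()
    total_loss = 0.
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)  # push the batch to the same device as the model
        opt_func.zero_grad()               # reset the gradients accumulated so far
        loss = loss_func(model(x), y)      # forward pass + loss
        loss.backward()                    # backward pass: compute gradients
        opt_func.step()                    # update the parameters
        total_loss += loss.item()
    print(f'epoch {epoch}: train loss {total_loss/len(train_dl):.4f}')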
n_epochs = 10
for epoch in range(1, n_epochs+1):
    train(model, device, train_dl, loss_func, opt_func, epoch)
We can see how the parameters of the regression model are getting closer to the true values of a and b from the linear function.
L(model.named_parameters())
Validating the model requires only a forward pass; it's just inference. Disabling gradient calculation with the torch.no_grad() context manager is useful for inference, when you are sure that you will not call Tensor.backward().
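Like train, the validate function is defined elsewhere in the notebook; a sketch consistent with how it is called (the use of MSE as the validation loss is an assumption) could be:

def validate(model, device, valid_dl):
    # Average validation loss, computed without tracking gradients
    model.eval()
    total_loss = 0.
    with torch.no_grad():                  # no gradient bookkeeping during inference
        for x, y in valid_dl:
            x, y = x.to(device), y.to(device)
            total_loss += nn.MSELoss()(model(x), y).item()
    print(f'valid loss {total_loss/len(valid_dl):.4f}')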
validate(model, device, valid_dl)
In order to spot overfitting, it is useful to validate the model after each training epoch.
for epoch in range(1, n_epochs+1):
    train(model, device, train_dl, loss_func, opt_func, epoch)
    validate(model, device, valid_dl)
from fastai.basics import *
from fastai.callback.progress import ProgressCallback
We can entirely replace the custom training loop with fastai's. That means you can get rid of train(), validate(), and the epoch loop in the original code, and replace it all with a couple of lines.
fastai's training loop lives in a Learner. The Learner is the glue that merges everything together (Datasets, DataLoaders, model, and optimizer) and enables training by just calling a fit function.
fastai's Learner expects DataLoaders to be used, rather than simply one DataLoader, so let's make that. We could just do dls = DataLoaders(train_dl, valid_dl) to keep the PyTorch DataLoaders. However, by using a fastai DataLoader instead, created directly from the TensorDataset objects, we get some automation, such as automatically pushing the data to the GPU.
dls = DataLoaders.from_dsets(train_ds, valid_ds, bs=10)
learn = Learner(dls, model=LinRegModel(), loss_func=nn.MSELoss(), opt_func=SGD)
Now we have everything needed to do a basic fit:
learn.fit(10, lr=1e-3)
Having a Learner allows us to easily gather the model predictions for the validation set, which we can use for visualisation and analysis.
inputs, preds, outputs = learn.get_preds(with_input=True)
inputs.shape, preds.shape, outputs.shape
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')
For the next example, we will create the dataset by sampling values from the non-linear function $y(x) = -\frac{1}{100}x^7 - x^4 - 2x^2 - 4x + 1$.
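nonlinear_function_dataset is defined elsewhere in the notebook. A sketch consistent with how it is used below (the input range and the absence of noise are assumptions) could be:

def nonlinear_function_dataset(n=100, show_plot=False):
    # Sample n points from y(x) = -x^7/100 - x^4 - 2x^2 - 4x + 1 and wrap them in a TensorDataset
    x = torch.rand(n, 1)*6 - 3             # assumed input range: [-3, 3)
    y = -(1/100)*x**7 - x**4 - 2*x**2 - 4*x + 1
    if show_plot:
        plt.scatter(x, y, marker='.')
        plt.show()
    return TensorDataset(x, y)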
config = dict(
    ntrain = 1000,
    nvalid = 200,
    nh1 = 200,
    nh2 = 100,
    nh3 = 50
)
run = wandb.init(config=config, project="TS-III")
n = 100
ds = nonlinear_function_dataset(n, show_plot=True)
x, y = ds.tensors
test_eq(x.shape, y.shape)
We will create the training and validation datasets, and build the DataLoaders with them, this time directly in fastai, using the DataLoaders.from_dsets method.
train_ds = nonlinear_function_dataset(n=1000)
valid_ds = nonlinear_function_dataset(n=200)
Normalization in deep learning is used to make optimization easier by smoothing the loss surface of the network. We will normalize the data based on the mean and std of the training dataset.
norm_mean = train_ds.tensors[1].mean()
norm_std = train_ds.tensors[1].std()
train_ds_norm = TensorDataset(train_ds.tensors[0],
                              (train_ds.tensors[1] - norm_mean)/norm_std)
valid_ds_norm = TensorDataset(valid_ds.tensors[0],
                              (valid_ds.tensors[1] - norm_mean)/norm_std)
dls = DataLoaders.from_dsets(train_ds_norm, valid_ds_norm, bs = 32)
We will build a Multi Layer Perceptron with 3 hidden layers. These networks are also known as Feed-Forward Neural Networks. The layers of this type of network are known as Fully Connected Layers because, between every subsequent pair of layers, all the neurons are connected to each other.
The easiest way of wrapping several layers in PyTorch is using the nn.Sequential module. It creates a module with a forward method that will call each of the listed layers or functions in turn, without us having to do the loop manually in the forward pass.
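MLP3 is defined elsewhere in the notebook. A sketch based on nn.Sequential, with hidden sizes matching the config dict above (the choice of ReLU activations is an assumption), could be:

class MLP3(nn.Module):
    # Multi Layer Perceptron with 3 fully connected hidden layers
    def __init__(self, n_in=1, nh1=200, nh2=100, nh3=50, n_out=1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_in, nh1), nn.ReLU(),
            nn.Linear(nh1, nh2), nn.ReLU(),
            nn.Linear(nh2, nh3), nn.ReLU(),
            nn.Linear(nh3, n_out)
        )

    def forward(self, x):
        return self.layers(x)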
x, y = dls.one_batch()
model = MLP3()
output = model(x)
output.shape
In order to have automatic logging of training parameters with Weights & Biases, we add the WandbCallback callback to the Learner. This process would be similar in any other framework we use to train a neural network (e.g., Keras).
from fastai.callback.wandb import *
learn = Learner(dls, MLP3(), loss_func=nn.MSELoss(), opt_func=Adam,
                cbs=[WandbCallback()])
learn.fit(10, lr=1e-3)
inputs, preds, outputs = learn.get_preds(with_input = True)
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')
Once we have finished training our model of interest, we can close the experiment tracker (Weights & Biases).
run.finish()
Let's compare these results with the ones obtained by our previous linear regression model:
learn_lin = Learner(dls, LinRegModel(), loss_func=nn.MSELoss(), opt_func=Adam)
learn_lin.fit(20, lr=1e-3)
inputs, preds, outputs = learn_lin.get_preds(with_input = True)
show_TensorFunction1D(inputs, outputs, y_hat=preds, marker='.')