Training callbacks

    Various callbacks to customize training behavior

    ShortEpochCallback(pct=0.01, short_valid=True) :: Callback

    Fit just pct of an epoch, then stop

    learn = synth_learner()
    learn.fit(1, cbs=ShortEpochCallback())
    epoch | train_loss | valid_loss | time
    0     | 14.867975  |            | 00:00
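
    For reference, a callback like this only needs a few lines. Below is an illustrative sketch, assuming fastai's Callback attributes iter and n_iter and its CancelTrainException/CancelValidException control-flow exceptions; it is an approximation of the idea, not necessarily the library's exact implementation:

    from fastai.basics import *  # Callback and the Cancel*Exception classes

    class ShortEpochSketch(Callback):
        "Stop after `pct` of the batches in an epoch (illustrative sketch)."
        def __init__(self, pct=0.01, short_valid=True):
            self.pct, self.short_valid = pct, short_valid
        def after_batch(self):
            # self.iter/self.n_iter is the fraction of the epoch processed so far
            if self.iter / self.n_iter < self.pct: return
            if self.training: raise CancelTrainException()
            if self.short_valid: raise CancelValidException()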

    GradientAccumulation(n_acc=32) :: Callback

    Accumulate gradients before updating weights

    When the number of steps per accumulation is higher than the number of batches, the parameters (and therefore validation loss) don’t change at all:

    learn = synth_learner()
    learn.fit(1, lr=0.01, cbs=GradientAccumulation(n_acc=1000))
    # ensure valid_loss didn't change
    assert learn.recorder.values[-1][1] == learn.recorder.values[0][1]
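
    The pattern itself is not specific to fastai: keep calling backward() so gradients add up across mini-batches, and only step the optimizer once enough samples have been seen. A minimal plain-PyTorch sketch of the idea (the toy model, data, and threshold are illustrative, not fastai's implementation):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    n_acc = 32  # update the weights only after roughly this many samples

    seen = 0
    opt.zero_grad()
    for xb, yb in [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(100)]:
        loss = loss_fn(model(xb), yb)
        loss.backward()       # gradients accumulate across batches
        seen += xb.shape[0]
        if seen >= n_acc:     # enough samples seen: take one optimizer step
            opt.step()
            opt.zero_grad()
            seen = 0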

    GradientClip(max_norm=1.0, norm_type=2.0) :: Callback

    Clip norm of gradients

    Normally, if we use a learning rate that is too high, our training will diverge. This happens even with mixed-precision training, which avoids infinities by using dynamic loss scaling, but it still diverges:

    fp16 = MixedPrecision()  # fastai's mixed-precision training callback
    learn = synth_learner(lr=1.1, cuda=True)
    learn.fit(3, cbs=fp16)
    epoch | train_loss | valid_loss  | time
    0     | 38.214169  | 25.269012   | 00:00
    1     | 377.146088 | 890.011780  | 00:00
    2     | 839.391907 | 9965.712891 | 00:00

    By adding the GradientClip callback, the gradient norm (of order norm_type, default 2) is clipped to at most max_norm (default 1) using nn.utils.clip_grad_norm_, which can avoid loss divergence:

    set_seed(99)
    learn = synth_learner(lr=1.1, cuda=True)
    learn.fit(3, cbs=[GradientClip, fp16])  # same setup as above, now with gradient clipping
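
    Between the backward pass and the optimizer step, the clipping itself is the standard PyTorch call. A minimal sketch of the pattern outside fastai (the toy model and data are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=1.1)  # deliberately large learning rate
    loss_fn = nn.MSELoss()

    for xb, yb in [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(50)]:
        loss = loss_fn(model(xb), yb)
        loss.backward()
        # clip the overall L2 norm of all gradients to at most 1.0 before stepping
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2)
        opt.step()
        opt.zero_grad()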

    set_bn_eval[source]

    set_bn_eval(m:nn.Module, use_eval=True)

    Set bn layers in eval mode for all recursive children of m.
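
    A simplified sketch of what such a helper does; the name below is illustrative, and the actual helper also handles details such as skipping BatchNorm layers whose parameters are still trainable:

    import torch.nn as nn

    def set_bn_eval_sketch(m: nn.Module, use_eval: bool = True) -> None:
        "Put every BatchNorm layer under `m` into eval (or back into train) mode."
        for layer in m.modules():  # modules() walks all recursive children
            if isinstance(layer, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                layer.eval() if use_eval else layer.train()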

    BnFreeze is useful when you’d like to train two separate models that have a common feature extractor / body. The only part of the model that’s different is the head that you attach for transfer learning.

    Learner.freeze() doesn't suffice here, as the BatchNorm layers are trainable by default and the running mean and std of the batches are still tracked. For the feature extractors to fully match, you need to set train_bn=False, and these statistics need to be frozen as well, which is precisely the function of BnFreeze.
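
    Conceptually, BnFreeze just applies set_bn_eval to the model at the start of each training phase. A rough sketch of that idea as a fastai callback (illustrative, not necessarily the library's exact implementation):

    from fastai.vision.all import *  # brings in Callback and set_bn_eval

    class BnFreezeSketch(Callback):
        "Keep BatchNorm running statistics frozen during training."
        def before_train(self):
            set_bn_eval(self.model)  # put the frozen BatchNorm layers into eval mode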

    path = untar_data(URLs.MNIST_TINY)
    dls = ImageDataLoaders.from_folder(path, valid_pct=0.2)

    We first demonstrate the mismatch of the running stats when using only train_bn=False, by creating a Learner…:
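
    Assuming the Learner is created the same way as in the BnFreeze example further down, only without the cbs=BnFreeze argument:

    learn1 = cnn_learner(deepcopy(dls), resnet18, pretrained=True, train_bn=False)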

    …and grab the first BatchNorm layer, storing its running mean:

    m = learn1.model[0][1].running_mean.clone()

    After one epoch of training, you can see that the running mean has changed:

    learn1.fit(1, lr=0.02)
    test_ne(to_detach(learn1.model[0][1].running_mean), m)
    epoch | train_loss | valid_loss | time
    0     | 1.152701   | 0.468892   | 00:02

    When we use the BnFreeze callback, the running statistics will not be changed during training. This is often important for getting good results from transfer learning.

    learn1 = cnn_learner(deepcopy(dls), resnet18, pretrained=True, train_bn=False, cbs=BnFreeze)
    m = learn1.model[0][1].running_mean.detach().clone()
    learn1.fit(1, lr=0.02)
    test_eq(to_detach(learn1.model[0][1].running_mean), m)
