Training callbacks

    Various callbacks to customize training behavior

    ShortEpochCallback(pct=0.01, short_valid=True) :: Callback

    Fit just pct of an epoch, then stop

    learn = synth_learner()
    learn.fit(1, cbs=ShortEpochCallback())
    epoch | train_loss | valid_loss | time
    0     | 14.867975  |            | 00:00
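
    For reference, a callback like this only needs a few lines. Below is an illustrative sketch, assuming fastai's Callback attributes iter and n_iter and its CancelTrainException/CancelValidException control-flow exceptions; it is an approximation of the idea, not necessarily the library's exact implementation:

    from fastai.basics import *  # Callback and the Cancel*Exception classes

    class ShortEpochSketch(Callback):
        "Stop after `pct` of the batches in an epoch (illustrative sketch)."
        def __init__(self, pct=0.01, short_valid=True):
            self.pct, self.short_valid = pct, short_valid
        def after_batch(self):
            # self.iter/self.n_iter is the fraction of the epoch processed so far
            if self.iter / self.n_iter < self.pct: return
            if self.training: raise CancelTrainException()
            if self.short_valid: raise CancelValidException()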

    GradientAccumulation(n_acc=32) :: Callback

    Accumulate gradients before updating weights

    When the number of steps per accumulation is higher than the number of batches, the parameters (and therefore validation loss) don’t change at all:

    learn = synth_learner()
    learn.fit(1, lr=0.01, cbs=GradientAccumulation(n_acc=1000))
    # ensure valid_loss didn't change
    assert learn.recorder.values[-1][1] == learn.recorder.values[0][1]
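
    The pattern itself is not specific to fastai: keep calling backward() so gradients add up across mini-batches, and only step the optimizer once enough samples have been seen. A minimal plain-PyTorch sketch of the idea (the toy model, data, and threshold are illustrative, not fastai's implementation):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    n_acc = 32  # update the weights only after roughly this many samples

    seen = 0
    opt.zero_grad()
    for xb, yb in [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(100)]:
        loss = loss_fn(model(xb), yb)
        loss.backward()       # gradients accumulate across batches
        seen += xb.shape[0]
        if seen >= n_acc:     # enough samples seen: take one optimizer step
            opt.step()
            opt.zero_grad()
            seen = 0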

    GradientClip(max_norm=1.0, norm_type=2.0) :: Callback

    Clip norm of gradients

    Normally, if we use a learning rate that is too high, our training will diverge. This happens even with mixed-precision training, which avoids infinities by using dynamic loss scaling, but it still diverges:

    fp16 = MixedPrecision()  # fastai's mixed-precision training callback
    learn = synth_learner(lr=1.1, cuda=True)
    learn.fit(3, cbs=fp16)
    epoch | train_loss | valid_loss  | time
    0     | 38.214169  | 25.269012   | 00:00
    1     | 377.146088 | 890.011780  | 00:00
    2     | 839.391907 | 9965.712891 | 00:00

    By adding the GradientClip callback, the gradient norm (of order norm_type, default 2) is clipped to at most max_norm (default 1) using nn.utils.clip_grad_norm_, which can avoid loss divergence:

    set_seed(99)
    learn = synth_learner(lr=1.1, cuda=True)
    learn.fit(3, cbs=[GradientClip, fp16])  # same setup as above, now with gradient clipping
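
    Between the backward pass and the optimizer step, the clipping itself is the standard PyTorch call. A minimal sketch of the pattern outside fastai (the toy model and data are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=1.1)  # deliberately large learning rate
    loss_fn = nn.MSELoss()

    for xb, yb in [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(50)]:
        loss = loss_fn(model(xb), yb)
        loss.backward()
        # clip the overall L2 norm of all gradients to at most 1.0 before stepping
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2)
        opt.step()
        opt.zero_grad()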

    set_bn_eval[source]

    set_bn_eval(m:nn.Module, use_eval=True)

    Set bn layers in eval mode for all recursive children of m.
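
    A simplified sketch of what such a helper does; the name below is illustrative, and the actual helper also handles details such as skipping BatchNorm layers whose parameters are still trainable:

    import torch.nn as nn

    def set_bn_eval_sketch(m: nn.Module, use_eval: bool = True) -> None:
        "Put every BatchNorm layer under `m` into eval (or back into train) mode."
        for layer in m.modules():  # modules() walks all recursive children
            if isinstance(layer, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                layer.eval() if use_eval else layer.train()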

    BnFreeze is useful when you’d like to train two separate models that have a common feature extractor / body. The only part of the model that’s different is the head that you attach for transfer learning.

    Learner.freeze() doesn't suffice here, as the BatchNorm layers are trainable by default and the running mean and std of the batches are still tracked. For the feature extractors to fully match, you need to set train_bn=False, and these statistics need to be frozen as well, which is precisely the function of BnFreeze.
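
    Conceptually, BnFreeze just applies set_bn_eval to the model at the start of each training phase. A rough sketch of that idea as a fastai callback (illustrative, not necessarily the library's exact implementation):

    from fastai.vision.all import *  # brings in Callback and set_bn_eval

    class BnFreezeSketch(Callback):
        "Keep BatchNorm running statistics frozen during training."
        def before_train(self):
            set_bn_eval(self.model)  # put the frozen BatchNorm layers into eval mode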

    path = untar_data(URLs.MNIST_TINY)
    dls = ImageDataLoaders.from_folder(path, valid_pct=0.2)

    We first demonstrate the mismatch of the running stats when using only train_bn=False, by creating a Learner…:
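
    Assuming the Learner is created the same way as in the BnFreeze example further down, only without the cbs=BnFreeze argument:

    learn1 = cnn_learner(deepcopy(dls), resnet18, pretrained=True, train_bn=False)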

    …and grab the first BatchNorm layer, storing its running mean:

    m = learn1.model[0][1].running_mean.clone()

    After one epoch of training, you can see that the running mean has changed:

    learn1.fit(1, lr=0.02)
    test_ne(to_detach(learn1.model[0][1].running_mean), m)
    epoch | train_loss | valid_loss | time
    0     | 1.152701   | 0.468892   | 00:02

    When we use the BnFreeze callback, the running statistics will not be changed during training. This is often important for getting good results from transfer learning.

    learn1 = cnn_learner(deepcopy(dls), resnet18, pretrained=True, train_bn=False, cbs=BnFreeze)
    m = learn1.model[0][1].running_mean.detach().clone()
    learn1.fit(1, lr=0.02)
    test_eq(to_detach(learn1.model[0][1].running_mean), m)
