Dealing with NaNs

    Most frequently, the cause is that some of the hyperparameters, especially learning rates, are set incorrectly. A high learning rate can blow your whole model up into NaN outputs even within one epoch of training. So the first and easiest solution is to try lowering it: keep halving your learning rate until you start to get reasonable output values.
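    A minimal sketch of that halving strategy is shown below. The toy one-weight model and the specific constants are assumptions made purely for illustration; substitute your own training routine.

        import numpy as np
        import theano
        import theano.tensor as T

        # Toy stand-in for a real model: one shared weight trained by SGD on a
        # steep quadratic loss, so that a large learning rate visibly diverges.
        def train_one_epoch(learning_rate, steps=100):
            w = theano.shared(np.float64(3.0), name='w')
            loss = 100.0 * w ** 2
            step = theano.function(
                [], loss, updates=[(w, w - learning_rate * T.grad(loss, w))])
            for _ in range(steps):
                last = step()
            return last

        # Keep halving the learning rate until the final loss is finite again.
        learning_rate = 1.0
        while not np.isfinite(train_one_epoch(learning_rate)):
            learning_rate /= 2.0
        print("first learning rate that keeps the loss finite:", learning_rate)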

    Other hyperparameters may also play a role. For example, does your training algorithm involve regularization terms? If so, are their corresponding penalties set reasonably? Search a wider hyperparameter space with a few (one or two) training epochs each to see if the NaNs disappear.
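    As a sketch of that kind of search, the loop below tries a few L2 penalty values for a short run each and reports which ones keep the loss finite. The tiny one-weight model is a hypothetical placeholder for your own network.

        import numpy as np
        import theano
        import theano.tensor as T

        # Short training run on a toy model with an L2 penalty added to the loss.
        def short_training_run(l2_penalty, steps=50, lr=0.01):
            w = theano.shared(np.float64(1.0), name='w')
            loss = (w - 2.0) ** 2 + l2_penalty * (w ** 2)
            step = theano.function(
                [], loss, updates=[(w, w - lr * T.grad(loss, w))])
            for _ in range(steps):
                last = step()
            return last

        for l2_penalty in [0.0, 1e-4, 1e-2, 1.0]:
            loss = short_training_run(l2_penalty)
            status = "finite" if np.isfinite(loss) else "non-finite"
            print("L2 penalty %g -> %s loss" % (l2_penalty, status))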

    Some models can be very sensitive to the initialization of weight vectors. If those weights are not initialized in a proper range, then it is not surprising that the model ends up yielding NaNs.
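    One commonly used scheme, shown below as an assumption rather than a requirement of Theano, is to draw the initial weights uniformly from a range scaled by the layer sizes (in the spirit of Glorot-style initialization).

        import numpy as np
        import theano

        # Initialize a weight matrix in a range scaled by the fan-in and fan-out.
        rng = np.random.RandomState(1234)
        n_in, n_out = 784, 500
        bound = np.sqrt(6.0 / (n_in + n_out))
        W_values = rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
        W = theano.shared(W_values.astype(theano.config.floatX), name='W')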

    Run in NanGuardMode, DebugMode, or MonitorMode

    DebugMode can also help. Run your code in DebugMode with the flag mode=DebugMode,DebugMode.check_py=False. This will give you a clue about which op is causing the problem, and then you can inspect that op in more detail. For details on using DebugMode, please refer to the DebugMode documentation.
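    The flags above can be supplied through the THEANO_FLAGS environment variable, or a single function can be compiled in DebugMode; the snippet below is a sketch of the latter (the toy graph is an assumption for illustration).

        # Whole-script version, set before Theano is imported:
        #   THEANO_FLAGS="mode=DebugMode,DebugMode.check_py=False" python my_script.py
        import theano
        import theano.tensor as T

        x = T.dvector('x')
        # Compile just this function in DebugMode; each op is re-checked as it runs.
        f = theano.function([x], T.log(x), mode='DebugMode')
        f([1.0, 2.0])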

    Theano’s MonitorMode provides another helping hand. It can be used to step through the execution of a function. You can inspect the inputs and outputs of each node being executed when the function is called. For how to use it, please check “How do I Step through a Compiled Function?”.
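    The pattern below is a sketch of that approach: a MonitorMode post-hook that inspects every node's outputs as the function runs and reports the first op that produces a NaN. The toy graph log(x) * x is only there to trigger a NaN on purpose.

        import numpy as np
        import theano
        import theano.tensor as T

        def detect_nan(i, node, fn):
            # Called after each node executes; fn.outputs holds the computed values.
            for output in fn.outputs:
                if np.isnan(output[0]).any():
                    print('*** NaN detected ***')
                    theano.printing.debugprint(node)
                    print('Inputs : %s' % [inp[0] for inp in fn.inputs])
                    print('Outputs: %s' % [out[0] for out in fn.outputs])
                    break

        x = T.dscalar('x')
        f = theano.function([x], [T.log(x) * x],
                            mode=theano.compile.MonitorMode(post_func=detect_nan))
        f(0.0)  # log(0) * 0 yields NaN, which the hook reports with the offending op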

    After you have located the op which causes the problem, it may turn out that the NaNs yielded by that op are related to numerical issues. For example, an expression such as 1 / log(p(x) + 1) may result in NaNs for those nodes that have learned to yield a low probability p(x) for some input x.
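    A common way to guard against this kind of issue, offered here as an assumption about your model rather than a prescription, is to keep the probability away from zero before it enters the unstable expression:

        import theano.tensor as T

        eps = 1e-7
        p = T.dmatrix('p')
        # Clip the probabilities into [eps, 1] so downstream logs stay finite.
        safe_log_p = T.log(T.clip(p, eps, 1.0))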

    Algorithm Related

    The Theano flag nvcc.fastmath=True can generate NaNs. Don't set this flag while debugging NaNs.

    NaN Introduced by AllocEmpty

    AllocEmpty is used by many operations, such as scan, to allocate some memory without properly clearing it. The reason for that is that the allocated memory will subsequently be overwritten. However, this can sometimes introduce NaNs depending on the operation and what was previously stored in the memory it is working on. For instance, trying to zero out memory using a multiplication before applying an operation can cause NaNs if NaNs are already present in the memory, since 0 * NaN => NaN.
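    The failure mode is easy to reproduce directly with NumPy; the buffer contents below are made up purely to illustrate it:

        import numpy as np

        buf = np.array([1.0, np.nan, 3.0])  # stale memory that happens to hold a NaN
        print(buf * 0.0)                    # -> [ 0. nan  0.]; the NaN survives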

    Using the Theano flag optimizer_including=alloc_empty_to_zeros replaces AllocEmpty by Alloc{0}, which is helpful to diagnose where NaNs come from. Please note that when running in NanGuardMode, this optimizer is not included by default. Therefore, it might be helpful to use them both together.
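    The sketch below shows one way to combine the two: compile a function under NanGuardMode so every value is checked, and enable the replacement optimization through the flags. The toy graph is an assumption for illustration.

        # Enable the AllocEmpty replacement for the whole script, e.g.:
        #   THEANO_FLAGS="optimizer_including=alloc_empty_to_zeros" python my_script.py
        import theano
        import theano.tensor as T
        from theano.compile.nanguardmode import NanGuardMode

        x = T.dmatrix('x')
        f = theano.function(
            [x], T.log(x),
            mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True))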