BACKPROPAGATION AND OPTIMIZER

    The oneflow.optim module contains various optimizers that simplify the code for back propagation.

    This article first introduces the basic concepts of back propagation and then shows how to use the classes in oneflow.optim.

    In order to make it easier for readers to understand the relationship between backpropagation and autograd, a training process of a simple model implemented with numpy is provided here:
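    Below is a minimal NumPy sketch of such a training loop (not necessarily identical to the original listing). It is written to match the data, the initial value of w, and the hyperparameters used in the OneFlow version later in this article (ITER_COUNT = 500, LR = 0.01); the helper names forward and loss are illustrative, while gradient is the function discussed below.

        import numpy as np

        # Hyperparameters: iteration count and learning rate
        ITER_COUNT = 500
        LR = 0.01

        # Forward propagation: y_pred = x @ w
        def forward(x, w):
            return np.matmul(x, w)

        # Loss: sum of squared errors
        def loss(y_pred, y):
            return ((y_pred - y) ** 2).sum()

        # Gradient of the loss with respect to w
        def gradient(x, y, y_pred):
            return np.matmul(x.T, 2 * (y_pred - y))

        # Training data for y = 2*x1 + 3*x2, and the initial parameter
        x = np.array([[1, 2], [2, 3], [4, 6], [3, 1]], dtype=np.float32)
        y = np.array([[8], [13], [26], [9]], dtype=np.float32)
        w = np.array([[2], [1]], dtype=np.float32)

        for i in range(ITER_COUNT):
            y_pred = forward(x, w)            # forward propagation
            l = loss(y_pred, y)
            if (i + 1) % 50 == 0:
                print(f"{i+1}/{ITER_COUNT} loss:{l}")

            grad = gradient(x, y, y_pred)     # gradient of loss w.r.t. w
            w -= LR * grad                    # parameter update

        print(f"w:{w}")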

    output:

        50/500 loss:0.0034512376878410578
        100/500 loss:1.965487399502308e-06
        150/500 loss:1.05524122773204e-09
        200/500 loss:3.865352482534945e-12
        250/500 loss:3.865352482534945e-12
        300/500 loss:3.865352482534945e-12
        350/500 loss:3.865352482534945e-12
        400/500 loss:3.865352482534945e-12
        450/500 loss:3.865352482534945e-12
        500/500 loss:3.865352482534945e-12
        w:[[2.000001 ]
         [2.9999993]]

    Note that the loss function we selected is the sum of squared errors, $\mathrm{loss} = \sum (y_{\mathrm{pred}} - y)^2$, so the code for the gradient of the loss with respect to the parameter w is:

        def gradient(x, y, y_pred):
            return np.matmul(x.T, 2 * (y_pred - y))
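
    For reference, this follows from the chain rule applied to $y_{\mathrm{pred}} = x w$:

        \[
            \mathrm{loss} = \sum_i \bigl( y_{\mathrm{pred}}^{(i)} - y^{(i)} \bigr)^2
            \quad\Longrightarrow\quad
            \frac{\partial\,\mathrm{loss}}{\partial w} = x^{\mathsf{T}} \cdot 2\,(y_{\mathrm{pred}} - y)
        \]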

    In summary, a complete iteration in training includes the following steps:

    1. The model calculates the predicted value y_pred based on the input and the parameters
    2. Calculate the loss between y_pred and the true value y
    3. Calculate the gradient of the loss with respect to the parameters
    4. Update the parameter(s)

    Steps 1 and 2 are the forward propagation process; steps 3 and 4 are the back propagation process. The sketch below maps these steps onto the NumPy code above.
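
    The mapping uses the illustrative helper names from the NumPy sketch above:

        y_pred = forward(x, w)           # 1. forward propagation: compute the prediction
        l = loss(y_pred, y)              # 2. forward propagation: compute the loss
        grad = gradient(x, y, y_pred)    # 3. back propagation: gradient of the loss w.r.t. w
        w -= LR * grad                   # 4. back propagation: update the parameter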

    Hyperparameters are parameters related to the model training settings; they affect the efficiency and the results of training. In the above code, ITER_COUNT and LR are hyperparameters.

    Using the optimizer classes in oneflow.optim for back propagation is more concise.

    First, prepare the data and the model. A convenience of using Module is that the hyperparameters can be placed in the Module and managed together with it.

        import oneflow as flow

        # Training data
        x = flow.tensor([[1, 2], [2, 3], [4, 6], [3, 1]], dtype=flow.float32)
        y = flow.tensor([[8], [13], [26], [9]], dtype=flow.float32)


        class MyLrModule(flow.nn.Module):
            def __init__(self, lr, iter_count):
                super().__init__()
                self.w = flow.nn.Parameter(flow.tensor([[2], [1]], dtype=flow.float32))
                self.lr = lr
                self.iter_count = iter_count

            def forward(self, x):
                return flow.matmul(x, self.w)


        model = MyLrModule(0.01, 500)

    Then, select the loss function:

        loss = flow.nn.MSELoss(reduction="sum")
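
    As a quick illustrative check (assuming the model and loss just defined), MSELoss with reduction="sum" computes the same sum of squared errors as the manual loss in the NumPy sketch:

        # Illustrative: reduction="sum" gives the summed squared error,
        # matching ((y_pred - y) ** 2).sum() from the NumPy version.
        y_pred = model(x)
        print(loss(y_pred, y).numpy(), ((y_pred - y) ** 2).sum().numpy())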

    The logic of back propagation is wrapped in the optimizer. We choose SGD here; you can choose other optimization algorithms as needed, such as Adam.

    When the optimizer is constructed, the model parameters and the learning rate are passed to SGD. After the gradients are computed by back propagation, calling optimizer.step() updates the model parameters according to the SGD algorithm.
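
    For example, constructing the SGD optimizer described above could look like this (a minimal sketch using the model defined earlier):

        # Pass the model parameters and the learning rate to SGD
        optimizer = flow.optim.SGD(model.parameters(), lr=model.lr)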

    When the above preparations are completed, we can start training:

        for i in range(0, model.iter_count):
            y_pred = model(x)
            l = loss(y_pred, y)
            if (i + 1) % 50 == 0:
                print(f"{i+1}/{model.iter_count} loss:{l.numpy()}")

            optimizer.zero_grad()
            l.backward()
            optimizer.step()

        print(f"\nw: {model.w}")

    output:

        50/500 loss:0.003451163647696376
        100/500 loss:1.965773662959691e-06
        150/500 loss:1.103217073250562e-09
        200/500 loss:3.865352482534945e-12
        250/500 loss:3.865352482534945e-12
        300/500 loss:3.865352482534945e-12
        350/500 loss:3.865352482534945e-12
        400/500 loss:3.865352482534945e-12
        450/500 loss:3.865352482534945e-12
        500/500 loss:3.865352482534945e-12
