rainbow_agent.RainbowAgent

    A compact implementation of a simplified Rainbow agent.

    Methods

    __init__

    Initializes the agent and constructs the components of its graph.

    Args:

    • sess: tf.Session, for executing ops.
    • num_actions: int, number of actions the agent can take at any
      state.
    • vmax: float, the value distribution support is [-vmax, vmax].
    • gamma: float, discount factor with the usual RL meaning.
    • update_horizon: int, horizon at which updates are performed, the
      ‘n’ in n-step update.
    • min_replay_history: int, number of transitions that should be
      experienced before the agent begins training its value function.
    • update_period: int, period between DQN updates.
    • target_update_period: int, update period for the target network.
    • epsilon_fn: function expecting 4 parameters: (decay_period, step,
      warmup_steps, epsilon). This function should return the epsilon value used
      for exploration during training (the construction sketch after this list
      shows the expected signature).
    • epsilon_train: float, the value to which the agent’s epsilon is
      eventually decayed during training.
    • epsilon_eval: float, epsilon used when evaluating the agent.
    • epsilon_decay_period: int, length of the epsilon decay schedule.
    • replay_scheme: str, ‘prioritized’ or ‘uniform’, the sampling scheme
      of the replay memory.
    • tf_device: str, Tensorflow device on which the agent’s graph is
      executed.
    • use_staging: bool, when True use a staging area to prefetch the
      next training batch, speeding training up by about 30%.
    • optimizer: tf.train.Optimizer, for training the value function.
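
    For orientation, the sketch below constructs an agent with a handful of these
    arguments. It is a minimal, illustrative sketch: the import path, the example
    hyperparameter values, the 4-action environment, and the constant_epsilon
    helper are assumptions, not prescribed settings.

      import tensorflow as tf
      from dopamine.agents.rainbow import rainbow_agent

      def constant_epsilon(decay_period, step, warmup_steps, epsilon):
        # Example of the 4-parameter signature expected for the epsilon function:
        # ignore the decay schedule and always return the final epsilon value.
        return epsilon

      sess = tf.Session()
      agent = rainbow_agent.RainbowAgent(
          sess=sess,
          num_actions=4,             # assumed 4-action environment
          gamma=0.99,                # discount factor
          update_horizon=3,          # the 'n' in n-step updates
          min_replay_history=20000,  # transitions observed before training begins
          epsilon_fn=constant_epsilon,
          epsilon_train=0.01,
          epsilon_eval=0.001,
          replay_scheme='prioritized',
          tf_device='/cpu:0')
      sess.run(tf.global_variables_initializer())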

    begin_episode

      begin_episode(observation)

      Returns the agent’s first action for this episode.

      Args:

      • observation: numpy array, the environment’s initial observation.

      Returns:

      int, the selected action.
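
      As a small illustrative sketch, an episode typically starts with the call
      below; the Gym-style env object is an assumption, not part of this API.

        # Hypothetical episode start; `env` stands in for any environment that
        # yields numpy observations.
        observation = env.reset()
        action = agent.begin_episode(observation)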

      bundle_and_checkpoint

      bundle_and_checkpoint(checkpoint_dir, iteration_number)

      This is used for checkpointing. It will return a dictionary containing all
      non-TensorFlow objects (to be saved into a file by the caller), and it saves all
      TensorFlow objects into a checkpoint file.

      Args:

      • checkpoint_dir: str, directory where TensorFlow objects will be
        saved.
      • iteration_number: int, iteration number to use for naming the
        checkpoint file.

      Returns:

      A dict containing additional Python objects to be checkpointed by the
      experiment. If the checkpoint directory does not exist, returns None.
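
      A hedged sketch of how a caller might persist the returned dictionary is
      given below, assuming checkpoint_dir and iteration_number are already
      defined; the pickle-based file handling and the agent_state.pkl file name
      are illustrative assumptions, not the library's prescribed format.

        import os
        import pickle

        bundle = agent.bundle_and_checkpoint(checkpoint_dir, iteration_number)
        if bundle is not None:
          # TensorFlow objects were already written under checkpoint_dir; the
          # caller saves the remaining Python state itself.
          with open(os.path.join(checkpoint_dir, 'agent_state.pkl'), 'wb') as f:
            pickle.dump(bundle, f)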

      end_episode

      end_episode(reward)

      Signals the end of the episode to the agent.

      We store the observation of the current time step, which is the last observation
      of the episode.

      Args:

      • reward: float, the last reward from the environment.

      step

      step(reward, observation)

      Records the most recent transition and returns the agent’s next action.

      We store the observation of the last time step since we want to store it with
      the reward.

      Args:

      • reward: float, the reward received from the agent’s most recent
        action.
      • observation: numpy array, the most recent observation.

      Returns:

      int, the selected action.
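
      Putting begin_episode, step, and end_episode together, a minimal interaction
      loop might look like the sketch below; the Gym-style env object and its
      (observation, reward, done) return convention are assumptions.

        observation = env.reset()
        action = agent.begin_episode(observation)
        done = False
        while not done:
          observation, reward, done, _ = env.step(action)
          if done:
            # The final reward is stored alongside the episode's last observation.
            agent.end_episode(reward)
          else:
            # Stores the transition and returns the next action to execute.
            action = agent.step(reward, observation)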

      unbundle

      unbundle(checkpoint_dir, iteration_number, bundle_dictionary)

      Restores the agent from a checkpoint.

      Restores the agent’s Python objects to those specified in bundle_dictionary, and
      restores the TensorFlow objects to those specified in the checkpoint_dir. If the
      checkpoint_dir does not exist, will not reset the agent’s state.

      Args:

      • checkpoint_dir: str, path to the checkpoint saved by tf.train.Saver.
      • iteration_number: int, checkpoint version, used when restoring the
        replay buffer.
      • bundle_dictionary: dict, containing additional Python objects owned by
        the agent.

      Returns:

      bool, True if unbundling was successful.
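
      A matching restore sketch, mirroring the save example above, is shown below;
      the agent_state.pkl file name is the same illustrative assumption.

        import os
        import pickle

        with open(os.path.join(checkpoint_dir, 'agent_state.pkl'), 'rb') as f:
          bundle = pickle.load(f)
        if not agent.unbundle(checkpoint_dir, iteration_number, bundle):
          print('Could not restore agent state; starting training from scratch.')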