dqn_agent.DQNAgent

    Methods

    __init__

    Initializes the agent and constructs the components of its graph.

    Args:

    • sess: tf.Session, for executing ops.
    • gamma: float, discount factor with the usual RL meaning.
    • update_horizon: int, horizon at which updates are performed, the
      ‘n’ in n-step update.
    • min_replay_history: int, number of transitions that should be
      experienced before the agent begins training its value function.
    • update_period: int, period between DQN updates.
    • target_update_period: int, update period for the target network.
    • epsilon_fn: function expecting 4 parameters: (decay_period, step,
      warmup_steps, epsilon). This function should return the epsilon value used
      for exploration during training.
    • epsilon_train: float, the value to which the agent’s epsilon is
      eventually decayed during training.
    • epsilon_eval: float, epsilon used when evaluating the agent.
    • epsilon_decay_period: int, length of the epsilon decay schedule.
    • tf_device: str, Tensorflow device on which the agent’s graph is
      executed.
    • use_staging: bool, when True use a staging area to prefetch the
      next training batch, speeding training up by about 30%.
    • max_tf_checkpoints_to_keep: int, the number of TensorFlow
      checkpoints to keep.
    • optimizer: tf.train.Optimizer, for training the value function.
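
    A rough construction sketch is shown below; the import path and the num_actions argument (which does not appear in the argument list above) are assumptions based on typical Dopamine-style usage, not part of this reference.

        import tensorflow as tf
        from dopamine.agents.dqn import dqn_agent  # assumed import path

        sess = tf.Session()
        agent = dqn_agent.DQNAgent(
            sess,
            num_actions=4,              # assumed argument: size of the action space
            gamma=0.99,                 # discount factor
            update_horizon=1,           # the 'n' in n-step updates
            min_replay_history=20000,   # transitions seen before training begins
            update_period=4,            # train every 4 agent steps
            target_update_period=8000,  # sync the target network every 8000 steps
            epsilon_train=0.01,
            epsilon_eval=0.001,
            epsilon_decay_period=250000)
        sess.run(tf.global_variables_initializer())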

    begin_episode

      Returns the agent’s first action for this episode.

      Args:

      • observation: numpy array, the environment’s initial observation.

      Returns:

      int, the selected action.
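
      For example (hypothetical glue code; environment stands for a Gym-style environment that is not part of this API):

        observation = environment.reset()           # initial observation
        action = agent.begin_episode(observation)   # int action for the first step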

      bundle_and_checkpoint

      Returns a self-contained bundle of the agent’s state.

      Args:

      • checkpoint_dir: str, directory where TensorFlow objects will be
        saved.
      • iteration_number: int, iteration number to use for naming the
        checkpoint file.

      Returns:

      A dict containing additional Python objects to be checkpointed by the
      experiment. If the checkpoint directory does not exist, returns None.
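
      A rough checkpointing sketch; the directory, the iteration number, and the pickle-based handling of the returned dict are illustrative assumptions, not part of this API:

        import pickle

        bundle = agent.bundle_and_checkpoint('/tmp/dqn_checkpoints', iteration_number=5)
        if bundle is not None:
            # The surrounding experiment is responsible for persisting the returned
            # dict; pickling it is one plausible choice.
            with open('/tmp/dqn_checkpoints/bundle_5.pkl', 'wb') as f:
                pickle.dump(bundle, f)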

      end_episode(reward)

      Signals the end of the episode to the agent.

      We store the observation of the current time step, which is the last observation
      of the episode.

      Args:

      • reward: float, the last reward from the environment.

      step

      Records the most recent transition and returns the agent’s next action.

      Args:

      • reward: float, the reward received from the agent’s most recent
        action.
      • observation: numpy array, the most recent observation.

      Returns:

      int, the selected action.
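
      Putting begin_episode, step, and end_episode together, one possible control loop looks like the sketch below (environment is again a hypothetical Gym-style environment whose step returns an observation, a reward, and a done flag; the loop structure is illustrative, not prescribed by the API):

        observation = environment.reset()
        action = agent.begin_episode(observation)
        done = False
        while not done:
            observation, reward, done, _ = environment.step(action)
            if done:
                agent.end_episode(reward)                 # record the terminal reward
            else:
                action = agent.step(reward, observation)  # record transition, pick next action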

      unbundle

      unbundle(checkpoint_dir, iteration_number, bundle_dictionary)

      Restores the agent from a checkpoint.

      Restores the agent’s Python objects to those specified in bundle_dictionary, and
      restores the TensorFlow objects to those specified in the checkpoint_dir. If the
      checkpoint_dir does not exist, will not reset the agent’s state.

      Args:

      • checkpoint_dir: str, path to the checkpoint saved by tf.Save.
      • iteration_number: int, iteration number of the checkpoint to restore.
      • bundle_dictionary: dict, containing additional Python objects owned
        by the agent.

      Returns:

      bool, True if unbundling was successful.
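
      A matching restore sketch, mirroring the checkpointing example above (the path, iteration number, and pickle step are illustrative assumptions):

        import pickle

        # Load the Python-side bundle that was written alongside the checkpoint.
        with open('/tmp/dqn_checkpoints/bundle_5.pkl', 'rb') as f:
            bundle = pickle.load(f)

        # Restores TensorFlow variables and the agent's Python objects.
        if not agent.unbundle('/tmp/dqn_checkpoints', 5, bundle):
            print('Checkpoint directory not found; agent state left unchanged.')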