dqn_agent.DQNAgent

    Methods

    __init__

    Initializes the agent and constructs the components of its graph.

    Args:

    • sess: tf.Session, for executing ops.
    • gamma: float, discount factor with the usual RL meaning.
    • update_horizon: int, horizon at which updates are performed, the
      ‘n’ in n-step update.
    • min_replay_history: int, number of transitions that should be
      experienced before the agent begins training its value function.
    • update_period: int, period between DQN updates.
    • target_update_period: int, update period for the target network.
    • epsilon_fn: function expecting 4 parameters: (decay_period, step,
      warmup_steps, epsilon). This function should return the epsilon value used
      for exploration during training.
    • epsilon_train: float, the value to which the agent’s epsilon is
      eventually decayed during training.
    • epsilon_eval: float, epsilon used when evaluating the agent.
    • epsilon_decay_period: int, length of the epsilon decay schedule.
    • tf_device: str, Tensorflow device on which the agent’s graph is
      executed.
    • use_staging: bool, when True use a staging area to prefetch the
      next training batch, speeding training up by about 30%.
    • max_tf_checkpoints_to_keep: int, the number of TensorFlow
      checkpoints to keep.
    • optimizer: tf.train.Optimizer, for training the value function.
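
    A rough construction sketch is shown below; the import path and the num_actions argument (which does not appear in the argument list above) are assumptions based on typical Dopamine-style usage, not part of this reference.

        import tensorflow as tf
        from dopamine.agents.dqn import dqn_agent  # assumed import path

        sess = tf.Session()
        agent = dqn_agent.DQNAgent(
            sess,
            num_actions=4,              # assumed argument: size of the action space
            gamma=0.99,                 # discount factor
            update_horizon=1,           # the 'n' in n-step updates
            min_replay_history=20000,   # transitions seen before training begins
            update_period=4,            # train every 4 agent steps
            target_update_period=8000,  # sync the target network every 8000 steps
            epsilon_train=0.01,
            epsilon_eval=0.001,
            epsilon_decay_period=250000)
        sess.run(tf.global_variables_initializer())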

    begin_episode

      Returns the agent’s first action for this episode.

      Args:

      • observation: numpy array, the environment’s initial observation.

      Returns:

      int, the selected action.
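
      For example (hypothetical glue code; environment stands for a Gym-style environment that is not part of this API):

        observation = environment.reset()           # initial observation
        action = agent.begin_episode(observation)   # int action for the first step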

      bundle_and_checkpoint

      Returns a self-contained bundle of the agent’s state.

      Args:

      • checkpoint_dir: str, directory where TensorFlow objects will be
        saved.
      • iteration_number: int, iteration number to use for naming the
        checkpoint file.

      Returns:

      A dict containing additional Python objects to be checkpointed by the
      experiment. If the checkpoint directory does not exist, returns None.
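
      A rough checkpointing sketch; the directory, the iteration number, and the pickle-based handling of the returned dict are illustrative assumptions, not part of this API:

        import pickle

        bundle = agent.bundle_and_checkpoint('/tmp/dqn_checkpoints', iteration_number=5)
        if bundle is not None:
            # The surrounding experiment is responsible for persisting the returned
            # dict; pickling it is one plausible choice.
            with open('/tmp/dqn_checkpoints/bundle_5.pkl', 'wb') as f:
                pickle.dump(bundle, f)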

      end_episode(reward)

      Signals the end of the episode to the agent.

      We store the observation of the current time step, which is the last observation
      of the episode.

      Args:

      • reward: float, the last reward from the environment.

      step

      Records the most recent transition and returns the agent’s next action.

      Args:

      • reward: float, the reward received from the agent’s most recent
        action.
      • observation: numpy array, the most recent observation.

      Returns:

      int, the selected action.
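
      Putting begin_episode, step, and end_episode together, one possible control loop looks like the sketch below (environment is again a hypothetical Gym-style environment whose step returns an observation, a reward, and a done flag; the loop structure is illustrative, not prescribed by the API):

        observation = environment.reset()
        action = agent.begin_episode(observation)
        done = False
        while not done:
            observation, reward, done, _ = environment.step(action)
            if done:
                agent.end_episode(reward)                 # record the terminal reward
            else:
                action = agent.step(reward, observation)  # record transition, pick next action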

      unbundle

      unbundle(checkpoint_dir, iteration_number, bundle_dictionary)

      Restores the agent from a checkpoint.

      Restores the agent’s Python objects to those specified in bundle_dictionary, and
      restores the TensorFlow objects to those specified in the checkpoint_dir. If the
      checkpoint_dir does not exist, will not reset the agent’s state.

      Args:

      • checkpoint_dir: str, path to the checkpoint saved by tf.Save.
      • iteration_number: int, iteration number of the checkpoint to restore.
      • bundle_dictionary: dict, containing additional Python objects owned
        by the agent.

      Returns:

      bool, True if unbundling was successful.
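
      A matching restore sketch, mirroring the checkpointing example above (the path, iteration number, and pickle step are illustrative assumptions):

        import pickle

        # Load the Python-side bundle that was written alongside the checkpoint.
        with open('/tmp/dqn_checkpoints/bundle_5.pkl', 'rb') as f:
            bundle = pickle.load(f)

        # Restores TensorFlow variables and the agent's Python objects.
        if not agent.unbundle('/tmp/dqn_checkpoints', 5, bundle):
            print('Checkpoint directory not found; agent state left unchanged.')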