
Representation and General Value Functions: General Value Functions (GVFs)

Original link: https://sites.ualberta.ca/~pilarski/docs/theses/Sherstan_Craig_D_202009_PhD.pd

General value functions (GVFs) make two relaxations to the value function definition we have already considered (Sutton, Modayil, et al., 2011). First, we are free to choose any signal available to the agent as the prediction target, not just reward. We refer to the prediction target as the cumulant, $C$. Second, the discount parameter, $\gamma$, is replaced by a transition-dependent continuation function: $\gamma_{t+1} \equiv \gamma(S_t, A_t, S_{t+1})$ (White, 2017). (Note that, given this definition, $\gamma$ need not lie in $[0,1]$ and can even be complex-valued (De Asis et al., 2018).) This function goes by several names in the literature, including the continuation function, the discount, and the timescale. With these two generalizations we define the return as

$$G_t \equiv \sum_{k=0}^{\infty} \left( \prod_{j=1}^{k} \gamma_{t+j} \right) C_{t+k+1}.$$
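As a rough illustration of how such a return behaves, the sketch below assumes the usual recursive form $G_t = C_{t+1} + \gamma_{t+1} G_{t+1}$ and a finite trajectory that is truncated after its last step. The function name `gvf_return` and its list arguments are hypothetical, chosen only for this example; they do not come from the thesis.

```python
from typing import Sequence


def gvf_return(cumulants: Sequence[float], continuations: Sequence[float]) -> float:
    """Compute G_t for a finite trajectory (illustrative sketch only).

    cumulants[k]     holds C_{t+k+1}, the cumulant observed on transition k.
    continuations[k] holds gamma_{t+k+1}, the continuation value of that
                     same transition (S_{t+k}, A_{t+k}, S_{t+k+1}).
    Anything after the last recorded transition is assumed to contribute 0.
    """
    if len(cumulants) != len(continuations):
        raise ValueError("need exactly one continuation value per cumulant")

    g = 0.0
    # Sweep backwards so each step applies G = C + gamma * G_next.
    for c, gamma in zip(reversed(cumulants), reversed(continuations)):
        g = c + gamma * g
    return g


# Constant continuation: behaves like an ordinary discounted return.
constant = gvf_return([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])

# Transition-dependent continuation: a value of 0 on the second
# transition cuts the prediction off, so the final cumulant is ignored.
cut_off = gvf_return([2.0, 3.0, 100.0], [1.0, 0.0, 1.0])
```

With a constant continuation the function reduces to a standard discounted sum; the second call shows how a continuation of 0 on one transition ends the prediction there, which is what lets a GVF express questions such as "how much of this signal will I see before some event occurs?".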

Like a value function, a GVF is defined by three components: the policy, the timescale, and the prediction target. GVFs allow the agent to express representation elements in the form of predictive questions. Consider the following examples for a mobile robot: