1 INTRODUCTION
In the literature of game theory there are two types of game models: zero-sum and nonzero-sum. In a two-person zero-sum game, one player tries to maximize his/her payoff while the other tries to minimize that same payoff, whereas in a nonzero-sum game each player tries to optimize his/her own payoff. Game theory can be studied either in discrete time or in continuous time: in continuous time the players observe the state space continuously, whereas in discrete time they observe it only at discrete epochs. There are also two types of game models with respect to the risk measure: risk-neutral games and risk-sensitive games. The risk-sensitive, or "exponential of integral", utility cost criterion is popular, particularly in finance (see, e.g., Bielecki and Pliska (1999)), since it captures the effects of moments of the cost beyond the first-order moment (the expectation).
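To see how the exponential-of-integral criterion captures more than the first moment, a small Monte Carlo comparison can help. The following Python sketch (an illustration, not from the paper; the uniform cost distribution and the value of θ are assumptions) compares the risk-neutral mean E[C], the risk-sensitive value (1/θ) log E[e^{θC}], and the small-θ expansion E[C] + (θ/2)Var(C) for a hypothetical random total cost C.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.5  # risk-sensitivity parameter (assumed value, for illustration)

# Hypothetical bounded random total cost C over a finite horizon.
C = rng.uniform(0.0, 4.0, size=200_000)

risk_neutral = C.mean()                                      # E[C]
risk_sensitive = np.log(np.mean(np.exp(theta * C))) / theta  # (1/theta) log E[exp(theta C)]
second_order = C.mean() + 0.5 * theta * C.var()              # E[C] + (theta/2) Var(C)

print(risk_neutral, risk_sensitive, second_order)
```

For θ > 0 the risk-sensitive value exceeds the plain expectation, and to first order in θ the gap is governed by the variance of the cost, which is precisely the sensitivity to risk that the risk-neutral criterion ignores.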
There is a large literature on the risk-neutral utility cost criterion for continuous-time controlled Markov decision processes (CTCMDPs) under different setups; see Guo (2007), Guo and Hernandez-Lerma (2009), Guo et al. (2015), Guo et al. (2012), Guo and Piunovskiy (2011), Huang (2018), Piunovskiy and Zhang (2011), Piunovskiy and Zhang (2014) for single-controller models and Guo and Hernandez-Lerma (2003), Guo and Hernandez-Lerma (2005), Guo and Hernandez-Lerma (2007), Wei and Chen (2016), Zhang and Guo (2012) for game models. Players ignore risk in risk-neutral stochastic games because of the additive feature of this criterion. When the variance is high, however, the risk-neutral criterion can be inadequate, since the resulting optimal control may perform poorly in practice. Moreover, different controllers may exhibit different risk preferences, so decision-makers take risk preferences into account in the performance criterion. Bell (1995) gave a model containing an interpretation of risk-sensitive utility. This paper considers finite-horizon risk-sensitive two-person
zero-sum dynamic games for controlled CTMDPs with unbounded rates (transition and payoff rates)
under admissible feedback strategies. State and action spaces are considered to be Borel spaces. The
main target of this manuscript is to find a solution of the optimality equation (6) (the Shapley equation), to prove the existence of the game's value, and to give a complete characterization of saddle-point equilibria.
The finite-horizon optimality criterion arises frequently in real-life scenarios where the cost criterion may not be risk-neutral. For finite-horizon risk-neutral CTMDPs, see Guo et al. (2015), Huang (2018); for the corresponding games, see Wei and Chen (2016) and its references. In this context, for risk-sensitive finite-horizon controlled CTMDPs, one can see Ghosh and Saha (2014), Guo et al. (2019), Wei (2016), while results for infinite-horizon risk-sensitive CTMDPs are available in Ghosh and Saha (2014), Golui and Pal (2022), Guo and Zhang (2018), Kumar and Pal (2013), Kumar and Pal (2015), Zhang (2017) and the references therein. The corresponding finite/infinite-horizon dynamic games are studied in Ghosh et al. (2022), Golui and Pal (2021a), Golui and Pal (2021b), Golui et al. (2022), Wei (2019). Studies of risk-sensitive control of CTMDPs on a denumerable state space are widely available, see Ghosh and Saha (2014), Guo and Liao (2019), Guo et al. (2019), but a countable state space is sometimes inadequate for certain models, especially chemical-reaction, water-reservoir-management, inventory, cash-flow, and insurance problems. By contrast, the literature on controlled CTMDPs with a general state space is rather sparse. Some exceptions for a single controller are Golui and Pal (2022), Guo et al. (2012), Guo and Zhang (2019), Pal and Pradhan (2019), Piunovskiy and Zhang (2014), Piunovskiy and Zhang (2020), and for the corresponding stochastic games, Bauerle and Rieder (2017), Golui and Pal (2021b), Guo and Hernandez-Lerma (2007), Wei (2017). It is therefore interesting and important to consider the game problem on a general state space. In Guo and Zhang (2019), the
authors studied the same problem as in Guo et al. (2019) but on a general state space, whereas in Wei (2017) the finite-horizon risk-sensitive zero-sum game for a controlled Markov jump process with bounded costs and unbounded transition rates was studied. In Ghosh et al. (2016), the authors studied infinite-horizon dynamic games for controlled CTMDPs with bounded transition and payoff rates. However, this boundedness condition is restrictive in many real-life scenarios; queueing and population processes, for instance, require unbounded transition and payoff functions. In Golui and Pal (2021a), finite-horizon continuous-time risk-sensitive zero-sum games for
unbounded transition and payoff rates on a countable state space are considered. But the extension of the same results to a general Borel state space was unknown to us. We solve this problem in this
paper. Here we deal with finite-horizon risk-sensitive dynamic games with unbounded payoff and transition rates, in the class of all admissible feedback strategies, on a general Borel state space; these results were unknown until now. In this paper, we find a solution to the risk-sensitive finite-horizon optimality equation and, at the same time, obtain the existence of a saddle-point equilibrium for this jump process. We consider a homogeneous game model. In Theorem 4, we prove our final results, i.e., we show that if the cost rates are real-valued functions, then the Shapley equation (6) has a solution. The existence of saddle-point equilibria is proved using the measurable selection theorem in Nowak (1985). The uniqueness of the solution follows from the well-known Feynman-Kac formula. The value of the game has also been established.
The remaining portions of this work are organized as follows. Section 2 describes the model of our stochastic game, some definitions, and the finite-horizon cost criterion. In Section 3, preliminary results, conditions, and an extension of the Feynman-Kac formula are provided; we also establish there the probabilistic representation of the solution of the finite-horizon optimality equation (6). The uniqueness of this solution as well as the game's value are proved in Section 4, where we also completely characterize the Nash equilibrium within the class of admissible Markov strategies for this game model. In Section 5, we verify our results with an example.
2 THE ZERO-SUM DYNAMIC GAME MODEL
First, in this section we introduce a time-homogeneous continuous-time zero-sum dynamic game model, which consists of the following elements:
G := {X, U, V, (U(x) ⊂ U, x ∈ X), (V(x) ⊂ V, x ∈ X), q(·|x, u, v), c(x, u, v), g(x)}.  (1)

Here X is our state space, which is a Borel space, and the corresponding Borel σ-algebra is B(X). The action spaces are U and V for the first and second players, respectively, and are also considered to be Borel spaces, with corresponding Borel σ-algebras B(U) and B(V), respectively. For each x ∈ X, the admissible action spaces are denoted by U(x) ∈ B(U) and V(x) ∈ B(V), respectively, and these spaces are assumed to be compact. Now let us define a Borel subset of X × U × V denoted by

K := {(x, u, v) | x ∈ X, u ∈ U(x), v ∈ V(x)}.
Next, for any (x, u, v) ∈ K, the transition rate of the CTMDP, denoted by q(·|x, u, v), is a signed kernel on X such that q(D|x, u, v) ≥ 0 where (x, u, v) ∈ K and x ∉ D. Also, q(·|x, u, v) is assumed to be conservative, i.e., q(X|x, u, v) ≡ 0, as well as stable, i.e.,

q*(x) := sup_{u ∈ U(x), v ∈ V(x)} [q_x(u, v)] < ∞  ∀ x ∈ X,  (2)

where q_x(u, v) := −q({x}|x, u, v) ≥ 0 for all (x, u, v) ∈ K.
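For intuition, the conservativeness and stability conditions can be checked on a toy example. The following Python sketch (all names, state counts, and rate forms are assumptions for illustration, not from the paper) builds a hypothetical finite-state transition-rate matrix for a fixed action pair and verifies q(X|x, u, v) = 0, q(D|x, u, v) ≥ 0 for x ∉ D, and q*(x) < ∞.

```python
import numpy as np

# Hypothetical three-state illustration of the kernel conditions.
# Q[x, y] plays the role of q({y} | x, u, v) for a fixed action pair (u, v).
def q_matrix(u, v):
    # Off-diagonal rates (assumed forms) scale with the action pair (u, v);
    # each diagonal entry is minus the total exit rate, so rows sum to zero.
    return np.array([[-(u + v),     u,          v],
                     [ u,          -(u + 2*v),  2*v],
                     [ v,           2*u,       -(v + 2*u)]], dtype=float)

Q = q_matrix(1.0, 0.5)
assert np.allclose(Q.sum(axis=1), 0.0)       # conservative: q(X | x, u, v) = 0
assert (Q - np.diag(np.diag(Q)) >= 0).all()  # q(D | x, u, v) >= 0 when x not in D
q_star = float(max(-Q[x, x] for x in range(3)))  # finite sup of exit rates: stable
print(q_star)
```

On a finite state and action space stability is automatic; condition (2) only has bite when the state space or the rates are unbounded, as in the models treated in this paper.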
Our running cost is c, assumed to be measurable on K, and the terminal cost is g, assumed to be measurable on X. These costs are taken to be real-valued.
The dynamic game is played as follows. The players take actions continuously. At any time moment t ≥ 0, if the system's state is x ∈ X, the players independently choose actions u_t ∈ U(x) and v_t ∈ V(x) according to their respective strategies. As a result, the following events occur:

• the first player immediately receives a reward at rate c(x, u_t, v_t) and the second player pays a cost at the same rate c(x, u_t, v_t); and

• after staying in state x for a random time, the system leaves x at the rate q_x(u_t, v_t) and jumps to a set D (x ∉ D) with probability determined by q(D|x, u_t, v_t)/q_x(u_t, v_t) (for details, see Proposition B.8, p. 205, in Guo and Hernandez-Lerma (2009)).
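The jump mechanism described in the two bullets above can be sketched numerically. The following Python snippet (a minimal illustration, not the paper's construction) simulates one trajectory of a jump process on a hypothetical three-state space, with a fixed generator Q standing in for q(·|x, u_t, v_t) under frozen strategies; all names and rates are assumptions for illustration.

```python
import numpy as np

def simulate(Q, x0, T_hat, rng):
    """Simulate one trajectory of the jump process up to the horizon T_hat."""
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while True:
        rate = -Q[x, x]                    # q_x(u_t, v_t): total exit rate
        if rate <= 0.0:                    # absorbing state: no further jumps
            break
        t += rng.exponential(1.0 / rate)   # holding time ~ Exp(q_x)
        if t >= T_hat:                     # horizon reached before the next jump
            break
        probs = Q[x].clip(min=0.0) / rate  # jump law q(.|x, u, v) / q_x(u, v)
        x = int(rng.choice(len(probs), p=probs))
        path.append((t, x))
    return path

rng = np.random.default_rng(1)
Q = np.array([[-1.0,  0.6,  0.4],
              [ 0.5, -1.5,  1.0],
              [ 0.2,  0.3, -0.5]])
path = simulate(Q, 0, T_hat=5.0, rng=rng)
print(path[-1])  # last recorded (jump time, state) before the horizon
```

The holding time and the jump distribution used here are exactly the two quantities q_x(u_t, v_t) and q(D|x, u_t, v_t)/q_x(u_t, v_t) named in the bullets above.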
Now suppose the system is at a new state y. Then the above procedure is repeated until the fixed time T̂ > 0. Moreover, if at time T̂ the system occupies a state y_{T̂}, the second player pays a terminal cost g(y_{T̂}) to the first player.
https://doi.org/10.17993/3cemp.2022.110250.76-92
3C Empresa. Investigación y pensamiento crítico. ISSN: 2254-3376
Ed. 50 Vol. 11 N.º 2 August - December 2022