Discrete Markov Chains

Markov Chains: Definitions and Representations

A stochastic process X={x(t):tT}X = \{ x(t): t\in T\} is a collection of random variables.

There are two elements:

  • Time tt:
    • discrete time (TT is a countably infinite set; under this case, we call ‘Markov chain’)
    • continuous time (under this case, we call ‘Markov process’)
  • Space Ω\Omega:
    • discrete space (XtX_{t} comes from a countably infinite set)
    • continuous space.

Markov chain is a discrete-time process for which the future behaviour, given the past and the present, only depends on the present and not on the past.

Markov process is the continuous-time version of a Markov chain.

Definition 1. [Markov chain] A discrete time stochastic process $ X_0, X_1, X_2, $. . . is a Markov chain if

P(Xt=atXt1=at1,Xt2=at2,...,X0=a0)=P(Xt=atXt1=at1)=Pat1,atP(X_{t} = a_t | X_{t-1} = a_{t-1}, X_{t-2} = a_{t-2}, ..., X_0 = a_0) = P(X_{t} = a_t | X_{t-1} = a_{t-1}) = P_{a_{t-1}, a_{t}}

Remark 1: This is time-homogeneous markov chain, for t\forall t, for at1,atΩ\forall a_{t-1}, a_{t} \in \Omega, the transition probability Pat1,atP_{a_{t-1}, a_{t} } is the same.

Remark 2: In DDPM, it is not a time-homogeneous chain, as the transition probability at t is obtained by a network(t).

The state XtX_{t} depends on the previous state Xt1X_{t-1} but is independent of the particular history Xt2,Xt3,...X_{t-2}, X_{t-3},.... This is called the Markov property or memoryless property.

The Markov property does not imply that XtX_{t} is independent of the random variables X0X_{0}, X1X_{1},…, Xt2X_{t-2}; it just implies that any dependency of XtX_{t} on the past is captured in the value of Xt1X_{t-1}.

The Markov chain is uniquely defined by the one-step transition probability matrix P:

P=(P0,0P0,1P0,jPi,0Pi,1Pi,j)P = \begin{pmatrix} P_{0,0} & P_{0, 1} & \cdots & P_{0, j} & \cdots\\ \vdots & \vdots & \ddots & \vdots& \vdots\\ P_{i,0} & P_{i, 1} & \cdots & P_{i, j} & \cdots\\ \vdots & \vdots & \ddots & \vdots& \vdots\\ \end{pmatrix}

where Pi,jP_{i,j} is the probability of transition from state ii to state jj. Pi,j=P(Xt=jXt1=i),i,jΩP_{i,j} = P(X_{t} = j| X_{t-1} = i), i,j \in \Omega. For i\forall i, j0Pi,j=1\sum_{j \geq 0} P_{i,j} = 1.

Classification of States

For simplicity, we assume that the state space Ω\Omega is finite.

Communicating class

Definition 2. [Communicating class] A state jj is reachable from state ii if there exists a positive integer nn such that Pi,j(n)>0P_{i,j}^{(n)} > 0. We write iji \rightarrow j. If jj is reachable
from ii, and ii is reachable from jj, then the states ii and jj are said to communicate, denoted by iji \leftrightarrow j. A communicating class CC is a maximal set of states that communicate with each other. No state in CC communicates with any state not in CC.

Irreducible

Definition 3. A Markov chain is irreducible if all states belong to one communicating class.

This means that any state can be reached from any other state. For i,jΩ\forall i, j \in \Omega, Pi,j>0P_{i,j} > 0.

Lemma 1. A finite Markov chain is irreducible if and only if its graph representation is a strongly connected graph.

Transient vs Recurrent states

Let ri,jtr_{i,j}^{t} denote the probability that the chain, starting at state ii, the first time transition to state jj occurs at time tt. That is,

ri,jt=P(Xt=j,Xsj,1st1X0=i)r_{i,j}^{t} = P(X_{t} = j, X_{s} \neq j, \forall 1 \leq s \leq t-1 | X_{0} = i)

Definition 4. A state is recurrent if t1ri,it=1\sum_{t \geq 1} r_{i,i}^{t} = 1 and it is transient if t1ri,it<1\sum_{t \geq 1} r_{i,i}^{t} < 1. A Markov chain is recurrent if every state in the chain is recurrent.

  • If state i is recurrent then, once the chain visits that state, it will (with probability 1)
    eventually return to that state. Hence the chain will visit state ii over and over again,
    infinitely often.

  • A transient state has the property that a Markov chain starting at this state returns to this state only finitely often, with probability 1.

  • If one state in a communicating class is transient (respectively, recurrent) then all states in that class are transient (respectively, recurrent).

Definition 5. An irreducible Markov chain is called recurrent if at least one (equivalently, every) state in this chain is recurrent. An irreducible Markov chain is called transient if at least one (equivalently, every) state in this chain is transient.

Let μi=t1tri,it\mu_{i} = \sum_{t \geq 1} t \cdot r_{i,i}^{t} denote the expected time to return to state ii when starting at state ii.

Definition 6. A state ii is positive recurrent if μi<\mu_{i} < \infty and null recurrent if μi=\mu_{i} = \infty.

Here we give an example of a Markov chain that has null recurrent states. Consider the following markov chain whose states are the positive integers.

Fig. 1. An example of a Markov chain that has null recurrent states

Starting at state 1, the probability of not having returned to state 1 within the first tt steps is

j=1tjj+1=1t+1.\prod_{j=1}^{t} \frac{j}{j+1} = \frac{1}{t+1}.

The probability of never returning to state 1 from state 1 is 0, and state 1 is
recurrent. Thus, the probability of the first time transition to state 11 occurs at time tt is

r1,1t=1t1t+1=1t(t+1).r_{1,1}^{t} = \frac{1}{t} \cdot \frac{1}{t+1} = \frac{1}{t(t+1)}.

The expected number of steps until the first return to state 1 when starting at state 1 is

μ1=t=1tr1,1t=t=11t+1=.\mu_{1} = \sum_{t = 1}^{\infty} t \cdot r_{1,1}^{t} = \sum_{t = 1}^{\infty} \frac{1}{t+1} = \infty.

State 1 is recurrent but null recurrent.

Lemma 2. In a finite Markov chain:

  1. at least one state is recurrent; and
  1. all recurrent states are positive recurrent.

Thus, all states of a finite, irreducible Markov chain are positive recurrent.

Periodic vs Aperiodic states

Definition 7. A state jj in a discrete time Markov chain is periodic if there exists an integer k>1k>1 such that P(Xt+s=jXt=j)=0P(X_{t+s}= j | X_t = j) = 0 unless ss is divisible by kk. A discrete time Markov chain is periodic if any state in the chain is periodic. A state or chain that is not periodic is aperiodic.

A state ii is periodic means that for s=k,2k,3k,...s = k, 2k, 3k,..., P(Xt+s=jXt=j)>0P(X_{t+s}= j | X_t = j) > 0.

NB: k > 1

Ergodic

Definition 8. An aperiodic, positive recurrent state is an ergodic state. A Markov chain is ergodic if all its states are ergodic.

Corollary 1. Any finite, irreducible, and aperiodic Markov chain is an ergodic chain.

Stationary distribution

Consider the two-state “broken printer” Markov chain:

Transition diagram for the two-state broken printer chain

There are two state (0 and 1) in this Markov chain, and assume that the initial distribution is

P(X0=0)=βα+β,P(X0=1)=αα+β.P(X_0 = 0) = \frac{\beta}{\alpha+\beta}, \qquad P(X_0 = 1) = \frac{\alpha}{\alpha+\beta}.

Then, according to the transition probability matrix PP, after one step, the distribution is

\begin{align*} P(X_1 = 0) &= P(X_0 = 0)P(X_1 = 0 | X_0 = 0) + P(X_0 = 1)P(X_1 = 0 | X_0 = 1) \\ &= \frac{\beta}{\alpha+\beta} \cdot (1-\alpha) + \frac{\alpha}{\alpha+\beta} \cdot \beta = \frac{\beta}{\alpha+\beta}, \\ P(X_1 = 1) &= P(X_0 = 0)P(X_1 = 1 | X_0 = 0) + P(X_0 = 1)P(X_1 = 1 | X_0 = 1) \\ &= \frac{\beta}{\alpha+\beta} \cdot \alpha + \frac{\alpha}{\alpha+\beta} \cdot (1-\beta) = \frac{\alpha}{\alpha+\beta}. \end{align*}

Apparently, the distribution of X1X_1 is the same as the initial distribution. Similarly, we can prove that the distribution of XtX_t is the same as the initial distribution for any tt. Here, π=(βα+β,αα+β)\pi = (\frac{\beta}{\alpha+\beta}, \frac{\alpha}{\alpha+\beta}) is called stationary distribution.

Definition 9. A probability distribution π=(πi)\pi = (\pi_i), iΩπi=1\sum_{i \in \Omega} \pi_i = 1(row vector) on the state space Ω\Omega is called a stationary distribution (or an equilibrium distribution) for the Markov chain with transition probability matrix PP if π=πP\pi = \pi P, equivalently, πj=iΩπiPi,j\pi_j = \sum_{i \in \Omega}\pi_i P_{i,j} for all jΩj \in \Omega.

  • One interpretation of the stationary distribution: if we started off a thousand Markov chains, choosing each starting position to be state ii with probability πi\pi_i, then(roughly) 1000πj1000 \pi_j of them would be in state jj
    at any time in the future – but not necessarily the same ones each time.

  • If a chain ever reaches a stationary distribution then it maintains that distribution for all future time, and thus a stationary distribution represents a steady state or an equilibrium in the chain’s behavior.

Finding a stationary distribution

Consider the following no-claims discount Markov chain with state space Ω={1,2,3}\Omega = \{1,2,3\} and transition matrix

P=(143401403401434)P = \begin{pmatrix} \frac{1}{4} & \frac{3}{4} & 0\\ \frac{1}{4} & 0 & \frac{3}{4}\\ 0 & \frac{1}{4} & \frac{3}{4} \end{pmatrix}

  • Step 1:
    Assume $\pi = {\pi_1, \pi_2, \pi_3} $ is a stationary distribution. According to the definition 9 of stationary distribution, we need to solve the following equations:

\begin{align*} \pi_1 &= \frac{1}{4}\pi_1 + \frac{1}{4}\pi_2, \\ \pi_2 &= \frac{3}{4}\pi_1 + \frac{1}{4}\pi_3, \\ \pi_3 &= \frac{3}{4}\pi_2 + \frac{3}{4}\pi_3. \end{align*}

Adding the normalising condition π1+π2+π3=1\pi_1 + \pi_2 + \pi_3 = 1, we get four equations in three unknown parameters.

  • Step 2:
    Choose one of the parameters, say π1\pi_1, and solve for the other two parameters in terms of π1\pi_1. We get

π1=14π1+14π2π2=3π1,π3=3π2=9π1.\pi_1 = \frac{1}{4}\pi_1 + \frac{1}{4}\pi_2 \Rightarrow \pi_2 = 3\pi_1, \qquad \pi_3 = 3\pi_2 = 9\pi_1.

  • Step 3:
    Combining with the normalising condition, we get

π1+3π1+9π1=1π1=113,π2=313,π3=913.\pi_1 + 3\pi_1 + 9\pi_1 = 1 \Rightarrow \pi_1 = \frac{1}{13}, \qquad \pi_2 = \frac{3}{13}, \qquad \pi_3 = \frac{9}{13}.

Finally, we get the stationary distribution π=(113,313,913)\pi = (\frac{1}{13}, \frac{3}{13}, \frac{9}{13}).

Existence and uniqueness

Given a Markov chaine, how can we know whether it has a stationary distribution? If it has, is it unique? At this part, we will answer these questions.

Some notations:

  • Hitting time to hit the state jj: Hj=min{t{0,1,2,...}:Xt=j}H_{j} = \min \{ t \in \{0, 1, 2,...\}: X_t = j\}. Note that here we include time t=0t = 0.

  • Hitting probability to hit the state jj staring from state ii: hi,j=P(Xt=j,for some t0X0=i)=P(Hj<X0=i)=t0ri,jth_{i,j} = P(X_t = j, \text{for some} \ t \geq 0 | X_0 = i) = P(H_{j} < \infty | X_0 = i) = \sum_{t \geq 0} r_{i,j}^{t}.

Note that this is different from ri,jtr_{i,j}^{t}, which denotes the probability that the chain, starting at state ii, the first time transition to state jj occurs at time tt.

We also have

hi,j={kΩPi,khk,j,ifji,1,ifj=i.h_{i,j} = \begin{cases} \sum_{k \in \Omega}P_{i,k}h_{k,j} & , & \text{if} \quad j \ne i, \\ 1 & , & \text{if} \quad j = i. \end{cases}

  • Expected hitting time: ηi,j=E(HjX0=i)=t0tri,jt\eta_{i,j} = E(H_{j} | X_0 = i) = \sum_{t \geq 0} t \cdot r_{i,j}^{t}. The expected time until we hit state jj starting from state ii.
    We also have

ηi,j={1+kΩPi,kηk,j,ifji,0,ifj=i.\eta_{i,j} = \begin{cases} 1 + \sum_{k \in \Omega}P_{i,k}\eta_{k,j} & , & if j \ne i, \\ 0 & , & if j = i. \end{cases}

(For the first case, we add 1 because we need to consider the first step from state ii to state kk.)

  • Return time: Mi=min{t{1,2,...}:Xt=i}M_i = \min \{ t \in \{1, 2,...\}: X_t = i\}. It is different from HiH_{i}, as we exclude time t=0t = 0. It is the first time that the chain returns to state ii after t=0t = 0.

  • Return probability: mi=P(Xt=i for some n1X0=i)=P(Mi<X0=i)=t>1ri,it.m_{i} = P(X_t = i \ \text{for some} \ n \geq 1 | X_0 = i) = P(M_i < \infty | X_0 = i) = \sum_{t>1}r_{i,i}^{t}.

  • Expected return time: μi=E(MiX0=i)=t1tri,it\mu_{i} = E(M_i | X_0 = i) = \sum_{t \geq 1} t \cdot r_{i,i}^{t}. The expected time until we return to state ii starting from state ii.

mi=jΩPi,jhj,i,μi=1+jΩPi,jηj,i.m_{i} = \sum_{j \in \Omega} P_{i,j}h_{j,i}, \qquad \mu_{i} = 1 + \sum_{j \in \Omega} P_{i,j}\eta_{j,i}.

Theorem 1. Consider an irreducible Markov chain (finite or infinite), (1) if it is positive recurrent, \exists an unique stationary distribution π\pi, such that πi=1μi\pi_i = \frac{1}{\mu_{i}}. (2) if it is null recurrent or transient, no stationary distribution exists.

Remark 3: If the chain is finite irreducible, it must be positive recurrent, thus it has an unique stationary distribution.

Remark 4: If the Markov chain is not irreducible, we can decompose the state space into several communicating classes. Then, we can consider each communicating class separately.

  • If none of the classes are positive recurrent, then no stationary distribution exists.

  • If exactly one of the classes is positive recurrent (and therefore closed), then there exists a unique stationary distribution, supported only on that closed class.

  • If more the one of the classes are positive recurrent, then many stationary distributions will exist.

Now, we give the proof of Theorem 1. We first prove that if a Markov chain is irreducible and positive recurrent, then there exists a stationary distribution. Next, we will prove the stationary distribution is unique. Since the second part with the null recurrent or transitive Markov chains is less important and more complicated, we will omit it. If you are interested in it, you can refer to the book Markov Chains by James Norris.

Proof.
(1) Suppose that (X0,X1...)(X_0, X_1 ...) is a recurrent Markov chain, which can be positive recurrent or null recurrent. Then we can desigh a stationary distribution as follows. (If we can desigh a stationary distribution, then it must be existed.)

Let νi\nu_i be the expected number of visits to ii before we return back to kk,

\begin{align*} \nu_i &= \mathbb{E}(\# \text{visits to $i$ before returning to } k | X_0 = k) \\ &= \mathbb{E}\sum_{t=1}^{M_k} P(X_t = i | X_0 = k) \\ &= \mathbb{E}\sum_{t = 0}^{M_k - 1} P(X_t = i | X_0 = k) \end{align*}

The last equation holds because of $ P(X_0 = i | X_0 = k) = 0$ and $ P(X_{M_k} = i | X_0 = k) = 0$.

If we want design a stationary distribution, it must statisfy πP=π\pi P = \pi and iΩπi=1\sum_{i \in \Omega}\pi_i = 1.

(a) We first prove that νP=ν\nu P = \nu.

\begin{align*} \sum_{i \in \Omega} \nu_i P_{i,j} &= \mathbb{E}\sum_{i \in \Omega} \sum_{t = 0}^{M_k - 1} P(X_t = i, X_{t+1} = j | X_0 = k) \\ &= \mathbb{E}\sum_{t = 0}^{M_k - 1} \sum_{i \in \Omega} P(X_t = i, X_{t+1} = j | X_0 = k) \\ &= \mathbb{E} \sum_{t = 0}^{M_k - 1} P(X_{t+1} = j | X_0 = k) \\ &= \mathbb{E} \sum_{t = 1}^{M_k } P(X_{t} = j | X_0 = k) \\ &= \mathbb{E} \sum_{t = 0}^{M_k - 1} \nu_i \\ &= \nu_j. \end{align*}

(b) Next, what we need to do is to normalize ν\nu to get a stationary distribution. We have

iΩνi=iΩEt=0Mk1P(Xt=iX0=k)=Et=0Mk1iΩP(Xt=iX0=k)=E(MkX0=i)=μk.\sum_{i \in \Omega} \nu_i = \sum_{i \in \Omega} \mathbb{E} \sum_{t = 0}^{M_k - 1} P(X_t = i | X_0 = k) =\mathbb{E} \sum_{t = 0}^{M_k - 1} \sum_{i \in \Omega} P(X_t = i | X_0 = k) = E(M_k | X_0 = i) = \mu_k.

Thus, we can define πi=νi/μk\pi_i = \nu_i/\mu_k, π={πi,iΩ}\pi = \{\pi_i, i \in \Omega\} is one of the stationary distribution.

(2) Next, we prove that if a Markov chain is irreducible and positive recurrent, then the stationary distribution is unique and is given by πj=1μj\pi_j = \frac{1}{\mu_j}.
Given a stationary distribution π\pi, if we prove that for all ii, πj==1μj\pi_j == \frac{1}{\mu_j}, then we prove that the stationary distribution is unique.

Remember that the expected hitting time:

ηi,j=1+kΩPi,kηk,j,ji(eq:1)\eta_{i,j} = 1 + \sum_{k \in \Omega}P_{i,k}\eta_{k,j}, j \ne i \qquad (eq:1)

We multiply both sides of (eq:1) by πi\pi_i and sum over i(ij)i (i \ne j) to get

ijπiηi,j=ijπi+ijkΩπiPi,kηk,j\sum_{i \ne j} \pi_i \eta_{i,j} = \sum_{i \ne j} \pi_i + \sum_{i \ne j} \sum_{k \in \Omega} \pi_i P_{i,k}\eta_{k,j}

Since ηj,j=0\eta_{j,j} = 0, we can rewrite the above equation as

iΩπiηi,j=ijπi+ijkΩπiPi,kηk,j.(eq:2)\sum_{i \in \Omega} \pi_i \eta_{i,j} = \sum_{i \ne j} \pi_i + \sum_{i \ne j} \sum_{k \in \Omega} \pi_i P_{i,k}\eta_{k,j}. \qquad (eq:2)

(The above equality lacks jj, and we also want to design πj=1/μj\pi_j = 1/\mu_j.)
Remember that the expected return time:

μj=1+iΩPj,iηi,j.(eq:3)\mu_{j} = 1 + \sum_{i \in \Omega} P_{j,i}\eta_{i,j}. \qquad (eq:3)

We multiply both sides of (eq:2) by πj\pi_j to get

πjμj=πj+kΩπjPj,kηk,j(eq:4) \pi_j \mu_{j} =\pi_j + \sum_{k \in \Omega} \pi_j P_{j,k}\eta_{k,j} \qquad (eq:4)

Adding (eq:2) and (eq:4), we get

\begin{align*} \sum_{i \in \Omega} \pi_i \eta_{i,j} + \pi_j \mu_{j} &= \sum_{i \in \Omega} \pi_i + \sum_{i \in \Omega} \sum_{k \in \Omega} \pi_i P_{i,k}\eta_{k,j} \\ &= 1 + \sum_{k \in \Omega} \sum_{i \in \Omega} \pi_i P_{i,k}\eta_{k,j} \\ &= 1 + \sum_{k \in \Omega} \pi_k \eta_{k,j} \qquad (\text{since} \sum_{i \in \Omega} \pi_i P_{i,k} = \pi_k) \\ \end{align*}

Since the Markov chain is irreducible and positive recurrent, that means all states belong to a communication class and the expected return time of each state is finite. Thus, the space Ω\Omega is a finite dimensional space. We can substract kΩπkηk,j\sum_{k \in \Omega} \pi_k \eta_{k,j} and $\sum_{i \in \Omega} \pi_i \eta_{i,j} $ (equal) from both sides of the above equation to get

πjμj=1,\pi_j \mu_{j}=1,

which means πj=1/μj\pi_j = 1/\mu_j. Similarly, we can prove that πi=1/μi\pi_i = 1/\mu_i for all iΩi \in \Omega.

Theorem 2. (Limit theorem) Consider an irreducible, aperiodic Markov chain (maybe infinite), we have limtPi,jts=1μj\lim\limits_{t \to \infty} P_{i,j}^{t} s= \frac{1}{\mu_{j}}. Specially,

(1) Suppose the Markov chain is positive recurrent. Then limtPi,jt=πj=1μj\lim\limits_{t \to \infty} P_{i,j}^{t} = \pi_j = \frac{1}{\mu_{j}}.

(2) Suppose the Markov chain is null recurrent or transient. Then there is no limite probability.

Three conditions for convergence to an equilibrium probability distribution: irreducibility, aperiodicity, and positive recurrence. The limit probability

P=(π1π2πNπ1π2πNπ1π2πN)P = \begin{pmatrix} \pi_1 & \pi_2 & \cdots & \pi_N\\ \pi_1 & \pi_2 & \cdots & \pi_N\\ \vdots & \vdots & \ddots & \vdots\\ \pi_1 & \pi_2 & \cdots & \pi_N\\ \end{pmatrix}

where each row is identical.

Define Vi,jt={n<tXn=j}V_{i,j}^{t} = |\{ n < t | X_n = j\}|.
Vi,jtV_{i,j}^{t} is the number of visits to state jj before time tt starting from state ii. Then we can interpret Vi,jt/tV_{i,j}^{t}/t as the proportion of time up to time tt spent in state jj.

Theorem 3. [Ergodic theorem] Consider an irreducible Markov chain, we have limtVi,jt/t=1μj\lim\limits_{t \to \infty} V_{i,j}^{t}/t = \frac{1}{\mu_{j}} almost surely. Spectially,

(1) Suppose the Markov chain is positive recurrent. Then limtVi,jt/t=πj=1μj\lim\limits_{t \to \infty} V_{i,j}^{t}/t = \pi_j = \frac{1}{\mu_{j}} almost surely.

(2) Suppose the Markov chain is null recurrent or transient. Then $ V_{i,j}^{t}/t \to 0$ almost surely for all jj.

almost surely means that the convergence probability of the event is 1.

Theorem 4. [Detailed balance condition] Consider a finite, irreducible, and ergodic Markov chain with transition matrix PP. If there are nonnegative numbers πˉ=(π0,π1,...,πn)\bar{\pi} = (\pi_0, \pi_1, ..., \pi_n) such that i=0nπi=1\sum_{i=0}^{n} \pi_i = 1 and if, for any pair of states i,ji, j,

πiPi,j=πjPj,i,\pi_i P_{i,j} = \pi_{j} P_{j,i},

then πˉ\bar{\pi} is the stationary distribution corresponding to PP.

Proof.

iπiPi,j=iπjPj,i=πj\sum_{i} \pi_i P_{i,j} = \sum_{i}\pi_{j} P_{j,i} = \pi_{j}

Thus, πˉ=πˉP\bar{\pi} = \bar{\pi}P. Since this is a finite, irreducible, and ergodic Markov chain, πˉ\bar{\pi} must be the unique stationary distribution of the Markov chain.

Remark 5: Theorem 2 is a sufficient but not necessary condition.

Reference

  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
  • Copyrights © 2021-2026 Xue Yu
  • Visitors: | Views: