Module 4 – Self-Check
As noted in the video lectures, it is possible to run value iteration for a 1-d
Markov Decision Process in a spreadsheet by explicitly modeling the π, V, and Q
arrays. For the self-check, you have been provided with a starter spreadsheet
(module4-self-check.xlsx). Fill in the formulas for Value Iteration (stochastic
version) so that it looks like the following:
1. The cell B1 is named “discount_rate”; you can refer to it by that name in
cell formulas, for example, “=discount_rate*10”.
2. The cell B2 is named “planned”; you can refer to it by that name in
formulas.
3. The cell B3 is named “surprise”; you can refer to it by that name in
formulas.
4. if(condition,then,else) can be used to test values in cells. It can also be
nested (only the outermost one needs a leading =).
5. =max(cellref1, cellref2) will return the larger of the two values.
6. You do not need to fill in formulas for the goal states 1 or 7 except in the
case of V(s).
7. Use “<” for “left”, “?” for “pick random” and “>” for “right” in your policy
formula.
8. There isn’t a good way to control the number of iterations automatically, so
epsilon is there to let you know when you should stop copying.
You should get the same results as above and converge in 10 iterations.
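
To check the spreadsheet against an independent calculation, the same update can be sketched in Python. This is a minimal sketch under several assumptions that do not come from the starter spreadsheet: there are 7 states with goals at 1 and 7, the step reward is zero, “planned” is the probability of moving in the chosen direction, “surprise” is the probability of moving the opposite way, and the goal values and the epsilon threshold are placeholders. The function name value_iteration and all of its parameters are hypothetical.

```python
def value_iteration(discount_rate=0.9, planned=0.8, surprise=0.2,
                    goal_values=(1.0, 1.0), n_states=7,
                    epsilon=1e-3, max_iterations=100):
    """Value iteration on a 1-d chain of states 1..n_states (a sketch).

    Assumed model: goal states 1 and n_states keep fixed values, the step
    reward is zero, and an action moves the agent in the chosen direction
    with probability `planned` and the opposite way with `surprise`.
    """
    # V[0] is unused so that list indexes match the state numbers.
    V = [0.0] * (n_states + 1)
    V[1], V[n_states] = goal_values
    policy = [""] * (n_states + 1)

    for iteration in range(1, max_iterations + 1):
        new_V = V[:]  # one new "column" computed from the previous one
        for s in range(2, n_states):  # goal states are not updated
            q_left = discount_rate * (planned * V[s - 1] + surprise * V[s + 1])
            q_right = discount_rate * (planned * V[s + 1] + surprise * V[s - 1])
            new_V[s] = max(q_left, q_right)
            # Same symbols as the policy formula: "<", "?", ">".
            if q_left > q_right:
                policy[s] = "<"
            elif q_right > q_left:
                policy[s] = ">"
            else:
                policy[s] = "?"
        # Largest change in V(s); plays the role of the epsilon check.
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < epsilon:
            break

    return V, policy, iteration
```

Calling value_iteration() returns the value list, the policy symbols, and the number of iterations used. The nested if/elif/else mirrors a nested if(condition,then,else) in a policy cell, and the loop stops once the largest change in V(s) falls below epsilon, which corresponds to the point where you would stop copying in the spreadsheet. The iteration count depends on the assumed values, so it need not match the 10 iterations quoted above.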