Survey

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Survey

Document related concepts

Nash equilibrium wikipedia , lookup

Rock–paper–scissors wikipedia , lookup

Strategic management wikipedia , lookup

The Evolution of Cooperation wikipedia , lookup

Artificial intelligence in video games wikipedia , lookup

Porter's generic strategies wikipedia , lookup

Prisoner's dilemma wikipedia , lookup

Transcript

LP. Lecture Game theory Chapter 11: game theory I matrix games I optimal strategies I von Neumann’s minmax theorem I connection to LP I useful LP modeling of (certain) minmax and maxmin problems 1 / 11 Example: Paper-Scissors-Rock (= saks-pose-stein) The game: I I Two persons independently choose one of the three options: Paper, Scissors or Rock Rules: Paper beats Rock, Rock beats Scissors, Scissors beats Paper. Payoff matrix: 0 −1 1 0 −1 A= 1 −1 1 0 I I Row player (R) chooses a row i, the Column player (K) chooses a column j, and the payoff is the entry aij : the row player pays the column player aij kroner (NOK). Similar for a general m × n matrix A = [aij ]; this is called a Matrix game. 2 / 11 Pure strategies The choice above is called a pure strategy (or deterministic strategy): choose a row (or column). Note: in Paper-Scissors-Rock no pure strategy will be guaranteed to win (if the game is repeated), e.g., if R always chooses Paper K will soon choose Scissors. I Goal: analyse Matrix games in general Define PR (i) = maxj≤n aij : largest payoff for R using strategy i I PK (j) = mini≤m aij : smallest payoff for K using strategy j V∗ = maxj≤n PK (j) : largest guaranteed payoff to K V∗ = mini≤m PR (i) : smallest guaranteed payoff from R If PK (j) = V∗ then j is called a pure maxmin strategy. If PR (i) = V ∗ then i is called a pure minmax strategy. If V∗ = V ∗ , we say that the game has a value, namely V := V∗ = V ∗ . In Paper-Scissors-Rock: V∗ = −1 < 1 = V ∗ . 3 / 11 Example: Consider the matrix game 5 2 A= 1 2 1 4 given by 7 6 2 0 3 3 Then PK (1) = 1, PK (2) = 2, PK (3) = 2, PK (4) = 0, so V∗ = maxj PK (j) = 2. Furthermore: PR (1) = 7, PR (2) = 2, PR (3) = 4, so V ∗ = mini PR (i) = 2. Therefore V∗ = V ∗ = V = 2. A pure maxmin strategy for K is j = 3 since PK (3) = 2 = V , and a pure minmax strategy for R is i = 2 since PR (2) = 2 = V . Proposition (i) PK (j) ≤ aij ≤ PR (i) (i ≤ m, j ≤ n) (ii) PK (j) ≤ V∗ ≤ V ∗ ≤ PR (i) (i ≤ m, j ≤ n) 4 / 11 Proof. PK (j) = mink akj ≤ aij ≤ maxk aik = PR (i). And (ii) follows from (i) by first taking max over j, which gives PK (j) ≤ V∗ ≤ PR (i) and then taking min over i; this gives (ii). A pair (r , s) of strategies (for R and K) is called a saddle point if arj ≤ ars ≤ ais for all i, j so r is the best choice for R when K chooses s, and s is the best choice for K when R chooses r . Note: ars is smallest in its column, and largest in its row. Example: (r , s) = (2, 1) is saddlepoint in 3 5 A= 2 1 In the example on the previous page both (2, 2) and (2, 3) are saddlepoints. Some matrices have a saddlepoint, others do not. 5 / 11 Theorem The game has a value, player R has a pure minmax strategy r and player K has a pure maxmin strategy s if and only if (r , s) is a saddlepoint in A. In that case the value is V = ars . Proof. (i) Assume the game has a value V , player R has a pure minmax strategy r and player K has a pure maxmin strategy s. Then ais ≥ PK (s) = V∗ = V = V ∗ = PR (r ) ≥ arj (i ≤ m, j ≤ n) In particular, for i = r , j = s, we get ars ≥ V ≥ ars , so V = ars , and (again from the inequalities) ais ≥ ars ≥ arj for all i, j, which means that (r , s) is a saddlepoint. (ii) Assume (r , s) is a saddlepoint, so arj ≤ ars ≤ ais for alle i, j Then V∗ = max PK (j) ≥ PK (s) = min ais = ars j i V∗ and similarly = mini PR (i) ≤ PR (r ) = maxj arj = ars , so V∗ ≥ V ∗ . But, by the Proposition, V∗ = V ∗ and the equations imply that V∗ = V ∗ = ars , r is a pure minmax strategy for R and s is a pure maxmin strategy for K. 6 / 11 Randomized strategies I I I I The choice studied above is called a deterministic strategy: choose one row (or column). In Paper-Scissors-Rock no deterministic strategy can always win (if the game is played repeatedly), e.g., if R always chooses Paper, soon K will choose Scissors. May be better to use a randomized strategy: R chooses row i with probability yi , and, independently, K chooses column j with probability xj . So: Pm yi = 1, yi ≥ 0 (i ≤ m) Pi=1 m j=1 xj = 1, xj ≥ 0 (j ≤ n) The Expected payoff from R to K is (recall probability theory!): XX yi aij xj = y T A x i j 7 / 11 Which strategy to use? If player K chooses (randomized) strategy x, then the best choice for player R is to choose y so that y T A x is minimized (since R has to pay this amount). Therefore the best choice for K is to choose an x which is optimal in the problem max min y T A x x y This is called a maxmin strategy. Similarly analysis from player R’s perspective: the best choice for R is a y which is optimal in the problem min max y T A x y x This is called a minmax strategy. In the (simple) Paper-Scissors-Rock game it follows from symmetry that (1/3, 1/3, 1/3) is both a maxmin strategy (for K) and a minmax strategy (for R). 8 / 11 The maxmin problem: strategy for player K Let ei be the ith coordinate vector and e the all ones vector (of suitable size). Note that anPLP with the feasible set being the standard simplex S = {y : i yi = 1, y ≥ O} is easy, so we get: v ∗ = max min y T A x = max min eiT A x x y x i Therefore player K’s strategy problem may be written as the LP problem X max{v : v ≤ eiT A x (i ≤ m), xj = 1, x ≥ O} j n with variables v ∈ IR, x ∈ IR ; or in matrix notation: (LP-K) max s.t. v ve − A x ≤ O eT x = 1 x ≥O Thus: we can find an optimal strategy x for K efficiently by solving this LP. 9 / 11 The minmax problem: strategy for player R Similar analysis for player R: u ∗ = min max y T A x = min max y T A ej y x y j So, player R’s strategy problem becomes the LP problem X min{u : u ≥ y T A ej (j ≤ n), yi = 1, y ≥ O} i with variables u ∈ IR, y ∈ IRm ; which is (LP-R) min s.t. u ue − AT y ≥ O eT y = 1 y ≥O 10 / 11 The minmax theorem Theorem [John von Neumann(1928)] Let x ∗ be an optimal strategy for player K and y ∗ an optimal strategy for player R. Then v ∗ = max (y ∗ )T A x = min y T A x ∗ = u ∗ x y i.e., miny maxx y T A x = maxx miny y T A x. Proof. One can check that problem LP-R is the dual LP of problem LP-K. (Exercise!) So, by the duality theorem of LP the optimal value v ∗ of LP-K equals the optimal value u ∗ of LP-R, and this proves the theorem. I I I The common value v ∗ = u ∗ is called the value of the game: this is the expected payoff when both players play optimally It is also possible to prove the LP duality theorem from von Neumann’s theorem Solve the LP’s above, for some selected A’s, using OPL-CPLEX. 11 / 11