Thursday, March 10, 2016

Comments on AlphaGo vs Lee Sedol


In today's post, we're going to go a little bit off the usual fare of Warhammer, painting, and roleplaying games, and talk a bit about the board game of Go. In the world of Artificial Intelligence, the game of Go occupies a high position. This is because unlike other games (Chess, Draughts / Checkers), there are a huge number of potential plays (legal moves) and possible board positions. What does huge mean in this context? Well, in Chess the typical number of legal moves per board position is about 35 and the depth of the game (the sheer number of turns required to complete a game) is about 85. This compares to values of 250 and 150 (respectively) for Go (Silver et al., 2016, Nature, 529, 484).
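To get a feel for what those two numbers imply, here is a quick back-of-the-envelope calculation: with branching factor b and game depth d, the game tree contains very roughly b^d positions. (The Chess and Go figures are the ones quoted above; the arithmetic is just illustrative.)

```python
import math

# Rough branching factor (b) and typical game depth (d),
# as quoted above.
chess_b, chess_d = 35, 85
go_b, go_d = 250, 150

# Work in log10 to avoid astronomically large integers:
# log10(b**d) = d * log10(b).
chess_exp = chess_d * math.log10(chess_b)
go_exp = go_d * math.log10(go_b)

print(f"Chess game tree: ~10^{chess_exp:.0f} positions")  # ~10^131
print(f"Go game tree:    ~10^{go_exp:.0f} positions")     # ~10^360
```

For comparison, the number of atoms in the observable Universe is usually put at around 10^80 -- both games dwarf it, but Go is in a different league entirely.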

In turn, this means that it is a practical impossibility to "crack" the game of Go in the same way that Draughts / Checkers has been solved: by exploring every possible board position and finding the optimal strategy through brute force. Chess and Go, although not solved in this manner per se, can be tackled by pruning the search tree: reducing both the breadth and the depth of the moves considered to those more likely to be fruitful. This approach worked spectacularly well in Deep Blue's games against Garry Kasparov (the reigning World Chess Champion at the time).
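A minimal sketch of the classic pruning idea, for the curious: alpha-beta pruning skips branches (breadth) that provably cannot change the result, while a depth limit with a static evaluation function caps how far ahead the engine looks (depth). The game tree below is a hypothetical toy, not real Chess.

```python
def alphabeta(node, depth, alpha, beta, maximizing):
    # Leaf or depth limit reached: fall back to a static evaluation.
    # In this toy tree, leaves are plain numeric scores.
    if depth == 0 or not isinstance(node, list):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # cut-off: the opponent will never allow this line
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break  # cut-off for the minimizing player
        return value

# Toy game tree: nested lists are positions, numbers are evaluations.
tree = [[3, 5], [6, [9, 8]], [1, 2]]
print(alphabeta(tree, 3, float("-inf"), float("inf"), True))  # → 6
```

The point is that whole subtrees (like the second half of `[9, 8]`'s branch) are never examined at all -- which is exactly how engines cope with trees too big to enumerate.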

But Go seemed out of reach. Until the past few days. 

Enter Google and the DeepMind team who created AlphaGo. By using both policy networks (move selection) and value networks (is this particular board position / predicted board position any good for me?), coupled with supervised learning and reinforcement learning, AlphaGo has now (at the time of writing) beaten Lee Sedol, a 9-dan professional and one of the strongest Go players in the world, twice. There are 3 games remaining in the series.
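To sketch how those two networks might work together, here is a hypothetical toy (the function names and numbers are invented; AlphaGo's real search is Monte Carlo tree search, far richer than this): the policy network narrows the breadth by proposing a few promising moves, and the value network scores the resulting positions directly instead of searching to the end of the game.

```python
def policy_network(board):
    # Stub standing in for a trained network: prior probabilities
    # over candidate moves for this position. (Numbers invented.)
    return {"A": 0.6, "B": 0.3, "C": 0.1}

def value_network(board, move):
    # Stub: estimated probability of winning after playing `move`.
    return {"A": 0.52, "B": 0.58, "C": 0.40}[move]

def select_move(board, top_k=2):
    priors = policy_network(board)
    # Policy network cuts the breadth: only the top-k most
    # promising moves are examined, not every legal one.
    candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]
    # Value network cuts the depth: score each candidate position
    # directly, then pick the most promising.
    return max(candidates, key=lambda m: value_network(board, m))

print(select_move(board=None))  # → "B"
```

Note the interplay: move "B" has a lower prior than "A", but once the value network weighs in, it looks like the better bet.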

This is absolutely remarkable. 

To put it in perspective, there are more possible Go positions than there are atoms in the Universe. Despite really simple rules, Go is a highly complex game that few ever master. 

What this means is that to defeat a Go grandmaster, a computer must start thinking like a human. It must discard what it thinks are bad moves and think only about ones that will prove productive for it. In other words, it needs what can be described as "intuition" to play effectively. A gut instinct that a given play will lead to a better outcome. That's the level of AI we're talking about here.

All this said, something else really struck me whilst I was watching the live broadcasts of these games. And that is the way the commentators are talking. They are talking as if the AlphaGo program is a human player. At certain (rare) junctures, they comment that "this move is weak" or "that move is unexpected" about AlphaGo's plays. I didn't think much of these comments at the start (why should I? - they know much better than I do about "good plays" in Go). 


What was very striking about an interview with the DeepMind team was how they crafted the program. Rather than taking a purely human approach of building up a sizeable lead, or generating a "comfortable margin" (for lack of better words) in terms of the game's points, all the program seeks to do is win. The nature of the win - by a single point, or by many - is not relevant in the slightest.

Therefore, if the program makes a move that seems "weak" to a human player, it must be borne in mind that what it has done is optimised the probability of winning. Put another way: it could make a "strong" move that maximises the number of points it wins by but carries a lower probability of winning, or a "weak" move that maximises the probability of winning overall. The computer is not playing like a human. It plays to win, and it doesn't care whether it wins by 1 point or by 100 points. All that matters is that the win is secured. Therefore, apparently sub-optimal plays (as judged by not building a comfy margin) are actually optimising the winning probability. It is this that I found truly profound.
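The distinction above can be shown in a few lines of toy code (the moves and numbers are entirely invented for illustration): ranking candidate moves by expected margin picks the human-looking "strong" play, while ranking by win probability picks the apparently "weak" one.

```python
# Hypothetical candidate moves, each with
# (expected winning margin in points, probability of winning at all).
moves = {
    "aggressive": (15.0, 0.70),  # big margin, but riskier
    "solid":      (1.5,  0.90),  # tiny margin, near-certain win
}

best_by_margin = max(moves, key=lambda m: moves[m][0])
best_by_winprob = max(moves, key=lambda m: moves[m][1])

print(best_by_margin)   # → "aggressive" (the human-looking "strong" play)
print(best_by_winprob)  # → "solid" (the apparently "weak" play)
```

A human commentator scoring on margin sees "solid" as timid; an objective that only counts wins sees it as strictly better.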

I then got to wondering about 2 things.
(1) Could AlphaGo's human opponent, Lee Sedol, exploit this in some fashion?
(2) And can we (and do we actually already) apply this principle to other games -- even games that involve elements of chance like Warhammer 40,000?

I'll be watching the next few games with interest and pondering if Lee can come up with some magic to defeat the intuitive program somehow - especially if he finds a way to exploit its own methods against it.

[Image: used as part of Google's press pack download]
