This article covers a theme closely linked to the partner article in this issue. Developing a game-playing program with AI seems to offer boundless potential pitfalls. The topic has already been treated in "Lessons from the road - Championship level AI development", referenced in the other article, but here we take it further to identify more issues and solutions, in particular for AI outside linear evaluation.
Contrary to expectation, "improving" AI offers a great many ways to get it wrong. From "Lessons from the road - Championship level AI development", a particularly insidious pitfall is the looped illusion of progress that comes from testing the current version against the previous one. "Progress" can simply be an illusion.
Here the new version finds some means of exploiting a weakness of the previous version, giving the impression that it is stronger while leaving the underlying weakness intact. This can continue through generations, each incrementally exploiting such weaknesses a little further, and each time the new version beats the previous one. This compounds the illusion of progress, leading to the expectation that the much-mutated final version is far stronger than the first.
We know this from direct experience, and also from seeing a colleague suffer from it: their version appeared to get better in steps over a period of a year, only for a re-test against the version from a year earlier to expose that the new version was not merely little stronger, but actually significantly weaker. This was a sobering revelation, dispiritingly exposed just before a World Championship.
At such bleak times it is not hard to imagine that some malevolent spirit is looking over your shoulder, messing around with your sanity.
However this, together with a wide panoply of malaises seemingly designed to make development as accident-prone as possible, is just the start. In all the previously cited examples the underlying principle was that everything depended on linear evaluations, which deliver some pseudo-exact value for a position.
Having such a value massively helps the developer track down problems, even if they are deeply buried in some tree, as it at least offers an exact explanation for a failed evaluation.
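To make that contrast concrete, the minimal negamax sketch below shows how a single leaf evaluation in minimax survives unchanged all the way to the root, so a bad root choice can always be traced back to one exact line of play. The Position methods used here (evaluate(), is_terminal(), legal_moves(), play()) are hypothetical placeholders, not any particular engine's API.

def negamax(pos, depth):
    # Return (score, principal_variation) from the side to move's point of view.
    if depth == 0 or pos.is_terminal():
        return pos.evaluate(), []           # the pseudo-exact linear value
    best_score, best_line = float("-inf"), []
    for move in pos.legal_moves():
        score, line = negamax(pos.play(move), depth - 1)
        score = -score                      # opponent's best is our worst
        if score > best_score:
            best_score, best_line = score, [move] + line
    return best_score, best_line            # best_line pins the decision to one exact branch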
However issues ratchet up if there is no such linear evaluation, as follows:
AI Factory in recent years has been exploiting Monte Carlo Tree Search (MCTS), which grows trees that steer the search towards lines of play that are more successful. Unlike minimax with linear evaluation, the choice of move is not from any evaluation value, but from the number of votes cast for that move. Put another way, the move selected is the move most often explored.
This is a nebulous means of assessing the value of a move and, unlike minimax, a decision cannot be pinned down to exactly one branch that delivered the key refutation or winning line, with an exact evaluation that ripples down to the root. Instead it is the combination of many branches. This still allows a tool to be created to show where the searches went in a tree so that, for a game like chess, the developer can still follow which moves succeeded and which failed. We have had to apply this to our Gomoku, where we can export such a tree and explore it after the search is complete.
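As a rough illustration, a sketch along the following lines captures both ideas: the final move is simply the most visited child of the root, and the explored tree can be dumped afterwards to see where the search effort went. Node, best_move and dump are our own hypothetical names here for illustration, not the actual AI Factory code.

class Node:
    # One node of the explored MCTS tree: accumulated statistics, not an exact value.
    def __init__(self, move=None, parent=None):
        self.move, self.parent = move, parent
        self.children, self.visits, self.wins = [], 0, 0.0

def best_move(root):
    # The "vote": pick the move explored most often, not the one with the best value.
    return max(root.children, key=lambda c: c.visits).move

def dump(node, depth=0):
    # Export the searched tree after the event, most-explored lines first.
    for child in sorted(node.children, key=lambda c: -c.visits):
        print("  " * depth + f"{child.move}  visits={child.visits}  wins={child.wins:.0f}")
        dump(child, depth + 1)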
Such a search is dependent on win/loss success and it is hard to mix that with methods that use an evaluation. If the actual result were dependent on a returned linear evaluation, then with minimax a tweaked or added linear term could completely switch the choice of move once the required threshold was reached. With MCTS any tweak that can be applied will simply increase the chance of the desired move being selected. This whole process is an annoyingly indirect way of achieving the final desired goal. It is reminiscent of the Chinese maxim "Beat the Grass to Startle the Snake", attributed to Wang Lu during the Tang dynasty (and listed in the Chinese essay the "36 Stratagems" for use in politics and war): you do not apply a modification at the core of the issue, but apply methods that achieve the result indirectly.
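In practice such a tweak typically ends up as an extra bias term in the child-selection formula, something like the hypothetical fragment below (reusing the Node shape from the earlier sketch; the constants and the nudge() heuristic are illustrative assumptions). The bias only makes the desired move more likely to gather visits; it cannot flip the final answer at a threshold the way a tweaked linear term can in minimax.

import math

EXPLORATION = 0.7     # assumed exploration constant
NUDGE_WEIGHT = 0.1    # assumed weight for the indirect "nudge"

def uct_select(node, nudge):
    # Pick which child to explore next; nudge(move) is a small heuristic bonus.
    # (Assumes every child has already been visited at least once.)
    def score(child):
        exploit = child.wins / child.visits
        explore = EXPLORATION * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore + NUDGE_WEIGHT * nudge(child.move)
    return max(node.children, key=score)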
However AI Factory has also been deep into ISMCTS, "Information Set Monte Carlo Tree Search" (see "Reducing the burden of knowledge: Simulation-based methods in imperfect information games"), which adds a further obfuscating layer to this, as follows:
At least with MCTS and minimax, the developer can export a given tree after the search, analyse it reasonably easily and track down the reason for some move.
However with imperfect information games and ISMCTS it feels as though you could do the same, but you cannot. The reason is that each branch of the tree starts with a different "guessed" game position. Analysing imperfect information games (such as Spades and Bridge) could still apply some prescriptive calculation to determine a move, but common experience has shown that it is better to make multiple informed guesses of what the situation is and analyse the position as if the hidden information were exposed. The final move is then selected by another vote, which determines which play was most successful over the multiple guessed starting positions. This is the basis of ISMCTS.
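A deliberately compressed sketch of this "guess and vote" idea is given below. Here determinize() and search() are assumed game-specific helpers: for Spades, determinize() would deal the unseen cards into plausible opponent hands, and search() could be any perfect-information search such as the MCTS above. Full ISMCTS actually shares a single tree across all the guesses, which is exactly why the finished tree contains moves that are illegal for any one guessed position.

from collections import Counter

def choose_move(observed_state, guesses=50):
    votes = Counter()
    for _ in range(guesses):
        guessed = observed_state.determinize()   # one informed guess of the hidden information
        votes[search(guessed)] += 1              # best move as if everything were visible
    # The final vote: the move that did best across all the guessed positions.
    return votes.most_common(1)[0][0]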
However this is real fog to analyse. The final search tree that was explored is paradoxically full of illegal moves, linked to guessed positions found during the search. For all intents and purposes any chosen move in the tree bears no exact relevance to any single actual position. This is disquieting. Analysis therefore depends on following single branches for one "guessed" position (a "determinisation" in the previous article). You may see some incorrect line, but to examine it requires debugging tracing to be turned on when that position is repeated (which we can do); the ideal would be to extract that exact variation, with its particular guessed game position, and run a full independent search on it (which we currently cannot do).
This messes with your mind as you follow each new branch of the ISMCTS search, because the analysed position changes each time. For each branch you have to reset your expectation of the position as it morphs into a new version of the same position. This is taxing to analyse and throws away the simple premise of MCTS and minimax with perfect information games.
We have a rich set of tools to spike when some event occurs, allowing the developer to switch on nice colour-coded diagnostics to expose the reasons for decisions. This is helpful and generic, so is always available. We can also examine the tree searched, but again this is for a merged set of all positions examined, so its interpretation is unclear.
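In outline such a spike can be as simple as the hypothetical fragment below (the names, colours and trigger are ours, for illustration only, not the actual tooling): a generic check that is always compiled in, stays silent, and switches detailed colour-coded tracing on the moment the event of interest is seen.

TRACE = False

def spike(condition, tag):
    # Switch detailed tracing on the first time some event of interest occurs.
    global TRACE
    if condition and not TRACE:
        TRACE = True
        print("\033[93m[SPIKE] " + tag + " - detailed tracing enabled\033[0m")

def trace(msg):
    # Colour-coded diagnostic output, only emitted once a spike has fired.
    if TRACE:
        print("\033[96m" + msg + "\033[0m")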
To add further complexity, the value returned through the ISMCTS tree is not necessarily win/lose but a bounded value linked to a real linear evaluation. This does not ripple down directly to the root but simply influences the choice of move at each parent node.
The end product of all this is a vote on thousands of best moves from similar positions. This is far removed from the explicitly linked and rippled down linear evaluation of minimax.
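In code terms the back-up step might look something like the sketch below, again reusing the earlier Node shape and assuming the evaluation has already been squashed into a 0..1 bound. The value is folded into the statistics of each node it passes through, so it only shifts how attractive each parent's children look; no single leaf value arrives intact at the root.

def backup(node, value):
    # Fold one bounded result (0..1) into every node on the path back to the root.
    while node is not None:
        node.visits += 1
        node.wins += value          # later read back as the average wins / visits
        value = 1.0 - value         # flip perspective for the opposing side
        node = node.parent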
All this is hard work, so I'm sure that for many it is tempting instead to follow the alternative and dubious method of progressing: treat the whole system simply as a black box (even for minimax) and just add and fiddle with controls to see what move pops out. This seductive practice is hard to resist and, I'm sure, commonly adopted.
This actually winds back to the conclusion of the sister article "Keeping on the Tightrope" in that the path to success is to create easy ways to determine the validity and integrity of the moves chosen. This was already hard with minimax and is made much harder with ISMCTS. It requires more creative and effective tools to allow the developer to see inside these complex black boxes.
Key to success is getting these tools right. They have to be easy and natural to apply. As soon as they become cumbersome or difficult then they will be neglected, inviting failure.
This is not the end of this particular theme. There is still more to say about analysing not just individual games but also sets of games. The key is to improve the visibility of what your program is actually doing.
This is a work-in-progress for AI Factory.
Jeff Rollason - October 2015