Writing cpu intensive AI without multi-threading - AI Factory newsletter article Spring 2005

Writing cpu intensive AI without multi-threading:

Authors: Jeff Rollason and Dan Orme

A general problem with processor intensive games is sharing the cpu resource between the UI and game engine. In times past, simple turn-based games solved this by having the program freeze during move calculation, only to return when calculation was complete. Of course some games polled the keyboard to check if the user was signalling for a move to complete, so that it could end the move calculation.

Things have moved on and increasingly games expect much more. Users expect the UI to be fully functional during thinking. On a basic level, they expect the mouse pointer to be active and to be able to activate menus. If the game has a 3D view, then they expect to be able to change the viewpoint. Items like clocks need to be updated when the program is thinking.

The problem behind this is that the relationship between UI and engine becomes more complex. This may have one of the following consequences:

1. The UI and engine are kept on separate execution threads,

2. The engine has embedded support for the UI to provide callback

Neither of these requirements is universally convenient.

Problems with multiple threads:

Having the engine on a separate thread may not be an option on many simple platforms and, where it is available, it increases the complexity of the application. With multi--threading you need to be able to synchronise data access with shared memory, requiring interlocks. If the 2nd thread is competing for the processor, then it may be necessary to modify the priority of the thread to get good control of the processor usage. For example, if the user starts panning the 3D view, it does not want the frame-rate to drop too low. This requires modification of the thread priority, or even suspending the thread.

Problems with embedded support for the UI to provide callback:

This is untidy as the embedded code will be platform specific. If using callback, then the overhead may be significant. This is not what you want if you want clean program sources that can be moved between different platforms without awkward hand-tailoring. As soon as some 3rd party is forced to modify the engine source, then they get out of sync with the original source, and so it gets harder to maintain updates. This then makes it hard for the engine authors to diagnose problems caused when the 3rd party UI developer modifies the engine.

The Ideal World

In this, the engine sources would contain no platform specific code and the UI would be able to fulfil all its processor management requirements using a simple single thread architecture. The advantage of the latter is that this simplifies product development. Many platforms are either difficult to work directly on, or have emulators which do not accurately reproduce the behaviour of multi-threads on the target platform. With a single serial thread, debugging is much easier.

Given a game engine that allowed move calculation to be broken into tiny sections, then it would be possible to quickly and painlessly switch between UI and move calculation tasks. Ideally a move calculate of the following form:

Calculate_Move(); // calculate the entire move in one big chunk

would be replaced by:

while ( Calculate_Move()==FALSE ) { // calc tiny bit of move. Return false if not finished

// calculation not finished

// so we can do something else in the meantime

}

Of course, in a real game application this would be structurally re-arranged (see below).

A solution:

One of the most common culprits for consuming processor resource for move calculation is "tree search". This is commonly the depth-first alpha-beta minimax search used by programs such as Chess, Shogi, Checkers, Othello and similar games. These do not conveniently break up into small parts as the minimax algorithm is expressed as a recursive tree search call. Jumping out requires the stack to be preserved. Also the flow of alpha-beta search does not conveniently fit a regime which frequently exits.

Breaking up Alpha-beta

This is a key area to consider, and so has been a core undertaking of our attempts to provide a true simple single thread engine architecture. To do this requires the unravelling of the recursive nature of alpha beta search. The end product (which is shared by all our game engines) uses the format above, using the generic call Fb_CalculateMove(). This requires the data held by the recursive call to be moved to a data structure, indexed by search depth. When the Fb_CalculateMove() function is called, it evaluates one node and then records the search state and exits. When the function is re-entered, the search state is tested and the appropriate point re-entered in the tree search. The current search depth is recorded and used to index into the current search stack.

This architecture has some small overhead for processing, but this is very small. It also has the added advantage of greatly reducing the stack requirement, as there is no recursive call. This is a critical advantage on many platforms where the available stack is very limited. Of course the algorithmic complexity increases, but this code is written as a generic module, so need only be written once.

Managing the processor share between engine and UI

This has to consider some constraints. The simplest solution is to embed the move calculation in a simple round-robin loop, as follows:

while (TRUE) {

// perform some UI function

...

// get another node result, if calculating

if ( calculating_move ) {

if ( Fb_CalculateMove() ) {

// move calculation complete

calculating_move = FALSE;

}

This is theoretically fine, but actually is probably unbalanced as it does not take into account the time taken to calculate a single node. It is likely that this is much less than the cost of passing through the UI loop, so actually the engine will calculate rather slowly. A solution is to actually run through the Fb_CalculateMove() loop a number of times each time move calculation is attempted. A reasonable figure for chess may be 1000 times for this "inner loop". This is very flexible as it allows the exact balance of the engine and UI to be finely tuned.

Of course there will be times when the UI requires close attention and times that it can afford to let the engine take almost all the processor resource.

Intelligent balancing between engine and the UI

Take this idea one stage further, and the UI can dynamically change the length of the inner loop to Fb_CalculateMove() directly. For example the UI could fix a desired frame rate, and tune the "inner loop" length to achieve this. When the user activates menus or pans the view, the framerate may instantly drop. The UI can immediately drop the loop length, so reserving the processor for the UI. In an extreme case, the loop could reduce to zero, in which case move calculation would actually stop. This dynamic mechanism could be used to make sure that the same product performs well on any platform type (e.g. fast or slow PC), or even when the same machine is sharing with other competing programs, without the UI interface dropping in accessibility.

Conclusion:

This unravelled alpha beta search is used in all our engines and allows these to be easily shipped to any platform, without any inconvenient platform tailoring of the engine.

Of course, this architecture can be used like a conventional recursive alpha beta, as the same looped access to Fb_CalculateMove() can sit on its own thread, if the developer requires. So you have all possibilities available.

Jeff Rollason and Dan Orme 2005