This article, by Cameron and Stephen, looks at "Long Life", a miniature 8x8 version of Conway's Game of Life and explores using the computer's natural architecture based on binary numbers to achieve a fast parallel implementation of this simulation. Each bit of a binary word can be regarded as an array of single bit variables. Given this, many of the scalar bit-wise operators become fast parallel processing instructions.
The enticing property of binary-based machines to achieve limited parallel processing within single machine words has been known and exploited many times. A notable example is Slate and Atkin's World champion program Chess 4.6 from 1977, which exploited every nook and cranny of the CDC 7600, with its then massive 60-bit words. It not only used the standard bitwise AND, OR, NOT, XOR and Shift instructions but also took advantage of the curious properties of instructions such as floating point normalize to convert a bit position into a usable index number, a million miles from its intended use.
This technical article shows how these principles can be applied to this classic simulation to achieve impressive gains over conventional integer-based solutions.
The Game of Life is a cellular automata or solitaire "simulation game" devised by mathematician John Conway in 1970 [1]. The player specifies an initial arrangement of "live" cells in a grid, then observes how this pattern changes over a number of generations as a given set of rules are applied to each state. The standard game is played on a square grid with the following rules:
1. Live cells with two or three live neighbours survive to the next generation,
else they die.
2. Dead cells with exactly three live neighbours become live cells in the
next generation.
The rules for the standard game are described as "B3/S23", meaning that dead cells with 3 live neighbours are Born and live cells with 2 or 3 live neighbours Stay alive.
Figure 1. A glider (left) and its next generation (right).
For example, Figure 1 shows a well-known pattern called the glider composed of five live cells (left), and the resulting pattern after the B3/S23 rules are applied (right). The numbers show how many live neighbours each live cell had in the previous state.
We introduce the term "Long Life" to describe a miniature 8x8 version
of the game, in which each board state is encoded as a single 64-bit
long
integer. For example the glider shown above is encoded as the following hexadecimal
number:
long
state = 0x1C10080000L;// bits describing glider
Each on-bit in this number corresponds to a live cell, as shown in Figure 2.
Figure 2. 64-bit encoding of a glider on the 8x8 grid.
The 8x8 grid wraps around at the edges, so that edge cells count as neighbours for the corresponding cells on opposite edges, making the grid topologically equivalent to the surface of a torus. This allows repeated cycles to occur, such as the glider's period 32 cycle shown below in Figure 3.
Figure 3. The 32-generation cycle of a glider in Long Life.
The glider repeats its shape every four generations. It also moves one step diagonally every four generations, which is described as a having a speed of c/4, hence the glider's cycle period on the 8x8 grid is 8x4 = 32 generations. Note that the glider wraps around the board edges to re-enter on the opposing sides, in toroidal fashion.
This version of the game is also called "Long Life" because its format came about through an investigation into creative search spaces defined by 64 bits, and the desire to seek the longest and most interesting Life cycles that can occur within this space.
We now present two algorithms for applying one generation of the B3/S23 rules to a given 64-bit state; a naïve iterative approach and a much faster bitwise-parallel approach.
The obvious way to calculate the next generation from a given state is to visit each cell, count its neighbours, and apply the S3/B23 rules:
for each row for each column count live neighbours (with wraparound) cell' <= (count==3 or (cell==1 and count==2)) ? 1 : 0
While this is easy to express, its implementation is complicated by the need
to handle neighbours that wrap around board edges, and the fact that cell
values are encoded as bits within an integer. The following function nborCount()
returns the number of live neighbours for the cell at location [row
, col
],
wrapping around board edges as appropriate:
final int
[][]adj
= { {-1,-1}, {-1,0}, {-1,1}, {0,1}, {1,1}, {1,0}, {1,-1}, {0,-1} };int
nborCount(final long
state,final int
row,final int
col) {int
count = 0;for
(int
nbor = 0; nbor < 8; nbor++) {final int
rr = (row +adj
[nbor][0] +dim
) %dim
;final int
cc = (col +adj
[nbor][1] +dim
) %dim
; count += (state >>> (rr *dim
+ cc)) & 1; }return
count; }
It's then just a matter of stepping through each cell, counting its neighbours, and turning on those bits that correspond to cells that satisfy the B3/S23 rules:
final int
dim
= 8;long
step(final long
state) {long
result = 0;for
(int
row = 0; row <dim
; row++)for
(int
col = 0; col <dim
; col++) {final int
count = nborCount(state, row, col);final long
bit = 1L << (row *dim
+ col);final boolean
on = (state & bit) != 0;if
(count == 3 || on && count == 2) result |= bit; }return
result; }
The above algorithm is correct and straightforward, but can be significantly improved by exploiting the single-integer representation, to perform the Life calculation over all cells at once in a bitwise-parallel manner. First, we define a mask for each set of edge cells, as shown in Figure 4:
static final long
LEFT
= 0x8080808080808080L;static final long
RIGHT
= 0x0101010101010101L;static final long
TOP
= 0x00000000000000FFL;static final long
BOTTOM
= 0xFF00000000000000L;static final long
NOT_LEFT
= ~LEFT
;static final long
NOT_RIGHT
= ~RIGHT
;
Figure 4. The four edge cell sets.
Next, we define three variables bit1
, bit2
and bit3
that act as binary registers
to hold the result of the live neighbour count for each cell. This forms a
bit plane adder in which each bit position within the registers accumulates
the count for a particular grid cell:
long
bit1
;long
bit2
;long
bit3
;
It is not necessary to count any higher than three bits, as any sum that flows over into the third bit will represent a count of at least 22 = 4 live neighbours, which means that that cell cannot live in the next generation. For example, Figure 5 shows how the bit values are accumulated for a given cell (cell 22 in this case) to give a binary value of 110 (= 6 in decimal). Cell 22 has six live neighbours, so will not itself live in the next generation.
Figure 5. The count for cell 22 is binary 110, indicating six live neighbours.
The following function add()
adds a neighbour value (0 or 1) to the count for each cell, and stores the result in the bit plane adder defined by bit1
, bit2
and bit3
, carrying values as needed. This is performed over all cells at once in a bitwise-parallel fashion, similar to Niemiec's suggestion of assigning three bits to each cell and applying a parallel adder [2]:
void
add(final long
cXX) {final long
carry1 =bit1
& cXX;final long
carry2 =bit2
& carry1;bit1
^= cXX;bit2
^= carry1;bit3
|= carry2; }
The following function brings this all together by: 1) shifting the eight neighbours into position relative to input state c11
, 2) accumulating neighbour counts in the three bit registers, and 3) applying a simple calculation that enforces the B3/S23 rules over all cells of the result at once:
long
step(final long
c11) {// Shift neighbors into position, with wraparound
final long
c10 = c11 >>> 8 | ((c11 &TOP
) << 56);final long
c12 = c11 << 8 | ((c11 &BOTTOM
) >>> 56);final long
c00 = (c10 &NOT_LEFT
) << 1 | ((c10 &LEFT
) >>> 7);final long
c01 = (c11 &NOT_LEFT
) << 1 | ((c11 &LEFT
) >>> 7);final long
c02 = (c12 &NOT_LEFT
) << 1 | ((c12 &LEFT
) >>> 7);final long
c20 = (c10 &NOT_RIGHT
) >>> 1 | ((c10 &RIGHT
) << 7);final long
c21 = (c11 &NOT_RIGHT
) >>> 1 | ((c11 &RIGHT
) << 7);final long
c22 = (c12 &NOT_RIGHT
) >>> 1 | ((c12 &RIGHT
) << 7);// Reset the bit registers
bit1
= 0;bit2
= 0;bit3
= 0;// Accumulate live neighbor counts
add(c00); add(c01); add(c02); add(c10); add(c12); add(c20); add(c21); add(c22);// Return live cases
return
((c11 |bit1
) &bit2
& ~bit3
); }
The input value c11 represents the unshifted cell values of the current state,
while c00... c22
represent their relative neighbours, shifted as appropriate,
as shown in Figure 6.
Figure 6. A cell and its eight neighbours.
After the eight neighbour values are accumulated in the bit1
, bit2
and bit3
registers, then live cells in the following generation will satisfy one of the following cases:
c11 | bit1 | bit2 | bit3 | |||
0 | 1 | 1 | 0 | // dead with 3 live neighbours |
||
1 | 0 | 1 | 0 | // live with 2 live neighbours |
||
1 | 1 | 1 | 0 | // live with 3 live neighbours |
These cases, and only these cases, are given by:
(c11 |bit1
) &bit2
& ~bit3
The iterative algorithm performs around 7,000 binary operations (ops) per generation - those nested loops do add up! - while the bitwise-parallel algorithm performs 82 ops per generation in the format shown above. A slightly optimised version of this code performs 71 ops per generation.
The iterative algorithm performs around 15,000 generations per second on a single thread of a standard MacBook laptop with i5 processor, while the bitwise-parallel method performs over 1,500,000 generations per second. This approximately 100 times speed-up is important when searching for interesting cyclic patterns, as such patterns are few and far between, and many generations must be calculated to test each one.
In terms of computational complexity, the iterative approach is ostensibly polynomial time due to its nested loops while the bitwise-parallel approach is constant time. However, this analysis should be taken with a grain of salt, as the data sizes involved are small and are in fact constant - the grid will always be 8x8 and the each cell will always have eight neighbours - so the performance comparisons given above are more meaningful.
The fast bitwise-parallel algorithm was devised by the second author, based on his earlier Fast Life algorithm for the Life Flow iPad app [3]. The Fast Life algorithm uses the triple-bit register approach to process entire rows at once in a bitwise-parallel manner, and applies to grids larger than 8x8. The Long Life algorithm shown here is the next step in efficiency, as it exploits the single integer representation to process the entire grid at once rather than row-by-row.
Other fast bitwise approaches for the Game of Life include those by Niemiec [2], Pepicelli [4], Hansel [5], Finch [6] and Cherniavsky [7], although none of these describe a method tailored to a single data type that processes the entire grid at once with wraparound. Our approach has potential application to an analogue implementation of the game for an 8x8 LED matrix, which has recently become a popular exercise for electronics students and hobbyists [8].
Complete Java code for the Long Life algorithm described in this article
can be found at:
http://www.cameronius.com/research/LongLife.java
This article shows how the representation of a problem can have a profound effect on its implementation. Choosing an appropriate representation, i.e. 64-bits encoded as a single long
integer, allows an algorithmic improvement that performs around 100 times fewer operations than an iterative approach to yield a 100 times speed-up, a greater improvement than is typically achieved with the incremental optimisation of code. This demonstrates the power of lateral thinking - literally in this case, as the bit plane adders that count live neighbours are orthogonal to the data types that they are adding.
[1] Gardner, M. (1970) "The fantastic combinations of John Conway's new solitaire game 'life'", Scientific American, 223, October, 120-123.
[2] Niemiec, M. (1979) "Life Algorithms", BYTE, January, 90-97.
[3] Browne, C. (2010) Life Flow, iPad app, https://itunes.apple.com/us/app/life-flow-for-ipad/id365729200?mt=8.
[4] Pepicelli, G. (2005) "Bitwise Optimization in Java: Bitfields, Bitboards, and Beyond", http://onjava.com/pub/a/onjava/2005/02/02/bitsets.html.
[5] Hensel, A. (1999) "About my Conway's Game of Life Applet", http://www.ibiblio.org/lifepatterns/lifeapplet.html.
[6] Finch, T. (2003) "An Implementation of Conway's Game of Life", http://dotat.at/prog/life/life.html.
[7] Cherniavsky, B. (2003) "BitLife - A Bitwise Stack of Life Games", http://bitlife.sourceforge.net.
[8] Ian (2010) "Arduino Game of Life on 8x8 LED Matrix", http://brownsofa.org/blog/archives/170.
// Edge cell masks
static final long
LEFT
= 0x8080808080808080L;static final long
RIGHT
= 0x0101010101010101L;static final long
TOP
= 0x00000000000000FFL;static final long
BOTTOM
= 0xFF00000000000000L;static final long
NOT_LEFT
= ~LEFT
;static final long
NOT_RIGHT
= ~RIGHT
;// Registers for bit plane adder
long
bit1
;long
bit2
;long
bit3
;// Add the count for neighbour cXX to the bit registers
void
add(final long
cXX) {final long
carry1 =bit1
& cXX;final long
carry2 =bit2
& carry1;bit1
^= cXX;bit2
^= carry1;bit3
|= carry2; }// Perform one step of the B3/S23 rules to state c11
long
step(final long
c11) {// Shift neighbors into position, with wraparound
final long
c10 = c11 >>> 8 | ((c11 &TOP
) << 56);final long
c12 = c11 << 8 | ((c11 &BOTTOM
) >>> 56);final long
c00 = (c10 &NOT_LEFT
) << 1 | ((c10 &LEFT
) >>> 7);final long
c01 = (c11 &NOT_LEFT
) << 1 | ((c11 &LEFT
) >>> 7);final long
c02 = (c12 &NOT_LEFT
) << 1 | ((c12 &LEFT
) >>> 7);final long
c20 = (c10 &NOT_RIGHT
) >>> 1 | ((c10 &RIGHT
) << 7);final long
c21 = (c11 &NOT_RIGHT
) >>> 1 | ((c11 &RIGHT
) << 7);final long
c22 = (c12 &NOT_RIGHT
) >>> 1 | ((c12 &RIGHT
) << 7);// Reset the bit registers
bit1
= 0;bit2
= 0;bit3
= 0;// Accumulate live neighbor counts
add(c00); add(c01); add(c02); add(c10); add(c12); add(c20); add(c21); add(c22);// Return live cases
return
((c11 |bit1
) &bit2
& ~bit3
); }
Cameron Browne and Stephen Tavener - Imperial College London - March 2013