Our most recent product is the card game Euchre, which deals 5 cards to each player, from which the player needs to determine whether they can bid to make 3 or 5 tricks. After bidding, the gameplay is controlled by ISMCTS, but the bidding needs something different. We experimented with a number of different systems, which are detailed in this article.
We will not fully explain the rules here; they are best looked up on Wikipedia. However the essential elements are as follows:
1. 4 players organised as 2 partnerships, with each player dealt 5 cards.
2. The deck only includes 9 up to Ace, with the Jack of the trump suit counted as the highest card and the Jack of the same-colour suit second highest. Not all cards are dealt to players: one card is exposed as a trump-candidate turn-up and 3 cards are left in the deck.
3. A player can bid "Accept" if they believe their partnership can make 3 tricks using the turn-up as trump, and "Go Alone" if they think they can win 5 tricks with their partner sitting out. However that player can still win points even if they only make 3 tricks.
4. If you make your bid you win points. If you fail then the opposing partnership scores points.
5. The game ends when one of the two partnerships reaches a specified points total.
This glosses over the fact that bidding can have two stages, but it is not necessary to know this to understand this article.
We already have a Spades program, and our Euchre inherited the original probability system used by that. This was relatively exotic: it used logic to calculate the probability that each card might win a trick by simply applying the probability that the opposing partnership would be able to win the trick. This was not exact, as whether you can win a trick with a card depends on much complex interplay, and on whether that card is led or played at the end of the trick. The chance of winning is enhanced for a trump card if the hand is void, or near void, in any suit.
From this the bid value became the sum of the fractional tricks won by each card, giving a net expected number of tricks.
e.g. with a card X that has a 33% probability of winning a trick and a card Y with a 90% probability, together these would be expected to win 123% of a trick, i.e. 1.23 tricks.
This was tuned in block games to determine the best threshold for each possible bid, in this case "Accept" (accepting the turn-up card as trumps) and "Go Alone".
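As a rough sketch of this first scheme (the probability engine itself is not shown, and the thresholds here are illustrative placeholders rather than the values tuned in block games):

    typedef int TInt;   // stand-in for the engine's integer type

    const TInt KProbOne = 32512;   // fixed-point scale: 32512 == 100%, as in the tables below

    enum TBid { EPass, EAccept, EGoAlone };

    // Sum the fractional tricks each card is expected to win, then compare
    // against tuned thresholds. The threshold values are placeholders only.
    TBid ChooseBid(const TInt aCardWinProb[], TInt aNumCards)
        {
        TInt expectedTricks = 0;                    // fixed-point trick count
        for (TInt i = 0; i < aNumCards; i++)
            expectedTricks += aCardWinProb[i];      // e.g. 33% + 90% => 1.23 tricks

        if (expectedTricks >= 4 * KProbOne)         // illustrative "Go Alone" threshold
            return EGoAlone;
        if (expectedTricks >= 5 * KProbOne / 2)     // illustrative "Accept" threshold
            return EAccept;
        return EPass;
        }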
This system was not unsuccessful, but it depended heavily on the logic behind the probabilities being correct. It is also very linear and cannot handle the range of complexities that different card distributions can deliver.
To address the concern that bidding was entirely theoretical, we decided to adopt a second, more exotic and more empirical method, as follows.
Each hand was divided into 3 groups, as follows:
1. The trump group.
2. The trump shadow group. This is the same-colour suit as trump, but missing the Jack. If Spades are trumps, then this will be Clubs.
3. The other-colour suits. If Spades are trumps, then this will be Hearts plus Diamonds.
Using the existing probabilities from method (1), total probabilities were calculated for these 3 separate groups. The game engine was then run with all possible bids for a very large number of games to determine the outcome of a large number of such profiles. These were coded into a Profile ROM table and then, when a bid was needed, the profile of the current hand was matched against all of these to see how similar it was. Table entries that were very similar were weighted high and dissimilar ones weighted low. In practice the net bid value was a combination of all values from this table, but with a very strong skew towards very similar profiles.
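The matching step might look something like the sketch below. The distance measure and the steepness of the weighting are our own illustrative choices here; the actual skew function used is not shown.

    typedef int TInt;

    struct TProfileEntry
        {
        TInt iGroup[3];   // summed win probabilities for the 3 groups
        TInt iValue;      // bid value measured in the bulk test runs
        };

    // Blend every stored profile's measured value, weighted so that entries
    // very close to the current hand's profile dominate the result.
    TInt ProfileBidValue(const TInt aProfile[3], const TProfileEntry aTable[], TInt aCount)
        {
        double totalWeight = 0.0;
        double totalValue = 0.0;
        for (TInt i = 0; i < aCount; i++)
            {
            double distSq = 0.0;
            for (TInt g = 0; g < 3; g++)
                {
                double d = (double)(aProfile[g] - aTable[i].iGroup[g]);
                distSq += d * d;
                }
            double weight = 1.0 / (1.0 + distSq * distSq);  // steep skew to near matches
            totalWeight += weight;
            totalValue += weight * aTable[i].iValue;
            }
        return (TInt)(totalValue / totalWeight);
        }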
This seemed a much better idea, as it had more empirical validity. It played better.
However it was still tied to the original probability calculations, which were unproven. We therefore switched to a different way of looking at this.
This was an easy step to make and allowed us to jettison the calculated probabilities (note that that engine is still used to maintain probabilities during play). The alternative was simply to do the 3-part classification above, but then determine from bulk testing the probability that each card within that profile could win a trick. This resulted in a highly believable table, as follows:
    const TInt KCardWinRate[] =
    {
                            // Combined other colour suits
    /*  0 */      255,      //   0%  Nine
    /*  1 */      394,      //   1%  Ten
    /*  2 */      645,      //   1%  Jack
    /*  3 */     1456,      //   4%  Queen
    /*  4 */     4397,      //  13%  King
    /*  5 */    15857,      //  48%  Ace
    /*  6 */        0,
    /*  7 */        0,
                            // Shadow Trump
    /*  8 */      359,      //   1%  Nine
    /*  9 */      508,      //   1%  Ten
    /* 10 */      927,      //   2%  Queen
    /* 11 */     3008,      //   9%  King
    /* 12 */    13127,      //  40%  Ace
    /* 13 */        0,
    /* 14 */        0,
    /* 15 */        0,
                            // Trump
    /* 16 */    13833,      //  42%  Nine
    /* 17 */    13745,      //  42%  Ten
    /* 18 */    13186,      //  40%  Queen
    /* 19 */    14738,      //  45%  King
    /* 20 */    17722,      //  54%  Ace
    /* 21 */    21129,      //  64%  Jack shadow suit
    /* 22 */    32512,      // 100%  Jack trump
    /* 23 */        0
    };
Note that the shadow trump group is smaller, as it has lost its Jack. What is highly believable is that the Ace of the shadow trump suit is less likely to win a trick than the Aces of the other-colour suits. This is what you would expect: there are fewer shadow suit cards, so the opportunity to trump that suit is greater.
The profile used to look up the previous table is calculated by simply creating the 3-part profile using the exact values above. This is obviously fast and has some proven validity. The Profile ROM table of the previous section was then re-determined from scratch with new bulk test runs.
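A sketch of that calculation, where ClassifyCard() is a hypothetical helper mapping a card to its index in KCardWinRate[]:

    typedef int TInt;
    extern const TInt KCardWinRate[24];              // the table above

    TInt ClassifyCard(TInt aCard, TInt aTrumpSuit);  // hypothetical: card -> table index

    // Build the 3-part profile by summing per-card win rates. The group
    // boundaries follow the table layout: 0-7 other colour suits, 8-15 the
    // shadow trump suit, 16-23 trumps, so index/8 selects the group.
    void HandProfile(const TInt aHand[5], TInt aTrumpSuit, TInt aProfile[3])
        {
        aProfile[0] = aProfile[1] = aProfile[2] = 0;
        for (TInt i = 0; i < 5; i++)
            {
            TInt index = ClassifyCard(aHand[i], aTrumpSuit);
            aProfile[index / 8] += KCardWinRate[index];
            }
        }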
This improved play, but note that each probability above, although proven, has no game context information attached to it. Note also that matching similar profiles still has no proven validity: it just seems a plausible way to proceed, in that it depends on the performance of similar profiles.
There are a lot of maybes here. This system is relatively exotic, but maybe too complex, and still unproven.
We decided to take a final step to create an entirely empirical system, as follows:
The idea here is to perform a bulk run, testing all contexts. This stored results in a table with the following structure:
    // indexed                    [wolf] [lead] [part] [#trumps] [#voids] [card index]
    const TInt KCardWinRate_full    [2]    [2]    [2]     [6]      [4]       [24] =
    {
        ...
    };
where lead = 1 if the player leads (else 0), part = 1 if the partner picked up the discard, and "wolf" is 1 if the bid played is "Go Alone".
This is a six-dimensional table where the final index corresponds to the classification used in KCardWinRate[] above. It has 4608 entries, dividing the success of each possible card across 192 contexts in which the card was played. This is more satisfactory, as it factors the context into the success of the card: a Queen of trumps will have a much better chance of securing a trick if the player holds 3 or more other trumps, but if it is a singleton trump then its chances are poor.
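With the contexts in hand, the method-1 trick summation presumably carries over directly. A sketch, with the context flags passed in rather than derived:

    typedef int TInt;
    extern const TInt KCardWinRate_full[2][2][2][6][4][24];

    // Sum each card's measured win rate for the current context to get an
    // expected trick count (fixed-point, 32512 == 1 trick), which can then
    // be compared against tuned bid thresholds as in the first method.
    TInt ExpectedTricks(TInt aWolf, TInt aLead, TInt aPart,
                        TInt aNumTrumps, TInt aNumVoids,
                        const TInt aCardIndex[5])
        {
        TInt total = 0;
        for (TInt i = 0; i < 5; i++)
            total += KCardWinRate_full[aWolf][aLead][aPart]
                                      [aNumTrumps][aNumVoids][aCardIndex[i]];
        return total;
        }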
This made it viable to dispense with the approximate matching used with the Profile ROM table in the two previous methods. It still has a slight weakness in that it does not account for which other cards the player holds, but instead assigns a value to each card linked to the general context.
If we had tried to measure each card in its exact context, this would need a table of 63 million entries. A huge issue with that would be the extraordinary time needed to fill it. To match the context of all opponent cards as well would be impossible, but also irrelevant, as the player does not have access to this information anyway. Instead it would be necessary to run a limited set of (say) 100 deals for each context and measure the average result. Even if each such game could be completed in 100 ms (which is very fast!), this would still need 63,000,000 × 100 × 0.1 s ≈ 6.3×10⁸ seconds, i.e. around 20 years, to calculate.
The final method adopted is simple and empirical, and appears to play pretty well. Early releases needed re-updates, as the table was unbalanced in its completeness: unlikely contexts had few sample entries, so it was necessary to interpolate entries where there was a lack of data.
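The interpolation itself is not detailed here; one plausible repair pass would blend thinly-sampled entries towards a context-free average for the same card, in proportion to sample count. KMinSamples and the Smoothed() helper below are assumed for illustration:

    typedef int TInt;

    // Patch thinly-sampled contexts: blend the measured rate towards the
    // card's context-free average, in proportion to how few samples that
    // context received. KMinSamples is an assumed confidence threshold;
    // the actual interpolation used is not described in this article.
    TInt Smoothed(TInt aMeasured, TInt aSamples, TInt aContextFreeAverage)
        {
        const TInt KMinSamples = 1000;    // assumed tuning constant
        if (aSamples >= KMinSamples)
            return aMeasured;             // enough data: trust it as-is
        // Linear blend: zero samples => pure fallback, KMinSamples => pure data.
        return (aMeasured * aSamples +
                aContextFreeAverage * (KMinSamples - aSamples)) / KMinSamples;
        }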
Overall this method is easier to validate, as it does not depend on the vagaries of approximate profiles. It also plays well.
Jeff Rollason - February 2015