Many impressive and instructive postings on the GoL Forum are no longer accessible, because the server keeps them for only a few months. I save some of those postings here, mainly for my own sake; but if you (particularly the original posters) think that by publishing them I am violating your copyright, please let me know.
The standard Match Equity Table (MET) is the Woolsey (or Woolsey-Heinrich) table, which has been in use for about 10 years now.
Many others exist and have their followers, for example Jacobs-Trice and Snowie 2.1. Because of the algorithms already developed and the memorization already done, there is a lot of inertia among individuals (myself included) to just stick with what we already have. But is there a better way? Some questions: 1) How much does the choice of MET matter? 2) Does the choice of MET depend upon the two players? If so, how much? 3) If none of the current METs is optimal, how does one go about building/choosing the proper one? 4) Should one memorize/formulize tables of doubling points and take points instead of, or in addition to, memorizing an MET/algorithm? 5) What are the practical constraints? (For example, Kleinman argues for 3-significant-figure tables, but are there very many who can actually memorize these accurately, and are they worth it?) 6) How big a table should one memorize/formulize? Sure, ideally you would have a 25-point match table in your head, but that doesn't seem practical, so what is a good cutoff? --Chuck Bower
1) For most scores, there is little difference between the recommended actions of different METs. There are some exceptions, most notably those involving large cubes, positions within 2 points of the end of the game, and 3-away 4-away, at which I think the WH table is wrong. There has been some testing of gnu versus gnu with different match equity tables, producing differences of about 50-100 15-point matches in 100,000, IIRC. The variance reduction system used was good enough that this might be statistically significant, but it simultaneously suggests that it might not matter much if you use the wrong table well. 2) It may depend quite a bit on the playing styles of the players. Some people just don't know how to play for the gammon. If two of those play each other, a small lead means more, and leading Crawford 4-away is a much larger advantage than it is between bots.
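Zare's remark that a Crawford lead is worth more between players who rarely win gammons can be checked with a toy calculation. The sketch below is not how the Woolsey-Heinrich, Mec26, or Snowie tables were built. It assumes equal players, a single gammon rate `g` (the fraction of each side's wins that are gammons, with backgammons counted as gammons), no cube in the Crawford game, and a post-Crawford trailer who doubles at once while the leader always takes, so the free drop is ignored. The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def post_crawford_trailer(k: int, g: float) -> float:
    """Trailer's MWC at k-away against a leader at 1-away, post-Crawford.

    Toy assumptions: equal players, the trailer doubles immediately and the
    leader always takes (no free drop), gammon rate g for both sides.
    """
    if k <= 0:
        return 1.0  # the trailer has already reached the target
    # Cube on 2: a single win scores 2 points, a gammon scores 4.
    # If the leader wins any game, the match is over (trailer gets 0).
    return 0.5 * (g * post_crawford_trailer(k - 4, g)
                  + (1 - g) * post_crawford_trailer(k - 2, g))


def crawford_leader(n: int, g: float) -> float:
    """Leader's MWC in the Crawford game: leader 1-away, trailer n-away."""
    # No cube in the Crawford game: a single win scores 1, a gammon 2.
    trailer = 0.5 * (g * post_crawford_trailer(n - 2, g)
                     + (1 - g) * post_crawford_trailer(n - 1, g))
    return 1.0 - trailer


for g in (0.0, 0.10, 0.22, 0.26):
    print(f"g={g:.2f}  Crawford 2-away: {crawford_leader(2, g):.3f}  "
          f"Crawford 4-away: {crawford_leader(4, g):.3f}")
```

Under these assumptions the leader's Crawford 4-away equity is about 81.8% at g = 0.26 and 87.5% at g = 0, so fewer gammons do make the lead worth more. Real tables also account for the free drop and for gammon rates that change with the score.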
Again, though, the main cube decisions affected are when the cube is large or one player is close to victory. 3) I don't think it is worth it for most people. I'm debating whether it is worth it for me! (I'm doing it anyway, but not with as much effort as if I thought it really important.) I start with DMP = 50% MWC and work backward using real cubeful backgammon play based on the earlier parts of the table, simultaneously tweaking the match play algorithm. This takes a lot of time. Using a bot that does not understand how the cube should be used is little better than using a bad model of backgammon. 4) I think it is much better to memorize, or have a feel for, cubeful take points and gammon prices, and not doubling points or exact MET entries. For the racing take points on 4-cubes and 8-cubes, I ran a contest to find a formula that best approximates these for Snowie's table; the winning entries were close fits and not too complicated, although I don't remember them. Huge cubes arise in tournament play, and it is valuable not to blunder with them. I recommend learning a rough approximation so that you can work things out if you need to for a large cube. 5) Three significant figures are essential for calculations done far from the end of the match, or else the roundoff error swamps everything. I don't think they are worth memorizing, and they are harder for most people to work with than whole-number percentages. I think the solution is to do the work ahead of time, and remember that at this patch of match scores the racing take point is x% higher than for money. You won't figure this out with much confidence at the table, partly because it helps to interpolate between results at neighboring match scores. 6) That depends on the lengths of matches you play most often. I play a lot of 3-, 5-, and, recently, 7-point matches, so I have most of those situations memorized now.
Before a serious tournament, I review take points and gammon prices outside the part of the MET I know well. -- Douglas Zare
For 100,000 matches between nearly equal opponents, the standard deviation [ sqrt(0.5*0.5*100,000) ] is 158. So I would interpret 50-100 matches as not statistically significant. Or are you saying that a difference of 50-100 wasn't the actual result of the competition, but instead a bot analysis of who should have won? Regardless, if 300 matches is two standard deviations, then with 95% confidence we can say that the choice of MET helps no more than 300/100,000 = 3 parts in 1000 = 0.3%. This assumes you know which MET to use, and that the advantage is this large regardless of the opponent. It looks like when you actually get a position where two different METs disagree on the proper cube action, the *error* made in choosing the wrong cube decision must be rather small. -- Chuck Bower
That would be the standard deviation without any variance reduction. However, the variance reduction used was to play pairs of matches with an identical stream of rolls, with the opponents (METs) reversed (rather than hedging). In some matches, there would be no decision that depended on the MET, and these would be split exactly evenly. These would not contribute to the variance. When there was a divergence, the rest of the match might not notice the parallel dice, but the divergence may well mean that one side had a 90% chance to win in one match and an 11% chance to win in the other. Only two wins or two losses would cause a deviation from 50%. By some rough calculations, I figured that the differences were indeed statistically significant. At the same time, they were a statistically significant rejection of the proposition that there was a huge difference in playing strength from using one of those METs rather than the other.
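Chuck Bower's significance estimate is the binomial standard deviation of the win count, sqrt(N*p*(1-p)). A minimal sketch, assuming independent matches between exactly equal opponents (p = 0.5) and no variance reduction. That independence is precisely what Zare's paired-dice method breaks, so for that test this figure overstates the noise. The function name is hypothetical.

```python
import math


def win_count_sd(n_matches: int, p: float = 0.5) -> float:
    """Binomial standard deviation of the number of wins in n_matches
    independent matches, each won with probability p."""
    return math.sqrt(n_matches * p * (1 - p))


n = 100_000
sd = win_count_sd(n)    # sqrt(0.5 * 0.5 * 100,000), about 158 wins
band = 2 * sd           # two standard deviations, roughly 95% confidence
print(f"SD = {sd:.0f} wins; 2 SD = {band:.0f} wins = {band / n:.2%} of matches")
```

With paired matches on identical dice, pairs in which no MET-dependent decision arose add no variance at all, which is why Zare argues that a 50-100 match difference could still be significant.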
It is possible that using another MET would produce different results, perhaps a significant advantage or a significant disadvantage. I think the latter would result if one blindly used a MET rounded to the nearest percent. In real life, people don't consult a MET rounded to the nearest percent to determine the take point on an initial double at 13-away 15-away, but you have to be careful what you tell a bot to do. There is a strange MET proposed by Ortega and Kleinman whose methodology is absurd. They interpolate between Kleinman's table and the WH table, but by uneven amounts. I think the result looks to the naked eye as though it should have the smoothness of a 3-digit table (and it makes memorization and calculations about as hard), while having only the precision of the 2-digit table. -- Douglas Zare
My personal opinions are: 1) How much does the choice of MET matter? It doesn't matter much at all. All of the METs will be within 1 or 2 percent of each other for any given score, which is quite adequate for reaching any reasonable cube decision. If only we could estimate our winning chances in a position so accurately. 2) Does the choice of MET depend upon the two players? If so, how much? Jake Jacobs has written an excellent book, "Can a Fish Taste Twice as Good?", which goes into this topic. My feeling is that most cube decisions shouldn't depend on perceived relative skill (particularly since many players may have a warped perception of their own skill relative to their opponent's in the first place). Differing skill level is usually relevant only if the cube decision may be the final decision of the match. Then one may make a revised estimate of match-winning chances if the conservative route is taken. However, this doesn't depend on the MET used. 3) If none of the current METs is optimal, how does one go about building/choosing the proper one? It isn't easy.
The difficulty involves assigning proper cube leverage at different match scores. When I developed my table several years ago, I used a combination of empirical results from over 1000 matches, some mathematical analysis, and a lot of judgment. The methodology could hardly be called rigorous, but the results have proven to be quite practical. 4) Should one memorize/formulize tables of doubling points and take points instead of, or in addition to, memorizing an MET/algorithm? I have never done so. The problem with trying to assign doubling points and take points is that they may depend upon the potential recube vig, so a purely mathematical approach can lead to some ridiculous results. I find it more meaningful to just use the match equities along with the logic of the position to derive my own doubling points and take points when I need them. 5) What are the practical constraints? (For example, Kleinman argues for 3-significant-figure tables, but are there very many who can actually memorize these accurately, and are they worth it?) It definitely isn't worth it. To begin with, any attempt to construct a MET will have flaws larger than 3-significant-figure accuracy. Furthermore, even if someone had access to a perfect MET with 3-figure accuracy, the extra accuracy wouldn't be worth anything unless he were able to estimate the equity of the position in question with the same accuracy. Since most of us are quite happy if we can get within 2% on that estimate, fine-tuning the MET to 3 significant digits simply isn't worth anything as a practical matter. 6) How big a table should one memorize/formulize? Sure, ideally you would have a 25-point match table in your head, but that doesn't seem practical, so what is a good cutoff? For sheer memorization, learning all the equities for a 7-point match will be sufficient for almost all problems you will find in actual play. In addition, there are easy-to-use formulas which require no memorization at all.
I think the best of these is Neil's numbers. Personally, I have memorized my MET only for the scores where the leader is 1-away or 2-away, since the Neil's numbers formula is sometimes inaccurate for those scores. For any other scores I rely on the formula when I need it in actual play; it takes a few seconds to apply and is as accurate as the MET itself. -- Kit Woolsey
Thread 2 (Mec26 MET)
What were the results (with confidence intervals) against the other tables, including the recursively computed table? The 50.06% figure against the W-H MET is about 1 Elo point (0.75 Elo points, if I interpolate correctly, since 50 Elo points should be worth 54%), but the confidence interval is pretty wide. Most of the difference might be explained by the 3-away 4-away entry, but perhaps the Mec26 table was unlucky, and is really 1.5 Elo points better than the W-H MET. Or the difference might be more like 0.3 Elo points. (I don't recall the confidence interval.) I would be surprised to see a large difference between Mec26 and Snowie's table, since all of the entries within the 7-point match are within 0.5%, and many are within 0.1%. My guess is that since Snowie's table disagrees with Mec26 by perhaps 1/4 as much as the W-H table disagrees with the Mec26 table, the difference in performance should be between 1/4 and 1/16 of the 0.75 Elo points. I would be surprised if 500,000 matches found a difference. I suspect that memorizing a new MET (or rather, new tables of take points and gammon prices) is not the easiest way to improve my game by a fraction of an Elo point. -- Douglas Zare
The decision to change tables is entirely up to the player. I neither recommend it nor advise against it. You are 100% correct in pointing out that the MWC difference is minimal; all the same, a pro whom I showed this to was very enthusiastic and insisted that even if it shows only a 0.12% difference, it is significant.
On the other hand, one must also note that some scores show significant differences, such as the few that I highlighted in the article. Most players will not be able to make use of even the 2% differences presented (1.9% to be exact), yet some will, and some positions are also known down to the exact percentage, where this may be very useful. As to the exact results compared to the Snowie table, I would have to ask Heled, who conducted the tests, yet the point is clear: here is a long-unappreciated table that is clearly superior to some of the tables around. I cannot say whether the CI is sufficient to show superiority to the Snowie table, and I'll say right out that if you know the Snowie MET, I doubt it would be of any practical use to learn this one instead. I presented the minuscule MWC difference, and even Heled's explanation for it. After that it is up to the player to make a decision. The tests done weren't superficial and did serve to justify the claim that the Mec26 MET may very well be the best one to date, yet whether that will make a big difference in one's game results is doubtful. Still, I'd prefer to have the best information available whenever possible. -- Albert Silver
The Woolsey-Heinrich MET was not a serious contender for the best MET. Joseph Heled tested it against other tables to see how much of a difference it would make to use a table that assumes too few gammons and uses only 2 digits of accuracy. The difference was shockingly small, even against this straw man, although statistically significant. Without other evidence, I would expect the difference between the Snowie table and the mec26 table to be vanishingly small, not at all "clear." Maybe both are clearly better than the Friedman table using a 35% gammon rate, but I haven't seen evidence for that, and no one uses it. I don't think the test found a clear benefit over even the W-H table. That the mec26 table is "unappreciated" may have to do with its very close agreement with Snowie's table.
They are so close that I suspect they used essentially the same method, differing perhaps in the treatment of backgammons and the free drop. Snowie has extensive calculations built into its theory panel, which probably explains why Snowie users would not use a slightly different table. Since the two tables are so close, a side effect of the test between W-H and mec26 is that Snowie's MET is probably stronger than the W-H MET by about 1 Elo point, too. How large a difference is clear? I think a human player stronger than another by at least 50 Elo points is "clearly" better, but I think this difference would not be apparent to opponents of the two in the same backgammon club for years. I don't think 25 Elo points is enough to say player A is clearly better than player B. Match equity tables can be distinguished more closely than that, but 1 Elo point is very small. I'm unconvinced that there is evidence that the Mec26 table is the best available. If you can't cite the statistics, perhaps you misheard something. Joseph Heled said the Snowie MET was bad because it only includes entries for a 15-point match. However, I believe that applied to gnu's implementation. Snowie itself, of course, reports match equities for a 65-point match (and perhaps more, but I haven't checked). These entries have not been entered into gnubg, and last I checked, gnu did not use the full precision either, but only down to tenths of a percent. Since the test Heled uses is complicated, he was initially unable to calculate the confidence intervals by himself. For the mec26 vs. WH run you cite, I believe the confidence interval was later calculated to be 50.06 +/- 0.10% (!?), but there was a second test that extended that to 50.11 +/- 0.07% (0.5 to 2 Elo points). Since I would expect the mec26 table to be better or worse than the Snowie table not by 0.11%, but by at most 0.028% (0.3 Elo points), it might require closer to 10,000,000 matches to recognize the difference.
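The Elo figures in this exchange come from a linear rule of thumb (50 Elo points is worth about 54% near an even match, i.e., 12.5 points per percentage point), and the match count comes from requiring the edge to sit two standard errors from zero. A sketch of both, assuming independent matches between near-equal players and no variance reduction; the helper names are hypothetical.

```python
import math

# Rule of thumb from the thread: 50 Elo points is worth about 54% near an
# even match; linear interpolation gives 12.5 Elo points per 1%.
ELO_PER_PERCENT = 50 / 4


def elo_edge(win_rate_pct: float) -> float:
    """Approximate Elo edge for a win rate (in %) close to 50%."""
    return (win_rate_pct - 50.0) * ELO_PER_PERCENT


def matches_to_resolve(edge_pct: float, z: float = 2.0, p: float = 0.5) -> int:
    """Independent matches needed before an edge of edge_pct percentage
    points sits z standard errors from zero (no variance reduction)."""
    sd_one_match = math.sqrt(p * (1 - p))  # 0.5 for near-equal players
    return math.ceil((z * sd_one_match / (edge_pct / 100)) ** 2)


print(f"50.06%: {elo_edge(50.06):.2f} Elo")
print(f"50.11 +/- 0.07%: {elo_edge(50.04):.2f} to {elo_edge(50.18):.2f} Elo")
print(f"0.028% edge: {elo_edge(50.028):.2f} Elo, "
      f"~{matches_to_resolve(0.028):,} matches at 2 SD")
```

For a 0.028% edge this naive count comes to roughly 12.8 million matches, the same order of magnitude as Zare's estimate; the paired-dice variance reduction he mentions would lower it.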
Then again, maybe the variance reduction would work better. -- Douglas Zare
I'm aware that Snowie's table and the Mec26 table are based on a 26% gammon rate. It also seems that both of these tables are purely 'theoretical', i.e., no large database of matches was reviewed to see how many matches were won from a particular score; rather, every number is built assuming 26% gammons at all times. The Woolsey table is based on a 22% gammon rate, and adjustments were made when Heinrich evaluated a large database of expert matches. So there was some empirical evidence involved in adjusting the numbers (if what I've heard about the process behind the Woolsey-Heinrich table is correct). The 26% gammon rate is what you see when you let Snowie play Snowie a lot, and I assume you get about the same number when Gnu plays Gnu. My question is: do you think that real players win as many gammons as the bots do? I've always felt that bots win more gammons than 'people' because 1) they blitz better and 2) they don't double when they are too good. I'm not sure whether 22% or 26% is the most accurate number for human play, but I suspect that bot-vs-bot trials come up with a higher gammon rate than real players achieve. The assumed gammon rate is the main difference between Mec26 and Woolsey's MET. -- Gregg Cattanach
Cube decision influenced by opponent
Would it not be right to say that when making a decision to double, or to play on for a gammon, it makes a difference who your opponent is? If I were playing in a tournament and happened to be paired against Kit, I think that I would be a little quicker to double with an advantage. My feeling is that, due to the talent of my opponent, I have a lower chance of winning each game from the very beginning. Therefore, I would value one point on the score sheet more highly than the chance to play on for a gammon.
I would also be quicker to double with a small advantage, figuring that if I played every game for two points, I would have more chance of being lucky and winning. -- Morris Pearl
It makes a huge difference who your opponent is, particularly in money play. In chouettes I often see successful incorrect doubles that provoke a bad pass, a bad take, or a bad beaver, where doubling is an error, I'm confident that the doubler knows it, and the doubler does not expect the stronger players in the chouette to misjudge the position. I think it is very common for an opponent's supposed weakness to be an invalid excuse for bad play. Some people will make a blunder to provoke a smaller error some of the time. However, if you pay attention to the risk and reward, and to the tendencies of your opponents, you can use bad doubles and failures to double to provoke far worse errors. A mathematical treatment of this... is insufficient. The problem is that positions that are mathematically similar in terms of volatility, equity, and gammon rates may differ tremendously in complexity and psychology. There are some positions in which you need only 5% passes to "justify" a double, but "no one" would pass: you can estimate that your opponent has less than a 1% chance of passing, and indeed, you may have no market losers. In other positions, you may need 25% passes, and you will find them. Within classes of positions, the percentage of mistakes necessary is meaningful, but immediate errors need to be weighed against potential future errors. I recently blundered my way to an error rate of 5.8 mppm, with 5.0 mppm coming from two doubles, one somewhat bad and one truly hideous. This provoked an error rate of over 30 mppm from a player rated over 1900 on FIBS, mainly from two extremely bad passes. The key was: 1) Timing. I don't mean playable pips. I mean seconds, and fractions of a second.
I don't want to elaborate now, but there is good reason that there are elaborate rules regarding timing in bridge, to try to avoid passing illicit information, and I think these rules are insufficient and must be combined with ethics. Does anyone have a good reference to material written on this for poker? 2) After the first bad pass, the second double was much more reasonable. It aimed for a bad pass, and it didn't give up as much equity as the first bad pass lost. 3) The second position was confusing, and required proper estimation of slimy ways to win, and acceptance of instant death a substantial fraction of the time. Some people can't be sufficiently objective in such positions. Nevertheless, the second double was a mistake. I didn't realize the double was quite so bad when I made it. When you plan to make small errors, you are more likely to make huge errors. -- Douglas Zare
I appreciate your insights. On timing, Kit once said that sometimes he will sit and think for a minute or two about an obvious take or pass, just to try to give the impression that it is a closer decision. I assume that this is the kind of thing you are thinking about? What I was actually thinking about with my original query was the match equity table. Say the score is 2-away, 2-away. I am playing against someone who (for whatever reason) I think plays better than I do. Assume that I have a small advantage, and there is some joker with which I could lose my market. Now, the MET (Kit's) starts out: 50%, 70%; however, because my opponent is the world champion, I feel that it should be more like 45%, 55%. In that case it could make sense to turn the cube and play for the match with an advantage, rather than to wait, have a small chance of losing my market, and end up with less match equity. -- Morris Pearl
If you think your opponent is stronger, and that instead of winning 70% from Crawford 2-away you would win 55%, then your notion of a market loser should change.
You would lose your market when your winning chances exceed 55%! You really ought to double almost immediately. Ordinarily, you gain when your opponent passes a take, but that would mean that you would get your stronger opponent to pass something you would win less than 55% of the time. It's hard to imagine, as that is a quarter of the advantage you are supposed to need in order to get a pass. In a complicated position, you might realize that you are lost, or that you will not be able to extract enough equity from it. Suppose you have a great backgame with 3 points, and your opponent's position is getting brittle. However, you know that you haven't studied containment positions, and with so many checkers back your offense is imperfect, and you don't think you can get close to the full equity out of the position. If you somehow know that the position is worth 65%, but that you would only win 50%, then you might aim for a bad pass, in order to improve your equity to 55%. I think few human players would want to pass against a backgame before leaving a shot, though. I think it is more important to focus on how to exploit mistakes of weaker players rather than covering up your own mistakes against stronger players. The standard method at 2-away 2-away is to delay the double, risking a market loss, until you get to a confusing position with perhaps a lot of gammon wins but only 60% wins, and your opponent may pass, giving up a huge chunk of equity. Another method at 2-away 2-away would be to pass when you have 35% winning chances in the position (taking into account skill differences), but feel that you would win more than 35% from Crawford 2-away. Note that this is the opposite side of what I described in the first paragraph: The stronger player's take point is higher, so to avoid market losers the weaker player must double even sooner. These issues show up at other scores, but they are less clear there. 
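Zare's market-loser argument reduces to a take point computed from match equities. Ignoring gammons and recubes, the taker needs (pass - lose) / (win - lose) winning chances, with all three equities taken from the taker's side. At 2-away 2-away those omissions cost nothing once the cube is on 2, because any win ends the match. A sketch, where the function names are hypothetical and `doubler_crawford_lead` is the doubler's MWC at Crawford 1-away 2-away (70% in Kit's table, or Morris Pearl's pessimistic 55% against a stronger opponent):

```python
def take_point(pass_eq: float, win_eq: float, lose_eq: float) -> float:
    """Minimum winning chance that justifies a take, ignoring gammons and
    recubes. All three equities are from the taker's point of view."""
    return (pass_eq - lose_eq) / (win_eq - lose_eq)


def two_away_two_away(doubler_crawford_lead: float) -> tuple[float, float]:
    """Return (taker's take point, doubler's market-loss threshold).

    At 2-away 2-away a 2-cube decides the match, so a take wins 1.0 or 0.0
    for the taker; a pass puts the doubler 1-away at the Crawford game.
    """
    pass_eq = 1.0 - doubler_crawford_lead  # taker's MWC after passing
    tp = take_point(pass_eq, win_eq=1.0, lose_eq=0.0)
    # Once the doubler's chances exceed 1 - tp, a correct opponent passes.
    return tp, 1.0 - tp


# Sanity check with money-game points (pass -1, win +2, lose -2).
print(take_point(-1, 2, -2))  # 0.25

for lead in (0.70, 0.55):  # Kit's table vs Morris Pearl's adjusted guess
    tp, market = two_away_two_away(lead)
    print(f"Crawford lead {lead:.0%}: take point {tp:.0%}, "
          f"market lost above {market:.0%}")
```

With the 55% figure the market is lost once the doubler's chances pass 55%, which is why Zare says to double almost immediately; the same function, fed the stronger player's own Crawford equity, gives the weaker player's take point.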
Walter Trice and Jake Jacobs wrote a book on asymmetric match equity tables, "Can a Fish Taste Twice as Good?" I haven't yet gone through enough details to incorporate their work into my game, though. I just hope I'm getting it right. -- Douglas Zare