Effectiveness in the early rounds: an average spread of around 200 points, tapering more slowly than in a normal Swiss. Try to do the “normal” round 3 matchups immediately in round 1, and have more of them. If one gets to 1 vs. 2 too early, one might not accomplish more than moving rounds 8/9 to rounds 5/6.
Because they lack the hardware. The human chess IQ is too low to play like a decent computer. Our brains are defective in that they cannot calculate a million positions per second with perfect memory.
But computers must raise the standard of play. I remember Alexei Shirov saying he had to relearn chess to compete with the young computer-trained kids. I’m sure we humans have learned from computers, but our pace of learning will probably slow down as we run into our hardware limits.
Gross mismatch: not a very interesting game, but it protects the higher-rated player. 1 vs. 2 pairing: interesting game, but it does not protect either player.
With a 200-point rating difference you’re splitting the difference: a slightly interesting game, but not so interesting that it really puts the higher-rated player at much risk. If you want to protect the higher-rated player, isn’t it simpler to do it the way we do it now, with outright gross mismatches? After all, any real interest you put into the game comes at the risk of having the higher-rated player draw or lose.
If the odds for the higher-rated player at a 200-point rating difference are roughly 70% win, 20% loss, 10% draw (the draw percentage will vary by class), then the average higher-rated player is (very slightly) more likely NOT to have a perfect score after only two rounds of such competition. I’d suggest that’s somewhat more effective than the pairings we currently have in US Open-type events.
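The arithmetic behind that claim is a one-liner; a quick check using the 70% win figure above (nothing more than the assumed per-game odds):

```python
# With an assumed 70% chance of an outright win per game for the
# higher-rated player, a perfect 2-0 start requires winning both games.
p_win = 0.70
p_perfect_after_two = p_win ** 2

print(round(p_perfect_after_two, 2))  # 0.49, i.e. slightly worse than even odds
```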
The pure Swiss system was not designed to produce a tournament of competitive games. It was designed to produce, as far as possible, a single winner from a possibly large field. With that goal in mind, early perfect scores aren’t a design bug, they are a deliberate feature.
It occurs to me that the typical top half vs. bottom half pairing gives the maximum expected total score to the top half of the score group.
Consider a 4-player score group with ratings of 1400, 1500, 1600 and 1700. Top half vs. bottom half pairings are:
1700 vs. 1500
1600 vs. 1400
where both higher-rated players have fairly high winning probabilities,
and if we transposed the bottom (top half vs. reversed bottom half, which is what I got by maximizing R*E), the pairings would be
1700 vs. 1400
1600 vs. 1500
This makes it almost certain the 1700 will win but gives the 1600 a very tough game. If we care equally about those two (rather than caring more about the 1700) we maximize their total expected score with the first pairings, not with the second.
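A quick check of that claim with the standard Elo expectation formula (which may or may not be the exact winning-probability model behind the R*E calculation above):

```python
def elo_expected(r_a, r_b):
    """Standard Elo expected score for a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Top half vs. bottom half in order: 1700-1500 and 1600-1400
in_order = elo_expected(1700, 1500) + elo_expected(1600, 1400)

# Top half vs. reversed bottom half: 1700-1400 and 1600-1500
reversed_bottom = elo_expected(1700, 1400) + elo_expected(1600, 1500)

print(round(in_order, 3), round(reversed_bottom, 3))  # 1.519 1.489
```

The in-order pairings do give the top half the larger expected total, because the Elo expectation curve is concave for positive rating differences: two 200-point gaps are worth more than a 300-point gap plus a 100-point gap.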
As long as thread drift has resulted in unusual ideas just being tossed into the ring, how about resurrecting the busyman option idea in the US Open and applying it to all players? Make it a 12-round event with the first three rounds consisting of unplayed byes (keeping the 9-day schedule). Bye points for the three rounds would be:
2400+: full/full/full
2200-2399: full/full/half
Expert: full/half/half
Class A: half/half/half
Class B: half/half/zero
Class C: half/zero/zero
Class D and below: zero/zero/zero
All class players would compete on an equal basis for their class prizes. The minimal half-point deficit would allow 2200-2399 masters to still have a shot at the overall prize (though it would remain unlikely that they would take it), because the higher percentage of draws at the top rating levels would give them the chance of making up ground.
This method could be used even if a pairing TD was using cards instead of a computer.
The first round becomes almost class pairings with winners being paired in the second round to players that lost two classes up (theoretically a 400-point difference but more likely closer to a 300-point difference as the top of the lower class plays the bottom of the higher class). Instead of bouncing between 2-classes-up/2-classes-down and later 1-class-up/1-class-down, class players in the middle of their class will likely bounce between class/non-class pairings, with players in the larger classes having an even higher percentage of in-class pairings.
Aside from the Amateur Teams, the US Opens, blitz events at national scholastics and some other scholastic events, it looks like the largest event in the past two years which was mainly a one-section event was the 2010 Northern Virginia Open, with 138 players.
That isn’t so much the issue. Any section of a tournament which has significantly more than 2^N participants where N is the number of rounds is going to have the kind of problems that are being discussed, especially if the ratings range is such that the top quartile has ratings more than 300 or 400 rating points greater than the bottom quartile. That affects a lot more sections than just single-section tournaments, I suspect.
I suspect that in an Under/2000 section there aren’t going to be as many instances of that as there would be in an open-only event with the same number of players.
You are greatly mistaken if you think that Dubov is 100% algorithmic. The final line from the rules is
“A situation which cannot be directly resolved by using the given instructions, the referee should proceed wisely and impartially in the spirit of the basic principles outlined above.”
This would not be necessary if the sequence of instructions worked every time. The Dubov rules are closer to being a recipe than USCF’s, but that’s at the cost of what many would consider to be bizarre pairings in later rounds when many pairings are off the table because of duplications. If there are problems with the white vs black matchups, you look for an opponent first for the bottom player due White. There’s an algorithmic choice for that opponent, but it doesn’t take into account whether or not that choice would make a mess of the rest of the score group. In many cases (in late rounds), the reason A can’t play B is that C can’t play D. If you follow the “rules” exactly, you can easily be boxed in with no permissible pairings. That’s where the escape clause comes in; the TD would simply switch things around to get a pairing that works. The only guideline governing that is for the TD to be “wise and impartial.” It’s easy to write an impartial algorithm; wise is a bit harder to justify.
Since I started this thread, I have been looking at the Dutch system, a different FIDE pairing system, and studying some of its steps more closely. I had already noticed that some of those steps are also a little hand-wavy; the rules for the Dutch system are not 100% algorithmic and deterministic either.
For example, in the case where you have already paired top to bottom and you get to the bottom score group and are unable to pair it because of previous match-ups, the rules tell you to back up to the previous score group, undo those pairings, and “try” to find a different pairing for the penultimate score group that allows you to pair the bottom score group. Nothing is said about how you are supposed to “try” to do that, and what you are supposed to do if you cannot find such a pairing. (I imagine you are supposed to back up one more score group and “try” again with that.) So that is an indeterminism where two different pairings could both be “compliant” by virtue of having “tried” in different ways to bail out of a problem developing in the final score group.
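That back-up-and-retry step can be sketched as ordinary backtracking. This is a minimal illustration, not the actual Dutch rules: it assumes even-sized score groups, ignores colors and float-downs, forbids only rematches, and implements the “try” as testing bottom-half reorderings in a fixed preference order:

```python
from itertools import permutations

def pair_groups(groups, played):
    """Pair each score group top-half vs. bottom-half; if a later group
    cannot be paired, undo the earlier group's pairing and try another.
    groups: lists of player ids in rating order; played: frozensets
    recording previous match-ups."""
    if not groups:
        return []  # every group paired successfully
    group, rest = groups[0], groups[1:]
    half = len(group) // 2
    top, bottom = group[:half], group[half:]
    for order in permutations(bottom):  # "try" candidates, preferred first
        boards = list(zip(top, order))
        if any(frozenset(b) in played for b in boards):
            continue  # contains a rematch; candidate not allowed
        tail = pair_groups(rest, played | {frozenset(b) for b in boards})
        if tail is not None:
            return boards + tail  # later groups also paired; done
        # later groups failed: fall through and try the next candidate
    return None  # nothing works; the caller must back up one more group

# e.g. with 2 and 4 having already met, (1,3)/(2,4) is rejected in
# favor of (1,4)/(2,3):
print(pair_groups([[1, 2, 3, 4], [5, 6, 7, 8]], {frozenset({2, 4})}))
# [(1, 4), (2, 3), (5, 7), (6, 8)]
```

The indeterminism described above corresponds to the order in which candidates are generated: two implementations that enumerate the alternatives differently can both “try” in good faith and return different compliant pairings.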
So, your examples from the Dubov system are disappointing, but similar to what I have since learned about the Dutch system.
Suffice it to say that the FIDE pairing rules are significantly more deterministic than the USCF’s, but there are still fuzzy spots where the TD is expected (and allowed) to just shuffle the cards around and come up with something.
USCF’s rules are closer to giving a “penalty function” for the lack of desirability of pairings, which is how WinTD works. In WinTD, the pairings get a score of zero if the top half plays the bottom half in each score group, in order, and colors are correct on each board. Minor switches in order are penalized the least, followed by alternation errors, then larger switches, then equalization errors; top/bottom interchanges fall somewhere in there, depending upon preferences; then playing out of score group; finally, duplicating a pairing is given a very high penalty, so effectively anything would be done rather than that. It took a lot of “tuning” to get the weights to produce decent pairings in the trickier situations, primarily when you need a lot of players playing out of score group.
I could not, for the life of me, come up with an analogous way of handling Dubov, which is a list of instructions, rather than principles.
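The penalty-function idea can be sketched as a weighted sum over pairing defects. The weights below are purely illustrative stand-ins that preserve the ordering described above; they are not WinTD’s actual values:

```python
# Hypothetical weights, ordered from least to most serious as described
# above; a repeat pairing dwarfs everything else.
WEIGHTS = {
    "minor_switch": 1,
    "alternation_error": 4,
    "larger_switch": 16,
    "equalization_error": 64,
    "interchange": 100,         # "somewhere in there", preference-dependent
    "out_of_score_group": 1000,
    "repeat_pairing": 10**6,
}

def penalty(defects):
    """Score a candidate set of pairings; zero means the top half plays
    the bottom half in each score group, in order, with correct colors."""
    return sum(WEIGHTS[kind] * count for kind, count in defects.items())

print(penalty({"minor_switch": 1, "alternation_error": 1}))  # 5
```

A pairing engine then searches the candidate pairings for the one with the minimum total penalty, rather than following a fixed recipe of instructions.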
Makes sense. What algorithm does WinTD use to search for pairing possibilities to score using the “penalty function”? Are the penalties in effect during that search, or are they used to score the pairings generated by an algorithm that does not reflect the penalties?
Many strong players want to club rabbits as a business practice. Club rabbits rounds 1 to 4, draw round five, split the top money and start home. Max income, min risk.
If there are no upsets, then the top and bottom eighths are each playing within themselves (so their round 3 feels like a standard round 4), and the middle 3/4 plays something similar to round one pairings, except that the opponents are somewhat closer in rating.
A 32-player 4-round Swiss vs. a 4-round accelerated Swiss ends up something like this (assuming the higher-rated player wins all games):
players 1&2 play 17&18, 9&10, 5&6, 3&4 vs 9&10, 5&6, 3&4, each other (essentially an extra round)
players 15&16 play 31&32, 7&8, 23&24, 9&10 vs 7&8, 23&24, 27&28, 8&9 (slightly higher opponents)
players 23&24 play 7&8, 31&32, 15&16, 17&18 vs 31&32, 15&16, 11&12, 17&30
So you can see that for most players an accelerated round three is a somewhat less one-sided version of a standard round one, and an accelerated round four has somewhat different breaks than a standard round four.
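For the round-1 opponents in the lists above, the two schemes are easy to sketch (rating order, colors and later rounds ignored; this illustrates the accelerated split, not a full pairing engine):

```python
def round_one(players, accelerated=False):
    """Round-1 pairings for a rating-ordered field (player 1 strongest).
    Standard: top half of the whole field vs. bottom half.
    Accelerated: the field is split in two, and each half is paired
    top-half vs. bottom-half within itself."""
    mid = len(players) // 2
    sections = [players[:mid], players[mid:]] if accelerated else [players]
    boards = []
    for sec in sections:
        half = len(sec) // 2
        boards += list(zip(sec[:half], sec[half:]))
    return boards

field = list(range(1, 33))  # 32 players in rating order
print(round_one(field)[:2])                    # [(1, 17), (2, 18)]
print(round_one(field, accelerated=True)[:2])  # [(1, 9), (2, 10)]
```

This matches the round-1 opponents listed above: standard pairings start players 1&2 against 17&18, while the accelerated version starts them against 9&10.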