At the risk of thread drift, here’s an idea that might improve pairings in US Opens. It would be a royal pain with pairing cards, but a snap with software.
Thesis: when one is playing one game per day, mismatches (gaps between players in excess of 350 points) are generally not desirable. Competitive games (difference under 150 points) are generally desirable.
Suppose the traditional section (one game/day) of the US Open has 300 players, and the merge occurs at round six. Competitiveness might be maximized by breaking the field up into SIX accelerated pairing groups: 1 plays 26, 2 plays 27, …51 plays 76,… 275 plays 300. The software would accomplish this by assigning (say) 2.5 pairing points to players 1-50, 2.0 points to players 51-100…0.5 points to players 201-250, and 0 points to players 251-300.
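To make the arithmetic concrete, here is a rough sketch of the group assignment (function names and structure are mine; the point values and the 1-plays-26 scheme are from the proposal above):

```python
# Sketch of the proposed setup for a 300-player field split into six
# groups of 50. Seeds are 1-based ranks by rating.

def initial_pairing_points(seed, group_size=50, step=0.5):
    """Groups of 50 get 2.5, 2.0, 1.5, 1.0, 0.5, 0.0 added points, top down."""
    group = (seed - 1) // group_size          # 0 for seeds 1-50, 1 for 51-100, ...
    return max(0.0, 2.5 - step * group)

def round_one_opponent(seed, group_size=50):
    """Within each group of 50, top half plays bottom half:
    1 plays 26, 51 plays 76, 275 plays 300, and so on."""
    half = group_size // 2
    offset = (seed - 1) % group_size          # position within the group
    base = (seed - 1) - offset                # first seed of the group, minus 1
    if offset < half:
        return base + offset + half + 1
    return base + offset - half + 1
```

This is just the round-one picture; the pairing engine would use the added points from round two onward.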
In traditional accelerated pairing systems, the pairing points are switched on in the early rounds, then switched off. Instead, I’d suggest letting the pairing points decay. (The first round after acceleration is switched off too often feels like a delayed round one.) So in a nine-round event, the accelerated points added to Group 1 for pairing purposes would decay from 2.5>2.0>1.5>1.0>0.5>0. Group 3 would decay from 1.5>1.2>0.9>0.6>0.3>0, and so on. The pairing program would privilege equalization of colors over differences in pairing score of less than 0.5, but would otherwise try to pair the group of 300 players with the best fit, minimizing dropdowns to the next score group. (Edit: or, more simply, a score group might be redefined as 3.01-3.50, 2.51-3.00, etc., and the pairing points used to order players within the score group for pairing purposes.)
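The decay schedules above fit one linear formula. A minimal sketch (the function name and the linear generalization are my assumptions; the values match the schedules given):

```python
# Decaying acceleration: each group's added points start at its initial
# value and decay linearly to zero by round six.
# Group 1: 2.5 > 2.0 > 1.5 > 1.0 > 0.5 > 0
# Group 3: 1.5 > 1.2 > 0.9 > 0.6 > 0.3 > 0

def accel_points(group, rnd, num_groups=6, top_start=2.5, decay_rounds=5):
    """group: 1 (top 50) through 6 (bottom 50); rnd: 1-based round number."""
    start = top_start * (num_groups - group) / (num_groups - 1)
    remaining = max(0, decay_rounds - (rnd - 1))
    return start * remaining / decay_rounds
```

The pairing score for any round would then be raw score plus `accel_points(group, rnd)`.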
There’s probably a more sophisticated and/or straightforward way to accomplish this general purpose.
Many players may prefer that their games be competitive and that they meet players with ratings comparable to their own. However, the people who are trying to win the prizes prefer that the rating differences be maximized – with the differences being in their favor and the points coming easily.
The Swiss system sides with the people who want big ratings differences. The whole point of ratings-based Swiss tournaments is to maximize rating differences within a score group so that the lower-rated players are left behind in the early rounds, the middle-rated players are left behind in the middle rounds, and the competitive games happen in the final rounds. The Swiss system is not trying to give you competitive games. If you want that, join a chess club, or crank Fritz down until you can beat it occasionally.
But don’t expect competitive games from the Swiss System. It is trying to have the lower rated players stomped by the higher rated players as quickly as possible and to keep the close games to the end. That means large early-round rating differences with narrowing of rating differences only in the late rounds. The whole point of “top-half versus bottom-half” pairing in ratings-based Swiss tournaments is to give the highest-rated players smooth sailing until the final rounds, with everybody else getting knocked around, and exciting competition between the top-rated players only at the end, for the prizes.
Bill, there are also players who go to tournaments wanting to play tougher opponents. Such players would consider a 350-point rating difference to be just fine (or maybe the minimum to aim for, since even going 1-8 against players averaging 400 points higher would likely result in gaining rating points). Some of those players consider themselves (possibly correctly) to be underrated, and look at difficult pairings as a way of increasing their rating more quickly (significant for juniors trying for the top 20 or top 100 lists, trying to qualify for assistance with lessons, or trying for invitations to closed tournaments).
A lot of people would like closer pairings (often including me), but that is not the same as saying that a majority would (it would need to be an overwhelming majority to justify changing the US Open format, but it could be tested in a different event).
There is a theory out there that says the lower-rated player can only improve by playing higher-rated players (thus the “playing up” phenomenon). If this theory is adhered to, then the lower-rated player also wants the maximum rating difference between themselves and their opponent. I don’t know whether the theory works, but I do see a lot of players try to practice it.
The weakies would still get squashed, and an underrated player would still get paired up routinely within this (or similar) accelerated systems, but the games would be more competitive. There would still be 350-point differential games, but the lower-rated player would have already “earned” the pairing. (When acceleration ends, the improving young player will have a higher raw score than under non-accelerated systems and will get tough pairings in rounds 6-9.)
And the event would be harder-fought and MORE FUN for all players: fewer 8-1 scores, but also fewer 1-8 scores.
I’ve suggested an extreme acceleration: computer simulations would suggest which variation would be most effective.
The stomping is actually more effective (and the information gleaned from the result more meaningful) when the gap between players is smaller. A 200-point differential is what, an expected 25% score? The odds of the lower-rated player scoring 2-0 in two games against higher-rated players are (ceteris paribus) much less than 0.25^2 because of the possibility of draws.
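For anyone who wants to check the numbers: using the standard Elo expectancy formula, a 200-point gap gives the lower-rated player roughly a 24% expected score, and squaring that gives only an upper bound on going 2-0 (draws push the true probability lower still):

```python
# Standard Elo expectancy (logistic curve, 400-point scale).

def expected_score(rating_diff):
    """Expected score for the lower-rated player, given a positive gap."""
    return 1.0 / (1.0 + 10.0 ** (rating_diff / 400.0))

e = expected_score(200)   # about 0.24 -- roughly the "25%" quoted above
p_two_wins_bound = e * e  # about 0.058; the no-draw upper bound on 2-0
```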
I understand the theory of the accelerated Swiss. With one round of acceleration, in the first round the lowest quartile gets stomped by quartile 3, instead of by quartile 2. Quartile 4 gets stomped by Quartile 2 in the second round, instead of the first. Quartile 2 gets stomped by Quartile 1 in the first round instead of the second. Quartile 3 gets stomped by Quartile 1 in the second round instead of in the first round. All in all, more or less the same amount of stomping – only in different rounds and maybe a little less blood on the floor. Basically ditto with two rounds of acceleration, only substitute octiles for quartiles, and it takes more rounds for things to sort out. Perhaps the acceleration gives you slightly more information before the end about the actual rankings within the lower score groups, but this is academic. There isn’t much signal from an accelerated Swiss tournament with a wide rating range, except in the results from the last few rounds, as in a regular Swiss. This is especially true on the tails of the rating distribution in the tournament.
A few more upsets at the beginning occasionally interfere with the right-and-proper purpose of Swisses, which is to ensure that the best players don’t expend too much effort before the Grandmaster draws at the end, where the prize money gets divided up. However, the differences from a regular Swiss are small by the end.
Accelerated pairing doesn’t really achieve what you are aiming for. For that, you need 1 versus 2 pairing, or maybe random pairing, within score groups, instead of top half versus bottom half. But these mean that the high-rated players can knock one another out near the beginning, with no big build-up to the last round as in Swisses (forgetting about those darn Grandmaster draws). Of course, that could be solved by also dropping the no-replays rule and having the leaders meet in the last round even if they have already played. But now we aren’t talking a Swiss any more, and are getting a bit too radical for the good ole’ USCF.
Arguments from anecdote are always suspect, but I remember one US Open in which I had a perfect 5-0 against lower-rated players and a miserable 0.5-3.5 against higher-rated players. And perhaps only four of those nine games were truly competitive.
I agree that quartiles to octiles is not that exciting: at best, it adds one round of precision to the top finishers, and probably muddies the results of the class players. But starting the event as a quasi-class event that becomes more and more like an open every round, and truly an open in the final rounds, may be a worthwhile idea.
I have recently become aware of empirical evidence that this works. One’s rating has some tendency to converge to the average rating of one’s opponents, setting performance ratings aside. For example, if, according to a reasonable rating system, two players gain 100 rating points (or keep the same rating, or lose 100 rating points), one by stomping lower-rated players and the other by surviving against some higher-rated players, the second player will later perform better.
The test is vulnerable to selection bias (as so many are): the ambitious improving player, confident in his strength, is likely more eager to “play up” than the guy who doesn’t feel that the Force is with him.
This makes sense to me. It’s the idea that emerges from maximizing the sum of R * E in the early rounds (so that the expected results of the highest-rated players are maximized, protecting them until the later rounds), but then minimizing that same sum, or minimizing the sum of absolute rating differences within a score group, in the later rounds (which removes that protection and tends to give 1 vs. 2 pairings).
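A minimal sketch of those two objectives (function names are mine; only the standard Elo expectancy formula is assumed):

```python
# Two pairing objectives over a list of pairings [(r_a, r_b), ...] of Elo ratings.

def expected(r_a, r_b):
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def sum_rating_times_expectation(pairings):
    """Early-round objective: maximize the sum of R * E over all players,
    which favors giving the highest-rated players the easiest games."""
    total = 0.0
    for r_a, r_b in pairings:
        total += r_a * expected(r_a, r_b) + r_b * expected(r_b, r_a)
    return total

def sum_abs_rating_gaps(pairings):
    """Late-round objective: minimize the total rating difference within
    a score group, which tends toward 1 vs. 2, 3 vs. 4 pairings."""
    return sum(abs(r_a - r_b) for r_a, r_b in pairings)

# Four players rated 2400, 2300, 1800, 1700:
protective = [(2400, 1700), (2300, 1800)]   # top half vs. bottom half
close      = [(2400, 2300), (1800, 1700)]   # 1 vs. 2 pairing
```

With those four ratings, the top-vs-bottom pairing scores higher on the first objective, while the 1 vs. 2 pairing scores lower (better) on the second, which is exactly the switch described above.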
Something less blunt than “pairing points” is needed, because score should have a strong priority over rating. I can see using them in the first round when one cannot make a “wrong” pairing based on score groups anyway, but after that it seems gimmicky to me, unless they are used exclusively within score groups.
The main problem that you are trying to solve is that these events often have rating distributions that are too wide, and there are too few rounds for too many players.
What we would really like is for the N rounds of the tournament to be between the (2^(N-2)) or so players at the top who have a realistic chance of winning the tournament, with perhaps an opportunity for some of the (2^(N-1)) players who have an outside chance to break into the top circle. However, several precious rounds must be used just for “top” players to emerge from the group simul with which every large Swiss is obliged to start. It often happens that the players who had any chance of winning the tournament meet only in the last two rounds, having spent the rest of the time playing lower-rated players who were never in any way in contention for a prize. A random draw against an unexpectedly good, but lower-rated player, who nevertheless was never in contention, can seriously impact a top player’s chances of emerging from the simul stage.
It is all a waste of rounds and time because for the most part, the rating system tells us who the top players are, more or less, going into the tournament. Of course, the simul at the beginning can be justified (and necessitated) by the fact that the prizes are paid courtesy of the lower-rated players in the tournament; so they deserve at least some entertainment, even though they cannot win. (And a little further gambling action can be added for those players through “class prizes”, in case getting stomped during the simul phase is not appealing.)
I am not sure that the problems you are discussing can be solved simply by tuning weights in a more sophisticated pairing system. The current situation with Swiss tournaments is more the product of how tournaments are organized, prizes funded, and the insistence of the top players on playing for money, which has to come from somewhere.
That is not a completely accurate summation of +1 acceleration.
Round 1: Q1 stomps Q2 and Q3 stomps Q4 (in a regular Swiss these happen in round 2) (this matches Brian’s summation)
Round 2: Q2 stomps Q3, Q1 fights itself, Q4 fights itself (in a regular Swiss these happen in round 3)
Thus after round 2:
Q1 has faced Q2 and itself instead of Q3 and Q2;
Q2 has faced Q1 and Q3 instead of Q1 and Q4;
Q3 has faced Q2 and Q4 instead of Q1 and Q4;
Q4 has faced Q3 and itself instead of Q2 and Q3.
Round 3: Q1 and Q4 are narrower groups at the opposite ends of the perfect and zero scores, and there is a broader group at 1-1 to be paired as in a normal Swiss. This ends up being pretty much a round one paired without the top and bottom eighths. Because of that broader middle, it takes longer to shake things out so that people in the later rounds are playing closer to their level. Thus the middle trades the longer up/down of a Swiss for the closer games at the beginning. The two ends narrow more quickly, so that it looks almost like an extra round for them. Since the goal of acceleration is to reduce perfect scores (NOT to get closer pairings every round), that works out fine (again, it assumes the ratings are reasonable and there are unlikely to be many upsets in the Q2-vs-Q3 round 2 pairings, as otherwise you could end up with half again as many perfect scores as would be seen in a normal Swiss).
I’ll agree with the MC idea, but could you flesh out the objective? Do you want the average Elo spread to be small or large to suggest effectiveness? It’s easy to get a small Elo spread next round: give lots of pairing points so that each “score group” is actually a rating group, then do 1 vs. 2 pairings. But I guess we should instead look several rounds ahead, after the manipulations have ceased, at the average Elo spread with top-half vs. bottom-half pairings? I dunno, just random thoughts off the top of my head, you tell me what you mean.
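To make the question concrete, the rough shape of such a Monte Carlo experiment might look like this. Everything here is a placeholder assumption, not a real pairing engine: no draws, no colors, no no-rematch rule, ratings treated as true strength, and adjacent pairing after sorting as a crude stand-in for real Swiss pairing:

```python
# Crude Monte Carlo sketch: compare the final-round average rating gap
# under a plain ordering versus a decaying-acceleration ordering.
import random

def expected(r_a, r_b):
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def play(r_a, r_b, rng):
    # Decisive result only (no draws): 1.0 if the first player wins.
    return 1.0 if rng.random() < expected(r_a, r_b) else 0.0

def simulate(ratings, rounds, accel, rng):
    """Return the average rating gap per board in the final round.
    accel(seed_index, rnd) -> added pairing points (0.0 for a plain event)."""
    n = len(ratings)
    scores = [0.0] * n
    final_gap = 0.0
    for rnd in range(1, rounds + 1):
        # Order by (pairing score, rating), then pair adjacent players;
        # this is 1-vs-2 style, a crude stand-in for real Swiss pairing.
        order = sorted(range(n),
                       key=lambda i: (-(scores[i] + accel(i, rnd)), -ratings[i]))
        gaps = []
        for k in range(0, n, 2):
            a, b = order[k], order[k + 1]
            gaps.append(abs(ratings[a] - ratings[b]))
            result = play(ratings[a], ratings[b], rng)
            scores[a] += result
            scores[b] += 1.0 - result
        final_gap = sum(gaps) / len(gaps)
    return final_gap

rng = random.Random(1)
ratings = sorted((rng.gauss(1700, 250) for _ in range(128)), reverse=True)

plain = simulate(ratings, 9, lambda i, r: 0.0, rng)
decayed = simulate(ratings, 9,
                   lambda i, r: (2.5 - 0.5 * min(5, i // 22))
                                * max(0, 5 - (r - 1)) / 5,
                   rng)
```

Averaging `plain` and `decayed` over many seeds, and over the last few rounds rather than just the final one, would be the actual experiment; this only shows the scaffolding.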
Indeed, no one can run that fast. But neither Tim nor David suggested that was a reliable method for producing people who could run that fast. They did, however, suggest that it works differently for helping develop chess skills.
The main thread of discussion leads me to ask whether anyone besides me would like to discuss the Diesen system, since it satisfies some of the concerns expressed here. (So, also, do narrow class sections.)