Median and Modified Median inequity with byes

I’m not so sure I agree with this.

While there is a high correlation between tiebreak scores and OVERALL performance, keep in mind that the players are already tied on the basis of overall performance.

If over a large number of events various tiebreak methods tend to produce different results, then I would argue that there is little correlation between those tiebreak methods and the RELATIVE performance of the tied players.

I’ve asked David Kuhns, the chair of the Rules Committee, to drop by. He does statistical analysis for a living; he might even have some data on tiebreak performance to share with us.

I am only talking about the correlation within a certain score group, e.g. players tied at 4-1, say, for second through sixth. Based on relative performance, the common methods will predict the correct ranking with some accuracy. Yes, the second-best player might finish fourth with the third-best second, but the correlation is definitely better than using pre-tournament ratings.

I never said there was 100% correlation. Most tie-break methods will produce the same result, not every time but definitely a majority of the time. We only use multiple methods to break the ties that occur within the various methods.

The problem with performance rating as a tiebreak is that it assumes the ratings are evenly distributed throughout the field. If the number one seed is rated 2450 and the number two 2150, the two players just below the cut are both going to lose routinely in the first round, but their performance ratings will be quite different.

Here’s yet another example which shows the fallacy of using performance rating as a tie-break system.

In a 5-round tournament, suppose two players finish with the same score, and draw each other at some point in the tournament. Suppose, further, that in the other four rounds, they play the SAME four opponents (in a different order, of course). Then the lower-rated player would automatically have the higher performance rating.

This would ALWAYS happen in a round-robin (if there are no forfeits, withdrawals, etc) – tied players would always finish in REVERSE rating order.
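
Here’s a rough Python sketch of the situation I’m describing. The ratings are made up, and I’m using one common linear approximation of performance rating (average opponent rating plus 400 times (wins minus losses) over games), not any rulebook formula:

```python
# Two tied players draw each other and face the same four opponents.
# Performance rating is approximated here as the average opponent rating
# plus 400 * (wins - losses) / games -- for illustration only.

def perf_rating(opp_ratings, score, games):
    return sum(opp_ratings) / games + 400 * (2 * score - games) / games

shared_opponents = [1900, 1750, 1650, 1500]   # hypothetical ratings
player_a, player_b = 2000, 1800               # A is the higher-rated player

# Each scores 3.5/5: three wins, a loss, and the draw against each other.
a_perf = perf_rating(shared_opponents + [player_b], 3.5, 5)
b_perf = perf_rating(shared_opponents + [player_a], 3.5, 5)

print(a_perf, b_perf)   # B comes out higher, because B's opponent list
                        # includes the higher-rated A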

Bill Smythe

I think Tournament Direction (or maybe Tournament Organization) is the right forum. USCF Issues is more for political stuff.

On the other hand, like a student transferring from an Ivy League school to a community college, such a move might improve the level of BOTH forums. :slight_smile:

Bill Smythe

That’s likely to be the result of ANY study of tie-break systems, and has been since the beginning of time.

As for Sonnenborn-Berger, the only reason that system even exists is that the others (Solkoff, Median, Modified Median) don’t work for round robins. In a round robin, tied players always remain tied after these other systems are applied. Unfortunately, even in a round robin, S-B tends to reward inconsistency – player A who defeats strong opponents and loses to weak ones will finish ahead of player B (tied with player A) who does just the opposite.
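
To see the inconsistency effect concretely, here is a small sketch of S-B (full credit for the final score of each opponent you beat, half credit for each you drew); the opponent scores are hypothetical:

```python
# Sonneborn-Berger: sum of defeated opponents' final scores plus half the
# final scores of drawn opponents.

def sonneborn_berger(results):
    """results: list of (opponent_final_score, game_result),
    with game_result in {1.0 win, 0.5 draw, 0.0 loss}."""
    total = 0.0
    for opp_score, result in results:
        if result == 1.0:
            total += opp_score
        elif result == 0.5:
            total += opp_score / 2
    return total

# Player A beats the strong opponents and loses to the weak ones ...
a = sonneborn_berger([(4.0, 1.0), (3.5, 1.0), (1.0, 0.0), (0.5, 0.0)])
# ... player B does the opposite, for the same raw score.
b = sonneborn_berger([(4.0, 0.0), (3.5, 0.0), (1.0, 1.0), (0.5, 1.0)])
print(a, b)   # 7.5 vs 1.5 -- S-B rewards the "inconsistent" player A
```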

Bill Smythe

I have been asked to comment on this thread. I have done quite a bit of analysis on various tie-break methods, including performance ratings, double-iterated performance ratings, and the methods listed in the rules, as well as others such as S-B and playoffs. This could get lengthy, so please bear with me.

First, let me state categorically that no numerical method of breaking a tie is sound mathematically or statistically. The sample size in a 4- to 9-round tournament is just too small to obtain any significance whatsoever. A 5-round event requires a separation in tie-break, using the modified median method, of at least 2.5 tie-break points. Rarely obtained.

The advantage of numerical methods is that they are an unbiased way of breaking ties that works and is sellable. Someone wins, and there is no arguing with the numbers.
This advantage, by itself, is a good reason to use them for SMALL unbreakable prizes, such as a trophy.

I am going to stick with ties for first place. We have several distinct categories, even here, for an N-round event (my analysis was on events of 4 to 6 rounds):
#1. Perfect score tie.
#2. N - 1/2; drew with each other.
#3. N - 1/2; did not play each other.
#4. N - 1 or less.

In #1 and #2 the higher-rated player has a strong advantage, much stronger than the rating would indicate. The higher-rated player wins on the modified median and the Solkoff method 59% of the time. Note that this is even higher than the 55% expected score that White gets in a playoff game.
In #3 and #4 the advantage is slightly less, about the 55% level (about the same as the player with White).

Modified Median and Solkoff, under analysis, give very nearly the same results. No significant differences were observed in any of the four categories.
The modified median has an advantage in that if you played someone who ended up with 0 points or had a bye, that result is dropped, making you seem more on a par with the other contender.
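
A simplified sketch of the difference, for a plus-score player, following the description above: Solkoff sums every opponent's final score (an unplayed game contributing 0), while Modified Median drops the lowest entry. The real rulebook definitions have more special cases, and the numbers here are hypothetical:

```python
def solkoff(opp_scores):
    return sum(opp_scores)

def modified_median_plus(opp_scores):      # plus-score player: drop the lowest
    return sum(opp_scores) - min(opp_scores)

# Player X had a first-round bye (shows up as a 0 in the opponent list);
# player Y, tied with X, played all five rounds.
x_opps = [0.0, 3.0, 3.5, 4.0, 2.5]
y_opps = [2.0, 3.0, 3.5, 4.0, 2.5]

print(solkoff(x_opps), solkoff(y_opps))                            # 13.0 vs 15.0
print(modified_median_plus(x_opps), modified_median_plus(y_opps))  # 13.0 vs 13.0
```

Under Solkoff the bye costs X two tie-break points against Y; under Modified Median both players drop their lowest entry and come out even.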

Cumulative score, and especially opponent’s cumulative score, in my opinion are random number generators and shouldn’t even be on the list. The variance of results is so widespread that there is no significance until the scores differ by about 50% of the maximum obtainable.

Performance rating favors the higher-rated player even more. Take the example of two players seeded 1 and 2. They both were paired ‘down’ every round. By Swiss rules, 1 is almost always paired against a slightly higher-rated opponent than 2. 1 will have a higher performance rating. Period.
If they played each other and drew, 2 gets the advantage only if the rating difference between 1 and 2 is greater than the cumulative differences in the other rounds. Possible, but not likely.

Therefore, in my opinion, where the unsplittable prize is substantial, a playoff, ANY playoff, is far superior to a numerical tie-break, ANY numerical tie-break. Others will disagree with me, some quite strongly. That is OK, this is my opinion, and the numbers back me up.

Now to answer some of the questions brought up by other posters.

Quote:
While there are flaws with any tie-break system, there is a good degree of correlation between performance and each of the common methods

response:
Not true. Overall there is very little correlation (R^2 ~ 0.5); though better than a random distribution, it is not much better.


Paraphrase:
Modified median penalizes a player with a bye.

Response:
Definitely NOT true. In fact, modified median FAVORS the player with a bye. Remember, that player didn’t have to play a game to get the full point.
The modified median method drops that 0 from the sum, so the total loses nothing. Solkoff counts the 0. That’s a penalty.
I am still wondering about the circumstances of this result. This must be an extremely rare and strange event. With an odd number of players, by Swiss rules, the lowest-rated player in the event gets the bye in round 1 (the only round where this would have an effect, by the way).
Are you telling me that the lowest rated player actually tied for first place? I had no instances of that in the events I used in my extensive analysis. Very rare indeed. How can we make rules to cover such a situation? In fact, mathematically I prefer the modified median in this case.
If the player had a half-point bye and tied for first (several examples of that exist), that player CHOSE not to play and does not deserve tie-breaks for an unplayed game. If he was penalized for doing so, he chose that penalty himself.


Quote:
If over a large number of events various tiebreak methods tend to produce different results, then I would argue that there is little correlation between those tiebreak methods and the RELATIVE performance of the tied players

Response:
Very true. Little correlation exists between performance and any tie-break method.


Quote:

The problem with performance rating as a tiebreak is that it assumes the ratings are evenly distributed throughout the field. If the number one seed is rated 2450 and the number two 2150, the two players just below the cut are both going to lose routinely in the first round, but their performance ratings will be quite different.

Response (split this one)
Performance rating does not assume the ratings are EVENLY distributed. In fact, performance rating assumes that ratings are NORMALLY distributed. However, it assumes that pairings are somewhat random, which they are definitely NOT.
But your argument is nonetheless correct.


Quote:
I am only talking about the correlation within a certain score group, e.g. players tied at 4-1, say, for second through sixth. Based on relative performance, the common methods will predict the correct ranking with some accuracy. Yes, the second-best player might finish fourth with the third-best second, but the correlation is definitely better than using pre-tournament ratings.

Response:
The covariance between pre-tournament ratings and performance rating within a point group is very high (about 0.8) for first-place ties. This throws far too much bias into the analysis. In your example the covariance is a bit lower, but still high. The difference in the methods is statistically insignificant, even taking more than 100 events as a sample.


Quote:
This is a bit out of my field of study, but why is the comparison to performance rating the key to evaluating the various methods? Seems like the measure of success should be against some predetermined strength criteria at the beginning of the event. I see performance rating for that one event just as biased as the tie-break methods

Response:
A bit better than performance rating is ‘expected score’ (or the probability curve used in computing ratings in the first place). The player with the higher expected-score difference (who performs better than expected) would win.
This turns the tables and strongly favors the lower-rated player, in effect making this method a form of ‘handicap’ event. Not necessarily a bad thing.
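
A sketch of what I mean by the expected-score tie-break: compare each player's actual score with what the rating curve predicts, and rank by the surplus. This uses the standard logistic expectancy curve as an approximation (the published tables differ slightly), and the ratings are made up:

```python
def expected_score(player_rating, opp_ratings):
    # Sum of per-game expectancies from the logistic rating curve.
    return sum(1 / (1 + 10 ** ((opp - player_rating) / 400))
               for opp in opp_ratings)

def surplus(player_rating, opp_ratings, actual_score):
    return actual_score - expected_score(player_rating, opp_ratings)

# Two players tied at 4/5 against the same (hypothetical) opposition:
opponents = [1800, 1750, 1900, 1700, 1850]
print(surplus(2100, opponents, 4.0))   # small surplus for the pre-event favorite
print(surplus(1800, opponents, 4.0))   # large surplus -- the lower-rated player
                                       # "performed better than expected" and wins
```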


Quote:
Arguably, (and IMHO) performance rating is the best measure of that performance – even better than the actual score in the event – IF the ratings are reasonably accurate to begin with.

Response:
An established player’s rating varies, with a 95% confidence interval of 55 points about a mean performance. This is not precise enough in most cases, as the performance ratings of two tied players are likely to be less than 55 points apart. The sample size (N rounds) is far too small and the measurement error (precision) is far too large for a statistically significant result. In a larger point group (4 players or more) there may be significance between first and last, but not between first and second.

Quote:

If a player happens to get a first round full-point bye because he is the odd player, I don’t think that he should automatically be at the bottom of the tie-break order.

Response:
Aren’t we talking about tie-breaks in the top half of the tournament here?
Modified median deletes the bye, thereby eliminating that bias, if we are talking about a plus score group. Why are we even bothering to break ties in a lower score group? I don’t see the point. If it’s for a class trophy, buy a duplicate trophy! They’re cheap!
Solkoff, on the other hand, does precisely what you are saying (all rounds count). Perhaps better is full median (dropping both high and low), BUT this reduces the sample size even further, thus reducing significance.

I probably did not cover everything here, but this post is far too long already. If you have any specific questions, please ask.

David Kuhns
professional statistician

Ties for first place generally involve no more than two players, although I have seen as many as eight. Would your answers be the same for ties for the lower places, e.g. 3rd-10th place, where the players have records such as 3-1-1 or 3-2?

Also, are the results any different for a 64 player five round event as opposed to a 200 player five round event?

Is the answer the same for the ties at 3-1-1 and 3-2 as opposed to ties for first place? Larger versus smaller sections as well?

The unplayed result comes into play a lot in scholastic events. For every scholastic event there will be a handful of preregistered players who get paired but don’t show up for the tournament. In many events the player they were paired against will get a forfeit win instead of being re-paired. Often this is a player who ends up in a tie for a prize.

My response is that Modified Median puts the player with the unplayed game back on even footing when comparing results with a player who had a first-round opponent. Since it is generally the higher-rated players who are involved in the tie, the first-round result will be the one that is thrown out under Modified Median. Under Solkoff, counting a first-round game against a much lower-rated opponent whose only score is a full-point bye is a penalty on the player with no opponent.

David, thanks for responding when called.

David,

Agreeing that there is some bias towards ratings when determining tie-break order, I ran a 64-player five-round tournament with the player ranking randomly determined. I ignored upsets and draws. Within the 3-2 score group (20 players), using Modified Median, Solkoff, and Cumulative in that order, the following results were observed:

1st place #1 player
2nd #3
3rd #7
4th #2
5th #15
6th #5
etc.

Isn’t this better than flipping a coin? Even if there is bias towards pre-tournament ratings, wouldn’t the introduction of ranking by pre-tournament ratings improve on the outcome?

Grant wrote:

My detailed analysis did not go deep into the lower point groups, but I can respond with some statistical logic.

The tie-break systems will have a higher correlation with performance given mixed results (some losses and wins), and tie-breaks get the best result with an even score. There is a definite differentiation between the high tie-breaks and the low. In addition, players in these score groups will have played both higher-rated AND lower-rated opposition (an important point).
Having said that, we are normally only dealing with the top end of the score group. Typically, the computed tie-break is very close among (what end up being) adjoining players. Here’s where the logic falls apart. There is virtually no significant difference between what ends up being first place in the group and second. In fact, you often have to go to a second or even third tie-break to break that tie.

As far as the player-to-round ratio goes, the results get worse as the ratio gets worse. The tie-breaks will work slightly better with 64 players than with 200 players. (Of course, so does the probability of having ties in the first place.) They improve dramatically when the number of players is less than 2^N (fewer than 32 players in a 5-round event).

Same answer. The correlation improves dramatically. There are more players tied, and the overall spread is larger. The highest tie break will have faced stronger opposition than the lowest. However, the difference between two adjoining players after tie break will be small and is not statistically significant.

The original question was about a bye, which is different from a forfeit win. I see your point here, somewhat. My belief is that the rules on tie-breaks do not adequately cover this situation. What I would tend to favor (with insufficient analysis, at this point) is to treat the opponent of the unplayed game as having an ‘adjusted’ score. Unplayed games would be credited 1/2 tie-break point each, so a player with a forfeit win might be awarded 2 1/2 tie-break points for that ‘game’ (5 unplayed games in a 5-round event). Or even better might be the expected score (rounded to the nearest half point) that opponent would have achieved had he played. My educated guess tells me that might be more ‘fair’ than giving 0 tie-break points.
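
A sketch of that adjustment, to make the arithmetic concrete (this is my proposal, not current rulebook practice):

```python
# When an opponent's final score feeds a tie-break, credit each of that
# opponent's UNPLAYED games with half a point.

def adjusted_opponent_score(played_points, unplayed_games):
    return played_points + 0.5 * unplayed_games

# A preregistered no-show who was forfeited in all 5 rounds:
print(adjusted_opponent_score(0.0, 5))   # 2.5 tie-break points instead of 0
```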

I’m not sure I think that’s a problem. IF the ratings mean anything, then the lowest-rated player did have the best tournament, right?

First of all, you can’t ignore upsets and draws. They play an important role in any analysis. In a simulation, you can compute the probability of an upset based on the rating formula, and even compute the probability of a draw. Then, using a random number table, determine a winner, a loser, and/or a draw based on those probabilities.
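
Something along these lines is all it takes. The 30% draw rate below is an assumed placeholder, not a figure from my analysis, and the logistic curve is only an approximation of the rating tables:

```python
import random

def simulate_game(rating_a, rating_b, draw_prob=0.30):
    # Expected score for A from the logistic rating curve.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    # Choose win probabilities so the expected score still matches the curve.
    p_win_a = max(0.0, min(1.0 - draw_prob, expected_a - draw_prob / 2))
    r = random.random()
    if r < p_win_a:
        return 1.0, 0.0          # A wins
    if r < p_win_a + draw_prob:
        return 0.5, 0.5          # draw
    return 0.0, 1.0              # B wins (an upset if B is the lower-rated player)

print(simulate_game(1900, 1700))
```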

Yes, the overall spread is better than flipping a coin. But it is NOT better among adjoining players (see my previous post).

My original analysis used actual tournament results (not a simulation) from more than 100 events. By the way, these results agreed quite well with simulations using probability tables. Note that the probability table given in the rating formula is slightly different from (but highly correlated with) actual results. The actual probability of an upset is slightly higher than the table value.

Did you consider any class tournaments? I’d think performance rating would be less dependent on pre-tournament rating in that case.
Even in the case where the #1 and #2 players end up tied, the #1 player did have slightly stronger opposition, right? Maybe not statistically significant, but then the same will hold true for any tie-break system, won’t it?

Maybe tie-breaks should go to the lowest-rated player??? If I were looking at a group of players with the same score, I’d say that the lowest-rated of those players had the “best” tournament result!

In all, I think any kind of playoff is better than a tie-break. Tie breaks are only a little better than flipping a coin.

This is where I am a bit confused. If you used actual tournament results and not a simulation, on what basis did you determine who was the best relative player in that particular event? Isn’t it important to know the relative strength of the players to evaluate the effectiveness of the tie-break methods?

Regarding response (1):
I disagree. We shouldn’t award the prize to the player with the most unexpectedly good result, but to the one with the best result. (Unless the purpose of awarding the prize is to encourage the younger or lower-rated players rather than to award the best performance – maybe that’s an acceptable alternate goal). If somebody achieves the same score, but against better opposition, then that’s the better result. That seems to be what tie-breaks are crudely trying to determine.

Regarding response (2):
So what? What kind of confidence can we have that other tie-breaks (not performance rating) accurately represent player strength? If you calculated an expected tie-break score as a function of true player strength, I’m pretty sure you’d find that other tie-break scores are even less meaningful (and yes, not statistically significant measures of the differences in player strength within a score group). As many problems as there are with using performance rating, anything else has as many or more flaws. The best guess we have of the relative strength of the various tied players’ opponents is the ratings of those opponents, and that’s what (along with score, of course) determines performance rating.

I’d guess that the reason performance rating wasn’t used historically is that it is more complicated to calculate than most of the simple tie-break methods. With a computer, it shouldn’t be hard. BTW, we can “modify” performance rating the same way provisional ratings are treated to answer most of the other complaints – something like: don’t count victories against opponents more than 400 points below the performance rating or losses to opponents more than 400 points higher, right? With software to calculate the result, it shouldn’t be any trouble to do this.
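
Something like this sketch is all the software would have to do. The linear performance formula and the 400-point cutoffs here are illustrative, not a rulebook definition:

```python
def perf(games):
    """games: list of (opponent_rating, result) with result 1.0/0.5/0.0."""
    n = len(games)
    score = sum(r for _, r in games)
    avg_opp = sum(o for o, _ in games) / n
    return avg_opp + 400 * (2 * score - n) / n

def modified_perf(games):
    # First pass, then drop wins against opponents far below it and
    # losses to opponents far above it, and recompute.
    first_pass = perf(games)
    kept = [(o, r) for o, r in games
            if not (r == 1.0 and o < first_pass - 400)
            and not (r == 0.0 and o > first_pass + 400)]
    return perf(kept) if kept else first_pass

games = [(1450, 1.0), (1900, 1.0), (2350, 0.0), (1950, 0.5), (1880, 1.0)]
print(perf(games), modified_perf(games))   # the easy win gets dropped
```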

My own personal evaluation of how well I did in a tournament is always based on performance rating. I suspect many other players feel the same. If that’s the case, why not use it for tie-breaks?

Not specifically, but I did include many sectionals (three and four sections).

That’s what class prizes are for.
That’s what class prizes are for.
It depends on your goal for the event. As you pointed out above, the assumed goal of tie-breaks is to determine the “strongest among equals,” NOT “who got the most unexpected result” or “who performed better than their rating.”

By pre-event rating, yes, but #2 did not have the opportunity to “prove himself” because he never (or seldom) was paired against higher-rated opponents. Remember, I was using this as an extreme example of a perfect-score tie.

My point precisely.

The same place you did in your simulation: their pre-tournament rating. I also did a “sanity check” on each event to see if the number of upsets and draws was disproportionate to historical statistics.
I also did a double iteration on performance rating. That is, I calculated all the performance ratings, adjusted those by eliminating games where a loss would increase the estimate or a win would decrease it, and used those as inputs to calculate the ‘final’ performance rating.
For those who favor performance ratings (by the way, I am one of them), this double iteration is probably the most accurate method of determining that estimate and of breaking ties. However, I will still stick by my earlier statements that the sample size is too small in a single event to have much significance. A few points’ difference in performance rating is meaningless, and does not become significant until it exceeds the measurement error (±55 points, or more than 110 points apart). So this method also has its difficulties.
I avoided scholastic events because I wanted to include as many established players as possible. Adult established players have a much more stable rating (estimated playing strength) than provisional or scholastic players. I also avoided events that had a disproportionate number of unrated players, for the same reason.
I also did some simulations using probability tables and had a very high correlation with actual results.

All the tie-break methods are “effective”. They do succeed in breaking the ties. By using arithmetic, they do resolve personal bias. They do their “job”.

However, they do NOT answer the question “who is the best in this point group” consistently or accurately.

Two or more people tied with the same score are TIED. They remain tied. Any title or money is shared. A tie for first place should be considered as co-champions. The net result of this analysis is that tie breaks should ONLY be used to award small, unsplittable prizes.

Which performance rating do you want to use, the one based on the pre-event ratings for the players, which could be WAY out of date for some players, or the ones from the initial rating report for the event, probably somewhat less out of date, or the ones from the first re-rating of the event, etc?

Performance ratings favor the lower-rated players at least some of the time:

Consider two players who play the same 4 players, with the same results, and then draw against each other in the final round. The lower-rated player will get the higher performance rating, because he gets his final opponent’s higher rating.

It is worth noting that the WORST of the tie-breaks (if there is such a thing as a worst choice among a number of statistically unsound choices) is cumulative, which used to be the default for most large tournaments because it could be computed during the final round. The cumulative system is like voting in Chicago: it rewards those who win early and often. :slight_smile:
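
A tiny sketch of why early wins keep paying off under cumulative (the per-round results below are hypothetical):

```python
# Cumulative tie-break: sum the running score after each round,
# so a point earned in round 1 is counted again in every later round.

def cumulative(results):            # per-round points, in order
    total, running = 0.0, 0.0
    for r in results:
        running += r
        total += running
    return total

print(cumulative([1, 1, 1, 0, 0]))  # 12.0 -- won early
print(cumulative([0, 0, 1, 1, 1]))  # 6.0  -- same 3-2 score, won late
```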

Can we say “measurement error”?

Not quite; only if the rating difference between the two players is more than the cumulative delta of the other four games. In most cases, this still favors the higher-rated player.

Iterative performance rating (using the first computation to calculate the second) helps with the measurement error but does not solve it. Multiple iterations, until no integral change is detected, would be even better.
This process is not practical.

Remember, I said double iteration is the best of the tie-break methods; I did not say it was a good one.

This has been an interesting discussion. I think I have said enough on the subject.

One final parting shot…

Do not trust tie-breaks to give you the right answer, only trust them to give you AN answer. If you must use tie-breaks for an unsplittable prize, my actual recommendation is to use the list in the rulebook. Why? Because they do give an answer, and because they are in print for all to point to as a reference. Most players will not question that.
For substantial unsplittable prizes, plan for, advertise, and implement a playoff.

Thanks for listening, and good chess

Have we determined that this is the correct forum? I thought Jud was attempting to change the order of tie-break methods. If so, then the USCF forum seems more appropriate to me.

I see the Tournament Direction forum as an advice area for tournament directors, not as an area to get the attention of the Executive Board and Delegates. I assume members of both of these groups are more likely to read the USCF Issues forum than this one on tournament direction. Personally, I read more of the posts in that forum than I do in this one.

Still confused.

I did not use the pre-tournament rating as a measure of the effectiveness of the tie-break method. I used a preassigned strength for each player in the simulation. For some of the players in each simulation the preassigned strength was close to the player’s pre-tournament rating; for others it was not. At the end of the simulation I compared the finish as determined by the tie-break methods to the preassigned strength, not to the pre-tournament rating.

Using pre-tournament ratings seems biased to those ratings.

It seems intuitive to me that if you use pre-tournament ratings to calculate the double iteration and then compare the result to pre-tournament ratings, the double iteration will be the best at predicting pre-tournament ratings. I just don’t see what this has to do with evaluating tie-break methods.

Tie-breaks come into play more in scholastic events than adult events, although I have won a few trophies as an adult on tie-breaks. As you point out, scholastic events have more provisional and unrated players. Since the pre-tournament rating information is not established and beginners improve rapidly, pre-tournament and performance ratings seem useless for breaking ties.