I have been asked to comment on this thread. I have done quite a bit of analysis on various tie-break methods, including performance ratings, double-iterated performance ratings, and the methods listed in the Rules, as well as others such as SB (Sonneborn-Berger) and playoffs. This could get lengthy, so please bear with me.
First, let me state categorically that no numerical method of breaking a tie is sound mathematically or statistically. The sample size in a 4- to 9-round tournament is simply too small to obtain any significance whatsoever. For significance, a 5-round event requires a separation of at least 2.5 tie-break points using the modified median method. That is rarely obtained.
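A rough back-of-envelope sketch of why so few rounds carry so little signal. The assumptions are mine, not part of the analysis above: every game is treated as an independent coin flip (the worst case, variance 0.25 per game), and the opponents' scores summed by a tie-break are treated as independent of one another. It only indicates the order of magnitude of the noise; it does not reproduce the 2.5-point figure. The function names are hypothetical.

```python
import math


def score_sd_upper_bound(rounds: int) -> float:
    """Largest possible standard deviation of one player's total score.

    Each game scores 0, 0.5 or 1; its variance is at most 0.25 (a pure
    coin flip with no draws). Games are assumed independent.
    """
    return math.sqrt(rounds * 0.25)


def tiebreak_sd_upper_bound(rounds: int, kept_opponents: int) -> float:
    """Rough spread of a tie-break that sums `kept_opponents` opponents'
    scores, treating those scores as independent (a simplifying assumption)."""
    return math.sqrt(kept_opponents) * score_sd_upper_bound(rounds)


# 5-round event: a plus-score modified median keeps 4 of the 5 opponent scores.
print(round(score_sd_upper_bound(5), 2))        # 1.12
print(round(tiebreak_sd_upper_bound(5, 4), 2))  # 2.24
```

Under those assumptions, the random spread in a single player's modified-median total for a 5-round event is already a little over two points.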
The advantage of numerical methods is that they break ties in an unbiased way that works and is sellable. Someone wins, and there is no arguing with the numbers.
This advantage, by itself, is a good reason to use them for SMALL unsplittable prizes, such as a trophy.
I am going to stick with ties for first place. Even here, there are several distinct categories for an N-round event (my analysis covered events of 4 to 6 rounds):
#1. Perfect score tie.
#2. N - 1/2; drew with each other.
#3. N - 1/2; did not play each other.
#4. N - 1 or less.
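The four categories can be expressed as a small helper. The function name and parameters are hypothetical, for illustration only. It relies on one fact implied by the list: if two players finish on N - 1/2 and met each other, their game must have been a draw, and two players on a perfect score cannot have met at all.

```python
def tie_category(rounds: int, tied_score: float, played_each_other: bool) -> int:
    """Return the first-place tie category (1-4) from the list above.

    `rounds` is N, `tied_score` is the score shared by the tied players,
    and `played_each_other` says whether they met during the event.
    """
    if tied_score == rounds:
        return 1  # perfect-score tie (they cannot have met)
    if tied_score == rounds - 0.5:
        # At N - 1/2 each, any game between them must have been a draw.
        return 2 if played_each_other else 3
    return 4  # N - 1 or less


print(tie_category(5, 4.5, played_each_other=True))   # 2
print(tie_category(5, 4.0, played_each_other=False))  # 4
```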
In #1 and #2 the higher-rated player has a strong advantage, much stronger than the rating difference would indicate. The higher-rated player wins on the modified median and Solkoff methods 59% of the time. Note that this is even higher than the 55% expected score that White gets in a playoff.
In #3 and #4 the advantage is slightly smaller: about the 55% level (roughly the same as having White).
Modified Median and Solkoff, under analysis, give very similar results. No significant differences were observed in any of the four categories.
The modified median has one advantage: if you played someone who ended up with 0 points, or had a bye, that result is dropped, which puts you more on a par with the other contender.
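A minimal sketch of the two methods, assuming the usual definitions as I understand them: Solkoff sums every opponent's final score; the modified median drops the lowest opponent score for a plus score, the highest for a minus score, and both for an even score. Following the model used in this post, a bye is counted as an opponent score of 0; the official rules have their own provisions for unplayed games and longer events, which this sketch ignores. The players and scores are hypothetical.

```python
def solkoff(opponent_scores):
    """Solkoff: the sum of all opponents' final scores (every round counts)."""
    return sum(opponent_scores)


def modified_median(opponent_scores, player_score, rounds):
    """Modified median: plus score drops the lowest opponent score,
    minus score drops the highest, even score drops both.
    (Refinements for longer events are omitted here.)"""
    ordered = sorted(opponent_scores)
    half = rounds / 2
    if player_score > half:
        kept = ordered[1:]
    elif player_score < half:
        kept = ordered[:-1]
    else:
        kept = ordered[1:-1]
    return sum(kept)


# Two players tied at 4.5/5. A had a round-1 bye, counted as an opponent score of 0.
a_opponents = [0, 2.5, 3, 3.5, 3]
b_opponents = [1.5, 2.5, 3, 3, 3]
print(solkoff(a_opponents), solkoff(b_opponents))  # 12.0 13.0
print(modified_median(a_opponents, 4.5, 5), modified_median(b_opponents, 4.5, 5))  # 12.0 11.5
```

Solkoff ranks B first because A's bye contributes 0; the modified median discards that 0 and ranks A first.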
Cumulative score, and especially opponents’ cumulative score, are in my opinion random number generators and shouldn’t even be on the list. The variance of the results is so widespread that there is no significance until the scores differ by about 50% of the maximum obtainable.
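For reference, a simplified sketch of how these two tie-breaks are computed. It assumes the plain definitions (cumulative = sum of the running score after each round; opponents' cumulative = sum of each opponent's cumulative); the official versions may adjust for unplayed games, which is ignored here. Function names are hypothetical.

```python
from itertools import accumulate


def cumulative(round_results):
    """Cumulative tie-break: add up the running score after every round."""
    return sum(accumulate(round_results))


def opponents_cumulative(opponents_round_results):
    """Opponents' cumulative: the sum of each opponent's cumulative tie-break."""
    return sum(cumulative(results) for results in opponents_round_results)


# Same 4.5/5 total, different order of results.
print(cumulative([1, 1, 0.5, 1, 1]))  # 13.5
print(cumulative([0.5, 1, 1, 1, 1]))  # 12.5
```

The two totals differ only because of when the draw happened, not because of who was played.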
Performance rating favors the higher-rated player even more. Take the example of two players seeded 1 and 2, both paired ‘down’ every round. By Swiss rules, 1 is almost always paired against a slightly higher-rated player than 2 is, so 1 will have the higher performance rating. Period.
If they played each other and drew, 2 gets the advantage only if the rating difference between 1 and 2 is greater than the cumulative differences in the other rounds. Possible, but not likely.
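A sketch of this argument, assuming the simple linear performance formula (average opponent rating plus 400 × (wins − losses) / games); other performance formulas exist, and this one is my assumption, not necessarily the one used in the analysis. The seeds and ratings are hypothetical.

```python
def performance_rating(opponent_ratings, score):
    """Linear performance rating: average opponent rating plus
    400 * (wins - losses) / games. (Assumed formula; others exist.)"""
    games = len(opponent_ratings)
    wins_minus_losses = 2 * score - games  # W - L, since score = W + D/2
    return sum(opponent_ratings) / games + 400 * wins_minus_losses / games


# Hypothetical seeds 1 (2200) and 2 (2150), both 4.5/5, who drew each other.
seed1_opponents = [2150, 1900, 1950, 2000, 2050]  # includes seed 2
seed2_opponents = [2200, 1880, 1930, 1980, 2030]  # includes seed 1
print(performance_rating(seed1_opponents, 4.5))  # 2330.0
print(performance_rating(seed2_opponents, 4.5))  # 2324.0
```

With equal scores, only the opponents' ratings matter. The head-to-head game gives seed 2 a 50-point edge, but seed 1's other four opponents were 80 points stronger in total, so seed 1 still comes out ahead.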
Therefore, in my opinion, where the unsplittable prize is substantial, a playoff, ANY playoff, is far superior to a numerical tie-break, ANY numerical tie-break. Others will disagree with me, some quite strongly. That is OK; this is my opinion, and the numbers back me up.
Now to answer some of the questions brought up by other posters.
Quote:
While there are flaws with any tie-break system, there is a good degree of correlation between performance and each of the common methods
response:
Not true. Overall there is very little correlation (R^2 ~ 0.5): better than a random distribution, but not much better.
Paraphrase:
Modified median penalizes a player with a bye.
Response:
Definitely NOT true. In fact, modified median FAVORS the player with a bye. Remember, that player didn’t have to play a game to get the full point.
The modified median method drops that 0 from the sum, so the total loses nothing. Solkoff keeps the 0. That’s a penalty.
I am still wondering about the circumstances of this result; it must be an extremely rare and strange event. With an odd number of players, by Swiss rules, the lowest-rated player in the event gets the bye in round 1 (the only round where this would have an effect, by the way).
Are you telling me that the lowest rated player actually tied for first place? I had no instances of that in the events I used in my extensive analysis. Very rare indeed. How can we make rules to cover such a situation? In fact, mathematically I prefer the modified median in this case.
If the player took a half-point bye and tied for first (several examples of that exist), that player CHOSE not to play and does not deserve tie-break credit for an unplayed game. If he was penalized for doing so, he chose that penalty himself.
Quote:
If over a large number of events various tiebreak methods tend to produce different results, then I would argue that there is little correlation between those tiebreak methods and the RELATIVE performance of the tied players
Response:
Very true. Little correlation exists between performance and any tie-break method.
Quote:
The problem with performance rating as a tiebreak is that it assumes the ratings are evenly distributed throughout the field. If the number one seed is rated 2450 and the number two 2150, the two players just below the cut are both going to lose routinely in the first round, but their performance ratings will be quite different.
Response (I’ll split this one):
Performance rating does not assume the ratings are EVENLY distributed. In fact, performance rating assumes that ratings are NORMALLY distributed. However, it does assume that pairings are somewhat random, which they definitely are NOT.
But your argument is nonetheless correct.
Quote:
I am only talking about the correlation within a certain score group, e.g. players tied at 4-1, say, for second thru sixth. Based on relative performance the common methods will predict with some accuracy the correct ranking. Yes, the second-best player might finish fourth with the third best second, but the correlation is definitely better than using pre-tournament ratings.
Response:
The covariance between pre-tournament ratings and performance rating within a point group is very high (about 0.8) for first-place ties. This introduces far too much bias into the analysis. In your example the covariance is a bit lower, but still high. The difference between the methods is statistically insignificant, even with a sample of more than 100 events.
Quote:
This is a bit out of my field of study, but why is the comparison to performance rating the key to evaluating the various methods? Seems like the measure of success should be against some predetermined strength criteria at the beginning of the event. I see performance rating for that one event just as biased as the tie-break methods
Response:
A bit better than performance rating is ‘expected score’ (the probability curve used in computing ratings in the first place). The player who beats his expected score by the larger margin (that is, performs furthest above expectation) would win.
This turns the tables and strongly favors the lower-rated player, in effect making this method a form of ‘handicap’ event. Not necessarily a bad thing.
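A minimal sketch of an expected-score tie-break, assuming the Elo-style logistic curve on a 400-point scale; a federation's own expectancy table may differ in detail. The players, ratings and function names are hypothetical.

```python
def expected_score(rating, opponent_rating):
    """Elo-style logistic expected score on the 400-point scale."""
    return 1 / (1 + 10 ** ((opponent_rating - rating) / 400))


def score_over_expectation(rating, opponent_ratings, score):
    """Actual score minus total expected score; the larger value wins the tie."""
    return score - sum(expected_score(rating, opp) for opp in opponent_ratings)


# Hypothetical seeds rated 2200 and 2150, both 4.5/5, who drew each other.
seed1 = score_over_expectation(2200, [2150, 1900, 1950, 2000, 2050], 4.5)
seed2 = score_over_expectation(2150, [2200, 1880, 1930, 1980, 2030], 4.5)
print(seed2 > seed1)  # True: the lower-rated seed wins this tie-break
```

Here seed 2 beats expectation by roughly 1.07 points against roughly 0.81 for seed 1, so the lower-rated player takes the tie.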
Quote:
Arguably, (and IMHO) performance rating is the best measure of that performance – even better than the actual score in the event – IF the ratings are reasonably accurate to begin with.
Response:
An established player’s rating varies, with a 95% confidence interval of 55 points about his mean performance. In most cases this is not precise enough, because the difference in performance rating between two tied players is likely to be less than 55 points. The sample size (N rounds) is far too small, and the measurement error far too large, for a statistically significant result. In a larger point group (4 players or more) there may be significance between first and last, but not between first and second.
Quote:
If a player happens to get a first round full-point bye because he is the odd player, I don’t think that he should automatically be at the bottom of the tie-break order.
Response:
We are not talking about tie-breaks in the top half of the tournament here?
Modified median deletes the bye, thereby eliminating that bias, if we are talking about a plus-score group. Why are we even bothering to break ties in a lower score group? I don’t see the point. If it’s for a class trophy, buy a duplicate trophy! They’re cheap!
Solkoff, on the other hand, does precisely what you are describing (all rounds count). Perhaps better is the full median (dropping both the highest and the lowest), BUT this reduces the sample size even further, thus reducing significance.
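A small sketch of how much data each method keeps for a plus-score player in a 5-round event, using the post's model in which a bye counts as an opponent score of 0. The helper name and the scores are hypothetical; "Modified median" here means the plus-score case (drop the lowest only).

```python
def trimmed_opponent_scores(opponent_scores, drop_lowest, drop_highest):
    """Sort opponents' scores and discard the given number from each end."""
    ordered = sorted(opponent_scores)
    return ordered[drop_lowest:len(ordered) - drop_highest]


# A 4.5/5 player whose round-1 bye is counted as an opponent score of 0.
opponents = [0, 2.5, 3, 3.5, 3]
for name, low, high in [("Solkoff", 0, 0), ("Modified median", 1, 0), ("Full median", 1, 1)]:
    kept = trimmed_opponent_scores(opponents, low, high)
    print(f"{name}: {len(kept)} scores kept, total {sum(kept)}")
```

This keeps 5, 4 and 3 scores respectively: the full median discards the bye, but it also throws away the strongest opponent, leaving only three numbers to separate the tied players.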
I probably did not cover everything here, but this post is far too long already. If you have any specific questions, please ask.
David Kuhns
professional statistician