I played in this recent blitz event on chess.com, uschess.org/msa/XtblMain.php?202008265222. My rating went up from 1865 to 1880. The ratings calculator estimated my rating would go up from 1865 to 1893 using the post-event ratings of my opponents like the rating calculator recommends. Why is the ratings estimator calculating a rating 13 points different in this case?
One of your opponents was unrated, and the ratings estimator doesn’t have the ability to do all the things that are done with unrated players, so that often results in an under- or over-estimation of that player’s opponents’ post-event ratings.
Next week’s rerate will put it into proper chronological order, which may also have some impact. I can’t look at the trace logs until then.
In this event, uschess.org/msa/XtblMain.php?202008172322, I didn’t play any unrated players. My rating went down from 1880 to 1865. The ratings estimator calculated my rating would go down from 1880 to 1853 using the post-event or pre-event ratings of my opponents. Why would it be off 12 points in this case?
Obviously the ratings system doesn’t like you.
Since the rating system only dropped my rating to 1865 when the ratings calculator said it would go down to 1853, I would say the rating system likes me in this case.
So it cheated you out of 13 points in one event, and gave you back 12 points in another.
Please know that the ratings estimator was written by a computer professional (in Texas, I don’t remember his name anymore) as an aid to players in estimating their new ratings. It was never intended to be 100% accurate. It can’t be, because the “real” formula is far more complicated, and makes use of various bits of information about the players that the estimator can’t possibly be aware of. A concept called “effective number of games” is an example of such “extra” information, as is the fact that the Ratings Committee monitors the ratings for inflation or deflation, and from time to time changes the “bonus threshold” and/or the “bonus value” and/or various players’ K-factors and/or a whole bunch of other things to keep everything more or less aligned and stable.
Also, your idea that your rating change should be calculated from your opponents’ post-event ratings (instead of from their pre-event ratings) is slightly flawed. Which came first, the chicken or the egg? If everybody’s post-event ratings were calculated based on everybody else’s post-event ratings, you’d have an endless loop. In Excel this is called a “circular reference” because the value of A depends on the value of B and vice versa.
To avoid these circular references, the rating system uses a two-pass (or maybe three-pass) system, rather than an “infinite-pass” system which would just keep calculating everybody’s post-event ratings from everybody else’s post-event ratings until everybody’s ratings quit changing from one pass to the next.
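To make the circular-reference point concrete, here is a toy sketch that contrasts a fixed number of passes with an "infinite-pass" loop that repeats until the numbers stop moving. It is not the actual US Chess procedure: the fixed K of 32, the three made-up players, and the function names are all assumptions chosen for illustration.

```python
def expected(r_player, r_opp):
    """Standard Elo-style expected score for one game."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_player) / 400.0))


def one_pass(pre, opp_ratings, games, k):
    """Rate every player from their own pre-event rating,
    using whatever opponent ratings are passed in."""
    post = {}
    for p in pre:
        delta = sum(score - expected(pre[p], opp_ratings[o])
                    for (a, o, score) in games if a == p)
        post[p] = pre[p] + k * delta
    return post


def fixed_passes(pre, games, k=32.0, passes=2):
    """Pass 1 uses pre-event opponent ratings; each later pass
    uses the opponent ratings produced by the previous pass."""
    est = dict(pre)
    for _ in range(passes):
        est = one_pass(pre, est, games, k)
    return est


def until_stable(pre, games, k=32.0, tol=1e-6, max_passes=1000):
    """Keep re-rating until no rating moves by more than tol."""
    est = dict(pre)
    for n in range(1, max_passes + 1):
        new = one_pass(pre, est, games, k)
        if max(abs(new[p] - est[p]) for p in pre) < tol:
            return new, n
        est = new
    return est, max_passes


# Hypothetical three-player round robin: (player, opponent, score).
pre = {"A": 1800.0, "B": 1600.0, "C": 1400.0}
games = [("A", "B", 0.0), ("B", "A", 1.0),
         ("A", "C", 1.0), ("C", "A", 0.0),
         ("B", "C", 0.5), ("C", "B", 0.5)]

two = fixed_passes(pre, games, passes=2)
stable, n_passes = until_stable(pre, games)
print(two, stable, n_passes)
```

In this toy setup the second pass already lands close to the fully converged values, which is one plausible reason a system might stop after two or three passes instead of looping.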
So, for a whole bunch of reasons, you should never expect any ratings estimator to come up with the exact same ratings as the “real” system with all of its sophistication and access to external facts.
Bill Smythe
The estimator knows about ‘effective number of games’ and bonus points, but it doesn’t know about seeding ratings from other systems or the two-step ratings computations everyone gets or the extra steps for unrated players.
It probably would best be described as two and a half. There’s an initial pass which includes only the players with N=0 pre-tournament information (generally having only age to help guess the ratings). That’s actually done treating all those as N=1 just to tack things down a bit.
So perhaps this is why it is called … wait for it … an ESTIMATOR?
I didn’t have to wait for it. The large word in red came up right away.
Bill Smythe
Micah isn’t imagining this. There has been an error in the calculations of the on-line ratings dating back to the introduction of OL Blitz ratings. It was first brought to our attention last month—goes to show how seriously most people took their OL Blitz ratings. These will be corrected in a future re-rate (timing still to be determined).
Thanks. I thought this might be the case.
Mr. Smith played four games. All four opponents were rated more than 600 points below him. This makes his expected score approximately 3.9 points. He actually scored 3.0 points. Therefore he deserved to lose 0.9 times K from his 1880 rating. The K factor for an 1880 player in a four round tournament is 28. Consequently, his rating change should have been -25 points (1880 to 1855). Using the exact decimals, the rating calculator showed 1880 to 1853.
Alas, the actual rating report shows 1880 to 1865.
What could have gone wrong? He scored 3.0 points in four games. That’s a fact. He was rated 1880 going into the tournament. MSA tells us that. His opponents were rated between 900 and 1300 and three were provisional. When you’re 1880, the difference between facing a 1100, 1200, or 1300 opponent is quite small.
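The arithmetic above can be checked with a short sketch. It uses the standard Elo-style expected-score curve and the K of 28 quoted in this post. The four opponent ratings are hypothetical values picked from the 900 to 1300 range mentioned above, not the actual opponents from the crosstable, and the function names are made up for illustration.

```python
def expected_score(rating, opponent_rating):
    """Expected score for one game on the standard logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def estimate_new_rating(rating, opponent_ratings, score, k):
    """Return (estimated new rating, total expected score).
    No bonus points or special handling of unrated players."""
    expected_total = sum(expected_score(rating, r) for r in opponent_ratings)
    return rating + k * (score - expected_total), expected_total


# Hypothetical opponents in the 900-1300 range described in the post.
opponents = [900, 1100, 1200, 1300]
new_rating, expected_total = estimate_new_rating(1880, opponents, 3.0, 28)
print(round(expected_total, 2), round(new_rating))
```

With these assumed opponents the expected score comes out a little above 3.9 and the estimate lands in the low 1850s, in line with the estimator and well short of the reported 1865.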
The only other number in the calculation is the K factor itself. Are the values of K used in online blitz ratings different than the K used in over-the-board ratings?
Michael Aigner
My guess (and it is just that, a guess) is the number of games previously played. If the player has only a few games, the K factor will be larger than if the effective number of previous games is 50 or more.
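For readers wondering how the number of prior games feeds into K, here is a rough sketch based on my recollection of the published over-the-board formula: K = 800 / (N' + m), where N' is the effective number of games and m is the number of games in the event. Treat the constants as an assumption to verify against the current US Chess rating system document. They may not match what the online blitz system actually uses, and the function names are made up.

```python
import math


def effective_games(rating, prior_games):
    """Effective number of games N' (constants recalled from the
    published OTB formula; an assumption, not verified here)."""
    if rating > 2355:
        n_star = 50.0
    else:
        n_star = 50.0 / math.sqrt(0.662 + 0.00000739 * (2569 - rating) ** 2)
    return min(prior_games, n_star)


def k_factor(rating, prior_games, event_games):
    """K = 800 / (N' + m), with m the number of games in this event."""
    return 800.0 / (effective_games(rating, prior_games) + event_games)


# An established 1880 player in a four-game event, versus the same
# rating with only five prior games.
print(k_factor(1880, 1000, 4))
print(k_factor(1880, 5, 4))
```

Under these assumptions an established 1880 player in a four-game event gets a K of about 28, matching the figure quoted earlier in the thread, while a player with only a handful of prior games gets a much larger K.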
The K factors are wrong for online chess, and have been all along.
There are other ratings formula changes on the EB’s agenda. Assuming those are approved, US Chess may wait until they are tested before fixing the K factor. I’ve also suggested that we add online ratings systems to the floating minimum floor at the same time, since it is pretty clear from the rewrite of Chapter 10 that they should apply.
I believe it was written by George John of Houston.
Moogy in Texas
Another Texan sent me a PM with this same information, which jogged my memory. Indeed, it was George John. Thank you!
Bill Smythe
Does anyone know when all this is going to be done?
I haven’t received official word yet that the EB has approved the formula changes. I think they were out for an email vote that was due to be completed earlier this week.
The changes still would need to be programmed and tested on the testbed server, a process that would probably take a week or longer. (Finding or creating test cases for all of the variants for how to initialize ratings could take a while, as all told there are several dozen options.)
As I understand it, the changes to the online ratings initialization procedures will be retroactive to the start of those ratings systems, as would the correction to the effective games/K formula, but the changes to the OTB ratings initialization procedures would be effective at some future date that has not yet been set.
Thanks for the update, Mike.