Event Performance Ratings

I take an Event Performance Rating with a heaping teaspoon of salt, but I always check. I mean, sometimes a 3-2 score reflects a better performance than a 4-1, and it’s satisfying to quantify this. But I just finished a weekend event where I played three folks rated around 1600 and one guy rated about 2050, went 4-zip, and took one half-point bye. My EPR computed to 2450! No, much as I’d like to believe it, that’s not reality, not even close. The rating estimator said I’d gain about 20 rating points, which seems a more realistic assessment of my score (I went in with a rating of 2012).

Performance ratings are IMHO fatally flawed for many reasons, which is why we no longer use them when generating a player’s updated rating. They have a very limited role in the generation of initial ratings for some unrated players, and even in that role they are flawed; those flaws have led to some surprising and unrealistic initial ratings.

For players with established ratings, EPRs can be used to track trends in tournament performance over a set period of time. They can give interesting results that help you decide what to change. For example, you might notice that you seem to perform better in winter than in other seasons. Your EPR at local events might differ from your EPR at large open events. Your EPR at one particular event might be higher every year than at other events; maybe you are most comfortable at that tournament and will make sure to go back to it while your performance “mojo” is at its peak. There are some tournaments you might find are always bad for you, maybe because the TD always seems to mess things up or you cannot find a good place to eat. The EPR is just one indicator of how you are doing. One EPR might not be enough to tell you anything; a string of EPRs might tell you more. I know someone who swears by biorhythms and their effect on his performance.

I’ve read that in Japan, being in a double or triple critical day on your biorhythm chart can get you a warning rather than a traffic ticket.

I remember when biorhythms were the rage. You can’t take anything like this too seriously. Pseudoscience has made people suspicious of real science and critical thinking. Newspapers still publish astrological forecasts, and I know people who live by them. Then there are the “lucky numbers” you get in fortune cookies along with the quaint fortunes or adages.

If you’ve never read it, I strongly recommend author Michael Crichton’s 2003 address at Caltech: Aliens Cause Global Warming.

See stephenschneider.stanford.edu/P … cle_inline

Andy Kessler had a column on the Wall Street Journal’s website on Sunday in which he quoted at length from Crichton’s address. It is probably behind their paywall, but here’s the URL anyway: wsj.com/articles/follow-mic … opin_pos_3

I weary of the shows about aliens on the History Channel. As a former social studies teacher, I could bore you to death with all of the inaccuracies, prejudices, and political gibberish in the texts we use in the schools. Political and social biases permeate the literature we use in the social sciences. We need a little more skepticism of Crichton.

When I was in college, our political science department head required all of his students to sign up for a statistics class. Officially, we did not need to, but if you wanted to stay in his good graces, you followed his prescription to take well over the minimum number of courses in economics, history, political science, philosophy, science, and math. In the statistics class, the professor pounded into us the idea that you must always be careful about extrapolating from data. His favorite topic was errors in polling, both in design and in interpretation. We all suffered in the class, as it was the era before calculators became de rigueur. We had to work all of the formulas on paper by hand, which took forever but also showed what the numbers did and how they could be subtly and not so subtly shaped and manipulated by errors. Today, when I see polling over a year in advance of an election, I have to laugh. Our political science head taught us to read carefully and test assumptions and “facts”, especially those provided by authoritative sources like the government and industry. In those days the buzzword in economics was “interdependence”, the precursor of the term “globalization”. As I recall, the great fears of the time were global cooling and the perils of overpopulation.

Later, in graduate school, one of my professors extolled the works of Mihaly Csikszentmihalyi, who wrote the book “Flow” and other works on the development of expertise and on work in general. There were parts covering chess that I found fascinating but also simplistic and unrealistic. I remember writing a paper critical of several of his works for going steps too far in applying his concepts broadly as a utopian prescription for learning and the development of society. I did not know at the time that he was a mentor of that professor’s, and once I found out, I expected the paper to get crushed for a grade. Instead, she laughed and said he would have adored the paper as a correct and scientific way to approach such screeds. She said we must all be skeptical and take care that we do not go too far in trying to impose values. I recall messing up some of her research on “personality types” as applied to education because I skewed the data in her surveys in an unusual way. After I explained that I was a chess master and might not fit into some of the categories, she was relieved and said that outliers are always a problem. For that and other reasons, I have always been skeptical of elements of educational psychology. Even so, I found graduate work entertaining and miss it.

(The Crichton address was not about aliens or global warming; that was just a hook to get people interested.)

I enjoyed my years in graduate school as well. I took several graduate courses in statistics and research methodology; most of the surveys I see reported are either completely misunderstood by the media or appallingly poorly designed, if not both.

But then again, most of the ‘polls’ I get calls on are really sales calls.

I had an interesting discussion with a VP of Gallup on this once. (His wife and I were on the Lincoln School Board together.)

It surprised me very little when the 2016 election polls in the USA were dead wrong, when the Brexit polls in the UK were dead wrong, when the polls on the 2017 UK election were dead wrong, and I’m watching the initial polling for the upcoming 2019 UK election with both fascination and trepidation. The prognosticators got the choice for the new Speaker of the House of Commons right, though.

One of the funniest anecdotes I have heard on polling and statistics concerned the 2008 Iowa primary. Young people were sent out to get the opinions of potential voters to see where the candidates stood. The sample size and confidence levels of the polling were questioned. The young person in charge of gathering the data told the candidate that the sample size was very large. When asked what the size was, he told the candidate and his staff that his group had knocked on every door and that their sample size was everyone. Today, campaigns use available technology to drill down to the core of each precinct to discover what the voters think about every candidate and every issue. In a way, this is a throwback to the old days of machine politics in big cities. Ward heelers knew everyone in their precincts and had a better than good appreciation of what was going on, what people liked and did not like, and who would vote. Their numbers were extremely close. Having this level of data is priceless.

The national polling done today is useful for attracting donors but useless this far out in determining who will win. State-by-state polling and dissecting county information is where it is at for candidates. The red and blue maps do not tell you much, as there are wide swaths of territory with small populations. As you probably know, in PA there are two major pockets of Democrats, in Pgh and Philly, with smaller pockets in Wilkes-Barre and Erie. Most of the rest of the state is conservative and Republican. When the votes come out, major attention is paid to Philly and the suburbs during Presidential elections. If the vote-total differences exceed a certain number, it does not matter that much what the votes are in the rest of the state. The pros pay a great deal of attention to individual Congressional districts, many of which have been gerrymandered. Finding patterns in voting is often compared to playing chess. I think that voter suppression, gerrymandering, and cutting people from the voting rolls are more like chess.

Your post-tournament rating is (in effect) a weighted average of your pre-tournament rating and a (not the, a) rating of the performance, where the performance rating is estimated based upon a linearization around the old rating. If you think about what you did, it would be an astonishing result for a 1500, a really strong result for an 1800, a very respectable result for a 2000 like you, and a ho-hum result for a 2400. The approximate Rp is roughly your rating plus the average per-game rating gain times your effective games number. For a 2000, the latter is roughly 30, so this would be about a 2150 performance (2000 + 30 x 20/4). For a 2400, the Rp would be something like 2425. The problem with the EPR is that it tries to evaluate the performance in the absence of any information about what sort of rating is expected. Sometimes that works OK (even scores against tightly packed competition); sometimes it works rather badly (virtually any perfect + or - score, where the maximum likelihood estimate of the performance is actually +infinity or -infinity).
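To make that arithmetic concrete, here is a rough sketch in Python. It is my own illustration, not the official US Chess computation: it assumes the standard logistic expected-score curve and the ~30 effective-games figure above, and it plugs in the 4-0 weekend from the first post to show how the maximum-likelihood estimate blows up on a perfect score while the linearized one stays sane.

```python
# A rough sketch, not the official US Chess computation. It contrasts the
# linearized performance estimate described above with a maximum-likelihood
# estimate based on the standard logistic expected-score curve.

def expected_score(r, opponents):
    """Expected total score of a player rated r against these opponents."""
    return sum(1 / (1 + 10 ** ((o - r) / 400)) for o in opponents)

def linearized_pr(old_rating, rating_gain, games, effective_games=30):
    """Rp ~= old rating + (per-game gain) * (effective games),
    as in the 2000 + 30 x 20/4 example above."""
    return old_rating + effective_games * rating_gain / games

def max_likelihood_pr(score, opponents, lo=0.0, hi=4000.0):
    """Bisect for the rating whose expected score matches the actual score.
    For a perfect score this just climbs to the top of the bracket:
    the true maximum-likelihood estimate is +infinity."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if expected_score(mid, opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

opps = [1600, 1600, 1600, 2050]       # the 4-0 weekend from the first post
print(linearized_pr(2012, 20, 4))     # 2162.0 -- modest and believable
print(max_likelihood_pr(4.0, opps))   # ~4000 -- pinned to the bracket edge
print(max_likelihood_pr(3.5, opps))   # finite once the sweep is broken
```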

Tom Magar suggests tracking your PRs, but that’s tracking a bunch of individually flawed numbers. A more useful thing to track is your average (per game) rating change, which will give a better idea of what your performance is relative to your rating. (And if you want to convert that to a PR, multiply by 30 and add your old rating.)
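If you wanted to automate that, the bookkeeping is trivial. The event data below is invented just to show the shape of it, and the factor of 30 is the rough effective-games number for a 2000 player mentioned earlier; yours would differ.

```python
# Sketch of the tracking suggested above. Event data is invented for
# illustration; 30 is the rough effective-games number for a 2000 player.

events = [
    # (rating before event, rating after event, games played)
    (2012, 2032, 4),
    (2032, 2028, 5),
    (2028, 2041, 4),
]

for before, after, games in events:
    per_game = (after - before) / games
    pr_equivalent = before + 30 * per_game
    print(f"per-game change {per_game:+5.1f} -> PR-equivalent {pr_equivalent:7.1f}")
```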

I use EPRs to graph trends over periods of time. It has been my experience that if one of my students has three or more EPRs above his or her rating, that has been an indicator that a rating pop of 100-200 or more points is coming very soon. When the EPRs drift or start to go downward, it means we have to work harder and study more. One EPR can be an anomaly; the trend graph is what we look at. Kids and adults get frustrated when they arrive at inevitable plateaus and will either work harder or get lazy and sloppy about study. Tracking trends is a motivational device. It is interesting to see whether the kids will be “rockets” and soar, which happens to some players initially, move up in steps, or trend upward in a steady climb. Often it is a roller coaster of dips and surges depending on how they are working. A distinct, steady decline is something to be concerned about, as it may show that the player is starting to give up.
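For what it’s worth, the trend check itself is simple enough to script. A minimal sketch, with hypothetical EPRs and the three-in-a-row threshold from above as an assumed rule of thumb:

```python
# Minimal sketch of the trend check described above. EPR values are
# hypothetical; the three-in-a-row threshold is the rule of thumb
# from the paragraph above, not anything official.

def epr_trend(rating, eprs, run=3):
    """Flag runs of EPRs above or below the player's current rating."""
    above = below = 0
    for epr in eprs:
        if epr > rating:
            above, below = above + 1, 0
        elif epr < rating:
            above, below = 0, below + 1
        if above >= run:
            return "possible rating pop ahead"
        if below >= run:
            return "time to work harder and study more"
    return "no clear trend yet"

print(epr_trend(1500, [1560, 1610, 1580, 1650]))  # possible rating pop ahead
print(epr_trend(1500, [1480, 1450, 1440]))        # time to work harder and study more
```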

When a player is floored, it is really frustrating to have EPRs that are a hundred or more points below your rating. When several happen in a row, it is time to rethink how you study and play. If your EPRs are going up, you don’t feel so bad that the actual rating only goes up a few points. Sometimes you have to drill down deeper and see who you are losing to and why. Then we switch from objective, numbers-based measures to subjective judging of the quality of the play through game study. If a student seems to be stuck at a rating but has shown that he can occasionally beat players rated 200-300 points higher, then we have an indicator and a target for developing his or her talent. Using objective and subjective measures is not always easy, but EPRs are one among a group of interesting tools to use.

That makes sense. A 2150 EPR is in the general range of what I would have guessed. Changing one of those wins to a draw causes the Rating Estimator to assign me an EPR of 2141. So, as you say, a clean sweep throws things off.