How to determine the "best" pairings

I heard this line a few times too often over three decades of active play. “The computer is unbiased” is the next sentence. Fortunately, it has now been a while since I joined a tournament not directed or assisted by an NTD.

If SwissSys isn’t always giving optimal pairings, why not contact Thad about it so he can improve SwissSys?


My guess is that the settings have a lot to do with whether pairings are ‘suboptimal’ or not, to the extent that ‘optimal’ even exists.

I do have to wonder how many TDs even know what those settings are and what they mean?

I ran my Senior TD pairing question through WinTD and had 3 different pairings depending upon the settings. (I had come up with the same three myself when I did the exam). All three were “legal” in terms of swap sizes, did maximal color corrections and were not strictly dominated by one of the others. (That’s what you can get when you have to do wholesale rearrangements.)

In reality, FIDE Dutch was only “faux” deterministic. It’s deterministic if you can pair the section successfully following the direct rules (which call for backtracking at most one score group to fix major problems). If you get to the end, can’t pair the last score group, and can’t fix it by unwinding to the previous one, then the current rules have a (very) convoluted process for backtracking through even higher score groups, which eventually results in whatever works. The 2025 rules simply punt the whole question to:

1.9.3 If it is impossible to complete a round-pairing, the arbiter shall decide what to do.

Are TD exams graded by hand or using pairing software in which settings would matter? Like others, I got two pairing questions wrong with no explanation so I was left feeling stuck.

I didn’t get the point about needing to generate hundreds of test questions for an online exam. How many unique paper tests exist now, 10, 100? For generating unique pairing questions, would we need more inputs than variations on ratings and score groups?

My understanding is that the club/local (same exam) and senior exams are multiple choice, graded by the office. ANTD and NTD exams are more of an essay format and are hand-graded by a group of NTDs.

In a Keller plan course, the students can keep taking the exams until they’re satisfied with their grade on that exam. Of course that means you have to ask different questions each time they take the exam, which means you need a lot of questions.

Because students take the exams until they’ve demonstrated sufficient mastery of the underlying subject material, each exam (and reading through the explanation of the answers) becomes a learning experience that advances the student’s knowledge on the subject material. That’s what I meant upthread about the difference between ‘seeing what they know’ and ‘make sure they know what they need to know’.

I’m not sure how many versions there are of the current club/local and senior exams; I think it may still be around 3.


Thank you. Is it possible the answer key is old, maybe newer rules or such aren’t incorporated? I realize that’s a reach.


I assume TDCC reviews the tests periodically, especially after significant rule changes. I don’t know how recently that was done, though.


It would be nice to know the percentage of correct answers across the population. For example, if 90% of test takers get a problem right and I got it wrong, I should practice and study more. But if a large number get a problem wrong, maybe the answer needs review. For those who’ve written tests, is there a target for percent correct per question over time?
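The per-question review idea above is easy to sketch. Assuming a hypothetical response log of (question id, correct?) pairs, a few lines of Python would compute the percent correct per question and flag the outliers whose answer keys might deserve a second look:

```python
from collections import defaultdict

def per_question_stats(responses):
    """Fraction of correct answers for each question.

    `responses` is an iterable of (question_id, was_correct) pairs,
    one entry per examinee per question.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, was_correct in responses:
        total[qid] += 1
        if was_correct:
            correct[qid] += 1
    return {qid: correct[qid] / total[qid] for qid in total}

def flag_for_review(stats, threshold=0.5):
    """Questions that most examinees miss may need the key reviewed."""
    return sorted(qid for qid, rate in stats.items() if rate < threshold)
```

The threshold is arbitrary here; in practice you would compare each question's rate against the exam-wide distribution rather than a fixed cutoff.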

Maybe the interns can set up a database to collect the data!

I don’t know how many people take the senior exam every year (it may only be a few dozen), and I don’t know if there’s a file of recent senior exam tests to use to populate such a database.

Would you rather attend an event run by a TD who missed 24% of the questions on the senior exam, or an event run by a TD who got a 90+% score after multiple attempts at a Keller Plan format exam?

ETS (the people behind the SATs) has published a lot of data on correlation of individual questions to overall scores, but it’s really dense material. (I remember looking at it when I was taking a multivariate statistics course around 1980.)

I think the intern projects are already pretty much set, though looking at validating pairing programs is one of the projects.

Those are graded by hand. I don’t know how they handle partial credit. One would hope that if you get a pairing where two players can’t even legally play, you would get a zero even if two of the four pairings matched the “correct” answer. OTOH, as I said above, I came up with three legal, non-dominated pairings, two of which had no board in common, and I would hope those would still get partial credit.

This does not mesh well with a policy requiring 6 months between attempts (2 months between attempts 1 and 2).

That policy may exist more because there are only a few different versions of each exam. And also, the current format isn’t intended to be a learning experience, just an assessment.

Again, that’s why a Keller Plan course needs a big question bank, because the students are ENCOURAGED to keep taking the exams until they pass. (A passing score is usually set fairly high, like 90%.)

If each exam has 50 questions, then having 500 questions in the question bank means they can take it 10 times without repeating any questions. (In practice it gets messier than that, because each test is usually divided up into sub-sections by subject, so in a TD format there might be 50 questions on pairing but 25 questions on clock rulings, 30 on illegal move rulings, etc.)

Some Keller plan software packages just pull a random set of questions from each subject group; others track which questions the student has seen (and whether the student got them right) and use that to select questions for the next exam, with a mix of new questions, ones they got wrong in the past, and ones they got right in the past. (Most packages shuffle the order of the answers; some get even fancier and can vary the parameters of a question to produce many examples of the same type, which works well in a math course, maybe not so well in American History. It would be an interesting experiment to see if that could be applied to pairing questions, maybe using an AI engine.)
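The selection policy described above can be sketched in a few lines. This is a hypothetical mix (mostly unseen questions, a few previously missed, one previously passed), not what any particular package actually does:

```python
import random

def select_questions(bank, history, n_new=6, n_wrong=3, n_review=1, rng=None):
    """Pick questions for a student's next exam attempt.

    `bank` is a list of question ids; `history` maps question ids the
    student has already seen to True/False (answered correctly or not).
    Unseen questions are preferred, plus some the student missed and a
    small sample of ones they already got right.
    """
    rng = rng or random.Random()
    unseen = [q for q in bank if q not in history]
    missed = [q for q in bank if history.get(q) is False]
    passed = [q for q in bank if history.get(q) is True]

    def pick(pool, k):
        # Take up to k questions from the pool, without repeats.
        return rng.sample(pool, min(k, len(pool)))

    exam = pick(unseen, n_new) + pick(missed, n_wrong) + pick(passed, n_review)
    rng.shuffle(exam)  # also vary question order between attempts
    return exam
```

A real package would partition the bank by subject (pairings, clock rulings, etc.) and run this per subject, so each attempt keeps the same sub-section structure.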

But as I said upthread, I’ve pretty much given up hope of US Chess ever going to this kind of learning experience for TDs.

Um, are the interns NTDs? :slight_smile: I wonder just how much validating will get done, although I like the goal.

I don’t remember needing a ruling so it might not matter as much to me. Being thankful for the event, I’d probably choose either whereas it’s easy to say the NTD would run a better event.

You’re assuming that NTDs stay current with the rulebook, and for the most part that’s true. There have been more than a few cases where NTDs made rulings that were wrong based on the current rulebook but right based on a previous rulebook, sometimes two or three editions ago.

I always designed tests to have a certain number of “C” problems, a certain number of “B” problems, and a certain number of “A” problems. If you couldn’t reliably do the “C” problems, you failed. To get an “A” you basically needed to get the “C” and “B” problems right and maybe half of the “A” problems.

My impression was that around half of the multiple-choice problems on the Senior TD exam should have been slam dunks for someone taking the Local exam. There were some where the rule was obscure enough that it had to be looked up, but the answer was clear. There were some where you had to pay careful attention to the details. And there were a few that were incredibly confusing. I assume those last ones were based upon actual case studies, where in practice the “facts” can be far from clear. If you don’t botch the first three categories and get at least partial credit on the pairing question, you probably are pretty much at the passing score. Which seems reasonable.

I think I remember reading somewhere that most people do not pass the Senior TD exam the first time they take it. If that’s the case, are TDs as a whole that incompetent, or is there a problem with the test itself?

Or maybe both are true?

If it’s common to fail the first time but pass the second, why would that be an indictment of either the TDs or the test?