Reported problem with May 2022 ratings files

A report has been received that importing the May 2022 ratings files into Excel is creating error messages and resulting in an incomplete file, missing the last several hundred recently created IDs.

This appears to be an issue with Excel.

Importing ratings data into Excel may no longer be possible because it appears the maximum number of rows in an Excel spreadsheet is 1,048,576 and there are 1,065,508 rows in the May 2022 gold master and 1,049,436 rows in the all ratings file. That probably accounts for the error messages.

There may be other programs that also have upper limits that we are exceeding. I am not aware of any upper limits in either WinTD or SwisSys, though.

I just loaded the May golden DB and WinTD 4.20 seems to go up to 30568340 (updated on 4/1) with an expiration of 4/30/2023. Looking at the website, the next ID 30568379 (a non-member) looks to have been created on 4/21 and that may have been just after the file build started.

Would it help to load the excel file only with IDs that have a rating and/or an expiration date other than 0000-00-00? That might avoid all those extra unwanted IDs.

That is correct. 1,048,576 is 2 to the 20th power.

Excel also has a limit on the number of columns. That limit is 16,384, which is 2 to the 14th power. The largest allowable column number is XFD. After using up A-Z (26 columns) and AA-ZZ (26-squared columns), you don’t quite make it to the end of AAA-ZZZ (26-cubed columns).

Before the 2007 version, Excel had a limit of 65536 rows rather than 1,048,576. So no matter how bad you think things are, they were even worse once upon a time.

Perhaps our (U.S. Chess) software might want to consider breaking the list into 2 or 3 lists. One could be for current and recently expired (within the last 10 years) person-members, i.e. those eligible to play in tournaments. Another list could be for everybody else – long-expired members, affiliates, duplicate IDs, temporary IDs, deceased, etc.

That way, the two lists could be loaded into two separate tabs on an Excel spreadsheet.

Bill Smythe

To confirm, 30568340 is the last ID in all 3 files for May 2022 (two golden masters and the all ratings file.)

I’m not sure what advantages there are to loading the ratings files into Excel (although Boyd tells me they do it at nationals, too), but we could consider splitting the file up somehow, such as geographically or perhaps by expiration date, with any ID that expired 10 or more years ago in a separate file.

For example, there are around 651,000 IDs that expired before 1/1/2012 and around 400,000 that expired after 1/1/2012. Only about 8000 of those have a non-active status (duplicate, deceased, etc.), so I don’t think splitting them off materially affects things and I would be inclined to just use the expiration date.

We do get some returning members after 10 years, usually less than 100 a month. Since July of 2020, when we switched membership systems, we’ve had around 1850 IDs that had lapsed prior to 1/1/2012 that now have expiration dates > 5/1/2020.

Affiliates aren’t in the Golden Master or all ratings files.

People like Excel. It can do a lot of things easily that are difficult or impossible otherwise.

Splitting the master file by expirations before and after 1/1/2012, as you suggested, seems like an excellent choice. With 600,000 older names and 400,000 newer ones, there is plenty of room for both to expand, and the (slightly) shorter file is where most will be found when preparing for a tournament.

Bill Smythe

It would have to be approved by the office, because there’s some development time as well as additional ongoing staff time to update the rating supplement indexes each month and additional disk space requirements.

Does it make more sense to bifurcate the All Ratings file, one of the existing Golden Master files or to create a third Golden Master, perhaps one with both the OTB and Online Regular ratings in it?

I did send a similar email to Boyd on this, but he’s in the throes of the Elementary this week, and then taking some well-earned vacation time, so I don’t expect a decision on this overnight.

WinTD takes the allratings file and converts it into DBF. DBF uses a 32 bit unsigned integer for the number of records (for a spec from 1985 when 16 bit processors were brand new, they were way ahead of their time), though we actually use a signed integer internally. Thus, the max is 2^(32)-1 so 4 billion or so.

On what’s basically a raw database?

Sometimes we go too far in avoiding “secret handshakes.”

Alex Relyea

As I recall, the import functions in Excel do a pretty good job loading both DBF and tab-delimited files.

Also comma-delimited. Also HTML.

Bill Smythe

If you use a signed 32-bit integer, aren’t you throwing away half the possible values, leaving you with “only” 2 billion or so?

Bill Smythe

It may have used a signed integer field to simplify and standardize what math libraries were needed.

You’re right. 2^(31)-1.

Yes, Excel can be very handy—I use it (or OpenOffice, which doesn’t have as many unnecessary bells and whistles) in some way almost every day. BUT, it can’t handle any longer the full USCF database (and will, by default, drop the most recent adds). So the question is what are people doing with Excel and the GDB, and is there some better way to provide something they could use for that? (I can see what the staff might do at, e.g. Nationals, since they can make a spreadsheet with a bunch of non-public information).

I use -1 to indicate that a record isn’t found.

This topic was automatically closed 730 days after the last reply. New replies are no longer allowed.