One of the downsides of working on a pre-startup project is that you really can’t say much about it. Seriously. You think Cryptonomicon seemed paranoid about security? You’ve just never met the folks who safeguard possible IP for college spin-outs. Yowza. And it’s a shame, because some of this stuff is really rather nifty, and it’s been good not only to do some high-level design of low-level stuff, but also to get back to implementing in C, and for high-capacity systems at that.
However, side projects are totally fair game 😀
At the moment, most of my side-project time has gone into a quick script for the college rifle club. It has to read in a text file and do some basic statistics on the data in it. PHP would blaze through this in a web setting, but to my mind PHP is out of its depth when it isn’t running on a webserver, so I thought something else would be more appropriate. Perl is certainly up to the task, as is Ruby, and I’ve been wanting to learn Ruby for a while; but some upcoming PhD work requires me to know Python, so I figured this would be a good starting point. So: apt-get install python, and away I went.
First, the problem. There’s a program my college rifle club uses called Kada, which tracks members’ names, details and scores, and produces charts of the club’s “ladders”, a sort of running competition that tracks shooters’ performances over the year (there’s a small prize at the end). The problem isn’t Kada itself – it is pretty much a textbook case of how you develop a user interface, even if it is character-mode. Keith, who developed Kada, spent ages refining the user interface by actually using it, talking to the other range officers in the club who used it as well, and making changes and tweaks. The UI isn’t fancy or mouse-driven, but it’s slick and efficient, and all the edge cases are handled. That’s the kind of really solid work that takes time and effort to do, and frankly it’s always impressed me. Keith also wrote a program for running competitions in the statistics office back in the days when we scored targets by hand. It also had a slick, efficient UI, but it did some smart things too: it let us project scores on a projector long before we had electronic targets or target scoring machines, and it estimated each shooter’s likely final score as targets were scored.
The problem is that Keith hasn’t been actively developing Kada for a while, and the current club committee wanted to make some slight changes to how the ladders are calculated. So I volunteered to take Kada’s data files, figure out how to extract the data entered with Kada, and produce the ladder charts myself. The data files are plain text with a straightforward format, so reading in the data is simple enough. First, read the members’ data file:
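```python
# Assumed globals: the script keeps every member in one module-level dictionary.
member = {}  # member number (1, 2, 3, ...) -> Shooter

class Shooter(object):
    """Bare record, emulating a C struct; fields are attached as attributes."""
    pass

def readMembers():
    membersfile = open("members.kda", "r")
    # Skip the two header lines (one of them holds the season, e.g. 2007/8)
    membersfile.readline()
    membersfile.readline()
    i = 1
    while True:
        # members.kda file record format:
        # surname
        # firstname
        # gender (m/f)
        # alias (not usually used)
        # student ID card seen? (y/n)
        # experience level (three letters, Novice/Experienced/Advanced, in Target, Air, Sporter)
        # course
        # year
        # unknown
        surname = membersfile.readline().rstrip()
        firstname = membersfile.readline().rstrip()
        if not firstname:
            break  # end of file: don't store an empty trailing record
        shooter = Shooter()
        shooter.name = firstname + ' ' + surname
        shooter.gender = membersfile.readline().rstrip()
        shooter.alias = membersfile.readline().rstrip()
        shooter.idcardseen = membersfile.readline().rstrip()
        # e.g. "NEA" -> ['N', 'E', 'A'] for Target, Air, Sporter
        shooter.experience = list(membersfile.readline().rstrip())
        shooter.course = membersfile.readline().rstrip()
        shooter.year = membersfile.readline().rstrip()
        membersfile.readline()  # unknown field, discarded
        member[i] = shooter
        i += 1
    membersfile.close()
```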
This just reads in the file and builds a dictionary of Shooter objects, and that dictionary is a global in the script. (They’re not really objects as such; I’m just emulating the idea of a C struct.) We discard the first two lines of the file (I don’t actually know what Keith stores there apart from the season – 2007/8 or whatever). The readline() approach is a bit awkward and not very Pythonic, methinks.
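A more Pythonic way to read fixed-size records is to pull them from one shared iterator instead of calling readline() field by field. The following is a minimal sketch in modern Python 3 (not the script’s actual code), assuming the nine-line record layout described in the comments of readMembers() and a two-line header; `read_members`, `HEADER_LINES` and `FIELDS_PER_RECORD` are hypothetical names, and the dataclass is a swapped-in stand-in for the bare struct-like class.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Dict, Iterable, List

# Layout assumed from the record-format comments in readMembers()
HEADER_LINES = 2       # season plus one field whose meaning is unknown
FIELDS_PER_RECORD = 9  # surname, firstname, gender, alias, ID seen, experience, course, year, unknown


@dataclass
class Shooter:
    """A plain record, in the spirit of a C struct."""
    name: str
    gender: str
    alias: str
    idcardseen: str
    experience: List[str]  # one letter each for Target, Air, Sporter
    course: str
    year: str


def read_members(lines: Iterable[str]) -> Dict[int, Shooter]:
    """Parse members.kda-style lines into {member number: Shooter}, counting from 1."""
    it = (line.rstrip() for line in lines)
    # Throw away the header lines by advancing the shared iterator.
    for _ in islice(it, HEADER_LINES):
        pass
    members = {}
    # zip(*[it] * n) repeatedly pulls n consecutive lines from the SAME iterator,
    # so each tuple is one complete record; a truncated final record is dropped.
    records = zip(*[it] * FIELDS_PER_RECORD)
    for number, record in enumerate(records, start=1):
        surname, firstname, gender, alias, idseen, experience, course, year, _ = record
        if not firstname:
            break  # blank padding at the end of the file
        members[number] = Shooter(
            name=f"{firstname} {surname}",
            gender=gender,
            alias=alias,
            idcardseen=idseen,
            experience=list(experience),
            course=course,
            year=year,
        )
    return members
```

Because an open file is itself an iterable of lines, usage would be along the lines of `with open("members.kda") as f: member = read_members(f)`.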
Next, read in the scores file and attach each member’s scores to their entry in the member dictionary as a list. This is slightly complicated by the air and smallbore rifle scores being in the same file, and I haven’t yet figured out a clean way to use Python’s `for line in file` idiom to cover just part of the file.
```python
import stats  # the statistics module used for stats.mean() (not part of the standard library)

def readScores():
    scoresfile = open("scores.g1", "r")
    x = False  # False: target rifle section; True: air rifle section
    y = 3      # header lines still to skip
    for line in scoresfile:
        if y == 0:
            line = line.strip()
            if line:
                # Each line is a member number followed by that member's card scores
                scores = []
                for s in line.split():
                    scores.append(int(s))
                id = scores[0]
                # oldKADA()/newKADA() are the running-average variants (not shown)
                if x:
                    # Air Rifle scores
                    member[id].scores_air = scores[1:]
                    member[id].mean_air = stats.mean(scores[1:])
                    member[id].oldKADAannual_air = oldKADAannual(member[id].scores_air)
                    member[id].oldKADA_air = oldKADA(member[id].scores_air)
                    member[id].newKADAannual_air = newKADAannual(member[id].scores_air)
                    member[id].newKADA_air = newKADA(member[id].scores_air)
                else:
                    # Target Rifle scores
                    member[id].scores_target = scores[1:]
                    member[id].mean_target = stats.mean(scores[1:])
                    member[id].oldKADAannual_target = oldKADAannual(member[id].scores_target)
                    member[id].oldKADA_target = oldKADA(member[id].scores_target)
                    member[id].newKADAannual_target = newKADAannual(member[id].scores_target)
                    member[id].newKADA_target = newKADA(member[id].scores_target)
            else:
                # A blank line separates the target rifle section from the air rifle section
                x = True
        else:
            y = y - 1
    scoresfile.close()
```
That’s not horrible. It’s certainly less clumsy than the readline() approach when reading the members file.
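One clean way to let `for line in file` cover just part of a file is to share a single iterator between several itertools consumers: `islice` skips the header, `takewhile` reads up to the blank separator line, and whatever is left belongs to the next section. This is a minimal Python 3 sketch, assuming the layout that readScores() implies (three header lines, the target rifle section, a blank line, then the air rifle section); `read_score_sections` and `parse_scores` are hypothetical names.

```python
from itertools import islice, takewhile
from typing import Dict, Iterable, List, Tuple


def parse_scores(line: str) -> Tuple[int, List[int]]:
    """Split 'id score score ...' into (id, [scores])."""
    numbers = [int(s) for s in line.split()]
    return numbers[0], numbers[1:]


def read_score_sections(
    lines: Iterable[str], header_lines: int = 3
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Return ({id: target scores}, {id: air scores}) from one pass over the lines."""
    # One iterator, positioned just after the header.
    it = islice(iter(lines), header_lines, None)
    # takewhile pulls from the shared iterator and stops at (and consumes)
    # the first blank line, i.e. the end of the target rifle section.
    target = dict(parse_scores(l) for l in takewhile(lambda l: l.strip(), it))
    # The same iterator now resumes after the blank line: the air rifle section.
    air = dict(parse_scores(l) for l in it if l.strip())
    return target, air
```

The key design point is that both sections are read from the same iterator object, so the second loop picks up exactly where `takewhile` left off and the file is only traversed once.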
So at this stage the data’s in. Turns out, calculating the ladder averages is exceptionally compact in Python. First, I wanted to duplicate Kada’s original algorithm:
```python
def oldKADAannual(scores):
    """Calculates the shooter's ladder average for the whole year under the
    old KADA algorithm of dropping the lowest card out of every six shot
    """
    tmpscores = sorted(scores)
    n = len(tmpscores)//6
    tmpscores = tmpscores[n:]
    return stats.mean(tmpscores)
```
Four lines of code. I do love scripting languages 😀 Now for the new algorithm:
```python
def newKADAannual(scores):
    """Calculates the shooter's ladder average for the whole year under the
    new KADA algorithm of dropping the lowest and highest cards out of every
    eight shot
    """
    tmpscores = sorted(scores)
    n = len(tmpscores)//8
    if n >= 1:
        tmpscores = tmpscores[n:-n]
    else:
        if len(tmpscores) == 7:
            tmpscores = tmpscores[1:]
    return stats.mean(tmpscores)
```
Again, very compact with only eight lines of code (and really, it’s hard to count some of them as actual lines as such 😀 ). There are two other variants of these for calculating the running averages rather than the end-of-year averages, but let’s ignore them. They’re equally short.
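To sanity-check the two annual algorithms side by side, here they are in Python 3 with the standard library’s `statistics.mean` swapped in for `stats.mean`; the snake_case names and the eight sample cards are made up for illustration.

```python
from statistics import mean


def old_kada_annual(scores):
    """Drop the lowest card out of every six shot, then average the rest."""
    tmp = sorted(scores)
    n = len(tmp) // 6
    return mean(tmp[n:])


def new_kada_annual(scores):
    """Drop the lowest and highest card out of every eight shot, then average.
    With exactly seven cards only the lowest is dropped, as in the original."""
    tmp = sorted(scores)
    n = len(tmp) // 8
    if n >= 1:
        tmp = tmp[n:-n]
    elif len(tmp) == 7:
        tmp = tmp[1:]
    return mean(tmp)


cards = [80, 85, 90, 70, 95, 88, 60, 92]  # hypothetical season of eight cards
print(round(old_kada_annual(cards), 3))
print(round(new_kada_annual(cards), 3))
```

Worked through by hand: sorted, the cards are 60, 70, 80, 85, 88, 90, 92, 95. The old rule drops 8//6 = 1 card (the 60), leaving a total of 600 over 7 cards, about 85.714. The new rule drops 8//8 = 1 card from each end (the 60 and the 95), leaving 505 over 6 cards, about 84.167. So for the same shooter the new rule also trims a lucky high card.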
Calculating the ladders themselves is even shorter, at least on a per-ladder basis. There are 16 ladders all told: Target/Air x Novice/Experienced/Advanced/Overall x Running/Final. Here’s the code for just two of them (the others are basically more of the same):
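```python
def calculateLadders():
    # Ladders are module-level dicts mapping ladder average -> member number,
    # e.g. NAL = {} (Novice Air Ladder) and NALannual = {} (its final version).
    # m.experience holds one letter per discipline: [Target, Air, Sporter],
    # so index 1 is the shooter's air rifle experience level.

    # Novice Air Ladder (running)
    for id, m in member.iteritems():
        if hasattr(m, 'scores_air'):
            if m.experience[1] == 'N':
                NAL[m.newKADA_air] = id

    # Final Novice Air Ladder: only shooters with at least 3 cards are entered
    for id, m in member.iteritems():
        if hasattr(m, 'scores_air'):
            if m.experience[1] == 'N':
                if len(m.scores_air) >= 3:
                    NALannual[m.oldKADAannual_air] = id
```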
Very straightforward really: the ladders are just dictionaries of member id numbers, keyed on the ladder averages themselves. To print one out, you sort the keys from highest to lowest, take each key in turn and look up its id, and that id is the key to your Shooter object with all the data to play with. So that’s the calculation done. Now, the output.
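One thing worth flagging about keying a ladder on the average: if two shooters end up with exactly the same average (quite possible with a handful of integer cards), the second one silently overwrites the first in the dictionary. A minimal Python 3 sketch of an alternative, swapped-in layout keeps a list of (member id, average) pairs sorted best first; `build_ladder` and `dict_keyed_ladder` are hypothetical names, and breaking ties by member id is an assumption rather than a club rule.

```python
from typing import Dict, List, Tuple


def build_ladder(averages: Dict[int, float]) -> List[Tuple[int, float]]:
    """Return (member id, average) pairs, best average first.

    Shooters with equal averages are all kept, and ties are broken by
    member id so the order is repeatable from one run to the next.
    """
    return sorted(averages.items(), key=lambda item: (-item[1], item[0]))


def dict_keyed_ladder(averages: Dict[int, float]) -> Dict[float, int]:
    """The average -> id layout, for comparison: equal averages collide."""
    ladder = {}
    for member_id, average in averages.items():
        ladder[average] = member_id  # a later shooter with the same average wins
    return ladder


averages = {1: 85.5, 2: 90.0, 3: 85.5}
print(build_ladder(averages))             # all three shooters survive
print(len(dict_keyed_ladder(averages)))   # member 1 has been overwritten
```

In the script’s terms, the input would be built with something like `{id: m.newKADA_air for id, m in member.items() if hasattr(m, 'scores_air') and m.experience[1] == 'N'}`.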
Initially, I wanted some very basic output of the ladders, more for testing than anything else. Kada just creates plain ASCII text files, which we dump to the printer, like so:
```
NOVICE 10-METRE AIR RIFLE LADDER
Final Ladder 9/05/08

                       CARDS             BEST
RANK  NAME              SHOT   AVERAGE   CARD
 1.   J.D'Plumber         26    86.864     92
 2.   J.D'Plumber         45    86.500     94
 3.   J.D'Plumber         18    85.067     92
 4.   J.D'Plumber         16    84.857     91
 5.   J.D'Plumber         36    81.300     91
 6.   J.D'Plumber         31    79.846     87
 7.   J.D'Plumber          7    74.000     82
 8.   J.D'Plumber         16    65.429     84
 9.   J.D'Plumber          3    64.000     71
10.   J.D'Plumber         10    63.444     78
11.   J.D'Plumber          4    61.750     76
12.   J.D'Plumber          3    53.667     62
```
So I set out to do something similar. It’s not a perfect match, but I just wanted to be able to compare the old output with mine to check that my math was right:
```python
def printLadder(ladder, description, discipline):
    # Highest average first
    keys = sorted(ladder.keys(), reverse=True)
    i = 0  # rank counter; only shooters with 3+ cards get a rank
    print description
    # Underline the title with a row of dashes
    for j in range(1, len(description)):
        print '-',
    print '\n'
    for key in keys:
        shooter = member[ladder[key]]
        if discipline == 'air':
            scores = shooter.scores_air
        else:
            scores = shooter.scores_target
        if len(scores) >= 3:
            i = i + 1
            print '%2d ' % i,
        else:
            print '   ',  # blank rank column for unofficial entries
        print '%20s ' % shooter.name,
        print '%4d ' % len(scores),
        print '%5.3f ' % key,
        print '%4d ' % max(scores),
        if len(scores) < 3:
            print '*',  # fewer than 3 cards: not yet officially entered
        print
    print '\n\n'
```
And the output (with the names redacted for their privacy):
```
Novice Air Ladder
- - - - - - - - - - - - - - - -

 1         Joe D'Plumber   45 91.833   94
 2         Joe D'Plumber   36 87.500   91
 3         Joe D'Plumber   18 87.167   92
 4         Joe D'Plumber   16 85.833   91
 5         Joe D'Plumber   26 85.167   92
 6         Joe D'Plumber   31 81.167   87
          Tito D'Builder    2 76.000   76 *
 7         Joe D'Plumber    7 74.000   82
          Tito D'Builder    2 69.000   75 *
          Tito D'Builder    2 67.000   68 *
          Tito D'Builder    2 66.000   70 *
          Tito D'Builder    1 64.000   64 *
 8         Joe D'Plumber   10 63.167   78
          Tito D'Builder    1 62.000   62 *
 9         Joe D'Plumber    4 61.750   76
          Tito D'Builder    2 61.500   72 *
          Tito D'Builder    1 60.000   60 *
          Tito D'Builder    2 55.500   71 *
          Tito D'Builder    2 55.000   58 *
10         Joe D'Plumber    3 53.667   62
          Tito D'Builder    1 52.000   52 *
          Tito D'Builder    1 50.000   50 *
          Tito D'Builder    1 49.000   49 *
```
Which is unfancy and basic, but does the job. You’ll note the lists aren’t identical: the Kada output has only Joe D’Plumber, but mine has several entries from Tito D’Builder as well (thank the Daily Show for the names, btw). The reason is that the official ladder rules say you must have shot at least 3 cards to be entered. Tito hasn’t shot three cards yet, so he’s listed on the chart (which is posted each week) so he can see where he stands relative to those who are entered, but with no rank and an asterisk to show that his score isn’t official yet.
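The three-card rule boils down to one counter that only advances for official entries. Here is a minimal Python 3 sketch of that logic, returning lines instead of printing so the layout is easy to test; `format_ladder`, `MIN_CARDS` and the (name, scores, average) row shape are assumptions for illustration, not the script’s actual interface.

```python
from typing import List, Sequence, Tuple

MIN_CARDS = 3  # cards needed to be officially entered on the ladder


def format_ladder(title: str, rows: Sequence[Tuple[str, List[int], float]]) -> List[str]:
    """rows are (name, scores, average) tuples, already sorted best first.

    Shooters below MIN_CARDS are listed with a blank rank and a trailing
    asterisk, and do not use up a rank number.
    """
    lines = [title, "-" * len(title)]
    rank = 0
    for name, scores, average in rows:
        official = len(scores) >= MIN_CARDS
        if official:
            rank += 1
            rank_col = f"{rank:2d}"
        else:
            rank_col = "  "
        line = f"{rank_col} {name:>20} {len(scores):4d} {average:7.3f} {max(scores):4d}"
        if not official:
            line += " *"
        lines.append(line)
    return lines


rows = [("Joe", [90, 92, 94], 92.0), ("Tito", [76, 76], 76.0), ("Ann", [80, 82, 84], 82.0)]
print("\n".join(format_ladder("Novice Air Ladder", rows)))
```

Here Tito sits between Joe (rank 1) and Ann (rank 2) on the printed chart without taking a rank himself, which is the behaviour described above.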
This isn’t bad, but I wanted to be a bit fancier with the output. Enter the ReportLab library, which lets Python generate PDFs, plus some simple code to generate sparklines. Now the ladder prints to a PDF file, Tito is in a smaller, italic, gray font so that Joe stands out more, and Joe gets a sparkline of his scores showing his (hopefully) upward progress through the year, with his high point highlighted. The PDF isn’t quite how I’d like it yet, but I’ll post a snapshot when I get it right.
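The sparkline code itself isn’t shown here, so the following is not it. It is a minimal Python 3 sketch of the part that doesn’t depend on any PDF library: scaling a shooter’s scores into a small box and finding which card to highlight. `sparkline_points`, the default 60-by-12 box and highlighting the first occurrence of the best card are all assumptions; the resulting point list is what you would hand to a line-drawing primitive (ReportLab’s `reportlab.graphics` package provides drawing shapes) to render.

```python
from typing import List, Optional, Tuple


def sparkline_points(
    scores: List[int], width: float = 60.0, height: float = 12.0
) -> Tuple[List[Tuple[float, float]], Optional[int]]:
    """Map scores onto (x, y) points inside a width x height box.

    x is spaced evenly from 0 to width across the cards; y is scaled so the
    lowest score sits at 0 and the highest at height. Also returns the index
    of the best card (first occurrence) so it can be highlighted.
    """
    if not scores:
        return [], None
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1  # avoid dividing by zero when every score is equal
    step = width / (len(scores) - 1) if len(scores) > 1 else 0.0
    points = [(i * step, (s - lo) / span * height) for i, s in enumerate(scores)]
    return points, scores.index(hi)


points, best = sparkline_points([80, 90, 85], width=10, height=10)
print(points, best)
```

Normalising to the shooter’s own minimum and maximum (rather than to a fixed 0 to 100 range) makes small improvements visible, at the cost of making different shooters’ sparklines not directly comparable.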
The important thing about all this, though, is that I’d never done any Python programming before last week. It’s taken the very limited free time I’ve had over five days (while under the gun for a major demo during the day) to learn Python and do something useful and a bit complicated with it (generating a PDF report with custom graphics and typography). All told, maybe eight to ten hours. I’m really quite impressed with Python. I didn’t think much of it before now because, frankly, I had hassles getting the significant-indentation thing to sit right in my head, but the truth is that once you’ve started (and gotten frustrated, cursed a bit, googled vim customisations for Python, installed the main two or three and restarted), it really does just go away and you see through it to the code itself, which is clean, powerful, and very easy to read. So far I’ve not run into any real warts, though I will admit to having used an import inside a function, because I couldn’t figure out how to use Python’s namespacing to instantiate an Image object from the Python Imaging Library in the same script where I was using Image objects from the ReportLab library. And even the stranger idioms like ''.join() aren’t horrible. It’s quite a relief really, because I was planning on using SAGE in my PhD work, and this suggests it’ll be a lot easier than I was anticipating.
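The usual fix for two libraries exporting the same name is to rename one (or both) at import time with `as`. With PIL and ReportLab installed, that would look like `from PIL import Image as PILImage` alongside `from reportlab.platypus import Image as RLImage`. Since those are third-party packages, this minimal Python 3 sketch demonstrates the same technique on a standard-library clash instead: the `time` module versus `datetime`’s `time` class. `describe` is a hypothetical helper.

```python
import time                               # the time module (clocks, sleep, ...)
from datetime import time as clock_time  # datetime.time, renamed so it can't shadow the module


def describe():
    """Use both 'time' names side by side without a function-scoped import."""
    t = clock_time(9, 30)       # datetime.time: a time of day
    elapsed = time.monotonic()  # the time module is still reachable
    return t.isoformat(), isinstance(elapsed, float)


print(describe())
```

Renaming at import keeps every import at the top of the module, so readers can see at a glance which `Image` (or `time`) a given line refers to.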
Next up, once this is done and turned into an executable with py2exe, will be PyQt. I’ve bought this new toy for the RCMS project, you see… but that’s another post 😀