Okay, so I can’t really go into too much of the big picture since this is from the day job, but I can certainly tear into gethostbyname_r() a bit.
For part of what we’re doing, we’re sending a RADIUS message using the FreeRADIUS project’s RADIUS client library. So, nice and simple (after you’ve done some setup):
result = rc_acct(rh, 0, send);
Easy enough, right? So it fails. Specifically, it segfaults, and since it’s in a multithreaded server, it’s a pain to track down. And I mean a pain. Hours of fun with DDD, gdb, nana and finally printf() led to here:
res = gethostbyname_r(hostname, &hostbuf, tmphostbuf, hostbuflen, &hp, &herr);
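For reference, here is roughly what a careful caller of glibc’s six-argument gethostbyname_r() looks like. glibc returns ERANGE when the caller’s scratch buffer is too small, and the call is supposed to be retried with a bigger one. This is a minimal sketch assuming glibc; the helper name resolve_r and the 1 KB starting size are my own (hypothetical) choices, not code from the RADIUS client library, and on its own it doesn’t cure the crash described here.

```c
#define _GNU_SOURCE /* glibc only declares gethostbyname_r() with the default/GNU feature macros */
#include <errno.h>
#include <netdb.h>
#include <stdlib.h>

/* Hypothetical helper: resolve `name` with glibc's gethostbyname_r(),
 * doubling the scratch buffer each time glibc reports ERANGE.
 * On success returns the hostent, whose pointers refer into *buf_out
 * (the caller frees *buf_out). On failure returns NULL, sets *buf_out
 * to NULL, and leaves an h_errno-style code in *herr. */
struct hostent *resolve_r(const char *name, struct hostent *hostbuf,
                          char **buf_out, int *herr)
{
    size_t len = 1024;          /* assumed starting size */
    char *buf = NULL;
    struct hostent *hp = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, len);
        if (tmp == NULL) {      /* out of memory: give up cleanly */
            free(buf);
            *buf_out = NULL;
            *herr = NO_RECOVERY;
            return NULL;
        }
        buf = tmp;
        rc = gethostbyname_r(name, hostbuf, buf, len, &hp, herr);
        if (rc != ERANGE)
            break;
        len *= 2;               /* buffer too small: double it and retry */
    }

    /* glibc returns 0 with hp == NULL for "not found", so check both. */
    if (rc != 0 || hp == NULL) {
        free(buf);
        *buf_out = NULL;
        return NULL;
    }
    *buf_out = buf;
    return hp;
}
```

The returned hostent points into the buffer handed back through buf_out, so free that only once you’re finished with the addresses.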
Ah. gethostbyname_r(), the reentrant, thread-safe glibc2 version of gethostbyname(). Except that it’s deprecated and, being a non-standard extension, has the unique property of working differently on just about every machine out there.
And of course, on my machine and on the server, it’s going nuts. Not because of a dodgy parameter or anything like that (though that took a while to confirm), but because of the way /etc/nsswitch.conf is written:
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
That’s the standard hosts line in Ubuntu’s nsswitch.conf: it first checks the /etc/hosts file, then asks the Avahi daemon (mDNS), then the DNS system.
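To make that hosts line concrete, here is a toy model of how the lookup walks it. Per nsswitch.conf(5), each source’s default is to return on SUCCESS and continue on anything else; [NOTFOUND=return] makes a NOTFOUND from mdns4_minimal end the lookup. The stub "modules" and their answers are made up for illustration (in particular, mdns4_minimal answering UNAVAIL for non-.local names is an assumption about nss-mdns), and the model says nothing about why gethostbyname_r() crashes; it only shows the order and the early exit.

```c
#include <stddef.h>
#include <string.h>

/* The four lookup statuses nsswitch.conf(5) lets you attach actions to. */
enum status { ST_SUCCESS, ST_NOTFOUND, ST_UNAVAIL, ST_TRYAGAIN };

/* One entry on the hosts: line: a stubbed lookup plus, per status,
 * whether the walk stops there (1 = return) or moves on (0 = continue). */
struct source {
    const char *name;
    enum status (*lookup)(const char *host);
    int stop_on[4];
};

static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Hypothetical stubs standing in for the real NSS modules. */
static enum status files_stub(const char *h)
{
    return strcmp(h, "localhost") == 0 ? ST_SUCCESS : ST_NOTFOUND;
}
static enum status mdns4_minimal_stub(const char *h)
{
    /* Assumed: only handles .local names, and nothing on the LAN answers. */
    return ends_with(h, ".local") ? ST_NOTFOUND : ST_UNAVAIL;
}
static enum status dns_stub(const char *h)
{
    return strcmp(h, "example.com") == 0 ? ST_SUCCESS : ST_NOTFOUND;
}
static enum status mdns4_stub(const char *h)
{
    (void)h;
    return ST_NOTFOUND;
}

/* hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4 */
static const struct source ubuntu_hosts[] = {
    { "files",         files_stub,         { 1, 0, 0, 0 } },
    { "mdns4_minimal", mdns4_minimal_stub, { 1, 1, 0, 0 } }, /* [NOTFOUND=return] */
    { "dns",           dns_stub,           { 1, 0, 0, 0 } },
    { "mdns4",         mdns4_stub,         { 1, 0, 0, 0 } },
};

/* Walk the sources left to right. Returns the index of the source that
 * ended the walk, or -1 if every source said continue; *last receives
 * the status of the last source consulted. */
int walk_hosts(const struct source *src, size_t n, const char *host,
               enum status *last)
{
    for (size_t i = 0; i < n; i++) {
        *last = src[i].lookup(host);
        if (src[i].stop_on[*last])
            return (int)i;
    }
    return -1;
}
```

With these stubs, "printer.local" stops at mdns4_minimal and never reaches DNS, while "example.com" falls through to the dns source.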
Only that’s not good enough for gethostbyname_r(). Grrr. So after a full day of bug-chasing through two codebases, the fix is to change a configuration file to this:
#hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
hosts: dns files
What a waste of a day! Gah!
I’m going to rewrite the RADIUS client library to use getaddrinfo() over the next while. I already have to make some changes to it to cope with other things, so I may as well fix this too, I guess. But for today, I console myself by printing out that page of source code and reaching for the matches…
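Something along these lines is what a getaddrinfo() replacement might look like: no caller-sized scratch buffer to get wrong, one standard signature everywhere, and errors reported as EAI_* codes you can hand to gai_strerror(). It’s a sketch only; the helper name resolve_ipv4 is hypothetical, and restricting it to IPv4 over UDP is an assumption meant to mirror what the old gethostbyname_r() path returned for a RADIUS server.

```c
#define _GNU_SOURCE
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve `host` to its first IPv4 address (network
 * byte order) with getaddrinfo(). Returns 0 on success, or the EAI_* code
 * from getaddrinfo() on failure (pass it to gai_strerror() for a message). */
int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;      /* assumption: IPv4 only, like the old path */
    hints.ai_socktype = SOCK_DGRAM; /* RADIUS runs over UDP */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* With ai_family = AF_INET, every result is a sockaddr_in. */
    *out = ((const struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return 0;
}
```

Supporting IPv6 later would mostly mean dropping the AF_INET hint and walking the res->ai_next list instead of taking the first entry.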