Project Aardvark

Intern vs. Compiler

Tuesday, May 31, 2005 posted by Benjamin Pollack

OK, in Tyler’s last post, he hinted that there were some segfaults involved in getting a certain GPL program to run properly in VS.NET 2003. I want to elaborate on what exactly happened, because it happened long enough ago now to be funny instead of humiliating. Note that this is going to get a bit technical and I will probably come off looking like an idiot, in which case all I can say is a) sorry and b) I’m still learning Windows and Visual Studio, so cut me some slack. :)

On Friday afternoon, having pushed as far ahead with the reflector as I could until some of the other stuff that needs it was actually running, I downloaded the GPL program we’ll be extending, discovered that it had Visual C++ project files, and opened those in VS.NET. I checked out the README, which said things ought to build just fine on Visual Studio .NET 2003, so I thought it would be a piece of cake to get running. Bzzzt! I hit F5, the build started, and then it died with my least favorite error message in the entire world:

“Internal Compiler Error”

Just in case you have never had the pleasure of getting this error message (and given that we’re currently getting a lot of Joel on Software readers here, that’s probably unlikely), it means that the compiler not only cannot handle whatever code you gave it, but cannot even give you a meaningful error message, because the compiler itself has a bug. Worse, I didn’t get this error in just one lone file; I got it in almost a hundred different files in the project. So, I began spelunking around the source code. After about 45 minutes of tweaking code, sometimes getting further and other times segfaulting the compiler rather spectacularly, I finally realized that all of the errors were coming from C++ header files. On a hunch, I went to the build settings and disabled precompiled headers. I started the build cycle again, and, about 25 seconds later (I love these computers), had...missing symbol definitions. Well, fine, but at least missing symbol definitions I can handle. A quick check revealed that the missing symbol definitions happened because the projects were simply missing a dozen or so .cpp files. I quickly threw those into the project and rebuilt, and then got something that genuinely did not make any sense to me.

“Symbol deflateEnd@4 not found.”

The symbol I recognized; the project has a dependency on zlib, and deflateEnd looked like (and is) part of zlib’s API. Yet I could trivially verify that zlib was building properly and getting linked properly. The problem, I decided, was that deflateEnd@4 was clearly a mangled name (i.e., the symbol name the linker was looking for did not match the function’s actual name), and the only time I’d seen names mangled like that was in C++ projects, so VS.NET must be compiling zlib as C while the main program thought it was C++. (Experienced Windows programmers are now laughing maniacally; the rest of you can just follow along.) So I spent another 30 minutes trying to figure out why the extern “C” declaration in zlib’s header file wasn’t taking hold, until I finally gave up and asked for Joel’s help. Joel came over, identified that this was not C++ name mangling but rather Windows calling-convention decoration, and quickly generated a nice printout of the symbol table of zlib.lib, where we verified that there were (surprise) only plain, undecorated C names. Thankfully, I now had enough information to fix the problem. I changed zlib to be compiled with __fastcall, then (when that failed) with __stdcall, and finally, finally had something that compiled.
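For readers who, like me, had never met this kind of decoration: on 32-bit x86, Microsoft’s compiler decorates C (not C++) function names according to the calling convention, and the number after the @ is the number of bytes of arguments. deflateEnd takes a single pointer, which is 4 bytes on x86, hence @4. The sketch below is a hypothetical helper (classify_decorated_name is my own name, not part of any Microsoft tool) that guesses the convention from a symbol, assuming the documented x86 scheme: __cdecl gets a leading underscore, __stdcall gets a leading underscore plus @bytes, __fastcall gets a leading @ plus @bytes, and __vectorcall gets @@bytes. It deliberately ignores x64, where most of this decoration is not applied, and it does not decode C++ names.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper: a simplified reading of MSVC's 32-bit x86 C name decoration.
# Not a full undecorator; C++ names (which start with "?") are only flagged.
_IDENT = r"([A-Za-z_]\w*)"


def classify_decorated_name(symbol: str) -> Tuple[str, str, Optional[int]]:
    """Return (calling_convention, plain_name, argument_bytes) for an x86 C symbol."""
    if symbol.startswith("?"):
        return ("c++", symbol, None)  # C++ mangled name; not decoded here
    # __fastcall: @name@bytes
    m = re.fullmatch("@" + _IDENT + r"@(\d+)", symbol)
    if m:
        return ("__fastcall", m.group(1), int(m.group(2)))
    # __vectorcall: name@@bytes
    m = re.fullmatch(_IDENT + r"@@(\d+)", symbol)
    if m:
        return ("__vectorcall", m.group(1), int(m.group(2)))
    # __stdcall: _name@bytes (checked before __cdecl because both start with "_")
    m = re.fullmatch("_" + _IDENT + r"@(\d+)", symbol)
    if m:
        return ("__stdcall", m.group(1), int(m.group(2)))
    # __cdecl: _name
    m = re.fullmatch("_" + _IDENT, symbol)
    if m:
        return ("__cdecl", m.group(1), None)
    return ("unknown", symbol, None)


if __name__ == "__main__":
    for sym in ("_deflateEnd", "_deflateEnd@4", "@deflateEnd@4"):
        print(sym, classify_decorated_name(sym))
```

The point it illustrates is the one Joel spotted: a caller compiled to expect _deflateEnd@4 will never link against a library that only exports _deflateEnd, even though both refer to the same C function, so the header and the library have to agree on the calling convention.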

Well, almost. Turned out I needed to link against some parts of Windows that the project didn’t specify. But after that it compiled. Cleanly.

It just segfaulted immediately on launch.

At that point, I was too worn out to continue and decided to just leave the whole mess for later. I checked it into our Subversion repository with a note that it crashed instantly on startup and went home for the weekend.

So today, while we were trying to get VS.NET and friends installed on Michael and Tyler’s computers, Michael very kindly came over and started attacking the program with fresh eyes. He succeeded in recreating the full call stack at the time of the crash (which was actually surprisingly difficult due to threading issues), and with that boost I was able to determine that the crash was happening because an instance variable was getting trampled, which in turn was happening because a string copy seemed to be overwriting most of a class’s data.

So Michael and I are sitting there, hunched over the monitor, trying and failing with idea after idea, when Tyler wanders over and asks, “Is this the same as what’s in Subversion?” I tell him yes. He goes over, opens the project, builds it, and hits the Debug button.

And it runs.

Perfectly.

We still have absolutely no idea what the problem was. I ended up blasting my entire source tree and checking out again from our repository, which ought to have been exactly the same broken code that I was working with before (I’m the one who checked it in, after all), but which now inexplicably works just fine. So now I’m finally able to hack on actually extending and modifying this stuff instead of waging a war with the compiler. I just feel annoyed, because I wasted several hours and I caused Michael to waste several hours when we could simply have switched to another machine. It wasn’t a complete waste of time—my initial project fixes were necessary (I double-checked), and I learned an awful lot about the VS.NET debugger—but I still feel like I could have gotten a lot more done if I’d just tried things on a different computer.

There’s probably a lesson to be learned here, but I’m not sure what it is, so instead I’ll just close by promising to keep you up-to-date if I discover my machine is possessed by demons. If it’s not, of course, you’ll have to make do with normal Project Aardvark updates. But here’s to hoping.