(I'm told guion bajo is the preferred name for the underscore sign _ in Castilian, but that would have made it harder to echo Por una cabeza. Then again, why the Spanish title? Because.)

(Also, this is going to be a very boring post, because it's mostly just a rant to let off some steam after a frustrating debug session.)


I'm getting into the bad habit of not trusting the compiler, especially when it comes to one specific compiler [1]. I'm not sure there's a particular reason for that, other than (possibly) a dislike for its closed nature, or past unpleasant experiences in trying to make it work with more recent versions of the host compiler(s).

Compilers have progressed enormously in recent years. I strongly suspect that this is largely thanks to the (re)surgence of the Clang/LLVM family and the strong pressure it has put on the GCC developers, with consequent significant improvements on both sides.

However, compilers that need to interact with these host compilers (most famously the nvcc compiler developed by NVIDIA for CUDA) have a tendency to lag behind: you can't always use the latest version of GCC (or Clang, for that matter) with them, and they themselves do not provide many of the benefits that developers have come to expect from modern compilers, especially in the quality and detail of error and warning messages, or even in the nature of those same warnings and errors.


This rant is born out of a stressful and frustrating debugging session that lasted a few days, and that could easily have been avoided with better tools. What made the bug particularly frustrating was that it seemed to trigger or disappear in the most incoherent of circumstances. Adding some conditional code (even code that would never run) or moving code around in supposedly behavior-preserving transformations would be enough to make it appear, or disappear again, until the program was recompiled.

The most frustrating part was that, when the code seemed to work, it would seem to work correctly (or at least give credible results). When it seemed to not work, it would simply produce invalid values from thin air.

The symptoms, for anyone with some experience in the field, would be obvious: reading from uninitialized memory, even if for some magic reason it seemed to work (when it worked) despite the massively parallel nature of the code and the hundreds of thousands of cycles it ran for.

The code in question is something like this:

struct A : B, C, D
{
    float4 relPos;
    float r;
    float mass;
    float f;
/* etc */
    A(params_t const& params, pdata_t const& pdata,
      const int index_, float4 const& relPos_, const float r_)
    :
        B(index_, params),
        C(index_, pdata, params),
        D(r, params),
        relPos(relPos_),
        r(r_),
        mass(relPos.w),
        f(func(r, params))
    {}
};

Can you spot what's wrong with the code?

Spoiler Alert!

Here's the correct version of the code:

struct A : B, C, D
{
    float4 relPos;
    float r;
    float mass;
    float f;
/* etc */
    A(params_t const& params, pdata_t const& pdata,
      const int index_, float4 const& relPos_, const float r_)
    :
        B(index_, params),
        C(index_, pdata, params),
        D(r_, params),
        relPos(relPos_),
        r(r_),
        mass(relPos.w),
        f(func(r, params))
    {}
};

The only difference, in case you're having trouble noticing, is that D is being initialized using r_ instead of r.

What's the difference? The object we're talking about, and initialization order. r is the member of our structure, r_ is the parameter we're passing to the constructor to initialize it. After the structure initialization is complete, they will hold the same value, but until r gets initialized (with the value r_), its value is indeterminate, and using it (instead of r_) leads to undefined behavior. And D gets initialized before r, because it's one of the base structures of the structure we want to initialize. Note that this would happen even if we wrote the initialization of r before the initialization of D: initialization actually happens in the order in which the bases and members are declared (bases first, then members), not in the order in which their initializers are written.
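To make the ordering rule concrete, here is a minimal sketch in plain C++ (no CUDA needed). All names in it (Base, Holder, record, initialization_order) are made up for illustration and have nothing to do with the real code. The initializer list is deliberately written "backwards", and a small log records the order in which things actually get initialized.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: remembers the order in which initializers actually run.
static std::vector<std::string> init_log;

static float record(const char* what, float value)
{
    init_log.push_back(what);
    return value;
}

// Hypothetical base structure, playing the role of D in the post.
struct Base
{
    float seen;
    explicit Base(float x) : seen(record("Base", x)) {}
};

// Hypothetical derived structure with two members, declared as first, second.
struct Holder : Base
{
    float first;
    float second;

    // The initializers are written in reverse on purpose: the language
    // ignores the written order. GCC and Clang can report the mismatch
    // with -Wreorder.
    explicit Holder(float v)
    :
        second(record("second", v)),
        first(record("first", v)),
        Base(v)
    {}
};

// Builds one Holder and returns the order in which its parts were initialized.
std::vector<std::string> initialization_order()
{
    init_log.clear();
    Holder h(1.0f);
    (void)h; // only the side effects on the log matter here
    return init_log;
}
```

Calling initialization_order() yields Base, first, second, in that order: the base comes first, then the members in declaration order, no matter how the initializer list is written. This is why D(r, params) in the broken version reads r before r has been given a value.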

That single _ made me waste at least two days of work.

Now, this error is my fault —it's undoubtedly my fault, it's a clear example of PEBKAC. And yet, proper tooling would have caught it for me, and made it easier to debug.


  1. If you want to know, I'm talking about the nvcc compiler, i.e. the compiler that handles the single-source CUDA files for GPU programming.