25.8.13

Type tagging and SFINAE in C++

While it may sound like an onomatopoeia for somebody sneezing, SFINAE is a C++ idiom, standing for 'Substitution Failure Is Not An Error'. The idea is that when the compiler is choosing between several candidate templates, any candidate whose template argument substitution would produce an invalid type is quietly dropped from consideration. As long as one valid candidate remains, that candidate will be used. In other words, the fact that some substitutions may fail is not enough to cause a compilation error. A quick example to set the scene: let's assume we have declared the following template, which expects to operate on types which contain an embedded type called SomeType:
template<typename T>struct Example {

    typename T::SomeType t;
};

struct Ok { typedef int SomeType; };

Example<Ok> ok; // perfectly fine
It should be fairly uncontroversial to point out that this is not going to work with native types, such as int:
Example<int> i; // not so fine
But if we were to provide an explicit specialization for int, then the presence of the default template isn't going to interfere with the use of the specialized version:
template<>struct Example<int> {
    int t;
};
Example<int> i; // fine now, default Example template is no longer considered.
This selection process can be used to choose programmatically between different template instantiations, based on the presence or absence of an embedded type (i.e. a tag type) in a type declaration. The syntax used to define these kinds of template mechanisms can often be somewhat opaque, so I devised a mechanism which conveniently wraps the type detection mechanism into a single macro, called TYPE_CHECK(). An example usage would be something like this:
TYPE_CHECK(Test1Check, T, TypeToCheck,
    static const bool VALUE = true, // Test1Check body when T::TypeToCheck exists
    static const bool VALUE = false); // Test1Check body default
This defines a template type called Test1Check<T>, containing a boolean constant VALUE which is true for any T where T::TypeToCheck exists, or false if it doesn't, so, in the following example, we would see output of "0, 1" from printf():
struct TestingF {};
struct TestingT { typedef void TypeToCheck; };

printf("%d, %d\n",  // prints "0, 1"
    Test1Check<TestingF>::VALUE,
    Test1Check<TestingT>::VALUE);
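For anyone curious what a check like this might look like without the macro, here is a minimal hand-written sketch. It is not the contents of type_check.h (which may well be implemented differently), and the name HasTypeToCheck is made up for illustration. It uses the classic pair of overloads whose return types have different sizes: the first overload only survives substitution when U::TypeToCheck names a type.

```cpp
#include <cstdio>

// Hypothetical stand-in for a TYPE_CHECK()-style detector; the real
// type_check.h may work quite differently.
template<typename T>
struct HasTypeToCheck {
    typedef char Yes;
    struct No { char pad[2]; };

    // Only viable when U::TypeToCheck names a type. Otherwise substitution
    // fails and this overload is silently discarded (that's the SFINAE part).
    template<typename U> static Yes test(typename U::TypeToCheck *);

    // Fallback: accepts anything, but is the worst possible match.
    template<typename U> static No test(...);

    // sizeof inspects the chosen overload's return type; nothing is called.
    static const bool VALUE = sizeof(test<T>(0)) == sizeof(Yes);
};

struct TestingF {};
struct TestingT { typedef void TypeToCheck; };

// Prints "0, 1"
void print_results() {
    std::printf("%d, %d\n",
        HasTypeToCheck<TestingF>::VALUE,
        HasTypeToCheck<TestingT>::VALUE);
}
```

One limitation of this particular sketch: because it forms a pointer to the embedded type, it would report false if TypeToCheck were a reference type.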
TYPE_CHECK() takes 5 arguments: the first and second are the name of the check type (Test1Check), and the type parameter (usually but not necessarily T). The macro will expand into a template struct definition (template<typename T>struct Test1Check { /*...*/ }; in this case). The third parameter is the name of the type we want to test for (i.e. the presence or absence of T::TypeToCheck), and the fourth and fifth parameters represent the body of this struct (the /*...*/ part) if the test type is present, or the default in the case it's not present.
We could rewrite our initial Example given above as follows, although it should now work for any type without an embedded T::SomeType, and not just int:
TYPE_CHECK(Example, T, SomeType,
    typename T::SomeType t,
    T t);
You can also use TYPE_CHECK() to embed functions into the check type, so that your program can operate differently depending on whether the test type is present or not. You can use this to implement some fairly primitive compile-time reflection mechanisms.
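As a rough illustration of the 'different body per case' idea, here is a sketch written by hand rather than with the macro. The names HasSomeType, Describe, kind() and Stored are all hypothetical, and partial specialization on a bool is just one way to get the effect; I'm not claiming it is how TYPE_CHECK() does it.

```cpp
#include <cassert>
#include <cstring>

// Hand-written detector (hypothetical name), standing in for a
// TYPE_CHECK()-generated one.
template<typename T>
struct HasSomeType {
    typedef char Yes;
    struct No { char pad[2]; };
    template<typename U> static Yes test(typename U::SomeType *);
    template<typename U> static No test(...);
    static const bool VALUE = sizeof(test<T>(0)) == sizeof(Yes);
};

// Primary template: the "default" body, used when T::SomeType is absent.
template<typename T, bool Present = HasSomeType<T>::VALUE>
struct Describe {
    typedef T Stored;
    static const char *kind() { return "plain"; }
};

// Partial specialization: the body used when T::SomeType exists.
template<typename T>
struct Describe<T, true> {
    typedef typename T::SomeType Stored;
    static const char *kind() { return "tagged"; }
};

struct Ok { typedef int SomeType; };

// Small self-check showing which body gets picked for each type.
void check_describe() {
    assert(std::strcmp(Describe<Ok>::kind(), "tagged") == 0);
    assert(std::strcmp(Describe<double>::kind(), "plain") == 0);
    assert(sizeof(Describe<Ok>::Stored) == sizeof(int));
    assert(sizeof(Describe<double>::Stored) == sizeof(double));
}
```

The caller just writes Describe<T>::kind() and never needs to know which body was selected.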
One additional refinement worth mentioning is that if you have a compiler which supports C99-style variadic macros, it's possible to parenthesize the fourth and fifth arguments, which is occasionally useful if they need to contain commas—an example of this is in the test code provided below.
There's one additional macro called TYPE_CHECK_FRIEND(). It takes the name of a check defined by TYPE_CHECK() and this can be placed inside the body of a type if you want to give the check access to the internals of a type. Again, there's an example of this in the test code.
The TYPE_CHECK() implementation lives in a single header file, nominally called "type_check.h", which can be copied from here. You should be able to just paste it to a local file and start using it. It contains the two macros outlined above, and a few implementation details (anything in the namespace tc_ or starting with a tc_ prefix), which you can ignore. If you're using a compiler which doesn't support variadic macros, you should #define TYPE_CHECK_NO_VA_ARGS before #including it.
A simple 'test suite' can be copied from here, which shows a few different ways that this kind of mechanism can be used. As far as I'm concerned, this code is public domain, so feel free to do whatever you'd like with it.

Addendum 8.3.14:

I just noticed that my source code (which was hosted on hastebin.com) is no longer available there, so I've pushed the files to Dropbox instead, where hopefully they will remain accessible for the foreseeable future. It should make it easier for me to publish updates as well, which is for the best, as running the code through ideone reveals an issue with the TYPE_CHECK_FRIEND() macro in GCC 4.8.1, although 4.3.2 seems happy enough with it.

17.6.13

On casting the result of malloc()

It may be that a good way to drive traffic to your (relatively) new blog would be to find a contentious but ultimately minor technical argument and take sides, and so without further ado:
uint8_t *p = (uint8_t*)malloc(n);
There's a large amount of debate around casting the result of malloc(), and we're going to examine whether or not it's necessary. (The short answer is that it isn't, but we will explore why in more detail.) There are three main scenarios in which a cast of malloc() could be used, either in C or in C++, or in what we'll refer to as "C/C++" (a misguided attempt to write in both languages at once).

In C

This is fairly straightforward: the C standard (as of C89, at least) specifies that a void pointer can be implicitly converted to any other object pointer type, so the cast is unnecessary. Casts are bad, and we shouldn't add unnecessary ones to our code, so we should write the following:
uint8_t *p = malloc(n);
We can go slightly further than this if we want to allocate an instance of a specific type, rather than a buffer of arbitrary size, and we should phrase the call to malloc() thus:
Type *p = malloc(sizeof(*p));
In this case, the compiler can calculate the size we want to allocate for the object from the dereferenced pointer type. Some people would have you phrase that as:
Type *p = (Type*)malloc(sizeof(Type));
Which manages to be ugly, repetitive and fragile, mentioning the name of Type three times, where once would suffice. We should not listen to these people.
Another argument against casting in C is that if you've neglected to #include <stdlib.h>, then you would get a warning about a cast from int to a pointer type. This would be due to the compiler assuming that malloc() returns an int as it hasn't seen a prototype. This is technically true, but I would think that if you've neglected to include system header files, you'd have to be very unlucky if the worst outcome was getting a single warning (i.e. you will most likely have larger problems). And it seems that recent versions of GCC will give you a warning ("incompatible implicit declaration of built-in function ‘malloc’") if <stdlib.h> is missing, whether you cast the result of malloc() or not.

In C++

The argument in C++ is also fairly straightforward: while implicit conversions from void pointers are verboten, there is really no need to use malloc() at all in C++, where new[] exists, and is much more typesafe:
uint8_t *p = new uint8_t[n];
And in the case of allocating an instance of a type we could write something like this (which will also allow you to pass arguments to the Type constructor):
Type *p = new Type(a, b, c);
In some limited circumstances, you may want to allocate memory for an object in an unusual way, but you can still use a placement new on a void pointer in this kind of situation:
void *p = memalign(64, sizeof(Type));
Type *t = new(p) Type(a, b, c); // no casting required
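One thing the snippet above leaves out is the clean-up. Here's a minimal sketch of the full lifecycle, assuming plain std::malloc() as the source of raw memory (standing in for memalign(), which isn't available everywhere) and a toy Type with an instance counter that I've invented for the demonstration:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>      // declares placement new

// Toy type for illustration; 'live' counts currently constructed instances.
struct Type {
    int a, b, c;
    static int live;
    Type(int a_, int b_, int c_) : a(a_), b(b_), c(c_) { ++live; }
    ~Type() { --live; }
};
int Type::live = 0;

int placement_demo() {
    void *p = std::malloc(sizeof(Type));   // raw, suitably aligned storage
    if (!p) return -1;

    Type *t = new(p) Type(1, 2, 3);        // construct in place, no cast needed
    assert(Type::live == 1);
    int sum = t->a + t->b + t->c;

    t->~Type();     // there is no "placement delete" to call: destroy by hand...
    std::free(p);   // ...then release the storage the same way it was obtained
    return sum;
}
```

The explicit destructor call is the price of admission here: using delete on t would be wrong, since the memory didn't come from new.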

In "C/C++"

One remaining argument which might be raised is that you'd like to write code which can be compiled with both a C compiler and a C++ compiler (simultaneously, perhaps?). In this case, people will try to convince you that you'd need to use malloc() for C compatibility, and you'll need to cast its result for C++ compatibility, so in this specific case, you really have no choice but to write:
uint8_t *p = (uint8_t*)malloc(n);
And these people are wrong, for two reasons. Firstly, if I genuinely need code which compiles to both languages, I'm going to use the preprocessor so I can work with the union of the idioms of both languages, rather than the intersection:
#ifdef __cplusplus
#define MY_MALLOC(type_, size_) static_cast<type_*>(malloc(size_)) // ...or even "new type_[size_]"
#else//__cplusplus
#define MY_MALLOC(type_, size_) malloc(size_)
#endif//__cplusplus

//later...
uint8_t *p = MY_MALLOC(uint8_t, n);
But (secondly) there are very few reasons to do this kind of thing anyway - if you have some C code, just compile it with a C compiler and link it against your C++ application, possibly with some judicious use of extern "C" here and there.
So, in summary, there are no situations where it is necessary to cast the result of malloc()—it is at best redundant, and at worst actively detrimental to your code's quality.

27.5.13

Colons in make targets

I learned something interesting about GNU make recently. It's possible to write rules for targets which contain colons (:). This doesn't work very well for filenames, even though Linux/UNIX filesystems could support it in theory; from the evidence on Stack Overflow, it seems to break make's handling of dependencies internally.
But there is one potential situation where the colon could be of use, in pattern rules. Consider the following makefile[1]:
SOME_VAR:=some_value
OTHER_VAR=other_value

all: ; @echo "Just a vanilla rule to show the 'cut & paste'-friendly rule syntax."

show\:%: ; @echo $(@:show:%=%)="$($(@:show:%=%))"
This creates a target pattern show:%, where % operates as a wildcard. Notice that we need to escape the colon in the target's definition, as an unescaped colon would be interpreted as part of the rule's target: deps syntax. However, when it comes to making substitution references, a colon can be used without needing to be escaped, despite being part of the syntax. (In fact, the substitution will actually fail if the colon is escaped in this case—this is probably due to this being a syntactical edge-case.)
The formulation $(@:show:%=%) in the rule's recipe takes the name of the target (e.g. show:something) and strips off the initial show:, leaving the rest of the target name as a result (e.g. something). We can then use this value as we'd use any data in make—in this case, we're using it to show the value of a makefile variable, which could be useful when debugging makefiles, as the examples show:
$ make show:SOME_VAR
SOME_VAR=some_value

$ make show:OTHER_VAR
OTHER_VAR=other_value
So we can see that this show: pattern rule handles both flavours of make variable (i.e. = and :=). It can even be used to inspect some of make's built-in special variables:
$ make show:MAKEFILE_LIST
MAKEFILE_LIST= makefile

$ make show:.FEATURES
.FEATURES=target-specific order-only second-expansion else-if archives jobserver check-symlink

$ make show:.VARIABLES
.VARIABLES=<D ?F DESKTOP_SESSION CWEAVE ?D @D XAUTHORITY GDMSESSION CURDIR SHELL RM CO _ [...]

$ make show:DESKTOP_SESSION
DESKTOP_SESSION=ubuntu
So now we have a fairly natural-looking syntax for building make targets which take a single variable 'parameter'. I can see other uses for this: a rule to write a version number or string into a header file, for example.
There's a minor refactoring which could be made: if the repetition of $(@:show:%=%) in the rule is unacceptably offensive, we can hoist out the substitution logic into its own variable (which needs to be a recursive (=) flavour), although we then have to use $(patsubst) to make the substitution work:
_showtarget=$(patsubst show:%,%,$@)
show\:%: ; @echo $(_showtarget)="$($(_showtarget))"
One final note—when I say make above, I specifically mean GNU make v3.81.
$ make -v
GNU Make 3.81
This trick might work in older versions of GNU make (and it could possibly break in future versions, but hopefully with enough publicity, it won't). I doubt it will work with any other variant of make. (But I believe the first rule of Makefiles, so other versions of make are irrelevant to me.)


[1] Note that I'm using the inline rule syntax to work around the 'tabs' issue: using a semicolon after a rule's target means the recipe does not need to be indented. This should allow you to copy the code out of the blog and have it work as intended when it's pasted into a makefile.

21.5.13

Running with pointers I

(This is the first in an occasional series of explorations of some of the stranger areas of C++ syntax.)

Consider the following code. What does it output? Will it run without crashing?
#include <stdio.h>

struct TypeA {
    TypeA() { printf("::TypeA() {}\n"); }
    ~TypeA() { printf("::~TypeA() {}\n"); }
};

struct TypeB {
    TypeB() { printf("::TypeB() {}\n"); }
    ~TypeB() { printf("::~TypeB() {}\n"); }
};

int main() {
    TypeA *a = new TypeA;
    TypeB *b = new TypeB;

    printf(" %p %p\n", (void*)a, (void*)b);

    delete a, b;
    delete (a, b); // I really want these deleted.

    return 0;
}
You may be relieved to learn that the program does actually release its resources correctly. But the two delete lines should probably be rewritten and the utterly misleading comment removed. Both a and b are deleted once each, although the appearance of the comma operator in the delete expressions introduces some confusion. In the first delete, delete a is evaluated first, then the expression as a whole evaluates to b. On the next line, the expression (a, b) is evaluated, and then the result of that (b) is deleted.
The output you'll see is something like the following (pointer values may settle in transit; dramatization, do not attempt, etc):
$ ./a.out
::TypeA() {}
::TypeB() {}
 0x10a2010 0x10a2030
::~TypeA() {}
::~TypeB() {}
Slightly alarming is that the code compiles without a peep using GCC's default settings (on Ubuntu 12.04.2):
$ g++ main.cpp
[ compiler says nothing... ]
Although we do get some (slightly cryptic) warnings when compiling with -Wall:
$ g++ main.cpp -Wall
main.cpp: In function ‘int main()’:
main.cpp:19:16: warning: right operand of comma operator has no effect [-Wunused-value]
main.cpp:20:16: warning: left operand of comma operator has no effect [-Wunused-value]
So, uh, -Wunused-value is your friend, I guess... (If you compile this on a different compiler/OS, let me know what results you get.)
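If you'd rather not take my word for how those two delete lines parse, here's a small sketch that counts destructor calls after each one. The Counted type and comma_delete_demo() are invented for the demonstration. It will produce the same -Wunused-value warnings under -Wall, which is rather the point.

```cpp
#include <cassert>

// Toy type for illustration: counts how many instances have been destroyed.
struct Counted {
    static int destroyed;
    ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

// Returns 10 * (count after first delete) + (count after second delete).
int comma_delete_demo() {
    Counted *a = new Counted;
    Counted *b = new Counted;

    delete a, b;      // parses as (delete a), b;  so only a is destroyed here
    int after_first = Counted::destroyed;

    delete (a, b);    // (a, b) yields b, so only b is destroyed here
    int after_second = Counted::destroyed;

    return after_first * 10 + after_second;
}
```

The function returns 12: one object destroyed after the first line, two after the second. The version that says what it means is simply delete a; delete b; on separate statements.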