A surprising practical gadget: the finger mouse


A friend of mine has been doing for a while now a weekly reading on YouTube. Sometimes you can clearly see him holding a computer mouse in his hands, whose only purpose is to scroll the reading material.

Myself, I'm a big fan of webcomics, and find myself frequently reading material that is published online in long strip format, where each chapter or episode is a single continuous vertical strip. This format is geared towards “mobile” usage, designed to be viewed on a display in “portrait” orientation, but if you're willing to risk it on your laptop (and don't want to spend the money to get a convertible one that can be transformed in a tablet), you can simply flip the laptop on its size, reading it like a book. The worst downside I've found to this configuration is —possibly suprisingly— the input mechanism.

The solution to my long strip webcoming fruition issues and my friend's reading is the same: something that allow scrolling documents on the computer without the full encumbrance of a traditional mouse.

Enter the finger mouse

The finger mouse, or ring mouse, is an input device that is tied to a finger and typically operated with the other fingers (usually the thumb) of the same hand.

There are at least three forms of finger mice, that I've seen, that chiefly differ by how motion is handled: the trackball, the “nub”, the gyro and the optical.

Trackball finger mice follow the same mechanism as traditional desktop trackballs, and thus the reverse of the old-style mice with balls: you roll the ball with the thumb, and the motion of the ball is converted into planar motions (combinations of left/right and up/down).

Nub finger mice follow the same mechanism as the TrackPoint™ or pointing stick found on some laptop keyboards (most famously IBM/Lenovo): the nub is pushed around with the thumb, and again this converts to planar motions.

Gyroscopic mice use an internal gyroscope to convert hand motions into planar motions. This has the advantage of freeing up some estate on the rest of the device for more buttons.

Finally, optical finger mice work exactly like the usual modern mice, with a laser and optical sensor, the only difference being that instead of holding them with your hands, the pointing device is tied to a finger.

The search (and the finding)

While researching finger mice options (as a present for my friend and obviously for me), I've been held back by two things: pricing and size.

Size was a particularly surprising issue: most of the finger mice options I've seen appear to be unwieldy, some even resembling more dashboards that would require a full hand (other than the one holding it) to operate, than a practical single-handed input device.

Price was no joke either: with more modest pricing ranging between 25€ and 50€, and some options breaking the 100€ barrier or even approaching 200€, one would be led to ask: who is the intended target for these devices? Most definitely not amateurs like me or my friend, but I would be hard pressed to find a justification even at the professional level, except maybe for the lower-cost options if you spent your life doing presentations.

Ultimately, I did find a palatable solution in this (knock-off?) solution: it has everything I wanted (i.e. an easily acccessible scrollwheel) and the price (around 10€ plus shipping) was low enough to cover the worst case scenario. And this is its review.


First, the good news. I'm extremely favorably impressed by the device. It works, it does what I wanted it for, and it's in fact an exceptionally practical device. I mean, I'm not going to say it's good enough for gaming, but I did use it exactly for that too, in the end.

I'm not a pro gamer, most of the games I play are not particularly challenging and I'm generally not a fan of stuff that requires quick reflexes, and perfect timing. But, I do play puzzle-platform games and sometimes you do need pretty good control and timing for them. And I was able to achieve both with this device —definitely much more so with it than with my laptop's touchpad.

To wit, a couple of years ago I had abandoned The swapper shortly after starting it, because I came across a puzzle that had an obvious solution that I was unable to complete on my trackpad. Shortly after getting the new finger mouse, and using it to my enjoyment as no more than a scrollwheel for my weekly dose of long-strip webcomics, I decided to give it a go: let's see if we can finish that stupid puzzle; what's the worst that can happen?

In this case, the worst that happened was that I did manage to solve the puzzle, and many other puzzles after it, all while lying in bed with the laptop on my stomach, a hand on the keyboard (WASD) and the other, with the finger mouse, lying relaxed on the bed sheets. Until 2:30am.

So yes, it's accurate enough at least for casual gaming (I've also replayed Portal, and finally started Lugaru, which was unplayable on the touchpad) and what's more it works on surfaces where a standard mouse would have issues working, such as bed sheets and covers or the shirt or T-shirt you're wearing. Or the palmrests of your laptop, if you don't want to look too weird (but in that case you're not the kind of person that flips the laptop on its side to use it in portrait mode, so you have one less reason to enjoy this gadget).

The device runs on battery, with a single AAA battery. It has a physical switch to turn the power on and off, and from what I understand it goes into low power mode while not being in use too. And of course you can use recharable batteries in it without issues (it's what I'm using).

And it works out of the box (at least on my machine, running Linux).


The device isn't perfect.

It's wireless, which while practical may be an issue for security-conscious people (and possibly health fanatics too).

It does require a surface for use as a mouse (but of course not if you only care about the scrollwheel, which is my case for the most part), but it's not that big of an issue since, as mentioned, I've been able to use it even on surfaces where even standard optical mice are notoriously problematic (there are, however, surfaces on top of which the finger mouse has issues too).

It can take a bit to get used to it, and it feels wierd. The most comfortable way to use it is to tie it to the outside of the middle finger, resting the index finger on top of it, and leaving the thumb to control the buttons and scrollwheel. It's not particularly heavy, but not exceptionally light either (yet I suspect a large part of the weight actually comes from the battery, so if you can find an extra-light battery, that might fix the issue for you). I got used to it and it doesn't annoy me in the least, but I've read reviews of people that find it too weird, so this is most definitely subjective.

It ties to the finger with a strap; this allows freedom to regulate the tightness, but it may be difficult to find the optimal one: too tight, and the diminished circulation can make your finger go numb; not tight enough, and the wiggling will chafe your skin.

It's designed to be used with the right hand. This isn't a big problem for me, since I've always used mice with my right hand even though I'm left-handed, but it might be an issue for other people. It can be used with the left hand, and the most practical way I've found for is to tie it to the inside of the middle finger (so it's inside your hand, more similar to classic mice), but you'll need to flip the axis directions (both horizontal and vertical —and possibly the buttons too) unless you use it on your stomach.


The specific product I bought for myself is already not available anymore on the Amazon page, but several other similarly-priced variants are there. The product I have identifies with USB ID 062A:4010, registered to MosArt for a wireless Keyboard/Mouse combo (even though in this case there's only a mouse), and I've seen the same product ID used in several cheapo brands mouse and keyboard/mouse combos (Trust, RadioShack, etc). Products similar to mine, always from no-name brands and at similar (around 10€, sometimes less) prices, can also be found on both Amazon and other e-commerce sites. I don't know how closesly they match the products I've reviewed (aside from the branding), but given my package flew in almost directly from the factory in China, I'm going out on a limb and guess that for the most part they're all the same thing.

Ah, you want pictures too? There's a couple on my Twitter.

Por una subraya

(I'm told guion bajo is the preferred name for the underscore sign _ in Castilian, but that would have made it harder to echo Por una cabeza. Then again, why the Spanish title? Because.)

(Also, this is going to be a very boring post, because it's mostly just a rant to let off some steam after a frustrating debug session.)

I'm getting into the bad habit of not trusting the compiler, especially when it comes to a specific compiler1. I'm not sure if there's a particular reason for that, other than —possibly— a particular dislike for its closed nature, or past unpleasant experiences in trying to make it work with the more recent versions of the host compiler(s).

Compilers have progressed enourmously in the latest years. I have a strong suspicion that this has been by and large merit of the (re)surgence of the Clang/LLVM family, and the strong pressure it has put the GCC developers under —with the consequent significant improvements on both sides.

However, compilers that need to somehow interact with these compilers (most famously the nvcc compiler developed by NVIDIA for CUDA) have a tendency to lag behind: you can't always the latest version of GCC (or Clang for the amtter) with them, and they themselves do not provide many of the benefits that developers have come to expect from modern compiler, especially in the fields of error and warning message quality and detail, or even in the nature of those same warnings and errors.

This rant is born out of a stressing and frustrating debugging session that has lasted for a few days, and that could have easily been avoided with better tools. What made the bug particularly frustrating was that it seemed to trigger or disappear in the most incoherent of circurmstances. Adding some conditional code (even code that would never run) or moving code around in assumingly idempotent transformations would be enough to make it appear, or disappear again, until the program was recompiled.

The most frustrating part was that, when the code seemed to work, it would seem to work correctly (or at least give credible results). When it seemed to not work, it would simply produce invalid values from thin air.

The symptoms, for anyone with some experience in the field, would be obviously: reading from unitialized memory —even if for some magic reason it seemed to work (when it worked) despite the massively parallel nature of the code and the hundreds of thousands of cycles it ran for.

The code in question is something like this:

struct A : B, C, D
    float4 relPos;
    float r;
    float mass;
    float f;
/* etc */
    A(params_t const& params, pdata_t const& pdata,
      const int index_, float4 const& relPos_, const float r_)
        B(index_, params),
        C(index_, pdata, params),
        D(r, params),
        f(func(r, params))

Can you spot what's wrong with the code?

Spoiler Alert!

Here's the correct version of the code:

struct A : B, C, D
    float4 relPos;
    float r;
    float mass;
    float f;
/* etc */
    A(params_t const& params, pdata_t const& pdata,
      const int index_, float4 const& relPos_, const float r_)
        B(index_, params),
        C(index_, pdata, params),
        D(r_, params),
        f(func(r, params))

The only difference, in case you're having trouble noticing, is that D is being initialized using r_ instead of r.

What's the difference? The object we're talking about, and initialization order. r is the member of our structure, r_ is the parameter we're passing to the constructor to initialize it. After the structure initialization is complete, they will hold the same value, but until r gets initialized (with the value r_), its content is undefined, and using it (instead of r_) will lead to undefined behavior; and D gets initialized before r, because it's one of the parent structures for the structure we want to initialize —and note that this would happen even if we put the initialization of r before the initialization of D, because initialization actually happens in the order the members (and parents) are declared, not in the order their initialization is expressed.

That single _ made me waste at least two days of work.

Now, this error is my fault —it's undoubtedly my fault, it's a clear example of PEBKAC. And yet, proper tooling would have caught it for me, and made it easier to debug.

  1. if you want to know, I'm talking about the nvcc compiler, i.e. the compiler the handles the single-source CUDA files for GPU programming. ↩

10 digits

The question

How many digits do you need, in base 10, to represent a given (binary) number?

A premise

The C++ standard defines a trait for numerical datatypes that describes “the number of base-10 digits that can be represented by a given type without change”: std::numeric_limits::digits10.

What this means is that all numbers with at most that many digits in base 10 will be representable in the given type. For example, 8-bit integers can represent all numbers from 0 to 99, but not all numbers from 0 to 999, so their digit10 value will be 2.

For integer types, the value can be obtained by taking the number of bits (binary digits) used by the type, dividing by log2(10) (or multiplying by log10(2), which is the same thing), and taking the integer part of the results.

This works because with n bits you can represent 2n values, and with d digits you can represent 10d values, and the condition for digit10 is that d should be such that 10d2n. By taking the logarithm on both sides we get dlog10(2n)=nlog10(2), and since d must be an integer, we get the formula d=nlog10(2).

(Technically, this is still a bit of a simplification, since actually the highest representable number with n bits is 2n-1, and that's still only for unsigned types; for signed one things get more complicated, but that's beyond our scope here.)

The answer

What we want is in some sense the complement of digit10, since we want to ensure that our number of (decimal) digits will be sufficient to represent all numbers of the binary type. Following the same line of reasoning above, we want d such that 2n10d, and thus, skipping a few passages, d=nlog10(2), at least assuming unsigned integer types.

We're looking for the simplest formula that gives us the given result. With C++, we could actually just use digits10 plus one, but we want something independent, for example because we want to use this with C (or any other language that doesn't have a digits10 equivalent).

The first thing we want to do is avoid the logarithm. We could compute the actual value, or at least a value with sufficient precision, but in fact we'll avoid doing that, and instead remember that 210 is pretty close to 103, which puts the logarithm in question in the 310 ballpark, an approximation that is good enough for the first several powers of 210.

With this knowledge, we can approximate dn310. In most programming languages integer division with positive operands returns the floor rather than the ceiling, but it can be turned into something that returns the ceiling by adding to the numerator one less than the denominator1. So:


is the formula we're looking for. In a language like C, where the size of types is given in bytes, that would be come something like

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*3+9)/10)

where we're assuming 8 bits per byte (adapt as needed if you're on an insane architecture).


The C expression provided above isn't universal. It is better than the even more aggressive approximation sizeof(type)*CHAR_BIT/3, which for example fails for 8-bit bytes (gives 2 instead of 3) and overestimates the result for 64-bit data types (gives 21 instead of 20), but it's not universal.

It works for most standard signed data types, because the number of base-10 digits needed to represent them is almost always the same as their unsigned equivalents, but for example it doesn't work for 64-bit data types (the signed ones need one less digit in this case).

Moreover, it actually starts breaking down for very large integers, because the 310 approximation commits an error of about 10% which starts becoming significant at 256 bits or higher: the formula predicts 77 digits, but 78 are actually needed.

We can expand this by taking more digits to approximate the logarith. For example

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*301+999)/1000)

doesn't break down until 4096 bits, at which point it misses one digit again. On the other hand

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*30103+99999)/100000)

can get us reasonably high (in fact, by a quick check it seems this formula should work correctly even for types with 2216=265536 bits, if not more). It also has a nice symmetry to it, even though I guess it would overflow on machines with smaller word sizes (but then again, you probably wouldn't need it there anyway).

  1. If a,b are non-negative integers with b>0, then a+b-1b=ab: (1) if a is a multiple of b, then adding b-1 doesn't go to the next multiple, and thus on both sides we have ab (which is an integer) and (2) if a is not a multiple of b, adding b-1 will overtake exactly one multiple of b. More formally, we can write a=kb+c where k,c are non-negative integers, and c<b (c=0 if a is a multiple, and c>0 otherwise). Define s=sign(c), i.e. s=0 if c=0 and s=1 otherwise. Then a+b-1b=kb+c+b-1b=(k+1)+c-1b=k+1-(1-s)=k+s and ab=kb+cb=k+cb=k+s. ↩

Oblomov permalink
Mixed DPI and the X Window System

I'm writing this article because I'm getting tired of repeating the same concepts every time someone makes misinformed statements about the (lack of) support for mixed-DPI configurations in X11. It is my hope that anybody looking for information on the subject may be directed here, to get the facts about the actual possibilities offered by the protocol, avoiding the biased misinformation available from other sources.

If you only care about “how to do it”, jump straight to The RANDR way, otherwise read along.

So, what are we talking about?

The X Window System

The X Window System (frequently shortened to X11 or even just X), is a system to create and manage graphical user interfaces. It handles both the creation and rendering of graphical elements inside specific subregions of the screen (windows), and the interaction with input devices (such as keyboards and mice).

It's built around a protocol by means of which programs (clients) tell another program (the server, that controls the actual display) what to put on the screen, and conversely by means of which the server can inform the client about all the necessary information concerning both the display and the input devices.

The protocol in question has evolved over time, and reached version 11 in 1987. While the core protocol hasn't introduced any backwards-incompatible changes in the last 30 years (hence the name X11 used to refer to the X Window System), its extensible design has allowed it to keep abreast of technological progress thanks to the introduction and standardization of a number of extensions, that have effectively become part of the subsequent revisions of the protocol (the last one being X11R7.7, released in 2012; the next, X11R7.8, following more a “rolling release” model).


Bitmapped visual surfaces (monitor displays, printed sheets of paper, images projected on a wall) have a certain resolution density, i.e. a certa number of dots or pixels per unit of length: dots per inch (DPI) or pixel per inch (PPI) is a common way to measure it. The reciprocal of the the DPI is usually called “dot pitch”, and refers to the distance between adjacent dots (or pixels). This is usually measured in millimeters, so conversion between DPI and dot pitch is obtained with

DPI   = pitch/25.4
pitch = 25.4/DPI

(there being 25.4 millimeters to the inch).

When it comes to graphics, knowing the DPI of the output is essential to ensure consistent rendering (for example, a drawing program may have a “100% zoom” option where the user might expect a 10cm line to take 10cm on screen), but when it comes to graphical interface elements (text in messages and labels, sizes of buttons and other widgets) the information itself may not be sufficient: usage of the surface should ideally also be taken into consideration.

To this end, the concept of reference pixel was introduced in CSS, representing the pixel of an “ideal” display with a resolution of exactly 96 DPI (dot pitch of around 0.26mm) viewed from a distance of 28 inches (71cm). The reference pixel thus becomes the umpteenth unit of (typographical) length, with exactly 4 reference pixels every 3 typographical points.

Effectively, this allows the definition of a device pixel ratio, as the ratio of device pixels to reference pixels, taking into account the device resolution (DPI) and its assumed distance from the observer (for example, a typical wall-projected image has a much lower DPI than a typical monitor, but is also viewed from much further away, so that the device pixel ratio can be assumed to be the same).

Mixed DPI

A mixed-DPI configuration is a setup where the same display server controls multiple monitors, each with a different DPI.

For example, my current laptop has a built-in 15.6" display (physical dimensions in millimeters: 346×194) with a resolution of 3200×1800 pixels, and a pixel density of about 235 DPI —for all intents and purposes, this is a HiDPI monitor, with slightly higher density than Apple's Retina display brand. I frequently use it together with a 19" external monitor (physical dimensions in millimeters: 408×255) with a resolution of 1440×900 pixels and a pixel density of about 90 DPI —absolutely normal, maybe even somewhat on the lower side.

The massive difference in pixel density between the two monitors can lead to extremely inconsistent appearance of graphical user interfaces that do not take it into consideration: if they render assuming the standard (reference) DPI, elements will appear reasonably sized on the external monitor, but extremely small on the built-in monitor; conversely, if they double the pixel sizing of all interface elements, they will appear properly sized on the built-in monitor, but oversized on the external one.

Proper support for such configuration requires all graphical and textual elements to take a number of pixel which depends on the monitor it is being drawn on. The question is: is this possible with X11?

And the answer is yes. But let's see how this happens in details.

A brief history of X11 and its support for multiple monitors

The origins: the X Screen

An interesting aspect of X11 is that it was designed in a period where the quality and characteristics of bitmap displays (monitors) was much less consistent than it is today. The core protocol thus provides a significant amount of information for the monitors it controls: the resolution, the physical size, the allowed color depth(s), the available color palettes, etc.

A single server could make use of multiple monitors (referred to as “X Screen”s), and each of them could have wildly different characteristics (for example: one could be a high-resolution monochrome display, the other could be a lower-resolution color display). Due to the possible inconsistency between monitors, the classical support for multiple monitors in X did not allow windows to be moved from one X Screen to another. (How would the server render a window created to use a certain kind of visual on a different display that didn't support it?)

It should be noted that while the server itself didn't natively support moving windows across X Screens, clients could be aware of the availability of multiple displays, and they could allow (by their own means) the user to “send” a window to a different display (effectively destroying it, and recreating it with matching content, but taking into account the different characteristics of the other display).

A parenthetical: the client, the server and the toolkit

Multiple X Screen support being dependent on the client, rather than the server, is actually a common leit motif in X11: due to one of its founding principles (“mechanism, not policy”), a lot of X11 features are limited only by how much the clients are aware of them and can make use of them. So, something may be allowed by the protocol, but certain sets of applications don't make use of the functionality.

This is particularly relevant today, when very few applications actually communicate with the X server directly, preferring to rely on an intermediate toolkit library that handles all the nasty little details of communicating with the display server (and possibly even display servers of different nature, not just X11) according to the higher-level “wishes” of the application (“put a window with this size and this content somewhere on the screen”).

The upside of this is that when the toolkit gains support for a certain feature, all applications using it can rely (sometimes automatically) on this. The downside is that if the toolkit removes support for certain features or configurations, suddenly all applications using it stop supporting them too. We'll see some example of this specifically about DPI in this article.

Towards a more modern multi-monitor support: the Xinerama extension

In 1998, an extension to the core X11 protocol was devised to integrate multiple displays seamlessly, making them appear as a single X Screen, and thus allowing windows to freely move between them.

This extension (Xinerama) had some requirements (most importantly, all displays had to support the same visuals), but for the most part they could be heterogeneous.

An important downside of the Xinerama extension is that while it provides information about the resolution (in pixels) and relative position (in pixels!) of the displays, it doesn't reveal any information about the physical characteristics of the displays.

This is an important difference with respect to the classic “separate X Screens” approach: the classic method allowed clients to compute the monitor DPI (as both the resolution and the physical size were provided), but this is not possible in Xinerama.

As a consequence, DPI-aware applications were actually irremediably broken on servers that only supported this extension, unless all the outputs had the same (or similar enough) DPI.

Modern multi-monitor in X11: the XRANDR extension

Xinerama had a number of limitations (the lack of physical information about the monitors being just one of many), and it was essentially superseded by the RANDR (Resize and Rotate) extension when the latter reached version 1.2 in 2007.

Point of interest for our discussion, the RANDR extension took into consideration both the resolution and physical size of the display even when originally proposed in 2001. And even today that it has grown in scope and functionality, it provides all necessary information for each connected, enabled display.

The RANDR caveat

One of the main aspects of the RANDR extension is that each display is essentially a “viewport” on a virtual framebuffer. This virtual framebuffer is the one reported as “X Screen” via the core protocol, even though it doesn't necessarily match any physical screen (not even when a single physical screen is available!).

This gives great flexibility on how to combine monitors (including overlaps, cloning, etc); the hidden cost is that all of the physical information that the core protocol would report about the virtual backend to its X Screen become essentially meaningless.

For this reason, when the RANDR extension is enabled, the core protocol will synthetize ficticious physical dimensions for its X Screen, from the overall framebuffer size, assuming a “reference” pixel density of 96 DPI.

When using a single display covering the whole framebuffer, this leads to a discrepancy between the physical information provided by the core protocol, and the one reported by the RANDR extension. Luckily, the solution for this is trivial, as the RANDR extension allows changing the ficticious dimensions of the X Screen to any value (for example, by using commands such as xrandr --dpi eDP-1, to tell the X server to match the core protocol DPI information to that of the eDP-1 output).

Mixed DPI in X11

Ultimately, X11, as a display protocol, has almost always had support for mixed DPI configurations. With the possible exception of the short period between the introduction of Xinerama and the maturity of the RANDR extension, the server has always been able to provide its clients with all the necessary information to adapt their rendering, window by window, widget by widget, based on the physical characteristics of the outputs in use.

Whether or not this information is being used correctly by clients, however, it's an entirely different matter.

The core way

If you like the old ways, you can manage your mixed DPI setup the classic way, by using separate X Screens for each monitor.

The only thing to be aware of is that if your server is recent enough (and supports the RANDR extension), then by default the core protocol will report a DPI of 96, as discussed here. This can be worked around by calling xrandr as appropriate during the server initialization.

Of course, whether or not applications will use the provided DPI information, X Screen by X Screen, is again entirely up the application. For applications that do not query the X server about DPI information (e.g. all applications using GTK+3, due to this regression), the Xft.dpi resource can be set appropriately for each X Screen.

The RANDR way

On a modern X server with RANDR enabled and monitors with (very) different DPIs merged in a single framebuffer, well-written applications and toolkits can leverage the information provided by the RANDR extension to get the DPI information for each output, and use this to change the font and widget rendering depending on window location.

(This will still result in poor rendering when a window spans multiple montiors, but if you can live with a 2-inch bezel in the middle of your window, you can probably survive misrendering due to poor choice of device pixel ratios.)

The good news is that all applications using the Qt toolkit can do this more or less automatically, provided they use a recent enough version (5.6 at least, 5.9 recommended). Correctly designed Applications can request this behavior from the toolkit on their own (QApplication::setAttribute(Qt::AA_EnableHighDpiScaling);), but the interesting thing is that the user can ask this to be enabled even for legacy applications, by setting the environment variable QT_AUTO_SCREEN_SCALE_FACTOR=1.

(The caveat is that the scaling factor for each monitor is determined from the ratio between the device pixel ratio of the monitor and the device pixel ratio of the primary monitor. So make sure that the DPI reported by the core protocol (which is used as base reference) matches the DPI of your primary monitor —or override the default DPI used by Qt applications by setting the QT_FONT_DPI environment variable appropriately.)

The downside is that outside of Qt, not many applications and tookits have this level of DPI-awareness, and the other major toolkit (GTK+) seems to have no intention to acquire it.

A possible workaround

If you're stuck with poorly written toolkits and applications, RANDR still offers a clumsy workaround: you can level out the heterogeneity in DPI across monitors by pushing your lower-DPI displays to a higher virtual resolution than their native one, and then scaling this down. Combined with appropriate settings to change the DPI reported by the core protocol, or the appropriate Screen resources or other settings, this may lead to a more consistent experience.

For example, I could set my external 1440×900 monitor to “scale down” from a virtual 2880×1800 resolution (xrandr --output DP-1 --scale-from 2880x1800), which would bring its virtual DPI more on par with that of my HiDPI laptop monitor. The cost is a somewhat poorer image overall, due to the combined up/downscaling, but it's a workable workaround for poorly written applications.

(If you think this idea is a bit stupid, shed a tear for the future of the display servers: this same mechanism is essentially how Wayland compositors —Wayland being the purported future replacement for X— cope with mixed-DPI setups.)

Final words

Just remember, if you have a mixed DPI setup and it's not properly supported in X, this is not an X11 limitation: it's the toolkit's (or the application's) fault. Check what the server knows about your setup and ask yourself why your programs don't make use of that information.

If you're a developer, follow Qt's example and patch your toolkit or application of choice to properly support mixed DPI via RANDR. If you're a user, ask for this to be implemented, or consider switching to better applications with proper mixed DPI support.

The capability is there, let's make proper use of it.

A small update

There's a proof of concept patchset that introduces mixed-DPI support for GTK+ under X11. It doesn't implement all of the ideas I mentioned above (in particular, there's no Xft.dpi support to override the DPI reported by core), but it works reasonably well on pure GTK+ applications (more so than in applications that have their own toolkit abstraction layer, such Firefox, Chromium, LibreOffice).

Cross-make selection

I was recently presented, as a mere onlooker, with the potential differences that exist in the syntax of a Makefile for anything non-trivial, when using different implementations of make.

(For the uninitiated, a Makefile is essentially a list of recipes that are automatically followed to build some targets from given dependencies, and are usually used to describe how to compile a program. Different implementations of make, the program that reads the Makefiles and runs the recipes, exist; and the issue is that for anything beyond the simplest of declarations and recipe structure, the syntax they support is different, and incompatible.)

Used as I was to using GNU make and its extensive set of functions and conditionals and predefined macros and rules, I rarely bothered looking into alternatives, except maybe for completely different build systems or meta-build-systems (the infamous GNU autotools, cmake, etc). However, being presented with the fact that even simple text transformations could not be done in the same way across the two major implementations of make (GNU and BSD) piqued my curiosity, and I set off to convert the rather simple (but still GNU-dependent) Makefile of my clinfo project to make it work at least in both GNU and BSD make.

Get the code for


Since clinfo is a rather simple program, its Makefile is very simple too:

  1. it defines the path under which the main source file can be found;
  2. it defines a list of header files, on which the main source file depends;
  3. it detects the operating system used for the compilation;
  4. it selects libraries to be passed to the linker to produce the final executable (LDLIBS), based on the operating system.

The last two points are necessary because:

  • under Linux, but not under any other operating system, the dl library is needed too;
  • under Darwin, linking to OpenCL is done using -framework OpenCL, whereas under any other operating system, this is achieved with a simpler -lOpenCL (provided the library is found in the path).

In all this, the GNU-specific things used in the Makefile were:

  1. the use of the wildcard function to find the header files;
  2. the use of the shell function to find the operating system;
  3. the use of the ifeq/else/endif conditionals to decide which flags to add to the LDLIBS.

Avoiding wildcard

In my case, the first GNUism is easily avoided by enumerating the header files explicitly: this has the underside that if a new header file is ever added to the project, I should remember to add it myself.

(An alternative approach would be to use some form of automatic dependency list generation, such as the -MM flag supported by most current compiles; however, this was deemed overkill for my case.)

(A third option, assuming a recent enough GNU make, is presented below.)

Avoiding shell

BSD make supports something similar to GNU make's shell function by means of the special != assignment operator. The good news is that GNU make has added support for the same assignment operator since version 4 (introduced in late 2013). This offers an alternative solution for wildcard as well: assigning the output of ls to a variable, using !=.

If you want to support versions of GNU make older than 4, though, you're out of luck: there is no trivial way to assign the output of a shell invocation to a Makefile variable that works on both GNU and BSD make (let alone when strict POSIX compliance is required).

If (and only if) the assignments can be done ‘before’ any other assignment is done, it is however possible to put them into a GNUmakefile (using GNU's syntax) and makefile (using BSD's syntax), and then have both of these include the shared part of the code. This works because GNU make will look for GNUmakefile first.

In my case, the only call to shell I had was a $(shell uname -s) to get the name of the operating system. The interesting thing in this case is that BSD make actually defines its own OS variable holding just what I was looking for.

My solution was therefore to add a GNUmakefile which defined OS using the shell invocation, and then include the same Makefile which is parsed directly by BSD make.

Conditional content for variables

Now comes the interesting part: we want the content of a variable (LDLIBS in our case) to be set based on the content of another variable (OS in our case).

There are actually two things that we want to do:

  1. (the simple one) add something to the content of LDLIBS only if OS has a specific value;
  2. (the difficult one) add something to the content of LDLIBS only if OS does not have a specific value.

Both of these would be rather trivial if we had conditional statements, but while both BSD and GNU make do have them, their syntax is completely incompatible. We therefore have to resort to a different approach, one that leverages features present in both implementations.

In this case, we're going to use the fact that when using a variable, you can use another variable to decide the name of the variable to use: whenever make comes across the syntax $(foo) (or ${foo}), it replaces it with the content of the foo variable. The interesting thing is that this holds even within another set of $() or ${}, so that if foo = bar and bar = quuz, then $(${foo}) expands to $(bar) and thus ultimately to quuz.

Add something to a variable only when another variable has a specific value

This possibility actually allows us to solve the ‘simple’ conditional problem, with something like:

LDLIBS_Darwin = -framework OpenCL
LDLIBS_Linux  = -ldl

Now, if OS = Darwin, LDLIBS will get extended by appending the value of LDLIBS_Darwin; if OS = Linux, LDLIBS gets extended by appending the value of LDLIBS_Linux, and otherwise it gets extended by appending the value of LDLIBS_, which is not defined, and thus empty.

This allows us to achieve exactly what we want: add specific values to a variable only when another variable has a specific value.

Add something to a variable only when another variable does not have a specific value

The ‘variable content as part of the variable name’ trick cannot be employed as-is for the complementary action, which is adding something only when the content of the control variable is not some specific value (in our case, adding -lOpenCL when OS is not Darwin).

We could actually use the same trick if the Makefile syntax allowed something like a -= operator to ‘remove’ things from the content of a variable (interestingly, the vim scripting and configuration language does have such an operator). Since the operator is missing, though, we'll have to work around it, and to achieve this we will use the possibility (shared by both GNU and BSD make) to manipulate the content of variables during expansion.

Variable content manipulation is another field where the syntax accepted by the various implementations differs wildly, but there is a small subset which is actually supported by most of them (even beyond GNU and BSD): the suffix substitution operator.

The idea is that often you want to do something like enumerate all your source files in a variable sources = file1.c file2.c file3.c etc and then you want to have a variable with all the object files that need to be linked, that just happen to be the same, with the .c suffix replaced by .o: in both GNU and BSD make (and not just them), this can be achieved by doing objs = $(sources:.c=.o). The best part of this is that the strings to be replaced, and the replacement, can be taken from the expansion of a variable!

We can then combine all this knowledge into our ‘hack’: always include the value we want to selectively exclude, and then remove it by ‘suffix’ substitution, where the suffix to be replaced is defined by a variable-expanded variable name: a horrible, yet effective, hack:

LDLIBS_not_Darwin = -lOpenCL
LDLIBS := ${LDLIBS:$(LDLIBS_not_${OS})=}

This works because when OS = Darwin, the substitution argument will be $(LDLIBS_not_Darwin) which in turn expands to -lOpenCL, so that in the end the value assigned to LDLIBS will be ${LDLIBS:-lOpenCL=}, which is LDLIBS with -lOpenCL replaced by the empty string. For all other values of OS, we'll have ${LDLIBS:=} which just happens to be the same as ${LDLIBS}, and thus LDLIBS will not be changed1

Cross-make selection

We can then combine both previous ideas:


LDLIBS_Darwin = -framework OpenCL
LDLIBS_not_Darwin = -lOpenCL
LDLIBS_Linux  = -ldl

LDLIBS := ${LDLIBS:$(LDLIBS_not_${OS})=}

And there we go: LDLIBS will be -framework OpenCL on Darwin, -lOpenCL -ldl on Linux, -lOpenCL on any other platform, regardless of wether GNU or BSD make are being used.

Despite the somewhat hackish nature of this approach (especially for the ‘exclusion’ case), I actually like it, for two reasons.

The first is, obviously, portability. Not requiring a specific incarnation of make is at the very least an act of courtesy. Being able to do without writing two separate, mostly duplicate, Makefiles is even better.

But there's another reason why I like the approach: even though the variable-in-variable syntax isn't exactly the most pleasurable to read, the intermediate variable names end up having a nice, self-explanatory name that gives a nice logical structure to the whole thing.

That being said, working around this kind of portability issues can make a developer better appreciate the need for more portable build systems, despite the heavier onus in terms of dependencies. Of course, for a smaller projects, deploying something as massive as autotools or cmake would still be ridiculous overkill: so to anyone that prefers leaner (if more fragile) options, I offer this set of solutions, in the hope that they'll help stimulate convergence.

  1. technically, we will replace the unexpanded value of LDLIBS with its expanded value; the implications of this are subtle, and a bit out of scope for this article. As long as this is kept as the 'last' change to LDLIBS, everything should be fine. ↩

Poi poi, mai mai

Note: this article makes use of MathML, the standard XML markup for math formulas. Sadly, this is not properly supported on some allegedly ‘modern’ and ‘feature-rich’ browsers. If the formulas don't make sense in your browser, consider reporting the issue to the respective developers and/or switching to a standard-compliant browser.

For the last three decades or so, non-integer numbers have been represented on computers following (predominantly) the floating-point standard known as IEEE-754.

The basic idea is that each number can be written in what is also known as engineering or scientific notation, such as 2.34567×1089, where the 2.34567 part is known as the mantissa or significand, 10 is the base and 89 is the exponent. Of course, on computers 2 is more typically used as base, and the mantissa and exponent are written in binary.

Following the IEEE-754 standard, a floating-point number is encoded using the most significant bit as sign (with 0 indicating a positive number and 1 indicating a negative number), followed by some bits encoding the exponent (in biased representation), and the rest of the bits to encode the fractional part of the mantissa (the leading digit of the mantissa is assumed to be 1, except for denormals in which case it's assumed 0, and is thus always implicit).

The biased representation for the exponent is used for a number of reasons, but the one I care about here is that it allows “special cases”. Specifically, the encoded value of 0 is used to indicate the number 0 (when the mantissa is also set to 0) and denormals (which I will not discuss here). An exponent with all bits set to 1, on the other hand, is used to represent (when the mantissa is set to 0) and special values called “Not-a-Number” (or NaN for short).

The ability of the IEEE-754 standard to describe such special values (infinities and NaN) is one of its most powerful features, although often not appreciated by programmers. Infinity is extremely useful to properly handle functions with special values (such as the trigonometric tangent, or even division of a non-zero value by zero), whereas NaNs are useful to indicate that somewhere an invalid operation was attempted (such as dividing zero by zero, or taking the square root of a negative number).

Consider now the proverb “later means never”. The Italian proverb with the same meaning (that is, procrastination is often an excuse to not do things ever) is slightly different, and it takes a variety of forms («il poi è parente del mai», «poi è parente di mai», «poi poi è parente di mai mai») which basically translate to “later is a relative of never”.

What is interesting is that if we were to define “later” and “never” as “moments in time”, and assign numerical values to it, we could associate “later” with infinity (we are procrastinating, after all), while “never”, which cannot actually be a “moment in time” (it is never, after all) would be … not a number.

(Actually, it's also possible to consider “later” as being indefinite in time, and thus not a (specific) number, and “never” having an infinite value. Or to have both later and never be not numbers. But that's fine, it still works!)

So as it happens, both later and never can be represented in the IEEE-754 floating-point standard, and they share the special exponent that marks non-finite numbers.

Later, it would seem, is indeed a relative of never.

Warp shuffles, or why OpenCL should expose low-level interfaces

Since OpenCL 2.0, the OpenCL C device programming language includes a set of work-group parallel reduction and scan built-in functions. These functions allow developers to execute local reductions and scans for the most common operations (addition, minimum and maximum), and allow vendors to implement them very efficiently using hardware intrinsics that are not normally exposed in OpenCL C.

In this article I aim at challenging the idea that exposing such high-level functions, but not the lower-level intrinsics on which their efficient implementation might rely, results in lower flexibility and less efficient OpenCL programs, and is ultimately detrimental to the quality of the standard itself.

While the arguments I will propose will be focused specifically on the parallel reduction and scans offered by OpenCL C since OpenCL 2.0, the fundamental idea applies in a much more general context: it is more important for a language or library to provide the building blocks on which to build certain high-level features than to expose the high-level features themselves (hiding the underlying building blocks).

For example, the same kind of argument would apply to a language or library that aimed at providing support for Interval Analysis (IA). A fundamental computational aspect which is required for proper IA support is directed rounding: just exposing directed rounding would be enough to allow efficient (custom) implementations of IA, and also allow other numerical feats (as discussed here); conversely, while it's possible to provide support for IA without exposing the underlying required directed rounding features, doing so results in an inefficient, inflexible standard1.

The case against high-level reduction operations

To clarify, I'm not actually against the presence of high-level reduction and scan functions in OpenCL. They are definitely a very practical and useful set of functions, with the potential of very efficient implementations by vendors —in fact, more efficient than any programmer may achieve, not just because they can be tuned (by the vendor) for the specific hardware, but also because they can in fact be implemented making use of hardware capabilities that are not exposed in the standard nor via extensions.

The problem is that the set of available functions is very limited (and must be so), and as soon as a developer needs a reduction or scan function that is even slightly different from the ones offered by the language, it suddenly becomes impossible for such a reduction or scan to be implemented with the same efficiency of the built-in ones, simply because the underlying hardware capabilities (necessary for the optimal implementation) are not available to the developer.

Thrust and Kahan summation

Interesting enough, I've hit a similar issue while working on a different code base, which makes use of CUDA rather than OpenCL, and for which we rely on the thrust library for the most common reduction operations.

The thrust library is a C++ template library that provides efficient CUDA implementations of a variety of common parallel programming paradigms, and is flexible enough to allow such paradigms to make use of user-defined operators, allowing for example reductions and scans with operators other than summation, minimum and maximum. Despite this flexibility, however, even the thrust library cannot move (easily) beyond stateless reduction operators, so that, for example, one cannot trivially implement a parallel reduction with Kahan summation using only the high-level features offered by thrust.

Of course, this is not a problem per se, since ultimately thrust just compiles to plain CUDA code, and it is possible to write such code by hand, thus achieving a Kahan summation parallel reduction, as efficiently as the developer's prowess allows. (And since CUDA exposes most if not all hardware intrinsics, such a hand-made implementation can in fact be as efficient as possible on any given CUDA-capable hardware.)

Local parallel reductions in OpenCL 2.0

The situation in OpenCL is sadly much worse, and not so much due to the lack of a high-level library such as thrust (to which end one may consider the Bolt library instead), but because the language itself is missing the fundamental building blocks to produce the most efficient reductions: and while it does offer built-ins for the most common operations, anything beyond that must be implemented by hand, and cannot be implemented as efficiently as the hardware allows.

One could be led to think that (at least for something like my specific use case) it would be “sufficient” to provide more built-ins for a wider range of reduction operations, but such an approach would be completely missing the point: there will always be variations of reductions that are not provided by the language, and such a variation will always be inefficient.

Implementor laziness

There is also another point to consider, and it has to do with the sad state of the OpenCL ecosystem. Developers that want to use OpenCL for their software, be it in academia, gaming, medicine or any industry, must face the reality of the quality of existing OpenCL implementations. And while for custom solutions one can focus on a specific vendor, and in fact choose the one with the best implementations, software vendors have to deal with the idiosyncrasies of all OpenCL implementations, and the best they can expect is for their customers to be up to date with the latest drivers.

What this implies in this context is that developers cannot, in fact, rely on high-level functions being implemented efficiently, nor can they sit idle waiting for the vendors to provide more efficient implementations: more often than not, developers will find themselves working around the limitations of this and that implementation, rewriting code that should be reduced to one liners in order to provide custom, faster implementations.

This is already the case for some functions such as the asynchronous work-group memory copies (from/to global/local memory), which are dramatically inefficient on some vendor implementations, so that developers are more likely to write their own loading functions instead, which generally end up being just as efficient as the built-ins on the platforms where such built-ins are properly implemented, and much faster on the lazy platforms.

Therefore, can we actually expect vendors to really implement the work-group reduction and scan operations as efficiently as their hardware allows? I doubt it. However, while for the memory copies an efficient workaround was offered by simple loads, such a workaround is impossible in OpenCL 2.0, since the building blocks of the efficient work-group reductions are missing.

Warp shuffles: the work-group reduction building block

Before version 2.0 of the standard, OpenCL offered only one way to allow work-items within a work-group to exchange informations: local memory. The feature reflected the capability of GPUs when the standard was first proposed, and could be trivially emulated on other hardware by making use of global memory (generally resulting in a performance hit).

With version 2.0, OpenCL exposes a new set of functions that allow data exchange between work-items in a work-group, which doesn't (necessarily) depend on local memory: such functions are the work-group vote functions, and the work-group reduction and scan functions. These functions can be implemented via local memory, but most modern hardware can implement them using lower-level intrinsics that do not depend on local memory at all, or only depend on local memory in smaller amounts than would be needed by a hand-coded implementation.

On GPUs, work-groups are executed in what are called warps or wave-fronts, and most modern GPUs can in fact exchange data between work-items in the same warp using specific shuffle intrinsics (which have nothing to do with the OpenCL C shuffle function): these intrinsics allow work-items to access the private registers of other work-items in the same warp. While warps in the same work-group still have to communicate using local memory, a simple reduction algorithm can thus be implemented using warp shuffle instructions and only requiring one word of local memory per warp, rather than one per work-item, which can lead to better hardware utilization (e.g. by allowing more work-groups per compute unit thanks to the reduced use of local memory).

Warp shuffle instructions are available on NVIDIA GPUs with compute capability 3.0 or higher, as well as on AMD GPUs since Graphics Core Next. Additionally, vectorizing CPU platforms such as Intel's can trivially implement them in the form of vector component swizzling. Finally, all other hardware can still emulate them via local memory (which in turn might be inefficiently emulated via global memory, but still): and as inefficient as such an emulation might be, it still would scarcely be worse than hand-coded use of local memory (which would still be a fall-back option to available to developers).

In practice, this means that all OpenCL hardware can implement work-group shuffle instructions (some more efficiently than others), and parallel reductions of any kind could be implemented through work-group shuffles, achieving much better performance than standard local-memory reductions on hardware supporting work-group shuffles in hardware, while not being less efficient than local-memory reductions where shuffles would be emulated.


Finally, it should be obvious now that the choice of exposing work-group reduction and scan functions, but not work-group shuffle functions in OpenCL 2.0 results in a crippled standard:

  • it does not represent the actual capabilities of current massively parallel computational hardware, let alone the hardware we may expect in the future;
  • it effectively prevents efficient implementation of reductions and scans beyond the elementary ones (simple summation, minimum and maximum);
  • to top it all, we can scarcely expect such high-level functions to be implemented efficiently, making them effectively useless.

The obvious solution would be to provide work-group shuffle instructions at the language level. This could in fact be a core feature, since it can be supported on all hardware, just like local memory, and the device could be queries to determine if the instructions are supported in hardware or emulated (pretty much like devices can be queried to determine if local memory is physical or emulated).

Optionally, it would be nice to have some introspection to allow the developer to programmatically find the warp size (i.e. work-item concurrency granularity) used for the kernel2, and potentially improve on the use of the instructions by limiting the strides used in the shuffles.

  1. since IA intrinsically depends on directed rounding, even if support for IA was provided without explicitly exposing directed rounding, it would in fact still be possible to emulate directed rounding of scalar operations by operating on interval types and then discarding the unneeded parts of the computation; of course, this would be dramatically inefficient. ↩

  2. in practice, the existing CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE kernel property that can be programmatically queried corresponds already to the warp/wave-front size on GPUs, so there might be no need for another property if it could be guaranteed that this is the work-item dispatch granularity. ↩

Colorize your man

The terminal, as powerful as it might be, has a not undeserved fame of being boring. Boring white (or some other, fixed, color) on boring black (or some other, fixed, color) for everything. Yet displays nowadays are capable of showing millions of colors, and have been able to display at least four since the eighties at least. There's a resurgence of “colorization” options for the terminal, from the shell prompt to the multiplexers (screen, tmux), from the output of commands such as ls to the syntax highlighting options of editors and pagers. A lot of modern programs will even try to use colors in their output right from the start, making it easier to tell apart semantically different parts of it.

One of the last strongholds of the boring white-on-black (or conversely) terminal displays is man, the manual page reader. Man pages constitute the backbone of technical documentation in Unix-like systems, and range from the description of the syntax and behaviour of command-line programs to the details of system calls and programming interfaces of libraries, passing through a description of the syntax of configuration files, and whatever else one might feel like documenting for ease of access.

The problem is, man pages are boring. They usually have all the same structures, with sections that follow a common convention both in the naming and in the sequence (NAME, SYNOPSIS, DESCRIPTION, SEE ALSO, etc.), and they are all boringly black and white, with a sprinkle of bold and italics/underline.

There's to be said that bold and underline don't really “cut it” in console: undoubtedly, things would stand out more if colors were used to highlight relevant parts of the manual pages (headings, examples, code, etc) rather than simply bold and underline or italics.

Thanks to the power and flexibility of the pagers used to actually visualize those man pages, a number of people have come out with simple tricks that colorize pages by just reinterpreting bold/italics/underline commands as colors. In fact, there's a pager (most) that does this by default. Of course, most is otherwise inferior in many ways to the more common less pager, so there are solutions to do the same trick (color replacement) with less. Both solutions, as well as a number of other tricks based on the same principle, are pretty well documented in a number of places, and can be found summarized on the Arch wiki page on man pages.

I must say I'm not too big a fan of this approach: while it has the huge advantage of being very generic (in fact, maybe a little too generic), it has a hackish feeling, which had me look for a cleaner, lower level approach: making man itself (or rather the groff typesetter it uses) colorize its output.

Colorizing man pages with *roff

The approach I'm going to present will only work if man uses (a recent enough version of) groff that actually supports colors. Also, the approach is limited to specific markup. It can be extended, but doing so robustly is non-trivial.

We will essentially do three things:

  • tell groff to look for (additional) macros in specific directories;
  • override some typical man page markup to include colors;
  • tell groff (or rather grotty, the terminal post-processor of groff) to enable support for SGR escapes.

Extending the groff search path

By default, groff will look for macro packages in a lot of places, among which the user's home directory. Since cluttering home isn't nice, we will create a ~/groff directory and put out overrides in there, but we also need to tell groff to look there, which is done by setting the GROFF_TMAC_PATH environment variables. So I have in my ~/.profile the following lines:


(Remember to source ~/.profile if you want to test the benefits of your override in your live sessions.)

Overriding the man page markup

The groff macro package used to typeset man pages includes an arbitrary man.local file that can be used to override definitions. For example, in Debian this is used to do some character substitutions based on whether UTF-8 is enabled or not, and it's found under /etc/groff. We will write our own man.local, and place it under ~/groff instead, to override the markup we want to colorize.

Sadly, most of the markup in man pages is presentational rather than semantic: things are explicitly typeset in bold/italic/regular, rather than as parameter/option/code/whatever. There are a few exceptions, most notably the .SH command to typeset section headers. So in this example we will only override .SH to set section headers to green, leaving the rest of the man pages as-is.

Instead of re-defining .SH from scratch, we will simply expand it by adding stuff around the original definition. This can be achieved with the following lines (put them in your ~/groff/man.local):

.rn SH SHorg
.de SH
. gcolor green
. SHorg \\$*
. gcolor

The code above renames .SH to .SHorg, and then defines a new .SH command that:

  1. sets the color to green;
  2. calls .SHorg (i.e. the original .SH), passing all the arguments over to it;
  3. resets the color to whatever it was before.

The exact same approach can be used to colorize the second-level section header macro, .SS; just repeat the same code with a general replacement of H to S, and tune the color to your liking.

Another semantic markup that is rather easy to override, even though it's only rarely used in actual man pages (possibly because it's a GNU extension), is the .UR/.UE pair of commands to typeset URLs, and its counterpart .MT/.ME pair of commands to typeset email addresses. Both work by storing the email address as the variable \m1, so all we need is to override its definition before it's actually used, in the second element of the pair; for example, if we want to typeset both URLs and email addresses in cyan, we would use:

.rn ME MEorg
.de ME
. ds m1 \\m[cyan]\\*(m1\\m[]\"
. MEorg \\$*

.rn UE UEorg
.de UE
. ds m1 \\m[cyan]\\*(m1\\m[]\"
. UEorg \\$*

(Keep in mind that I'm not a groff expert, so there might be better ways to achieve these overrides.)

Enabling SGR escapes

The more recent versions of grotty (the groff post-processor for terminal output) uses ANSI (SGR) escape codes for formatting, supporting both colors and emboldening, italicizing and underlining. On some distributions (Debian, for example), this is disabled by default, and must be enabled with some not-well-document method (e.g. exporting specific environment variables).

Since we already have various overrides in our ~/groff/man.local, we can restore the default behavior (enabling SGR escapes by default, unless the environment variable GROFF_NO_SGR is set) with the lines:

.if '\V[GROFF_NO_SGR]'' \
.   output x X tty: sgr 1

Of course, you should also make sure the pager used by man supports SGR escape sequences, for example by making your pager be less (which is likely to be the default already, if available) and telling it to interpret SGR sequences (e.g. by setting the environment variable LESS to include the -R option).

Limitations and future work

That's it. Now section headers, emails and URLs will come out typeset in color, provided man pages are written using semantic markup.

It is also possible to override the non-semantic markup that is used everywhere else, such as all the macros that combine or alternate B, I and R to mark options, parameters, arguments and types. This would definitely make pages more colored, but whether or not they will actually come out decently is all to be seen.

A much harder thing to achieve is the override of commands that explicitly set the font (e.g. *roff escape sequences such as \fB, often used inline in the code). But at this point the question becomes: is it worth the effort?

Wouldn't it be better to start a work on the cleanup and extension of the man macro package for groff to include (and use!) more semantic markup, with built-in colorization support?

Bonus track: italics

If your terminal emulator truly supports italics (honestly, a lot of modern terminals do, except possibly the non-graphical consoles), you can configure grotty to output instructions for italics instead of the usual behavior of replacing italics with underline. This is achieved by passing the -i option to grotty. Since grotty is rarely (if ever) called directly, one would usually pass -P-i option to groff.

This can be achieved in man by editing your ~/.manpath file and adding the following two lines:

DEFINE  troff   groff -mandoc -P-i
DEFINE  nroff   groff -mandoc -P-i

And voilà, italicized rather than underlined italics.

R.I.P. Opera

Opera is dead. I decided to give it some time, see how things developed (my first article on the topic was from over two years ago, and the more recent one about the switch to Blink was from February last year), but it's quite obvious that the Opera browser some of us knew and loved is dead for good.

For me, the finishing blow was a comment from an Opera employee, in response to my complaints about the regressions in standard support when Opera switched from Presto to Blink as rendering engine:

You may have lost a handful of things, but on the other hand you have gained a lot of other things that are not in Presto. The things you have gained are likely more useful in more situations as well.

This was shotly after followed by:

Whether you want to use Opera or not is entirely up to you. I merely pointed out that for the lost standards support, you are gaining a lot of other things (and those things are likely to be more useful in most cases).

Other things? What other things?

But it gets even better. I'm obviously not the only one complainig about the direction the new Opera has taken. One Patata Johnson comments:

There used to be a time when Opera fought for open standards and against Microsoft's monopol with it's IE. Am I the only one who us concerned about their new path? Today Google / Chrome became the new IE, using own standards and not carrying about open web standards that much.

The reply?

Opera is in a much better position to promote open standards with Blink than with Presto. It's kind of hard to influence the world when the engine is basically being ignored.

Really? How does being another skin over Blink help promote open standards? It helps promote Blink experimental features, lack of standard compliance, and buggy implementation of the standards it does support. That does as much to promote open standards as the Trident skins did during the 90s browser wars.

As small as Opera's market share was before the switch, its rendering engine was independent and precisely because of that it could be used to push the others into actually fixing their bugs and supporting the standards. It might have been ignored by the run-of-the-mill web developers, but it was actually useful in promoting standard compliance by being a benchmark against which other rendering engines were compared. The first thing that gets asked when someone reports a rendering issue is: how does it behave in the other rendering engines? If there are no other rendering engines, bugs in the dominant one become the de facto standard, against the open standard of the specification.

With the switch to Blink, Opera has even lost that role. As minor a voice as it might have been, it has now gone completely silent.

And let's be serious: the rendering engine it uses might not be ignored now (it's not their own, anyway), but I doubt that Opera has actually gained anything in terms of user base, and thus weight. If anything, I'm seeing quite a few former supporters switching away. Honestly, I suspect Opera's survival is much more in danger now than it was before the switch.

The truth is, the new Opera stands for nothing that the old Opera stood for: the old Opera stood for open standards, compliance, and a feature-rich, highly-customizable Internet suite. The new one is anything but that.

At the very least, for people that miss the other qualities that made Opera worthwile (among which the complete, highly customizable user interface, and the quite complete Internet suite capabilities, including mail, news, RSS, IRC and BitTorrent suport) there's now the open-source Otter browser coming along. It's still WebKit-based), so it won't really help break the development of a web monoculture, but it will at least offer a more reliable fallback to those loving the old Opera and looking for an alterntive to switch to from the new one.

For my part, I will keep using the latest available Presto version of Opera for as long as possible. In the mean time, Firefox has shown to have most complete support for current open standards, so it's likely to become my next browser of choice. I will miss Opera's UI, but maybe Otter will also support Gecko as rendering engine, and I might be able to get the best of both world.

We'll see.

Amethyst: essential statistics in Ruby

Get the code for
Amethyst, a Ruby library/script for essential statistics:


Amethyst is a small Ruby library/script to extract some essential statistics (mean, median, mode, midpoint and range, quartiles) from series of (numerical) data.

While it can be used as a library from other Ruby programs, its possibly most interesting use is as a command line filter: it can read the data series to be analyzed from its standard input (one datum per line), and it produces the relevant statistics on its standard output. Typical usage would be something like:

$ produce_some_data | amethyst

For example, statistics on the number of lines of the source files for one of my ongoing creative works at the time of writing:

$ wc -l oppure/ganka/ganka[0-9]* | head -n -1 | amethyst
# count: 43
# min: 48
# max: 274
# mid: 161
# range: 226

# mean: 102.3953488372093
# stddev: 42.266122304343874

# mode(s): 48 51 59 75 79 86 93 102 104

# median: 97
# quartiles: 79 97 110
# IQR: 31

When acting as a filter, Amethyst will check if its standard output has been redirected/piped to another program, in which case, by default, it will also produce commands that can be fed to gnuplot to produce a visual representation of the distribution of the dataset, including a histogram and a box plot:

$ produce_some_data | amethyst | gnuplot -p

Command line options such as --[no-]histogram and --[no-]boxplot can be used to override the default choices on what to plot (if anything), and options such as --dumb can be used to let gnuplot output a textual approximation of the plot(s) on the terminal itself.

Integer math and computers

One would assume that doing integer math with computers would be easy. After all, integer math is, in some sense, the “simplest” form of math: as Kronecker said:

Die ganzen Zahlen hat der liebe Gott gemacht, alles andere is Menschenwerk

The dear God has made integers, everything else is the work of man

While in practice this is (almost) always the case, introducing the extent (and particularly the limitations) to which integer math is easy (or, in fact, ‘doable’) is the first necessary step to understanding, later in this series of articles, some of the limitations of fixed-point math.

We start from the basics: since we are assuming a binary computer, we know that n bits can represent 2^n distinct values. So an 8-bit byte can represent 256 distinct values, a 16-bit word can represent 65536 distinct values, a 32-bit word can represent 4,294,967,296, and a 64-bit word a whooping 18,446,744,073,709,551,616, over 18 (short) trillion. Of course the question now is: which ones?

Representation of unsigned integers

Let's consider a standard 8-bit byte. The most obvious and natural interpretation of a byte (i.e. 8 consecutive bits) is to interpret it as a (non-negative, or unsigned) integer, just like we would interpret a sequence of consecutive (decimal) digits. So binary 00000000 would be 0, binary 00000001 would be (decimal) 1, binary 00000010 would be (decimal) 2 binary 00000011 would be (decimal) 3 and so on, up to (binary) 11111111 which would be (decimal) 255. From 0 to 255 inclusive, that's exactly the 256 values that can be represented by a byte (read as an unsigned integer).

Unsigned integers can be trivially promoted to wider words (e.g. from 8-bit byte to 16-bit word, preserving the numerical value) by padding with zeroes.

This is so simple that it's practically boring. Why are we even going through this? Because things are not that simple once you move beyond unsigned integers. But before we do that, I would like to point out that things aren't that simple even if we're just sticking to non-negative integers. In terms of representation of the numbers, we're pretty cozy: n bits can represent all non-negative integers from 0 to 2^n-1, but what happens when you start doing actual math on them?

Modulo and saturation

Let's stick to just addition and multiplication at first, which are the simplest and best defined operations on integers. Of course, the trouble is that if you are adding or multiplying two numbers between 0 and 255, the result might be bigger than 255. For example, you might need to do 100 + 200, or 128*2, or even just 255+1, and the result is not representable in an 8-bit byte. In general, if you are operating on n-bits numbers, the result might not be representable in n bits.

So what does the computer do when this kind of overflow happens? Most programmers will now chime in and say: well duh, it wraps! If you're doing 255+1, you will just get 0 as a result. If you're doing 128*2, you'll just get 0. If you're doing 100+200 you'll just get 44.

While this answer is not wrong, it's not right either.

Yes, it's true that the most common central processing units we're used to nowadays use modular arithmetic, so that operations that would overflow n-bits words are simply computed modulo 2^n (which is easy to implement, since it just means discarding higher bits, optionally using some specific flag to denote that a carry got lost along the way).

However, this is not the only possibility. For example, specialized DSP (Digital Signal Processing) hardware normally operates with saturation arithmetic: overflowing values are clamped to the maximum representable value. 255+1 gives 255. 128*2 gives 255. 100+200 gives 255.

Programmers used to the standard modular arithmetic can find saturation arithmetic ‘odd’ or ‘irrational’ or ‘misbehaving’. In particular, in saturation arithmetic (algebraic) addition is not associative, and multiplication does not distribute over (algebraic) addition.

Sticking to our 8-bit case, for example, with saturation arithmetic (100 + 200) - 100 results in 255 - 100 = 155, while 100 + (200 - 100) results in 100 + 100 = 200, which is the correct result. Similarly, still with saturation arithmetic, (200*2) - (100*2) results in 255 - 200 = 55, while (200 - 100)*2 results in 100*2 = 200. By contrast, with modular arithmetic, both expressions in each case give the correct result.

So, when the final result is representable, modular arithmetic gives the correct result in the case of a static sequence of operations. However, when the final result is not representable, saturation arithmetic returns values that are closer to the correct one than modular arithmetic: 300 is clamped to 255, in contrast to the severely underestimated 44.

Being as close as possible to the correct results is an extremely important property not just for the final result, but also for intermediate results, particularly in the cases where the sequence of operations is not static, but depends on the magnitude of the values (for example, software implementations of low- or high-pass filters).

In these applications (of which DSP, be it audio, video or image processing, is probably the most important one) both modular and saturation arithmetic might give the wrong result, but the modular result will usually be significantly worse than that obtained by saturation. For example, modular arithmetic might miscompute a frequency of 300Hz as 44Hz instead of 255Hz, and with a threshold of 100Hz this would lead to attenuation of a signal that should have passed unchanged, or conversely. Amplifying an audio signal beyond the representable values could result in silence with modular arithmetic, but it will just produce the loudest possible sound with saturation.

We mentioned that promotion of unsigned values to wider data types is trivial. What about demotion? For example, knowing that your original values are stored as 8-bit bytes and that the final result has to be again stored as an 8-bit byte, a programmer might consider operating with 16-bit (or wider) words to (try and) prevent overflow during computations. However, when the final result has to be demoted again to an 8-bit byte, a choice has to be made, again: should we just discard the higher bits (which is what modular arithmetic does), or return the highest representable value when any higher bits are set (which is what saturation arithmetic does)? Again, this is a choice for which there is no “correct” answer, but only answers that depend on the application.

To conclude, the behavior that programmers used to standard modular arithmetic might find ‘wrong’ is actually preferable in some applications (which is why it is has been supported in hardware in the multimedia and vector extensions (MMX and onwards) of the x86 architecture).

Thou shalt not overflow

Of course, the real problem in the examples presented in the previous section is that the data type used (e.g. 8-bit unsigned integers) was unable to represent intermediate or final results.

One of the most important things programmers should consider, maybe the most important, when discussing doing math on the computer, is precisely choosing the correct data type.

For integers, this means choosing a data type that can represent correctly not only the starting values and the final results, but also the intermediate values. If your data fits in 8 bits, then you want to use at least 16 bits. If it fits in 16 bits (but not 8), then you want to use at least 32, and so on.

Having a good understanding of the possible behaviors in case of overflow is extremely important to write robust code, but the main point is that you should not overflow.

Relative numbers: welcome to hell

In case you are still of the opinion that integer math is easy, don't worry. We still haven't gotten into the best part, which is how to deal with relative numbers, or, as the layman would call them, signed integers.

As we mentioned above, the ‘natural’ interpretation of n bits is to read them as natural, non-negative, unsigned integers, ranging from 0 to 2^n-1. However, let's be honest here, non-negative integers are pretty limiting. We would at least like to have the possibility to also specify negative numbers. And here the fun starts.

Although there is no official universal standard for the representation of relative numbers (signed integers) on computers, there is undoubtedly a dominating convention, which is the one programmers are nowadays used to: two's complement. However, this is just one of many (no less than four) possible representations:

  • sign bit and mantissa;
  • ones' complement;
  • two's complement;
  • offset binary aka biased representation.

Symmetry, zeroes and self-negatives

One of the issues with the representation of signed integers in binary computers is that binary words can always represent an even number of values, but a symmetrical amount of positive and negative integers, plus the value 0, is odd. Hence, when choosing the representation, one has to choose between either:

  • having one (usually negative) non-zero number with no representable opposite, or
  • having two representations of the value zero (essentially, positive and negative zero).

Of the four signed number representations enumerated above, the sign bit and ones' complement representations have a signed zero, but each non-zero number has a representable opposite, while two's complement and bias only have one value for zero, but have at least one non-zero number that has no representable opposite. (Offset binary is actually very generic and can have significant asymmetries in the ranges of representable numbers.)

Having a negative zero

The biggest issue with having a negative zero is that it violates a commonly held assumption, which is that there is a bijective correspondence between representable numerical values and their representation, since both positive and negative 0 have the same numerical value (0) but have distinct bit patterns.

Where this presents the biggest issue is in the comparison of two words. When comparing words for equality, we are now posed a conundrum: should they be compared by their value, or should they be compared by their representation? If a = -0, would a satisfy a == 0? Would it satisfy a < 0? Would it satisfy both? The obvious answer would be that +0 and -0 should compare equal (and just that), but how do you tell them apart then? Is it even worth it being able to tell them apart?

And finally, is the symmetry worth the lost of a representable value? (2^n bit patterns, but two of them have the same value, so e.g. with 8-bit bytes we have 256 patterns to represent 255 values instead of the usual 256.)

Having non-symmetric opposites

On the other hand, if we want to keep the bijectivity between value and representation, we will lose the symmetry of negation. This means, in particular, that knowing that a number a satisfies a < 0 we cannot deduce that -a > 0, or conversely, depending on whether the value with no opposite is positive or negative.

Consider for example the case of the standard two's complement representation in the case of 8-bit bytes: the largest representable positive value is 127, while the largest (in magnitude) representable negative value is -128. When computing opposites, all values between -127 and 127 have their opposite (which is the one we would expect algebraically), but negating -128 gives (again) -128 which, while algebraically wrong, is at least consistent with modular arithmetic, where adding -128 and -128 actually gives 0.

A brief exposition of the representations

Let's now see the representations in some more detail.

Sign bit and mantissa representation

The conceptually simplest approach to represent signed integers, given a fixed number of digits, is to reserve one bit to indicate the sign, and leave the other n-1 bits to indicate the mantissa i.e magnitude i.e. absolute value of the number. By convention, the sign bit is usually taken to be the most significant bit, and (again by convention) it is taken as 0 to indicate a positive number and 1 to indicate a negative number.

With this representations, two opposite values have the same representation except for the most significant bit. So, for example, assuming our usual 8-bit byte, 1 would be represented as 00000001, while -1 would be represented as 10000001.

In this representation, the highest positive value that can be represented with n bits is 2^{n-1} - 1, and the lowest (largest in magnitude) negative value that can be represented is its opposite. For example, with an 8-bit byte the largest positive integer is 127, i.e. 01111111, and the largest (in magnitude) negative integer is its opposite -127, i.e. 11111111.

As mentioned, one of the undersides of this representation is that it has both positive and negative zero, respectively represented by the 00000000 and 10000000 bit patterns.

While the sign bit and mantissa representation is conceptually obvious, its hardware implementation is more cumbersome that it might seem at first hand, since operations need to explicitly take the operands' signs into account. Similarly, sign-extension (for example, promoting an 8-bit byte to a 16-bit word preserving the numerical value) needs to ‘clear up’ the sign bit in the smaller-size representation before replicating it as the sign bit of the larger-size representation.

Ones' complement representation

A more efficient approach is offered by ones' complement representation, where negation maps to ones' complement, i.e. bit-flipping: the opposite of any given number is obtained as the bitwise NOT operation of the representation of the original value. For example, with 8-bit bytes, the value 1 is as usual represented as 00000001, while -1 is represented as 11111110.

The range of representable numbers is the same as in the sign bit and mantissa representation, so that, for example, 8-bit bytes range from -127 (10000000) to 127 (01111111), and we have both positive zero (00000000) and negative zero (11111111).

(Algebraic) addition in modular arithmetic with this representation is trivial to implement in hardware, with the only caveat that carries and borrows ‘wrap around’.

As in the sign-bit case, it is possible to tell if a number is positive or negative by looking at the most-significant bit, and 0 indicates a positive number, while 1 indicates a negative number (whose absolute value can then be obtained by flipping all the bits). Sign-extending a value can be done by simply propagating the sign bit of the smaller-size representation to all the additional bits in the larger-size representation.

Two's complement

While ones' complement representation is practical and relatively easy to implement in hardware, it is not the simplest, and it's afflicted by the infamous ‘negative zero’ issue.

Because of this, two's complement representation, which is simpler to implement and has no negative zero, has gained much wider adoption. It also has the benefit of ‘integrating’ rather well with the equally common modular arithmetic.

In two's complement representation, the opposite of an n-bit value is obtained by subtracting it from 2^n, or, equivalently, from flipping the bits and then adding 1, discarding any carries beyond the n-th bit. Using our usual 8-bit bytes as example, 1 will as usual be 00000001, while -1 will be 11111111.

The largest positive representable number with n bits is still 2^{n-1}-1, but the largest (in magnitude) negative representable number is now -2^{n-1}, and it's represented by a high-bit set to 1 and all other bits set to 0. For example, with 8-bit bytes the largest positive number is 127, represented by 01111111, whose opposite -127 is represented by 10000001, while the largest (in magnitude) negative number is -128, represented by 10000000.

In two's complement representation, there is no negative zero and the only representation for 0 is given by all bits set to 0. However, as discussed earlier, this leads to a negative value whose opposite is the value itself, since the representation of largest (in magnitude) negative representable number is invariant by two's complement.

As in the other two representations, the most significant bit can be checked to see if a number is positive and negative. As in ones' complement case, sign-extension is done trivially by propagating the sign bit of the smaller-size value to all other bits of the larger-size value.

Offset binary

Offset binary (or biased representation) is quite different from the other representations, but it has some very useful properties that have led to its adoption in a number of schemes (most notably the IEEE-754 standard for floating-point representation, where it's used to encode the exponent, and some DSP systems).

Before getting into the technical details of offset binary, we look at a possible motivation for its inception. The attentive reader will have noticed that all the previously mentioned representations of signed integers have one interesting property in common: they violate the natural ordering of the representations.

Since the most significant bit is taken as the sign bit, and negative numbers have a most significant bit set to one, natural ordering (by bit patterns) puts them after the positive numbers, whose most significant bit is set to 0. Additionally, in the sign bit and mantissa representation, the ordering of negative numbers is reversed with respect to the natural ordering of their representation. This means that when comparing numbers it is important to know if they are signed or unsigned (and if signed, which representation) to get the ordering right. The biased representation is one way (and probably the most straightforward way) to circumvent this.

The basic idea in biased representation or offset binary is to ‘shift’ the numerical value of all representations by a given amount (the bias or offset), so that the smallest natural representation (all bits 0) actually evaluates to the smallest representable number, and the largest natural representation (all bits 1) evaluates to the largest representable number.

The bias is the value that is added to the (representable) value to obtain the representation, and subtracted from the representation to obtain the represented value. The minimum representable number is then the opposite of the bias. Of course, the range of representable numbers doesn't change: if your data type can only represent 256 values, you can only choose which 256 values, as long as they are consecutive integers.

The bias in an offset binary representation can be chosen arbitrarily, but there is a ‘natural’ choice for n-bit words, which is 2^{n-1}: halfway through the natural representation. For example, with 8-bit bytes (256 values) the natural choice for the bias is 128, leading to a representable range of integers from -128 to 127, which looks distinctly similar to the one that can be expressed in two's complement representation.

In fact, the 2^{n-1} bias leads to a representation which is equivalent to the two's complement representation, except for a flipped sign bit, solving the famous signed versus unsigned comparison issue mentioned at the beginning of this subsection.

As an example, consider the usual 8-bit bytes with a bias of 128: then, the numerical values 1, 0 and -1 would be represented by the ‘natural’ representation of the values 129, 128 and 127 respectively, i.e. 10000001, 10000000 and 01111111: flipping the most significant bits, we get 00000001, 00000000 and 11111111 which are the two's complement representation of 1, 0 and -1.

Of course, the ‘natural’ bias is not the only option: it is possible to have arbitrary offsets, which makes offset binary extremely useful in applications where the range of possible values is strongly asymmetrical around zero, or where it is far from zero. Of course, such arbitrary biases are rarely supported in hardware, so operation on offset binary usually requires software implementations of even the most common operations, with a consequent performance hit. Still, assuming the hardware uses modular arithmetic, offset binary is at least trivial to implement for the basic operations.

One situation in which offset binary doesn't play particularly well is that of sign-extension, which was trivial in ones' and two's complement represnetations. The biggest issue in the case of offset binary is, obviously, that the offsets in the smaller and larger data types are likely going to be different, although usually not arbitrarily different (biases are often related to the size of the data type).

At least in the case of the ‘natural’ bias (in both the smaller and larger data types), sign extension can be implemented straightforwardly by going through the two's complement equivalent representation: flip the most significant bit of the smaller data type, propagate it to all the remaining bits of the larger data type, and then flip the most significant bit of the larger data type. (In other words: convert to two's complement, sign extend that, convert back to offset binary with the ‘natural’ bias.)

What does a bit pattern mean?

We're now nearing the end of our discussion on integer math on the computers. Before getting into the messy details of the first common non-integer operation (division), I would like to ask the following question: what do you get if you do 10100101 + 01111111?

Divide and despair

To conclude our exposition of the joys of integer math on the computers, we now discuss the beauty of integer division and the related modulus operation.

Since division of the integer e by the integer o only gives an integer (mathematically) if e is a multiple of o, the concept of ‘integer division’ has arised in computer science as a way to obtain an integer d from e/o even when o does not divide e.

The simple case

Let's start by assuming that e is non-negative and o is (strictly) positive. In this case, integer division gives the largest integer d such that d*o ≤ e. In other words, the result of the division of e by o is truncated, or ‘approximated by defect’, however small the remainder might be: 3/5=0 and 5/3=1 with integer division, even though in the latter case we would likely have preferred a value of 2 (think of 2047/1024, for example).

The upside of this choice is that it's trivial to implement other forms of division (that round up, or to the nearest number, for example), by simply adding appropriate correcting factors to the dividend. For example, round-up division is achieved by adding the divisor diminished by a unit to the divident: integer divisoin (e + o - 1)/o will give you e/o, rounded up: (3+5-1)/5 = 7/5 = 1, and (5 + 3 - 1)/3 = 7/3 = 2.

Division by zero

What happens when o is zero? Mathematically, division by zero is not defined (although in some context where infinity is considered a valid value, it may give infinity as a result —as long as the dividend is non-zero). In hardware, anything can happen.

There's hardware that flags the error. There's hardware that produces bogus results without any chance of knowing that a division by zero happened. There's hardware that produces consistent results (always zero, or the maximum representable value), flagging or not flagging the situation.

‘Luckily’, most programming languages always treat a division by zero as an exception, which by default causes a program termination. Of course, this means that to write robust code it's necessary to sprinkle the code with conditionals to check that divisions will successfully complete.

Negative numbers

If the undefined division by zero may not be considered a big issue per se, the situation is much more interesting when either of the operands of the division is a negative number.

First of all, one would be led to think that at least the sign of the result would be well defined: negative if the operands have opposite sign, positive otherwise. But this is not the case for the widespread two's complement representation with modular arithmetic, where the division of two negative numbers can give a negative number: of course, we're talking about the corner case of the largest (in magnitude) negative number, which when divided by -1 returns itself, since its opposite is not representable.

But even when the sign is correct, the result of integer division is not uniquely determined: some implementations round down, so that -7/5 = -2, while others round towards zero, so that -7/5 = -1: both the choices are consistent with the positive integer division, but the results are obviously different, which can introduce subtle but annoying bugs when porting code across different languages or hardware.


The modulo operation is perfectly well defined for positive integers, as the reminder of (integer) division: the quotient d and the reminder r of (integer) division e/o are (non-negative) integers such that e = o*d + r and r < o.

Does the same hold true when either e or o are negative? It depends on the convention adopted by the language and/or hardware. While for negative integer division there are ‘only’ two standards, for the modulo operation there are three:

  • a result with the sign of the dividend;
  • a result with the sign of the divisor;
  • a result that is always non-negative.

In the first two cases, what it means is that, for example, -3 % 5 will have the opposite sign of 3 % -5; hence, if one would satisfy the quotient/reminder equation (which depends on whether integer division rounds down or towards zero), the other obviously won't. In the third case, the equation would only be satisfied if the division rounds down, but not if the division rounds towards zero.

This could lead someone to think that the best choice would be a rounding-down division with an always non-negative modulo. Too bad that rounding-down division suffers from the problem that -(e/o) ≠ (-e)/o.


Integer math on a computer is simple only as far as you never think about dealing with corner cases, which you should if you want to write robust, reliable code. With integer math, this is the minimum of what you should be aware of:

Rounding modes in OpenCL

Introduction (history lost)

OpenCL 1.0 supported an OPENCL SELECT_ROUNDING_MODE pragma in device code, which allowed selection of the rounding mode to be used in a section of a kernel. The pragma was only available after enabling the cl_khr_select_fprounding_mode extension. Support for this extension and the relative pragma(s) has been removed from subsequent version of the standard, with the result that there is no way at all in the current OpenCL standard to have specific parts of a kernel use rounding modes different from the default, except in the explicit type conversion functions with the relevant _rt* suffix.

A consequence of this is that it is currently completely impossible to implement robust numerical code in OpenCL.

In what follows I will explore some typical use cases where directed rounding is a powerful, sometimes essential tool for numerical analysis and scientific computing. This will be followed by a short survey of existing hardware and software support for directed rounding. The article ends with a discussion about what must, and what should, be included in OpenCL to ensure it can be used as a robust scientific programming language.

Why directed rounding is important

Rationale #1: assessing numerical trustworthiness of code

In his paper How Futile are Mindless Assessments of Roundoff in Floating-Point Computation, professor William Kahan (who helped design the IEEE-754 floating-point standard) explains that, given multiple formulas that would compute the same quantity, the fastest way to determine which formulas are numerically trustworthy is to:

Rerun each formula separately on its same input but with different directed roundings; the first one to exhibit hypersensitivity to roundoff is the first to suspect.

Further along in the same paper, Kahan adds (emphasis mine):

The goal of error-analysis is not to find errors but to fix them. They have to be found first. The embarrassing longevity, over three decades, of inaccurate and/ or ugly programs to compute a function so widely used as ∠(X, Y) says something bleak about the difficulty of floating-point error-analysis for experts and nonexperts: Without adequate aids like redirected roundings, diagnosis and cure are becoming practically impossible. Our failure to find errors long suspected or known to exist is too demoralizing. We may just give up.

Essential tools for the error-analysis of scientific computing code cannot be implemented in OpenCL 1.1 or later (at least up to 2.0, the latest published specification) due to the impossibility of specifying the rounding direction.

Rationale #2: enforcing numerical correctness

Directed rounding is an important tool to ensure that arguments to functions with limited domain are computed in such a way that the conditions are respected numerically when they would be analytically. To clarify, in this section I'm talking about correctly rounding the argument of a function, not its result.

When the argument to such a function is computed through an expression (particularly if such an expression is ill-conditioned) whose result is close to one of the limits of the domain, the lack of correct rounding can cause the argument to be evaluated just outside of the domain instead of just inside (which would be the analytically correct answer). This would cause the result of the function to be Not-a-Number instead of the correct(ly rounded) answer.

Common functions for which the requirements might fail to be satisfied numerically include:


when the argument would be a small, non-negative number; to write numerically robust code one would want the argument to sqrt be computed such that the final result is towards plus infinity;

inverse trigonometric functions (asin, acos, etc)

when the argument would be close to, but not greater than 1, or close to, but not less than -1; again, to write numerically robust code one would want the argument to be computed such that the final result is rounded towards zero.

A discussion on the importance of correct rounding can again be found in Kahan's works, see e.g. Why we needed a floating-point standard.

Robust coding of analytically correct formulas is impossible to achieve in OpenCL 1.1 or later (at least up to 2.0, the latest published specification) due to the lack of support for directed rounding.

Rationale #3: Interval Analysis

A typical example of a numerical method for which support for directed rounding rounding modes in different parts of the computation is needed is Interval Analysis (IA). Similar arguments hold for other forms of self-verified computing as well.

Briefly, in IA every (scalar) quantity q is represented by an interval whose extrema are (representable) real numbers l, u such that ‘the true value’ of q is guaranteed to satisfy l ≤ q ≤ u.

Operations on two intervals A = [al, au] and B = [bl, bu] must be conducted in such a way that the resulting interval can preserve this guarantee, and this in turn means that the lower extremum must be computed in rtn (round towards negative infinity) mode, while the upper extremum must be computed in rtp (round towards positive infinity) mode.

For example, assuming add_rtn and add_rtp represent additions that rounds in the suffix direction, we have that C = A + B could be computed as:

cl = add_rtn(al, bl);
cu = add_rtp(au, bu);

In OpenCL 1.0, add_rtn and add_rtp could be defined as:

gentype add_rtn(gentype a, gentype b) {
 return a + b;
gentype add_rtp(gentype a, gentype b) {
 return a + b;
/* restore default */

The same functions could be implemented in C99, in FORTRAN, in MATLAB or even in CUDA (see below). In OpenCL 1.1 and later, this is impossible to achieve, even on hardware that supports rounding mode selection.

Applicative examples

From the rationales presented so far, one could deduce that directed rounding is essentially associated with the stability and robustness of numerical code. There are however other cases where directed rounding can be used, which are not explicitly associated with things such as roundoff errors and error bound estimation.

Rounding down for the neighbors list construction in particle methods

Consider for example an industrial application of mesh-less Lagrangian such as Smoothed Particle Hydrodynamics (SPH).

In these numerical methods, the simulation domain is described by means of ‘particles’ free to move with respect to each other. The motion of these particles is typically determined by the interaction between the particle and its neighbors within a given influence sphere.

Checking for proximity between two particles is done by computing the length of the relative distance vector (differences of positions), and the same distance is often used in the actual computation of the influence between particles. As usual, to avoid bias, both the relative distance vector and its length should be computed with the default round-to-nearest-even rounding mode for normal operations.

To avoid searching for neighbors in the whole domain for every operation, implementations often keep a ‘neighbors list’ of each particle, constructed by checking the proximity of candidate particles once, and storing the indices of the particles that fall within the prescribed influence radius.

Due to the mesh-less nature of the method, neighborhoods may change at every time-step, requiring a rebuild of the neighbors list. To improve performance, this can be avoided by rebuilding the neighbors list at a lower frequency (e.g. every 10 time-steps), assuming (only in this phase) a larger influence radius, taking into account the maximum length that might be traveled by a particle in the given number of time-steps.

When such a strategy is adopted, neighbors need to be re-checked for actual proximity during normal operations, so that, for maximum efficiency, a delicate balance must be found between the reduced frequency and the increased number of potential neighbors caused by the enlarged influence radius.

One way to improve efficiency in this sense is to round towards zero the computation of the relative distance vector and its length during neighbors list construction: this maximizes the impact of the enlarged influence radius by including potential neighbors which are within one or two ULPs. This allows the use of very tight bounds on how much to enlarge the influence radius, without loss of correctness in the simulations.

Directed rounding support

Hardware support for directed rounding

x86-compatible CPUs have had support for setting the rounding mode by setting the appropriate flags in the control registers (either the x87 control word for FPU, or the MXCSR control register for SSE). Similarly, on ARM CPUs with support for the NEON or VFP instruction set, the rounding mode can be set with appropriate flags in the FPCSR

AMD GPUs also have support for rounding modes selection, with the granularity of an ALU clause. As documented in the corresponding reference manuals, the TeraScale 2 and TeraScale 3 architectures support setting the general rounding mode for ALU clauses via the SET_MODE instruction; Graphics Core Next (GCN) architectures can control the rounding mode by setting the appropriate bits in the MODE register via the S_SETREG instruction.

Additionally, the following hardware is capable of directed rounding at the instruction level:


as documented in the CUDA C Programming Guide, Appendix D.2 (Intrinsic Functions), some intrinsic functions can be suffixed with one of _rn,_rz,_ru,_rd to explicitly set the rounding mode of the function;

CPUs with support for the AVX-512 instruction set

the EVEX prefix introduced with AVX-512 supports the rounding mode to be set explicitly for any given instruction, overriding the MXCSR control register, as documented in the Intel® Architecture Instruction Set Extensions Programming Reference, section 4.6.2: “Static Rounding Support in EVEX”.

Software support for directed rounding

At the software level, support for the rounding mode at the processor level can be accessed in C99 and C++11 by enabling the STDC FENV_ACCESS pragma and using fesetenv() (and its counterpart fegetenv()).

In MATLAB, the rounding mode can be selected by the system_dependent('setround', ·) command.

Some FORTRAN implementations also offer functions to get and set the current rounding mode (e.g. IBM's XL FORTRAN offers fpgets and fpsets).

CUDA C exposes the intrinsic functions of CUDA-enabled GPUs that support explicit rounding modes. So, for example, __add_ru(a, b) (resp. __add_rd(a, b)) can be used in CUDA C to obtain the sum of a and b rounded up (resp. down) without having to change the rounding mode of the whole GPU.

Even the GNU implementation of the text-processing language Awk has a method to set the rounding mode in floating-point operations, via the ROUNDMODE variable.

All in all, OpenCL (since 1.1 on) seems to be the only language/API to not support directed rounding.

What can be done for OpenCL

In its present state, OpenCL 1.1 to 2.0 are lagging behind C99, C++11, FORTRAN, MATLAB and CUDA (at the very least) by lacking support for directed rounding. This effectively prevents robust numerical code to be implemented and analyzed in OpenCL.

While I can understand that core support for directed rounding in OpenCL is a bit of a stretch, considering the wide range of hardware that support the specification, I believe that the standard should provide an official extension to (re)introduce support for it. This could be done by re-instating the cl_khr_select_fprounding_mode extension, or through a different extension with better semantics (for example, modelled around the C99/C++11 STDC FENV_ACCESS pragma).

This is the minimum requirement to bring OpenCL C on par with C and C++ as a language for scientific computing.

Ideally (potentially through a different extension), it would be nice to also have explicit support for instruction-level rounding mode selection independently from the current rounding mode, with intrinsics similar to the ones that OpenCL defines already for the conversion functions. On supporting hardware, this would make it possible to implement even more efficient, yet still robust numerical code needing different rounding modes for separate subexpression.


When it comes to the OpenCL programming model, it's important to specify the scope of application of state changes, of which the rounding mode is one. Given the use cases discussed above, we could say that the minimum requirement would be for OpenCL to support changing the rounding mode during kernel execution, and for the whole launch grid to a value known at (kernel) compile time.

So, it should be possible (when the appropriate extension is supported and enabled) to change rounding mode half-way through a kernel. The new:

kernel some_kern(...) {
    /* kernels start in some default rounding mode,
     * e.g. round-to-nearest-even. We do some calculations
     * in this default mode:
    do_something ;

    /* now we change to some other mode. of course this
     * is just pseudo-syntax:

    /* and now we do more calculations, this time
     * with the new rounding mode enabled:

The minimum supported granularity would thus be the whole launch grid, as long as the rounding mode can be changed dynamically during kernel execution, to (any) value known at compile time.

Of course, a finer granularity and a more relaxed (i.e. runtime) selection of the rounding mode would be interesting additional features. These may be made optional, and the hardware capability in this regard could be queried through appropriate device properties.

For example, considering the standard execution model for OpenCL, with work-groups mapped to compute units, it might make sense to support a granularity at the work-group level. This would be a nice addition, since it would allow e.g. to concurrently run the same code with different rounding modes (one per work-group), which would benefit applications geared towards the analysis of the stability of numerical code (as discussed in Rationale #1). But it's not strictly necessary.

Our lives are short

Some time ago someone on FriendFeed asked if anybody (else) would celebrate their kid's 1000th day. Among the negative answers, someone remarked that they'd rather celebrate the 1024th. And as hardened computer geek, that was my first thought as well.

But why stop there? Or rather: if your baby is young, why start there?

So we started collecting the other power-of-two dates (power-of-two-versaries1?) for our baby, and after asking about a few from WolframAlpha, I set up to write a Ruby script to do the computations for me, and the first question that arose was: what's the highest (integer) power of two that should be considered?

Since the script wasn't ready yet, I asked WolframAlpha again, to quickly discover that the best we can do comes significantly short of 216 days, which is almost 180 years (179 years, 5 months, and a few days, with the actual number of days depending on how many leap years are covered by the timespan)2.

Now, as it happens, 16 bits is the (minimum) width of the short data type in the C family of programming languages; in fact, on most platforms and systems, 16 bits its exactly the width of the short data type.

As it turns out, our lives are indeed (unsigned) short.

  1. if anybody has a better name for them, please tell. ↩

  2. most modern-day human adults can aspire at hitting three power-of-two-versaries at best: 213 (22 years, 5 months, and the usual bunch of days) as a young adult, 214 (44 years, 10 months and a week, day more day less) and finally 215 (89 years, 8 months and a half), for the lucky ones. ↩

Free versus Open: the Maps case


(or skip this and jump straight to the actual content)

‘Free’ is an interesting word in English, particularly when pertaining to software and services. Although the original meaning of the word was pertaining to the concept of freedom, it has gained extensive use as a short form for “free of charge”, i.e. gratis, without requiring any (direct, obvious) payment.

Now, while context can usually help clarify which meaning is intended in modern usage, there are situations in which one of them is intended but the other is understood. For example, the sentence “that slave is now free” can mean that it has attained freedom, or that it is being given away (to another slave owner.)

A context where the ambiguity cannot be resolved automatically is that of software and services; in fact, the duplicity of the meaning has been plaguing the free software movement since its inception, which is why now the FLOSS acronym is often used: Free/Libre Open Source Software. The Libre is there to clarify what that Free is supposed to mean.

Of course, free (as in freedom) software and services tend to also be free of charge (and thus gratis), which is what makes them appealing to the large public who is not particularly interested in the ideology behind the software freedom movement.

And specifically to highlight how important the difference between the two concepts is, I'm going to discuss two distinct approaches to making worldwide maps available to the public: one which is free (as in gratis), but essentially closed, and the other which is free (as in libre), and open.

An introduction

Recently, Google has started offering “tourism boards, non-profit, government agencies, universities, research organizations or other entities interested” the possibility to borrow Google's Street View Trekker to help extend Google Maps' Street View.

To clarify, Google is “offering” these subjects the opportunity to expend the subjects' human resources to expand Google's own database. A company whose business is data gathering is giving other entities the “opportunity” to contribute to their data gathering efforts —for free1.

In simpler words, this is a bank asking people to donate money to them, but spinning it as an opportunity.

Google Maps

An objection that I'm sure will be raised is that this is not really like a bank asking for monetary donations, because banks' services are not free (of charge). For example, they loan money at a cost (the interest rate). By contrast, Google's services (and particularly Google Maps and its Street View feature) are free (of charge).

The objection is deeply flawed by the misconception about what Google services are, misconception driven by the inability to realize the difference between Google's consumers and Google's customers.

Google's consumer services (most publicly known Google services: Search, Mail, Maps, Picasa, now Plus) are as much a service as the cheese in a mouse trap is a meal. Their purpose is not to provide the service to the consumer, but to gather data about him or her.

Google is in the business of data gathering. Gathering data about people, gathering data about things people might be interested in. Selling this data to its customers (directly or indirectly) is what Google makes money off: the main income stream is advertisement, targeted advertisement that relies on your usage of Google's consumer services to see what you might be interested in. (And I'm not even getting into the NSA debacle because that would really steer us off topic.)

The key point is understanding who owns and controls the data, which in the case of Google Maps and Street View is Google. While the data is (currently) being made available back to Google's consumers free of charge, in post-processed form, that data remains solidly in the hands of Google, that may choose to use it (or not) as they see fit.

To the potential question “why would Google ever not make the data accessible?” (through its consumer services), the correct answer is, sadly, why not.

Google is in fact (in)famous for discontinuing services now and then, the most recent one being its Reader feed aggregator, the upcoming one being the personalized iGoogle page. But there is in fact one service that was discontinued so silently most people even failed to notice it got discontinued: wireless geolocation.

Google and wireless geolocation

Wireless geolocation is the computation of the location of a wireless device based on which wireless transmitters (of known location) it sees and the strength of the signal. This is typically used with cellphones, for example, based on which cell towers they see and the respective signal strength. It can be used with WiFi devices (laptops, smartphones, tablets, whatever) based on which wireless routers are visible —provided you know the position of the routers.

Now, as it happens, when Google was driving around in their Google Street View Cars snapping pictures for their Street View consumer service, they were also gathering information about wireless routers. The data thus gathered could be used by another Google consumer service, the geolocation API: you could ask Google “where am I if I see such and such routers with such and such signal strengths?” and Google would provide you with an approximate latitude and longitude.

(And let's skip the part where Google was collecting much more than the information needed for geolocation, something that got them in trouble in Germany, although it only led to a ridiculously low fine.)

The wireless geolocation service was available to the public, but that access has been discontinued since 2011, and access to a similar service is only available for business (essentially to allow Android to keep using it), with stricter controls on the its access. So Google still has the data, it still uses it, but the services based on it are not available to the general public anymore. What would you think if something like this happened to data you were “offered” to contribute?

User contributions

In fact, Google interest in external contributions is not limited to the recent offer to use a Trekker: Google Maps now has a number of options to allow users to contribute, ranging from changes and fixes to the map itself, to geotagged panoramic photos publicly shared on Picasa (which can be used for Street View).

I suspect that Google has learned from the experience of OpenStreetMap (which I will discuss later on) how powerful ‘crowdsourcing’ can be, while requiring much less resources on the company's side.

So you can contribute to make Google Maps better. The question is: should you? Or rather, would you? If you're willing to spend whatever small amount of time to contribute to global mapping, why would you do it for a company for which this is business?

Open Street Map

OpenStreetMap (Wikipedia article) was born in 2004 in the UK with the aim of providing a free (as in freedom) map of the world.

It's important to note right from the start the huge difference between how OSM is free versus how Google Maps is free: the latter provides a service that is available to consumer free of charge, the former provides mapping data which is not only available free of charge to anybody, but the use of which is also subject to very little restrictions (the actual license is the Open DataBase License, which, as explained here, essentially, allows anyone to access, modify and make derivatives of the data, provided proper attribution is given and derivatives are shared with the same liberal terms).

So there are two distinctive differences.

The first difference pertains what is made available. Google Maps provides a front-end to Google's data: the visualization (and related services), available mostly through their website and smartphone applications. By contrast, OpenStreetMap provides the actual mapping data, although a slip-map to access it in human-usable form is also provided on the website.

The second difference pertains the terms of use of what is made available. Although Google allows (in fact, encourages) embedding of the maps in other websites (free of charge within certain limits), the Terms of Service are otherwise pretty restrictive. By contrast, the license under which OSM data is made available is quite liberal, in that it only prevents misappropriation of the data, or the imposing of further restrictions on its use (I debate the paradox of restricting restrictions elsewhere in Italian).

OpenStreetMap, as any other collaborative effort for a knowledge base, is such that it benefits from anybody's contribution, but in perfect reciprocity anybody can benefit from it. This is in contrast to situations (such as that of Google Maps) where there is one main entity with dominant interests and control on the data (Google benefits from user contributions, and Google again benefits from consumers using its services, and it can arbitrarily limit their use by third parties).

There are commercial interests in OpenStreetMap. While some are essentially unidirectional (Apple, for example, used OSM data in its photo application for the iPhone —at first without attribution, thereby actually violating the license), others try to build a two-way relationship.

For example, at Flickr they use OSM data for (some of) their maps, and they also introduced OSM-related machine tags that can be used to associate photos to the places they were taken at. Yahoo (the company that owns Flickr) and Microsoft allow usage of their satellite and aerial photos for ‘armchair mapping’ (more on this later). MapQuest (formerly, the mapping website) has an Open alternative that relies on OSM, and they have contributed to the open-source software that drives OpenStreetMap (the renderer, the geocoding and search engine, the online editor), and they have funded the improvement of the actual data.

In some nations, OSM data comes (partially) from government sources, either directly (government-sponsored contributions) or indirectly (through volunteer work from government data). In some ways, it's actually surprising that governments and local administrations are not more involved in the project.

Considering that OSM contribution is essentially voluntary, the amount of information that has been added is actually amazing. Of course, there are large inhomogeneities: places that are mapped to an incredible detail, others where even the most basic information is missing: this site maps the density of information present throughout the world, showing this discrepancy in a spectacular fashion.

Why use OSM

Many (most, likely) end users are not particularly interested with the ideology of a project, nor with the medium and long term consequences on relying on particular vendors. For the most part, what end users are interested in is that a specific product delivers the information or service they seek in an appropriate manner.

In this sense, as long as the information they need is present and accessible, a user won't particularly care about using OpenStreetMap or Google Maps or any other particular service (TomTom, Garmin, Apple Maps, whatever): they will usually go with whatever they have available at hand, or with whatever their cultural context tends to favor.

On the other hand, there are a number of reasons, ranging from the ethical to the practical, why using an open, free (as in freedom) service such as OpenStreetMap should be preferred over opting-in to proprietary solutions. (I will not discuss the ethical ones, since they may be considered subjective, or not equally meaningful to everybody.)

On the practical side, we obviously have a win of OSM over paid proprietary solutions: being open and free (as in freedom), the OSM data is available free of charge as well.

But OSM also wins —somewhat unexpectedly— over other free-of-charge services such as Google Maps, as I found out myself, in a recent discovery that brought me back to OpenStreetMap after my initial, somewhat depressing experience with it over four years ago: the Android application for Google Maps does not offer offline navigation.

Finding that such an otherwise sophisticated application was missing such a basic function was quite surprising. In my case (I own a Transformer Infinity without cellphone functionality) it also rendered Google Maps essentially useless: the application allows you to download map data for offline usage2, which is useful to see where you are even when you can't connect to the Internet, but the functionality to get directions from one place to another is not actually present in the application itself: it's delegated to Google's server.

I was amazed by the discovery, and I'm still wondering why that would be the case. I can understand that optimal routing may depend on some amounts of real-time information, such as traffic conditions, that may only be available with an Internet connection, but why would the navigation features be completely relying on the online service?3

Since the lack of offline navigation meant the Google Maps app on Android was useless for me, I started looking for alternatives, and this is how I found out about, and finally settled for, OsmAnd, an open source4, offline navigator for Android that uses open data (from OpenStreetMap, but also e.g. from Wikipedia).

The existence of applications such as OsmAnd is excellent to explain the importance of open data: when Google Maps does not offer a particular service, it is basically impossible for anybody else to offer it based on their data. By contrast, OpenStreetMap offers no services by itself (aside from basic map rendering), but gives other projects the opportunity —and this time we really mean opportunity, not in the ironic sense we used when discussing Google's outreach to get manpower for free— to provide all possible kinds of services on top of their data.

There are in fact a number of applications, both commercial and not, that provide services based on OSM data. They all benefit from the presence and quality of the data, and they often, in one way or another, give back to OSM. The relevance of OSM is not just in it being a free world mapping website. It's also in the healthy ecosystem which is growing around it.

More interesting, OpenStreetMap sometimes wins over any (gratis or paid) services also in terms of quality and amount of mapped data. This happens whenever the local interest for good mapping is higher than the commercial interests of large external companies providing mapping services. Many small, possibly isolated communities (an example that was pointed out to me is that of Hella, in South Iceland) tend to be neglected by major vendors, as mapping them tends to be high-cost with very little or no return, while local mappers can do an excellent job just being driven by passion.

Why not use OSM

For the end user there are some equally obvious reason why they should not, or cannot, use OpenStreetMap, the most important being, unsurprisingly, lack or low quality of the data.

Although the OSM situation has distinctly improved over time, it's quite evident that there are still huge areas where Google and other proprietary providers of mapping services have more detailed, higher quality data than OSM. Of course, in such areas OpenStreetMap cannot be considered a viable alternative to services such as Google Maps.

It should be noted however that the OSM data is not intrinsically ‘worse’ than the data available from proprietary sources such as Google. In fact, Google itself is well aware of the fact that the data they have is not perfect, which is why they have turned to asking users for help: the amount of manpower required to refine mapping data and keep it up-to-date is far from trivial, and this is precisely where large amounts of small contributions can give their best results.

(Of course, the point then is, who would you rather help refine and improve their data?)

Another important point to be considered, as highlighted by the disclaimer on OSM's own website, is that their data should not be considered the end-all-and-be-all of worldwide mapping; there are use cases for which their data, as complete and detailed as it may be, should still not be used, as its reliability cannot be guaranteed, and it's in no way officially sanctioned. (Of course, similar disclaimers also apply to other map service providers, such as Google itself and MapQuest.)

There are finally types of data which OSM does not collect, because they are considered beyond the scope of the project: things such as Street VIew, or real-time information about public transport, or even the presence and distribution of wireless transmitters (for geolocation purposes). For this OSM obviously can't be used, but this doesn't necessarily mean that Google is the only viable alternative. (More on this later.)

Why (and how to) contribute to OSM

There is a very simple, yet important reason to contribute to OpenStreetMap: the more people are involved, the more everyone benefits from the improvements in the amount and quality of the data, in sharp contrast to the actual beneficiaries of your donated time and efforts to assist a company that thereafter gains control of the data you provide. In other words, if you plan on spending time in improving map data, it would be recommendable to do it for OpenStreetMap rather than a proprietary provider such as Google.

Moreover, contributing nowadays is much simpler than it was in the past, both because of the much more extensive amount of data already available (yes, this makes contributing easier) and because the tools needed to actually provide new data or improving the existing ones are more generally available and easier to use.

I first looked into OpenStreetMap around 2008 or 2009, at a time in which the state of the database was still abysmal (in my whereabouts as in most of the rest of the world). Contributing also required nontrivial amounts of time and resources: it required a GPS device which satisfied some specific conditions in terms of interoperability and functionality, and the use of tools that were everything but refined and easy to use. I gave up.

Things now are much different: if you are in the northern hemisphere (or at least one of the ‘western’ countries), chances are that most of your whereabouts have already been mapped to a high level of detail, so that your efforts can be more focused and integrated. Moreover, dedicated tools such as JOSM or even in-browser editors are available and (relatively) user-friendly (considering the task at hand). Finally, data is much easier to collect, with GPS receivers built in most common smartphones and numerous applications specifically designed to assist in mapping.

Indeed, while trying out the aforementioned OsmAnd to see how viable a navigation app it would have been, I found out a couple of places in my whereabouts where the data was not accurate (e.g. roundabouts not marked as such) or was out of date (former crossing recently turned into roundabouts). This was what finally got me into OSM contribution, as fixing things turned out to be quite easy, when starting from the data already present.

There are a number of ways to contribute to OpenStreetMap, with varying degree of required technological prowess, time investment and relevance of the changes.

The simplest way to contribute to OSM, Notes, has been introduced quite recently; in contrast to other methods it doesn't even require an account on OSM, although having one (and logging in) is still recommended.

The purpose of Notes is to leave a marker to report a problem with a specific location in the map, such as missing or wrong data (such as a one-way street not marked as such or with the opposite direction). Notes are free-form contributions that are not an integral part of the actual map data. Rather, more experienced mappers can use Notes to enact the actual necessary changes on the data, thereby ‘closing’ the Note (for example, fixing the one-way direction of the street).

Notes are a powerful feature since they allow even the less experienced users to contribute to OSM, although of course manual intervention is still needed so that the additional information can be merged with the rest of the data.

Any other contribution to OSM requires an account registered with the site, and the use of an editor to change or add to the actual map data. The website itself offers an online editor (two of them, actually), which can be practical for some quick changes; more sophisticated processing, on the other hand, are better done with external editors such as the aforementioned JOSM.

The simplest change that can be done to map data is the addition or correction of information about Points of Interest (POIs): bars and restaurants, hotels, stations, public toilets, newsstands, anything that can be of interest or useful to residents and tourists alike.

POIs are marked using tags, key-value combinations that describe both the kind of Point and any specific information that might be relevant. For example, amenity=restaurant is used to tag a restaurant, and additional tags may be used to specify the type of cooking available, or the opening hours of the business.

Tagging is almost free-form, in the sense that mappers are free to choose keys and values as they prefer, although a number of conventions are used throughout the map: such common coding is what allows software to identify places and present them to the end-user as appropriate. Most editors come with pre-configured tag sets, allowing less experienced user to mark POIs without detailed knowledge of the tag conventions.

In fact, tags are used everywhere around OSM, since the spatial data itself only comes in two forms: points, that mark individual locations, and ‘ways’, ordered collections of points that can mark anything from a road to a building, so that tags are essential to distinguish the many uses of these fundamental types5.

Contributing to the insertion and improvement of POIs is mostly important in areas where most of the basic information (roads, mostly) has already been mapped.

In less fortunate places, where this information is missing, the best way to contribute is to roll up your sleeves and start mapping. This can be done in two ways.

The preferred way is to get ‘on the ground’ with some kind of GPS receiver (nowadays, most smartphones will do the job nicely) and some way to record your position over time, as you walk or drive around. The GPS tracks thus collected can then be imported into an OSM-capable editor, cleaned up, tagged appropriately and uploaded to OpenStreetMap.

Lacking such a possibility, one can still resort to ‘armchair mapping’, tracing satellite or aerial maps for which this kind of usage has been allowed (e.g. those by Yahoo and Microsoft). Of course, the information thus tracked is more likely to be inaccurate, for example because of incorrect geolocation of the imagery, or because the imagery is simply out of date. Such an approach should thereby only be chosen as a last resort.

Who should contribute to OSM

The obvious answer to such a question would be ‘everybody’, although there quite a number of possible objections.

For example, an interesting paradox about OSM is that the ones better suited to generate the data are not necessarily those that would actually benefit from it: locals have the best ground knowledge about their whereabouts, but exactly because of this they are also the least likely to need it from OSM.

This is where the reciprocity in the benefits of using OpenStreetMap comes into play: with everyone taking care of ‘their curb’, users benefit from each other's contributions.

Of course, there are some parties, such as local administrations and tourism boards, for which accurate mapping is beneficial per se; yet, there aren't many cases in which they are directly involved in the improvement of OSM data. While this may seem surprising, there are many possible explanations for this lack of involvement.

There are, of course, legal reasons: aside from possible licensing issues that the administrations would have to sort out (due to the liberal licensing of OSM data), there is also the risk that an involvement of the administrations could somehow be misrepresented as an official sanctioning of the actual data, a dangerous connotation for content which still maintains a high degree of volatility due to possible third party intervention6.

There is also the issue of knowledge about the existence of OpenStreetMap not being particularly widespread; as such, there is lack of a strong motivation in getting involved. (This, of course, is easily solved by spreading the word.) What's worse, even when the existence of OSM is known, the project is lightly dismissed as an amateurish knock-off of more serious services such as Google Maps.

As an aside, the latter problem is not unique to OSM, and is shared by many open projects in their infancy7 —think e.g. how the perception of Linux has changed over the years. The problem is that this triggers a vicious circle: the less complete OpenStreetMap is, the less it's taken seriously; the less it's taken seriously, the less people contribute to it, making it harder to complete.

This is another reason why every contribution counts: the need to break out of the vicious circle, reach a critical mass such that people will consider it normal to look things up in OpenStreetMap (rather than on other, proprietary services) and eventually fix or augment it as appropriate.

The easiest way to start getting involved is with the addition of Points Of Interest that are personally ‘of interest’. You have a preference for Bitcoins? Help map commercial venues that accept them. You have kids? Help map baby-friendly restaurants and food courts. Are you passionate about Fair trade? Guess what you can help mapping. You get tired easily while walking around? Map the benches. Did you just book a few nights in a hotel which is missing from the map? Add it.

And most of all, spread the world. Get people involved.

Other open map-related services

OpenStreetMap has a rather specific objective, which excludes a number of map-related information. For example, OSM does not provide nor collects street-level imagery, and thus cannot replace StreetView. It also doesn't provide or collect information about wireless transmitters, and thus cannot be used for wireless geolocation. It also doesn't provide or collect real-time information about traffic or public transport, and thus cannot be used for adaptive routing.

As such, OSM cannot be considered an integral replacement for Google Maps (or other non-open mapping services), even when the actual ground map data is on par or even superior (yes it happens). This is where other services, —similarly open, and often integrated with OSM itself— can be of aid, although their current status and quality is often significantly inferior both compared to the current status and quality of OpenStreetMap itself and (of course) compared to the proprietary solutions.

Wireless geolocation

For wireless geolocation, there are actually a number of different solutions available. The largest WiFi mapping project (WiGLE) provides data free of charge, but under a very restrictive license, and thus cannot be considered open by any standard, so we will skip over that.

OpenBMap, active since 2009, can be considered open by most standards: it provides client and server software under the GPL, and it provides the collected data (both raw and processed) under the same Open Database License as OpenStreetMap.

At the time of writing, the OpenBMap database is not very strong (less than 900K data points are present in the processed files that can be downloaded from the website). In itself, this is an issue that is easily remedied, since data gathering for wireless networks is trivial (when compared to e.g. ground mapping) and can be fully automated: improving the database is therefore just a matter of spreading the word and having more people contribute.

As driven by the best intentions as the project can be, however, contributions to it are brought down by an overall amateurish presentation, both at the website level (the aesthetics and layout could use some refinement) and at the software level: albeit open source, its development is not managed as openly as it could be8, which makes collaboration harder.

A more recent project is OpenWLANmap. This project also provides open source software for wireless geolocation, but the openness of its database is more dubious. The license is not clearly indicated anywhere on the website (although the GNU Free Documentation License is distributed with the database downloads), and the downloadable database is only a subset of the data used by the website (about 1.6M of more than 5M data points), leading to the question about what happens to the user-submitted data. In this sense, its openness could be challenged, until these issues are resolved.

Yet another similar project is Geomena, started more or less at the same time as OpenBMap, but with a different licensing (Creative Commons instead of the ODBL). This project seems to be particularly focused on presenting an easy-to-use API both for querying and for contributing to the database. However, quite a few links are broken and the project doesn't seem to have moved much forward both in terms of application development and in terms of database growth (at the time of writing, just about 25K access points are claimed to be available, and the link to download the data is not functioning).

This fragmentation, with application bits and partial data bases spread out across different projects, none of which manages to provide a complete, well-organized, functional solution, is probably the most detrimental situation we could have. Getting these project together, sharing the data base as well as the efforts to provide accessible data and applications would be beneficial to all.

Street-level imagery

While gathering information about wireless networks can be trivially automatized thanks to the widespread diffusion of smartphones and similar devices, the kind of street-level imagery that would be useful to provide an open alternative to Google StreetView is quite laborious to take without specialized hardware. Photo stitching applications and camera software with automatic support for 360° photography can come in handy, but having to do this manually every few meters remains a daunting task.

Additionally, pictures taken may need to be cleaned up by masking out sensitive data such as faces, car license plates or whatever else might need masking depending on where the photo was taken.

These are probably —currently— the most significant obstacles to the creation of a competitive StreetView alternative. Despite them, a few projects that try to provide street-level imagery have been born more or less recently.

We have for example the most obviously-named OpenStreetView, with strong ties to OpenStreetMap and the aim to become a repository of open-licensed street-level imagery. Other projects, such as Geolocation.ws, use a different approach, acting as a mapping hub of open-licensed, geolocalized photos hosted by other services (Panoramio, Flickr, etc).

While the considerable lack of imagery and the difficulty in obtaining it are undoubtedly the biggest issues these projects face, the unrefined user interfaces and consequent reduced usefulness aren't exactly of assistance.

Real-time traffic and public transport

This is probably the data which is hardest (possibly impossible) to obtain without direct involvement of the interested parties. While routes, stops, and even expected timetables can be mapped and integrated into the standard OpenStreetMap database, real-time information such as actual bus departure time and route progress, or temporary issues such as strikes, abnormal high traffic conditions, or roadworks are completely outside the scope of the OpenStreetMap data and impossible to maintain without a continuous stream of information coming from somewhere or someone.

Talking about openness for such volatile data which can almost only be provided by a central controller is also less important in some ways. A more interesting subject for this topic would be some form of common standard to have access to this data, in place of the plethora of proprietary, inhomogeneous APIs made available by a variety of transport systems throughout the world.

Still, it would be interesting if something was cooked up based on a principle similar to that used for Waze, the crowd-sourced crowd-avoidance navigation system recently acquired by Google9. In fact, it wouldn't be a bad idea if an open alternative to Waze was developed and distributed: enhancing it to include alternative transportation methods (on foot, by bus, by bicycle10) would have the potential of turning it into a viable tool, even surpassing Waze itself.

Heck, it would even be possible to use the last open-source version of the Waze source code as a starting point; of course, openness of the collected data would this time become a strong point in competing against the proprietary yet still crowd-sourced alternative, especially when combined with smart integration with OpenStreetMap and its flourishing ecosystem.

There's good potential there to set up the infrastructure for a powerful, open routing engines with real-time information providers. It just needs someone with the courage to undertake the task.

Some conclusions

There is little doubt, in my opinion, on the importance of open mapping data. The maturity reached by OpenStreetMap over the years is also an excellent example of how a well-focused, well-managed, open, collaborative project can achieve excellent results.

The power of crowd-sourcing is such that OSM has often reached, when not surpassed, the quality of proprietary mapping services, and this has become so evident that even these proprietary mapping services are trying to co-opt their consumers into contributing to their closed, proprietary databases by disguising this racking up of free manpower as an opportunity for the volunteers to donate their time to somebody else's profit.

The biggest obstacle to OpenStreetMap, as with any collaborative project, is getting people involved. The improvements in technology have made participation much easier than it was at the project inception, and the increasing amount and quality of base ground data makes it also much easier to get people interested, as the basic usability of the project is much higher (it's easier to get started by fixing small things here and there than starting from scratch an uncharted area).

There are also other map-related data which is not collected by OpenStreetMap, since it is deemed outside its scope. While other projects have tried stepping in to cover those additional aims, their success so far has been considerably inferior, due sometimes to fragmentation and dispersal of efforts, sometimes to hard-to-overcome technical issues. It is to be hoped that in time even these project will find a way to come together and break through just as OSM managed, disrupting our dependency on commercial vendors.

  1. You will notice an Italian tag for this article. Sorry, I couldn't miss the chance. ↩

  2. update (2013-07-10): apparently in the just-recently released version 7 of the app, offline maps feature has been ‘almost removed’: you can cache the area which is being shown with a completely non-obvious command (“OK maps”, who the fsck was the genius that came up with this), but the sophisticated management of offline maps that was present in earlier versions is gone. ↩

  3. this cannot be about computational power of the devices, since other applications offering offline routing for Android are available; this cannot be about the algorithm depending on some pre-computed optimal partial routes, since these could be downloaded together with the offline data. One possible explanation could be that offline routing is more likely to find less optimal routes due to lack of useful real-time information and limited computational power of the devices, and Google would rather offer an ‘all or nothing’ service: either you get the optimal routing computed on Google servers, or you don't get any routing at all —but this sounds stupid, since a simple warning that the route could not be optimal when offline would be sufficient. Another possible explanation is that offline routing prevents the kind of data gathering that Google is always so eager about; sure, Maps could still provide that information on the next chance to sync with Google, but that would make the data gathering obvious, whereas Google is always trying to be discreet about it. ↩

  4. while the application is open source, it is distributed for free only with reduced functionality, and only the paid version has all features built in, unless you compile application for yourself. ↩

  5. points and ways can additionally be collected in ‘relations’, which are used to represent complex information such as bus routes or collections of polygons that represent individual entities, but delving this deep into the more advanced features of OSM is off-topic for this article. ↩

  6. in other words, even if the data contributed by official sources could be considered official, against the OSM disclaimer, it could still be as easily subject to editing from other users, and thus the metadata about who made changes and when would rise to a much higher importance than what it has now; this could be solved with approaches such as ‘freezing’ these kind of official contributions, but this would require a change in the licensing terms, and go against the hallmark of OpenStreetMap, its openness. ↩

  7. and make no mistake that OpenStreetMap is still in its infancy, although it has been running already for almost 10 years now and is now reasonably usable in large parts of the world. Mapping is a daunting task, and the amount of information that can still be collected is orders of magnitude higher than what has already been done. ↩

  8. the project seems to follow a strategy where the source code is released with (when not after) the public release of the applications, in contrast to the more open approach of keeping the entire development process public, to improve feedback and cooperation with external parties. ↩

  9. in fact, the acquisition of Waze by Google is another indication of how much Google values the opportunity (for them) to use crowd-sourced data gathering to improve the (online-only) routing services they offer to their consumers, thereby further locking in consumers into using their services. ↩

  10. of course, cyclists already have OpenCycleMap, the OpenStreetMap-based project dedicated to cycling routes. ↩

RaiTV, Silverlight e i video

Get the code for
UserJS/Greasemonkey per usare HTML5 invece di Silverlight su RaiTV:


‘Mamma’ RAI, la concessionaria del servizio pubblico radiotelevisivo in Italia, è entrata nel terzo millennio rendendo le proprie trasmissioni disponibili online sul sito RaiTV, sia in streaming sia in archivio.

Quando la piattaforma RaiTV è stata costruita, la scelta sul software da utilizzare per la distribuzione è caduta su Silverlight, un prodotto con cui la Microsoft mirava a soppiantare il già esistente (e ben più supportato) Macromedia Flash.

La caratteristica principale di Silverlight è che si appoggia alla piattaforma Microsoft .NET che, nonostante le presunte potenziali intenzioni, non è multipiattaforma: è pur vero che esistono implementazioni incomplete e non perfettamente funzionanti di .NET per Linux e Mac, ma in entrambe queste altre piattaforme, guarda caso, Silverlight (o equivalenti) non funziona(no) bene.

Senza andare quindi ad indagare sulle possibili motivazioni dietro la scelta di questa piattaforma, il risultato netto è che il sito non è (perfettamente) fruibile senza avere Windows. È interessante notare che questa mancanza di generalità nella fruibilità del sito non è direttamente frutto (come in altri casi) della monocultura web di cui mi sono trovato a parlare in altri casi (il plugin per Silverlight funziona anche in altri browser, in Windows), ma è comunque legata al (sempre meno) dominante monopolio della Microsoft, dentro quanto fuori dal web.

Resta comunque il fatto che il sito istituzionale di un servizio pubblico non è (‘universalmente’) fruibile, e se questo è sempre qualcosa da contestare, lo è particolarmente quando il famoso canone RAI è sulla buona strada per diventare una tassa ‘a tappeto’, a prescindere dal fatto non solo che il cittadino voglia vedere trasmissioni RAI, ma persino dal fatto che possa: per esempio, in una casa come la mia, dove non vi sono televisioni ed i computer sono tutti con Linux, le trasmissioni RAI sono semplicemente inaccessibili.

L'effeto mobile

Che la dipendenza da Silverlight in RaiTV sia sostanzialmente spuria, almeno per quello che riguarda i video d'archivio, è evidente già semplicemente guardando il codice delle pagine: gli indirizzi dei video, in vari formati (WMV, MP4, H264) sono messi in bella vista, spesso già nello head del documento.

Volendo, quindi, si può procedere ‘a manina’, aprendo il sorgente del documento (cosa che si può fare in qualunque browser), cercando il videourl apposito, e passarlo al proprio programma preferito per poterlo finalmente vedere. Non proprio quello che si definisce un web ‘accessibili’.

Sul sito ‘genitore’ Rai.it, inoltre, studiando anche solo superficialmente il codice JavaScript con cui le pagine reagiscono ad alcune richieste (come vedere l'ultimo telegiornale o ascoltare l'ultimo giornale radio) si nota subito che il codice in questione prevede la possibilità che Silverlight non sia disponibile, ma limitatamente alle piattaforme mobile: iOS (sui gadget Apple) o Android: se il browser dice di essere su una tale piattaforma, il codice JavaScript provvede ad usare i tag video e audio introdotti con l'HTML5 e disponibili su queste piattaforme mobile.

La domanda è quindi: perché questo tipo di accesso ai contenuti non è reso disponibile anche sui normali desktop? Perché imporre l'uso di Silverlight in questi casi? La scelta è discutibile non solo per la chiusura della piattaforma di distribuzione della Microsoft, ma anche insensata: se il contenuto c'è, perché tenerlo nascosto?

(Peraltro, ad esempio, uno dei (presunti) benefici di Silverlight è la possibilità di adattare lo streaming video alla banda disponibile (abbassando la qualità per evitare salti o interruzioni in caso di “internet lenta”): paradossalmente, questa funzione sarebbe ben più utile su mobile (dove Silverlight generalmente non può essere utilizzato —non ricordo sui due piedi se Windows su mobile lo supporta), che non su desktop, dove generalmente si è collegati con un'ADSL (che si suppone) funzionante.)

È anche vero che appoggiarsi esclusivamente al supporto moderno per audio e video in HTML5 escluderebbe comunque chi (per un motivo o per un altro) ha un vecchio browser che non supporta questi tag. La risposta è semplice, e viene dalla possibilità offerta dal supporto multimediale di HTML5 di ‘ricadere’ su altre scelte quando i tag (o i formati!) non sono supportati al browser.

La struttura dovrebbe quindi essere la seguente: tag audio o video con le appropriate source multiple, in modo che ciascuna piattaforma (desktop o mobile) possa scegliere la più adatta, con un fallback alla situazione corrente: questo permetterebbe ai contenuti da essere fruibili quasi universalmente, per di più con il beneficio di una maggiore uniformità di codice, senza necessità di imporre manualmente i ‘casi speciali’ come viene attualmente fatto attraverso il JavaScript presente sulle pagine.

User JavaScript

Se (o finché) alla Rai non avranno il buon senso di applicare il suddetto suggerimento, la visione dei siti Rai in Linux (e Mac) richiede un intervento manuale (andarsi a cercare le URL dei video, da scaricare o aprire in programmi esterni).

Ma possiamo fare di meglio, sfruttando la possibilità offerta dalla maggior parte dei browser moderni (Chrome e derivati, Opera, Firefox e derivati —questi ultimi con l'ausilio dell'estensione GreaseMonkey) di concedere a script utenti di manipolare la pagina.

È a questo scopo che nasce questo user script, che agisce (quasi) esattamente come suggerito in chiusura del precedente paragrafo. Quasi, perché per una scelta personale il fallback su Silverlight è soppresso: lo script si prendere quindi briga di sostituire l'oggetto Silverlight con l'appropriato tag multimediale HTML5, con gli indirizzi pubblicizzati sulla pagina stessa.

Con la versione corrente dello script, dovrebbero essere finalmente fruibili senza intoppi non solo i video d'archivio su RaiTV (almeno su browser che supportino i codec utilizzati), ma anche alcune pagine del sito ‘madre’ Rai.it.

Buona visione.

A horizontal layout

This is a pure
CSS challenge:
no javascript,
no extra HTML

Webpages, mostly for historical reasons, are assumed to develop vertically: they have a natural horizontal extent that is limited by the viewport (screen, page) width, and an indefinite vertical extent realized through scrolling or pagination.

These assumptions on the natural development of the page are perfectly natural when dealing with standard, classic text layouts (for most modern scripts), where the lines develop horizontally, legibility is improved by limiting the line length and stacking of multiple, shorter lines is preferred to single long lines. In a context where it isn't easy to flow long texts into columns, the vertical development with constrained width is the optimal one.

However, these assumptions become instead an artificial constraint when columns with automatic text flow are possible, and this is exactly what this challenge is about: is it possible to achieve, purely with CSS, a layout that is vertically constrained to the viewport height, but can expand horizontally to fit its content?

The solution that doesn't work

Let's say, for example, that we have the following situation: there's a #page holding a #pageheader, #pagebody and #pagefooter ; the #pagebody itself normally contains simply a #content div that holds the actual page content. We want the #pageheader to sit on top, the #pagefooter to sit at the bottom, and the #pagebody to sit in the middle, with a fixed height, and extending horizontally to arbitrary lengths.

In principle, this should be achievable with the following approach: set the #content to use columns with a fixed width (e.g., columns: 30em) and to have a fixed height (e.g., height: 100%), which lets the #content grow with as many columns as needed, without a vertical overflow: by not constraining the width of any of the #content parents, I would expect it to grow indefinitely. The CSS would look something like this:

html, body, #page {
    height: 100%;
    width: auto;
    max-width: none;

#content {
    width: auto;
    max-width: none;
    height: 100%;
    columns: 30em;

#pagebody {
    width: auto;
    max-width: none;
    height: 100%;
    padding-top: 4em;
    padding-bottom: 8em;

#pageheader {
    position: fixed;
    top: 0;
    width: 100%;
    height: 4em;

#pagefooter {
    position: fixed;
    bottom: 0;
    height: 8em;

Why (and how) doesn't this work? Because there is an implicit assumption that the content will always be limited by the viewport. By setting the html, body, #page and #pagebody heights to 100%, we do manage to tell the layout engine that we don't want the layout to extend arbitrarily in the vertical direction, but even by specifying width: auto and max-width: none, we cannot prevent the width from being limited by the viewport.

What happens with this solution is that the #content overflows its containers, instead of stretching their width arbitrarily (which is what would happen instead in the vertical direction): this is clearly visible when setting up ornaments (e.g. borders) for #page and/or #pagebody. While this results in a readable page, the aesthetics are seriously hindered.

The challenge proper

How do you ‘unclamp’ the HTML width from the viewport width, with pure CSS? How do you allow elements to stretch indefinitely in the horizontal direction?

A potential alternative

In fact, there is a potential alternative to the unclamped width, which relies on the experimental ‘paged’ media proposed by Opera achievable, in the experimental builds of Opera supporting this feature, by setting #content { overflow-x: -o-paged-x} or #content { overflow-x: -o-paged-x-controls}.

In this case, the page would still be clamped to the viewport width (and in this sense this cannot be considered an actual solution to the challenge), but the content would be browsable with ease, and the aesthetics would not be spoiled by the mismanagement of the overflow.

An Opera Requiem? Part II

The news this time aren't as bad the previous rumor. And this time it's news, not rumors: an official announcement has been made on the Opera Developer News that the Opera browser will switch to WebKit as rendering engine and to the V8 JavaScript engine.

Despite the nature of Opera as minority browser (very small userbase compare to other browsers, despite the announcement of ‘over 300 million users’ done in the same article), this announcement has started making quite some noise on the web, with lots of people siding with/against the choice, for a variety of reasons.

Honestly, I think this is ‘bad news’, not so much for Opera itself, but for the (future, upcoming) ‘state of the web’ in general.

On its impact for Opera

Let's start from the beginning, and summarize a few things.

So far, Opera's rendering engine(s) (currently Presto) has been the fourth (third, before the surge of WebKit) major rendering engine, beyond Trident (the one in Internet Explorer) and Gecko (the one in Mozilla Firefox). It was completely developed in-house, independently from the others, and it has for long held the crown as being the fastest, most lightweight and most standard-compliant renderer (I'll get back to this later).

Of course, Opera is not just its rendering engine: there are lots of user interface ideas that were pioneered in this browser (but made famous by other browsers), and the browser itself is actually a full-fledged Internet suite, including a mail and news client, an IRC client, a BitTorrent client, and so on and so forth. Amazingly, this has always been done while still keeping the program size and memory footprint surprisingly small.

With the announcement of the migration to WebKit as rendering engine, one of the things that used to make Opera unique (the speed, lightness and compliance of its rendering engine) disappears. This doesn't mean that the reason for Opera to exist suddenly ceases: in fact, its UI, keyboard shortcuts, development tools etc are the things that make me stick with Opera most of all.

Opera can still maintain (part of) its uniqueness, and even continue innovating, because of this. Not everybody agrees, of course, because the general impression is that by switching to WebKit Opera just becomes “another Chrome skin” (forgetting the WebKit was born from Konqueror's KHTML engine, and became WebKit after being re-released by Apple, cleaned up and refactored, as Safari's rendering engine).

And in fact, its UI and tools will keep being for me a primary reason to stick to Opera.

While the argument “if Opera is just a skin over Chrome or Chromium, why would I choose to use that instead of the original?” is probably going to make Opera lose a bunch of its 300 million users, I feel that the focus is being kept on the wrong side of the coin, because it focuses specifically on the impact of the choice for Opera itself, which is not the major problem with the migration, as I will discuss later.

There are also a number of people that remark how the change is a smart move on Opera's part, because it means less development cost and higher website compatibility.

On its impact on the web

And this, which an astoundingly high number of people see as a good thing, is actually what is horribly bad about Opera's migration to WebKit. The “it's good” argument goes along the line that the reduction in the number of rendering engines is good for Opera itself (breaks on less sites), and (more importantly) it's also good for all the web developers, that have to check their site against one less rendering engine.

And this, that they see as good, is bad. Actually, it's worse than bad, it's horrible. It's the kind of lazy thinking that is fed by, and feeds, a vicious circle, monocultures (sorry, it's in Italian). It's the kind of lazy thinking that leads to swamping.

Unsurprisingly, this is exactly the same thing that IE developers warn about: now that they find themselves on the losing side of the (new) browser wars, they suddenly realize how standard compliance is a good thing, and how browser-specific (or engine-specific, in this case) development is bad. And they are right.

Monoculture, and the consequent swamping, is what happened when IE6 won the last browser war, and it is what will happen —again— now that WebKit is becoming the new de facto “standard implementation”. There are a number of websites that fail to work (at all or correctly) when not used with WebKit: this is currently due to their use of experimental, non standard features, but it's still problematic.

It's problematic because it violates one of the most important tenets of web development, which is graceful degradation. It's problematic because it encourages the use of a single, specific rendering engine. It's problematic because as this ‘convention’ spreads, other rendering engines will start to be completely ignored. It's problematic because getting out of monocultures is slow and painful, and falling into one should be prevented at all costs.

Some people think that the current situation cannot be compared with the IE6 horror because the new emerging monoculture is dominated by an open-source engine. And I have little doubt that an open-source monoculture is better than a closed-source one, but I also think that this barely affects the bad undersides of it being a monoculture.

Monocultural issues, take II

The biggest issue with monoculture is that it triggers a vicious circle of lazyness.

This may seem paradoxical in the current context, because one of the reasons WebKit-specific websites choose WebKit is because it has lots of fancy, fuzzy, funny, interesting experimental features. People choose WebKit because it innovates, it introduces new things, it provides more power and flexibility! Which is true, as it was true about Trident when IE6 won the war.

But these experimental features don't always mature into properly-functioning, accepted standards. Building entire websites around them is dangerous, and wrong when no fallback is provided. (Unless the entire purpose of the website is to test those features, of course, but we're talking about production here.)

One of the complaints I've read about Opera is that the support of its Presto engine for the more recent CSS3 standard is lacking, as in it implements less of it than WebKit. This is quite probably true, but in my experience the compliance of the Presto engine to the parts of the standard that it actually implements is much better than any other rendering engine.

What is happening with WebKit is that hundreds of ‘new features’ are being poorly implemented, and their poor implementation is becoming the de facto standard. As people target WebKit, they only care about working around the WebKit bugs. This is bad for two reasons: one, it prevents other engines that don't have those bugs from presenting those pages correctly; two, it demotivates bugfixing in WebKit itself.

As I happen to use Opera as my primary browser, I tend to design for Presto first, and then check against WebKit (with Chromium) and Gecko (with Firefox) if the page still renders correctly. So far, whenever I've found unexpected rendering in either of the other browsers, it has been because those browsers violate the standard. In fact, checking the rendering of a webpage in other browser is still the best way to check where the bug lies. (Just an example of a bug in Chromium that I came across while working on this page.) Still now, when reporting bugs, the first question that is often posed is “how does this thing behave in other browsers?”

If all browsers use WebKit, there is suddenly no more motivation in fixing the bug, even when the standard claims a different behavior should be expected, and the different behavior is the sane one. If there is only one implementation, even if it's wrong and this creates a problem, it is quite likely that the bug will go unfixed, especially if fixing it will break sites that have adopted the wrong behavior as being the ‘good’ one.

The loss of Presto as a rendering engine is not something that should be cherished with relief. It's something that should be cried with desperation, because it's a nail in the coffin of web development.

My hope, at this point, is that Opera will do something really helpful, and it is to fix all of the horrible messy non-compliant stuff that currently cripples WebKit. They started already. Let's at least hope that this continues, and that their patches will make their way upstream: and even if they don't, let's at least hope that they will keep the fixes in their implementation.

(But if the switch is motivated (also) by the need to cut development resources —an hypothesis which is likely to be at least partially true— how much can we hope that these kind of contribution will continue?)

Transformer Infinity: an expensive toy

In late 2012 I decided it was finally time to gift myself with a Transformer Infinity, a piece of hardware (or rather a class of hardware) I had had my eyes on for sometime. After a couple months of usage, I can finally start writing up my thoughts on its hardware and software.

This ‘review’ should be read keeping in mind that I'm not an ‘average user’: I'm not the ‘intended target’ for this class of devices, I'm a power user that prefers control to “kid-proof interfaces”, and I have some very specific needs for what I could consider the ultimate device. This is particularly important, and will be stressed again, in discussing software. And all of this will come together in the conclusions (there's also a TL;DR version for the lazy).

The hardware

One of the things, if not the thing, I like the most about the Transformer line is the brilliant idea of enhancing the practical form of the tablet with the possibility of converting it to a netbook: after all, one of the things that had been bothering me about the whole tablet concept was the impracticality of the on-screen keyboard, stealing reading estate to offer something on which typing for long periods is not exactly the most comfortable experience.

While it is possible to use external (bluetooth, typically) keyboards with other tablets, the simple yet brilliant idea in the Transformer is to have the keyboard as an integral (yet detachable, and separately bought) part of the tablet, additionally acting as a cover. The idea was very poorly copied also by the Microsoft Surface, that however misses some of the key points that make the Transformer so good, such as the fact that the TF keyboard is actually a full-fledged docking station, also offering extra battery and additional connectors.

This feature has thus been the major selling point of the TF while I was evaluating which tablet to get, once I was settled on getting one. The other factors that came into play were screen resolution and connectivity. The next competitor in line was Google's Nexus, that while sporting a much better resolution than the TF700T, was missing any kind of external media support: the Transformer, in addition to a micro-SD slot on the tablet also features an USB port on the docking station (yes, you can plug USB pen drives into it). Oh, and of course the fact that the 10" version of the Nexus was not actually available in my country also influenced the decision.

In the end I decided that the TF700T had a high enough resolution for my tastes. In fact, screen resolution is the reason why I finally got the Transformer Infinity rather than the Padphone, this other brilliant idea from Asus of having a smartphone that gets embedded in a tablet which is itself essentially like a Transformer (supporting the same keyboards/docking stations). If the tablet component of the Padphone didn't lag behind the actual Transformer line in terms of hardware, I would have definitely shelled the extra euros to get it.

And while we're talking about screen, I'm among those that doesn't like the wide formats (16:9, 16:10) which is currently standard (if not the only option) for monitors; however, I do believe that these formats are a good idea compared to the 4:3 ratio of auld times and modern iPads when it comes to tablets.

Indeed, the most annoying part about widescreen monitors (for computers) is that a lot of the available screen estate is wasted for many common usages (everything that revolve around text, essentially), and while they do come handy with multiple text windows side by side or when forced to read long-ass lines (such as some wide tables and stuff like that), they are not really a good alternative to just a bigger, higher-resolution (4:3) display, as the widescreen allows reading longer text lines at the expense of the number of text lines. And like or not, much of our computer usage (even if it's just social networking) still revolves around text.

However, what is annoying in computer monitors is actually a bonus point on tablets: since these devices are often used in portrait mode, with the longer dimension being kept vertical, their widescreen format is actually a long screen format, keeping more text lines in view and requiring less page-flipping. And it's not only about text: when reading full-page comics, the widescreen (longscreen) format actually wastes less screen estate, typically, than the 4:3 format, at least in my experience (but then again it might depend on what format the comics you read are in).

Stuff I don't like

Although I'm overall pretty satisfied with the Transformer Infinity hardware, there are a few things I don't like.

The first issue I have is with the glossy display. I have an issues with glossy displays in general, not just on this tablet. I hate glossy displays. I find it astounding that after years of efforts to make computer monitors anti-glare, no-reflection and overall less straining for the eyes of long-term users (when computers meant office space and work), the last 10 years have seen this fall back to displays that can only be decently used in optimal lighting conditions.

And no, no IPS (plus or nonplus) or other trick is going to solve the problem of a horrendously reflective surface. It helps, but it doesn't solve the problem. Even my colleague, die-hard Mac fan, finally had to acknowledge that the purportedly unreflective glossy display in the latest MacBook Pros is still more straining than the crappiest matte display in suboptimal lighting conditions, i.e. almost always.

But sadly, matte displays don't seem to be an option on tablets, that I can see (ebook readers do have them, though). So regardless of how much it bothers me, there seems to be no alternative to watching myself in the mirror when looking at darker content. Seriously, can some manufacturer come up and offer matte displays please? I'm even willing to spend a couple of extra euros for that (not 50, but up to 10 I would accept, even knowing that the process costs just a few cents per display).

The second thing I don't like about the Transformer Infinity is the connector. When I unpacked the TF700T (which I got before the docking keyboard), I was seriously pissed. What the heck Asus, I spend my days mocking iPad users for their ass proprietary connector and you play this dirty trick on me? Not cool.

Then I realized that the connector is actually the same connector that ties the tablet to the docking keyboard, and I realized that a standard USB port would have not made sense. While the external USB port of the dock, its SD card slot and the keyboard could possibly all have been made accessible to the tablet by presenting the dock as an unpowered USB hub, it would have been impossible to also allow charging the pad from the dock, which is in fact one of the most useful feature the Transformer has.

Tightly related to the connector issue is the power issue: although the other end of the power cable for the Transformer is a standard USB cable, you can't typically charge it from a standard power source, except for slow trickle charging with the device off or at least in standby, due to the power draw. Asus' own “wall wart” (the wall plug/USB adapter), on the other hand, detects the presence of the Transformer and can feed it more current (15V, 2A if I'm not mistaken) thereby allowing faster charging and making it possible to charge the device even while in use (very useful for bedside use at the end of the day, when the device battery is likely to be close to exhaustion)

I honestly wouldn't mind if the industry came up with a common standard that worked around the current limitations of USB, at least for device charging: even the N900, Nokia's best although now obsolescent phone, is a little picky on its power source, refusing to charge from some low-cost ‘universal’ chargers (it does work correctly with the wall wart that shipped with an HTC smartphone we bought a couple of years ago, so it does work with non-Nokia chargers).

Finally, not really necessary but a bonus point of docking keyboard would have been an Ethernet port: Asus themselves have found a very smart way to keep the port ‘thin’ when not in use, a solution that they use in their latest ultrabooks (or whatever you want to call thin, 11" laptops), a solution that I believe could be employed also in the docking keyboard of the Transformer. Its absence is not really a negative point (how often are you going to need to use a network cable with a device as portable as a tablet), but would have been a nice addition for its use in netbook form.

The software

The Transformer Infinity is an Android tablet. Most of what I'm going to say here is therefore about Android in general, except for the few things that have been ‘enhanced’ or otherwise changed by Asus.

Android, for me, is interesting, because it's probably the first (successful) example of large-scale ‘macroscopic’ deployment of the Linux kernel beyond the ‘classic’ server or workstation use (only recently trickled down into domestic use with Ubuntu and related distributions). (By macroscopic I am here referring to the systems with which the user interacts frequently, thereby excluding embedded systems —think of the many Linux-based ADSL modem/routers.)

While Android shares a very important part of its core with ‘classic’ Linux distributions (and even there, not really, since the Linux kernel in Android is heavily modified and it has only been recently that its changes have started trickling upstream into the main Linux source), the userspace part of Android, and specifically the middleware, the software layer between the Linux kernel and the actual user applications, is completely different.

Because of this, Android is actually the first system that suddenly motivates the FSF insistence on having the classic Linux systems be called GNU/Linux rather than simply Linux. On the other hand, the userspace in classic Linux system is not just GNU (and it's not like the X server, or desktop environments such as KDE, are insignificant components), so isn't just GNU/Linux just a little arrogant?

But I digress. The fact that Android is not a classic Linux distribution, however, is an important point, especially for someone like me, for reasons that I'm going to explain in the following.

Android, much like iOS, is an operating system designed for devices whose main target use is (interactive) consumption rather than production. Sure, there are applications available for both systems that can exploit the device features in creative ways, but even these are mostly focused on personal entertainment than anything else.

It's not like the operating systems actively prevent more sophisticated and heavy-duty usages: it's just that they don't particularly encourage it, since the usually limited hardware of the devices they run on wouldn't make productivity particularly comfortable.

After all, even I, a tinkerer and power user, finally bought the tablet having comic book reading in mind as its primary use (although admittedly 700€ is a little too much for just that).

For this intended target, Android is exceptionally well-designed. Thanks also to the very tight integration with the wide range of ‘cloud’ services offered by Google, it provides a very functional environment right from the start; and since it's “all in the cloud”, you don't even have to worry about synchronization among devices. All very fine and dandy —as long as you don't care having all your data in the hands of a single huge company whose main interest is advertising.

As if selling your soul to Google wasn't enough, Asus adds some of its own, by keeping track of your device with ridiculously extreme precision, even when geolocation services are disabled. I would recommend not carrying your Asus-branded Android device when committing crimes (I don't actually know if this “phoning home” thing is Asus-specific or general for Android), but the feature could come in handy if somebody stole it.

Aside from these creepy aspects, as I was saying, Android is actually quite nice. The software choice so far has also been rather satisfactory: aside from the games that I bought from the Humble Indie Bundles, the must-have Simon Tatham's Puzzles collection (which I have everywhere) and a few others available for free (many with ads), the most important piece of software I took care of installing was Perfect Viewer, which I obviously used mostly as a comic book reader.

In fact, Perfect Viewer is an excellent example to introduce what I really hate about Android: control. Perfect Viewer has a very useful feature, which is the ability to access files on remote machines. Why is this feature useful? Because Android doesn't provide it by default.

This, in my opinion, is a horrible failure on the part of the operating system: it should be its duty, after all, to provide a unified method to access remote files, which would be transparently available to all applications. Why should every application reimplement this basic functionality? Result: you can't peruse your home-server-stored media collection from VLC on Android, but you can peruse your graphics novel collection because one application went the extra mile to implement the feature.

The failure of Android to actually provide a built-in method to access remote directories is particularly grave considering that there is no practical reason why this shouldn't be available: the kernel (Linux) is very apt at mounting remote filesystems with a variety of protocols, and Android itself is already designed to expose mount points transparently to applications (easily seen when making use of the (micro-)SD card slots available on devices that provide them).

So not providing the possibility to mount remote shares is actually a design choice that require disabling feature Linux can provide. And I find it interesting that the web offers a plethora of tutorials to guide people through the gimmicks necessary to make the feature available (gimmicks that include ‘rooting’ your device to gain complete control of it). I find this interesting because it shows that it's not just ‘power users’ like me that need this feature (unsurprisingly, as media collections on home servers are common, and growing in popularity, and tables are good for media consumption —if you have a way to actually access the stupid media).

One is taken to wonder why is this feature not available in Android's stock builds. Sadly, the only reason I can think for this is that this forces people to unnecessarily use online services (such as —oh right— the ones offered by Google) that provide selective, targeted, and pricier alternatives to the widespread home server approach.

But I see this as a single instance of a more general problem with Android, i.e. the lack of control from the user. There's a lot in Android happening ‘behind the scenes’, and much of it is something which is intentionally hidden from the user, and which the user is actively prevented from operating on.

While the devices where Android runs are general purpose devices (like all computers), the operating system is designed to only allow exposure to selected features in selected ways. And even though it's not as bad as the competition (for example, in contrast to iOS, enabling “out of band” installations, i.e. installation of applications not downloaded from “official” channels like the Google Play Store, is a simple option in the settings), it's still a strong contribution to the war against general purpose computing (a topic on which I have a lot of things to say but for which I haven't yet found the time to patiently write them down).

Compare this with Maemo, the stock operating system of the N900 (ah, sorry, that's in Italian only for the time being): a full-fledged Debian-based Linux distribution; while the device is still very easy to operate, the underlying power and flexibility of the (GNU and more) classic Linux userspace remains accessible for those that want it. Maemo showed pretty clearly that you don't need to sacrifice power and flexibility to offer ease of use —unless that's what you actually want to do. And the fact that you do want to do that is for me an extremely negative sign.

There are efforts to make Android more power-user friendly, even without requiring hacks or rooting, the most significant probably being applications such as Irssi ConnectBot and the Terminal IDE. However, there's only so much they can do to work around some intrinsic, intentional deficiencies in the operating system.

Some conclusions

Ultimately, I was actually hoping to be able to exploit the convertible nature of the Transformer Infinity to make the device supplant my current ‘bedtime computer’, a Samsung N150 netbook that has been faithfully serving us since we bought it when we married.

At the hardware level, the TF700T could quite easily do it: it has the same screen size, with higher resolution (although the Samsung display does have the benefit of being matte), the keyboard is similarly sized, the battery (especially when docked) lasts longer, the CPU is better, the GPU is better, the webcam is better (and there are two of them), the amount of RAM is the same. There are some things in which the Transformer falls behind, such as having a smaller hard-disk, or less connectors, but these are things that don't normally have a weight in the usage I have for my netbook.

Where the Transformer falls really behind, though, is in the software space. The netbook has Windows preinstalled (and I'm keeping it that way so for those rare emergencies where an actual Windows installation might be needed), but I have a nifty USB pendrive with a Linux distribution on it, which I boot from to have exactly the system that I need: all of tools are there, it's configured to behave the way I want it, and so on and so forth. On the TF700T, I can't boot from the USB pendrive (I'd have to prepare a new one anyway, because of the different hardware architecture, but that wouldn't be difficult). I wouldn't even need to, in fact, if only Android wasn't such a crippled Linux.

It's no surprise that there are efforts underway to be have both the Android and the more classical Linux userspaces are your hands, such as this one, that gives you both an Android and Debian systems running on the same (Android) kernel. It's not perfect (for example, it still relies heavily on the Android subsystem for keyboard handling, and the X server must be accessed ‘remotely’), but it's a step in the right direction.

My ideal system? A Dalvik virtual machine running on something like Maemo or Meego. There actually was a company (Myriad Group) working on something like this (what they call “Alien Dalvik”): not open source, though, nor universally accessible (it's for OEMs, apparently). Pity.


I like (with a couple of caveats) the hardware of the Transformer Infinity (TF700T) and its capability of becoming a netbook by adding the mobile dock. I wish the software (Android) were friendlier to power users, though, to better exploit this.

GPGPU: what it is, what it isn't, what it can be

A little bit of history

My first contact with GPGPU happened during my first post-doc, although for reasons totally unrelated to work: an open-source videogame (an implementation of the Settlers of Catan boardgame) I was giving small contributions to happened to have a header file which was (legitimately) copied over from the GPGPU programming project.

This was 2006, so the stuff at the time was extremely preliminary and not directly supported by the hardware manufacturers, but it did open my eyes to the possibility of using graphic cards, whose main development was geared towards hard-core gaming, for other computational purposes, and particularly scientific ones.

So where does GPGPU come from?

The term GPU (Graphic Processing Unit) emerges in the mid-90s, to describe graphic cards and other video hardware with enough computational power to take care of the heavy-duty task of rendering complex, animated three-dimensional scenes in real time.

Initially, although GPUs were computationally more gifted than their predecessors whose most complex task was blitting (combining rectangular pixel blocks with binary operators such as AND, OR or XOR), their computational power was limited to a set of operations which is nowadays knows as the “fixed-functions pipeline”.

The barebone essentials you need to render a three-dimensional scene is: a way to describe the geometry of the objects, a way to describe the position of the light(s), and a way to describe the position of the observer. Light and observer positions are little more than points in three-dimensional space (for the observer you also need to know which way is ‘up’ and what his field of view is, but those are details we don't particularly care about now), and geometries can be described by simple two-dimensional figures immersed in three-dimensional space: triangles, squares. Of course, since simple colors will not get you far, you also want to paint the inside of these triangles and squares with some given pictures (e.g. something that resembles cobblestone), a process that is called ‘texturing’.

Once you have the geometry (vertices), lights and observer, rendering the scene is just a matter of doing some mathematical operations on them, such as interpolation between vertices to draw lines, or projections (i.e. matrix/vector products) from the three-dimensional space onto the two-dimensional visual plane of the observer. Of course, this has to be done for every single triangle in the scene (and you can have hundreds, thousands, hundreds of thousands or even millions of triangles in a scene), every time the scene is rendered (which should be at least as often as the screen refreshes, so at least some 50, nowadays 60, times per second).

Fixed-function pipelines in GPUs are therefore optimized for very simple mathematical operations, repeated millions (nowadays even billions) of times per second. But as powerful as you can get, there are limits to where simple triangles and a naive lighting model can get you: and this is why, by the end of the XX century, hardware support for shaders started popping up on GPUs.

Shaders are programs that can compute sophisticated lighting effects (of which shadows are only a small part). Since the effects that may be achieved with shaders are very varied, they may not be implemented within the classic fixed-function pipeline. Dedicated computational hardware that could execute these programs (called kernels) had to be introduced.

And suddenly, video cards were not fixed-function devices anymore, but had become programmable, even though still with limitations and peculiar behavior: shader kernels are programs that gets executed on each vertex of the geometry, or on each pixel of the scene, and only a limited number of computational features were initially available, since the hardware was still designed for the kind of manipulation that would be of interest for 3D rendering.

However, with all their limitations, GPUs now had a very interesting feature: you could tell them to do a specific set of operations on each element of a set (vertex, pixel). The essence of parallel programming, with hardware designed specifically for it. So why not abuse this capability to do things which have nothing to do with three-dimensional scene rendering?

This is where GPGPU started, with some impressive (for the time) results. Of course, it was all but trivial: you had to fake a scene to be rendered, pass it to the card, ask it to render it and manipulate the scene data with some shader kernels, and then get the resulting rendered scene and interpret it as the result of the computation. Possible, but clumsy, so that a number of libraries and other development tools started appearing (such as Brook) to make the task easier.

As usage of GPUs for non-graphical tasks spread, hardware manufacturers started to realize that there was an opportunity in making things easier for developers, and the true power of GPGPU was made available.

The first ‘real’ GPGPU solutions started appearing between 2006 and 2007, when the two (remaining) major GPU manufacturer (ATi —shortly after acquired by AMD— and NVIDIA) realized that with minimal effort it was possible to expose the shader cores of the GPU and make them available beyond the simple scope of scene rendering.

Although buffers, texture engines and shader cores were now made accessible outside of the rendering pipeline, their functional behavior was not altered significantly, something that has a significant impact on their optimal usage patterns and some behavior peculiarities that inevitably arise during the use of GPUs as computing devices.

The GPU is (not) my co-processor

Before the Pentium, Intel (and compatible) CPUs had very limited (floating point) math capabilities, since they were deemed unnecessary for the common market. If you were a scientist or other white-collar worker that needed fast floating-point computations, you could however shell money for an FPU (Floating-point Unit), an auxiliary processor specialized in floating-point operations; these units were marked with a numerical code which was the same as the CPU, except for the final digit, a 7 instead of a 6: so you would have the 8087 next to an 8086, or a 387 next to a 386; and by ‘next’ I mean physically next to it, because the socket where the FPU had to be inserted was typically adjacent to the socket of the CPU.

The FPU as a co-processor started disappearing with the 486, which had two variants, whose high-level one (the 486DX) had an FPU integrated in the actual CPU. With the introduction of the Pentium, the FPU started being a permanent component of the CPU, and it started evolving (it had remained essentially unchanged since the inception) to support the famous extended ‘multimedia’ instruction sets (MMX, 3DNow!, the various SSE generations, up until the latest AVX extension) of subsequent CPUs. (And by the way, the fact that the FPU and MMX functionalities were implemented in the same piece of hardware had a horrible impact on performance when you used both at the same time. But that's a different topic.)

One of the tenets of GPGPU (marketing) is that the GPU can be used as a co-processor of the CPU. However, there are some very important differences between a co-processor like the FPU, and the GPU.

First of all, the FPU was physically attached to the same bus as the CPU, and FPU instructions were part of the CPU instruction set: the CPU had to detect the FPU instructions and either pass control to the FPU or decode the instructions itself and then pass the decoded instruction to the FPU. Secondly, even though the FPU has a stack of registers, it doesn't have its own RAM.

By contrast, the GPU is more like a co-computer: it has its own RAM, and its own instruction set of which the CPU is completely unaware. The GPU is not controlled by the CPU directly, as it happens with a co-processor, but rather the software driver instructs the CPU to send the GPU specific bits which the GPU will interpret as more or less abstract commands such as “load this program (kernel)”, “copy this data”, “execute this other program (kernel)”.

Since all communication has to go through the PCI bus, naively using the GPU as a coprocessor is extremely inefficient: most of the time would be spent just exchanging commands and data; this, in fact, was one of the reasons why the old GPGPU approach based on the graphics stack ended up consistently underperforming with respect to the expectable GPU speed.

The most efficient use of the GPU is therefore as an external machine, communication with which should be limited to the bare minimum: upload as much data as possible at once, load all the programs, issue the programs in sequence, and don't get any (intermediate) data back until it's actually needed on the CPU. It's not just a matter of offloading heavy computations to the GPU: it's about using a separate, complex device for what it was designed for.

How much faster is a GPU?

When the GPGPU craze found its way into marketing (especially with NVIDIA's push for their new CUDA technology), the GPUs were boasted as cheap high-performance co-processors that would allow programs to reach a speed-up of two orders of magnitude (over a hundred times faster!), and a large collection of examples showcasing these incredible benefits started coming up. The orders of magnitude of speed-ups even became the almost only topic of the first published ‘research’ papers on the subject.

Although such incredible speed-ups are quite possible when using GPUs, the reality is quite more complex, and a non-negligible part of these speed-ups are actually possible even on standard CPUs. To understand more in detail what practical speed-ups can be expected, we have to look at the fundamental areas where GPUs perform (potentially) much better than CPUs (computational power and memory bandwidth), and the conditions under which this better performance can actually be achieved.

Faster memory (?)

Let us look at memory first. It's undeniably true that GPU memory is designed to have a much higher bandwidth than the RAM normally mounted on the computer motherboard (hereafter referred to as ‘CPU memory’): in 2007, when the GPGPU started being officially supported by hardware manufacturers, GPUs' memory had peak theoretical bandwidths ranging from 6.4 GB/s (on low-end GPUs using DDR2 chips for memory) to over 100 GB/s (on high-end cards using GDDR3 chips). By comparison, CPUs usually had DDR2 chips, whose performance ranges from 3.2 GB/s to 8.5 GB/s. Now (2012) GPUs can reach bandwidths of almost 200 GB/s with GDDR5 memory, whereas the best CPUs can hope for is less than 20 GB/s on DDR3.

Since the bandwidth is almost consistently an order of magnitude higher on GPUs than on CPUs, one should expect an order of magnitude in speed-up for a problem that is memory-bound (does a lot of memory access and very little computations), assuming it can get close to the theoretical bandwidth peak and assuming the data is already on the device.

We'll talk about the problem of putting data on the device later on, but we can mention a few things about reaching the peak bandwidth already, without getting into too much details.

The silver lining in the higher bandwidth on GPUs is latency. While CPU to (uncached) RAM access latency is usually less than 100ns, on GPUs this is 3 to 5 times higher; and the first GPUs had no cache to speak of (except for textures, but that's a different matter, since textures also have lower bandwidth). Of course, GPUs have specific methods to cover this high latency: after all, a GPU is optimized for moving large slabs of data around, as long as such data is organized ‘appropriately’, and the memory access are designed accordingly.

Therefore, memory-bound GPU algorithms have to be designed in such a way that they make as much as possible use of these latency reduction techniques (coalescing on NVIDIA GPUs, fastpath usage on AMD GPUs), lest they see their performance drop from being 10 times faster than on CPU to being no more than 2 or 3 times faster. These remarks are particularly important for the implementation of scatter/gather or sorting algorithms.

Faster computing (?)

Of course, where GPUs really shine is not in juggling data around, but in doing actual computations on them: gamer GPUs passed the (theoretical) teraFLOPS barrier in 2008 (Radeon HD 4850), when the best (desktop) CPU of the time fell short of achieving some theoretical 60 gigaFLOPS, and most common ones couldn't dream of getting half as much.

But from 20 gigaFLOPS to 1 teraFLOPs there's only a factor of 50: so where do the claimed two orders of magnitude in speedup come from? Unsurprisingly, the difference comes from a consistent underutilization of the CPUs. We'll leave that aside for the moment, though, and focus instead on the impressive (theoretical) performance sported by GPUs.

The first thing that should be mentioned about GPUs is that they are not designed to be fast in the sense that CPUs are fast. For years, CPU performance was strongly tied to the frequency at which it operates, with a theoretical upper limit of one instruction per cycle, which would mean that a CPU running at 1GHz couldn't do more than one (short) billion operations per seconds. These days, the fastest desktop CPUs run at over 3GHz, while the fastest GPUs have computing clocks which are at about 1GHz, or even less.

However, GPUs are designed for massively parallel tasks, such as running a specific sequence of instructions on each element of a set of vertices or pixels, with each element being processed independently (or almost independently) from the other. The shaders in GPUs are made up by a large number of processing elements collected in multiprocessors, with each multiprocessor capable of executing the same single instruction (sequence) on a large number of elements at once.

In some sense, GPUs can be seen as a collection of SIMD (Single Instruction, Multiple Data) processors (typically, 10 or more), each with a very wide vector width (typically 32 for NVIDIA, 64 for AMD); while modern CPUs are also SIMD-capable, with their MMX and SSE instructions, and can also sport multiple cores, they have less SIMD lanes (typically 4 or 8), and less cores (2 to 6) than GPUs.

The GPU programming tools and languages expose this massive parallel computing capability, and make it very easy to exploit it. The simplest GPU programs consist in a kernel, i.e. sequence of instructions that are to be executed (typically) on a single element of a set, which is given to the GPU to be run on all the elements of a given set.

By contrast, exploiting the vector capabilities of modern CPUs and their multiple cores require complex programming techniques, special instructions which are barely more abstract than their hardware counterparts, and complex interactions with the operating system to handle multiple threads to be distributed across the cores.

In other words, it's much easier to exploit the massively parallel nature of the GPUs than it is to exploit the available parallel computing capabilities of the CPUs. And this is where the two orders of magnitude in performance difference come from: CPUs are rarely used as more than single-core scalar processors.

Still, even when comparing well-designed CPU programs with well-designed GPU programs it's not surprising to see minutes of runtime be reduced to seconds. If you see less than that, you're probably doing something wrong, and if you're seeing much more than that, your CPU program is probably far from being optimal.

The question then becomes: how hard is it to write a well-designed GPU program as opposed to a well-designed CPU program? But this is a question I'll leave for later. For the time being, let's just leave it at: non-trivial.

Up and down (loads)

As previously mentioned, GPUs normally have their own memory, separate from the system memory (this is not true for integrated GPUs, but they deserve a separate paragraph). Therefore, using the GPU involves transferring data to it, and then retrieving the results when the kernel(s) have finished their operation.

The time spent uploading data to the GPU and downloading data from it is not necessarily insignificant: through a PCI express 2.0 ×16 link you can hope for an 8 GB/s transfer rate, but 5 or 6 GB/s are more likely; and this is pretty close to being the top of the line. When compared to the GPU or even the CPU memory bandwidth, this can be a very significant bottleneck.

This, combined with the relatively small amount of memory available on GPUs (less than a gigabyte when GPGPU started, slightly over the gigabyte four years later), poses an interesting paradox on the convenience of GPUs.

On the one hand, GPUs are most convenient when processing large amounts of data in parallel: this ensures, together with well-designed algorithms, that the GPU hardware is used full-scale for an adequate amount of time.

On the other hand, there's a limit to the amount of data you can load at once on the GPU: desktops today are commonly equipped with 4 gigabytes of RAM or more (dedicated workstations or servers can easily go in the tens of gigabytes), so they can typically hold larger amounts of data. The only way to process this on standard desktop GPUs is to do it in chunks, which means uploading the first chunk, processing it, downloading the result, uploading the new chunck, and so on.

Luckily enough, GPUs are typically equipped with asynchronous copy engines, so the situation is not as dreary as it would be. In many cases, especially with modern devices, it is possible to overlap computations and host/device data transfers so as to hide the overhead of the data exchange. This, in fact, is one of the many ways in which GPGPU programming can become complex when optimal performance is sought.

Is it still worth it?

Even if two orders of magnitude may not be possible to achieve without extensive programming efforts to produce extremely optimized code, the simple order of magnitude one can get for most trivially parallelizable problems is most often worth the time necessary to reimplement computationally-heavy code to use GPGPU.

One of the most interesting feature of the shared memory parallel programming approach needed for GPGPU is that, when it can be employed, it's a much more future-proof way of coding than serial execution. The reason is clear: while serial execution can only improve by using faster processors (and there are upper physical limits which are getting closer by the year to how fast a scalar processor can go), parallel algorithms can get faster by ‘just’ adding more computationa units. In theory, a perfectly parallel algorithm will take half the time to run on twice the cores, and while the reality is less ideal, the performance gain is still quite perceptible.

The hidden benefits of GPGPU

In many ways, the most important contribution of GPGPU to the domain of computer science and software engineering has not been the actual performance benefits that a lot of fields have seen from the availability of cheap parallel computing platforms.

There's indeed a much longer-term benefit, that will be reaped over the next years, and it's precisely the shift from serial to parallel programing we just mentioned. Before GPGPU, parallel programming was left to the domain of expensive computational clusters and sophisticated programming techniques; GPGPU has shown that there are huge opportunities for shared-memory parallel programming even on the lower end of the market.

The reborn interest in parallel programming triggered by GPGPU is gradually leading to the development of an entirely new mentality both in terms of software development and hardware realities. Although it is still to be seen how it will ultimately pan out, there are significant signs that we're only starting to scratch the surface of technologies that can revolutionize computing to an extent that could only be compared with the effects of the commercialization of the Internet twenty years ago, and the introduction of the first microcomputers twenty years before that.

GPGPU is bleeding out of the GPU market, in an interesting combination of paradoxical feedbacks and returns. There are people that have implemented ray-tracing using GPGPU: the GPUs go back to their intended task, but through features that were designed to make them usable outside of their domain. At the same time, CPUs gain more powerful parallel computing features, and integrated CPU/GPU solutions bring the GPU more in line with the standard co-processor role marketing wanted to sell GPGPU for.

We are starting to see a convergence in technolgy. At this point, the only danger to the rich potential of this dawning era is the petty commercial interest of companies that would rather see the market collapse under fragmentation than prosper without their dominance.

Let us hope that this won't happen.

Linux and the desktop

Discussing this after the recent debate that involved big names such as Linus Torvalds and Miguel de Icaza may seem a little inappropriate, but I guess I'll have to count this against my usual laziness for writing stuff up when I think it instead of waiting for it to become a fashion.


The reason why the topic has recently re-emerged (as it periodically does) has been a write-up by the afore-mentioned Miguel de Icaza, titled What killed Linux on the desktop.

According to the author of the rant, there are two main reasons. The first was the generally disrespect for backwards compatibility, in the name of some mystic code purity or code elegance:

We deprecated APIs, because there was a better way. We removed functionality because "that approach is broken", for degrees of broken from "it is a security hole" all the way to "it does not conform to the new style we are using".

We replaced core subsystems in the operating system, with poor transitions paths. We introduced compatibility layers that were not really compatible, nor were they maintained.

paired with a dismissing attitude towards those for which this interface breakage caused problems. The second problem has been, still according to de Icaza, incompatibility between distributions.

Miguel de Icaza then compares the Linux failure with the success of Apple and its operating system, which apparently did things “the way they should have been done” (my words), highlighting how Mac OSX is a UNIX system where things (audio, video, support for typical content formats) works.

Karma whoring

There's little doubt that “Linux on the desktop” is a hot topic with easy polarizing and bandwagoning. By mentioning three of the most famous problematic areas in the most widely adopted Linux distribution(s) (audio support, video support and support for proprietary audio and video formats), de Icaza basically offered the best bait for “me-too”-ing around the Internet.

And unsurprisingly this is precisely what happened: a lot of people coming up with replies to the article (or referencing the article) with little more than “yeah! exactly! these were exactly the problems I had with Linux!”.

An optimist could notice how many people have reacted this way, combine with those that have reacted the opposite way (“BS, I never had a problem with Linux!”) and be happy about how large the pool of (desktop) Linux users and potential users is. On the other hand, the whole point of the article is to (try and) discuss the reasons why many of these are only potential users, why so many have been driven off Linux despite their attempts at switching over to it.

Linus, Linux and The CADT model

The first point of de Icaza's critique is nothing new. It's what Jamie Zawinski coined the term CADT, Cascade of Attention-Deficit Teenagers, for. However, the way in which de Icaza presents the issue has two significant problems.

One is his use of «we», a pronoun which is somehow supposed to refer to the entire Linux developer community; someone could see it as a diplomatic way of not coming out with the specific names and examples of developers and project that break backwards compatibility every time (which would be ‘bad form’), while at the same time putting himself personally in the number of people that did so.

The other is how he tries to follow the dismissing attitude back to Linus Torvalds, which by position and charisma may be considered the one that «sets the tone for our community», assuming that Linus (and the kernel development community) feeling free to break the internal kernel interfaces even at minor releases somehow give userspace developers the entitlement to do the same about external interfaces.

These two points have sparked a debate in which Linus himself (together with other important Linux personalities) intervened, a debate that has made the news. And the reasons for which the debate sparked is that these two points are among the most critical issues indicating what's wrong with the article. Since in the debate I find myself on the opposite camp of Miguel de Icaza (and, as I found out later, mostly in Linus' camp), I'm going to discuss this in more detail, in a form that is more appropriate for an article than for a comment, as I found myself doing so far.

Kernel, middleware and user space

I'm going to start this explanation with a rough, inadequate but still essential description of the general structure of a modern operating system.

First of all, there's the kernel. The kernel is a piece of software that sits right on top of (and controls and receives signal from) the hardware. It abstracts the hardware from the rest of the operating systems, and provides interfaces to allow other pieces of the operating system to interact with the hardware itself. Linux itself is properly only the kernel, which is why a lot of people (especially the GNU guys) insist on calling it GNU/Linux instead; after all, even Android uses the Linux kernel: it's everything else that is different.

By application one usually means the programs that are executed by the user: web browsers, email clients, photo manipulation programs, games, you name it. These user space applications, which is what users typically interact with, don't usually interact directly with the kernel themselves: there's a rather thick layer of libraries and other programs that ease the communication between user space applications and the kernel. Allow me to call this layer ‘middleware’.

Example middleware in Linux and similar systems includes the first program launched by the kernel when it finished loading (typically init), the C library (libc, in Linux often the one written by the GNU project) and things that manage the graphical user interface, such as the X Window System (these days typically provided by the X.org server in Linux).

All the components of the various layers of the operating system must be able to communicate with each other. This happens through a set of interfaces, which are known as APIs (Application Programming Interfaces) and ABIs (Application Binary Interfaces), some of which are internal (for example, if a kernel module has to communicate with something else inside the kernel, it uses an internal kernel API) while others are external (for example, if the C library needs to communicate with the kernel, it does so using an external kernel API).

Interface stability and application development

Let's say that I'm writing a (user space) application: a photo manipulation program, an office suite, whatever. I'm going to develop it for a specific operating system, and it will be such a ‘killer app’ that everybody will switch to that operating system just for the sake of using my application.

My application will use the external interfaces from a number of middleware libraries and applications (for example, it may interface with the graphics system for visualization, and/or with the C library for file access). My application, on the other hand, does not care at all if the internal interfaces of the kernel, or of any middleware component, change. As long as the external interfaces are frozen, my application will run on any future version of the operating system.

A respectable operating system component never removes an interface: it adds to them, it extends them, but it never removes them. This allows old programs to run on newer versions of the operating system without problems. If the developers think of a better way to do things, they don't change the semantics of the current interface; rather, they add a new, similar interface (and maybe deprecate the old one). This is why Windows APIs have call names with suffixes such as Extended and whatever. This is why we still have the (unsafe) sprintf alongside the (safe) snprintf in the POSIX C library specification.

Let me take the opportunity to highlight two important things that come from this.

One: the stability of internal interfaces is more or less irrelevant as far as user space applications are concerned. On the other hand, stability of external interfaces is extremely important, to the point that it may be considered a necessary condition for the success of an operating system.

Two: it may be a little bit of a misnomer to talk about interface stability. It's perfectly ok to have interfaces grow by adding new methods. What's important is that no interface or method is removed. But we'll keep talking about stability, simply noting that interfaces that grow are stable as long as they keep supporting ‘old style’ interactions.

Are Linux interfaces stable?

Miguel de Icaza's point is that one of the main reasons for the failure of Linux as a desktop operating system is that its interfaces are not stable. Since (as we mentioned briefly before) interface stability is a necessary condition for the success of an operating system, his reasoning may be correct (unstable interfaces imply unsuccessful operating system).

However, when we start looking at the stability of the interfaces in a Linux environment we see that de Icaza's rant is misguided at best and intellectually dishonest at worst.

The three core component of a Linux desktop are the kernel, the C library and the X Window System. And the external interfaces of each of these pieces of software are incredibly stable.

Linus Torvalds has always made a point of never breaking user space when changing the kernel. Although the internal kernel interfaces change at an incredible pace, the external interface is a prime example of backwards compatibility, sometimes to the point of stupidity. { Link to round table with Linus mentioning examples of interfaces that should have never be exposed or had issues, but were still kept because programs started relying on the broken behavior. }

A prime example of the interface stability is given by the much-critiqued sound support, which is an area where the Linux kernel has had some drastic changes over time. Sound support was initially implemented via the ironically-named Open Sound System, but this was —not much later— replaced by the completely different Advanced Linux Sound Architecture; yet OSS compatibility layers, interfaces and devices have been kept around since, to allow old applications using OSS to still run (and produce sound) on modern Linux versions.

This, by the way, explains why Linus was somewhat pissed off at de Icaza in the aforementioned debate: if developers in the Linux and open source worlds had to learn anything from Linus, it would have been to never break external interfaces.

Another good example in stability is given by the GNU C Library. Even though it has grown at an alarming pace, its interface has been essentially stable since the release of version 2, 15 years ago, and any application that links to libc6 has forward-compatibility essentially guaranteed, modulo bugs (for example, the Flash player incorrectly used memcpy where they should have used memmove, and this resulted in problems with audio in Flash movies when some optimizations where done to the C library; this has since been fixed).

But what is the most amazing example of stability is the X Window System. This graphical user interface system is famous for having a client/server structure and being network transparent: you can have ‘clients’ (applications) run on a computer and their user interface appear on another computer (where the X server is running). X clients and server communicate using a protocol that is currently at version 11 (X11) and has been stable for 25 years.

The first release of the X11 protocol was in 1987, and an application that old would still play fine with an X11 server of today, even though, of course, it wouldn't be able to exploit any of the more advanced and sophisticated features that the servers and protocol have been extended with. The heck, Linux didn't even exist 25 years ago, but X.org running on Linux would still be perfectly able to support an application written 25 years ago. How's that for stability?

If the three core components of a Linux desktop operating system have been so stable, why can Miguel de Icaza talk about “deprecating APIs”, “removing functionality” and “replacing core subsystem”, and still be right? The answer is that, of course, there have been some very high-profile cases where this has happened.

The prime example of such developer misbehavior is given by GNOME, a desktop environment, something that sits on top of the graphical subsystem of the operating system (X, in the Linux case) and provides a number of interfaces for applets and applications to present a uniform and consistent behavior and graphical appearance, and an integrated environment to operate in.

Applications can be written for a specific desktop environment (there are more than one available for Linux), and for this it's important for the desktop environment (DE, for short) to provide a stable interface. This has not been the case with GNOME. In fact, the mentioned CADT expression was invented specifically for the way GNOME was developed.

We can now start to see why Linus Torvalds was so pissed off at Miguel de Icaza in the mentioned debate: not only the Linux kernel is one of the primary examples of (external) interface stability, so trying to trace CADT to Linus is ridiculous, but GNOME, of which Miguel de Icaza himself has been a prominent figure for a long time, is the primary example of interface instability.

The «we» Miguel uses to refer to the open source and Linux community as a whole now suddenly sounds like an attempt to divert the blame for a misbehavior from the presenter of the argument itself to the entire community, a generalization that has no basis whatsoever, and that most of all can't call for Linus as being the exemplum.

Ubuntu the Breaker

Of course, the GNOME developer community is not the only one suffering from CADT, and in this Miguel is right. Another high-profile project that has had very low sensitivity to the problem of backwards compatibility in the name of “the new and the shiny” is Ubuntu.

This is particularly sad because Ubuntu started with excellent premises and promises to become the Linux distribution for the ‘common user’, and hence the Linux distribution that could make Linux successful on the desktop. And for a few years, it worked really hard, and with some success, in that direction.

But then something happened, and the purpose of Ubuntu stopped being to provide a solid desktop environment for the ‘common user’, and it started being the playground for trying exciting new stuff. However, the exciting new stuff was brought forward without solid transition paths from the ‘old and stable’, with limited if any backwards compatibility, and without any solidification process that would lead the exciting new stuff to actually be working before gaining widespread usage.

This, for example, is the way PulseAudio was brought in, breaking everybody's functioning audio systems, and plaguing Ubuntu (and hence Linux) with the infamous idea of not having a working audio system (which it still has: ALSA). Similar things happened with the other important subsystems, such as the alternatives to the System V init traditionally used (systemd and upstart); and then with the replacement of the GNOME desktop environment with the new Unity system; And finally with the ‘promise’ (or should we say threat) of an entirely new graphical stack, Wayland, to replace the ‘antiquate’ X Window System.

It's important to note that none of these components are essential to a Linux desktop system. But since they've been forced down the throat of every Ubuntu user, and since Ubuntu has already gained enough traction to be considered the Linux distribution, a lot of people project the abysmal instability of recent Ubuntu developments onto Linux itself. What promised to be the road for the success of Linux on the desktop became its worst enemy.

Common failures: getting inspiration on the wrong side of the pond

There's an interesting thing common to the persons behind the two highest-profile failures in interface stability in the Linux world: their love for proprietary stuff.

Miguel de Icaza

Miguel de Icaza founded the GNOME project (for which we've said enough bad things for the moment), but also the Mono project, an attempt to create an open-source implementation of the .NET Framework.

His love for everything Microsoft has never been a mystery. Long before this recent rant, for example, he blamed Linus for not following Windows example of a stable (internal) kernel ABI. At the time, this was not because it ‘set the wrong example’ for the rest of the community, but because it allegedly created actual problems to hardware manufacturer that didn't contribute open source drivers, thereby slowing down Linux adoption due to missing hardware support.

As you can see, the guy has a pet peeve with Linus and the instability of the kernel ABI. When history proved him wrong, with hardware nowadays gaining Linux support very quickly, often even before release, and most vendors contributing open source drivers (more on this later), he switched his rant to the risible claim that instability of the kernel ABI set ‘a bad example’ for the rest of the community.

It's worse than that, in fact, since the stability of the Windows kernel ABI is little more than a myth. First of all, there are at least two different families of Windows kernels, the (in)famous Win9x series and the WinNT series. In the first family we have Windows 95, Windows 95 OSR2, Windows 98, Windows ME (that family is, luckily, dead). In the second family we have the old Windows NT releases, then Windows 2000 (NT 5.0), Windows XP (NT 5.1), Windows Vista (6.0), Seven (6.1). And not only are the kernel families totally incompatible with each other, there are incompatibilities even within the same series: I have pieces of hardware whose Windows 98 drivers don't work in any other Win9x kernel, earlier or later, and even within the NT series you can't just plop a driver for Windows 2000 into Windows 7 and hope it'll work without issues, especially if it's a graphic driver.

However, what Windows has done has been to provide a consistent user space API (Win32) that essentially allows programs written for it to run on any Windows release supporting it, be it either of the Win9x family or of the WinNT family.

(Well, except when they cannot, because newer releases sometimes created incompatibilities that broke older Win32 applications, hence the necessity for things such as the “Windows XP” emulation mode present in later Windows releases, an actual full Windows XP install within Windows, sort of like WINE in Linux —and let's not talk about how the new Metro interface in the upcoming Windows 8 is going to be a pain for everybody. We'll talk about these slips further down.)

But WINE and Mono will be discussed later on in more details.

Mark Shuttleworth

Mark Shuttleworth is the man behind Canonical and ultimately Ubuntu. Rather than a Microsoft fan, he comes out more on the Apple side (which is where Miguel de Icaza seems to have directed his attention now too). It's not difficult to look at the last couple of years of Ubuntu transformations and note how the user interfaces and application behavior has changed away from a Windows-inspired one to one to mimics the Mac OSX user experience.

This is rather sad (someone could say ‘pathetic’), considering Linux desktops have had nothing to envy of Mac OSX desktops for a long time: in 2006 a Samba developer was prevented from presenting on his own computer, because it was graphically too much better than what Mac OSX had to offer at the time.

But instead of pushing in that direction, bringing progressive enhancements to the existing, stable base, Ubuntu decided to stray from the usability path and shift towards some form of unstable ‘permanent revolution’ that only served the purpose of disgruntling existing users and reducing the appeal to further increase its user base.

The number of Ubuntu derivatives that have started gaining ground simply by being more conservative about the (default) choice of software environment should be playing all possible alarm bells, but apparently it's not enough to bring Canonical back on the right track.

The fascination with proprietary systems

So why are such prominent figures in the open source figures so fascinated with proprietary operating systems and environments, be it Microsoft or Apple? That's a good question, but I can only give tentative answers to it.

One major point, I suspect, is their success. Windows has been a monopolistically dominant operating system for decades. Even if we only start counting from the release of Windows 95, that's still almost 20 years of dominion. And the only thing that has managed to make a visible dent in its dominance has been Apple's Mac OSX. There is little doubt that Apple's operating system has been considerably more successful than Linux in gaining ground as a desktop operating system.

While there's nothing wrong with admiring successful projects, there is something wrong in trying to emulate them by trying to ‘do as they do’: even more so when you actually fail completely at doing what they really did to achieve success.

Windows' and Mac OSX' success has been dictated (among other reasons which I'm not going to discuss for the moment) thanks to a strong push towards a consistency between different applications, and the applications and the surrounding operating systems. It has never been because of this or that specific aesthetic characteristic, or this or that specific behavior; it has been for the fact that all applications behaved in a certain way, had certain sets of common controls, etc.

This is why both operating systems provide extensive guidelines to describe how applications should look and behave, and why both operating systems provide interfaces to achieve such looks and behavior —interfaces that have not changed with time, even when they have been superseded or deprecated in favour of newer, more modern ones.

Doing the same in Linux would have meant defining clear guidelines for application behavior, providing interfaces to easily follow those guidelines and then keeping those interfaces stable. Instead, what both GNOME (initially under Miguel de Icaza's guide) and Ubuntu (under Mark Shuttleworth's guide) tried to do was try to mimic this or that (and actually worse: first this then that) behavior or visual aspect of either of the two other operating systems, without any well-defined and stable guideline, and without stable and consistent interfaces: they tried to mimic the outcome without focusing on the inner mechanisms behind it.

In the mean time, every other open source project whose development hasn't been dazzled by the dominance of proprietary software has managed to chug along, slowly but steadily gaining market share whenever the proprietary alternatives slipped.

One important difference between dominant environments and underdogs is that dominants are allowed to slip: dominants can break the user experience, and still be ‘forgiven’ for it. Microsoft has done it in the past (Ribbon interface anyone? Vista anyone?), and it seems to be bound to do it again (Metro interface anyone?): they can afford it, because they are still the dominant desktop system; Appple is more of an underdog, and it's more careful about changing things that can affect the user experience, but they still break things at times (not all applications written for the first Mac OSX release will run smoothly —or at all— on the latest one). But the underdogs trying to emulate either cannot afford such slips: if they're going to be incompatible as much as the dominants are, why shouldn't a user stick with the dominant one, after all?

Linux and the desktop

And this leads to the final part of this article, beyond a simple critique to Miguel de Icaza's article. Two important questions arise here. Can Linux succeed in the desktop? And: does it actually matter?

Does it matter?

There has been a lot of talking, recently, about whether the desktop operating system concept itself is bound to soon fall into oblivion, as other electronic platforms (tablets and ‘smart’ phones) raise into common and widespread usage.

There is a reason why the so-called ‘mobile’ or ‘touch’ interfaces have been appearing everywhere: the already mentioned Metro interface in Windows 8 is a bold move into the direction of convergence between desktop and tablet and mobile interfaces; Mac OSX itself is getting more and more similar to iOS, the mobile operating system Apple uses on its iPods and iPads; even in the Linux world, the much-criticized Unity of the latest Ubuntu, and its Gnome Shell competitor, are efforts to build ‘touch-friendly’ user interfaces.

Unsurprisingly, the one that seems to be approaching this transition better is Apple; note that this is not because the Mac OSX and iOS user interfaces are inherently better, but simply because the change is happening gradually, without ‘interface shocks’. And there are open source projects that are acting in the same direction the same way, even though they don't try to mimic the Apple interface particularly.

The most significant example of an open source project that is handling the desktop/touch user interface convergence more smoothly is KDE, a desktop environment that in many ways has often tried (albeit sadly not always successfully) to be more attentive of the user needs. (In fact, I'd love to rant about how I've always thought that KDE would have been a much superior choice to GNOME as default desktop environment for Ubuntu, and about how history has proven me right, but that's probably going to sidetrack me from the main topic of discussion.)

If everything and everyone is dropping desktops right and left and switching to phones and tablets, does it really matter if Linux can become ‘the perfect desktop operating system’ or not?

I believe it does, for two reasons, a trivial one and a more serious one.

The trivial reason is that Linux, in the sense of specifically the Linux kernel, has already succeeded in the mobile market, thanks to Android, which is built on a Linux kernel. I'm not going to get into the debate on which is better, superior and/or more successful between Android and iOS, because it's irrelevant to the topic at hand; but one thing can be said for sure: Android is successful enough to make Apple feel threatened and to let them drop to the most anticompetitive practices and underhanded assaults they can legally (and economically) afford to avert such threat.

But there is a reason why the success of an open Linux system is important: when the mobile and tablet crazes will have passed, people will start realizing that there were a lot of useful things their desktops could do that their new systems cannot do.

They'll notice that they can't just plug a TV to their iPad and watch a legally-downloaded movie, because the TV will not be ‘enabled’ for reproduction. They'll start noticing that the music they legally bought from online music stores will stop playing, or just disappear. They'll notice that their own personal photos and videos can't be safely preserved for posterity.

They will start noticing that the powerful capability of personal computers to flatten out the difference between producer and consumer has been destroyed by the locked-in systems they've been fascinated by.

The real difference between the information technology up to now and the one that is coming is not between desktop and mobile: it's between open and locked computing.

Up until now, this contrast has been seen as being about “access to the source”, ‘proprietary’ software versus open source software. But even the closed-source Windows operating systems allows the user to install any program they want, and do whatever they want with the data; at worst it allowed you to replace the operating system with a different one.

This is exactly what is changing with the mobile market: taking advantage of the perception that a tablet or smartphone is not a computer, vendors have built the systems to prevent users from installing arbitrary software and doing whatever with their data. But the same kind of constraints are also being brought onto the desktop environment. This is where the Mac OSX market comes from, this is why Microsoft is doubling their efforts to make Windows 8 unreplaceable on hardware that wants to be certified: Secure Boot is a requirement on both mobile and desktop system that want to claim support for Windows 8, and on the classical mobile architecture (ARM), it must be implemented in such a way that it cannot be disabled.

Why this difference between ARM and non-ARM? Because non-ARM for Windows means the typical Intel-compatible desktop system, and this is where the current Linux distributions have waged wars against the Secure Boot enforcement.

And this is specifically the reason why it's important for an open system to be readily available, up to date and user-accessible: it offers an alternative, and the mere presence of the alternative can put pressure on keeping the other platforms more open.

And this is why the possibility for Linux to succeed matters.

Can Linux succeed?

From a technical point of view, there are no significant barriers for a widespread adoption of Linux as a desktop operating system. The chicken and egg problem that plagued it in the beginning (it doesn't have much support, so it doesn't get adopted; it's not adopted, so it doesn't get much support), in terms of hardware support, has long been solved. Most hardware manufacturer acknowledge its presence, and be it by direct cooperation with kernel development, be it by providing hardware specifications, be it by providing closed, ‘proprietary’ drivers, they allow their devices to be used with Linux; even though Linux is far from being the primary target for development, support for most hardware comes shortly if not before the actual hardware is made available.

There are exceptions, of course. NVIDIA, for example, is considered by Linus Torvalds the single worst company they've ever dealt with, due to their enormous reticence in cooperating with open source. The lack of support (in Linux) for the Nvidia Optimus dual-card system found in many modern laptops is a result of this attitude, but Linus' publicity stunt (“Fuck you, Nvidia!”) seems to have moved things in the right direction, and Nvidia is now cooperating with X.org and kernel developers to add Optimus support in Linux.

In terms of software, there are open source options available for the most common needs of desktop users: browsers, email clients, office suites. Most of these applications are in fact cross-platform, and there are versions available also for Windows and Mac OSX, and the number of people using them on those operating systems is steadily growing: for them, a potential transition from their current operating system to Linux will be smoother.

Some more or less widespread closed-source applications are also available: most notably, the Skype VoIP program (even though its recent acquisition by Microsoft has been considered by some a threat for its continuing existence in Linux) and the Opera web browser.

The WINE in Linux

There are, however, very few if any large commercial applications. A notable exception is WordPerfect, for which repeated attempt were made at a Linux version. Of the three attempts (version 6, version 8, and version 9), the last is a very interesting one: rather than a native Linux application, as was the case for the other two versions, Corel decided to port the entire WordPerfect Office suite to Linux by relying on WINE, an implementation of the Win32 API that tries to allow running Windows program under Linux directly.

The choice of using WINE rather than rewriting the applications for Linux, although a tactically sound strategy (it made it possible to ship the product in a considerably short time), was considered by many a poor choice, with the perception that it was a principal cause in the perceived bloat and instability of the programs. There are however two little-known aspects of this choice, one of which is of extreme importance for Linux.

First of all, WordPerfect Office for Linux was not just a set of Windows applications that would run under WINE in an emulated Windows environment: the applications where actually recompiled for Linux, linking them to Winelib, a library produced by the WINE project specifically to help port Windows applications to Linux. The difference is subtle but important: a Winelib application is not particularly ‘less native’ to Linux than an application written specifically to make use of the KDE or GNOME libraries. Of course, it will still look ‘alien’ due to its Windows-ish look, but no less alien than a KDE application on a GNOME desktop or vice versa, especially at the time (2000, 2001).

The other important but little-known aspect of Corel's effort is that it gave an enormous practical push to the WINE project. At the time when the port was attempted, the WINE implementation of the Win32 API was too limited to support applications as sophisticated as those of the WordPerfect Office suite, and this led Corel to invest and contribute to the development of WINE. The result of that sponsorship are quite evident when the status of WINE before and after the contribution is considered. Since Corel was trying their own hand at distributing Linux itself with what was later spun off as Xandros, the improved WINE was to their benefit more than just for the ability to support the office suite.

In the Linux world, WINE is a rather controversial project, since its presence is seen as an obstacle to the development of native Linux applications (which in a sense it is). However, I find myself more in agreement with the WINE developers, seeing WINE as an opportunity for Linux on the desktop.

It's not hard to see why. Desktop users mostly don't care about the operating system; they could be running PotatoOS for all they care, as long as it allows them to do what they want to do, and what they see other people doing. What users care about are applications. And while it's undoubtedly true that for many common applications there are open source (and often cross-platform) alternatives that are as good when not better of the proprietary applications, there are still some important cases where people have to (or want to) use specific applications which are not available for Linux, and possibly never will. This is where WINE comes in.

Of course, in some way WINE also encourages ‘laziness’ on the part of companies that don't want to put too much effort in porting their applications to Linux. Understandably, when Linux support is an afterthought it's much easier (and cheaper) to rely on WINE than to rewrite the program for Linux. And even when getting started for the first time, it might be considered easier to write for Windows and then rely on WINE that approaching cross-platformness with some other toolkit, be it using Qt, whose licensing for commercial applications makes it a pricey option, be it using GTK, whose Windows support is debatable at best, be it using wxWidgets, one of the oldest cross-platform widgets, or any less-tried option. In some sense, the existence of WINE turns Win32 into a cross-platform API, whose Windows support just happens to be extremely superior to that of other platforms.

It's interesting to observe that when LIMBO was included in the cross-platform Humble Indie Bundle V, it caused a bit of an uproar because it wasn't really cross-platform, as it relied on WINE. Interesting, Bastion, that builds on top of the .NET Framework (and thus uses the aforementioned Mono on Linux), didn't cause the same reaction, despite being included in the same package. Yet, to a critical analysis, an application written for the .NET Framework is not any more native to Linux than one written for the Win32 API.

If anything, the .NET Framework may be considered “not native” in any operating system; in reality, it turns out to be little more than a different API for Windows, whose theoretical cross-platformness is only guaranteed by the existence of Mono. It's funny to think that Mono and its implementation of the .NET Framework is seen in a much better light than WINE and its implementation of the Win32 API, even though under all respects they are essentially the same.

The lack of commercial applications

In some way, what Miguel de Icaza's rant tried to address was specifically the problem of the missing commercial applications, on the basis that no applications implies no users, and therefore no success on the desktop market. While the instability of the interfaces of some high-profile environments and the multitude of more or less (in)compatible distributions are detrimental and discouraging for commercial developers, the overall motivations are much more varied.

There is obviously the software chicken and egg problem: Linux doesn't get widespread adoption due to the lack of applications, applications don't support Linux because it doesn't have widespread adoption.

Another important point is the perceived reticence of Linux users to pay for software: since there are tons of applications available for free, why would Linux users think about buying something? Windows and Mac OSX users, on the other hand, are used to paying for software, so a commercial application is more likely to be bought by a Windows or Mac user than by a Linux user; this further reduces the relevance of the potential Linux market for commercial interests.

This line of reasoning is quite debatable: the Humble Indie Bundle project periodically packages a number of small cross-platform games which users can buy by paying any amount of their choice, and the statistics consistently show that even though there are significantly more bundles sold for Windows than for Linux, the average amount paid per platform is distinctly higher in Linux than in Windows: in other words, while still paying what they want, Linux users are willing to pay more, on average, which totally contradicts the perception about Linux users and their willingness to pay.

There's more: if piracy is really as rampant in Windows (starting from the operating system itself) as many software companies want us to believe, one should be led to think that Windows users are not particularly used to pay: rather, they are used to not paying for what they should be paying, in sharp contrast with Linux users who would rather choose applications and operating systems they don't actually have to pay for. In a sense, choosing free software rather than pirated one becomes an indicator of user honesty, if anything. But still, the perception of Linux users as tightwads remains, and hinders the deployment of applications for the platform.

It's only at this point, in third place so to say, that technical reasons, such as the instability of interfaces or the excessive choice in distributions and toolkits, become an obstacle. Should we target Linux through one of the existing cross-platform toolkits, or should we go for a distinct native application? Should this be targeting a specific desktop environment? And which toolkit, which desktop environment should be selected?

However, the truth is that these choices are not really extremely important. For example, Skype simply opted for relying on the Qt toolkit. Opera on the other hand, after various attempts, decided to go straight for the least common denominator, interfacing directly with Xlib. And of course, for the least adventurous, there's the possibility to go with WINE, in which case just contributing to WINE to help it support your program might be a cheaper option than porting your program to Linux; this, for example, is the way Google decided to go for Picasa.

Finally, of course, there are applications that will never be ported to Linux, Microsoft Office being a primary example of this. And for this there is sadly no direct hope.


There is one final issue with Linux on the desktop, and this is pre-installation. Most users won't go out of their way to replace the existing operating system with some other, because, as already mentioned, users don't usually care about the operating system.

This is the true key for Linux to the desktop: being the default operating system on machines as they are bought. However, none of the major desktop and laptop computer companies seem particularly interested in making such a bold move, or even make an official commitment at having full Linux support for their hardware.

One notable exception in this has been Asus, whose Eee PC series initially shipped with Linux as the only option for the operating system, even though later strong-arming from Microsoft led to it shipping both Linux and Windows machines (with the Windows ones having inferior technical specifications to comply with Microsoft's request that Windows machine shouldn't cost more than Linux ones).

It's interesting, and a little sad, that there are vendors that sell desktops, laptops and servers with Linux preloaded (see for example the Linux Preloaded website). The question is: why don't major vendors offer it as an option? And if the do, why don't they advertise the option more aggressively?

I doubt it's a matter of instability. It wouldn't be hard for them to limit official support to specific Linux distributions and/or versions that they have verified to work on their hardware: it wouldn't be different than the way they offer Windows as an option. And this makes me suspect that there's something else behind it; is Microsoft back to their tactics of blackmailing vendors into not offering alternatives?

Get Rid Of CD Images

A recent message on the Ubuntu mailing list proposing to drop the “alternate CDs” for the upcoming release of Ubuntu (12.10) had me thinking: why are modern Linux distributions still packaged for install in the form of CD images?

There are obvious historical reasons for the current situation. CDs and DVDs are the most common physical support to distribute operating systems, and Linux is no exception to this. Before broadband Internet got as widespread as it is today, the only or the cheapest way to get Linux was to buy an issue of a Linux magazine, as they commonly offered one or more CDs with one or more Linux distributions to try out and/or install.

Today, “install CDs” are still the most common, or one of the most common ways, to do a clean Linux install, even though one doesn't usually get them from a Linux magazine issue, but rather by downloading a CD image (.iso) from the Internet and then either burning it on an actual CD or putting it on a USB pendrive. In fact, I'd gather that using a pendrive is more common than actually burning a CD these days, and there's even a website dedicated to the fine art on using a pendrive to hold one or more install images for a number of different Linux distributions.

In contrast to Windows installation media, Linux install images, be they on CD or on pendrives, are useful beyond the plain purpose of installing the operating system: most of them are designed in such a way that they can also be used ‘live’, providing a fully-functional Linux environment that can be used, for example, as a rescue system (there are, in fact, images that are tuned for this), or just to try out the latest version of the operating system without actually installing anything (useful to test hardware compatibility or whatnot).

This is actually one more reason why pendrives are a superior option to CDs. Not only they are usually faster, and being rewritable can be reused when a new install comes out (and you are not left with hundreds of old CDs laying around): since they are not read-only they can be used to store permament data (and customizations) on the pendrive when the image is used as a ‘live’ system.

Finally, CDs (and CD images) have obvious, significant size constraints: This, for example, is one of the reasons why Debian, from which Ubuntu is derived, recently replaced with XFCE as the default desktop environemtn: the former was too large to fit on the (first) CD.

There is in fact one and only one reason I can think of why a CD would be preferrable to a pendrive, and that's when you want to install Linux on a machine that is so old that its BIOS doesn't allow booting from USB disks (or on a machine that doesn't have USB ports at all). So why would a modern Linux distribution (that probably won't even run on such an old system) still design its install images around CDs?

I suspect that, aside from legacy (“we've always done it like this”, so habit, toolchains, and general knowledge is geared towards this), one of the most important reasons is simplicity: a user can download an .iso image, double-click on the downloaded file and, whatever their current operating system (Windows, Mac OS X, Linux), they will be (probably) offered the option to burn the image on CD. That is, of course, assuming your computer does have a CD or DVD burner, which a surprisingly high number of more recent computers (esp. laptops) don't actually have.

By contrast, using a pendrive requires tools which are usually not as readily available on common operating systems (and particularly on Windows), or available but non-trivial to use (for example the diskimage utility on Mac OS X). Compare for example the instructions for creating a USB image for Ubuntu, and compare them with [the ones about CD installs]. Or consider the existence of websites such as the aforementioned PenDriveLinux, or LiLi (the latter being geared specifically towards the set up of live systems on USB keys).

There is also the matter of price: blank CDs and DVDs are still cheaper, per megabyte, than a USB flash disk, with prices that hover around 10¢ per CD and 50¢ per DVD. By contrast, the cheapest 1GB USB flash drives go for 1€/piece when bought in bulk: much more expensive for physical mass distribution.

The situation is a little paradoxical. On the one hand, USB would offer the chance to reduce the number of images, but on the other hand it would make downloads heavier, making physical distribution more convenient.

Consider this: a distribution like Ubuntu currently makes available something between 5 and 10 different installation image: there's the standard one, the alternative text-based (which they want to get rid of), the server one and the ‘Cloud’ one, for each of the available architectures (Intel-compatible 32-bit and 64-bit at least, without counting the ones that are also available for ARM); and then there's the localized DVD images.

A CD image is between 200 and 300MB smaller than the smallest USB flash drive that can hold it (CDs hold about 700MB, USB keys come in either 512MB or 1GB sizes); since CD images often share most of the data (especially if they are for the same architecture), an image could be specifically prepared to fit a 1GB USB disk and still be equivalent to two or three different images, and maybe even more, for example offering the functionality of both the standard and alternative CDs, or the server and Cloud images; I wouldn't be surprised if all four of them could fit in 1GB, and I'm sure they can all fit in a 2GB image.

Of course, this would mean abandoning the CD as a physical media for physical distribution, since such an image could only be placed on a USB disk or burned on DVD. It would also mean that downloading such an image would take more time than before (at the expense of those who would be satisfied with the feature of a single disk, but in favour of those that found themselves needing two different images).

If I remember correctly, Windows 98 was the first PC operating system that required a CD drive, since it shipped in the form of a floppy plus bootable CD (in contrast to the ridiculously large amount of floppies its predecessor came in). Modern Windows, as far I know, come in DVD format. I think it's time for modern Linux distributions to go beyond plain CDs as well, but in a smarter way than just “ok, we'll go to DVDs”.

I know that Linux can be made to run in the stranges of places, and it has been often used to revive older systems. But the question is: does the particular distribution you're still designing CD installation images for actually support such old systems? As an example, I think of two tightly related, but still significantly different, Linux distributions: Debian and Ubuntu. It may make sense to have a CD installation image for Debian, but I really don't see why it would be useful to have one for Ubuntu. Why?

Ubuntu is designed to be easily installable by the most ignorant user, its installer is heavily designed around graphic user interfaces, and the desktop environment it installs requires a decent video card: can a user without particular knowledge really install it (from CD) and use it on a system that doesn't have a DVD drive and doesn't support for booting from USB disks? I doubt it.

On the other hand, Debian is a ‘geekier’ distribution, it has a less whizzy installer, and by default it installs a much lighter desktop environment, if you really want to install one. You're much more likely to have success in running it on older hardware ‘out of the box’, or with minimal care during installation. A CD image to install this on such a system is a sensible thing to have; even a floppy disk to allow installation on systems that have a CD drive but can't boot from it would be sensible for Debian. For Ubuntu? Not so much.

Now, Ubuntu development these days seems to be all about revolutionizing this or that part of the user experience or the system internals: Ubiquity, Unity, Wayland, you name it, often with debatable results. Why don't they think about revolutionizing the installation media support instead? (Actually, I think I know: not enough visual impact on the end user.)

Printers, costs and content distribution

I've recently come across a rant about printers. The rant mixes in a number of complaints ranging from technology (e.g. paper jamming) to economics (e.g. the cost of ink cartridges). Since it's a rant, it should probably not be considered too serious, but it does raise some interesting points that I'd like to discuss. Of course, I'm going to skip on the obviously ridiculous complaints (such as the need to get out of bed), and go straight to the ones worth answering.

Let's start easy.

The first complaint is about all the stuff that has to be done before you can actually get to printing, such as having to plug the printer into the laptop and wait for startup times and handshakes and stuff like that.

This is actually an easy matter to solve, and that has been addressed by people that replied to the rant: get a network printer. The printer we have at home is not networked, but we have a small server, and the printer is plugged there and shared to all computers our small home network.

The paper jam complaint also doesn't carry much weight. As far as I know, even professional printing equipment jams from time to time, although of course miniaturized mechanisms such as the ones found on desktop printers are bound to jam more easily. The frequency of jams could probably be reduced by using better materials, but I doubt they could be eliminated altogether.

The biggest complain is, obviously, about ink cartridges and their cost (which the author of the rant hyperbolicly, but not even that much, states as being ten times more than diesel fuel). The author of the rant also remarks (correctly) that this is

how Epson, Lexmark, Canon, Brother et al. make money: They make shitty low-end printers the break easily so they need to be regularly replaced and make the ink cost ten times more than diesel fuel (a hyperbole that is close to accurate, btw) so they can have a steady flow of cash from those printers that do work.

Except that printer manufacturers don't make shitty low-end printers only; they also have mid-range and high-end products. So one would be taken to wonder, if the author of the rant is aware of this, why doesn't he invest a little more money in getting a better product that is going to give him less problems?

Getting a complex piece of hardware like a printer for less than 50€ (including the famous “initial cartridges”) and expecting it to last forever, with cheap consumables to go with it, is naive at best: the cheap printers are obviously sold at a loss, with money recouped by the sale of consumables. It's up to the buyer to be aware (and wary) of the mechanism, and choose accordingly, because there are alternatives; of course, they do require a higher initial investment.

The author of the rant goes on to compare the printer matter with video rental:

It’s bullshit the way Blockbuster was bullshit with late fees and poor customer service and high rental prices and yearly membership fees. Remember how Netflix and it’s similar services worldwide practically destroyed them? I really want some hipster engineer at Apple or Microsoft or anywhere to make a printer that Netflixes the fuck out of the consumer printing market.

The comparison, I'm afraid, is quite invalid. I'm not going to discuss the Blockbuster vs Netflix thing in detail, although I will mention that the Blockbuster model was not ‘bullshit’ (remember, you have to compare renting a DVD with buying a movie or going to the cinema; also, at least in my country Blockbuster had no membership fees, although the late fees were outrageous): it has been, however, obsoleted by the Netflix model.

The chief difference between the Blockbuster and Netflix models? The difference is that Blockbuster was on demand, while Netflix is a subscription. But is a subscription model intrinsically superior to an on-demand model?

The answer is, I'm afraid, no. The convenience (for the user) of one model over the other is entirely related to the cost ratio of the two services compared to the amount of service ‘use’ they get. If I only watch a movie once every two or three months, I'm much better off renting stuff one-shot when needed and then forget about shelling any money for the rest of my time. (Of course, most people watch way more movies per month, hence a subscription is usually better.)

Now let's think about this for a moment: the way Netflix disrupted Blockbuster was by offering a subscription service that was more convenient than the on-demand service offered by Blockbuster, for a lot of people.

The question is: would such a model really be applicable to printing? How would it even work? For 10 bucks a month you get a new (used) printer delivered to your door?

In fact, I really think that the current cheap printing business is the closest you can get to a subscription model, considering the enormous differences that exist between simple content distribution (which is what Blockbuster and Netflix do) and the production of complex devices such as printers.

In other words, the “Netflix revolution” has already happened in the printing business, except that it went in totally the opposite way for the consumer, while still being extremely profitable for the provider (as most subscription models are).

So what the author of the rant should probably aim at is to break out of the subscription model and go back to something more on demand. Can this be achieved?

There are many ways to work in that direction.

The cheaper way is to make heavy use of the various DIY cartridges refill kits, or referring to the knock-off or ‘regenerated’ cartridges instead of buying the official ones. However, most people probably know by experience that the quality of prints degrades in this case, something which in my opinion shows how there is actually a reason why ink cartridges are expensive, regardless of how overcharged they are.

Another, possibly smarter but more expensive (especially in the short run) solution has already been mentioned: don't buy a 30€ printer every six months; buy a mid-range printer and save in the long run. You can get professional or semi-professional color laser networked multifunction (including fax/scan capability) printers for around 500€. If you're willing to sacrifice on wireless and fax, even less than that. The toner cartridges aren't that more expensive than the ink ones (esp. in terms of price/pages), and they are much more durable (no more throwing away a cartridge because you didn't use the printer for a month and the ink dried up).

And finally, Print On Demand. You send them the file, they mail you the printed stuff. I've always been curious about this particular kind of service, and I see it making a lot of sense for some very specific cases. But I doubt the domestic use cases the author of the rants probably based his rant on would fit in this.

High resolution

When I got my first laptop about 10 years ago, I wanted something that was top of the line and could last several years with a modicum of maintenance. I finally opted for a Dell Inspiron 8200 that I was still using in 2008, and whose total price, considering also the additional RAM, new hard disk, replacement batteries and replacement cooling fans I opted or had to buy the course of that long period, was in the whereabouts of 3k€, most of which was the initial price.

One of the most significant qualities of that laptop, and the one thing that I miss the most still today, was the display, whose price accounted for something like half of the money initially spent on the laptop. We're talking about a 15" matte (i.e. non-glossy) display at a 4:3 aspect ratio, with 1600×1200 resolution (slightly more than 133 dots per linear inch), 180 cd/m² brightness at maximum settings. In 2002.

When six years later I had to get a new one for work reasons (the GeForce 2 Go on that machine was absolutly not something you could use for scientific computing, whereas GPGPU was going to be the backbone of all of my future work), I was shocked to find out there was no way to get my hands on a display matching the quality of what I was going to leave.

This was not just a matter of price: there were simply no manufacturer selling laptops with a high-quality display like the one of my old Dell Inspiron 8200. All available displays had lower resolution (110, 120 dpi top) and the only one that provided matte displays was Apple (that offered the option with a 50€ addition to the price tag).

I can tell you that dropping from the aptly named TrueLife™ Dell display to the crappy 1280×800 glossy display featured on my current laptop (an HP Pavilion dv5, if anybody is interested) was quite a shock. Even now, four years after the transition, I still miss my old display; and how could I not, when I see myself mirrored on top of whatever is on screen at the time, every time I'm not in perfect lighting conditions, i.e. most of the time?

Not that the desktop (or stand-alone) display was going any better: somehow, with the ‘standardization’ of the 1920×1080 as the ‘full HD’ display resolution (something that possibly made sense in terms of digital TV or home-theater digital media like DVD or BluRay discs, but is purely meaningless otherwise), all monitor manufacturer seemed to have switched to thinking ‘ok, that's enough’.

The dearth of high-resolution, high-quality displays has always surprised me. Was there really no market for them (or not enough to justify their production), or was there some other reason behind it? What happened to make the return-on-investment for research in producing higher quality, higher density, bright, crisp displays too low to justify ‘daring’ moves in that direction? Did manufacturers switch to just competing in a downwards spiral, going for cheaper components all around, trying to scrape every single possible cent of margin, without speding anything in research and development at all?

Luckily, things seem to be changing, and the spark has been the introduction of the Retina display in Apple products.

The first time I heard the rumors about a Retina display being available on the future MacBook Pro, I was so happy I even pondered, for the first time in my life, the opportunity to get an Apple product, despite my profound hate and despise for the company overall (a topic which would be off-topic here).

My second, saner thought, on the other hand, was a hope that Apple's initiative would lead to a reignition of the resolution war, or at least push the other manufacturers to offer high-resolution displays again. Apparently, I was right. Asus, for example, has announced a new product in the Transformer line (the tablets that can be easily converted into netbooks) with a 220 dpi resolution; Dell has announced that HD displays (1920×1080) on their 15.4" products will be available again.

How I wish Dell had thought about that four years ago. My current laptopt would have been a Dell instead of an HP. Now however, as the time for a new laptop approaches, I've grown more pretentious.

How much can we hope for? It's interesting to note that the Apple products featuring the higher resolution displays actually have a decreasing pixel density, with the iPhone beating the iPad beating the just-announced MacBook Pro. This should not be surprising: when the expected viewing distance between user and device is taken into consideration, the angular pixel density is probably the same across devices (see also the discussion about reference pixels on the WWW standard definition of CSS lengths).

However, as good as the Retina is, it still doesn't match the actual resolution of a human retina. Will the resolution war heat up to the point where devices reach the physical limits of the human eyes? Will we have laptop displays that match printed paper? (I don't ask for 600 dpi or higher, but I wouldn't mind 300 dpi).

An Opera Requiem?

Opera has been my browser of choice for a long time. As far as I can recall, I started looking at it as my preferred browser since version 5, the ad-supported version, released around the turn of the millennium. It wasn't until two or three versions later, however, that I could stick to it as my primary browser, given the number of websites that had troubles with it (some due to bugs in Opera, but most often due to stupidity from the web designer and their general lack of respect for the universality of the web).

While a strong open-source supporter, I've always considered myself a pragmatist, and the choice for Opera as my preferred browser (together with other software, such as WordPerfect, which is off topic here) has always been a clear example of this attitude of mine: when a close-source proprietary software is hands-down superior to any open-source alternative, I'd rather use the closed-source software, especially when it's free (gratis).

One of the reasons why I've always preferred Opera is that it has been the pioneer, when not the inventor, of many of the web technologies and user interface choices that are nowadays widely available in more popular browsers. I've often found myself replying with “oh, they finally caught up with features Opera has had for years?” when people started boasting of “innovative” features like tabs being introduced in browsers such as Firefox.

Opera was the first browser to have decent CSS support, and except for brief moments in history, has always been leading in its compliance to the specification (to the point of resulting in ‘broken’ rendering when CSS was written to fit the errors in other, more common, implementations).

Opera has sported a Multiple Document Interface (a stripped-down version of which is the tabbing interface exposed by all major browsers today) since its earlier release.

Opera has had proper support for SVG as an image format before any other browser (still today, some browsers have problems with it being used e.g. as a background image defined by CSS).

Opera has a much more sophisticated User JavaScript functionality than that offered by the GreaseMonkey extension in Firefox. (This is partly due to the fact that the same technology is used to work around issues in websites that autodetect Opera and send it broken HTML, CSS or JavaScript.)

In fact, Opera hasn't had any support for extensions until very recently, since it managed to implement most of the features provided by extensions to other browsers, in a single package, while still remaining relatively lightweight in terms of resource consumption and program size.

When the Mozilla suite was being re-engineered into separate components (Firefox the browser, Thunderbird for mail and news, etc) in the hopes of reducing bloat, Opera managed to squeeze all those components in a single application that was smaller and less memory-hungry than anything that ever came out of Mozilla.

When Firefox stopped supporting Windows 98, Opera could still run (even if just barely) on a fairly updated Windows 95 on quite old hardware. When Firefox started having troubles being built on 32-bit systems, Opera still shipped a ridiculously large amount of features in an incredibly small package. Feed discovery? Built-in. Navigation bar? Built-in. Content blocking? Built-in. Developer tools? Built-in.

Opera pioneered Widgets. Opera pioneered having a webserver built in the browser (Opera Unite). I could probably go on forever enumerating how Opera has constantly been one when not several steps ahead of the competition.

Additionally, Opera has long been available on mobile, in two versions (a full-fledged browser as well as a Mini version); and even the desktop version of the browser has the opportunity to render things as if it were on mobile (an excellent feature for web developers). Opera is essentially the only alternative to the built-in browser of the N900 (or more in general for Maemo). Opera is also the browser of the Wii, and the browser present in most web-enabled television sets.

Despite its technical superiority and its wide availability, Opera has never seen a significant growth in user share on the desktop, consistently floating in the whereabouts of a 2% usage worldwide (that still amounts to a few hundred million, possible more than half a billion, users).

I'm not going to debate on the reasons for this lack of progress (aside from mentioning people's stupidity and/or laziness, and marketing), but I will highlight the fact that Opera users can be considered “atypical”. Although I'm sure some of them stick to the browser for the hipster feeling of using something which is oh so non-mainstream, I'd say most Opera aficionados are such specifically because of its technical quality and its general aim towards an open, standard web.

Although the overall percentage of Opera users has not grown nor waned significantly in the last ten years or so, it's not hard to think of scenarios that would cause an en masse migration away from the browser (sadly, there aren't as many scenarios where people would migrate to it).

One of these scenarios is Opera being bought by Facebook, a scenario that may become a reality if the rumor that has been circulating these past days has any merit. I've even been contacted about this rumor by at least two distinct friends of mine, people that know me well as an Opera aficionado and Facebook despiser.

One of them just pointed me to a website discussing the rumor in an email aptly titled “Cognitive dissonance in 3, 2, 1, …”. To me, the most interesting part of that article were the comments, with many current Opera fans remarking how that would be the moment they'd drop Opera to switch to some other browser.

These are exactly my own feelings, in direct contrast with what the other friend of mine claimed with a Nelson-like attitude (“ah-ha, you'll become a ‘facebooker’ even against your will”), pointing to another website discussing the same rumor. But that would be like saying that I'd become a Windows user because of the Nokia/Microsoft deal, while I've just ditched Nokia for good.

In fact, an Opera/Facebook deal would have a lot of similarities with the Nokia/Microsoft deal that has sealed Nokia's failure in the smartphone market. For Facebook (resp. Microsoft), getting their hands on an excellent browser (resp. hardware manufacturer) like Opera (resp. Nokia) is an excellent strategic move to enter a market that they would have (resp. have had) immense troubles penetrating otherwise; for Opera (resp. Nokia), on the other hand, and especially for its users, the acquisition would be (resp. has been) disastruous.

In many ways, Facebook on the web represents the opposite of what Opera has struggled for; where Opera has actively pursued an open, interoperable web based on common standards and free from vendor lock-in, Facebook has tried to become the de facto infrastructure of a ‘new web’, with websites depending on Facebook for logins and user comments, and where the idea itself of websites starts losing importance, when just the “Facebook page” is sufficient. This is essentially the server-side (‘cloud’) counterpart of what Microsoft was done years ago with the introduction of the ActiveX controls and other proprietary ‘web’ features in Internet Explorer.

I'm not going to spend too many words on how and why this is a bad thing (from the single point of failure to the total loss of control over content, its management and freedom of expression), but even just the rumor of Opera, a browser that was actually going in the opposite direction by providing everybody with a personal web server to easily share content with other people from their own machine, is enough to send chills down my spine.

It's interesting to note that Opera has halted development of their Unite platform (as well as their Widgets platform) citing resource constraints (between Unite, Widgets, and the recently introduced Extensions, they did look a little as if they were biting off more than they could chew). And unless there will be Opera extensions developed to match the features formerly provided by the Unite and Widgets platform, their end of life marks a very sad loss for the “more power to the user” strategy that has made Opera such an excellent choice for the cognoscenti.

There are forms of partnership possible between Facebook and Opera that could be designed to benefit both partners without turning up as a complete loss for one at the benefit of the other, in pretty much the same way as Opera has built partnership with many hardware vendors to ship their browser. And since these markets are exactly what Facebook is interested in, I doubt that they would settle for anything less than buying out Opera altogether.

On a purely personal level, I'm going to wait and see this rumor through. If it does turn out to be true, I'll have no second thoughts in considering the Opera I like at its end of life, and I'll start looking into alternatives more in tune with my personal choices of server- and client-side platforms. (And what a step would that be, from actually considering applying for a job at Opera!)

Judging from the comments to the rumor on the Opera fora, I'm not the only Opera user thinking along those lines. One would wonder if Opera would still be worth that much to Facebook after its market share suddenly drops to something around zero, but if the only thing Facebook is interested in is the market penetration obtained by Opera pre-installations on mobile devices and ‘smart’ TV sets, how much would they even care for the desktop users that actually switched to that browser by choice?

{ This article will be updated when the rumor will be definitely confirmed or refuted. }

The CSS filling menu, part 2

This is a pure
CSS challenge:
no javascript,
no extra HTML

Consider the container and menu from the first part of this challenge. The idea now is to find a solution that not only does what was require in the first part, but also allows wrapping around when the natural width of the menu is wider than the width of the container (which, remember is a wrapping container for other content: we don't want the menu to disturb its layout).

Of course, when the menu doesn't fit in a single line, we want each row to fill up the container nicely, and possibly we want the items to be distributed evenly across lines (for example, 4 and 4 rather than 7 and 1 or 6 and 2, or 3 and 3 and 2 when even more lines are necessary).

Can this be achieved without extra HTML markup hinting at where the splits should happen?

The CSS filling menu

This is a pure
CSS challenge:
no javascript,
no extra HTML

We have a container with a non-fixed width (for example, the CSS wrapping container from another challenge). Inside this container we have a menu (which can be done in the usual gracefully degrading unordered list), for which the number of items is not known (i.e. we don't want to change the CSS when a new item is added to the menu). The requirement is that the menu should be laid out horizontally, filling up the total width of the container, flexibly.

Extra credit

Additionally, we would like the menu entries to be uniformly spaced (we're talking here of the space between them, not the whitespace inside them, nor large enough borders), with the same spacing occurring all around the menu.

A (suboptimal?) solution

There is, in fact, a solution for this challenge. The idea is to use tables behind the scene, without actually coding tables into the HTML. If the menu is coded with the standard gracefully degrading unordered list, the ul element is set to display: table; width: 100% and the items are set to display: table-cell.

Note that this solution should not necessarily be considered ‘quirky’: the purpose of getting rid of tables from the HTML was to ensure that tables were not being used for layout, but only to display tabular content. There is no reason to not use the display: table* options to obtain table-like layouts from structural markup!

Additionally, we can also get the extra credit by using something like border-collapse: separate; border-spacing: 1ex 0, which is almost perfect, except for the fact that it introduces an extra spacing (1ex in this case) left and right of the menu. This can be solved in CSS3 using the (currently mostly unsupported) calc() operator, by styling the ul with width: calc(100% + 2ex); margin-left: -1ex.

Of course, in this case, to prevent the mispositioning of the menu in browsers that do not support calc, the margin is better specified as margin-left: calc(-1ex), that has exactly the same effect but only comes into action if the calc()ed width was supported as well.

The challenge proper

While I'm pretty satisfied with this solution, it's not perfect:

  • there are people that cringe at the use of tables, even if just in CSS form; a solution without CSS tables would thus be better;
  • when the container is smaller than the overall natural length of the menu, the menu will overflow instead of wrapping.

Note that without using the CSS tables, the uniform spacing is quite easy to set up (something as trivial as margin: 0 1ex for the entries, for example), but having the entries adapt their size to make the ul fill its container is rather non-trivial.

I'll actually consider the way to make it wrap nicely a different challenge.

The CSS shrinkwrapping container

This is a pure
CSS challenge:
no javascript,
no extra HTML

We have a container (e.g. an ‘outer’ div) and inside this container we have N boxes with constrained width (i.e. width or max-width is specified). We want to lay out the boxes side by side, as many as fit inside the viewport, and we want the outer container to wrap these boxes as tightly as possible (considering, of course, all padding and margins). The container (and thereby the collection of boxes inside it) should be centered inside the viewport.

The problem here is that we want to lay out the inner boxes (almost) without taking the outer box into consideration, and then lay out the outer box as if it had {width: 100%; margin-left: auto; margin-right: auto}.

A suboptimal solution

There is, in fact, a suboptimal solution for this challenge. The idea is to fix the container width based on the width necessary to wrap the actual number of boxes that would fit at a given viewport width, and then let the box fill the container as appropriate (I prefer a display: inline-block, without floats, since this spaces out the boxes evenly).

For example, if we know that, considering padding and margins, the container would have to be large 33em when holding only one box, and that only one box would fit with a viewport smaller than 66em, and that only two boxes would fit with a viewport smaller than 98em, etc, we could use something like the following:

@media (max-width: 98em) {
    #content {
        width: 66em;
@media (max-width: 66em) {
    #content {
        width: 33em;

Now, the reason why this is a horrible solution. To work perfectly, it requires the following things to be known:

  • the number of boxes (one media query for each ‘additional box’ configuration is needed),
  • the width of each box (note that the solution works regardless of whether the boxes have the same or different widths, as long as each box width is known, in the order in which they are to be laid out).

The challenge proper

The question is: is it possible to achieve this effect without knowing the number of boxes and without writing an infinite (nor a ‘reasonably high’) number of media queries? Even a solution in the case of fixed, equal-width boxes would be welcome.

The CSS challenges

In the beginning was HTML, and HTML mixed structure and presentation. And people saw that this was a bad thing, so an effort was made to separate content structure from layout and presentation.

This resulted in the deprecation of all HTML tags and tag attributes whose main or only purpose was to change the presentation of the text, and in the birth of Cascading Style Sheets (CSS), to collect the description of the presentation and layout descriptions.

This was a very good thing. And in fact, CSS succeeded fairly well in achieving the separation of content from styling: it is now possible, using only structural (‘semantic’) HTML and CSS to achieve an impressive richness of colors, font styles and decorations.

However, while one of the purposes of CSS was to get rid of the use of ‘extra’ HTML (infamously, tables, but not just that) to control the layout, i.e. the positioning of elements on the page and with respect to each other, this has been an area where CSS has failed. Miserably.

So miserably, in fact, that sometimes it's not even sufficient to just add extra markup (container elements whose only purpose is to force some layout constraints): it might be necessary to resort to JavaScript just for the sake of obtaining the desired layout. And this, even before taking into consideration the various errors and deficiencies in the CSS implementations of most common layout engines.

I'm going to present here a number of challenges whose main purpose is to highlight limitations in the current CSS specifications: the things I'm going to ask for are going to be hard, if not impossible, to achieve regardless of the quality of the implementation, i.e. even on a layout engine that implemented everything in the current specification, and did it without any bugs whatsoever.

These challenges should be solved using only HTML and CSS, without any hint of JavaScript, and possibly without having to resort to non-structural markup in the HTML.

(Attentive people will notice that some these challenges have a remarkably close affinity with some of the features of this wok. This is not by chance, of course: one of the purposes of this wok is to act as my personal HTML testing ground for sophisticated features.)

{ And here, I might add in the future some further considerations and remarks which would not be considered challenges. }

Vettoriale manuale

Ho recentemente scoperto la bellezza dell'hand-editing dell'SVG: un po' come lo scrivere a mano l'HTML delle pagine web, ma assai più laborioso e spesso molto meno gratificante, soprattutto se, come nel mio caso, manca un senso estetico di appoggio all'abilità tecnica.

A dirla tutta, scrivere a mano questi verbosi formati di markup è estemamente tedioso, anzi faticoso, e pesa parecchio su polsi e sulle dita. La cosa non dovrebbe sorprendere: sono formati intesi più per la produzione e la consumazione da parte di macchine, che non per la modifica diretta da parte degli esseri umani. (In realtà, il discorso per l'HTML è leggermente più complesso.)

Per di più, scrivere SVG a mano significa fare grafica (SVG, dopo tutto, vuol dire scalable vector graphics) senza vederla. Abituati come siamo ad un mondo ‘punta e clicca’, anche per del semplice testo1, quanto più può sembrare strano, se non assurdo, fare grafica senza (un'interfaccia) grafica?

Ovviamente, l'opportunità o meno di lavorare senza un feedback grafico immediato dipende pesamente dal tipo di grafica che si deve fare (oltre che, ovviamente, dall'attitudine individuale). Per un rapido schizzo estemporaneo un classico programma di grafica (vettoriale) come Inkscape è certamente lo strumento ideale; ma vi sono alcuni casi (che discuteremo a breve) in cui lavorare ‘a mano’ è nettamente superiore.

Beninteso, anche quando si lavora a mano il feedback è necessario, per assicusarsi di aver scritto giusto, per controllare il risultato ed eventualmente migliorarlo; quando lavoro su un SVG, ad esempio, tengo sempre il file aperto anche in una finestra del browser, aggiornandolo quando finisco una iterazione di modifiche.

Quali sono dunque i casi in cui la stesura manuale di un verboso XML è preferibile ad una interfaccia grafica? Le risposte sono due, e benché antipodali sono strettamente legate da due fili conduttori: quello dell'eleganza e quello dell'efficienza.

L'SVG può essere considerato come un linguaggio estremamente sofisticato e complesso per la descrizione di figure in due dimensioni (figure descritte da segmenti, archi di cerchio e cubiche di Bézier), con ricche opzioni stilistiche su come queste figure (descritte geometricamente) devono apparire (colori, frecce, riempimenti).

In effetti, l'SVG è talmente complesso che è ben possibile che i programmi visuali a nostra disposizione semplicemente non supportino l'intera ricchezza espressiva del linguaggio; in tal caso, la possibilità di modificare l'SVG a mano si può rivelare preziosa (il già citato Inkscape, ad esempio, che usa una verione bastarda dell'SVG come formato nativo, permette anche modifiche manuali al codice interno dell'imagine).

Il caso opposto è quello di un disegno estremamente semplice: perché prendersi la briga di aspettare i lunghi minuti che spesso i programmi di grafica impiegano all'avvio, quando un semplice editor di testo può bastare?

Il più grosso vantaggio della codifica manuale rispetto all'uso di un classico programma per la grafica vettoriale è la netta semplificazione del file stesso: anche l'immagine più sempice, infatti, salvata da un programma di grafica, si trova infatti sommersa da una immensa e spesso ingiustifica tonnellata di informazioni supplementari che sono inessenziali, ma che riproducono le strutture di controllo utilizzate internamente dal programma stesso.

Così ad esempio, ho potuto ottenere una versione vettoriale del logo Grammar Nazi che occupa meno di metà dello spazio su disco rispetto a quella a cui è ispirata, senza perdere minimamente né in qualità né in informazione. Anzi, il mio approccio alla descrizione della G stilizzata risulta essere ben più comprensibile, essendo disegnato ‘in piano’ e poi ruotato/riscalato opportunamente.

Questo è proprio un altro vantaggio della scrittura manuale rispetto al disegno grafico: la possibilità di esprimere già a livello di codifica la distinzione tra il design della singola componente e le trasformazioni geometriche necessarie per la sua integrazione con il resto del disegno.

Benché questo sia anche una possibilità spesso offerta dai programmi visuali, l'informazione viene spesso sfruttata sul momento per deformare/riposizionare le componenti come richieste, ma non è preservata nel salvataggio su file, ed è quindi ‘persa’ dopo la sua applicazione: non essenziale per la versione definitiva di un progetto, ma alquanto scomodo nel periodo di design.

Scrivere a mano risulta quindi in file non solo più efficienti (cosa che può avere un impatto per l'utente medio, con ridotti tempi di caricamento o meno fatica da parte del computer nella rasterizzazione dell'immagne), ma anche più eleganti: un'esigenza un po' ‘segreta’ (in quanti si ritrovano abitualmente a guardare il codice sorgente di un file, piuttosto che il suo risultato?) e che per l'utente medio in genere non ha impatto (anche se può risultare talvolta opposto a quello dell'efficienza, richiedendo maggiori calcoli in fase di rasterizzazione).

La codifica manuale non è ovviamente la panacea: oltre ad essere (per qualcuno ingiustificatamente) laboriosa, ad esempio, non può sopperire ai limiti intrinseci del formato. L'SVG, ad esempio, manca della capacità di esprimere le dimensioni e le posizioni delle componenti in rapporto l'una all'altra, se non in casi molto semplici e ricorrendo a sofisticati artifici con raggruppament e dimensionamenti fatti con fattori di scala; in più, le costanti numeriche devono essere espresse in forma decimale e quindi, per valori quali π/3 o la sezione aurea, approssimate.

L'SVG, d'altro canto, non è l'unico linguaggio per la grafica vettoriale: programmi come MetaPost ed il suo progenitore MetaFont sono nati come linguaggi di programmazione per la grafica vettoriale, sono stati scritti con un occhio di riguardo per gli aspetti numerici della matematica della grafica vettoriale, e no soffrono dei limiti suenunciati dell'SVG; d'altronde, un paragone diretto tra MetaPost ed SVG è altamente inappropriato, tanto per le rispettive caratteristiche quanto per i rispettivi dominî di applicazione per cui sono stati intesi.

Il MetaFont nasce dalla mente follemente geniale di Donald Ervin Knuth con lo scopo di permette la generazione matematica di famiglie di caratteri per la stampa. I caratteri di un MetaFont sono descritte da cubiche di Bézier opportunamente parametrizzate e combinate, e questo principio (trasformando i caratteri in immagini e l'output rasterizzato in output vettoriale in formato PostScript) sarà pure la componente fondamentale del MetaPost.

MetaFont e MetaPost indicano tanto i programmi in sé quanto il linguaggio di programmazione (molto simile per entrambi) che permette agli utenti di sviluppare famiglie di caratteri o immagini vettoriali, con descrizioni di tipo matematico e relazionale (sono permesse descrizioni del tipo: traccia una curva dall'intersezione di queste altre due curve ai due terzi di quell'altra curva). Un file MetaPost è come il sorgente di un qualunque linguaggio di programmazione, e va compilato per la produzione di una o più immagini.

Per contro, l'SVG nasce come linguaggio di descrizione di immagini vettoriali, ed è mirato (seppur non in maniera esclusiva) alla fruizione del web, includendo pertanto funzionalità come la possibilità di descrivere semplici animazioni, eventualmente controllate mediante intersazione con l'utente.

D'altra parte, l'SVG si integra piuttosto bene con il JavaScript, il linguaggio di programmazione dominante sul web, e grazie a questo può assumere tutta una serie di capacità la cui mancanza lo rende in certi casi inferiore al MetaPost; d'altra parte, trovo personalmente molto fastidioso dover ricorre ad un linguaggio di programmazione ausiliario per la descrizione di immagini statiche.

Se nel MetaPost questo era una necessità legata alla natura intrinseca del programma (compensata dall'immensa flessibilità offerta dalla possibilità di esprimere i tratti salienti di un'immagine in maniera relazionale), la necessità di utilizzare il JavaScript in SVG per raggiungere certi effetti statici continua a pesare come una limitazione dell'SVG stesso.

Si potrebbe suppore che se non avessi avuto una precedente esperienza con la potente flessibilità del MetaPost, non avrei mai sentito i limiti dell'SVG come tali. Ne dubito: avrei comunque molto rapidamente trovato frustrante l'impossibilità di usare quantità numeriche ‘esatte’ lasciando al computer il compito di interpolare, avrei comunque sentito fortemente la mancaza di esprimere come tali le relazioni tra componenti diverse di un'immagine.

Piuttosto, quello che penso potrebbe essere un interessante compromesso è qualcosa di simile al Markdown (che permette di scrivere documenti HTML quasi come se fosse del testo semplice) per l'SVG. Se il MetaPost stesso si pensa sia troppo complicato, si potrebbe cominciare da qualcosa di più semplice, come Eukleides (pacchetto attualmente specializzato per la geometria).

Ovviamente, è importante che gli SVG prodotti da questi programmi siano quanto più minimalistici possibile, e quindi che in qualche modo riflettano, nel prodotto finale, quello spirito di eleganza, semplicità ed efficienza che caratterizza la codifica a mano rispetto all'uso di un'interfaccia grafica. E come il Markdown, dovrebbe permettere l'inserimento di codice SVG ‘nudo’. Quasi quasi mi ci metto.

  1. una nozione che io trovo raccapricciante: trovo faticoso già solo guardare la gente che stacca le mani dalla tastiera per selezionare del testo con il mouse, per poi andare a cliccare su un bottone per l'apposita funzione d'interesse (grassetto, corsivo, cancella, copia, whatever). ↩

RCS fast export

Get the code for
RCS fast export:


RCS is one of the oldest, if not the oldest, revision control systems (in fact, that's exactly what the name stands for). It may seem incredible, but there's still software around whose history is kept under RCS or a derivative thereof (even without counting CVS in this family).

Despite its age and its distinctive lack of many if not most of the features found in more modern revision control systems, RCS can still be considered a valid piece of software for simple maintenance requirements, such as single-file editing for a single user: even I, despite my strong passion for git, have found myself learning RCS, not earlier than 2010, for such menial tasks.

In fact, the clumsiness of RCS usage when coming from a sophisticate version control software like git was exactly what prompted me to develop zit, the single-file wrapper for git. And so I found myself with the need to convert my (usually brief, single-file) RCS histories to git/zit.

I was not exactly surprised by the lack of a tool ready for the job: after all, how many people could have needed such a thing? Most large-scale project had already migrated in time to some other system (even if just CVS) for which even quite sophisticated tools to convert to git exist. So I set down to write the RCS/git conversion tool myself: I studied the RCS file format as well as the git fast-import protocol, and sketched in a relatively short time the first draft of rcs-fast-export, whose development can be followed from its git repository.

The first release of the software was quite simple, supporting only linear histories for single files (after all, that was exactly what I needed), but I nevertheless decided to publish my work; who knows, someone else in the internet could have had some need for it.

In fact, what has been surprising so far to me has been the number of people that have had need for this small piece of software. Since the public release of the software, I've been contacted by some five or six different people (among which the most notable is maybe ESR) for suggestions/question/patches, and as with all software developed on a need/use basis, the capabilities of the script have hence grown to accommodate the needs of these other people.

In the current situation it can handle files with branched histories as well as multi-file projects with a linear history. It does not, however, currently support multi-file histories with branching, which is, unsurprisingly, the “most requested” feature at present times. I have actually been looking for a repository with this characteristics, to try and tackle the task, but it seems that finding one such repository is nigh impossible; after all, how many people still have RCS repositories around?

Design finlandese

Come ho già detto, l'N900 è un gran bel telefonino, la cui esperienza d'uso è però intralciata da qualche piccolo difetto di funzionamento. Il più strano è forse il problema del sensore di prossimità.

Per evitare che durante una chiamata il contatto con il viso o qualche mossa azzardata della mano possa chiudere le chiamate o caricare programmi non desiderati, il programma di telefonia legge il sensore di prossimità e blocca il telefonino se il sensore è chiuso.

Il sensore funziona ad infrarossi: è quindi accoppiato ad un LED a luce infrarossa che, se riflessa ad esempio dal viso dell'utente, torna al sensore invece di disperdersi nell'ambiente.

Cosa succede se nell'ambiente c'è un'altra sorgente di luce infrarossa sulle stesse frequenze del sensore? Il sensore ‘pensa’ di essere ostruito anche quando è libero; è quindi necessario che il sensore sia regolato per accettare interferenze ‘tipiche’ da sorgenti esterne.

Sorgenti tipo il Sole.

Solo che la quantità di luce (infrarossa) solare disponibile in zone come, che so, la Sicilia non è esattamente la stessa di zone come, che so, la Finlandia. Indovinate dove è stato tarato il sensore? (Suggerimento: di che nazionalità è la Nokia?)

(Davvero, per far funzionare il sensore correttamente basta ostruirlo parzialmente in modo da ridurre l'interferenza da luce solare.)

N900, o di come la Nokia ha scelto il suicidio dopo essersi aperta la strada per il futuro

Un paio d'anni fa —quando l'iPhone ormai spopolava tra i fighettini ed il concetto di smartphone si era abbondatemente esteso oltre quello di semplice Personal Digital Assistant con telefonino incluso, creando un nuovo mercato che andava ben oltre l'aspirante manager ed il suo Palm (prima) o il suo BlackBerry (dopo)— la Nokia, il cui Symbian era il sistema operativo mobile più diffuso (coprendo una gamma di prodotti che andava dai cellulari da Dash ai più sofisticati communicators), fece uscire un prodotto che per la prima volta in vita mia mi fece seriamente prendere in considerazione l'idea di prendere uno smartphone di fascia alta.

Il prodotto in questione era l'N900, uscito sul finire del 2009. L'unica cosa che allora mi trattenne dal prenderne possesso fu il prezzo, che si aggirava sui 600€ (un prezzo in realtà non inusuale per la classe dell'apparecchio, ma che ad esempio la Apple nascondeva dietro ‘offerte’ con contratti inestinguibili legati a questo o quel fornitore di connettività). Potete immaginare la mia sensazione di invidia quando ho scoperto che ad un corso libero tenuto all'università davano proprio questo modello —gratis— agli studenti (ingegneria informatica).

Qualche mese fa, approfittando della ‘zona compleanno’ e di una proposta via internet, ho deciso di prenderne uno di seconda mano ad un terzo del prezzo, ed ho finalmente avuto modo di giocarci a mio piacimento. Ed in breve posso dire che è stata forse la spesa (personale) più soddisfacente degli ultimi anni.

L'N900 si posiziona in evidente competizione con l'iPhone 3GS, uscito pochi mesi prima, con una interessante combinazione di pro e contro; una sintesi delle differenze si può trovare su questa pagina.

In realtà, l'unico ‘contro’ dell'N900 rispetto al concorrente Apple è nel touchscreen, resistivo nel Nokia (già questo ad alcuni può dare fastidio), ma soprattutto incapace di tracciare più dita, rendendo quindi impossibili i famosi gesti di pizzico e distensione per lo zoom. Per il resto, il Nokia vince praticamente su tutto, tranne lo spessore (un 60% in più, non tutto dovuto alla tastiera fisica scorrevole, che è uno dei punti di forza dell'N900): il display del Nokia ha una risoluzione quasi doppia, il Nokia ha sia una fotocamera posteriore (con flash e autofocus, ed una risoluzione superire a quella dell'iPhone) sia una anteriore (bassa risoluzione, per le videochiamate), il display del Nokia può essere usato sia con le dita sia con un pennino (incluso), il Nokia ha un ricevitore ed un trasmettitore FM (anche se, chi usa ancora le vecchie radio?), il Nokia ha una tastiera fisica (già detto), il Nokia ha un lettore per schede MicroSD, il Nokia ha la batteria sostituibile, il Nokia ha un'uscita video standard ed il cavo per la connessione ai televisori è incluso.

Infine, il Nokia ha Linux: non una macchina virtuale semi-proprietaria come il Dalvik di Android (su kernel Linux), non una variante proprietaria (iOS) del kernel open-source BSD dei telefoni Apple, ma una distribuzione Linux ad-hoc (Maemo) costituita quasi per intero da software open source.

Da quando sono entrato in possesso di questo giocattolino l'ho usato per un'infinità di cose: giocare, leggere libri e fumetti, scattare foto, girare filmati, amministrare il mio server, ascoltare musica, leggere e scrivere email, chattare e parlare via Skype e Google Talk. In sostanza l'unica cosa per cui non l'ho usato è stato telefonare, e questo principalmente perché non ho ancora trovato una SIM con buone tariffe per internet.

Molte delle cose per cui ho usato ed uso il telefonino hanno richiesto l'installazione di nuovi programmi; e benché sia disponibile un OVI store, io ho potuto trovare tutto quello che mi serviva nei repository ufficiali (dopo tutto si tratta sempre di una distribuzione Linux completa e basata su Debian). In sostanza, non ho dovuto spendere un centesimo più del costo del telefonino.

Non si può dire che l'N900 fosse perfetto: nell'uso quotidiano si possono riscontrare facilmente problemi anche molto fastidiosi che vanno da un'antenna non eccellente a qualche problema con il sensore di prossimità. Eppure, non sono certo stati questi a decretare il profondo insuccesso del tentativo della Nokia di entrare nel gioco degli smartphone di nuova generazione.

La Nokia, piuttosto, ha di fatto messo in atto un vero e proprio suicidio. L'N900, che altro non era che il primo passo verso un mercato in cui la Nokia stava entrando già con un notevole ritardo, è diventato invece, purtroppo, l'apice degli smartphone Nokia.

La strategia da seguire dopo l'uscita dell'N900 sarebbe dovuta essere focalizzata sul raffinamento ed il miglioramento del tipo di piattaforma già sperimentata con l'N900 ed il suo Maemo 5, per produrre nel minor tempo possibile un successore che ponesse rimedio ai limiti hardware e software del primo vero smartphone Nokia.

Dal lato hardware non c'era nemmeno nemmeno molto da fare: aggiungere funzionalità multi-touch al display, migliorare l'antenna interna e risolvere i problemi del sensore di prossimità sarebbero dovuti essere gli obiettivi principali. In nuove generazioni hardware, processori più potenti, batterie più capiente ed un design magari più sottile avrebbero reso insuperabili i successori dell'N900.

Ma è soprattutto sul lato software che la Nokia è caduta nel più infantile degli errori: ripartire da zero, proponendo una piattaforma completamente nuova, MeeGo, nata in teoria dalla fusione di Maemo con il Moblin della Intel: in sostanza, una terza alternativa ai due sistemi esistenti, da riprogettare dall'inizio e con la conseguente, inevitabile dilatazione dei tempi di uscita dei nuovi prodotti in un mercato che non aveva certo voglia di aspettare la Nokia.

A metà del 2010 la Nokia si trovava quindi con progetti interessanti cominciati (ma non finiti) tra le mani, il più importante dei quali l'acquisizione della Trolltech e per conseguenza il controllo sulle Qt, la più importante interfacce multipiattaforma in circolazione, e potenzialmente il punto d'incontro tra i mai nati Maemo 6 e Symbian4. Nello stesso periodo, l'azienda sforna qualche altro timido tentativo di smartphone basato sull'ormai moribondo Symbian, e l'unica nota veramente positiva dell'anno è la pubblicazione di una serie di aggiornamenti software che rendono l'N900, nei limiti imposti da un hardware dell'anno precedente, l'ottimo smartphone che mi ritrovo ora tra le mani.

Si sarebbe dovuto aspettare il 2011 per vedere i frutti dei nuovi progetti del 2010 —ed è infatti solo a giugno del 2011 che la Nokia renderà disponibile due nuovi modelli (N950 ed N9) ed il nuovo sistema operativo (Harmattan, il raccordo tra Maemo e Meego) che potrebbero farla tornare ad essere rilevante nel mercato degli smartphone.

Purtroppo, però, la latenza introdotta dalla reinvenzione di Maemo in MeeGo ed il crollo delle vendite dei prodotti Symbian portano alla decisione di un cambiamento di gestione a livello aziendale, e nel settembre 2010 l'allora CEO della Nokia viene sostituito da Stephen Elop, ex-direttore della divisione business della Microsoft (sostanzialmente, il responsabile di Microsoft Office per la release del 2010), che decide di cambiare rotta di nuovo: per sfondare nel mondo degli smartphone, secondo Elop, la Nokia dovrà appoggiarsi al più irrilevante dei sistemi operativi per smartphone, Windows Phone (precedentemente noto come Windows Mobile) della Microsoft.

Purtroppo per Elop, a fine 2010 la Nokia ha già in cantiere i successori dell'N900, e dopo la grande attesa che si è creata nel corso dell'anno per questi nuovi modelli, è impossibile impedirne l'uscita. La strategia adottata diventa quindi quella di renderli irrilevanti, distruggendo così la possibilità di una seria competizione con i nuovi modelli di iPhone della Apple, il principale singolo avversario contro cui la Nokia deve combattere.

L'N950 (il vero successore dell'N900) viene quindi rilasciato solo come developer preview per l'N9: non viene messo in vendita, ma viene reso disponibile solo a sviluppatori, attraverso strani procedimenti di selezione; la scelta è peraltro profondamente discutibile, poiché i due modelli hanno hardware abbastanza diverso (ad esempio, l'N950 ha una tastiera fisica, l'N9 no).

Per completare il suicidio, l'N9 viene reso disponibile solo su alcuni mercati (Finlandia, Hong Kong, Svizzera, India), e negato al resto del mondo. Tutto questo viene accompagnato da una gran fanfara per pubblicizzare l'uscita dei Lumia, i primi cellulari Nokia con il nuovo sistema operativo Microsoft.

Nonostante questi goffi e disperati tentativi di soffocare l'alternativa alla Microsoft, l'erede dell'N900 è talmente ambìto che varî rivenditori online rendono l'N9 disponibile anche su mercati (come quello italiano) che la Nokia aveva invece escluso. Persino altri modelli basati sull'ormai moribondo Symbian continuano a vendere più dei Lumia.

Se il CEO della Nokia non fosse stato un ‘cavallo di Troia’ mandato dalla Microsoft, la scelta da seguire per la Nokia sarebbe stata ovvia. Purtroppo, invece, ci ritroviamo in una situazione in cui a perdere sono sia la Nokia (che continua a perdere mercato ad un ritmo incredibile) sia gli utenti, che si ritrovano infine senza un degno successore per l'N900, quell'ottima combinazione di hardware Nokia (da sempre superiore alla concorrenza) e software di qualità che ne avrebbero potuto decretare il successo.

Verrebbe voglia di metter su un'azienda per costruire un clone dell'N950 e del suo compagno senza tastiera, per proseguire sulla strada che la Nokia ha scelto di abbandonare.

Reti asociali

Breve storia della socialità su Internet

Rispetto ad altre forme di comunicazione di massa, Internet si è sempre contraddistinta per la sua natura “da molti a molti”: sia in forma sincrona (IRC) sia asincrona (mailing list, newsgroup, forum) Internet ha sempre offerto la possibilità a tutti di raggiungere tutti. Fino ai tardi anni '90, per la maggior parte degli utenti questa possibilità era offerta in contesti che avevano molto della piazza e poco dell'individuale: pochi potevano permettersi una presenza fissa su internet con siti personali.

A cambiare questo sono stati la nascita dei blog (pagine personali in forma diaristica), il passaggio dalla loro cura manuale allo sviluppo di strumenti più o meno automatici per la loro gestione, ed infine il diffondersi di piattaforme che offrivano ‘a chiunque’ la possibilità di (ed in particolare lo spazio web necessario per) tenerne uno (LiveJournal, Blogger, i nazionali Splinder o ilCannocchiale, ed infine l'attualmente famosissimo WordPress).

Si avvia così un processo di individualizzazione di Internet, in cui il singolo assume (per il soggetto stesso) un peso sempre maggiore e le comunità cominciano a disgregarsi. Ai blog si affiancano siti in cui la pubblicazione e la condivisione del proprio (in forme non solo o non prevalentemente testuali) è il punto centrale: disegni (DeviantART), foto (Flickr, Zooomr), filmati (YouTube, Vimeo).

Per il singolo diventa sempre più facile pubblicarsi, ma sempre più difficile trovare e farsi trovare: il ruolo un tempo assunto principalmente dalle comunità che si aggrega(va)no in luoghi virtuali ben definiti (canali IRC a tema, gruppi specifici nell'immensa gerarchia dei newsgroup) viene progressivamente sostituito, in maniera oltremodo inefficiente, dalla rete di conoscenze (reali o virtuali).

L'apice di questo processo è la nascita dei cosiddetti social network, siti la cui spina dorsale non è più composta dai contenuti, bensì dai membri e dai modi in cui questi sono legati tra loro, invertendo così il rapporto tra utenti e contenuti che invece domina i servizi precedentemente menzionati.

Social network ed altre piattaforme

I social network non sono monchi della possibilità di pubblicare contenuti; anzi, un punto di forza su cui fanno leva per attirare utenti è la facilità con cui ‘tutto’ (testi, foto, video) può essere messo online, e soprattutto condiviso. Le possibilità offerte per la pubblicazione dei contenuti sono spesso di qualità nettamente inferiore a quelle offerte da piattaforme dedicate, ma sono per lo più sufficientemente buone (e soprattutto semplici) per l'utenza obiettivo preferenziale di questi servizi, con in più la comodità della centralizzazione.

Il social network si propaga facendo leva sulla natura sociale dell'animale umano, e la possibilità di condividere è lo strumento principale della nuova socialità virtuale. Così il social network diffonde la propria presenza oltre i limiti del proprio sito, e diventa lo strumento principale di diffusione anche di contenuti esterni. Prima dell'avvento dei social network, per un sito era importante essere ben indicizzato da un buon motore di ricerca; dopo l'avvento dei social network, per un sito diventa importante poter essere condiviso sui social network.

Ma la principale caratteristica che differenzia il social network dalle altre piattaforme è l'inversione dei rapporti tra utenti, prodotti, servizi e clienti. Mentre le altre piattaforme offrono servizi ai propri utenti, che sono anche i clienti, nei social network i servizi offerti pubblicamente sono solo un'esca per attirare utenti, le cui reti di connessioni ed i cui contenuti condivisi sono il prodotto da vendere ai clienti (principalmente, agenzie di pubblicità).

Il sottoscritto nella rete sociale

Già prima dell'avvento dei social network, i molti passaggi dell'evoluzione delle forme principali di interazione su Internet mi hanno lasciato molto tiepido. Non essendo quello che si definirebbe un early adopter, sono arrivato tardi su IRC (su cui permango tuttora), sui newsgroup e sulle mailing list (tra i quali rimango, e solo molto moderatamente attivo, solo su alcuni gruppi di interesse tecnico molto specifico), sui forum (che ho smesso quasi interamente di seguire). Ho aperto tardi un blog, e la mia presenza sui social network è pressoché inesistente.

Ogni approccio ad una nuova forma di interazione ha avuto origini e motivazioni ben precisi: persino i primi accessi ad internet (ben prima dell'arrivo della banda larga, quando ci si collegava a 56k —se andava bene— bloccando l'uso del telefono) furono motivati (ricordo che la prima connessione, usando uno di quei floppy di ‘prova internet per 15 giorni’, la feci per cercare un walkthrough per Myst, e stiamo quindi parlando dei primi anni '90).

I social network, invece, sfuggono tuttora al mio interesse: vuoi per la mia natura asociale (pardon: ‘selettiva’), vuoi per la mia totale estraneità (non priva di un certo disgusto) a quel vacuo entusiasmo per il numero di amici ed a quella passione quasi ossessivo-compulsiva per la condivisione sfrenata di ogni aspetto della propria vita nonché di quella degli altri tanto amplificata dai social network, vuoi per quella modaiolità intrinseca persino nella scelta dell'“ambiente” (ieri tutti su MySpace, oggi tutti su Facebook, domani tutti chissà dove), mi sono sempre sentito estraneo a queste forme di socialità virtuale.

Eccezionale veramente

Non nego tuttavia che alcuni aspetti del social networking possono essere utili, in determinati contesti o in particolari forme.


In ambito lavorativo, ad esempio, il ‘grafo sociale’ di un individuo, le persone che conosce (ed il suo giudizio su di loro) e quelle che lo conoscono (ed il loro giudizio su di lui) hanno spesso un'importanza non inferiore a quella delle qualifiche dell'individuo stesso (soprattutto quando il lavoro scarseggia e per qualità e per quantità).

Con quest'ottica mi sono iscritto a LinkedIn, social network incentrato sul lavoro, i cui utenti sono definiti dalla professione, dall'azienda per cui lavorano e da quelle per cui hanno lavorato, dall'istruzione che hanno ricevuto. Uno scarno profilo ed una curata selezione di contatti costituiscono la mia ‘partecipazione’ al social network.


L'altro aspetto che può rivelarsi utile dei social network è la centralità, ma piuttosto che incarnata in un ‘luogo unico’ per la pubblicazione dei contenuti, una centralità vista come punto di raccolta di contenuti pubblicati altrove.

Prima dei social network, lo strumento più comodo per seguire gli aggiornamenti di contenuti dei vari blog, siti di fotografia e quant'altro era l'utilizzo dei famosi feed; se della stessa persone si seguiva il blog, i video su YouTube, le foto su FlickR e quant'altro, ci si iscriveva a ciascuno dei feed separatamente. In questo, almeno una centralizzazione dei feed di tutti gli account sparsi per l'Internet avrebbe fatto comodo.

Ed è proprio su questo che ha puntato FriendFeed, ed è stato proprio questa sua principale caratteristica di aggregatore di contenuti, contenuti per i quali la scelta della piattaforma di appoggio rimane in mano agli utenti, che mi ha attirato.

Il profilo dell'utente FriendFeed è composto sostanzialmente dall'elenco dei feed che il social network si prenderà cura di controllare e raggruppare, per diffonderli poi automaticamente ai seguaci, ovvero a coloro che sono ‘iscritti’ all'account dell'utente. Per chi seguisse gente non iscritta a FriendFeed, questo social network permette anche la creazione di ‘amici immaginari’: nuovamente, nient'altro che elenchi di feed esterni raccolti a rappresentare un'unica persona virtuale.

Il punto chiave in tutto ciò è che FriendFeed non controlla i contenuti, ma aiuta molto a gestirli. D'altronde, non tutti i suoi utenti la vedono così, e non sono in pochi ad usare FriendFeed scrivendo messaggi e caricando foto direttamente sul social network.

La storia evolutiva di FriendFeed è stata intensa quanto breve, e si è conclusa con il suo acquisto, nel giro di meno di due anni, da parte di Facebook, mossa che per quanto favorevole agli sviluppatori di FriendFeed ha sostanzialmente sospeso, a tempo indeterminato, ogni speranza di sviluppo delle sue funzioni.

Così, ad esempio, difficilmente verrà aggiunto a FriendFeed il supporto per nuovi servizi e nuove piattaforme; difficilmente verrà migliorata la gestione della privacy, limitata alla possibilità di rendere privato il proprio account, senza ad esempio un'integrazione con la capacità di raggruppare gli amici in liste; difficilmente verrà migliorata la gestione dei gruppi, aggiungendo ad esempio la possibilità di pubblicare alcuni servizi direttamente su gruppi specifici.

Se l'approccio di FriendFeed ad una rete sociale rimane, a mio parere, il più intelligente e corretto, esso non sembra convincere coloro che invece vedono in quello tradizionale il modo migliore per sfruttare la ‘risorsa utente’. Così Google, nel suo ennesimo approccio al social network, dopo il fallimento di Wave e Buzz, opta per la strategià ‘à la Facebook’ in Google+: ed è questo, a mio parere, il grande errore di questo gigante della Rete che su altri fronti ha avuto invece tanti successi.

L'unica novità che Google+ offre in più rispetto a Facebook è infatti la gestione delle cerchie, e benché sia sorprendente che sulla rete sociale ci sia voluto tanto perché la distinzione tra gruppi di contatti molto diversi da loro fosse così ben integrata con il resto della piattaforma, non potrà mai essere questa la killer feature con cui Google+ potrà diventare per Facebook quello che Facebook è stato per MySpace.

Nel mio mondo ideal

{ Come progetterei io un social network. }

Vivere senza Windows(?)

Si può? È una domanda che dovrebbe venire spontanea a tutti quelli che continuamente si trovano ad affrontare virus, malfunzionamenti, appesantimenti ingiustificati del computer e quant'altro.

Invece, purtroppo con poca sorpresa, si scopre che la gente, principalmente per ignoranza, spesso anche per abitudine, e certamente anche per quella forma di conformistica pigrizia che ci fa preferire affrontare i problemi affrontati dalla maggioranza degli altri ai passi in più da compiere per averne molti meno ma diversi, la maggior parte della gente preferisce seguire la propaganda che vede in Windows il sistema “per tutti”, distinguendolo dallo chic alternativo della Apple e dal radical-comunista (e poco conosciuto) Linux.

Se però ci si pone davanti alla questione, la risposta (naturale quanto ovvia) è “dipende”. Dipende dal computer, dal tipo di uso che se ne fa, ed infine dal grado di interoperabilità richiesto con altri (e quindi, in definitiva, dai programmi che si intende usare).


Dal punto di vista hardware, il problema, sempre meno frequente, è legato alla possibilità del sistema operativo di farne uso; se i produttori, per ovvie ragioni di mercato, hanno sempre fornito dischi di installazione con i driver per Windows del loro hardware, la situazione con Linux non è sempre così sorridente: si va dall'hardware con un supporto completo che in alcuni casi supera persino in qualità quello per Windows, ad hardware di cui si è fortunati se si riesce a far sapere al sistema operativo che quel particolare pezzo è presente, passando per tutta la possibile gamma di varianti.

La situazione è in realtà sempre meno tragica, ed ormai è alquanto raro trovare hardware che sia completamente non supportato: allo stato attuale, credo che i lettori di impronte digitali siano grossomodo l'unica classe di hardware quasi totalmente inutilizzabile. Più spesso capita che al momento dell'uscita di un nuovo modello questo non sia immediatamente supportato (ho avuto un'esperienza negativa in tal senso con la tavoletta grafica Wacom Bamboo Pen&Touch, che però adesso uso senza problemi), o che alcune funzioni avanzate non siano configurabili con l'immediatezza delle interfacce "a prova di idioti" che spesso si trovano in Windows (stesso esempio della Wacom, per le funzioni multi-touch).

Ovviamente, il livello di supporto per le componenti e le periferiche dei computer è molto legato alla disponibilità del produttore a cooperare con il mondo Linux. Si riscontrano classicamente quattro livelli:

  1. produttori che contribuiscono attivamente al supporto con driver e strumenti open source (esempi: Intel, HP),
  2. produttori che contribuiscono attivamente al supporto con driver e strumenti proprietari (esempio: ATI, NVIDIA, Broadcom),
  3. produttori che forniscono le specifiche dell'hardware, e quindi rendendo possibile la scrittura di driver e strumenti open source, ma non contribuiscono attivamente con codice di alcun tipo (esempi: ACECAD, Wacom),
  4. produttori il cui hardware è supportato solo grazie al paziente lavoro di reverse-engineering di gente senza alcun legame con la casa produttrice (esempi: troppi).

Per potersi lasciare alle spalle Linux è quindi opportuno diventare un po' più oculati nelle scelte, a meno di non avere interessi smanettoni. Per fortuna, è sempre più difficile trovare cose che non funzionino “out of the box”, ed ancora più difficile trovare cose che non si possano fare funzionare con un attimo di pazienza e qualche rapida ricerca su internet.


L'uso più diffuso dei computer è dato (oggigiorno) probabilmente dalla navigazione in internet, seguita a ruota dall'uso di una suite per ufficio, o quanto meno del suo elaboratore testi (leggi: Word di Microsoft Office). Segue poi un po' di multimedialità, forse ascoltare musica e magari vedere qualche film, con utenze più smaliziate a cui interessa organizzare le proprie foto o i propri video (dal saggio di danza della figlia undicenne agli atti impuri con la compagna).

La scelta dell'applicazione per ciascun uso è, nuovamente, per lo più dettato dall'ignoranza: non sono pochi coloro per cui “la e blu sullo schermo” è internet (quando va bene) o Facebook (la parte di internet con cui si interfacciano il 96% del tempo, il restante 4% essendo YouTube, a cui magari arrivano da Facebook). Forse in questo caso non è proprio opportuno parlare di ‘scelta’.

Saltando la solita questione dell'inerzia (“sul computer c'è questo preinstallato, quindi uso questo”) un altro fattore determinante è l'interoperabilità, intesa specificamente in riferimento alla necessità di scambiare dati con altre persone. Se dalla monocultura web siamo finalmente usciti e sono ormai pochissimi i siti non correttamente fruibili senza Internet Explorer, per documenti di testo e fogli elettronici si continua a dipendere pesantemente dai formati stabiliti dalla suite per ufficio della Microsoft, nonostante per un accesso completo a questi documenti sia necessaria la suite stessa1.

La cosa è un po' paradossale, perché se davvero si puntasse all'interoperabilità ci si dovrebbe rivolgere a qualcosa di più universalmente disponibile, e quindi ad applicativi e formati che non siano legati ad uno specifico sistema operativo. Ma nuovamente l'inerzia e la necessità di compatibilità all'indietro con anni di monocultura (e la relativa legacy di documenti in quei formati) rendono difficile la transizione a soluzioni più sensate.

Cosa usare, e come

Per facilitare la transizione da Windows ad un altro sistema operativo, è meglio cominciare ad usare già in Windows stesso le stesse applicazioni che ci si troverebbe ad usare ‘dall'altra parte’. Prima di buttarsi a capofitto nell'ultima Ubuntu, ad esempio, è meglio rimanere nell'ambiente che ci è familiare (Windows), abbandonando però il nostro Internet Explorer (per chi lo usasse ancora), il nostro Microsoft Office, etc, per prendere dimestichezza con programmi equivalenti che siano disponibili anche sulle altre piattaforme. Questo spesso vuol dire rivolgersi al software open source, ma non sempre.

Si usi quindi ad esempio un browser come Opera, Firefox o Chrome per navigare in internet. Si usi lo stesso Opera di cui sopra o Thunderbird per gestire la posta. Gimp non sarà Photoshop, ma è un buon punto di partenza per il fotoritocco, ed Inkscape dà tranquillamente punti a Corel Draw se non ad Indesign. Come suite per ufficio LibreOffice (derivata dalla più nota OpenOffice.org) è una validissima alternativa al Microsoft Office, salvo casi particolari. DigiKam è eccellente per gestire le proprie foto (anche se forse non banale da installare in Windows; un'alternativa potrebbe essere Picasa, utilizzabile in Linux tramite Wine), VLC è un po' il media player universale, e così via.

Dopo tutto, le applicazioni sono ciò con cui ci si interfaccia più spesso, molto più che non il sottostante sistema operativo, usato per lo più per lanciare le applicazioni stesse ed eventualmente per un minimo di gestione (copia dei file, stampa).

Una volta presa dimestichezza con i nuovi programmi, la transizione al nuovo sistema operativo sarà molto più leggera, grazie anche agli enormi sforzi fatti negli ultimi anni (principalmente sotto la spinta di Ubuntu) per rendere Linux più accessibile all'utonto2 medio.

Ma a me serve …

Ci sono casi in cui non si può fare a meno di utilizzare uno specifico programma, vuoi perché in Linux non è disponibile un'alternativa, vuoi perché le alternative esistenti non sono sufficientemente valide (ad esempio non leggono correttamente i documenti su cui si sta lavorando, o mancano di funzioni essenziali).

La soluzione migliore, in tal caso, è offerta dalla virtualizzazione, alternativa più efficiente, su macchine recenti, al dual boot. Mentre con il secondo approccio si tengono sulla stessa macchina i due sistemi operativi, scegliendo quale utilizzare a ciascun avvio ed essendo eventualmente costretti a riavviare qualora si volesse anche solo temporaneamente utilizzare l'altro, la virtualizzazione consiste nell'assegnazione di risorse (memoria, CPU, un pezzo di disco) ad un computer appunto virtuale, emulato internamente dall'altro.

In tal modo, utilizzando Linux come sistema operativo principale, si può ‘accendere’ la macchina virtuale, avviando Windows in una finestra a sé stante che non interferisca con il resto del computer se non nelle forme imposte dalla virtualizzazione stessa.

Successi virtuali

Ho sperimentato personalmente e con successo questa situazione, che mi è tornata utile in almeno due momenti: la necessità di utilizzare Microsoft Office per la rendicontazione di un progetto che doveva seguire un ben preciso modello costruito in Excel, con tanto di macro ed altre funzioni per le quali l'OpenOffice.org di allora non forniva sufficiente compatibilità, e più recentemente per recuperare rubrica e messaggi dal mio cellulare non proprio defunto ma nemmeno proprio funzionante.

Ma il mio più grande successo in tal senso è stato un ingegnere incallito che usa per lavoro i computer dai tempi in cui 64K erano un lusso e doveva attendere la notte per poter utilizzare tutti e 256 i kilobyte di una macchina normalmente segmentata per il timesharing. Stiamo parlando di un uomo che è passato al Lotus 1-2-3 quando dal VisiCalc non si poteva più spremere una goccia, per poi restare con il QuattroPro sotto DOS finché i problemi di compatibilità non hanno superato i benefici dell'abitudine, e che ha riformattato il computer nuovo per poter rimettere Excel 95 su Windows XP per limitare al minimo indispensabile i cambiamenti rispetto alla sua macchina precedente.

Stiamo parlando di un uomo che si è convinto a mettere Linux solo dopo la terza irrecuperabile morte del suddetto Windows XP ed il mio ormai totale e definitivo (nonché abbastanza incazzato) rifiuto ad offrirgli il benché minimo aiuto per qualunque tipo di problemi gli si dovesse presentare con la sua beneamata configurazione. (E sinceramente non ne potevo più di sentirmi raccontare ogni volta di come Windows crashava, di come Excel rifiutava di salvare, di come questo, di come quest'altro.) Un uomo che si è convinto soltanto a condizione che (1) potessi trovargli sotto Linux qualcosa che potesse sostituire in maniera integrale le sue due uniche grosse applicazioni (Excel ed AutoCAD) senza fargli perdere alcunché del lavoro svolto fino ad allora e (2) gli offrissi aiuto ogni volta che avesse problemi con il nuovo sistema operativo.

Avendo già aiutato altre persone nella migrazione, il secondo punto non era affatto un problema: chi mi conosce sa bene che non mi sono mai rifiutato di aiutare gente che avesse problemi con il computer, ma ho recentemente maturato la decisione di rifiutare categoricamente aiuto a chi avesse problemi con Windows, con lo specifico obiettivo di far notare che il sistema operativo in questione non è affatto più ‘amichevole’ nei confronti dell'utente.

Per il primo punto, il problema è stato maggiore: anche l'ultima versione di LibreOffice continua ad avere problemi con i complicatissimi fogli Excel di mio padre, e nessun CAD disponibile per Linux regge minimamente il confronto con AutoCAD.

La virtualizzazione è quindi stata la soluzione da me proposta: una installazione pulita di Windows XP con solo i programmi in questione, in una macchina virtuale gestita da Linux; dati salvati in Linux su una directory accessibile come disco di rete dalla macchina virtuale; copia di riserva della macchina virtuale, con cui sovrascrivere quella in uso in caso si sviluppino problemi.

Fortunatamente, il supporto hardware per la virtualizzazione sul suo computer si è rivelato sufficiente ad un comodo utilizzo quotidiano. Sfortunatamente, l'ingegnere in questione non riesce a trovare la pazienza di imparare gli equivalenti in Linux di quell'infinità di piccoli programmini che era solito utilizzare sotto Windows, quindi l'installazione pulita si sporca poco dopo il ripristino, e Linux viene usato quasi principalmente per la navigazione in internet (tramite Firefox).

La situazione è però alquanto soddisfacente, con l'unico neo di non poter usufruire della complessa interfaccia basata su Silverlight che i siti RAI offrono per la visualizzazione delle trasmissioni (in particolare AnnoZero). Di Silverlight, tecnologia della Microsoft, esiste una parziale implementazione in Linux tramite Moonlight, ma a quanto pare il plugin permette solo la visualizzazione della pubblicità, mentre la trasmissione vera e propria rimane inaccessibile.

Anche senza tener conto del fatto che la stessa Microsoft sta pensando di abbandonare .NET e l'associato Silverlight per la prossima versione di Windows, la scelta della RAI (o di chi per lei; a chi è stato appaltato il lavoro della piattaforma web multimediale della RAI? Telecom? ) puzza da lontano di quel tipo di scelte che hanno favorito, nei lontani anni '90, la nascita di quella monocultura web dei danni della quale ho già parlato.

Fuga dalle monoculture

Comincia ora a vedersi la possibilità di un'emersione dalla monocultura Windows, con il diffondersi dei Mac dal lato trendy (e con un pericoloso rischio di sviluppo di una nuova monocultura che si sostituisca a quella esistente), e di Linux dall'altro. C'è da sperare che le quote di ciascun sistema raggiungano livelli tali da risanare l'ecosistema senza rischiare nuove degenerazioni. Fino ad allora, ci sarà sempre un po' di corrente contraria da affrontare, ma nulla di impossibile.

In questo ha sicuramente aiutato molto, come ho già detto, Ubuntu. Ultimamente, però, le nuove uscite sono state significativamente meno attraenti delle precedenti: tra una deriva ‘alla Apple’ dal punto di vista stilistico e funzionale ed alcune scelte troppo sperimentali per una piattaforma che si pone e propone come pronta per gli utonti, consiglio caldamente di permanere saldi sulla 10.04, attendendo con un po' di pazienza il decadere dell'attitudine ‘giocattolosa’ con cui Shuttleworth sta ultimamente gestendo Ubuntu, o magari l'emersione di una nuova alternativa un po' meno ‘coraggiosa’.

Le alternative a Windows ormai ci sono, e sono valide. Ma soprattutto, per fortuna, se ne stanno accorgendo tutti, indicando il superamento dell'ostacolo più grosso a qualunque progresso: il cambiamento di atteggiamento, di mentalità diffusa.

  1. è vero che sono disponibili viewer (per Windows) che non comprendano l'intera suite; è anche vero che la maggior parte dei formati, grazie a notevoli sforzi di reverse-engineering, sono ormai per lo più accessibili anche in altre suite, il problema della “compatibilità completa” rimane, e nonostante essa non sia garantita nemmeno da versioni diverse della suite MS, è anche vero che per i punti più delicati della formattazione altre suite possono differire più sensibilmente. ↩

  2. non si tratta di un errore di digitazione, bensí del termine spesso usato per indicare gli utenti con scarsa dimestichezza con gli strumenti informatici. ↩

API sociali

In attesa di attivare le funzioni interattive del wok (commenti, pagine a modifica libera, etc), ho iniziato a lavorare ad una forma minoritaria di integrazione con i social network. Ho colto l'occasione per ampliare il lavoro già fatto per il mio UserJS per mostrare i commenti di FriendFeed in qualunque pagina, del quale ho creato una versione specificamente mirata al wok. Le sue feature attuali sono

  • lavora su ogni pagina del wok (anche nella mia versione locale), nonché su ogni permalink presente nella pagina;
  • cerca tutte le entries su FriendFeed e su Twitter che facciano riferimento alla pagina/permalink in questione;
  • ignora le entries del sottoscritto, a meno che non abbiamo a loro volta commenti o like.

Il passo successivo che mi sarebbe piaciuto compiere sarebbe stato quello di aggiungere anche i riferimenti via Buzz, ed ho qui trovato il primo intoppo serio: non è possibile cercare activities che facciano riferimento ad un indirizzo preciso; la migliore approsimazione che si possa avere è cercare qualcosa per titolo, ma ovviamente i falsi positivi crescono così in quantità industriale.

Benché non sia certo questo ad impedire a Buzz di decollare come social network, è indubbio che è un limite che non l'aiuta per nulla, in special modo se confrontato con i limiti in cui si inciampa invece cercando di usare le API di FriendFeed1 o di Twitter2.

Lo script è ancora in formato UserJS/GreaseMonkey, scaricabile da qui in attesa di venire integrato con il wok stesso (repository). Testers welcome.

Nel frattempo penserò anche ad un modo intelligente di integrare qualcos'altro.

Aggiornamento al 2011-02-08

Scopiazzando qui e là (in particolare dai link ai social network in fondo agli articoli su Metilparaben) sono riuscito a cavare un piccolo ragno dal buco, scoprendo quali API di Buzz e FaceBook possono essere utilizzate quanto meno per ricavare il numero di commenti/riferimenti/apprezzamenti, se non il loro contenuto. Nel frattempo ho scoperto pure che la ricerca su Twitter restituisce solo contenuti, quindi dopo un po' pagine che sapevo avere riferimenti non ne rivelano.

Solo FriendFeed rimane il mio grande campione di sociabilità. Ma l'ho sempre sostenuto che era il social network “come doveva essere fatto”.

  1. la ricerca su FriendFeed non funziona più da un bel po' di tempo, né via sito, né via API, ed io sono fortunato a dover usare l'API in sola lettura e senza autenticazione, perché a sentire quelli che ci devono lavorare sul serio è messa proprio male. ↩

  2. Twitter attua un processo di risanamento del parametro di callback per JSONP che rende molto macchinoso chiamare una funzione a più parametri preassegnando i valor per i parametri che non sono i dati, cosa necessaria per caricare i dati di ciascun permalink distinguendone chiaramente le origini. ↩

Provveditorato agli Studi di Enna

Get the code for
UserJS/Greasemonkey fixer per il Provveditorato agli Studi di Enna:


Il sito del Provveditorato agli Studi di Enna è uno di quei siti istituzionali che, in quanto tale, dovrebbe essere ad accesso universale: dovrebbe, in altre parole, essere (facilmente) consultabile da qualunque browser, testuale, aurale o con interfaccia grafica, per Windows, Linux, Mac OS, Wii, cellulare o quant'altro.

Invece, figlio com'è della monocultura web dell'inizio del millennio, è orribile e disfunzionale. In particolare, una delle sue funzioni più importanti (la presentazione delle “ultime novità”) non solo è esteticamente offensiva, ma per giunta funziona solo in Internet Explorer. In aggiunta, il menu laterale richiede (inutilmente) il pesantissimo Java ed è, nuovamente, disfunzionale: i link ai rispettivi contenuti funzionano (nuovamente) solo in Internet Explorer.

Per vedere se fosse possibile porre rimedio a questi limiti, ho dovuto dare un'occhiata al codice costituente della pagina: vomitevolmente offensivo per qualunque sviluppatore web, è evidente il figlio tipico della cultura del “copincollo facendo presto e male, basta che funziona con IE6” alimentata dalla succitata e fortunatamente ormai passata monocultura web.

Per fortuna, mi è anche stato possibile sistemare almeno il più grosso dei problemi della pagina: uno script utilizzabile sia con Opera sia con l'estensione GreaseMonkey di Firefox e che rende finalmente visibili le news.

Anche il menu soffre di disfunzionalità, sia per la necessità di avere Java, sia per i collegamenti alle voci specificati in versione Windows e quindi inutilizzabili al di fuori della monocultura. Lo script pone rimedio anche a questo sostituendo il menu Java con un semplice menu HTML con opportuno stile CSS e correggendo gli indirizzi di destinazione.

Questo è il meglio che si può fare per ora. In particolare, mi sarebbe piaciuto effettuare la sostituzione del menu prima del caricamento di Java, ma la cosa purtroppo non è possibile con uno script per GreaseMonkey (avrei potuto farlo se mi fossi limitato agli UserJS di Opera).

(Lo sviluppo dello script può essere seguito dal suo repository git)

Monocultura nel web

La conclusione della prima browser war verso la fine dello scorso millennio portò ad una solida monoculura: “tutti” avevano Internet Explorer 6 (IE6) su Windows (generalmente senza nemmeno sapere esattamente cosa fosse, se non vagamente “l'internet” per i più sofisticati).

L'apparente o quantomeno temporanea vittoria della Microsoft divenne rapidamente un problema non solo per quei pochi alieni che non usavano IE1, ma anche per tutti gli omogeneizzati nella monocultura dominante, grazie alle gigantesche falle di sicurezza offerte dal browser Microsoft: l'insicurezza di Windows, avente come vettori principali di attacco proprio il browser ed il consociato programma di posta elettronica2, è un fardello con cui tuttora la Microsoft, e soprattutto i suoi utenti, devono fare i conti.

Ma l'eredità della monocultura d'inizio millennio non si limita a questo. La possibilità di creare “facilmente” pagine web quando non interi siti con strumenti poco adatti (quali ad esempio la suite per ufficio della stessa Microsoft), senza grandi conoscenze tecniche (bastante ad esempio spesso la capacità di usare un motore di ricerca e di copincollare codice) ha comportato la diffusione di pagine web di pessima qualità dal punto di vista tecnico, e soprattutto scarsamente fruibili, generalmente solo per ragioni estestiche, ma troppo spesso anche per questioni funzionali, su browser diversi da quello dominante. (Perché d'altronde sprecarsi per quel misero 10% di outsiders?)

Lo scarso interesse della Microsoft per il web come piattaforma ha fatto sì che la monocultura da lei dominata ne arenasse lo sviluppo, soprattutto in termini di interattività: per cinque anni abbondanti (un lunghissimo periodo in campo informatico), le potenzialità offerte dalla sempre più diffusa banda larga sono esulati dal linguaggio specifico del web (HTML), rimanendo dominio quasi incontrastato di tecnologie “supplementari”, prima Java, quindi Flash, disponibili un po' per tutti, e i famigerati (quanto pericolosi) controlli ActiveX specifici proprio di IE6.

Nel frattempo il lavoro dei pochi campioni di quel misero 10% di outsiders, piuttosto che arrendersi e gettare la spugna, ha lavorato dapprima per coprire il breve passo che lo separava dalle funzionalità offerte dal famigerato browser della Microsoft, e quindi per aggiungere nuove funzionalità. Dalle ceneri degli sconfitti dello scorso millenio è nato un web solido e sempre più capace, lasciando ben presto gli sviluppatori web davanti ad una scelta: creare contenuti seguendo i nuovi standard con un sempre più promettente futuro, o limitarsi alle possibilità offerte dal browser attualmente dominante ma dalla posizione sempre più incerta?

Con l'avvio della transizione al nuovo, la stagnante monocultura ha cominciato a manifestarsi sempre più evidentemente come la pesante palla al piede che era in realtà sempre stata: tecnicamente inferiore e limitante, un peso per gli sviluppatori, ed un pericolo per gli utenti.

Creare pagine web universali si è sempre più rivelato per la sua assurda natura: scrivere codice una volta per “tutti gli altri” e quindi ricorrere a penosi e complessi artifici per qualcosa che era non solo cronologicamente del secolo scorso. Persino la Microsoft stessa, quando il crescente successo delle alternative3 l'ha costretta infine a riprendere lo sviluppo di Internet Explorer, si è trovata ad avere come ostacolo principale proprio gli utenti che, affidatisi allora a programmi sviluppati in forma di controlli ActiveX per IE6, si sono ritrovati a non poter aggiornare il browser salvo perdere l'uso dei suddetti programmi, spesso necessari per lavoro.

E se la reticenza all'aggiornamento è il più grave problema contro cui deve combattere la responsabile della monocultura, l'eredità del “pensiero pigro” che l'ha accompagnata la devono invece pagare quegli utenti che si ritrovano ancora a combattere contro siti che, per loro natura, avrebbero sempre dovuto essere di universale accessibilità, ma che purtroppo, gravemente, non lo sono.


  1. vuoi per volontà, vuoi perché impossibilitati dall'uso di piattaforme diverse da Windows, quali Linux o Mac OS ↩

  2. Outlook Express ↩

  3. soprattutto Firefox, presto seguito da Safari e più recentemente da Chrome (per qualche motivo, la percentuale di utenti Opera non è cambiata mai molto, nonostante la sua superiorità tecnica) ↩

Why a Wok?

Si inizia un blog dando al mezzo di comunicazione il suo valore etimologico: web log, diario in rete. Si scoprono altri usi, dallo sfogo sentimentale allo zibaldone di riflessioni, dalla filosofia alla critica letteraria, dalle analisi storiche, politiche e sociologiche alla narrativa.

Si smette di tenere un blog in genere progressivamente, e per una varietà di motivi. Ma soprattutto, in una forma o nell'altra, per mancanza di tempo: vuoi perché si hanno talmente tante cose da fare da non trovare più il tempo di scriverle; vuoi perché non si hanno molte cose da fare, ed allora ci si dedica alla loro ricerca (e comunque non è che allora abbondi la voglia di scrivere: su cosa si scriverebbe, poi?); vuoi perché si perde semplicemente interesse nel tenere questa finestra aperta sul mondo, e si preferisce un metodo comunicativo più semplice ed immediato, un Twitter o un Tumblr o un abusato FriendFeed, ma soprattutto FaceBook; un metodo, soprattutto, che per ogni intervento non ci faccia sentire quella pressione che può derivare dall'idea di avere dei lettori, dal bisogno, conscio o inconscio, di offrire loro qualcosa di qualità.

Nel mio caso, oltre forse ad un misto di quanto sopra, si è trattato molto di una sensasione di limitatezza del formato preso in considerazione. E benché la limitatezza della piattaforma del mio precedente blog sia stata un forte incentivo a cercare alternative, nessuna di quelle da me viste (da Splinder a LiveJournal, passando per l'ormai diffusissimo WordPress) mi sono sembrate “quello che cercavo”.

Quello che cercavo

Molte delle mie esigenze per il sostituto del mio blog hanno radici nella mia natura molto geek di matematico e soprattutto di programmatore.

Ad esempio, l'esigenza di poter lavorare ai contenuti con i miei abituali strumenti di scrittura: Vim in un terminale, riposante schermo nero con testo bianco, senza fronzoli (se non eventuali sofisticherie quali il syntax highlighting) e distrazioni.

Ad esempio, la possibilità di pubblicare in maniera semplice ed immediata senza nemmeno dovermi preoccupare di aprire un browser.

Ad esempio, la possibilità di avere traccia della storia dei contenuti, tutte le fasi di modifica, tutte le revisioni.

Infine, dal punto di vista più esteriore, qualcosa che offrisse più che la classica interazione scrittore-lettore del blog. Più d'una volta, nella mia trascorsa vita da blogger, mi sono ritrovato tra le mani cose che richiedevano un'interazione più ricca, scambi più approfonditi, proposte o richieste che potessero più semplicemente essere soddisfatte da “gli altri”.

Da queste esigenze nasce il wok.

Il Wok

Rubo il termine dal nome della tradizionale pentola di origine cinese per almeno due buoni motivi. Il primo, di natura fonica, è data dalla somiglianza tra il termine “wok” ed un'opportuna commistione di “wiki” e “blog”. Il secondo, di natura invece funzionale, è legato alla grande flessibilità del Wok in cucina, dove può essere impiegato per tipi di cotture che vanno dal bollito al fritto passando per il rosolato e la cottura a vapore. Come piattaforma per la gestione di contenuti, ci si può aspettare da un wok la stessa flessibilità, e quindi la possibilità di ospitare (tutte) le (principali) forme di espressione (testuale) del web:

  • lo sfogo (localmente) individuale, forse anche “lirico”, per il quale interventi di seconde parti, anche quando benvenuti, rimangono esterni al corpo principale; ovvero i contenuti che caratterizzano in maniera sostanziale il blog;
  • il dibattito tra più parti, un tempo dominio di mailing list e newsgroup ed ormai dominato dai forum, laddove questi sopravvivano;
  • i contenuti che nascono, crescono e si rifiniscono grazie alla collaborazione di più partecipanti, il cui ambiente naturale è una Wiki;
  • banali, classiche, statiche pagine web “1.0”.

La base tecnica di questo wok è ikiwiki, un compilatore per wiki con svariati possibili usi (tra cui forum e blog), i cui contenuti sono semplici file e che si appoggia a sistemi esistenti di revision control (tra cui il mio favorito git) per preservarne la storia. Non è difficile capire perché l'abbia scelto, anche se personalizzazioni e rodaggio (che verranno qui documentati) sono necessari perché da questa base si possa raggiungere la vera natura del wok nella forma e nei modi a me più consoni.

Limiti tecnici correnti

Nella distribuzione ufficiale di ikiwiki mancano le seguenti capacità perché il wok mi sia tecnicamente soddisfacente:

  • categorizzazioni multiple: ikiwiki supporta di default solo i tag, per cui eventuali categorizzazioni supplementari (rubriche etc) devono essere implementate con plugin esterni; cose per cui può servire:
    • una migliore gestione delle collection; il nuovo sistema delle trail aiuta a facilitare la gestione delle collection, ma namespace per i tag aiuterebbero comunque a marcare in maniera più ‘discreta’ i capitoli;
    • draft e wip dovrebbero essere categorie distinte dai tag
  • un indice della pagina corrente nella barra laterale (questo si può risolvere con un po' di javascript, come fatto dal sottoscritto per la pagina principale del wok, ma una soluzione senza javascript sarebbe ovviamente preferibile)
  • barre funzionali multiple (sinistra, destra, sopra)
  • commenti nidificati
  • un modo per specificare quali pagine hanno un foglio di stile (es. demauro.css per le pagine taggate demauro, ma anche per tutte quelle che le comprendono!)
  • specificare in che lingua è ciascun articolo
  • la possibilità di specificare per un insieme di pagine che i link vanno risolti controllando in specifiche sottodirectory (ad esempio, Postapocalittica dovrebe cercare nel proprio glossario); feature implementata come linkbase, ma non ancora integrata nell'ikiwiki ufficiale.

Altri problemi riscontrati:

  • le note a piè di pagina non funzionano correttamente nelle pagine che raccolgono più documenti, poiché MultiMarkdown usa lo stesso anchor per note a piè di pagina con lo stesso numero ma appartenenti a documenti diversi problema facilmente aggirabile: MultiMarkdown utilizza come codice di riferimento quello indicato dall'utente; basta quindi usare codici di riferimento univoci nei documenti: non perfetto, ma funzionale;
  • le pagine autogenerate vengono attualmente aggiunte al sistema di revision control, nel ramo principale problema risolto dal meccanismo delle transient introdotto in recenti versioni di ikiwiki;

MultiMarkown e Ikiwiki

Chi usa MultiMarkdown con IkiWiki normalmente si appoggia alla versione Perl (2.x). Nella sua forma originale, questa presenta alcuni problemi:

  • non supporta correttamente l'HTML5, con problemi che si manifestano tipicamente nell'HTML prodotto da inline nidificate: pagina A include pagina B che include pagina C; risultato il markup di pagina C in pagina A è pieno di tag p fuori posto)
  • non supporta correttamente note a piè di pagina con più riferimenti nel testo: in una tale situazione, le note vengono duplicate, sempre con lo stesso identificativo (ma numeri diversi)

Entrambe questi problemi sono risolti nel fork di MultiMarkdown matenuto dal sottoscritto.

Irretiti invisibili significanti

Un essere umano che leggesse il calendario delle Letture di San Nicolò l'Arena non avrebbe grosse difficoltà ad identificarlo come tale. Fino a qualche giorno fa, un programma che ‘leggesse’ quella stessa pagina non ne avrebbe invece potuto estrarre i dati essenziali (ovvero le date ed i temi degli appuntamenti).

In questi termini, ciò che differenzia la macchina dall'uomo non è tanto un diverso rapporto tra qualità e quantità d'informazione, quanto piuttosto la diversa forma: la mente umana ha più agio nella comunicazione verbale (orale o scritta) composta in un linguaggio naturale, che è invece notoriamente difficile da elaborare automaticamente (e non parliamo poi dell'informazione visiva).

Mi soffermo sulla forma piuttosto che sulla qualità dell'informazione perché una valutazione qualitativa della comunicazione informale può essere solo contestuale, ed è intrinsecamente soggettiva (ma esistono valutazioni qualitative che non lo sono?). Ad esempio: la (potenziale, e talvolta ricercata) ambiguità del linguaggio naturale aumenta o diminiuisce la qualità dell'informazione comunicata?

Una visione della Rete —e qui parliamo di qualcosa che sicuramente interresserebbe lo Sposonovello, e forse anche Tommy David, ma non certo, ad esempio, Yanez— come mezzo universale per dati, informazione e sapere (secondo il suo padre fondatore Tim Berners-Lee) deve quindi scendere a patti con il fatto che la fruibilità umana e quella automatica hanno richieste ben distinte; e per lungo tempo (e per ovvi motivi) quella umana ha avuto un'alta priorità, rendendo arduo il compito, ad esempio, di quei motori di ricerca (meccanismi automatici di raccolta ed elaborazione (indicizzazione) delle informazioni) cui gli esseri umani stessi si appoggiano per trovare le informazioni di cui vorrebbero fruire.

Se gli esseri umani devono passare attraverso i computer per trovare le informazioni scritte da altri esseri umani, ed i computer non sono (facilmente) in grado di ‘comprendere’ le stesse informazioni, è evidente che si pone un problema. Ed è altrettanto evidente che, nell'attesa che la singolarità tecnologica porti ad un'intelligenza artificiale (che si speri non degeneri in Skynet) in grado di interpretare autonomamente le forme d'informazione umanamente fruibili, è necessario che chi produce le informazioni stesse (e quindi, gli esseri umani stessi) le presentino in una forma consumabile dalle macchine. Ma se l'utenza finale è sempre un altro essere umano, è evidente che le forme umanamente ostiche offerte da certe proposte per la costruzione della Rete ‘semantica’ non sono meno problematiche di quelle attuali.

Una promettente soluzione in questo senso è quella di nascondere le informazioni per le macchine in tutta quella montagna di metainformazioni che sono già presenti (per altri motivi) nelle pagine che propongono contenuti per gli esseri umani: nascono così i microformati, che permettono allo stesso calendario di essere fruibile dalla macchina.

E adesso ho la smania di microformatizzare il mio blog, ma l'unica cosa di nota che sono riuscito a fare è stato aggiungere i tag XFN al blogroll. (Rimando ad altra sede una dissertazione sull'utilità.)