Not by AI

Is it a good idea to sport a badge to attest that one's creations are done without using AI-based tools?

This is a long-form post based on a previous Mastodon thread by yours truly.

Some time ago I came across a Mastodon post linking to a website proposing “Not by AI” badges. The idea is to offer a preset badge (or rather 3 such presets) to “certify” that your work (writing, painting or audio/visual) has been produced without (or rather with “less than 10%”) AI (or rather, as proposed by Stefano Quintarelli, SALAMI).

The idea piqued my interest, but I didn't have time to look into it when I first came across the site, so I just bookmarked it, leaving it for a time when I could look into how to add such a badge here. When I finally had time to look into it, I was underwhelmed by what it turned out to be.

The first thing that surprised me was the 10% threshold. If it's “not by AI”, I would expect the threshold to be either 0% (no AI involved at all) or 50% (majority of the work done by a human). 10% sounds pretty arbitrary, and it's weakly motivated on the website:

The 90% can include using AI for inspiration purposes or to look for grammatical errors and typos.

Second, there are some very stringent rules about the badge use beyond the 90% criterion (placement, size, no modifications). I can guess some of the reasons why they would want to exert such control on these parameters, but the fact that no licensing or copyright information seems to be available on the website or the download package makes the use of the badge troublesome.

Third, still on the legal side, the badge holds no actual value, and at the time of writing the site is still asking for legal experts to get in contact with the organizers (who are they?) to

explore the potential of formalizing and regulating the use of the Not By AI badge.

Fourth, as if it wasn't enough that nothing is said about who's behind the initiative, they seem to have no Fediverse presence. And yes, this is important: especially when combined with the restrictive terms of use of the badges, it's an indication of a substantial lack of interest (at best) or antagonism (at worst) towards free culture and humane tech, which —given the alleged mission behind the badge— is a significant failure.

So, this «Not by AI» badge has neither legal nor technical (as remarked in a comment to my original thread) merit. What's the use, then?

The most obvious reason to apply the badge seems to be for clout: «proudly show that you're still doing things the human way!»

But I think there's a much more sinister reason behind it, that emerges when you think about cui prodest, who benefits from this badge existing and being so clearly recognizable?

The answer is that the main objective is to tag human-generated content to help scrapers identify non-AI content. Why is this important? Because as the Internet gets flooded by low quality machine-generated content, it will be more and more difficult for scrapers to find actually useful content to train their AI on (see also this minithread).

And what's the best way to achieve that? Ask humans to tag the contents themselves! (Ironically, this need is highlighted on the “Not by AI” website itself, although it's spun under a pretense of “helping humanity move forward”, rather than as a way to feed sufficient noise into SALAMI models to make them produce more varied content.)

In other words, my suspicion is that by using the badge you'd just be setting yourself up for maximum targeting by SALAMI training bots: you'd be certifying to scrapers that your creative output can be used to train their models safely.

Don't use the badge

I'm not saying that it's not important to advertise when one's art has been produced without AI: there absolutely is merit to it, especially when looking at the implications of using a tool built on nonconsensual exploitation of the creativity of others. But if the intent is to communicate human-to-human, there is no benefit in automating the signaling of human effort and making it machine-processable —in fact, it's going to be counter-productive in the long term.

I don't dislike the idea of the badge. In fact, I may even be working on rolling my own. And it'll be a hand-rolled multi-lingual SVG, like I do. And it will be released on a CC-BY-SA license, and people will be encouraged to modify it, customizing it for their own purposes.

On the other hand, everybody using a derivative of my hypothetical badge would still be “detectable” by AI scrapers, so maybe that's not such a hot idea …

Meanwhile, consider looking into this brilliant alternative, an “AI data usage statement” that reads:

By using any of my writing, including social media posts or other communications, as training data for any AI writing tool, you are hereby agreeing to be legally bound to pay me for any and all text used, at my special AI premium rate of €5.50/word.

All payments are required in full within seven days. Late payments or use of any of my writing without disclosure may incur an additional penalty fee of up to €18/word plus full payment of any necessary legal fees.

I doubt it would be legally enforceable —which is, to be honest, quite a pity— but it undoubtedly works both as testament of the humanity of the work it refers to, and to show a clear intent in the relationship between this and the possibility for SALAMI to train on it —something which is completely omitted from the “Not by AI” badges.

And it's really time we started to be clear and loud on something much more important than our work being “Not by AI”: the fact that our work is “Not for AI” either.

By humans, for humans.

The <switch> element

Things I wish HTML inherited from SVG: the <switch> element


This is an expanded, blog-form version of a recent thread of mine on Mastodon.

I love the SVG format. It's not perfect, but it has some amazing features, and with all the issues in its support across different browsers, it still remains a solid vector graphics format.

One of the things I love the most about SVG is that it allows interaction and dynamic content without requiring JavaScript.

This isn't actually an SVG feature per se, but it's related to the specification integrating support for SMIL, an XML language for dynamic content.

SVG also supports an incredibly powerful element: <switch>. The combination of switch and SMIL allows some impressively sophisticated things to be achieved in SVG, without using JavaScript or server-side funkiness: and honestly, I love these features so much that I really wish HTML was extended to support them too.

In fact, there was an attempt to add SMIL support in HTML: it was called TIME (Timed Interactive Multimedia Extension), and was proposed by Microsoft and Macromedia and, after being submitted to the W3C, evolved into the W3C Note (not even a recommendation) for XHTML+SMIL.

No other browser than Internet Explorer ever added support for it, and honestly, I see that as a loss.

With the integration of MathML and SVG standards into HTML5, there is actually some hope (if just a sliver) of things moving forward in this direction, although I doubt any of the existing implementations actually plan on investing resources in it. One of the benefits of having more competition in this area would be better chances of growth in this regard.

I actually wonder if some kind of JavaScript polyfill could be created to implement support for these features without UA support. It would be suboptimal, similarly to how MathJax is inferior to UA support for MathML, but could work as a stopgap solution to promote the adoption and standardization of these extensions.

An HTML switch polyfill?

I've tried a quick test to see if you can exploit the HTML5 inclusion of SVG to do without the polyfill. Since you can't just randomly throw SVG elements into the HTML parts of the document and expect them to work, you need a double wrapping: pass through the SVG foreignObject element and put the HTML in there:

body > svg > switch > [foreignObject > your HTML here]+

and this requires a lot of effort because sizing and spacing have to be handled manually.

You can almost implement the SVG switch element in pure HTML + CSS with something like:

switch > * {
    display: none;
}
switch > *:lang(...) {
    display: initial;
}

with only one issue: there's no way to put in that :lang() pseudo-class “whatever the user asked for”.

So you still need some JavaScript or server-side assistance to bridge the gap between the user language selection and the styling.
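A minimal sketch of what that bridge could look like, assuming the children of the switch carry a lang attribute: a few lines of JavaScript turn the browser's language list into the :lang() rule the stylesheet cannot express on its own. The function name buildLangRule is hypothetical, and the generated rule reveals every matching child rather than only the first one, so this only approximates what the SVG element does.

```javascript
// Hypothetical helper (not from the original post): build the CSS rule
// from a list of user languages such as navigator.languages.
function buildLangRule(userLangs) {
    // Keep only tags that are safe to paste into :lang() as identifiers.
    const safe = userLangs.filter(tag => /^[A-Za-z]+(-[A-Za-z0-9]+)*$/.test(tag));
    if (safe.length === 0) return '';
    const selectors = safe.map(tag => `switch > *:lang(${tag})`);
    return `${selectors.join(', ')} { display: initial }`;
}

// In a browser, inject the rule; outside a browser this part is skipped.
if (typeof document !== 'undefined' && typeof navigator !== 'undefined') {
    const style = document.createElement('style');
    style.textContent = buildLangRule(Array.from(navigator.languages || []));
    document.head.appendChild(style);
}
```

For a user whose preferences are en-US and it, the helper produces a rule with the two selectors switch > *:lang(en-US) and switch > *:lang(it).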

So close, yet so far away …

An HTML switch polyfill

If we do things a bit more cleanly in CSS (to account for switch elements inside SVGs embedded in HTML5), and add a little bit of JavaScript to handle the language check, it turns out you can polyfill a switch element for HTML!

(How? I'll show this a little bit later.)

Testing this across browsers, however, I ended up discovering that when it comes to the SVG switch element, there are discrepancies in which child is selected when the user voices a preference for multiple acceptable languages.

Choosing the “best” language

So: the switch element is typically employed together with the systemLanguage attribute of its immediate children, as a way to display different content depending on the language choice of the user. Per the specification, the switch element should select

the first child that matches the user's language preference.

Now, there are two ways to do this when the user accepts multiple languages.

One is: for every language accepted by the user, find the first matching element.

The other is: find the first element that matches any of the user languages.

It turns out that Firefox adopts the first strategy, while WebKit and Blink browsers the second.
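The two strategies can be sketched as follows. This is an illustration under stated assumptions, not the code of any browser: children are modelled as plain objects with a systemLanguage property (null when the attribute is absent), and matches is an assumed predicate supplied by the caller; all names are hypothetical.

```javascript
// Strategy 1 (reorder): walk the user's languages in order of preference
// and, for each one, look for the first child that matches it.
// A child without systemLanguage is used as the default.
function pickByUserPreference(children, userLangs, matches) {
    for (const lang of userLangs) {
        const found = children.find(
            c => c.systemLanguage !== null && matches(lang, c.systemLanguage));
        if (found) return found;
    }
    return children.find(c => c.systemLanguage === null) || null;
}

// Strategy 2 (no reorder): walk the children in document order and take
// the first one that matches any user language; a child without
// systemLanguage matches as soon as it is reached.
function pickByDocumentOrder(children, userLangs, matches) {
    const found = children.find(
        c => c.systemLanguage === null ||
             userLangs.some(lang => matches(lang, c.systemLanguage)));
    return found || null;
}

// Example: the document lists Italian first, the user prefers English.
const children = [
    { systemLanguage: 'it', text: 'Ciao' },
    { systemLanguage: 'en', text: 'Hello' },
    { systemLanguage: null, text: 'Hello (default)' },
];
// Simplified matcher for the example: exact, case-insensitive tag equality.
const sameTag = (userLang, childLang) =>
    userLang.toLowerCase() === childLang.toLowerCase();
const userLangs = ['en', 'it'];

console.log(pickByUserPreference(children, userLangs, sameTag).text); // Hello
console.log(pickByDocumentOrder(children, userLangs, sameTag).text);  // Ciao
```

With the same document and the same preferences, the first function picks the English child and the second picks the Italian one, which is exactly the kind of discrepancy observed across browsers.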

Which one is correct?

If I look at the SVG specification about the systemLanguage attribute, the text says:

Evaluates to "true" if one of the language tags indicated by user preferences is a case-insensitive match of one of the language tags given in the value of this parameter, or if one of the language tags indicated by user preferences is a case-insensitive prefix of one of the language tags given in the value of this parameter such that the first tag character following the prefix is "-".
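Read literally, the quoted rule can be sketched as a small predicate. The function name systemLanguageMatches is hypothetical, and treating the attribute value as a comma-separated list with optional surrounding whitespace is an assumption of this sketch.

```javascript
// `userLangs`: language tags from the user's preferences.
// `attrValue`: the systemLanguage value, assumed comma-separated.
function systemLanguageMatches(userLangs, attrValue) {
    const tags = attrValue
        .split(',')
        .map(t => t.trim().toLowerCase())
        .filter(t => t !== '');
    return userLangs.some(userLang => {
        const u = userLang.toLowerCase();
        // Exact match, or the user tag is a prefix followed by "-".
        return tags.some(tag => tag === u || tag.startsWith(u + '-'));
    });
}

console.log(systemLanguageMatches(['en'], 'en-US'));   // true
console.log(systemLanguageMatches(['en-US'], 'en'));   // false
console.log(systemLanguageMatches(['IT'], 'fr, it'));  // true
```

Note that the predicate only answers whether any user language matches; it says nothing about which user language is preferred, which is where the two strategies diverge.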

My interpretation of this is that the correct way to handle the switch element would be the second one (used in WebKit/Blink) rather than the first one. On the other hand, when it comes to the specification of the switch element, we have

In SVG, when evaluating the ‘systemLanguage’ attribute, the order of evaluation of descendant elements of the ‘switch’ element must be as if the 'allowReorder' attribute, defined in the SMIL specification always has a value of 'yes'.

This means that a UA can reorder them so that the match with the highest preference has priority, and this is correct too. In fact, the SMIL specification clearly says about allowReorder:

User agents are free to ignore the allowReorder attribute, but if they implement prioritized language ranges as defined in BCP47 they are expected to use that prioritization to reorder children with systemLanguage attributes. The effect should be that the users are presented with the alternative that best matches their language preferences. Any final child without systemLanguage attribute should retain its place as the default item to present.

Authors should add the allowReorder attribute if all items in the switch are equivalent.

So I hate the SVG switch element now. (OK, not really, but I dislike that different results are possible still following the specification).

It turns out that both interpretations are possible: the indication about allowReorder is that, if it is true, the UA should prioritize languages by user preference, but the UA is free to ignore it. So one may consider Firefox to adhere better to the spirit of the specification (give the user control), but WebKit/Blink are still correct simply by ignoring the possibility to reorder (which is good for speed, even if the note above, which is only informative, suggests they would be expected to do the reordering).

Now, why is this important for me? Because I have to choose which strategy to implement in the JavaScript of my polyfill for the switch element in HTML: the “fast” way (no reorder) was easy to implement, but the reordering one should be contemplated too, and possibly given preference.

To reorder or not to reorder?

To clarify, the difference is that with reordering the reader has priority in choosing the version, while without reordering it's the writer who chooses.

Let's say I write a text in Italian, but also produce an English translation. My preference as a writer would be for a reader who understands Italian, even if it's not their preferred language, to read the original Italian text. With the reordering, the user preference for English over Italian means they would get the translation, even if they could understand the original.

One of the interesting advantages of the polyfill is that at least conceptually it can be overridden, for example providing interactive elements to allow users to force a specific language without changing the browser preferences. I'm not sure this is possible in SVG. (I tried, and couldn't make it work without duplication, but this may be a UA issue, I'll have to take it up with them).

SVG switch element in action

By the way, if you're unfamiliar with how the SVG switch element works, you can see it in action in some of the SVGs shown below.

All of them have text in them (some more, some less), and you will see the text in some language, but others will see it in a different language. Which one you see it in depends on a combination of your language preferences configured on the browser, and on the actual browser you're using.

If you wish to actually see the element in action, and the text changing, you will have to (temporarily) configure your browser to prefer different languages, and reload the images.


The first multilingual SVG I made explicitly for the wok is the printable SVG template I prepared to play Boulet's “Hybrids” game (see also the Italian article I wrote when I first published it):

Template for playing Hybrids

Whatever little text is there, it should be translated in your language (assuming it's one of: en, it, fr, de, ca, es, pt) —if yours isn't there, and you let me know the singular for dice and player, I'll try adding them. Corrections welcome too.

On the usefulness of prayer

My first attempt at using switch was actually much older than that, and it was an attempt at recreating in SVG a meme on prayer that has been circulating on the Internet at least since 2011:

On the usefulness of prayer

I'm not interested in debating the meme here, so please spare your time (and most importantly my time) and go debate it somewhere else (such as this 2014 blog article about it). If you do wish to provide translations for the text in other languages, though, then please do let me know: currently I only have Italian, French and English (the latter should be the fallback, so the one you see if your primary language is not among the supported ones).

Circular reasoning works because circular reasoning works

The last one is something I originally did in English only, again based on a who-knows-how-old meme circulating on the Internet, so I took the opportunity of this article to revamp it and add additional languages:

Circular reasoning works because circular reasoning works

Again, you should be seeing the text in your language, provided it's among the supported ones (Italian and French), or English otherwise. Please do let me know of translations in other languages, I'll gladly add them, and do let me know if any of the translations are not up to par.

HTML switch element

Let's see now how the HTML switch element can be polyfilled. The ingredients are:

  • a browser that handles unknown HTML elements correctly;
  • a few lines of CSS styling to determine when children of the switch element should be shown;
  • a few more lines of JavaScript to actually mark the children appropriately.

The additional conditions are:

  • the default styling should display the fallback switch child (if present) if JavaScript is disabled;
  • neither the styling nor the JavaScript polyfill should handle switch elements that are handled by the browser.

An example of all this has been neatly packaged up in this sample test file.

The CSS polyfill

The CSS style is relatively simple:

switch > * { display: none }
svg switch > * { display: initial }
switch > *:not([systemLanguage]):not(.html-switch-false)
{ display: initial }
switch > .html-switch-true { display: initial }

It hides immediate children of the switch element with the following exceptions:

  • when the switch is a descendant of an SVG element (because these will be handled by the SVG renderer in the browser);
  • immediate children without a systemLanguage attribute, unless they are marked with the class html-switch-false: this ensures that the fallback is handled correctly (even if JavaScript is disabled);
  • immediate children with an html-switch-true class.

Obviously, the html-switch-true and html-switch-false classes are the ones that will be set by the JavaScript polyfill to mark items that should (not) be visible.

The style is not perfect. For example, it doesn't handle HTML switch elements that would appear inside a foreignObject inside an SVG, which may cause issues (I haven't tested), and if no JavaScript is used and more than one child has no systemLanguage attribute, they will all be shown.

The JavaScript polyfill

This is where the “magic” happens: on document load, we run a function that goes over every switch element that isn't recognized by the browser (and is thus represented in the DOM as an HTMLUnknownElement), and finds “the first child that matches the user language”. Both reorder and no-reorder versions of the algorithm are possible, and have been implemented in the sample file. (I'm not going to paste the code here; it's not long, but it's not short either.)
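For readers who want a feel for it without opening the sample file, here is a minimal sketch of the no-reorder variant. It is not the code from the sample file: the function names are hypothetical, and it assumes systemLanguage holds a comma-separated list of tags.

```javascript
// Decide which class each immediate child of an HTML switch should get.
// `childLangs` holds each child's systemLanguage value (null when absent).
function classifySwitchChildren(childLangs, userLangs) {
    const wanted = userLangs.map(u => u.toLowerCase());
    const matches = value => value.split(',').some(tag => {
        const t = tag.trim().toLowerCase();
        return wanted.some(u => t === u || t.startsWith(u + '-'));
    });
    // No-reorder: first child in document order that matches; a child
    // without systemLanguage always matches when it is reached.
    const chosen = childLangs.findIndex(v => v === null || matches(v));
    return childLangs.map((_, i) =>
        i === chosen ? 'html-switch-true' : 'html-switch-false');
}

// Browser-only glue: skip switch elements the browser already understands.
function polyfillSwitches() {
    for (const sw of document.querySelectorAll('switch')) {
        if (!(sw instanceof HTMLUnknownElement)) continue;
        const kids = Array.from(sw.children);
        const classes = classifySwitchChildren(
            kids.map(k => k.getAttribute('systemLanguage')),
            Array.from(navigator.languages || [navigator.language]));
        kids.forEach((k, i) => k.classList.add(classes[i]));
    }
}

if (typeof document !== 'undefined') {
    document.addEventListener('DOMContentLoaded', polyfillSwitches);
}
```

Keeping the decision logic in a function that takes plain arrays makes it possible to exercise it outside a browser, and a reordering variant would only need to change how the chosen index is computed.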

Like for the CSS, the JavaScript I've implemented so far isn't perfect: it doesn't play nice with dynamic content (although one may wonder why a switch element would be generated via JavaScript), and it hasn't been thoroughly tested. I also have no idea how well it plays with accessibility (although I would assume that the display: none CSS would make it work ‘as expected’; do let me know how it works for you, though).

Lessons learned (and things to look into)

Issues with the SVG switch element and its implementation

With all its power, the SVG switch element has some limitations, the most important of which is that only a limited subset of the SVG elements can be used as children, and the element itself can only be used as a child of a limited set of other elements.

This leads to a lot of duplication. For example, in reference to the circular reasoning example, the text and textPath elements have to be duplicated for each language, rather than using a single text > textPath nesting with a switch on tspan elements for each of the languages.

While there may be good reasons for these restrictions (for example, different languages may have very different requirements in terms of sizing and proportions of the elements), it makes the use of the switch element exceedingly bothersome whenever those reasons do not apply, and especially when the author has to go in and introduce changes to the wrapping elements that could be shared by all variants.

Even worse, it makes it much harder to build SVGs where language selection can be done both switch-wise and through dynamic interactions.

(Of course it's also possible that I'm just missing some obvious alternative solution —my knowledge of SVG is still largely amateurish anyway— or the browser is failing me.)

The fact that user agents with the same language settings can produce different results is also annoying, and potentially disruptive. It can be argued that the “no reorder” path taken by WebKit and Blink is lazy, but ultimately it's the specification not being stricter in this regard that gives them the leeway to act this way.

Ultimately, possibly the biggest issue at hand is that most UAs don't provide a simple way to change the language preference. I had the opportunity to discuss this in a separate context in a Mastodon thread: we really need some fresh blood in the browser space to bring forward “revolutionary” ideas like … allowing the user to choose a language easily without requiring each website to reinvent the wheel in this regard.

Should I propose an HTML switch element?

The WHATWG apparently has a procedure to ask for new features. I guess if I had some time to throw at this I could go there and submit a proposal to add a switch element to HTML too, or even to incorporate SMIL support into HTML5.

However, I have my doubts, even with a polyfill like the one presented here available for demo purposes, that this would garner enough attention, given implementors can't even be arsed with properly supporting multilingual titles in SVG, or giving users easier (and more fine-grained) controls on their language preferences for websites.

(That being said, if anybody wants to give it a go, I'll be happy to support them. I even have the use case right here.)


Rediscovering one's own past work with satisfaction

The rediscovery

I recently rediscovered a little thing I had worked on a few years ago: two “graphical” checklists of things to remember to verify before leaving the house or going to bed.

It was an exciting rediscovery: I looked back at what I had made not just with pleasure, but I'd say almost with admiration. The entire minisite is a small jewel of hand-coded web technology.

The images

The images that describe the things not to forget are SVGs, hand-coded the way I like to do it. And although I have never considered myself a particularly capable artist, looking at the images again I couldn't help noticing that they actually turned out rather well. Seriously, look at them:

Look at that gas cylinder!


Look at that lamp!

Incidentally, as already noted, the images themselves are small: most of the SVG of these images consists of metadata, in particular the title and the license.

Source of the minisite
git clone

(By the way, the entire minisite in question is under a Creative Commons BY-SA 4.0 license, so feel free to take advantage of it if you need anything; it can easily be downloaded with git as indicated in the side note.)

The lists

The minisite is made up of three main documents: the index, and the two lists, individually reachable from the “titles” by which they are named (and from which they are linked) in the index page, namely the list for leaving the house, and the list for going to bed.

In creating the minisite, I ran into a problem: how to avoid writing the lists twice, once for the list proper and once for the index? (Remember that the goal here was to do everything by hand: so no pre- or post-processing tools, just plain and simple web technologies!)

And while we're at it, if possible, how to avoid writing by hand all the scaffolding for something that should be a simple index?

What came to my aid was one of the technologies perhaps most hated by web developers, one that in many ways was “killed” by its own success, which turned it into an insufferable buzzword: XML.

Those who read the links carefully will have noticed that the two pages/lists are not the classic index.html files but index.xml, and seen without “style” they are little more than a list of <item id='nome'>Titolo</item> (and actually I'm now thinking that they can be simplified further, so maybe by the time you read this article you'll find just an <item id='nome' />).

Update: Q.E.D., I have further simplified the indices as announced above.

The transformative power of stylesheets

The “magic” that transforms these XML files into HTML is that of XSLT, a language for transforming XML into other XML or, as in this case, into HTML.

The individual lists are therefore XML files that specify an XSL stylesheet which, applied by the browser itself when the list is opened on its own, transforms the XML file into a complete HTML document that the browser can display without problems. The index page of the minisite, instead, is an XHTML file that specifies another stylesheet, which takes care of importing the contents of the individual lists in a more compact format.

Concluding thoughts

Although I know very well why XML and XSL are deeply hated in web development circles, and have myself had moments in which I shared the sentiment of the famous joke about programming languages beating XML in the teeth, I cannot deny that these tools have their usefulness, and ironically that this usefulness is all the greater the more one has to (or wants to) create web documents in an “artisanal” way, with the tireless work ethic of the scribe.

(Ironically: because, honestly, writing XML and XSL by hand is a real slog.)

Perhaps the best way to illustrate my thinking is the comparison between these lists and the far more useful Planner.

This minisite too was created by hand, but unlike the previous one it consists only of classic HTML, CSS and SVG, with the addition of a pinch of JavaScript for date selection. The weekly plan is essentially a table (OK, a collection of tables) full of empty rows. Making them all by hand was rather tedious, although not too difficult thanks to the magic of copy-paste. But wouldn't it have been more refined to create them automatically with an XSL, considerably reducing the size of the file? (Obviously, here we enter the famous dilemma: is it worth automating or not?)

(And before anyone chimes in to say “eh, but if you're already using JavaScript you could have done it all with JS”, the answer is no: only the bare minimum, so that the document remains usable, albeit with reduced functionality, even without it. So-called graceful degradation is one of the cornerstones of the accessible web.)

In the end, XML and its stylesheets are a tool like any other, with strengths and weaknesses. Now that the craze for XMLizing everything has passed, we can relax and appreciate (and adopt!) their use where it makes sense and is useful.

Post Scriptum

Someone might chime in saying that exploiting XSL is “cheating”, that we are no longer really talking about “handcrafted” web documents, that the strategy I adopted is not that of a scribe, at most that of Gutenberg.

Honestly, I disagree. Even granting that this approach stretches the idea of manual work a bit, it remains a solution substantially built on web standards, without the intervention of programming languages[1] for producing the content.

Post Scriptum 2

Why the titles in German? Because it's the stereotypical language for compound words, and I wanted to name each list with a single word. Besides, it's a language I know only very superficially, so I have no idea whether the words I ended up choosing are even right.

(I hope native speakers won't be offended by my abuse of it.) I have since been kindly corrected on my German: it's not überprüfenlisten, but überprüfungslisten (as far as I understand, -prüfen would be the verb, while -prüfungs would be the noun (?)).

I've fixed it and added a few redirects (hopefully correctly!), thanks Walter.

  1. no, I don't share the interpretation that sees XSLT as a programming language.

An Opera Requiem, Part III: requiem for the open web?

Revisiting the open web 10 years after the rendering engine switch of the Opera browser


I first started writing about the grim outlooks of the Opera browser over 10 years ago, forecasting dark times for the open web when the browser switched to Blink the next year, and declaring the Opera browser we knew and loved gone for good when the spin had surpassed any sense of decency after one year more. Since then, Firefox has become my daily driver, and I haven't had much to think or write about Opera until recently, with a long Mastodon thread where I brainstormed about the relationship between the browser, other browsers, and the open web. This article collects those thoughts in a (possibly more organic) long form, both as a means of preservation, and in favor of those who find long Mastodon threads not to their liking.

Opera and the open web

I don't think people appreciate the role that Opera Software played in fostering the open web and “indie web” during the first browser wars (when the Opera browser was still built on their proprietary Presto engine), and a fortiori the role it had in their demise (when they switched to being “just another WebKit/Blink skin”), despite their browser never even reaching a 3% market share.

In the five years between the creation of the WHATWG and the switch from Presto to WebKit (and then Blink) by Opera, their role within the working group was essential as an independent standard implementor. Anything that was supported by two out of three (at the time, Apple, Mozilla, Opera) vendors meant different engines implemented the standard. Today, three out of five implementations agreeing is meaningless, since they are most likely just WebKit and its forks.

The Opera/Presto browser was pretty close to being a “Swiss army knife” for the web. Aside from the browser with a solid and modern rendering engine with decent standard support (for the time), it also integrated (in the same UI!) a workable email client, a decent IRC client, and a competitive RSS reader. The browser itself not only had better support for web standards than some of the competitors (including WebKit) in many areas, but it also put effort in supporting microformats.

As an example of how the Opera UI fostered web standards, not only did it do automatic feed discovery (allowing subscription to RSS feeds even if they weren't announced on the visible part of the web page), but it famously featured a navigation bar with next/prev/up/top links that could be extracted from appropriately rel-marked link elements in the page (and for many common cases even when they were not properly rel-marked).

But the most impressive (and underrated) feature of Opera was Opera Unite. First introduced in 2009 in a beta release of Opera 10.10, Opera Unite was a web server that allowed JavaScript server-side scripting to write small static and dynamic websites that were accessible either directly (using UPnP to expose it on the Internet) or through a proxy service offered by Opera itself.

Read that again: in the years before its demise, the Opera/Presto browser not only integrated features to access a large chunk of the Internet aside from the web (email, USENET, IRC), but it featured a web server. In a period where most major players were working towards centralization of the web, Opera pioneered an effort that —if successful— would have made it possible for every Internet user to take both a passive and an active role in its participation.

Opera in the Presto days was a pioneer. I already mentioned in the other articles of this series some examples of the UI and technological innovations first demonstrated by Opera and made famous by other browsers. To those examples I will add two more. Anybody that enjoys a Progressive Web App today should be aware of the efforts made by Opera to standardize their Widgets feature, even if the standard they promoted was ultimately obsoleted by the current one, that relies on modern client features that were not available at the time. And the Opera-designed “demonstrative” Unite Applications were media, photo and file sharing applications. Does that make you think of anything?

Sometimes I wonder how different things could have been if the timing had been different. When Opera Unite was first announced, ActivityPub wasn't a thing yet, StatusNet had just been born, diaspora* didn't exist, and the only other major bidirectional federated protocol was XMPP, that had existed for 10 years and was in the process of being “Embrace, Extend, Extinguish”ed by Facebook and Google.

I have no problems imagining a different timeline, where ActivityPub had been already a better-established thing, and the demo Opera Unite applications for media and photo sharing had implemented basic support for it, resulting in self-hosted lightweight alternatives to Pixelfed or Funkwhale.

And this is actually the vision I have as ultimate goal for the Fediverse: one where, thanks also to client support, hosting and participation become even more trivial than setting up a static website.

Requiem for the open web?

In many ways, Opera giving up on their Presto engine marked not only the end of the browser war, with WebKit/Blink the uncontested winner, but it also marked the end of truly inspiring (inspired?) client innovations for the open Internet, although possibly not entirely by its own fault, since in the same period Firefox also largely seemed to “give up” on that front, even going as far as removing features they had (such as their RSS support).

With the modern Opera browser now just a derelict ghost of its past self, hooked into proprietary initiatives (think of its Messenger for closed silo networks) and cryptocurrency shilling, some of its legacy is now being carried by another Chromium skin/fork: Vivaldi. Although I do not appreciate it being partially closed source, or its reliance on Blink (that for example precludes JPEG XL support), it does seem to be still interested in keeping the spirit of the “Swiss army knife of the (open) web”.

One of the interesting ways in which this shows up is that in addition to email, RSS and calendars, Vivaldi has also actively promoted support for Mastodon, in a very simple yet effective way: providing a Web Panel for their instance, and allowing you to add your own. I expect the same will work on other Fediverse platforms, as long as they provide a functional web interface with good “small screen” support (since this is effectively what the Web Panels use).

The Vivaldi browser is the closest thing we have to a “Swiss army knife for the open Internet” today, and yet it doesn't even have feature parity with the late Opera/Presto. For example, it has no IRC client.

But in the context of my vision for the Fediverse, the most glaring omission is the lack of an equivalent to Opera Unite, an incentive to the development of easy-to-deploy self-hosted websites.

Even if Vivaldi (the company) did share my vision of an open web, I have my doubts that it has the energy and workforce necessary to push it. The fact that their main product is proprietary (despite the abundance of open source software they leverage) is also a downside. Getting Mozilla on board would be of great help in this, but considering the downwards direction they have taken with Firefox, that's even less likely (seriously, not even RSS?), which is a pity, because two independent browsers implementing support for a common lightweight server applications framework in the spirit of Opera Unite could be a major push in the right direction. And even if Vivaldi did invest in something like that, their efforts alone would get nowhere.

People may dismiss the usefulness of the “Swiss army knife” concept pushed by Opera/Presto up to 10 years ago, and by Vivaldi now, citing “bloat”, “lack of focus” or the classic principle of doing “one thing well” instead of a hundred things poorly (sometimes called the Unix philosophy). There is merit to the objection, but I have never seen it put in practice as it should be: on the contrary, feature rejection, or even worse removal, has been to the detriment of “doing one thing well”.

Two of my pet peeves in this regard are with Mozilla Firefox, and in both cases they are about feature removal because of perceived bloat.

The first is the removal of support for the MNG format. The purported reason for this was the “bloat” coming from linking a 200KB library. Reading the issue tracker for this 20 years later, when Firefox installations are 200MB and counting, is … enlightening.

I still care about the MNG format support not for the format itself —it's quite clear that it irredeemably failed— but because the same argument can be used in the future to stymie adoption of other formats such as JPEG XL, which is currently supported in Firefox Nightly, and will likely receive the same treatment (I wonder if with the same excuses) now that Google has decided to drop support for it from Chrome.

In other words, the issue isn't so much with the specific format (although that has its importance: MNG was the best we had at the time for a unified format that supported animation, transparency and optionally lossy compression), but the active choice to not uphold the interests of the open web.

The same thing holds for the second pet peeve of mine: Mozilla's decision to remove RSS and Atom feeds support.

Firefox had some support for all three aspects of web feeds support (discovery, visualization, subscription), and it was all wiped out with the release of Firefox 64, with maintenance cost being the (purported) reason. Even if we accept the motivations and that WebExtensions would be the best way to reimplement the features, the question remains: why didn't Mozilla provide an official extension for it?

If you want an example of why the absence of feed discovery built into the browser (or at least offered through a default-installed official extension) is a problem, consider this recent post: having to jump through hoops, looking at the page source code to find web feeds because the browser has removed the discovery feature, is something that can trip up even competent experts.

(And yes, the website could advertise the presence of the feeds on the visible part of the page, and the absence of visible links is to be blamed on them, but on the other hand: why duplicate the information when the browser can (and actually used to!) show you the information advertised in the document metadata, where it is supposed to be?)
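To give an idea of what “discovery” amounts to, here is a minimal shell sketch of the kind of hoop-jumping involved: it scans the HTML of a page for link elements advertising an RSS or Atom feed. The function name find_feeds is made up for this example, and the regex-based approach assumes double-quoted attributes, so it is an illustration rather than a substitute for a real HTML parser (or for a browser doing its job).

```shell
# Print the href of every <link rel="alternate"> element found in the HTML
# read from stdin whose type is an RSS or Atom feed.
# ASSUMPTION: attributes are double-quoted; this is a regex sketch, not a parser.
find_feeds() {
    # Join all lines so that link elements split across lines are still matched.
    tr '\n' ' ' |
    # Isolate each <link ...> tag on its own line.
    grep -io '<link[^>]*>' |
    # Keep only the "alternate" links that declare a feed media type.
    grep -i 'rel="alternate"' |
    grep -iE 'type="application/(rss|atom)\+xml"' |
    # Extract the value of the href attribute.
    sed -n 's/.*href="\([^"]*\)".*/\1/p'
}
```

It could be used, for example, as `curl -s https://example.org/ | find_feeds` (with example.org standing in for the site of interest). Relative URLs are printed as they appear in the page, so resolving them against the page address is left to the reader.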

Mozilla's choice to remove their built-in web feed support without providing an official extension to carry on the legacy is another strike to the open web and indie web on their side.

I often wonder what has been going on inside Mozilla. Firefox reached its largest market share (around 30%) some 10 years ago. Since then, it has been inexorably losing market share. There is little doubt that this has been largely due to the growth of mobile and Google's unfair marketing advantage, but I have little doubt that Mozilla's response has been the worst possible one: they have chosen to get into a “race to the bottom” based on mimicry instead of playing to their strengths or finding new ones through innovation. I can't say for sure that their market share wouldn't have fallen this quickly if they had taken a different path, but I know for sure that there are people who switched because Firefox no longer offered a compelling reason to be used over the competition.

Again, this isn't about MNG or JPEG XL or RSS or web feeds support specifically: it's about policy priorities.

I do understand and appreciate that even just the maintenance of the engine to keep the pace with the evolution of the web standards is a huge undertaking —it's why so many browsers have just given up and chosen to “leech” on WebKit or Blink instead. But when the only reason to use your browser is that it's the only FLOSS alternative to Google's, you have a problem.

The fact that Vivaldi, a Chromium reskin with some proprietary glue, has more personality than Firefox (that doesn't even seem to have a Fediverse presence) is something that should really be a wake-up call for Mozilla.

And before anybody gets into the comments to praise Mozilla for its history of web standards and user privacy defense —I don't need you to remind me of that. That's not the point. The point is that to actually be able to do that you need something more than “I'm not Google”. And the irony here is that while Firefox has nothing to claim for itself other than “not Google”, Vivaldi does, even if it's still using Blink as web engine, and is thus subject to Google's whims on that side (one example for all: concerning JPEG XL support). Heck, even the new Opera is more than just “not Google” —even though it's pursuing all the wrong “personality” traits for that.

Why is having a personality important? Because it's one of the pillars on which your capability to defend your position is founded. Mozilla cannot protect web standards through Firefox if their go-to solution is to remove support for standards that don't get the adoption they wish for in the timeframe they expect: nobody is going to adopt a standard if there is a credible threat that support for it may be senselessly removed in the near future.

The Do Not Track header has been deprecated, and has been largely useless because it was never adopted by most advertisers, using the cop-out of it not being legally binding. Despite this, Firefox (and most other browsers, with the only exception of Apple's Safari AFAIK) still support sending the header, despite it being arguably a waste of bandwidth and implementation resources (UI options to control its settings, JS access to it, etc). Why do they still do it?

Because it's part of their personality: even if just at face value, DNT header support is a signal that the browser cares about user privacy. (Don't get me started on the “new” Global Privacy Control standard when it would have sufficed to update the DNT spec in relation to the new legislation.) So while one could place reasonable confidence in Mozilla upholding past, current and future privacy-oriented standards, I don't feel the same concerning the open web.

I'm sure people have different ideas about what it means to support the open web. I think first and foremost it means allowing users (on both sides of the connection) to use the protocols and file formats of their choice. Every time a browser fails to implement (or worse decides to remove) support for a standard protocol or file format, it's failing the open web. Half-assing implementation of web standards was basically Microsoft's staple behavior during the first browser wars.

Microsoft had reasons for this: at first it was because they didn't “get” the Internet, later on it was because it was the only way they had to (attempt to) control it. They did all they could to cripple it. Remember when Opera Software released a “Bork” edition of the Opera browser in response to Microsoft serving them intentionally broken CSS? Now imagine what the Internet would have been like if Opera, Mozilla and a few others hadn't held their ground.

If you think what Microsoft did was insane, consider this: Vivaldi had to change their user agent identification because Google, Facebook, Microsoft and even Netflix were intentionally breaking their websites when detecting the Vivaldi browser. GAFAM are against the open web, and the worst of the bunch is Google, which also holds a dominant position with their browser in both the desktop and mobile space.

But the worst here isn't that Google is actively against the open web: it's that in contrast to the first browser wars, there is really nobody left to stand up to them. Consider for example Dave Winer's write-up on Google's effort to deprecate (non-secure) HTTP, and consider that Firefox, the only actual alternative, is also on Google's page, albeit less aggressively so.

Under the same pretense of security, support for classic (some would say obsolete) protocols such as FTP and Gopher has already been removed from all major browsers. In some browsers, such as Firefox, this has been an intentional choice. Others, like Vivaldi, have been basically forced into this position by their reliance on Google's engine.

And yes, I claim that security is just a pretense. Ad networks known to sell your data to the highest bidder and serve malware don't give a rat's ass about your security and privacy. The only thing they care about is making sure they are the ones getting your data, and they are the ones serving you the ad, even if it's malvertising.

(Firefox may not have such motives, but they definitely have an interest in reducing the code base, making maintenance easier for them. And as several have commented on Mastodon, they depend on Google for revenue, which makes them indirectly interested in toeing Google's line.)

When I first started putting down in writing the thoughts that would lead to this article, I didn't actually plan for it to turn so depressing. The original intent was quite the opposite: to celebrate the importance of even the smallest contributions in the resistance against apparently overwhelming odds, and even when the outcome is still not really the fair, open Internet one might have been fighting for. I could go with the Ursula K. Le Guin quote against capitalism's apparent inescapability now, but I think we can do better.

Someone may observe that protocols other than HTTP(S) are irrelevant in a discussion about the open web —which would be one of those pedantic, technically correct (the best kind of correct!) observations that completely misses the point. Yes, it's technically true that the World Wide Web is built on the HTTP protocol and the HTML and related file formats and specifications (such as CSS and JavaScript). But there is no open web without an open Internet.

And one of the keys to an open anything is ease of access. And sure enough, there are still plenty of dedicated tools to access specific parts of the Internet that are not the World Wide Web: clients for FTP, gopher, finger, USENET, email, IRC, or even for new hypertext navigation protocols like Gemini, exist. But why should I need a different client for each when I could access the whole Internet from a single client?

Why should I need to switch clients when following an FTP or Gemini URL in an HTTP-served HTML page, or conversely when following an HTTP link from a Gemtext page? Why shouldn't my Gemini client be able to render HTML pages delivered over the Gemini protocol, and my web browser able to render Gemtext natively if served over HTTP?

This is why the “Swiss army knife” browser model is essential to the open Internet, and a fortiori for the open Web.

Instead, we're seeing a growing, grotesque separation between a “lightweight” Internet and a “heavyweight” Internet where, ironically, the “lightweight” clients support a wider range of protocols and metadata whereas “heavyweight” clients are gravitating towards being HTTP-only, and frequently eschewing useful metadata.

Why is it that a historical but up-to-date (latest version at the time of writing is from January 2023) textual client like Lynx can not only connect to FTP, Gopher and finger in addition to HTTP, but also present the user with the next/prev and web feed links stored in the document head, while the most recent version of Firefox cannot do any of those things, and is likely destined to lose even more functionality in the future?

And no, the answer is not «ah, but Firefox has to dedicate much more resources to support the latest version of the massive, quickly-evolving HTML, CSS and JavaScript standards». The answer is not that because Firefox actually had support for those things and actually spent resources in removing them. And while for some of them (e.g. web feeds) an argument could be made that the implementation needed a rewrite, I doubt that's the case for the removed protocols.

This is frustratingly compounded in major browsers by a lack of extensibility: while it is generally possible to define external protocol handlers, it's not generally possible to write handlers that would just stream the content internally. Historical note: the much-maligned Internet Explorer actually supported something like that. Some Qt browsers (such as Konqueror and Falkon) can also be extended using the KDE Framework KIO plugins.

I still remember the days when Mozilla was the king of customization. It was them who introduced the extension concept to the browser, allowing all kinds of experimentation on the UX. Many of the features we expect in a modern browser today were first introduced through XPI extensions in the Mozilla Suite of lore and the first versions of Firefox. Now they play catch-up with whatever Chrome dictates web extensions are allowed to do, barely managing to avoid the worst.

Again, the issue here isn't that Mozilla added support for Chrome-style web extensions to Firefox. It's that it did so removing support for “legacy” extensions. And while I'm sure there were good technical reasons why the existing implementation couldn't be kept and was holding back engine progress, like in the RSS/Live Bookmark case, I have my doubts that it could not be replaced with something more modern that still provided the same or —at worst— a similar interface.

Even assuming the new architecture is so wildly different from the previous one as to make supporting legacy extensions impossible, I find it extremely unlikely that it wouldn't be possible to design an extension interface that would allow pluggable protocol interfaces and image format support in modern browsers. Why do smaller niche browsers have better support for these things than the mainstream ones?

Why is it that Falkon and Konqueror can leverage KIO to provide generic protocol access, and the Otter browser can leverage the extensive Qt image format support when using the QtWebKit engine to support more exotic formats (or the new JPEG XL standard), but neither Chrome nor Firefox nor Vivaldi offer comparable extensibility?

I'm sure somebody will try to make a claim about “security”, but I very strongly doubt that's anywhere close to the actual reason.

You know what makes this whole thing even more horrifying? That all major browser vendors and the W3C have actually worked their asses off to provide something like what I'm talking about for open protocols and standard image formats, but they've done it in submission to the power wielded by the mafia-like content distribution oligopoly, on an extremely controversial “standard” (read about the EFF opinion in their resignation letter).

Let me rephrase that: the kind of extension system I'm proposing to allow browsers to support more (open) protocols and (standard) image formats isn't impossible: in fact, major browsers already have similar systems in place to allow “consumption” of content locked by Digital Restrictions Management —the antithesis of the open web. So don't come tell me there's security issues with allowing the extensibility I'm asking for: it can't be worse than the hole opened by closed source DRM modules.

A positive outlook?

In many ways, the years between 2013 and 2018 were the worst of the open web, with the reduction in browser engine variety (Presto, Trident, even EdgeHTML were all discarded in that timespan), Firefox giving up on legacy extensions and web feeds, and the W3C EME betrayal.

Can we make the years between 2023 and 2028 those of its revival? With the Fediverse taking shape, a return to prominence of the “indie” web, and the birth of new protocols like Gemini, the times seem ripe.

git git gadget command

How often have you started writing a git command, looked something up, and then restarted?

git git commit -m "A beautiful commit message"

You know you've been there. Multiple times. You start writing a git command, switch to a different terminal or window to look something up (e.g. the exact syntax for some esoteric option combination), switch back to your command line and start typing your command from the beginning, including the initial git. And this gives you the rather depressing error:

git: 'git' is not a git command. See 'git --help'

This thing has always been so common that at some point in the past someone actually proposed to fix this internally in git, ignoring extra gits in the command line. The proposal was ultimately discarded (I don't remember the reasons, and I'm too lazy to browse the git mailing list to find the references, but I'm sure the actual reason was that it would have been too user friendly), so we're left with having to solve it ourselves, especially if we're particularly prone to it.

One of the nifty features of git is that if you have an executable git-whatever in your search path, git will happily allow you to use whatever as a git command, invoking the binary.

However, if you try to exploit this by making a git-git that is just git, for example with a symlink

ln -s $(command -v git) ~/bin/git-git

then it won't work, because:

fatal: cannot handle git as a builtin

You can work around this check by making a simple git-git shell script that does the work for you:

#!/bin/sh
exec git "$@"

and then live happily ever after:

$ git git git help | head -n1
usage: git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]

(of course, once you do it, it's recursive, just like the alias trick others have found before me).
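For the lazy, the whole setup can be sketched as a small helper function. The name install_git_git is made up for this example, and the target directory is whatever you pass to it: the only requirement is that it is in your search path, otherwise git won't find the wrapper.

```shell
# Write the git-git wrapper script into the given directory.
# ASSUMPTION: the directory is (or will be) listed in your PATH.
install_git_git() {
    dir=$1
    mkdir -p "$dir" || return 1
    # The wrapper replaces itself with git, passing all arguments along.
    printf '%s\n' '#!/bin/sh' 'exec git "$@"' > "$dir/git-git" || return 1
    # git only picks up external commands that are executable.
    chmod +x "$dir/git-git"
}
```

With ~/bin in the search path as in the symlink attempt above, running `install_git_git ~/bin` should be all it takes.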

(This post courtesy of my exchange with

Picking a toot

Adding “Share to Mastodon” links to the Wok via tootpick

In my ongoing efforts to integrate this site with the Fediverse, I was made aware of Tootpick, a nice single-file webpage designed to share links to Mastodon. The service takes advantage of a specific Mastodon API to pre-fill a post, while still giving the user the opportunity to modify the text before submission or cancel the submission altogether.

The author even provides a website to make it easier for others to add a “Share to Mastodon” link. However, one of the great advantages of Tootpick is that it's trivial to self-host: the entire service is provided by a single HTML page that takes care of everything via JavaScript, so one just needs to drop this HTML page somewhere on their website and link to it from the “Share to Mastodon” links just as they would with the author-hosted service.

Indeed, adding this file to the Wok was the trivial part of the integration, but that's arguably because I had never had any other form of “Share to Anything” links before —due to a combination of non-existent or very limited usage of social networks on my side for a long time, and a general distrust for “non-local” solutions.

(I should mention here that I used to have a UserJS/GreaseMonkey script to find social network links to the Wok (post in Italian), although it has been non-functional now for a long time, by and large due to the progressive interoperability shutdown of major social networks.)

This meant that I had to start thinking about how to add these “Share” links to the Wok.

The “natural” solution would have been to add such links programmatically in the page templates. At the cost of a full rebuild of the whole website, this would have allowed such links to be assembled programmatically from the same metadata that was being used to build the pages, in an arguably natural fashion: at page build time, without any intervention from the server or from the client, at least until the “Share” link was clicked (Tootpick still requires JavaScript).
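To give an idea of what that build-time approach boils down to, here is a minimal shell sketch that assembles a “Share” link from a page title and permalink. The names urlencode, share_link and TOOTPICK_URL are made up for this example, and I'm assuming that the Tootpick page takes the pre-filled text from a `#text=` URL fragment: check the Tootpick documentation before relying on that.

```shell
# Hypothetical location of the self-hosted Tootpick page.
TOOTPICK_URL=${TOOTPICK_URL:-/tootpick.html}

# Percent-encode a string, byte by byte, leaving only the unreserved
# characters (letters, digits, '.', '_', '~', '-') untouched.
# The body runs in a subshell so its variables don't leak out.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything but the first character
        c=${s%"$rest"}     # the first character
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *)
                # Hex-dump the character and prefix each byte with '%'.
                hex=$(printf '%s' "$c" | od -An -v -tx1 | tr -d ' \n' | tr 'a-f' 'A-F')
                out=$out$(printf '%s' "$hex" | sed 's/\(..\)/%\1/g')
                ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Build the share link for a page, given its title and permalink.
# ASSUMPTION: Tootpick reads the text to share from the "#text=" fragment.
share_link() {
    printf '%s#text=%s\n' "$TOOTPICK_URL" "$(urlencode "$1 $2")"
}
```

A page template would then call something like `share_link "$title" "$permalink"` for each post, and put the result in the href of the “Share” link.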

Since JavaScript is required for the “Share” link in question, however, I thought it would make sense to write a small bit of JavaScript to go over all the permalinks in the page, and add “Share” links to them. In theory, this could even have spared me a full site rebuild if not for the fact that IkiWiki does not have an option to include a JavaScript file from every page, so I had to customize the base page template and force a full rebuild anyway. Several of them, in fact, as it turns out some of the metadata was not easily accessible (especially for nested permalinks in index pages).

I was ultimately successful (as you may notice if browsing the site with JavaScript enabled), and I'm quite satisfied by the results so far, even though I'll confess that the experience has made me consider again the possibility to switch to some other static site builder that might have made this easier to manage.

Continuous Content Generation

LLM/AI isn't for and won't be replacing art. It's a tool to satisfy the capitalist need for infinite growth via continuous content generation.

There's been a growing brouhaha about recent progress in “Artificial Intelligence” (AI) research with the publication of Large Language Models (LLM) that thanks to “deep learning” are finally capable of producing outputs that are very credible to the superficial and/or untrained eye.

OpenAI, ChatGPT, DALL-E, Stable Diffusion, Midjourney have all made the news rounds a few times thanks to their presumed ability to “interpret” textual input and turn it into a satisfactory visual or textual rendering of the prompted requirements. Three main lines of criticism have been raised against this kind of effort, focusing respectively on (1) the nature of what is actually being produced (especially in terms of whether or not this could finally be considered true AI), (2) the ethical underpinnings of the training data selection, both in terms of breadth-and-scope (e.g. concerning bias in race or gender) and in terms of copyright (did the authors authorize the use of their work to train these models?), (3) the future and intended use of the models (e.g. will they replace art or artists?).

I'm only going to touch on the last point here, because there's an aspect that I think has gone missing in all the discussions I've seen so far, including the jokes (but are they really) about machines supposedly being intended to replace the boring, physically and mentally destructive work and instead being advanced to eliminate the creative endeavours while humans are relegated to the work the machines were supposed to eliminate. And I will argue that this is not really the case, although the net result will be quite similar to what would have happened if it had been.

There is a trend that has been going on for decades (half a century at least, in fact) and has seen a sudden jump in the last 15 or 20 years: the replacement of art with content, and artists with creators. Others have warned about this before, in more detail than I will do here, but the gist of the point is that the growth of the Internet, and of social media in particular, has exacerbated the trend initiated by the mass commodification of creative output to a critical point.

This is the result of two apparently aligned interests: that of artists to be able to make a living out of their creativity, and the capitalist obsession with infinite growth incarnated by the publishers and distributors (be they formally recognized as such, or be they such by practical definitions of the term).

Art depending on the rich and powerful to thrive isn't news: it's why so much past visual art has a religious theme, and why such an inordinate number of words has been written to celebrate this or that local lord. If anything, actually, now more than ever the trend has been subverted thanks to new, distributed forms of patronage (see e.g. Kelsey's and Schneider's Street Performer Protocol as discussed by Cory Doctorow) that allow artists greater control over their creative endeavours, removing the dependency on the interests of a single supporter: commissions still exist and for many are still the primary if not only way to make money out of their capabilities, but there are some artists that can afford to follow the “self-realization” principle: write/paint/compose what you want, and let those interested in your creative output come to you, as opposed to writing/painting/composing what may give better “engagement” on the platform(s) of choice.

In fact, it can be argued that the point of divergence between the mentioned apparently aligned interests is indeed the definition of engagement. For the artist, engagement materializes in an audience that is interested in and appreciates their creative output; for the publishers and distributors, engagement materializes in a returning consumer.

Some may argue that those are similar, e.g. in expectations about the production of new content over time. But despite the superficial parallels between the two, there are some crucial differences, starting from the more direct, “personal” relationship between an artist and their audience (which is not all fun and games, as it gives way to stalking, a sense of entitlement, and the myriad of downsides that come with less sterile connections). And this is something for which no metric exists, because it depends on the intangible quality of the interactions with the audience. Worse “engagement” metrics do not correlate with a worse or smaller audience.

Scale is also very different: a few hundred supporters paying a few € each monthly can be sufficient to sustain an artist's ordinary life (depending on location at the very least, of course). This is one of the pillars for the success of the digital Street Performer Protocol mentioned above: as long as enough members of the audience support the artist, the artist can thrive and their art remain accessible to all.

Finally, at least for the purpose of this discussion, there is a difference in expectations: although I'm sure any artist would be thrilled to reach the level of success that would allow them to live comfortably for the rest of their lives, from what I can see most of them would be content with just being able to make a decent living, without having to worry about whether they'll be able to cover rent next month and not depending on their partner's income or financial support from relatives.

The situation is very different for a commercial enterprise (into publishing or distribution, given the context), doubly so for one that has already achieved a certain success, and even more if it's publicly traded: there is no true “connection” between them and their consumer, operating costs are high, and most importantly there is an expectation of growth, e.g. to “increase shareholder value”. This leads to a particularly pressing need to publish and distribute new content, at an increasing rate.

Even before the Internet went mainstream, we've seen this trend manifesting e.g. for movies and animation with the spread of home theater solutions: while 40 years ago some cinemas still offered older classics, a decade later you would have been hard-pressed to find anything but new releases outside of film society screenings, despite there not being a significant change in production until the beginning of the XXI century, when massively production-cost-lowering technology improvements and expansion to “emerging markets” led to an explosive growth.

One of the key ways in which the Internet and digital media distribution have revolutionized the field has been through a shift towards subscription models. Once the domain mostly of periodic journals and service providers, the subscription and streaming model pioneered by Netflix has become the method for larger media conglomerates to cope with the new technology, after fighting it for years and trying to get governments to regulate it in the name of lost profits (and getting a long way with it).

Subscription models are very convenient for the company, as they guarantee a more stable revenue stream compared to one-shot purchases, and may result in higher profits at equal consumption rates. But while ongoing payments can be justified by periodic journals on the basis of quantifiable periodic updates and by the service providers with an uninterrupted service delivery and continuous infrastructure maintenance, they are more difficult to justify in the case of spotty updates (release of a new movie or music album) or consumption (watching the movie, listening to the album). There's both a practical and psychological reason for this: while a subscription to e.g. a journal gives you something tangible that remains with you even if you cancel the subscription service, the digital subscription services are more like an access fee to a library owned by somebody else, and regardless of how vast that collection is, if one ends up frequently perusing the same material, the natural observation is that it is ultimately cheaper to pay for it once and own it forever than to access it in streaming.

To make the subscription model enticing, the company would thus need to either select a clientele that has the curiosity to go through their entire catalog (which no company is going to aim for, because numbers), or keep up the attention of the “generic” consumer with a continuous stream of fresh, palatable content that keeps them distracted from reconsidering the benefits of one-shot purchases.

Recommendations from the existing catalog (especially if it's a large, well categorized one) can take the role only up to a certain point, as consumption levels out. Hence the need for continuous content generation. This is not new: cable TV started to push out reality shows for the same reason; TV series were born out of the need to keep housewives engaged to sell more advertising; and we can go as far back as the feuilleton at least as the first examples of content serialized over long runs to keep people engaged, i.e. transfixed to the specific media channel. What has been changing in recent times is the scale, and the scope, of the phenomenon, with vicious cycles that benefit neither the company nor the consumer: as the consumer aims to maximize the utility of the subscription fee, the demand for fresh content grows, and as the demand for fresh content grows, the attention to the quality of the product diminishes, lowering longer-term engagement. This reflects not only in a massive increase in production, often of debatable quality, but reaches grotesque peaks where entire series are sacrificed after the first couple of seasons on the altar of immediate engagement growth, even when the higher quality was appreciated by a meaningful number of consumers.

We are now in an era dominated by Continuous Content Generation, where engagement is not the result of sustained quality, but of a continuous renewal that doesn't even let products reach maturity before they are swapped out for a more recent one, in a desperate search for instant freshness gratification. And this is the era where ‘content’ replaced ‘art’.

Now, there are practical reasons to use the terms content and creator rather than art and artist, not least the fact that not all content in these distribution channels is art: for example, journalism and essays may use similar or related tools and media to those used for novels and poetry, but regardless of the aspirations and capabilities of the writer, they'll rarely be classified as art, regardless of how important their role is in keeping readers engaged. So using the ‘c’ words is a reasonable way to talk about all the material when going into specifics about what is art and what isn't is unnecessary.

However, the choice of terms is indicative of a diminishing interest in the content itself, i.e. it's not just a way to indicate all the available content, but most importantly it's an indicator that it doesn't matter what kind of content it is: obviously not everybody can hire a Jules Verne or Alexandre Dumas to write serialized novels and keep selling copies of a newspaper, especially since at scale you'd need one for each kind of audience among your consumers, but we're way past the point in which this is just a matter of quantity over quality: the scale is such that what causes engagement is completely irrelevant.

Hence the spread of clickbait, mainstream trolling, enRagement algorithms, and any other strategy that helps keep people coming back for more.

It should now be clear where I'm going with this, given the premises: this same need for Continuous Content Generation, regardless of type, form, or actual content, is the most immediate practical target for the AI/LLM that are at the center of attention today.

In this sense, these models do not represent a threat to the independent artist (or other “content creator”) that has built or can build a following of their own. The models can be considered aimed primarily at replacing the armies of scriptwriters, copywriters, “bloggers” and whatnot that provide ‘content’ for the media industries. As the models grow more sophisticated, we can expect more of the output of these industries to be produced by or with the assistance of such models, with humans relegated to the roles of prompters and selectors/verifiers. And if superficially this may seem like a good idea to some («oh good, less effort for me to write the articles I need to publish to get paid»), it should be obvious that this will entail not only a massive cut in the workforce, but potentially its almost complete elimination, when publishers realize that they can leverage their own consumers to fuel the machine (think about user comments as prompts for the next set of articles, or how many of you are freely offering tables of possible prompts that you can rest assured are being logged for future use; how about a model that writes prompts next, for example?).

It's also debatable whether or how much or how many consumers would actually care about or even realize that the content is being produced by LLMs, although it may be important in the beginning that the automaton intervention be as subtle as possible. If any of you were hoping that the recently presented tools that detect LLM outputs would just be used to denounce the number of AI-written articles already in circulation, think again: a more likely use will be for the tools to be sold to the publishers to help identify the machine-produced content that cannot be detected, and can thus be published with less danger of repercussion from readers that do care about where the article comes from, at least until the new choice of writing system gets normalized.

While this does eliminate some of the threats that AI/LLM pose to independent artists and other “content creators” (we really need a better word for this), its influence will still need to be considered. The most obvious effect is that by reducing the opportunity for relevant employment within the industry, it potentially increases the pool of artists that will have to make a living independently (if they wish to live off their art), and the competition may make it harder to build one's livelihood from it. More importantly, however, these models are (still) incapable of original content creation, which puts a limit to the variety of what they create. This may not be apparent now that their use is very limited, but with an adoption at scale for Continuous Content Generation these limits are likely to be hit sooner rather than later, even with judicious use of human-directed prompts and selection.

It therefore becomes essential for the models to be periodically injected with “noise” (new training data) to increase the variability of their creations (this is something that anyone experimenting with particularly unusual prompts has noticed already). Now while it's possible that this can lead the displaced scriptwriters and copywriters to find employment specifically to produce such “noise”, what is most likely is that the work of independent artists that continue in their art unassisted by LLM will be unceremoniously hijacked as training data. And this is not a potential threat: this is something that has already happened and is still happening, as illustrated by efforts done by nearly all art hosting and online editing services (from Adobe to DeviantArt) to change their terms of service to include wording that allows them to feed the hosted content to such models, and making this opt-out (thus enabled by default) rather than opt-in. This is an area (outside of the scope of this article, as it falls within the second theme mentioned in the first paragraph) where legislation potentially could curb the phenomenon, but it's likely that lobbying by the media companies will render it ineffective at best, and counterproductive at worst, despite the clear preference from most artists that AI be kept away from their work.

Curiously, the reason why I think that AI/LLM do not pose an immediate, direct threat to the independent artists is not only that the lack of originality and variety in the output of the models makes it more valuable for the “reprocessed art” that is behind much of the “content production” of the media industry (think for example of the speed with which subgenres saturate in mass-produced animation and comics, and that's with human work), but also that the target audience for the two forms is wildly different, and the audience that seeks out the more original and varied “content” produced by independent artists is also more likely to value it being “artisanal”: so not only is it less likely that AI/LLM would be able to produce the content this audience seeks, but that same audience would also value it more when produced by an actual human rather than mechanically (or procedurally) in response to a writing prompt.

And again, while the higher appreciation for artisanal production poses a problem for physical production, it is considerably more sustainable in the digital space, by the higher efficacy of crowdfunding.

There is something ironic, I feel, in the “techbro” enthusiasm for AI/LLM travelling often in tandem with an unhealthy obsession for cryptocurrencies and the Non-Fungible Token (NFT) craze that goes with it. (One of) the purported intent(s) of NFTs (as advertised by said techbros) is to help artists “monetize” their art by artificially restricting purchases of what amount to “certificates of authenticity”. The reality (to no one's surprise, except the fools that bought into the scam) has been very different: most of the NFT-“certified” content that has floated around so far has been either procedurally generated, unoriginal, uninteresting variants of thematic images designed for unsubstantial products with the only aim of generating noise before the rug pull (exit scam), or outright “stolen”, infringing on the copyright, licensing, and/or moral rights of the original authors.

The irony here is that what makes art truly valuable isn't restriction on consumption (making it inaccessible); in fact, it could be argued (but I don't want to get into a philosophical discussion about what makes art art) that art achieves its peak value when it reaches the widest audience. (In this sense, thanks to their infinite reproducibility, digital works of art have the potential to be valued as nothing before them.) And this value stems from the uniqueness of the work of art at creation time. Each work of art is unique because it could only have been created by that particular artist in those particular circumstances: a different artist, or even the same artist in different circumstances, would have produced something different. (Heck, even in the same circumstances, just because of small differences in context: think for example about the English and German versions of the same movie by Hitchcock.)

The digital work of art is unique because the work of art itself is not the bits encoding its representation, in the same way in which the novel isn't the ink and paper with which it is transcribed: the art is the story, the images, the sounds that result from the decoding of that representation, and how they reflect on the aesthete's mind —and these are unique by creation.

I could go off on a tangent here on how art is the antithesis of capitalism, and the principle of substitution not applying to creative work is just part of it, but (back on topic) coming to terms with this is what shows the irony of the AI/LLM+NFT fandom: art is intrinsically not fungible, by virtue of creative originality at inception. LLM-generated content is the epitome of unoriginality, and the artificial restriction imposed by “minting” NFT for it does nothing to compensate for this.

So maybe instead of worrying about how these glorified procedural generation models may threaten the livelihood of artists, we should focus on rethinking the socioeconomic system in such a way that creativity may be valued for what it is, instead of the perverse system of disincentives that has been built around it to force it into the capitalist concept of “value” by (artificially created) scarcity of access.

(And how's «digital (street) performer» as a better alternative to «content creator»?)

Appendix: I'm going to collect here links to Mastodon threads that are relevant to the points discussed in this post:

Nuclear will not save us, part 4

No, not even nuclear fusion will do


Here we go again. Any time there is some kind of breakthrough in nuclear power generation, there are crowd cheers about our energy production problems being solved, or close to being solved. And every time this happens, I'm forced to remind people that no improvement in energy production will “solve” our energy issues, unless we tackle the growth in consumption. And since those improvements will more often than not lead to an increase in the growth rate of energy consumption, these breakthroughs (however locally promising they may be) will end up being deleterious on longer (but not even that much longer) time spans.


The news of the day is a breakthrough in nuclear fusion, with the National Ignition Facility at the Lawrence Livermore National Laboratory announcing fusion ignition, i.e. the ability to trigger a self-sustaining fusion process that produces more energy than it consumes.

This is wonderful news. It's a scientific and technological advancement that humans have been dreaming about for more than a century, and one that paves the way to a potentially cleaner and safer energy production mechanism than anything we've seen so far.

It's also still far from being anywhere close to actually being productive in that sense, as detailed by Michael Schirber in this article on APS (if nothing else because the amount of energy used to start the reaction is still orders of magnitude higher than the one released by the reaction).

Still it's an important result, and one that gives much better hope that the energy production based on nuclear fusion may actually be finally within reach, and that this may revolutionize energy production dramatically reducing its environmental impact, as well as its cost.


The dangers within

I've said it before, but I will say it again, because apparently this is a point where repetita iuvant: no form of power generation bound by the 90PJ/kg limit will suffice unless we curb the rate at which energy consumption grows. It doesn't matter if it's fission or fusion. It doesn't matter how efficient the energy production is. The only thing that matters is that exponential growth is faster than our perception. And as I mentioned in the first post of this series, the cheaper and the cleaner the energy production is, the higher the risk is that its adoption will lead to a faster growth rate in energy consumption.

In scarcity, people are frugal; in abundance, wasteful.

You don't need to teach someone who can barely make ends meet how to conserve food, water, heat or money. Yet a billionaire will not care about fuel costing 3 times as much as before when choosing to fly somewhere on their private jet.

There's a reason why energy conservation and efficiency have become such a hot topic in the last 50 years: the 1970s energy crisis (which also gave a strong push to the investment in nuclear (fission) energy production, until the Chernobyl disaster in 1986 cooled off much of the enthusiasm). Why does this matter? Because the discourse in the last years has largely shifted from efficiency and lower consumption (or at least lower growth in consumption) to a wider-reaching ecological discourse (“green” energy, low emissions, you name it).

There are different reasons for this change in topic, ranging from a genuine interest in the increasing threat of global warming and anthropogenic influence on climate change to the easier manipulation of the discourse into a business opportunity (“greenwashing” and friends). But the shift in attention is deleterious: not because reducing pollution is bad, but because pollution and the energy crisis are deeply interconnected by … energy consumption growth.

Let's pretend for a moment that tomorrow nuclear fusion became commercially viable, providing us with the cheapest, lowest-environmental-impact energy source we could have dreamed of. Let's pretend that with a single flip of a switch all energy consumption could be switched over to this cheap, low-impact source. This would absorb instantly a sizable fraction (I'd even go as far as to say: the majority if not all) of the clamor about the environment, the carbon footprint, and all the “hot takes” that have replaced in the public discourse the only thing that really matters: energy consumption growth.

In such a scenario, it doesn't matter if a billionaire's private jet or yacht consumes in a year as much as half the population of the country they live in: there's so much cheap energy, and the higher consumption has so little impact that … who cares? Very few care now that it has an enormous impact on the environment and it risks depriving more important uses of access to cheaper energy; how much would people care when it wouldn't?

In such a scenario, there's simply no interest in tackling the fundamental problem. In fact, quite the opposite, we can expect a radical upturning of the perspective from the general population: if energy is so easy to obtain and it has so little environmental impact, why would it even matter to keep its consumption in check, or to request that it be kept in check from government bodies? The most likely outcome of such a scenario is a sudden jump in energy consumption growth, as the limiting factors of cost and pollution prevention regulation are removed: from the rest of the world catching up to the “Western” standards of living more quickly to the “West” coming up with even higher-maintenance standards to compensate for the environmental damage of the last centuries (air conditioning everywhere, massive desalinization plants for fresh water, pervasive augmented reality, you name it, it'll be there).

And the net effect of this will be catastrophic, because no energy source can sustain exponential consumption growth for long, and by the time people realize that even the nuclear fusion fuel can run out it will be too late —again— and the collapse will be so much harder.

“The most abundant element in the universe”

Much of the enthusiasm behind nuclear fusion comes from fallacies similar to the ones we've already discussed in our previous chapter of this series: material abundance and constant consumption rates. And we've seen already that at the current growth rate the mass of our entire solar system won't last 30 centuries, regardless of energy production system (and assuming optimistically 100% efficient mass-to-energy conversion). And we've seen some interesting computations on the EET for fission fuel. So this time we'll play around with the numbers for fusion.

One of the biggest and most dishonest talking points of fusion fans is that since the fuel is hydrogen, «the most abundant element in the universe», it's virtually impossible to run out: for all intents and purposes, it will last “forever*”.
(*conditions may apply)

We already know that no matter how large the amount of fuel is, as long as it's finite and the consumption keeps growing exponentially (i.e. at a constant rate) it'll run out —and much earlier than predicted (how does 50 centuries sound for the entire mass of the galaxy converted to energy at 100% efficiency, again?), but where the talking point fails miserably is that while it's true that hydrogen is the most abundant element in the universe (estimated around 75% of matter is hydrogen), or even the solar system, it is not the most abundant element on this planet. (Why? Because most of the hydrogen in the universe is in the stars where it's already being used to run a nuclear fusion process!)
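The arithmetic behind this kind of claim can be sketched in a few lines. This is only an illustration under stated assumptions: consumption is modelled as growing continuously at a constant yearly rate, the reserve is expressed as "years it would last at today's consumption", and the function name `exhaustion_time` and the 2% growth rate are hypothetical choices made for this example, not figures taken from the series. Under those assumptions the cumulative consumption after t years is (e^(kt) − 1)/k times today's yearly consumption, with k = ln(1 + rate), which can be solved for t.

```python
import math


def exhaustion_time(years_at_current_rate: float, growth_rate: float) -> float:
    """Years until a finite reserve runs out under exponential consumption growth.

    years_at_current_rate: size of the reserve, measured in years of
        consumption at today's (constant) rate.
    growth_rate: yearly growth of consumption (e.g. 0.02 for 2%), assumed
        constant and non-negative.
    """
    if growth_rate < 0:
        raise ValueError("this sketch only covers non-negative growth")
    if growth_rate == 0:
        return years_at_current_rate
    k = math.log1p(growth_rate)  # continuous growth constant
    # Solve (e^(k t) - 1) / k = years_at_current_rate for t.
    return math.log1p(k * years_at_current_rate) / k


if __name__ == "__main__":
    # Hypothetical reserves, each 1000 times larger than the previous one.
    for reserve in (1e3, 1e6, 1e9, 1e12):
        years = exhaustion_time(reserve, 0.02)
        print(f"{reserve:.0e} years at constant use -> {years:.0f} years at 2% growth")
```

The point the sketch makes is that the exhaustion time grows only logarithmically with the size of the reserve: at a fixed growth rate, multiplying the available fuel by a thousand adds a fixed number of years rather than multiplying the time available.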

And the statistics are even worse if we look at “free hydrogen”, rather than hydrogen in other molecules (such as water or hydrocarbons). If we look at the abundance of hydrogen on Earth, it barely makes the top-10 in abundance for the crust, hidden in that 1.2% of “other trace elements”. In the atmosphere, it's even more rare: at ground level it's 0.6 parts per million (PDF warning), and we have to climb to the exosphere (starting approximately 700 km above sea level) to find it in more consistent concentrations … in a medium so rarefied it doesn't even behave like a gas anymore.

I'm sure you can see where this is going, and it shouldn't be a surprise, when we've already seen in the previous chapters how quickly we'd run out of fuel on Earth regardless of energy generation method. But wait, there's more!

Nuclear fusion doesn't use “classic” hydrogen (aka protium), actually: the most important elements for the fusion process are deuterium (around 2 in 10,000 hydrogen atoms) and tritium, the hydrogen isotopes with extra neutrons (1 and 2 respectively), and if the “successful” (for appropriate definitions thereof) experiment that renewed the enthusiasm in the fusion process recently is any indication, tritium is of particular importance (although in theory we could do without). And tritium is also the rarest of the isotopes: due to its 12-year half-life, it's barely found in nature (we're talking 1 in 10^18 hydrogen atoms, at scale), and is more typically produced as a byproduct of other processes, such as nuclear fission or other fusion processes.

I'll leave it as an exercise to the reader this time to estimate the mass of the available “free” molecular hydrogen and the corresponding EET for fusion power generation at current (or higher!) growth rates. After three chapters, and with the help of the form in the second chapter of the series, there shouldn't be any need for hand-guiding you through the process.

The not-so-clean energy source

Of course once we run out of molecular hydrogen —in contrast to other fuels for power generation— there are several orders of magnitude of fuel still available. The problem is that accessing this destroys the second myth peddled by nuclear fusion supporters: that fusion is the cleanest form of energy generation, even more so than fission, and without the risks deriving from radioactive waste typically associated with fission.

Leaving aside that any talk about the cleanliness and riskiness of the process (e.g. per unit of energy produced or per unit of power generation) is pure speculation, and will remain so until the first commercially viable fusion power generation plants are finally deployed and have proven themselves for a few decades at least, even from a purely theoretical standpoint the myth is on shaky ground. Indeed, the myth is tightly bound to the one about the abundance of hydrogen: assuming you have plenty of molecular hydrogen available, the fusion process is indeed one of the least impactful forms of energy generation. The question is: how do you get that hydrogen in the first place?

Fusion would be really clean if we could just have a passive collector of molecular hydrogen from the atmosphere, and extremely efficient ways to prepare it for the fusion process (e.g. deuterium extraction, tritium generation), but we have neither. And even if we did, where would we get the hydrogen from when we run out of the free molecular hydrogen in the air, something that is bound to happen sooner rather than later if we get seriously invested in fusion?

And this is where things become interesting: while there's definitely room to invest in the capturing of the hydrogen released e.g. by volcanic activity, the more readily accessible “stores” of hydrogen are water, carbohydrates and hydrocarbons. And the processes to extract hydrogen from these have two important downsides in the context of our discussion: they are either very energy intensive (which brings us back to the increased energy consumption, or in a restricted view to a lower power generation efficiency), or quite environmentally unfriendly (e.g. combustion, water depletion), when not both.

I can already see the objections about how these would still be less of an environmental problem than, say, the drilling and mining required for fossil and fission fuel, and while that may be the case now, I'd like to revisit this when the requirements for hydrogen extraction/production rise to the needs of our present and future power generation requirements.

Nuclear will not save us

I'll refrain from going on a tirade that would just repeat the conclusion of my previous chapter, but a recap is appropriate still.

The fuel employed and the process used to produce the energy don't matter. The single most important factor is how quickly energy consumption grows. And the cheaper the energy generation is, the more quickly its consumption will grow. If anything, a more efficient and cleaner energy production method is more likely to boost energy consumption, which will result in an even harder fall when the ceiling is hit.

Still, I'm glad for the progress in the research on nuclear fusion, and while I believe that the press release is laced with excess optimism, I'm looking forward to the time, a few more decades from now, when the technology will have progressed enough to turn fusion into a viable power source. Any option we have to minimize the environmental impact of energy generation and optimize energy production with the means at our disposal is more than welcome.

Of course, in those few decades our global power consumption will have doubled again (at least), unless the global economy suffers another major collapse. (It's fascinating, really: take any year-over-year global power consumption change and you can identify recessions simply by looking at when the energy consumption change dropped close to zero, or worse, went into the negatives.)
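Both remarks in the paragraph above are easy to play with numerically. The following is a rough sketch, not an analysis of real data: the function names, the 0.5% threshold for "close to zero" and the short consumption series are all made up for illustration, and anyone wanting actual conclusions should plug in a real historical series.

```python
import math
from typing import List, Sequence


def doubling_time(growth_rate: float) -> float:
    """Years needed for consumption to double at a constant yearly growth rate."""
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    return math.log(2) / math.log1p(growth_rate)


def stagnant_years(consumption: Sequence[float], threshold: float = 0.005) -> List[int]:
    """Indices of the years whose year-over-year change is at or below threshold.

    consumption: yearly totals, in any consistent unit.
    threshold: hypothetical cut-off for "close to zero" growth (0.5% here).
    """
    flagged = []
    for i in range(1, len(consumption)):
        change = consumption[i] / consumption[i - 1] - 1
        if change <= threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Illustrative growth rates only.
    for rate in (0.01, 0.02, 0.03):
        print(f"{rate:.0%} yearly growth -> doubles in {doubling_time(rate):.1f} years")

    # Synthetic series, NOT real consumption data.
    series = [100, 102, 104, 104.1, 103, 105]
    print("years with stalled or negative growth:", stagnant_years(series))
```

With a real series, the years flagged by `stagnant_years` would be the candidates to compare against known recessions.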

And if we don't fix that, nuclear will not save us.


About titleless entries (and other future changes) in the Wok

Having discovered the ActivityPub-based microblogging platform Mastodon and its feature to produce RSS, Dave Winer (one of the people that helped define the RSS format) has set out to fight the lack of support for titleless feed entries in feed readers. The intent is commendable, as well as Dave's approach to titleless blog updates, but seeing his take on it has made me think again about my (current) approach to maintaining the Wok.

I've actually pondered several times in the past about this and related issues, wondering about the best approach to handle esp. collections, such as my quotes collection, my “lightning” aphorisms (the closest thing to an on-site microblog), my “upsetting” discoveries, etc, updates to which could be considered both from a “single item” perspective and from a container update perspective.

The problem for me isn't just one of titlelessness, though, especially since I actually generally prefer to have titles, doubly more so when I can think of interesting ones: my problem is actually that I'm growing tired of some of the limits of the platform I'm using, but at the same time have a distinct preference for its underlying architecture, which is arguably responsible for those same limitations.

I like that each post here is a text file. But this also means that it needs a filename, even if the post itself might not have a title. Of course, the post having a title makes it easier to choose a filename. I could go with titleless posts, but then I'd have to think of a way to name the files in a way that is unrelated to the title. (Not that dramatic: this could just be the date-time.)

There's more though: the post metadata (author, date) need to be entered manually. This is strictly speaking not necessary, especially since this is a single-author thing, so the author could be inferred, and the date could be taken by the file metadata itself —except for the whole revision-control tracking and pushing across different machines, that messes this up. The presence of in-file metadata isn't a bore as such, but having to enter it and keep it updated is, and while for a long-form post like this one it's not even that much of a bother, it is one of the obstacles to quick-posting one-liners or other small content.

The obvious way out would be to cook up some scripts to handle this. The obvious danger is that these scripts could easily grow to become an ad hoc, informally specified, bug-ridden, slow implementation of a blogging platform. And considering I have all intentions to move away from IkiWiki, would this even be worth the effort?
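For the sake of illustration, here is roughly what the smallest version of such a script could look like. It is a sketch under several assumptions that may not match the actual setup of the Wok: posts are Markdown files with the `.mdwn` extension, metadata is written with IkiWiki's `meta` directive, the timestamp format is accepted by the date parser in use, and the function name `new_titleless_post` is made up for this example.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def new_titleless_post(
    body: str,
    directory: Path,
    author: str,
    now: Optional[datetime] = None,
) -> Path:
    """Create a titleless post named after its creation date-time.

    The author and date are written into the file itself, so they survive
    revision control and copies across machines (unlike file timestamps).
    Note: an author name containing double quotes would break the directive.
    """
    # Normalize to UTC so the trailing "Z" in the filename is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    path = directory / f"{stamp}.mdwn"  # assumed extension

    header = (
        f'[[!meta author="{author}"]]\n'
        f'[[!meta date="{now.isoformat()}"]]\n\n'
    )

    directory.mkdir(parents=True, exist_ok=True)
    # Mode "x" refuses to overwrite a post created in the same second.
    with path.open("x", encoding="utf-8") as handle:
        handle.write(header + body.rstrip("\n") + "\n")
    return path
```

Calling `new_titleless_post("Just a couple of words.", Path("microblog"), "me")` would create something like `microblog/20230102T030405Z.mdwn` with the two metadata lines already filled in. Everything beyond that (collections, feeds, threading) is exactly where the risk of reinventing a blogging platform begins.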

Of course, there's no guarantee that my “moving away from IkiWiki” is going to happen any time soon, so a quick & dirty patching up of the issue in question might have some value, even though we all know about the permanence of the temporary. Even worse, it's surely not going to be “quick” in any sense of the word (but plenty dirty) if the script will need to accommodate the many existing different collections that I've already started. It could work, OTOH, for at least one of them, and potentially for a new one with a similar structure to be created ad hoc: because, and that's another thing, there's something to be said about microblogging, that can't be said for other forms of long-form composition, and it's that its “immediacy” promotes usage.

From my experience on Twitter first and Mastodon later, I've noticed that the posting format encourages writing even for content longer than the character count limit for the single post usually imposed on these platforms: I've found myself writing long threads that could have just as well been long-form posts more often than not. This isn't just a matter of practicality due to the higher degree of automation, or the frequency with which one might find themselves on the website or “app”: there's something about the limit that tricks the mind, appealing to the possibility of jotting down “just a couple of words”, even when one ends up writing several thousands.

As far as I can see I'm not the only one feeling this way, although I wonder if others also perceive the tension between this and the consideration that microblogging is not designed for long form. I am a big fan of “using the right tool for the right job”, but on the other hand, a tool that invites you to write is better at its job than one that isn't as encouraging. Moreover, the chunked format of microblogging threads helps give a structure, a rhythm to the text that must be sought with purpose in standard blogging. And the resulting rhythm isn't just stylistic: it provides hooks to the reader for comments, quotes, etc, in a natural way for the platform itself.

Microblogging doesn't necessarily entail a lack of title, but it often is titleless, to the point that a title isn't even supported on some platforms. This contributes to the simplicity and immediacy of the posting format, but also reduces flexibility unless the platform does support the feature —or something that can be (ab)used to a similar effect, like Mastodon's Content Warnings (CWs). Usage of a platform always requires some adaptation to the characteristics of the platform: for example, Mastodon's lack of a “collapse thread” feature has led Cory Doctorow to use CWs on “child” posts in his famously long threads on Mastodon, and as he tells it when discussing Pluralistic, composing his daily thread starts on Twitter, because it's the “least forgiving” platform.

Now, I'm not anywhere near as prolific as Cory Doctorow, so I probably won't ever need the scripts that have helped him lighten the manual load, and my blogging isn't professional enough to justify to myself the long-winded routine of multi-posting to separate platforms in addition to my self-hosted site (POSSE: Publish (on your) Own Site, Syndicate Elsewhere), but I am annoyed by my own over-reliance on Twitter previously and Mastodon now to post content long enough that it would have deserved its own entry in the Wok, so I'm now left pondering on the strategy to adopt for the future (aside from backing up my off-site posts and importing them here, one of these days).

To my advantage I have not only the much lower production rate, but also the much smaller platform expanse: I only actually care about sharing my content on the Fediverse. This is something I achieve even now by sharing links to my articles on Mastodon, but I'm looking into better-integrated solutions, some way to support at least a minimal functional subset of ActivityPub that would allow others to follow my posts and maybe interact with them (favorite, boost, comment) directly rather than through my Mastodon account. Once this is achieved, the next step would be to aim towards a simplified way to microblog on the Wok, with a dedicated section and possibly some way to simplify the creation of posts (and chains thereof) on this future section. And yet, I'm not really looking forward to hacking my way through the IkiWiki codebase (again). Switching to a different platform might help in this sense, but in that case I'd take the opportunity to also move to a different format (AsciiDoctor) for my source, which is less supported by existing static site generators … and suddenly this all gets on the road to becoming a full-time unpaid web development job.

I might never get to the point of having the platform I really want in my hands, but maybe some interesting tech ideas may come from walking this path, even if only sporadically.


The road not taken towards energy independence


OK, I admit it, this isn't really about Solarpunk, at least not in the literary/artistic genre. And yet still it is, in some sense, since it is about the future that the genre envisions.

Since the beginning of the 2022 Russian invasion of Ukraine on February 24, a parallel economic conflict has been escalating between Russia on one side, and most European and NATO countries on the other. The “Western” side has imposed a number of sanctions, preventing circulation of most goods and people, and Russia has retaliated with the only weapon available to it, its control on the provision of natural gas to Europe. The last step of this conflict (at the moment of writing) is the “Western” side aiming to put a price cap on gas purchase, and Russia retaliating by shutting down delivery altogether.

This aspect of the conflict in particular has generated a lot of noise on social media, a significant portion of it quite obviously fed by Russian propaganda, revolving around the danger of the spike in energy prices and how its effect on the energy bill will negatively affect the “Western” economies, potentially even more than the Russian economy.

What is fascinating about this isn't so much the obvious propaganda trolling, but rather how the political discourse, both nationally and internationally, has been focused on “how to pull through the crisis” rather than on how to avoid the crisis altogether by accelerating independence from Russian gas, possibly without throwing our economies into the arms of the next authoritarian regime.

For obvious reasons, I'm talking here about plans that don't need long planning stages or lead-in times, but can still provide significant long-term benefit. This excludes for example investment in nuclear energy: in the best of cases nuclear plants take years to complete, with issues and delays extending this to decades and enormously increasing costs, if the experience of the Flamanville or Olkiluoto nuclear power plants has anything to teach us.

It's probably obvious from the choice of title for this post that the plans I'd rather see taken into consideration revolve around expanding the use of renewable energy sources, and chiefly solar among them.

Why solar

Why solar specifically, though?

There are several properties that make solar particularly palatable as the option to invest in in the short term. Let's see some of them.


Solar panels are ridiculously quick to deploy. Actually setting them up takes only a few hours. Including the planning and acquisition stages, one can expect to have an installation up and working in a couple of months, with delays taking up to six months: in this it's comparable to the typical deployment time for a wind farm, and orders of magnitude lower than the time needed e.g. for hydroelectric power.

Scalability and graduality

It's a very “local” power source, that “anybody” can set up on the roofs of buildings and other coverable land (think e.g. about parking lots). It also scales well and gradually, allowing larger installations to start reaping benefits before the whole system is up, with incremental expansion.


Note that this isn't about it being more climate friendly, but rather more climate oriented, i.e. better suited for the direction climate is changing. 2022 has been an exceptionally hot and dry year in many parts of the world (although it's likely places like Pakistan, Afghanistan, and other parts of South Asia, might disagree on the specifics).

The decade of droughts that has hit the northern hemisphere, from Europe to China and the Americas, is affecting not only agriculture (as it has in previous years too), but also hydroelectric, nuclear, and even coal power generation.

With the trend showing no clear sign of reversing, solar and wind promise thus to be the most “climate-oriented”, green energy sources, i.e. the ones least likely to suffer from strong setbacks in the future.

It's not perfect

Yeah, it's obviously not a perfect solution. It won't allow Europe to achieve total independence from Russian gas (or from fossil fuels) in the next 12 months. It may require significant imports of rare earths for the battery systems that help prop up the periodic discontinuity of solar. A price spike might also be expected if expansion is concentrated in the upcoming few months before winter.

Yet none of these objections, alone or together with the others, are meaningful reasons not to invest in solar (or wind) right now, because we don't need (nor should we strive for) a “perfect” solution, we simply need to get started (the earlier, the better) on a solution that can be improved and expanded in time and that can give the first results with a small turnaround. And solar is just the right thing for this.

Getting a head start

It's fascinating, really, how this has been handled across Europe. The EU is taking initiative, the Baltic countries are setting up to expand offshore wind energy production, Portugal (already getting over 50% of its energy from renewable sources) plans to further expand both solar and wind.

What I found a bit depressing is that Italy seems to be lagging behind in these projects. The last significant boost in growth of solar installations was 10 years ago, and even the proposed tax breaks in 2020 for a number of residential energy improvements don't seem to have pushed growth much.

With an incoming national election and worries about the spike in energy prices, the fast route to energy independence (and solar as the means to it) should be at the center of the political discourse. And while it's completely unsurprising that right-wing parties would be more open to sucking up to Russia again, it's more troubling that the rest of the spectrum doesn't seem to even think about it, focused either on long-term projects of dubious utility (nuclear power plants) or on how the “common man” may help reduce consumption e.g. by turning off and disconnecting appliances or reducing heating.

Why isn't there a “solar panels on every rooftop” plan? Why doesn't every school, office building, factory, warehouse, mall, start investing now in the installation of solar panels on their buildings, or covering their parking spaces?

And yes, I'm well aware that even starting now the benefits won't be reaped before next year, since November to January are the least useful for solar energy production, but paraphrasing a saying dear to fans of nuclear power:

the best time to install solar panels was 6 months ago, the next best time is right now

EVs are still worth it

Why a transition to Electric Vehicles is worth it even with energy production backed by fossil fuels


In the wake of the EU Parliament's controversial decision to ban sales of combustion-engine vehicles by 2035, the harshest criticism essentially revolves around the purported “idiocy” of adopting electric engines in vehicles when most electricity is still produced from fossil fuel (often supported by the memetic news of diesel generators being used to charge fully-electric Tesla cars).

While there is little doubt that such a setup is, shall we say, “less green” than it would be if electric vehicles were charged with “green” energy (i.e. energy generated by renewable and/or less polluting sources such as wind, solar, water, or even nuclear), objecting to a wider adoption of EVs on that basis[1] not only completely misses the point (a “green” transition can be gradual, it doesn't have to be all-or-none), but it's particularly stupid in the sense of the perfect being the enemy of the good.

The fact that a diesel-charged fully-electric vehicle is still “greener” than a combustion-engine equivalent has been remarked on by many when discussing Tesla's “loss of face”, but I don't have any particular appreciation for the articles I've found on a quick search on the topic, so I've decided to present my own take on the subject. Note that this take mostly looks only at the “finished products”, so for a wider discussion on the topic of the “greenery” of vehicles (electric or not) including manufacturing and whatnot you'll have to look elsewhere. What you'll find here are a few key advantages of the fossil-fuel-backed EV transition that I deem often overlooked.

Pollution delocalization

This particular advantage is in fact the first I thought of, the one that triggered my desire to write this post, and even if it was the only one (it is not), it would be —for me— sufficient.

Replacing all internal-combustion-engine vehicles (ICVs henceforth) with full-electric vehicles (EVs henceforth) displaces the pollution source. ICV usage is concentrated in the same areas where people live, with each vehicle being a moving point-source of pollution right under our nose (and eyes, and skin, etc). Switching to EVs would concentrate and move all those individual point-sources into pollution sources located elsewhere, most typically farther from densely inhabited places.

Pollution from ICVs has well-known effects on health problems ranging from birth defects to premature death. While delocalizing the pollution source doesn't eliminate the effect altogether, it does improve things, and the farther the power plants are, the better.

In addition to displacing (if not reducing, see below) good ol' air pollution, replacing ICVs with EVs also massively decreases noise pollution. In fact, EVs are so much quieter that there are worries about the safety implications of the lack of noise.


This may come as a bit of a surprise, but EVs can actually be more efficient than ICVs in energy usage from the same amount of fuel. This depends on multiple factors ranging from the vehicle use-case to the power plant generation.

Current vehicle combustion engines have peak efficiencies ranging from 35% (gasoline) to 45% (diesel). Of course, in practical usage this only happens under ideal conditions (full load during acceleration with the engine at peak efficiency RPMs, which is usually between 2K and 3K RPMs) and one of the big ironies of ICVs is that maximum fuel economy is instead achieved at cruising speeds at the highest possible gear with the lowest possible RPMs (typically around 1K) which is actually not very efficient, in terms of fuel energy extraction. Combined with the more or less frequent (and immensely wasteful) stop-and-go (most typical in urban usage, yet less uncommon than most people would believe in extra-urban and highway usages), the effective efficiency of most ICVs is between 12% and 30%, with worst cases dropping as low as 6%, and best cases at around 37%.

{ Verify if drivetrain losses are accounted for in these figures. }

{ Add considerations on the cost of refining crude oil into vehicle fuel. }

Electric vehicles are more efficient in using the energy from their batteries (at least 60% considering all losses) and much less affected by idling or other low-efficiency usages. Of course, this has to be compounded with the efficiency of power plant energy production and grid distribution losses. The latter amount to between 10% and 30% (depending on distance, quality of the grid and a number of other factors), leading to an overall efficiency between 40% and 55% from power generation to motion.

The crux is, unsurprisingly, at the power generation step. Even though energy efficiencies of nearly 60% are possible with modern tech, most power plants do not reach such levels (typical efficiency is 35% for coal, 38% for oil, 45% for natural gas, with the most efficient ones reaching respectively 42%, 45%, 52%), resulting in an effective efficiency for EVs (from fuel to wheel) between 14% and 29%.

So even without additional considerations it emerges that switching from ICVs to EVs would typically result in comparable efficiency in fuel usage. However, the argument doesn't end here: there's more to consider both on the EV side and on the power generation side.

One of the significant advantages of EVs over ICVs is regenerative braking, i.e. the capability to recover some of the energy spent to put (and keep) the vehicle in motion when braking. Although similar systems (particularly KERS, Kinetic Energy Recovery Systems) have been explored for ICVs (particularly in racing cars), they have not seen any meaningful adoption for civilian transport, in contrast to the widespread use of regenerative braking in electric and hybrid vehicles. Taking brake energy recovery into account, the efficiency of EVs rises to the 75%-90% range, for an effective efficiency (fuel to wheel, including power generation and grid losses) between 18% and 42%.
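The ranges quoted in this section can be cross-checked by treating fuel-to-wheel efficiency as the product of the efficiencies of each stage. What follows is only a minimal Python sketch using the figures given above; the function names, and the choice of pairing worst-case values with each other (and best-case with best-case), are assumptions made for illustration.

```python
# Fuel-to-wheel efficiency as the product of per-stage efficiencies.
# All numbers are the ranges quoted in the text, expressed as fractions.

def fuel_to_wheel(plant, grid_loss, vehicle):
    """Fraction of the fuel energy that ends up moving the vehicle."""
    return plant * (1.0 - grid_loss) * vehicle

PLANT = {"worst": 0.35, "best": 0.52}      # typical coal .. best natural gas
GRID_LOSS = {"worst": 0.30, "best": 0.10}  # distribution losses
EV_PLAIN = {"worst": 0.60, "best": 0.60}   # battery to wheel, no regeneration
EV_REGEN = {"worst": 0.75, "best": 0.90}   # with regenerative braking

def ev_range(vehicle):
    """(worst, best) fuel-to-wheel efficiency for a given vehicle efficiency."""
    return tuple(
        fuel_to_wheel(PLANT[case], GRID_LOSS[case], vehicle[case])
        for case in ("worst", "best")
    )

plain = ev_range(EV_PLAIN)
regen = ev_range(EV_REGEN)
print("EV without regeneration: {:.1%} to {:.1%}".format(*plain))
print("EV with regeneration:    {:.1%} to {:.1%}".format(*regen))
```

Under these assumptions the products land close to the 14%-29% and 18%-42% ranges given above, to be compared with the 12%-30% quoted for most ICVs.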

The other aspect to consider is that while most of the inefficiency of ICVs goes into wasted heat, power plants can co-generate electric power and heat, with overall efficiencies as high as 88%.

Although this doesn't improve the fuel-to-wheel efficiency of EVs per se, it does improve the overall fuel consumption efficiency, thus reducing waste.

Efficiency improvements

Technology improves (in fact, it's interesting to note that technological progress seems to have had a higher impact on the efficiency of ICVs than on the efficiency of EVs, although this is largely due to the fact that the EV efficiency is already much higher, and that considerably less R&D has gone into improving EVs until recently).

One question that is interesting to pose is: how long does it take for a technological improvement to have an actual measurable effect (e.g. leading to lower pollution or higher efficiency)?

In a largely saturated market like that of civilian vehicles, even if all new cars were to adopt the better technology, the replacement of the existing cars with the new ones would take decades if not for government incentives to switch to lower-emission vehicles.

With the highest sources of inefficiencies for EVs being located outside of the vehicles themselves (distribution grid, and most importantly power plants), many technological improvements would lead to indirect benefits to the effective efficiency of EVs without any intervention on the user side.

This doesn't hold true for all improvements (e.g. a better battery technology leading to higher density and thus lighter batteries for the same capacity would still require physical maintenance on the vehicles, although still less problematic than buying a whole new car as needed for most technological progress on ICVs), but e.g. a 5% reduction in power grid losses or a 5% improvement in efficiency for power plants would automatically lead to the corresponding gains in the overall effective efficiency (fuel to wheel) for all EVs recharging on said power grid.

Smoother transition

The previous point naturally segues into the final and (from some perspectives) most important point: an early transition from ICVs to EVs will lead to a smoother transition to other power sources.

Road transportation accounts for nearly 50% of oil consumption in the EU and constitutes the main emitting source for a number of pollutants responsible for the low air quality (and related health issues) in urban areas. Even if there was a full switch to “green energy” generation today, the around 250M vehicles currently in circulation in Europe would remain responsible for this massive consumption of oil and the associated pollution and health issues.

Even though, as discussed above, an accelerated transition from ICVs to EVs would neither eliminate our dependency on oil nor reduce the associated pollution (although it would reduce the health problems associated with the emitting sources being concentrated in highly populated areas), it would make the transition away from fossil fuel more effective: any subsequent increase in the percentage of energy produced from “green” sources would automatically (albeit indirectly) make road transportation more “green”.

Addendum: could it happen or not?

With the sales of electric vehicles doubling worldwide every 2 years or less (average growth rate 50% or more), one might even wonder if the EU initiative is even needed: if the trend continues and the percentage of EV car sales (to total car sales) were to keep doubling every 2 or 3 years, starting at 8% of car sales being for electric vehicles in 2021, we would approach 100% of car sales worldwide being EVs in 10 to 15 years (thus in line with the target of the EU Parliament proposal). The growing price of car fuel (largely a consequence of the 2022 Russian invasion of Ukraine) is also likely to support such a trend.
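As a rough check of this extrapolation, the time needed for a share that keeps doubling to reach 100% can be computed directly. The sketch below is only illustrative: the function name is made up, and it assumes an unchecked doubling all the way to saturation, which real markets do not follow.

```python
import math

def years_to_full_share(start_share, doubling_years, target=1.0):
    """Years for a share doubling every `doubling_years` to reach `target`."""
    doublings = math.log2(target / start_share)
    return doublings * doubling_years

# 8% of car sales in 2021, doubling every 2 or every 3 years
fast = years_to_full_share(0.08, 2)
slow = years_to_full_share(0.08, 3)
print(f"doubling every 2 years: about {fast:.1f} years")
print(f"doubling every 3 years: about {slow:.1f} years")
```

Pure doubling gives roughly 7 to 11 years, so this should be read as a lower bound: growth typically slows down as a market approaches saturation, which is compatible with the longer 10 to 15 year horizon mentioned above.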

(It should be noted however that these statistics include so-called plug-in hybrid vehicles, that have both an internal combustion engine and a battery-backed electric motor. These are the ones that have seen the fastest growth in the most recent years, yet they would be affected by the ban proposed by the EU Parliament.)

What is missing on the other side is the infrastructure to support such a transition: charging stations are still few and far apart, and mostly concentrated in the higher-density, higher-traffic regions. Massive infrastructural upgrades are needed to support the target of the EV transition, and not just in terms of power distribution: large increases in the number of circulating EVs will also require an adequate growth in power generation. And between the looming energy crisis, the impact of climate change on “green” energy production, and the long times, increasing costs and general resistance to nuclear, that is something that might not ramp up fast enough in the envisioned time frame.

It makes one wonder if investment in infrastructure (power generation, better grid, more charging stations), a differentiation of power sources, and support for local power production (“solar panels everywhere!”) to bring down electricity costs would be a more effective (albeit indirect) strategy to incentivize adoption of EVs. One thing for sure is that these things have to happen anyway for the “only EVs after 2035” to be sustainable.

Post Scriptum: a bet that I'm sure to win

Assuming the proposal (or some equivalent initiative to accelerate the adoption of EVs) passes, you can bet that 50 years from now, when we will be enjoying the benefits of the widespread use of EVs over ICVs, libertoloids (libertarians, ancap et similia) that are now so vocally against the EU plan will boast how the free market led to the resulting quality of life improvements, conveniently forgetting about the massive impact that regulations and incentives have had in directing such market.

How can I be so sure? Because that's exactly what they are doing about the improved energy consumption and reduced pollution that were driven by large scale government initiatives, particularly from the 1970s onwards.

Claiming “engineers did that, not government regulations” is a platitude insofar as it's true, and is otherwise false. Yes, engineers were essential to achieve the technological progress that improved energy consumption and reduced pollution, but the main incentive to move in that direction came from government regulation. We'd still be dying of smog in large numbers if private entrepreneurship profit had remained the driving motive for technological progress.

Until and unless the markets finally manage to incorporate the true cost of large scale externalities such as environmental damage, they will never be able to lead to such improvements in quality of life. It's not by chance that the industrial revolution actually led to a decrease in life expectancy in highly industrialized cities, with respiratory issues becoming the dominant cause of death (PDF warning).

(And yes, a similar discussion holds for epidemics, but that would be way off topic.)

  1. of course there are other objections, such as “this shouldn't be forced by law, but a decision made by the market”, which I'm not discussing here, and not just because “the market” not accounting for externalities —which are key in this discussion— makes such an objection irrelevant.

Testing Mastodon

On Twitter, Mastodon, self-hosting and the migration between social networks

The self-hosted pipe dream

My ultimate aim, for my online presence, would be to be completely self-reliant. The aim is, objectively, a bit of a pipe dream, since I'm well aware of the gigantic efforts that it would entail to actually reach a point of total independence (and for starters, even for something as essential as email I am pretty sure that I will never make it), but I like to get my gains wherever I can.

The most obvious example (you're reading it now) is the choice to abandon external blogging platforms in favour of this self-hosted wok. For other protocols (such as IRC), “self-hosting” doesn't really make sense, but controlling your client allows you to control your experience and most importantly your backups (I have all the logs of all my IRC conversations, without having to ask anybody else for them). For some services this is not possible, although sometimes there are “bridges” that allow some degree of client control: for example, I use Bitlbee to connect to some of my instant-messaging accounts through an IRC-like interface, although the reliability of these “bridges” is severely limited by an explicit intent from their operators (e.g. Google) to limit interoperability.

In some cases, I have simply scaled down my presence, sometimes helped in that by the demise or downfall of the corresponding platform: this is for example the case for the now-long-gone FriendFeed (my idea of what could have been “social networking done right”, even if still on a proprietary, centralized platform), or Tumblr, which I only sporadically visit (most of the content being luckily accessible from other platforms or “followable” through other means).

Up until recently, the only remaining “significant” online point-of-presence for me has been the microblogging platform Twitter. I must say that even my on-platform presence has been sporadic for a long time, but had recently gained some weight, and each of my posts there has been made with a “second thought” about the loss of control over my content. Much of it may be recoverable through the Twitter data export feature, but it's still a non-trivial process, and the fundamental implication about the loss of control remains even when workarounds are found.

(By the way, even for Twitter it's possible to set up a Bitlbee bridge, although given the extensive use of graphical elements the experience is far from being as smooth as with other services.)

The social network escape

I do not have the time or inclination to discuss the dangers of the centralization of social networks (especially when others have written more and better than me on the topic, and I may even link some of the relevant content from e.g. the EFF or Cory Doctorow here, after I find the time to collect it), but the work to create open and distributed alternatives has been going on for over a decade now, driven largely by the interest of individuals and groups worried about the implications of proprietary control of online spaces.

The most famous example at the time was probably Diaspora*, born as an alternative to Facebook in 2010, which even reached a certain prominence in the news during a bout of «delete Facebook», but was hardly the first or the most successful (for example, the microblogging platform, alternative to Twitter, had already been active for a couple of years).

The coordination of efforts from separate groups, each dedicated to a specific aspect of the “modern”, “social” web (blogging and microblogging, aggregation, music and video streaming and sharing, discussion fora, etc) has led over the years to the creation of what is now known as the Fediverse, a “universe” built on the “federation” of individual entities. The development of common protocols (most notably the now recommended ActivityPub) and the growing maturity of the developed software has finally reached the point of (at least technical) feasibility for an alternative to centralized social networks, although the question remains about the possibility for them to become a viable, and widely adopted, alternative to the centralized platforms.

Enter Mastodon

The recent bid by Elon Musk to purchase Twitter and make it private (“to restore freedom of speech”, but I will discuss the idiocy of the claim and of those who actually believe it in a different time and place) has brought back into the news the alternatives to it, and in particular the currently most popular Mastodon.

As with all Fediverse components, Mastodon is not a hosted platform in se (in the sense of a centralized website to which users register), but a software stack that provides a platform. Each installation of the software is an instance, and there are therefore multiple websites to which one can register to have a Mastodon account, similarly to how people can get an email address from different providers (their ISP, Google, HotMail, etc). And just like with email, Mastodon users can communicate with each other, follow each other's updates, etc through the common protocol, regardless of the instance they are registered with.

Not being a centralized service puts a barrier to entry on Mastodon, especially for people used now to decades of centralization: making a Mastodon account requires an active and conscious choice about where (which instance) to create it, with the associated burden of understanding the difference between instances, the fact that they each have their own terms and conditions (and possible additional restrictions on who may or may not register with them), and so on and so forth. There are “general” instances, both global and language-specific, that may be considered the go-to fallbacks, and are probably a safer bet (in terms of reliability and permanence) compared to smaller instances: these provide, for the technically uninclined, the closest thing to an “optimal” situation outside of fully centralized solutions, similar to the larger email providers nowadays used by most people (the classic HotMail, Google's gmail, etc), with the benefits (and downsides) of the decentralized, federated model.

I personally don't think that the decentralized model poses a particular obstacle to adoption (despite the slightly higher barrier to entry), and I will discuss elsewhere the details on what may make (or fail) the future of the Fediverse, aside obviously from the FUD propaganda fueled by centralized services that are threatened by this model (hint: it involves the participation in the Fediverse of some high-profile accounts, possibly on their own instance in an official form: think for example of institutional accounts from the US or EU being on their respective or instances, in contrast to e.g. the unofficial mirrors from the proprietary platform that can be found through the instance).

(Edit: I just found out that the EU actually has an official Mastodon instance. That's actually pretty good news.)

In fact, from my “self-host-all-the-things” perspective, a much larger problem with Mastodon is that it's non-trivial to set up a personal instance: while it is possible, Mastodon is a bit infamous for being a massive resource hog with complex setup necessary even in the “reduced” use case of a self-hosted personal instance, to the point that it's frequent to find recommendations to try alternative microblogging platforms that still integrate in the Fediverse, most typically Pleroma.

As a result, I won't be able to consider myself completely self-reliant on the “social network” side of things yet, even while moving away from the centralized platforms.

Testing Mastodon

So yeah, I've taken the opportunity to set up a Mastodon account on a general instance. I'm not particularly worried about the future of Twitter with Musk at the helm (in fact, I doubt anything would change, and it even looks like the deal, that was assumed done, might actually fall through), but like in other circumstances, I've grabbed at the chance to stop putting off exploring alternatives to Twitter when the circumstances presented themselves.

I don't consider my current Mastodon account to be “definitive”, even though it will likely last for several years, as I don't see myself switching to a self-hosted instance anytime soon, although that would be the ultimate goal. In the meantime, I've made an effort to set up as much as possible in a way that would allow me to interact with the platform “on my own terms”: this includes setting up a Bitlbee bridge to be able to follow and interact with my timeline from an IRC-like interface (I must say that I'm not particularly impressed by its stability yet), and the adoption of a practical command-line utility that I've used extensively to search and follow various accounts from a variety of instances. Periodic, automatic backups of my stream are something that I intend to explore soon.

The process of migration from the proprietary platform to Mastodon will take some time, a transition period with permanence on both, and will probably result in a long tail (most likely reduced to the few really interesting “big name” accounts that won't switch or clone their presence across networks). Luckily, a lot of effort has gone already in the community to help in this regard: I've discovered a few services that make it easier to bridge Mastodon and Twitter, including a “wrapper” for Twitter accounts that are presented as if members of a Mastodon instance, and a service that should make it easier to manage accounts across different platforms (I haven't tested it yet, but it should come in useful while transitioning from the proprietary to the open platform). There are also ways to improve one's visibility across instances, such as this Italian bot designed specifically for this purpose.

I have not even made my first Mastodon “top level” toot (post) yet (sharing this article will probably be the first one), but I've already had the opportunity to interact with some users (unsurprisingly, mostly about the nature of Mastodon itself, as the influx of new people exposes doubts and perplexities about its accessibility, long-term viability, and the potential of the social platform on its own merits rather than just to host people running away from the proprietary platform for whatever reason or exploring the fad of the moment). The experience has been rather smooth, although I've noticed some minor issues already (most notably, the fact that you cannot set the toot language when posting from the web interface, the less aggressive/expansive behavior for embedded links esp. other toots —which may or may not be an issue, depending on the use case— and the fact that a fixed column size is used in the “advanced” web interface).

It will be interesting to see how the currently much lower traffic on Mastodon develops in the following days and months, if the migration pressure keeps up or dwindles (by general drop in interest or because the Musk acquisition falls through, if it does). It will also be interesting to see how well the platform holds up as the influx of new users puts strain on the software and hardware instances holding the network together (already the general instances maintained by the Mastodon developers themselves have shown signs of “cracking under pressure”).

And who knows, this all might lead to better, more lightweight software and possibly more interest in making self-hosting more approachable.

Nuclear will not save us, part 3

Shorter-term considerations on the Exponential Expiration Time (EET) of nuclear power


If you believe that nuclear power is the solution to the energy (and possibly environmental) issues of the more modern developed nations, the question you should ask yourself is: for how long still?

A few weeks ago I started a series discussing the “expiration time” for nuclear power under the assumption of a constant growth in energy consumption (with a rate between 2% and 3% per year). The results were not very encouraging for the long term: even at the lower growth rate of 2%, the energy requirements would grow so much that even the entire mass of the Milky Way, converted entirely into energy according to the famous E=mc² equation (100% efficient nuclear energy extraction, giving us around 90 PJ/kg), would last us less than 5 millennia: not even the time that would be needed to move from one end to the other of said galaxy.

Such is the power of the exponential function.

While the post was not intended as a prediction, but mostly just as a “cautionary tale” about the need to reduce the speed at which energy consumption is growing, it has been criticized for the timescales it considers —timescales ranging from several centuries to millennia, timescales in which “anything may happen”.

I have tried to address the main objections in the follow-up post, discussing primarily two points: one being the choice of 90PJ/kg as upper bound of energy production, and the other being the assumption of energy consumption growing at the current rate for the foreseeable future, and most likely even beyond.

Despite the validity of these longer-term considerations (again: not predictions, just considerations), I don't doubt that many (if not most) people would find it useless to reason over such time spans, refusing to take them into account for shorter-term decision-making.

In this third installment of the series, we're thus going to focus on a much shorter term (say, within the century or so), within which it's much harder to deny a continuing growth in energy consumption, at the current rate (which we will optimistically round down to 2% per year), and it's plausible that nuclear energy extraction will continue within the current order of magnitude of efficiency, or only slightly more (say, no more than 10⁻¹ PJ/kg from the current approximately 1.2×10⁻³ PJ/kg).

Some preliminary numbers

If you've gone over the first two installments of the series, you may have noticed that the summary table in part 2 has an exceptionally low number in the upper-left corner: where all other scenarios offer an EET of two centuries or more, the lowest scenario gives us only 14 years. Surely that's too low? How is that possible?

The EET of 14 years is indeed too low. It corresponds to the EET under the following assumptions:

  1. constant growth rate of 2% per year;
  2. current tech level, extracting around 1.2×10⁻³ PJ per kg of uranium;
  3. 8×10⁹ kg of available uranium (the amount estimated to be in current known conventional reserves);
  4. the worldwide total primary energy consumption (6×10⁵ PJ/year) is entirely satisfied from nuclear.
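The 14-year figure can be reproduced with a few lines of Python. This is only a sketch: it uses the EET formula given in part 2 of this series, (1/k)·ln(k·R/r₀ + 1), together with the four assumptions listed above; the function and variable names are mine, chosen for illustration.

```python
import math

def eet_years(k, r0, total_energy):
    """Exponential expiration time: years until a reserve of `total_energy` (PJ)
    runs out when yearly consumption starts at `r0` (PJ/year) and grows
    continuously at rate `k` per year."""
    return math.log(k * total_energy / r0 + 1) / k

# The four assumptions from the list above (names are illustrative).
GROWTH_RATE = 0.02           # 2% per year
ENERGY_PER_KG = 1.2e-3       # PJ per kg of uranium at the current tech level
URANIUM_MASS = 8e9           # kg, known conventional reserves
PRIMARY_CONSUMPTION = 6e5    # PJ/year, worldwide total primary energy

total_energy = ENERGY_PER_KG * URANIUM_MASS  # total extractable energy, in PJ
years = eet_years(GROWTH_RATE, PRIMARY_CONSUMPTION, total_energy)
print(f"EET: {years:.1f} years")
```

The result falls just short of 14 years, which is the number shown in the table once rounded.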

The first two assumptions are entirely reasonable within the timespan of less than two decades; the third assumption is possibly even too generous (it assumes that within these two decades, we'd be able to even just extract all of the uranium from the estimated known conventional reserves).

The last assumption, on the other hand, is completely unrealistic: nuclear power generation today barely covers a fraction of worldwide total primary energy consumption. Existing civilian power plants produce less than 2600 TWh (or less than 10⁴ PJ) of electricity per year (and the amount is going to decrease, if the current initiatives to transition away from nuclear are any indication of the near future).

(That being said, that uranium wouldn't actually last long at full usage shouldn't even be that big of a piece of news for anyone following the field: even back in 2008 there was awareness about how long uranium would last if production was to ramp up, given the discovered deposits. In fact, we actually get more leeway in our estimates because we're using the much larger amount of estimated known conventional reserves.)

But let's try to get a bit more realistic.

Ramping up nuclear

The first exercise is to see what would happen if we ramped up nuclear power (instead of transitioning away), to try and cover a larger slice of the total primary energy consumption, at the current tech level (1.2×10⁻³ PJ/kg).

For simplicity, let's round things a little bit. Assume we currently produce 10⁴ PJ/year from nuclear (while this is rounded up, in our calculation the final difference is at most a couple of years over a whole century), and that the readily available uranium from known conventional reserves is 10¹⁰ kg (this is a bit on the generous side, but it's one way to account for the discovery of some more uranium deposits).

We have two questions: how long will it take to cover the current global primary energy consumption (6×10⁵ PJ/year), and how quickly will we run out of uranium? In particular, we'd like to at least satisfy the current primary energy requirements before running out of uranium.

The answers to these questions obviously depend on how fast we can ramp up energy production from nuclear power: the faster we ramp up production, the quicker we match primary energy needs, but at the same time, the faster we ramp up production, the quicker we run out of uranium.

(You can follow the exercise by plugging in the relevant numbers in the form found after the table in part 2 of this series, just consider ‘production’ instead of ‘consumption’ in the first and last field.)

It's interesting to see that with anything less than a 4% growth rate for nuclear power generation, we won't even get to produce one whole year's worth of the current primary energy requirement before running out of uranium: at 4%, we would run out of fuel after slightly less than a century, while producing barely more than 5×10⁵ PJ/year.

Anything less than a 4% growth rate (18 years doubling time) would allow uranium to last for over a century, but without covering the current worldwide primary energy consumption. Ramping up at a 5% rate (more specifically, around 4.82%, 15 years doubling time) would allow us to match the current worldwide primary energy consumption just as we run out of easily accessible uranium, 85 years down the line.

To get some meaningful (multi-year) amount of coverage, we would have to ramp up production even faster, but this would shorten the time of availability of the fuel: for example, at a 7% growth rate (doubling time: 10 years, still realistic considering the time it takes to actually build or expand nuclear power stations) the known uranium reserves would have an EET of only 64 years.
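The ramp-up figures in this section can be checked with the same EET formula from part 2, (1/k)·ln(k·R/r₀ + 1), now applied to production rather than consumption. The sketch below assumes continuous growth, the rounded starting values given above (10⁴ PJ/year and 10¹⁰ kg of uranium at 1.2×10⁻³ PJ/kg), and evaluates the final production at the EET rounded up to whole years; that rounding is my guess at how the quoted production figures were obtained, and the names are illustrative.

```python
import math

def eet_years(k, r0, total_energy):
    """Years until `total_energy` (PJ) is used up when production starts at
    `r0` (PJ/year) and grows continuously at rate `k` per year."""
    return math.log(k * total_energy / r0 + 1) / k

NUCLEAR_PRODUCTION = 1e4          # PJ/year, rounded current nuclear output
TOTAL_ENERGY = 1.2e-3 * 1e10      # PJ: current tech level times 10^10 kg of uranium

def ramp_up(k):
    """Return (EET in years, production in PJ/year when the uranium runs out).
    The final production is evaluated at the EET rounded up to whole years
    (an assumption, see the text above)."""
    years = eet_years(k, NUCLEAR_PRODUCTION, TOTAL_ENERGY)
    final_production = NUCLEAR_PRODUCTION * math.exp(k * math.ceil(years))
    return years, final_production

for rate in (0.04, 0.0482, 0.07):
    years, final_production = ramp_up(rate)
    print(f"{rate:.2%}: EET {years:.1f} years, "
          f"final production about {final_production:.2e} PJ/year")
```

Only the 4.82% and 7% rates end with a production at or above the 6×10⁵ PJ/year of current primary energy consumption.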

Actually, if the ramping up limit was the current total primary energy consumption, uranium would last a little bit longer: the EET production rate would be 8.8×10⁵ PJ/year, which is higher than the current consumption. This would buy us a few years if we stopped ramping up as soon as we reached parity, pushing the EET to around 70 years (not much, but still something).

Playing catchup

On the other hand, assuming that the global primary energy consumption remains constant in the next century is quite a stretch: we can expect it to keep growing at the current rate of at least 2% per year for the foreseeable future.

Given the ramping-up timeline, this would give us at least another doubling, potentially even two: this means that even getting to 6×10⁵ PJ/year would cover at best only half of the future primary energy needs. We should strive for more. And yet, even a 7% ramp-up rate wouldn't manage to cover a single doubling (1.2×10⁶ PJ/year target) before running out of uranium.

We would need at least a 10% ramp-up rate (doubling time: 7 years, which is about the quickest we can do to bring new reactors online) since that would push production to 1.22×10⁶ PJ/year —just as uranium runs out, 48 years from now.
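To see why 7% falls short and 10% barely makes it, one can compare the time needed to reach the doubled target with the time the uranium lasts. The following is a sketch under the same rounded assumptions used in the previous section (10⁴ PJ/year starting production, 1.2×10⁷ PJ of total extractable energy, continuous growth); `years_to_reach` is a helper introduced here for illustration.

```python
import math

def eet_years(k, r0, total_energy):
    """Years until `total_energy` (PJ) is used up when production starts at
    `r0` (PJ/year) and grows continuously at rate `k` per year."""
    return math.log(k * total_energy / r0 + 1) / k

def years_to_reach(target, r0, k):
    """Years for production growing continuously at rate `k` to go from `r0` to `target`."""
    return math.log(target / r0) / k

NUCLEAR_PRODUCTION = 1e4   # PJ/year, rounded current nuclear output
TOTAL_ENERGY = 1.2e7       # PJ, 10^10 kg of uranium at 1.2e-3 PJ/kg
TARGET = 1.2e6             # PJ/year, one doubling of current primary consumption

for rate in (0.07, 0.10):
    needed = years_to_reach(TARGET, NUCLEAR_PRODUCTION, rate)
    available = eet_years(rate, NUCLEAR_PRODUCTION, TOTAL_ENERGY)
    verdict = "reached" if needed <= available else "not reached"
    print(f"{rate:.0%}: target needs {needed:.1f} years, "
          f"uranium lasts {available:.1f} years -> {verdict}")
```

At 10% the two times differ by only a fraction of a year, which is why the target is met just as the fuel runs out.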

We could do “better” of course: knowing in advance the number of reactors needed to match the future energy request, we could build all of them at the same time. But that would only get us much closer to the dreaded 14-years EET for conventional uranium reserves (a quick estimate gives us around 30 years at best).

Ultimately, the conclusion remains the same: at the current technological level, and with the current estimates on the quantity of uranium available in conventional resources, we wouldn't be able to cover more than a few decades of global energy requirements at best, even with conservative estimates on how quickly the latter will grow.

Breeder reactors and the myth of the “renewable” nuclear power

Given that the short expiration time of uranium at current tech level even just to satisfy the current global energy requirements (let alone its increase over the next decades) has been known for decades, one may wonder where the myth of nuclear power as “renewable” comes from.

We can find the answer in a 1983 paper by Bernard L. Cohen published in the American Journal of Physics, vol 51 and titled “Breeder reactors: A renewable energy source”. The abstract reads:

Based on a cost analysis of uranium extracted from seawater, it is concluded that the world’s energy requirements for the next 5 billion years can be met by breeder reactors with no price increase due to fuel costs.

Hence, nuclear power is considered “renewable” in the sense of being able to last as long as other energy sources traditionally considered renewables (such as wind and solar), whose expiration time is essentially given by the time needed for the Sun to run out. (I think that's an acceptable interpretation of the term, so I'm not going to contest that.)

Cohen's work starts from the well-known (even at the time!) short expiration time for traditional nuclear reactors, and shows how moving to breeder reactors would allow unconventional sources of uranium (particularly, as mentioned in the abstract, uranium extracted from seawater) to become cheap (in the economic sense) enough to be feasible without a significant increase in the price of generated electricity.

The combination of 100× more effective energy production and the much higher amount of fuel led him to calculate the 5 billion years expiration time —assuming a constant rate of production equal to the total primary energy consumption in 1983.

It should be clear now why Cohen's numbers don't match up with our initial analysis: uranium would only last long enough to be considered “renewable” at constant production rates, not at ever-increasing rates. In fact, if you want to know the exponential expiration time for seawater uranium in breeder reactors, you just have to look at the second row, second column of the famous table: if energy consumption keeps growing as it is, all the uranium in the sea water fed to breeder reactors wouldn't last us 500 years.

Of course we don't know how accurate of a forecast my “doubling every 30 years” assumption is for future energy consumption (although it's much less far-fetched than some may think) but at the very least we know that Cohen's assumption of constancy was wrong, since consumption has already doubled once since, and it shows no sign of stopping growing anytime soon.

In fact, as I mentioned in the first post, the biggest risk for nuclear comes specifically from the perception of its “renewability”. In some sense, we can expect this to be the opposite of a self-fulfilling prophecy: the appearance of nearly infinite, cheap energy, combined with our inability to understand the exponential function, will more likely encourage an increase in energy consumption, as wasteful behavior seems to matter less in the face of the perceived enormity of available energy, ultimately leading to such a steep growth in energy consumption that the source would be consumed in an extremely short time.

By contrast, higher friction against the adoption of nuclear, combined with the much lower energy cap of all other sources, is likely to drive more efforts into efficiency and energy consumption minimization, thus slowing down the growth of energy consumption, and potentially allowing future nuclear power use to last much longer (even though, most likely, still considerably less than the billions of years prospected by Cohen).

What does it really mean for an energy source to be renewable?

The truth is that, in the face of ever-expanding energy requirements, no energy source can be considered truly renewable: the only difference is whether the production of energy from it can keep up with the requirements, or not.

Traditional renewables (wind, solar, hydro, wave, geothermal) can last “forever” (or at least until the Sun dies out) simply because we cannot extract them faster than they regenerate: as such, they won't “die out” (until the Sun does), but at the same time we'll reach a point (and I posit that most likely we're already there) where even if we were able to extract every millijoule as it gets generated, it still wouldn't be enough to match the requirements.

With non-renewables, the energy is all there from the beginning, just waiting for us to extract it. This means that (provided sufficient technological progress) we can extract it at a nearly arbitrary rate, thus keeping up with the growing requirements, but at the cost of exhausting the resource at a disastrous pace.

The importance of reducing energy consumption growth (and thus to avoid the energy Malthusian trap) is thus dual: maximize the usefulness of traditional renewable sources on one hand, and maximize the duration of non-renewable sources on the other. And yet, it would take extremely low growth factors for non-renewable sources to get anywhere close to billions of years in EET.

As an example, consider the case of Cohen's setup (breeder reactors, seawater uranium) in a slightly different scenario. Assume for example that energy consumption continues to grow at the current pace for slightly more than a century (due to ongoing population growth and developing countries lifting their standards of living), leading to three more doublings, arriving short of 5×10⁶ PJ/year. Assume also that only at this point humanity switched to breeder reactors fueled by seawater uranium, covering with it the total primary energy requirements, and that from this moment onwards energy consumption kept growing at a lower pace. Depending on how low the new pace is, the EET for the seawater uranium in breeder reactors grows proportionally larger:

Growth rate (per year) | EET (years) | Doublings within the EET
(no change) | (no change¹) |
0.00048828125% | 92549 | < 1

It should be clear that even at very small energy consumption growth factors (the smallest presented factor corresponds to a doubling over more than 140K years) it's simply impossible to have non-renewable resources last billions of years, although some may consider anything over 10K years to be “acceptable”, or at least “not our problem anymore”.

(Side note: even with a 100% conversion of mass to energy, i.e. 90 PJ/kg, the lowest growth rate considered won't give us billions of years: all the seawater uranium would last barely more than a million years, and all of the uranium and thorium estimated to be in the crust would last less than 4 million years, and our entire galaxy 15 million years; to get to a billion years for the Milky Way, growth would have to be lower than 10⁻⁵% per year, at 90 PJ/kg.)
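The side note can be checked with the EET formula from part 2, (1/k)·ln(k·R/r₀ + 1). The sketch below rests on a few assumptions of mine: a starting consumption of 4.8×10⁶ PJ/year (three doublings of 6×10⁵), and masses of 5×10¹² kg for seawater uranium and 4×10⁴² kg for the Milky Way, taken from the mass header of the table in part 2 (which column corresponds to which resource is my reading, not something stated there).

```python
import math

P_MAX = 90.0                 # PJ/kg, full mass-to-energy conversion
START_CONSUMPTION = 4.8e6    # PJ/year, assumed: three doublings of 6e5
LOWEST_RATE = 0.00048828125 / 100  # per year, lowest growth rate considered

# Assumed masses in kg (my reading of the table header in part 2).
SEAWATER_URANIUM = 5e12
MILKY_WAY = 4e42

def eet_years(k, r0, total_energy):
    """Years until `total_energy` (PJ) is used up when consumption starts at
    `r0` (PJ/year) and grows continuously at rate `k` per year."""
    return math.log(k * total_energy / r0 + 1) / k

def eet_full_conversion(mass_kg, k):
    """EET when the whole mass is converted to energy at 90 PJ/kg."""
    return eet_years(k, START_CONSUMPTION, P_MAX * mass_kg)

print(f"seawater uranium: {eet_full_conversion(SEAWATER_URANIUM, LOWEST_RATE):.3g} years")
print(f"Milky Way:        {eet_full_conversion(MILKY_WAY, LOWEST_RATE):.3g} years")
# Growth of 1e-5 % per year (k = 1e-7) still falls short of a billion years.
print(f"Milky Way at 1e-5 %/year: {eet_full_conversion(MILKY_WAY, 1e-7):.3g} years")
```

Under these assumptions the galaxy lasts roughly 15 million years at the lowest rate in the table, and a growth rate of exactly 10⁻⁵% per year still gives less than a billion years.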

Does it make sense to make decisions based on something so far into the future?

While it's true that we can't make predictions that far into the future (especially not in the millennia or hundreds thereof that might be provided by the very low growth case), it's also true that at the very least we should avoid closing the paths to that future altogether.

Put in another way, we may not be able to look that far, but we are able to determine if we'll get there at all, possibly without passing through a societal collapse.

A quote frequently attributed to Albert Einstein goes something like:

I do not know with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.

Regardless of how accurate the quote (and its attribution) is, the sentiment is clear: the enormous destructive power (offered by the atom bomb or whatever even worse weapon comes after it) would be enough to throw civilization back to the Stone Age level.

A similar argument can be made here for energy consumption: we don't know when we'll outstrip even the most effective form of energy production, but we do know that when that happens it will inevitably lead to civilization collapse —and it will be sudden, despite being perfectly predictable (or I wouldn't be writing this now).

With the famous bacteria in a bottle example, Bartlett highlights, among other things, how even just a few doublings before expiration most people wouldn't realize how close the expiration was, due to the vastness of the available resources and the lack of awareness on how quickly such vastness is consumed by the exponential function, and how even the successful efforts of farsighted individuals to expand the availability would only buy marginal amounts of time before the collapse.

In this perspective, it's never too early to act in the direction of reducing the exponential growth, and in fact, it's actually more likely to be almost always too late. And even if it wasn't too late already, even with the best intentions, there is actually very little, if anything at all, achievable at the individual level that would actually put a dent in the exponential. Frustrating as it may be, even a collective “doing our best” to avoid being wasteful hardly scratches the energy consumption (10 fewer watts, around the clock, for every person in the world is still less than 0.5% of the total energy consumption).
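The 0.5% figure is easy to verify. The sketch below assumes a world population of 8 billion (my assumption, not a number from this post) and the 6×10⁵ PJ/year of total primary energy consumption used throughout the series.

```python
SAVED_POWER_W = 10.0                   # watts saved per person, around the clock
WORLD_POPULATION = 8e9                 # assumed, not taken from the post
SECONDS_PER_YEAR = 365.25 * 24 * 3600
TOTAL_CONSUMPTION_PJ = 6e5             # PJ/year, worldwide primary energy
JOULES_PER_PJ = 1e15

saved_pj_per_year = SAVED_POWER_W * WORLD_POPULATION * SECONDS_PER_YEAR / JOULES_PER_PJ
share = saved_pj_per_year / TOTAL_CONSUMPTION_PJ
print(f"saved: {saved_pj_per_year:.0f} PJ/year, i.e. {share:.2%} of the total")
```

With these inputs the saving comes out at a few thousand PJ per year, below half a percent of the total.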

The fundamental issue is much more profound, and much more systematic. And the first step in the right direction is to raise awareness about the true nature of the issue: there's a much more urgent problem to address than how to produce energy, and it's how to reduce consumption —and not by the crumbs for which the end user is responsible, but for the entire chain of production, from raw material extraction down to the point of sale.

As I mentioned, this might be the only upside of the transition away from nuclear, and similar “Green New Deal” fairy tale initiatives: promoting consumption reduction by energy starvation —although one would wish there were better ways. And worse, it really won't be enough anyway, as long as it's set in the same system for which growth is such an essential component.

We need a completely new direction.

A final(?) remark

I'm hardly the first to make such considerations, and I will surely not be the last. Aside from Bartlett, whose famous talk on “Arithmetic, Population and Energy” inspired (consciously or not) my initial curiosity about the exponential expiration time for nuclear power, others have now and again discussed the finite limits we're set to meet sooner rather than later in our path of growth, including ones I haven't discussed here, such as the waste heat disposal issue.

And yet, awareness of the issue and of its importance is slow to spread. It could easily propagate exponentially, and yet (for once that the exponential could work in our favour!) it seems to encounter such large resistance that it barely trickles out with linear growth, and a slow one at that.

Where does this resistance come from? With all the campaigning on climate and “going green” and sustainability, one would expect this crucial side of the issue to be heard more. The numbers and the math behind it aren't even that hard to grasp. So why?

A possible explanation could be that the timeline is still too long to be able to catch people's attention: we can't get people truly involved with the climatological catastrophes we are bound to experience in the next decades, why would they worry about energy suddenly running out three centuries from now?

But I think there's something more to it. And yes, a sizable part of it is the pathetic realization that climate, sustainability and “going green” can be varnish to commercial exploitation (greenwashing, as they call it); full-chain consumption curbing, on the other hand, cannot, as it's the antithesis of what commercial exploitation thrives on.

But beyond that, there's most likely the realization that we're already at the point where any serious effort at sustainability with current standards of living would be in vain without a drastic reduction not so much of the consumption, but rather of the consumers.

  1. note that this is in addition to the time necessary to get from 6×10⁵ to 5×10⁶ PJ/year; the difference between starting the consumption at the 6×10⁵ PJ/year level versus starting at the 5×10⁶ PJ/year level is marginal. ↩

Nuclear will not save us, part 2

Are we sure energy consumption will keep growing at the current rate? A follow-up on why we need to curb energy consumption growth or it will be curbed for us.


A couple of weeks ago I wrote an article illustrating why even the transition to nuclear energy will not be able to keep up with our energy consumption if such consumption keeps growing at the rate it has been growing since at least the industrial revolution.

I've recently had the opportunity to debate the contents of the article, so I feel that it might be appropriate to clarify some aspects, and delve into some points with more details.

But first of all, a recap.

In the previous article, I made some back-of-the-envelope estimates of how long it would be possible to keep growing our energy consumption at an exponential rate (doubling approximately every 30±5 years, i.e. with a rate between 2% and 3% per year) under several (generous) assumptions on our energy production capabilities.
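The correspondence between doubling time and yearly growth rate follows from ln(2)/T. A minimal sketch, assuming continuous compounding (the function name is mine):

```python
import math

def rate_for_doubling_time(years):
    """Continuous yearly growth rate that doubles a quantity every `years` years."""
    return math.log(2) / years

for years in (25, 30, 35):
    print(f"doubling every {years} years -> "
          f"{rate_for_doubling_time(years):.2%} per year")
```

The 25 to 35 year range maps to rates between just under 2% and a little under 2.8% per year, in line with the 2% to 3% quoted above.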

Exponential Expiration Time (EET) scenarios for nuclear energy assuming a constant growth in energy consumption with rate k=2% per year, starting from the current r₀=6×10⁵ PJ/yr, for different efficiency levels E (in percent) of mass-to-energy conversion (theoretical maximum: P=90 PJ/kg), and different amounts of available mass M. The EET is computed in years using Bartlett's formula (1/k)·ln(k·R/r₀ + 1) and the total amount of energy that can be produced is computed as R=E·M·P.
Mass (kg)
8×10⁹ 5×10¹² 7×10¹² 1.2×10¹³ 1.2×10¹⁷ 5×10¹⁷ 7×10¹⁷ 3×10²² 9×10²² 7×10²³ 3×10²⁷ 2×10³⁰ 4×10⁴²

Note: the table currently needs JavaScript enabled, because I'm too lazy to copy the data by hand for each cell. On the other hand, this means that with the same code you can play with the numbers yourself, by changing the numbers in the following form.

(The defaults are for the current tech level and growth rate, using the total mass of the current known uranium reserves and the current nuclear energy production instead of the entire primary global energy consumption, so the EET refers e.g. to a condition in which the fraction of total energy covered by nuclear remains constant, while the total energy consumption grows at the current rate.)
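For readers without JavaScript, a single row of the table can be recomputed offline. The following Python sketch is not the code used on this page: it simply applies the formula from the caption, (1/k)·ln(k·R/r₀ + 1) with R=E·M·P, to the masses listed in the table header. Only the 100% efficiency row is shown, since that one needs no further assumption; pass a different `efficiency` (as a fraction, not a percentage) to explore other rows.

```python
import math

P_MAX = 90.0     # PJ/kg, theoretical maximum of mass-to-energy conversion
K = 0.02         # growth rate per year
R0 = 6e5         # PJ/year, current primary energy consumption

# Masses in kg, as listed in the table header.
MASSES_KG = [8e9, 5e12, 7e12, 1.2e13, 1.2e17, 5e17, 7e17,
             3e22, 9e22, 7e23, 3e27, 2e30, 4e42]

def eet_years(efficiency, mass_kg, k=K, r0=R0):
    """EET in years for a given conversion efficiency (fraction of P_MAX) and mass."""
    total_energy = efficiency * mass_kg * P_MAX   # R = E * M * P
    return math.log(k * total_energy / r0 + 1) / k

def table_row(efficiency):
    """One row of the table: the EET for every mass at the given efficiency."""
    return [eet_years(efficiency, mass) for mass in MASSES_KG]

for mass, years in zip(MASSES_KG, table_row(1.0)):
    print(f"{mass:8.1e} kg -> {years:6.0f} years")
```

Even at full conversion, the largest mass in the list lasts less than 5 thousand years at a 2% growth rate.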

Point #1: it's not a prediction

The point of the article was not to make a forecast about what will happen.

The only point of the article was to show the upper bounds of the exponential expiration time (EET) of energy sources if energy consumption keeps growing at the rate it's growing.

Specifically because it was an estimation of the upper bound, the longest-term predictions were done under ideal, and absolutely unrealistic, conditions, such as in particular the possibility to convert matter (any matter) to energy with 100% efficiency, which —by our current understanding of the physical world— cannot produce more than 90PJ of energy per kg of mass.

Now, this is obviously unrealistic: even with nuclear (the most efficient form of energy production we know now) we only get five orders of magnitude less than 90PJ/kg. But it's intentionally unrealistic, leaving plenty of room to scientific and technological progress to catch up.

Objection #0: you can't tell that it's an upper bound

It's obviously quite possible that in the future (even in the near future) we might be able to find a more efficient form of energy production compared to the best we can do now.

However, the possibility of such an energy source being practically exploitable to produce significantly (i.e. orders of magnitude) more than 90PJ/kg is extremely slim. What it would require is:

  1. a scientific discovery that invalidates the famous E=mc² formula, showing a way to produce orders of magnitude more than 90PJ of energy per kg of mass or equivalent;
  2. technological progress to make such scientific discovery exploitable to produce energy with sufficient efficiency that the amount of energy produced is still orders of magnitude more than 90PJ/kg (or equivalent);
  3. that such scientific discovery and technological progress happen before we hit the EET of our current energy production methods.

Now, I'm not saying that this is impossible, but the chances of this happening are so low that I can quite safely claim that the estimations to the EET computed with 90PJ/kg are, indeed, the upper bounds to the EET assuming energy consumption keeps growing at the current rate.

That being said …

So, again, the point of the article was not to try and predict the future, but only to see for how long still we can keep growing at the rate we're growing.

In fact, if a point had to be taken from the article, I would say that the main point should be the final suggestion: that it's better to invest in reducing the growth rate of energy consumption than it is to invest in improving energy production efficiency.

But let's move on to the more solid objections.

Objection #1: I'm ignoring the benefits of technological progress

One objection I've read is that my calculations don't take into account the benefits in terms of efficiency (both in energy production and in energy consumption) that will come from technological progress.

For energy production, this is actually mostly false: as I mentioned in the previous point, my estimations of the EET are done in such favorable conditions that I leave room for several orders of magnitude of improvements in energy production efficiency (at least up to the quite realistic but ideal limit of 90PJ/kg). Of course, it's not completely impossible that we will find (before the expiration date!) a means of energy production that allows us to extract, in practice, more than 90PJ/kg. But unless such a very hypothetical method, beyond even our current scientific comprehension, allows us to produce several orders of magnitude more than 90PJ/kg, this part of the objection is completely moot. In fact, even with several orders of magnitude more it would be a very weak objection, since each order of magnitude increase in efficiency only buys us around 3 doublings, which at the current rate means around a century.
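The “3 doublings per order of magnitude” rule can be checked numerically. The sketch below uses the EET formula (1/k)·ln(k·R/r₀ + 1) at a 2% growth rate; the reserve used as a baseline (8×10⁹ kg at 90 PJ/kg) is just an illustrative choice of mine, as the result barely depends on it once the reserve is large compared to yearly consumption.

```python
import math

K = 0.02     # growth rate per year
R0 = 6e5     # PJ/year, current primary energy consumption

def eet_years(total_energy, k=K, r0=R0):
    """Years until `total_energy` (PJ) runs out under continuous growth at rate k."""
    return math.log(k * total_energy / r0 + 1) / k

baseline = 90.0 * 8e9   # PJ, illustrative reserve (see the text above)
gained_years = eet_years(10 * baseline) - eet_years(baseline)
gained_doublings = gained_years * K / math.log(2)

print(f"10x more energy buys {gained_years:.0f} years, "
      f"or {gained_doublings:.1f} doublings")
```

The gain is essentially ln(10)/k years, a little over a century at 2% per year.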

For energy consumption, the objection is true, in the sense that I do not discuss the possibility for technological progress to improve the efficiency of our energy consumption, i.e. the possibility to waste less of the produced energy, or to do more work with the same amount of energy.

This is true, but again it's intentional, since how the energy consumed is being spent is irrelevant to my point. The only thing that matters is how much, and how quickly this grows.

Now, for the “how much”, the efficiency of the consumption is completely irrelevant. Where it can become relevant is on the growth of the consumption itself. However, finding a more efficient way to use energy doesn't necessarily mean that less energy will be used (in fact, historically this is mostly false).

That being said, even if improvements in efficiency of consumption did lead globally to a decrease in energy consumption growth, it wouldn't invalidate my point. As an objection, this would make sense if my post was an attempt at making a prediction of what would happen. But it's not, so this is not really an objection.

Au contraire, given that —if a point has to be made— the point would actually be that we should concentrate our efforts on reducing energy consumption growth, encouraging such technological progress (and such application of it) is actually exactly what my post aims for, by providing the estimated EET for our civilization if we don't go in that direction.

That being said, I can't say I'm particularly optimistic of this actually happening any time soon: when humanity finds a way to use energy more efficiently, this doesn't usually turn into “doing the same work with less”, but it tends to become instead a “let's do even more work with the same amounts of energy”.

In fact, even when at the individual level this may lead to lower consumption, this decrease is not reflected globally; on the contrary, the higher efficiency leads to more widespread adoption of the technology, leading to an overall higher consumption: which is exactly why, despite the massive increase in efficiency since the beginning of the industrial era, energy consumption is still growing at a more-or-less constant rate.

Objection #2: to grow exponentially for that long, we would have taken to the stars

This was the first objection that tried to take issue with the continuing exponential growth. It was an interesting one, but still rather weak. Moreover, albeit in a somewhat roundabout way, I had already addressed it in the post, pointing out that the entire (estimated) mass of the Milky Way will last less than 5 thousand years if energy consumption keeps growing at this rate.

For comparison, the radius of the Milky Way is estimated to be between 80 thousand and 100 thousand light years: we would run out of energy long before even being able to visit our galaxy without FTL.

With FTL? Possibly we could visit our galaxy then, but who knows how much energy is consumed by that.

Objection #3: you can't make predictions that far into the future

(“That far” being either the millennia for the consumption of our solar system and beyond, or even just the few hundred years before we run out of fissile material to fuel nuclear reactors thousands of times more efficient than the ones we own now.)

This objection comes in at least two variants.

One is essentially on the theme of the already-addressed objections #0 or #1 above, the other comes as a variation on the theme that the exponential growth assumption is invalid.

In either case, it's obviously true that I can't make predictions that far into the future. But then again, it's also true that I'm not making predictions, I'm just calculating the EET under the assumption of constant growth.

Of course, if the exponential growth assumption is invalid, then the EET doesn't hold —but that's not because I can't make predictions into the future, it's because the exponential growth assumption is invalid.

And that's actually OK, because the whole point, as I mentioned, is that we should slow down the growth to either get out of the exponential growth altogether, or at least lower the growth rate to something that will allow growth for a much, much longer period.

So let's get to the final objection:

Objection #4: we will not grow exponentially for long anyway

On one hand, I could dismiss this with a simple “duh”, since the whole point of the previous post is that if we don't do it by our own choice, it will happen anyway, catastrophically, when we get so close to the EET that it will be apparent to all we won't be able to keep going —except that it will then be too late to slow down without a civilization collapse.

It's interesting however to see the forms that this objection can take. Aside from #2 above, and the masked #3, there are a couple of interesting variants of this that deserve a mention.

Objection #4a: the magnitude of the consumption after a few more doublings is inconceivable

While the wording wasn't exactly that, the basic idea is that if we keep doubling for centuries still, the order of magnitude of the consumption would be so high that we can't even imagine what all that energy would be used for.

And while it's true that we would be hard-pressed to imagine energy consumptions that large, it's not really much of an objection, since this has always been the case. Would anyone have imagined, even just 20 or 30 years ago, that we'd end up air-conditioning the desert?

Ironically, this objection was raised by the same individual that objected to the 90PJ/kg upper limit: so you can imagine us finding a way to produce more energy than that, but not us consuming several orders of magnitude more energy than now?

Honestly, I have fewer problems imagining the latter than the former: flying cars anyone? teletransportation? robots for everything?

Objection #4b: population and consumption will stabilize in time

This is an interesting objection, because in the long term it's quite likely to be true. I will call this the “logistic” objection, because the fundamental idea is that population and consumption follow a logistic function, which is essentially the only way to avoid the Malthusian trap of overpopulation (more on this later).

Now, let's accept for the moment that this is indeed most likely to be true in the long term. The big question is: how long of a term, and how fast will it stabilize?

There are two primary contributions to the global energy consumption: per-capita consumption, and world population. For the global energy consumption to stabilize, we thus need (1) the world population to stabilize and (2) the per-capita consumption to stabilize.

Both of these things are actually strongly correlated with the quality of life and standards of living, and so far they have exhibited a distinct tendency to “flatten out” while improving: more developed and wealthier nations have both a more stable population (sometimes even exhibiting negative growth, if not for immigration) and a reduced (or, again, slightly negative in some cases) growth in energy consumption per capita (although different countries have settled at different rates). Developing nations, on the other hand, have an energy consumption growth that is much higher than the world average: China and India, for example, which together account for over a third of the world population, both have a primary energy consumption growth rate that is around 5% per year (doubling time: 14 years).

Note that in both my previous post and this one the only real underlying assumption is that we don't want to reduce our standards of living or quality of life. It's clear that without this assumption the exponential growth hypothesis doesn't hold, since it's quite easy to reduce energy consumption simply by no longer using energy —and thus renouncing all of the things in our life that depend on it. (This is also evident when looking at the global energy consumption over time, and how it “dips” after each recession.)

Let's take the USA today as reference for “stable” energy consumption per capita, which is about 80MWh or slightly less than 300GJ per person. (By the way, did you know that the USA is not the worst offender in terms of energy use per capita? Small nations such as Iceland and Qatar have much higher per-person energy use, currently closer to 200MWh per person, or 720GJ per person; even Norway sits slightly higher than the USA, at over 90MWh per person.)

We can expect global energy consumption to keep growing at least until the whole world reaches a similar per-capita consumption, and considering that the world average per-capita consumption is 20MWh per person, growing at a rate of slightly less than 1% annually on average (doubling time: over 70 years), this will take a century and a half if things keep going at the current rate. In fact, it will take at least 70 years even just to get to, say, German levels (around 40MWh per person per year).
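As a rough check of these time spans, here is a minimal sketch. It assumes continuous growth at exactly 1% per year (the text says "slightly less", so the real figures would be a bit longer), and the function name is just an illustrative choice.

```python
import math

def years_to_reach(current, target, rate):
    """Years for a quantity growing continuously at `rate` to go from `current` to `target`."""
    return math.log(target / current) / rate

WORLD_AVG = 20.0  # MWh per person per year (figure quoted in the text)
GERMANY = 40.0    # MWh per person per year (figure quoted in the text)
USA = 80.0        # MWh per person per year (figure quoted in the text)
RATE = 0.01       # assumption: exactly 1% per year

years_to_germany = years_to_reach(WORLD_AVG, GERMANY, RATE)
years_to_usa = years_to_reach(WORLD_AVG, USA, RATE)

print(f"to German levels: {years_to_germany:.0f} years")
print(f"to USA levels:    {years_to_usa:.0f} years")
```

One doubling (to German levels) takes roughly 70 years, and two doublings (to USA levels) roughly 140, which is where the "century and a half" comes from.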

If energy consumption per capita stabilizes, global energy consumption will only grow with population: after the ~2.1% growth rate peak reached in the '60s of the XX century, population growth rate has been on a steady decline, and is currently slightly over 1% per year, projected to drop below 1% halfway through this century —thus earlier, in fact, than the doubling time of the per-capita energy consumption.

With these two pieces of information, we can thus say that —unless something goes catastrophically wrong— the global energy consumption will keep growing at the current rate at least until the end of the XXI century. What will happen after that? According to those raising the objection, the flattening out of the population growth will only require the maintenance of the standards of living, which will require a constant (if not decreasing thanks to technological progress) amount of energy per year.

But is this actually the case?

In the following sections I will discuss two possible counter-points to the “logistic” objection, at the end of which I will draw the following conclusion: the most likely alternative to exponential growth is not stabilization, but societal collapse, i.e. a profound crisis that will lead to a drastic decrease in quality of life and standards of living for the majority of world population.

Counter-objection #1: there's no guarantee that the population will stabilize

Let's briefly recap what the Malthusian trap is. The basic idea is that, in a case where resources (e.g. food) are abundant, population grows exponentially. However, if the resources do not grow at the same rate as the population, we soon reach a point where they are not abundant anymore: there are fewer resources than the population would require, and this leads to the population collapsing (this is the “trap”), until it again drops below the level of scarcity, and the cycle begins again.

This kind of phenomenon has in fact been historically observed, both locally and globally. However, this seems to have stopped happening since the industrial revolution: since the XIX century in particular, population worldwide has instead grown at an ever-increasing pace up to the second half of the 1960s, peaking at around 2.1% per year. The growth rate has since been decreasing, dropping today to about half the peak rate, but still keeping a positive (larger than 1%, in fact) rate.

The observed trend is quite different from what could be expected from Malthus' model. The chief explanation for this has been the accelerating pace of technological progress, which has allowed the avoidance of the Malthusian trap by changing the ways resources are consumed (improving the efficiency of their consumption, accelerating the shift from one source to another as the previous one became scarcer, etc).

Avoiding the Malthusian trap has allowed a different mechanism to take over: the demographic transition from a child-mortality growth limit to an old-age growth limit. In this model, the plateau in population growth depends essentially on the improvement of living conditions that lead to lower child mortality, and a subsequent (and consequent) lowering of fertility (as a larger percentage of children reach adult age). As long as technological progress maintains the resource/population ratio high enough to avoid the Malthusian trap, this demographic transition shifts the age distribution up, as human lives approach their maximum natural extent and fewer children are born.

This plateau actually contributes to avoiding the Malthusian trap by keeping the population size below the threshold of resource exhaustion.

There's more to it, though.

Looking at the timeframe of the rapid growth in world population, it's interesting to see how the time span of increasing growth rate matches pretty well with the period of the most revolutionary scientific and technological breakthroughs.

It's possibly a sad state of affairs that since the end of the Cold War technological progress, despite advancing at an incredible pace, has not given us any world-changing breakthroughs: most of the tech we use today is more a refinement of tech that emerged between the interwar period and the end of the Cold War than something completely new and original. (Sad state of affairs because it would hint that wars, be they hot or cold, are a stronger promoter of technological progress than peace.)

In some sense, we've reached a plateau not only in population growth (in the more developed nations), but also in the —allow me the expression— “originality” of technological progress.

Now the question is: when the next significant breakthrough happens, will it come alone, or will it be associated with a renewed increase in population growth rates?

One would be led to answer negatively, since we're already reaching the maximum natural extent of human life, but it's actually quite plausible that we can expect another spike. Some possible scenarios as examples:

  1. improved medical knowledge allowing significant age extension, upping e.g. the average age of death by about 50% compared to now; this would lead to another (although smaller) demographic transition to reach the new plateau associated with the longer life expectancy;
  2. colonization of the currently uninhabited (or very sparsely inhabited) areas on the planet surface, including both deserts and oceans: again a new spike in population growth;
  3. space travel and the colonization of the inner planets (Mars and Venus at least) would lead to a massive spike in population growth (not world population only anymore, but global humanity population growth, of course), something that will go on for several centuries more.

These are just examples, of course, but each of them is quite plausible. And together with many others, possibly unthinkable at the moment, they are a hint that we are only one technological breakthrough away from delaying the population stabilization beyond what we can forecast at the moment.

And with it, of course, the associated growth in energy consumption.

Counter-objection #2: stable population and quality of life do not imply stable energy consumption

While it is true that most modern, industrial, “Western” societies have reached a largely stable population, quality of life and energy consumption, I posit that the stabilization of the energy consumption is not, in fact, due to the stabilization of the population and their standards of living. In fact, I will further argue that a stable population at our standards of living cannot be maintained without growing energy consumption. Allow me to justify the latter first, and then explain why we have the perception of a locally stable energy consumption where population and standards of living have reached our levels.

As I've already mentioned, the accelerating pace of technological progress has allowed us to avoid the Malthusian trap (so far): humanity has been able to circumvent the resource/population ratio inversion by improving resource utilization and regeneration at a faster pace than population growth. However, the cost of these improvements has always been paid in terms of energy consumption.

Increased crop yields rely on synthetic fertilizers, whose generation is more energy-intensive than natural ones, and on agricultural machinery, whose construction and use is more energy-intensive than traditional human- or animal-based alternatives. Modern distribution networks are likewise more energy-consuming to build, maintain and use than footwork or animal transportation. For raw materials, especially those that are essentially non-renewable, the trap has been avoided by shifting consumption, as they became scarcer, from the “lower-hanging fruits” to materials that are harder to find, extract, create or manipulate, and that would therefore be prohibitive at lower levels of efficiency or energy production.

It's interesting to illustrate the last part (material source transitions) with an example that will probably soon apply to energy production itself: as shown before, the EET of the current estimated uranium in known conventional sources (8 million metric tons) is only 14 years (assuming constant energy consumption growth of 2% per year, and nuclear alone being used for energy production). This means that soon uranium extraction from unconventional sources (especially the sea) will become not only convenient, but in fact the only possible way to keep meeting our energy requirements —but extracting uranium from the sea is much more expensive, energy-wise, than the conventional methods.

In essence, what the industrial revolution has allowed has been to shift the entire burden of resource management into one single resource (category): energy. This, by the way, is why energy is the only resource I've discussed in the previous post: its EET is the only one that really matters, since expiration of any other resource can be compensated by increasing energy consumption.

For example, it has been said that “water is the oil of the 21st century”: this maxim is intended to mean that (clean, drinkable) water will become so scarce in the near future that it's likely to become as pricey and crucial as oil (as primary energy source) was in the XX century. Water, after all, is an essential resource for human survival and well-being both as a primary resource (drinking) and as secondary resource (e.g. farming), and with its usage growing at an exponential rate (doubling time: around 20 years), some scientists are worried that we'll soon hit its EET.

I'm actually not worried about that happening before we hit the energy EET, because with water, like with any other resource, we can (and in fact I predict we will) expand our (clean, drinkable) water reserves by trading in more energy consumption to reduce water consumption, improve filtering and develop better ways to extract useful water from the sea or the atmosphere.

In other words, as long as we can keep producing energy, humanity is largely unaffected by the Malthusian trap of other resources (or, in yet other words, the only resource that would trigger the Malthusian trap now is energy —and it will happen, as we've discussed in the previous post in this series).

The problem with that is: having avoided the Malthusian trap so far, we're already past the Malthusian trap point, meaning we're already consuming resources faster than they can regenerate: and this means that even if population stops growing, we will soon run out of the resources we're using, and we'll need to move to other, more “energetically expensive” resources to replace them. A similar argument holds for the environment: we have triggered a vicious cycle where our standards of living destroy the environment at a rate faster than it can regenerate, and this leads to higher energy consumption to preserve inhabited areas at levels which are more comfortable for humans (open air conditioning in the desert is only the prelude), which in turn accelerates the destruction of the environment, requiring a growing energy consumption to compensate: the “best” recipe for exponential growth.

That being said, it's quite possible (but see below) that the growth rate of energy consumption then (after the world population settles in size and standards of living) will be lower than the one we are experiencing now that the population is growing, and that's a good thing. But the key point is that our current standard of living still requires exponential growth in energy consumption just to be maintained at the present level.

Why then, one may ask, are we not seeing such growth in energy consumption in nations where the population and living standards have largely stabilized?

The answer to this is that what we are observing is a local aspect of a non-local phenomenon: a large part of the energy consumption needed to maintain our standards of living has been externalized, by outsourcing much (if not most) of the manufacturing process and resource extraction to the developing nations.

In other words, the energy consumption growth rate observed in developing nations accounts not only for the growth in size and standards of living of their population, but also for the maintenance of ours —hence energy consumption growth rates of 5% or higher in the face of population growth rates of 3% or lower.

In this situation it's obviously hard to isolate the component in energy consumption growth related to internal factors from the ones related to the burden of the maintenance of “stabilized” nations, but as the developing countries approach our levels of stability and quality of life, and the outsourcing possibilities diminish, we are likely to see a new redistribution (and relocalization) of the energy consumption that will help characterize the factors. My “gut feeling” (correlating the energy consumption and population growth) is that the baseline (“maintenance-only”) energy consumption growth will remain around 2% (or marginally lower, but most likely not lower than 1%), but we'll have to wait and see.

And the conclusions?

Even though the estimation of the energy EET was not intended to be a prediction of how things will turn out, it's quite plausible that the current growth rate in energy consumption will continue long enough to get us there, unless either active action is taken to focus research on reducing the energy consumption (growth) needed to maintain our current standards of living or we end up hitting some other snag (before the energy EET) that leads to societal/civilization collapse, with the consequent drastic reduction in energy consumption.

And nuclear still won't save us.

Nuclear will not save us

Back-of-the-envelope calculations on why even nuclear won't save us, without curbing energy consumption


Humanity has been looking for alternatives to fossil fuels for over a century, but the problem has become more pressing since the 1960s, when people started to reflect on the fact that the resources would sooner or later be exhausted. It was reinforced during the 1970s energy crisis, and it has since moved to the foreground of both energy and climate discussions, due to the significant impact that burning fossil fuels has on the environment (something that even the oil companies themselves have known for at least half a century, despite their reliance on —and frequent financial support to— “climate skeptics” to deny the significant anthropogenic contribution to climate change —something that has been known (or at least suspected) for decades, and that they have finally admitted).

For a brief moment, nuclear energy was seen as the most viable alternative, but the enthusiasm behind it received a collective cold shower after the Chernobyl disaster and with the growing issue of nuclear waste management, which brought attention back to “renewables” (extracting energy from the wind, the sun or the water) —with their own sets of issues.

Nuclear power still has its fans, whose arguments mainly focus on two aspects:

  • nuclear is actually the “greenest” energy source, even compared to “renewables” (especially in the medium/long term);
  • nuclear is the only energy source that can keep up with the requirements of modern, advanced societies, especially if you cut out fossil fuels.

I'm not going to debate the first point here, but I'll instead focus on the second one. And my argument won't be to deny the efficiency of nuclear power (in fact, the opposite), but to show that despite its efficiency, even nuclear power cannot keep up, and that the real issue we need to tackle, as we've known for decades if not centuries now, is our inability to understand the exponential function.

But let's get into the meat of the discussion.

Fact #1: nuclear energy production has the highest density

This is an undeniable fact by whichever means you measure the density: it is true when you compare it with any of the renewables in terms of energy produced per square meter of occupied land, and it is true if you compare it with any fossil fuel generator in terms of energy produced per unit of mass consumed.

For example, an actual nuclear power plant at the current technological level occupies around 3km² and produces around 1GW, with an effective (surface) density of about 300W/m². By comparison, geothermal can do at best 15W/m², and solar —that can peak at less than 200W/m² on a good day (literally)— will typically do around 7W/m² (considering the Sun cycles) —and everything else is less than a blip compared to that.

In terms of energy density, gasoline and natural gas with their 45MJ/kg and 55MJ/kg respectively are clear winners among fossil fuels, but their chemical energy density is completely eclipsed by the nuclear energy density of uranium: a 1GW plant consumes less than 30 tons of uranium per year, giving us an effective energy density (at our current technological level) of more than 1000GJ/kg: more than 4 orders of magnitude higher than that of the best fossil fuels. In fact, even going by the worst possible estimates the uranium ore (from which the actual uranium used as fuel is extracted) has an effective energy density of slightly less than 80MJ/kg, which is still close to 1.5 times the theoretical maximum we can get from fossil fuels.
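The effective density figure can be reproduced with a few lines. This is only a sketch: it assumes the plant delivers its full 1GW around the clock for a whole year, and it uses the 30 tons upper bound for the fuel, so the result is a lower bound on the density.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

plant_power_w = 1e9        # 1 GW plant (from the text)
fuel_per_year_kg = 30_000  # upper bound: "less than 30 tons" per year

# Assumption: the plant runs at full power all year long.
energy_per_year_j = plant_power_w * SECONDS_PER_YEAR
uranium_density_j_per_kg = energy_per_year_j / fuel_per_year_kg

natural_gas_j_per_kg = 55e6  # 55 MJ/kg (from the text)
ratio = uranium_density_j_per_kg / natural_gas_j_per_kg

print(f"uranium: {uranium_density_j_per_kg / 1e9:.0f} GJ/kg")
print(f"ratio to natural gas: {ratio:.0f}")
```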

These data points alone could explain why so many people remain solidly convinced that nuclear power is the only viable alternative to fossil fuels, despite the economical, political and social costs of nuclear waste management.

But there's more! The attentive reader will have noticed that I've insisted on the «current technological level» qualifier. There's a reason for that: while fossil fuel as an energy source has a long and well-established history, with an associated enormous progress in the efficiency of its exploitation, the same can't be said either for most renewables or for nuclear.

For example, solar irradiance on the Earth's surface is around 1kW/m² —about 5 times what we manage to get from it in ideal conditions, and 3 times higher than the surface energy production density of a modern nuclear power plant. A technology breakthrough in solar energy production that could bring the efficiency from 20% to 80% would make solar competitive in massively irradiated regions (think: the Sahara desert).

But the same is true also for nuclear —and in fact, for nuclear, it's considerably more true: indeed, the upper bound on the amount of energy that can be produced from matter is given to us by the famous E=mc² mass–energy equivalence equation. If we could convert 1kg of mass entirely into energy, this would produce close to 90 petajoules of energy, or 90 million GJ: 90 thousand times more than what a nuclear power plant can produce today from the fuel pellets fed to it.
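For those who want to check the 90 petajoules, here is a small sketch of the mass–energy computation. The 1000GJ/kg used for today's plants is the rounded figure from above, so the resulting headroom is approximate.

```python
C = 299_792_458.0  # speed of light in m/s (exact by definition)

# E = m c^2 for m = 1 kg
energy_per_kg_j = 1.0 * C**2
energy_per_kg_pj = energy_per_kg_j / 1e15

current_density_gj = 1000.0  # GJ/kg, rounded figure from the text
upper_bound_gj = energy_per_kg_j / 1e9
headroom = upper_bound_gj / current_density_gj

print(f"1 kg of mass = {energy_per_kg_pj:.1f} PJ")
print(f"headroom over current nuclear: about {headroom:.0f}x")
```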

If we managed to improve the efficiency of nuclear energy production by a factor of 1000, we'd have an efficiency of only about 1.3%, and it would still completely eclipse any other energy generation method even if they were 100% efficient.

To say that there's room for improvements would be the understatement of the millennium. And this, too, would be an argument in favor of the adoption of nuclear power, and most importantly in investing massively in research for its improvement (especially considering that more efficient production also means less waste to worry about).

And yet, as we'll be seeing momentarily, even reaching 100% efficiency in nuclear energy extraction will not save us.

Ballpark figure #1: mass of the Earth crust.

Let's now do a quick computation of the total mass of the Earth's crust, the “thin” (on a planetary scale) layer whose surface veil is the land we tread upon.

The surface of the Earth is marginally more than S=510×10⁶ km². To estimate the total mass of the crust, let's pretend, very generously, that the crust can be assumed to be H=50 km deep everywhere (this is actually only true for the thickest parts of the continental crust), and of a constant density equal to that of the most dense igneous rocks (ρ=3500 kg/m³). Rounding up, this gives us a mass of the crust equal to S·H·ρ=9×10²² kg.

(This is quite a large overestimation, since the actual average thickness is less than half of H, and the average density is less than 3000 kg/m³, so we're talking about at best a third of the overestimation; but as we shall see, even the generous overestimation of 9×10¹⁹ metric tons will not save us.)
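The crust estimate is just a product of three numbers once the units are aligned. A minimal sketch, with S taken as 510 million km² and everything converted to SI units (the variable names are mine):

```python
surface_km2 = 510e6     # S: surface of the Earth, in km^2
depth_km = 50.0         # H: generous uniform crust thickness, in km
density_kg_m3 = 3500.0  # rho: density of the densest igneous rocks

surface_m2 = surface_km2 * 1e6  # 1 km^2 = 10^6 m^2
depth_m = depth_km * 1e3        # 1 km = 10^3 m

crust_mass_kg = surface_m2 * depth_m * density_kg_m3
print(f"crust mass: {crust_mass_kg:.2e} kg")
```

The product lands just below 9×10²² kg, hence the "rounding up".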

How much energy could we extract from the crust?

Let's play a little game. Let's pretend that we have a 100% efficient mass–energy conversion: 1kg of mass of any kind goes in, 90PJ of energy (and no waste!) comes out.

For comparison, the world's yearly primary energy consumption currently amounts to more than 170×10³ TWh —let's be generous and round it down to 600×10³ PJ.

If we had the amazing 100% mass-to-energy conversion technology, less than 7 (metric) tons of mass would be sufficient to satisfy the current energy requirements for the whole world in a year. (For comparison, a modern 1GW nuclear power plant produces 5 tons of waste per year.)

If we had this wonderfully 100% efficient technology, it would take R=1.3×10¹⁹ years, at the current energy consumption rate, to exhaust the 9×10¹⁹ (metric) tons of the Earth's crust.

(Try it from the other side: 9×10²² kg of mass producing 90 PJ/kg means 8.1×10²⁴ PJ of energy, which divided by 6×10⁵ PJ of yearly consumption give us a more accurate R=1.35×10¹⁹.)
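The same arithmetic as a short sketch, using the rounded figures quoted above (90PJ/kg, 600 thousand PJ per year, and the overestimated crust mass); the variable names are illustrative.

```python
TWH_TO_PJ = 3.6  # 1 TWh = 3.6 PJ

consumption_twh = 170e3
consumption_pj = consumption_twh * TWH_TO_PJ  # a bit over 600 thousand PJ
rounded_consumption_pj = 600e3                # the rounded figure used in the text

energy_per_kg_pj = 90.0  # 100% efficient mass-to-energy conversion
crust_mass_kg = 9e22     # overestimated mass of the crust

# Mass needed to cover one year of world consumption
mass_per_year_kg = rounded_consumption_pj / energy_per_kg_pj

# Years the crust would last at constant consumption
R = crust_mass_kg * energy_per_kg_pj / rounded_consumption_pj

print(f"mass per year: {mass_per_year_kg / 1000:.2f} metric tons")
print(f"R = {R:.3e} years")
```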

Needless to say, we wouldn't need to worry about wasting energy ever again, considering the sun will run out long before that (estimated: 5×10⁹ years).

Or would we?

Enter the exponential function

Looking again at the world's energy consumption, we can notice that it has been growing at an almost constant rate (a ballpark estimation from the plot gives us a rate of about 2% or 3% per year, corresponding to a doubling time of about 25 to 35 years) —that is, exponentially.
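The relation between growth rate and doubling time is easy to check. This sketch assumes continuous growth (doubling time = ln 2 divided by the rate), which is an approximation of yearly compounding that is good enough at these rates.

```python
import math

def doubling_time(rate):
    """Years needed to double at a continuous yearly growth `rate` (e.g. 0.02 for 2%)."""
    return math.log(2) / rate

for rate in (0.02, 0.03):
    print(f"{rate:.0%} per year -> doubling every {doubling_time(rate):.1f} years")
```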

And a widespread idea among supporters of nuclear energy is that with nuclear there's no need to change that: nuclear energy is the solution, after all, and given how much it can give us now, and how much potential it still has, there's no need to limit how much energy we use.

The math, however, says otherwise. Since the energy consumption will grow over time, the previously computed ratio R=1.3×10¹⁹ does not tell us anymore the number of years before the crust is consumed —to determine that, we rather need to check how many doublings will fit in that ratio, which we can approximate by log₂(R) —and that's less than 64 doublings: at the current growth rate, that means something between 1500 and 2000 years.

For a more detailed computation, we can apply the “Exponential Expiration Time” formula, found for example in Bartlett's work: the EET in our case is ln(k×1.35×10¹⁹+1)/k, where k is the yearly growth rate, which gives us 1351 years for a 3% growth rate, and 2007 years at a 2% growth rate.
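Here is a minimal sketch of the EET computation, for anyone who wants to play with the numbers. The function name `eet` is my own; the formula is the one quoted above, with R being the number of years the reserve would last at today's consumption.

```python
import math

def eet(R, k):
    """Exponential expiration time, in years.

    If consumption grows as exp(k*t), the total consumed after T years is
    (exp(k*T) - 1) / k times today's yearly consumption. Setting that equal
    to R and solving for T gives T = ln(k*R + 1) / k.
    """
    return math.log(k * R + 1) / k

R_CRUST = 1.35e19  # years at constant consumption, from the crust estimate

doublings = math.log2(R_CRUST)
eet_3 = eet(R_CRUST, 0.03)
eet_2 = eet(R_CRUST, 0.02)

print(f"doublings that fit in R: {doublings:.1f}")
print(f"EET at 3%: {eet_3:.0f} years")
print(f"EET at 2%: {eet_2:.0f} years")
```

Note how R only enters through a logarithm: this is why even absurdly large reserves translate into a few thousand years at most.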

This deserves repeating: at the current rate at which energy consumption grows, the entire crust of our planet would run out in at most 2000 years, in the best-case scenario that we manage to find a 100% efficient mass to energy conversion method within the next decade.

Be more realistic

The actual timespan we can expect is in fact much lower than that.

For example, we're nowhere close to being 100% efficient in mass to energy conversion: in fact, you'll recall that even if we manage to improve our efficiency by a thousandfold, we'll only be barely more than 1% efficient —meaning that even the two-orders-of-magnitude-lower R=1.35×10¹⁷ is still an extremely generous estimate.

But there's more: the mass of the Earth crust is likely one third of that of our gross overestimation, bringing R down to around R=4.5×10¹⁶. But what's worse, the amount of uranium in the crust is currently estimated to be only about 4 parts in a million, which would bring R further down to about R=1.8×10¹¹.

To wit, that would give us between 747 and 1100 years before we run out of fuel, assuming we managed to extract all of the uranium and convert it to energy with a 1% efficiency, which is a thousand times better than what we can do now.
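The chain of reductions and the resulting time spans can be sketched as follows. The `eet` helper is my own name for the expiration time formula ln(k·R+1)/k, and the three factors are the rounded ones used in the text.

```python
import math

def eet(R, k):
    """Exponential expiration time ln(k*R + 1) / k, in years."""
    return math.log(k * R + 1) / k

r_uranium = 1.35e19  # whole (overestimated) crust, 100% efficient conversion
r_uranium *= 0.01    # only 1% efficient
r_uranium /= 3       # the crust is about a third of the overestimate
r_uranium *= 4e-6    # uranium is about 4 parts per million of the crust

eet_uranium_3 = eet(r_uranium, 0.03)
eet_uranium_2 = eet(r_uranium, 0.02)

print(f"R = {r_uranium:.2e} years at constant consumption")
print(f"EET at 3%: {eet_uranium_3:.0f} years")
print(f"EET at 2%: {eet_uranium_2:.0f} years")
```

Eight orders of magnitude shaved off R cost us less than half of the expiration time: the logarithm at work again.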

I'll take this opportunity to clarify something important about the exponential function —with an example.

At our current tech level, we would have R=2.34×10⁸ —all the uranium would be gone in 525 to 768 years. For thorium, which is around 3 times more abundant, the estimate is 562 to 822 years. Now ask yourself: what if we use both? Surely that means over a thousand years (525+562), possibly closer to 2000 (768+822)?


That's not how the exponential function works.

If energy consumption keeps growing at this steady 2-3% rate, thorium and uranium combined would only last 571 to 837 years: switching to thorium after depleting all the uranium would only add around 50 to 80 years.
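To see why the expiration times don't add up, here is a sketch that combines the two reserves. It assumes thorium is worth exactly 3 times the uranium R (the text only says "around 3 times"), so the figures can differ by a year or so from the ones above; `eet` is again my own name for ln(k·R+1)/k.

```python
import math

def eet(R, k):
    """Exponential expiration time ln(k*R + 1) / k, in years."""
    return math.log(k * R + 1) / k

R_URANIUM = 2.34e8           # years at constant consumption, current tech level
R_THORIUM = 3 * R_URANIUM    # assumption: exactly 3 times more abundant

results = {}
for k in (0.03, 0.02):
    uranium = eet(R_URANIUM, k)
    thorium = eet(R_THORIUM, k)
    # What adds up is the reserves (R), not the expiration times.
    both = eet(R_URANIUM + R_THORIUM, k)
    results[k] = (uranium, thorium, both)
    print(f"k={k:.0%}: uranium {uranium:.0f}, thorium {thorium:.0f}, "
          f"both {both:.0f} (not {uranium + thorium:.0f})")
```

Quadrupling the reserve only adds ln(4)/k years or so, which is a matter of decades at these growth rates.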

Can it get worse?

It should be clear from even the most optimistic numbers seen so far that nuclear energy by itself is not sustainable in the long term: even if we switched entirely to nuclear power and found a breakthrough in the next decade or so that would bring the efficiency up by a thousand times, we wouldn't last more than a few centuries before running out of energy, unless something is done to stop the exponential growth in energy consumption.

But it gets worse. I'm not particularly optimistic about humanity's wisdom. In fact, in my experience, the more a resource is abundant, the faster its consumption grows. And this goes for energy too.

In my mind, the biggest threat posed by nuclear power isn't even the risk posed by the mismanagement of the plants or of the still-radioactive waste. The biggest threat posed by nuclear power is the “yahoo, practically infinite energy in our hands!” attitude of its supporters, which is quite likely to lead to energy consumption growing at an even higher rate than the current one, if we ever switch to nuclear on a more extensive scale.

And with an increased growth rate, we'll run out of energy much, much earlier: at a 7% growth rate in energy consumption (doubling time: 10 years), all the estimated uranium in the crust would be gone in 237 years at our current tech level, or 332 years assuming we get the 1% efficiency breakthrough now; and the entire crust would be depleted in 591 years assuming 100% efficient mass-to-energy conversion from any material.

And no, there is no “we'll find something better in the meantime”, because there's nothing better than 100% efficient mass-to-energy conversion. Even harnessing the mass of other celestial bodies won't do more than extend the expiration time by another few hundred years, maybe a couple of millennia at best: at a growth rate of 2% and 100% conversion efficiency, the entire planet of Mars would last us for no more than 2105 years —and remember, that's not in addition to the depletion of the crust of our planet: in fact, adding the overestimated mass of Earth's crust to the mass of Mars would extend the expiration time by less than 7 years.

The entire mass of all the celestial bodies in the solar system would last around 2500 years. If we add in the Sun (which means, essentially, just the mass of the Sun, actually), we would still run out in 2852 years, at a 2% growth rate and 100% efficiency.

(Wait a second, I'll hear somebody say: how come the Sun will last for billions of years still, but if we converted all of its mass into energy using our 100% efficient mechanism it won't even last 3000 years? And the answer, my friend, is again the exponential function: the Sun produces energy at a (more or less) constant rate, but we're talking about how quickly it will be depleted at a growing rate. Does that help put things in perspective? No? How about the entire Milky Way would last less than 43 centuries?)

So yes, there is no “we'll find something better”, not at the current growth rate.

The only sustainable option is reducing the growth rate of the total energy consumption.

Degrowth is the answer

Now, with this title I'm not proposing degrowth as the solution, I'm simply stating a fact: degrowth will happen, regardless of whether humanity chooses voluntarily to go down that path or not. The only difference is how it will happen. But it will happen. Because if we don't wisen up and curb our own growth, we will run out of resources, and at the current growth rate that will happen at best in a few centuries, with or without nuclear power: and when it does happen (not if, but when), we will have sudden, drastic, forceful degrowth imposed on us by the lack of resources (most importantly, energy).

We're running towards an unbreakable wall. There is no other option but deceleration, and that's because deceleration will happen, whether we want it or not. Our only choice is between slowing down gracefully, and stopping before we hit the wall, or experiencing the sudden, instantaneous and painful deceleration that will happen the moment we hit that wall.

And now for the “good” news

Slowing down the growth rate is an extremely effective way to extend the EET. Let's have a look at this from our worst-case scenario: at the current technological level, and 3% growth rate, all of the estimated uranium and thorium in Earth's crust will be depleted in 571 years, but with a 2% growth rate it would last 837 years.

Dropping the growth rate to 1%, they would last 1605 years —which is more or less the EET for the entire crust at 100% efficient conversion with a 2.5% growth rate.

Going even lower, to 0.5% growth rate, they would last over 3000 years —more than it would take to deplete the Sun with 100% efficient conversion and a 2% growth rate.
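The sensitivity to the growth rate can be sketched with a simple model. This is a minimal sketch and it rests on assumptions not spelled out in the text: consumption grows continuously at a yearly rate r, and the reserve is expressed as R/P0, i.e. how many years it would last at today's consumption. The function names `depletion_time` and `doubling_time` are hypothetical. Under those assumptions, the cumulative consumption P0·(exp(r·T) − 1)/r reaches R at T = ln(1 + r·R/P0)/r.

```cpp
#include <cassert>
#include <cmath>

// Years until a finite reserve runs out when consumption grows exponentially.
// reserve_years: the reserve measured in years of consumption at today's
//                rate (R/P0 in the formula above; a hypothetical input).
// growth_rate:   yearly growth rate of consumption, e.g. 0.02 for 2%.
// Continuous-growth assumption: P0*(exp(r*T) - 1)/r = R,
// hence T = log(1 + r*R/P0)/r.
double depletion_time(double reserve_years, double growth_rate)
{
    assert(reserve_years >= 0.0 && growth_rate >= 0.0);
    if (growth_rate == 0.0)
        return reserve_years; // constant consumption: lasts exactly R/P0 years
    return std::log1p(growth_rate * reserve_years) / growth_rate;
}

// Time needed for consumption to double at a given continuous growth rate.
double doubling_time(double growth_rate)
{
    assert(growth_rate > 0.0);
    return std::log(2.0) / growth_rate;
}
```

With an illustrative reserve worth a billion years of today's consumption (a made-up figure, not one taken from the estimates above), `depletion_time(1e9, 0.02)` comes out at roughly 840 years and `depletion_time(1e9, 0.01)` at roughly 1610 years: because the reserve only enters through a logarithm, halving the growth rate nearly doubles the time left, while multiplying the reserve by a thousand adds comparatively little.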

Increasing the adoption and efficiency of nuclear power generation can buy us maybe a few centuries.
Decreasing the growth rate can buy us millennia.

Where would you invest with these odds?

(See also the next article on the topic for additional details and comments on the plausibility of a continuing exponential growth.)

Getting ready for 2078

Taking advantage of IRC drama to change IRC client

There has recently been quite some drama on IRC: the largest IRC network dedicated to free/libre and open source software (FLOSS), Freenode, has been taken over by a fraudulent “entrepreneur”, causing the entire (volunteer!) staff that had operated the network for decades to just quit en masse to create an alternative to Freenode, named Libera.

Most communities and projects that previously relied on Freenode have now started a migration process to move to the newly established Libera or to the pre-existing OFTC networks, leaving Freenode with “skeleton” channels and users —so much so that the new Freenode administration has made changes to the Terms of Service to basically allow, if not straightforwardly encourage, hostile takeovers of “inactive” community channels.

Drama aside, I've taken the switchover from Freenode to Libera as an opportunity to do some long-needed cleanup of my IRC networks and channel list —but not just that.

In a famous XKCD strip, “Team Chat”, Randall Munroe pokes fun at the surprising persistence of IRC as a communication platform: from the “old days” in which it was the protocol for both real-time and asynchronous communication, to the current times, where every major innovation in “instant messaging” has to allow some kind of bridge to IRC, to a hypothetical future where all human consciousness has merged, except for that single individual that still uses IRC to interface and communicate with others. The alt-text of the comic reveals an even more distant future, where finally some progress is made … in a fashion:

2078: He announces that he's finally making the jump from screen+irssi to tmux+weechat.

This is quite the nerdy joke (as is frequent with XKCD).

For the uninitiated, screen is a terminal multiplexer, i.e. a program that allows you to control multiple terminals from a single one. One of the major features of terminal multiplexers is that they are “resistant” to disconnections: if your connection fails while you're using the multiplexer, you can reattach to the previous session when the connection comes back up, allowing you to continue working with nothing worse than some wasted time. This particular feature makes it a very convenient “wrapper” in conjunction with an IRC client: you run the client from within a multiplexer session running on some server, and this allows you to reconnect to it from anywhere and never lose track of your IRC conversations.

The joke is that screen is “a bit long in the tooth”, and there are more modern and feature-rich terminal multiplexers around, tmux being the most common one. Similarly, irssi is now considered by many a bit “stale” and underdeveloped compared to other IRC clients such as weechat. Still, most people have a tendency to stick to the tools they're used to (“if it's not broken, don't fix it”), so that switching to a more modern multiplexer and IRC client combo would be considered “more effort than it's worth”: it would take some very strong selling point of the new combo to convince them to invest time and active brain power in the switchover.

(It would be so much simpler if there were ways to convert one's configuration from one tool to the other, but not only is this not always possible, it's also such a low-priority feature for most developers that it's rarely done even when possible.)

In my case, I have long abandoned screen for tmux, not only to host my “permanent” connections to IRC, but in general for all my terminal multiplexing needs. (Why? That would be a long story, but the short of it is that I find the level of control and (VIM-like) command syntax of tmux sufficiently superior to their screen counterparts to justify the switch; finding a documented tmux configuration that eased the transition also helped a lot.)

So for a long time I was in a sort of hybrid (XKCD-wise) situation, using the venerable irssi as my IRC client, but within tmux. And with the Freenode/Libera drama, I've had the opportunity to revisit the relevant XKCD comic, and finally give weechat a try.

I'm sold. I've now completed the transition to tmux+weechat, and thus consider myself ready for 2078.

(If you're curious about the reason why: weechat's selling point for me was their relay feature, that allows connection to a running weechat instance from e.g. the Android app in a more practical way than going through something like ConnectBot or its specialized cousin to connect to the IRC client running in a terminal multiplexer via SSH —because let's be honest here, Android as an operating system, and the devices on which it runs, aren't really designed for this kind of usage, usually.)

XPS 15 7590: the worst computer I've ever had

An excellent laptop on paper, ruined by a catastrophically bad implementation

In September 2019 I got a Dell XPS 15 7590, a powerful 15.6" laptop, to replace my previous Dell XPS 15 9570 from 2013 that, after 6 years of honorable service and several maintenance interventions (which, given the unusual stress I put my laptops under, was not unusual), was getting a bit too long in the tooth.

I have generally been quite satisfied with my Dell laptops (all the way back to the first one I owned, a venerable Dell Inspiron 8100 with an out-of-this-world 15" 1600×1200 UXGA high-density display, that I've mentioned before), with which I've had generally better luck than with laptops from other vendors.

This is however not the case with the one I'm currently using. In fact, my experience with this laptop has been so bad that I have no qualms in claiming that this is the worst computer I've ever had. (And I've had some pretty poor experiences, including a laptop HDD failing exactly one week after the warranty expiration, and working for two years with the shittiest display ever attached to a laptop.)

In fact, what makes the XPS 15 7590 situation particularly crappy is not only that it's a badly designed piece of hardware with components of debatable reliability (as I'm going to discuss momentarily), but also that, at least on paper, it's supposed to be a high-end powerhouse, starring an Intel Core i7-9750H 6-core/12-thread CPU running at 2.6GHz and an NVIDIA GeForce GTX 1650 3D accelerator, with a high-capacity battery to provide the user with several hours of gaming/officing/video streaming.

(Narrator: «It doesn't»)

There are so many things that went wrong in the materialization of this hardware that I'm not even sure where I should start listing them from.

First of all, I should probably mention that the power requirements of the laptop are enormous: you need a 130W power source to be able to use it while charging, and even with the battery configured for slow charging the power is still barely sufficient. I also have a strong suspicion that the distribution of the power within the system is far from reliable, due to at least two different symptoms: monitor flickering when switching from/to battery/AC, and the laptop simply shutting down when turning the discrete GPU / 3D accelerator on while on battery.

To make things worse, the power connector in the system is dramatically loose, leading to unpleasant situations where finding the correct angle/depth/tension to make the laptop even just sense that the power cord is inserted becomes a ridiculous game of contortionism, or where you find out the connector had come disconnected by the laptop nearly dying under your hands (or suddenly shutting off because you switched on the accelerator, as mentioned before).

It doesn't end here, obviously: a strong contributor to the power issues of this model is the horribly inadequate cooling system. The system runs at over 60°C even under light load, with the fans having trouble keeping the temperature low enough under heavy load, leading to frequent throttling of both the CPU and the 3D accelerator, and a consequent massive reduction in performance (videos stuttering, gaming with FPS dropping into the single digits, long compilation times, near impossibility of doing any serious benchmark of my HPC code).

And of course, with the combination of higher-than-expected power requirements and lower-than-expected cooling capabilities, the battery has never lasted as long as advertised (maybe half of that, out of the box).

The rest of the hardware isn't much better: it took several firmware updates to get the WiFi working reliably, Bluetooth connections still randomly die without apparent cause, and the touchpad has issues recovering from sleep mode. This last issue is particularly frustrating because it's not even easy to circumvent: when the touchpad is borked the touchscreen doesn't work either, and even external mice become unreliable due to the touchpad still firing up random events. (Apparently, a workaround is to keep the left touchpad button pressed for a few seconds: this can help reset the device, or at least clear the queue or whatever else is causing the malfunction.)

Now, before anybody comes up and mentions that I might just have been unlucky and drawn the short stick, getting myself a defective laptop: nope. These are structural issues, reported by several users, and not even related to the operating system. (As a Linux user, I'm used to hardware issues related to poor testing with that operating system, and in fact I was half convinced that e.g. the touchpad issue might be Linux-related; but no, Windows users have the exact same issues, so it's something in hardware.)

In fact Dell even recently (January 2021) released a new BIOS version that tries to address several of the issues I mentioned, and while it does improve some of them up to a point, it's still not enough to completely fix most of them (e.g. the power cord detection is improved, but it's still extremely volatile, especially when the laptop has been on for several days; moreover, the touchpad still has issues when getting out of sleep mode). But at least the laptop does run cooler now (between 50 and 60 degrees Celsius with a light load) most of the time.

Now, as I've said before, I've had some pretty poor experiences with laptops. Indeed, until I got this one, I would have said that the worst I've ever had was the one before the previous one: the one whose HDD died right after the warranty expiration, which was also the one with the, shall we say, less than stellar display; and flimsy plastic finishes; and several other small annoyances. Yet despite the traumatic experience of the HDD death (a one-off issue against which every wise person should be adequately prepared), the previous holder of the “worst laptop I've ever owned” title had only minor annoyances counting against it. Also, I came to it from the wonderful über-bright matte UXGA display of my Inspiron 8100, which might have heavily biased me against its display.

But no, the XPS 15 7590 isn't like that. It's really bad.

Mind you, on paper it's really a wonderful laptop. The 4K display is also crystal clear (when it's not flickering due to the power distribution issues) and it's even a touch screen, if you choose that configuration[1]. The keyboard is backlit, and as laptop keyboards go, it's a pretty nice keyboard, except that sometimes it seems to eat up characters (but again this might be an operating system issue, although I've generally seen these issues when the touchpad isn't working either). The touchpad is large and comfortable to use, including support for multi-touch gestures, when it actually works. The number and type of connectors, while not exceptional, is pretty adequate, especially paired with the USB-C adapter with VGA, HDMI, Ethernet and USB-A 3.0 connectors. The CPU and 3D accelerator (discrete GPU) are high-end, top-of-the-line offerings (for the release date of this model); too bad you don't really get to exploit them at their full power for long, due to the thermal and power issues. The 32GB of RAM and 1TB of NVMe storage are also a very nice touch, and possibly the only thing that hasn't given me any significant issues … yet.

In the end, as I already mentioned, the biggest let-down is that what you're left with after all the issues are taken out is nowhere near what it was supposed to be, which, for the hefty price the product carries, is simply unacceptable.

I mean, if I pay 200€ for a laptop (I did, in fact, buy one such thing for my mother, who was quite strict on the upper bound we were allowed to spend for her present) I don't expect much from it, other than the bare minimum. And in fact, with all its downsides and limitations, that laptop was exactly what we expected it to be, and even managed to last way longer than we had envisioned, with minimal maintenance (although to be fair we did expand the RAM and we did replace the internal hard disk with an SSD). That's fine: I'm not buying a Ferrari, I don't expect a Ferrari.

But when I do buy a Ferrari Enzo, I most definitely don't expect to find myself using something that, on a good day, may at best resemble an Opel Tigra with the pretense of being an Enzo.


The single biggest (for me) issue is that the power connector is loose and will frequently drop the laptop out of charge.

A close second is the horrible thermals, and the consequent CPU and GPU throttling.

The inability of the touchpad to reliably come out of sleep is a distant third (at least inasmuch as it can be worked around in ways that the other two issues cannot).

  1. I've had usage of the touchscreen lead to hard lock-ups for the system, but I'm quite sure this is an operating system/driver issue, and not a hardware one; I can't be 100% sure though, because it's an issue which is neither easy to reproduce nor easy to debug.

(How to) avoid division by zero (in C)

Leveraging boolean operators to avoid divisions by zero without conditional expressions.

Let's say we're collecting some data, and we want to compute an average of the values. Or we computed the absolute error, and we want the relative error. This requires the division of some number (e.g. the sum of the values, or the absolute error) by some other number (e.g. the number of values, the reference value).

Catastrophe arises when the number we want to divide by is 0: if the list of values we want to average is empty, for example, we would end up with an expression such as 0/0 (undefined).

Programmatically, we would like to avoid such corner cases with as little hassle as possible. The standard way to handle these cases is by using conditional expressions: if the value we want to divide by is zero, do something special, otherwise do the division we're actually interested in.

This can be cumbersome.

In what follows, we'll assume that the special handling of the zero division case would be to return the numerator unchanged: we want r = a/b if b is non-zero, otherwise r = a will do. In (C) code, this could be written:

if (b != 0)
    r = a/b;
else
    r = a;

We can write this more succinctly using the ternary operator:

r = b != 0 ? a/b : a;

or, leveraging the fact that any non-zero value is “true”:

r = b ? a/b : a;

I'll leave it to the reader to decide if this expression is more readable or not, but the fundamental issue remains that this kind of conditional handling is still not nice. Worse, if this is done in a loop (e.g. to convert a set of absolute errors into a set of relative errors, dividing each by the corresponding, potentially null, reference value) it can even produce sub-optimal code on modern machines with vector capabilities: since the expression for the two sides is different, and there is no way to know (until the program is running) which elements will follow which path, the compiler may have to produce scalar code instead of potentially much faster vectorized code.

Ideally, we would want to have the same operation done on both sides of the conditional. This can, in fact, be achieved by remarking that a is the same as a/1. We can thus write:

r = a/(b ? b : 1);

The advantage of this expression is that, as the body of a loop, it leads to better vectorization opportunities, delegating the conditional to the construction of the divisor.

But we can do better! There's a nifty trick we can employ (at least in C), leveraging the fact that the boolean negation of any non-zero value is 0, and the boolean negation of 0 is 1. The trick is:

r = a/(b + !b);

Why does this work?

If b == 0, then !b == 1, and b + !b == 0 + 1 == 1.

If b != 0, then !b == 0, and b + !b == b + 0 == b.

The result of b + !b is thus exactly the same as b ? b : 1, without using conditionals.
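As a minimal sketch of how the trick looks as a loop body, here are the two use cases from the introduction (relative errors and averages). The function names `relative_errors` and `average` are hypothetical, and the snippet is written to compile as C++, although the expression itself is the same in C.

```cpp
#include <cassert>
#include <cstddef>

// Convert absolute errors into relative errors, in place.
// Where the reference value is zero the absolute error is left unchanged,
// because ref[i] + !ref[i] evaluates to 1 in that case.
void relative_errors(float *err, const float *ref, std::size_t n)
{
    assert(n == 0 || (err && ref));
    for (std::size_t i = 0; i < n; ++i)
        err[i] /= ref[i] + !ref[i];
}

// Average of n values; an empty list yields 0/1 == 0 instead of 0/0.
double average(const double *values, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += values[i];
    return sum / (n + !n);
}
```

Every iteration performs the same addition and the same division, with no branch in the loop body.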

Addendum (OpenCL C and vector types)

The trick above doesn't work if a, b are vector types, at least in OpenCL C, since the specification in this case requires that the component-wise negation of 0 is -1 rather than 1. So, for vector types, the trick becomes:

r = a/(b - !b);

to correct for the difference in sign.

Other programming languages

The trick extends trivially to any programming language that can seamlessly cast between numerical and logical values. For example, in MATLAB, Octave or Scilab one would use:

r = a./(b + ~b)

for the same purpose (notice the use of ./ rather than / to allow component-wise division between equi-dimensional vectors or matrices), and in Python:

r = a/(b + (not b))

Other languages may need explicit casting. For example, the expression in Mathematica would be:

r = a/(b + Boole[b == 0])

using the Boole function introduced in version 5.1, and in FORTRAN you would need something even uglier such as

r = a/(b + MERGE(1, 0, b == 0))

(and a recent enough version of the standard where MERGE is defined; I believe this was introduced with F90), which is just as ugly as the C version with the ternary operator.

A surprising practical gadget: the finger mouse


A friend of mine has been doing a weekly reading on YouTube for a while now. Sometimes you can clearly see him holding a computer mouse in his hands, whose only purpose is to scroll the reading material.

Myself, I'm a big fan of webcomics, and find myself frequently reading material that is published online in long strip format, where each chapter or episode is a single continuous vertical strip. This format is geared towards “mobile” usage, designed to be viewed on a display in “portrait” orientation, but if you're willing to risk it on your laptop (and don't want to spend the money to get a convertible one that can be transformed into a tablet), you can simply flip the laptop on its side, reading it like a book. The worst downside I've found to this configuration is, possibly surprisingly, the input mechanism.

The solution to my long strip webcomic reading issues and my friend's reading is the same: something that allows scrolling documents on the computer without the full encumbrance of a traditional mouse.

Enter the finger mouse

The finger mouse, or ring mouse, is an input device that is tied to a finger and typically operated with the other fingers (usually the thumb) of the same hand.

There are at least four forms of finger mice that I've seen, which chiefly differ by how motion is handled: the trackball, the “nub”, the gyro and the optical.

Trackball finger mice follow the same mechanism as traditional desktop trackballs, and thus the reverse of the old-style mice with balls: you roll the ball with the thumb, and the motion of the ball is converted into planar motions (combinations of left/right and up/down).

Nub finger mice follow the same mechanism as the TrackPoint™ or pointing stick found on some laptop keyboards (most famously IBM/Lenovo): the nub is pushed around with the thumb, and again this converts to planar motions.

Gyroscopic mice use an internal gyroscope to convert hand motions into planar motions. This has the advantage of freeing up some real estate on the rest of the device for more buttons.

Finally, optical finger mice work exactly like the usual modern mice, with a laser and optical sensor, the only difference being that instead of holding them with your hands, the pointing device is tied to a finger.

The search (and the finding)

While researching finger mice options (as a present for my friend and obviously for me), I've been held back by two things: pricing and size.

Size was a particularly surprising issue: most of the finger mice options I've seen appear to be unwieldy, some even resembling more dashboards that would require a full hand (other than the one holding it) to operate, than a practical single-handed input device.

Price was no joke either: with more modest pricing ranging between 25€ and 50€, and some options breaking the 100€ barrier or even approaching 200€, one would be led to ask: who is the intended target for these devices? Most definitely not amateurs like me or my friend, but I would be hard pressed to find a justification even at the professional level, except maybe for the lower-cost options if you spend your life doing presentations.

Ultimately, I did find a palatable solution in this (knock-off?) product: it has everything I wanted (i.e. an easily accessible scrollwheel) and the price (around 10€ plus shipping) was low enough to cover the worst case scenario. And this is its review.


First, the good news. I'm extremely favorably impressed by the device. It works, it does what I wanted it for, and it's in fact an exceptionally practical device. I mean, I'm not going to say it's good enough for gaming, but I did use it exactly for that too, in the end.

I'm not a pro gamer, most of the games I play are not particularly challenging and I'm generally not a fan of stuff that requires quick reflexes, and perfect timing. But, I do play puzzle-platform games and sometimes you do need pretty good control and timing for them. And I was able to achieve both with this device —definitely much more so with it than with my laptop's touchpad.

To wit, a couple of years ago I had abandoned The Swapper shortly after starting it, because I came across a puzzle that had an obvious solution that I was unable to complete on my trackpad. Shortly after getting the new finger mouse, and using it to my enjoyment as no more than a scrollwheel for my weekly dose of long-strip webcomics, I decided to give it a go: let's see if we can finish that stupid puzzle; what's the worst that can happen?

In this case, the worst that happened was that I did manage to solve the puzzle, and many other puzzles after it, all while lying in bed with the laptop on my stomach, a hand on the keyboard (WASD) and the other, with the finger mouse, lying relaxed on the bed sheets. Until 2:30am.

So yes, it's accurate enough at least for casual gaming (I've also replayed Portal, and finally started Lugaru, which was unplayable on the touchpad) and what's more it works on surfaces where a standard mouse would have issues working, such as bed sheets and covers or the shirt or T-shirt you're wearing. Or the palmrests of your laptop, if you don't want to look too weird (but in that case you're not the kind of person that flips the laptop on its side to use it in portrait mode, so you have one less reason to enjoy this gadget).

The device runs on battery, with a single AAA battery. It has a physical switch to turn the power on and off, and from what I understand it goes into low power mode while not being in use too. And of course you can use rechargeable batteries in it without issues (it's what I'm using).

And it works out of the box (at least on my machine, running Linux).


The device isn't perfect.

It's wireless, which while practical may be an issue for security-conscious people (and possibly health fanatics too).

It does require a surface for use as a mouse (but of course not if you only care about the scrollwheel, which is my case for the most part), but it's not that big of an issue since, as mentioned, I've been able to use it even on surfaces where even standard optical mice are notoriously problematic (there are, however, surfaces on top of which the finger mouse has issues too).

It can take a bit to get used to it, and it feels weird. The most comfortable way to use it is to tie it to the outside of the middle finger, resting the index finger on top of it, and leaving the thumb to control the buttons and scrollwheel. It's not particularly heavy, but not exceptionally light either (yet I suspect a large part of the weight actually comes from the battery, so if you can find an extra-light battery, that might fix the issue for you). I got used to it and it doesn't annoy me in the least, but I've read reviews of people that find it too weird, so this is most definitely subjective.

It ties to the finger with a strap; this allows freedom to regulate the tightness, but it may be difficult to find the optimal one: too tight, and the diminished circulation can make your finger go numb; not tight enough, and the wiggling will chafe your skin.

It's designed to be used with the right hand. This isn't a big problem for me, since I've always used mice with my right hand even though I'm left-handed, but it might be an issue for other people. It can be used with the left hand, and the most practical way I've found for that is to tie it to the inside of the middle finger (so it's inside your hand, more similar to classic mice), but you'll need to flip the axis directions (both horizontal and vertical, and possibly the buttons too) unless you use it on your stomach.


The specific product I bought for myself is already not available anymore on the Amazon page, but several other similarly-priced variants are there. The product I have identifies with USB ID 062A:4010, registered to MosArt for a wireless Keyboard/Mouse combo (even though in this case there's only a mouse), and I've seen the same product ID used in several cheapo-brand mice and keyboard/mouse combos (Trust, RadioShack, etc). Products similar to mine, always from no-name brands and at similar (around 10€, sometimes less) prices, can also be found on both Amazon and other e-commerce sites. I don't know how closely they match the product I've reviewed (aside from the branding), but given that my package flew in almost directly from the factory in China, I'm going out on a limb and guessing that for the most part they're all the same thing.

Ah, you want pictures too? There's a couple on my Twitter.

Por una subraya

Days of work lost because of an underscore

(I'm told guion bajo is the preferred name for the underscore sign _ in Castilian, but that would have made it harder to echo Por una cabeza. Then again, why the Spanish title? Because.)

(Also, this is going to be a very boring post, because it's mostly just a rant to let off some steam after a frustrating debug session.)

I'm getting into the bad habit of not trusting the compiler, especially when it comes to a specific compiler[1]. I'm not sure if there's a particular reason for that, other than (possibly) a particular dislike for its closed nature, or past unpleasant experiences in trying to make it work with the more recent versions of the host compiler(s).

Compilers have progressed enormously in recent years. I have a strong suspicion that this has been by and large thanks to the (re)surgence of the Clang/LLVM family, and the strong pressure it has put the GCC developers under, with the consequent significant improvements on both sides.

However, compilers that need to somehow interact with these compilers (most famously the nvcc compiler developed by NVIDIA for CUDA) have a tendency to lag behind: you can't always use the latest version of GCC (or Clang for that matter) with them, and they themselves do not provide many of the benefits that developers have come to expect from modern compilers, especially in the fields of error and warning message quality and detail, or even in the nature of those same warnings and errors.

This rant is born out of a stressful and frustrating debugging session that has lasted for a few days, and that could have easily been avoided with better tools. What made the bug particularly frustrating was that it seemed to trigger or disappear in the most incoherent of circumstances. Adding some conditional code (even code that would never run) or moving code around in supposedly idempotent transformations would be enough to make it appear, or disappear again, until the program was recompiled.

The most frustrating part was that, when the code seemed to work, it would seem to work correctly (or at least give credible results). When it seemed to not work, it would simply produce invalid values from thin air.

The symptoms, for anyone with some experience in the field, would be obvious: reading from uninitialized memory, even if for some magic reason it seemed to work (when it worked) despite the massively parallel nature of the code and the hundreds of thousands of cycles it ran for.

The code in question is something like this:

struct A : B, C, D
{
    float4 relPos;
    float r;
    float mass;
    float f;
    /* etc */

    A(params_t const& params, pdata_t const& pdata,
      const int index_, float4 const& relPos_, const float r_) :
        B(index_, params),
        C(index_, pdata, params),
        D(r, params),
        relPos(relPos_),
        r(r_),
        /* etc */
        f(func(r, params))
    {}
};

Can you spot what's wrong with the code?

Spoiler Alert!

Here's the correct version of the code:

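struct A : B, C, D
{
    float4 relPos;
    float r;
    float mass;
    float f;
    /* etc */

    A(params_t const& params, pdata_t const& pdata,
      const int index_, float4 const& relPos_, const float r_) :
        B(index_, params),
        C(index_, pdata, params),
        D(r_, params), /* the parameter r_: the member r is not initialized yet */
        relPos(relPos_),
        r(r_),
        /* etc */
        f(func(r, params)) /* fine: r is declared, and thus initialized, before f */
    {}
};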

The only difference, in case you're having trouble noticing, is that D is being initialized using r_ instead of r.

What's the difference? The object we're talking about, and initialization order. r is the member of our structure, r_ is the parameter we're passing to the constructor to initialize it. After the structure initialization is complete, they will hold the same value, but until r gets initialized (with the value r_), its content is indeterminate, and using it (instead of r_) leads to undefined behavior. And D gets initialized before r, because it's one of the parent structures of the structure we want to initialize. Note that this would happen even if we put the initialization of r before the initialization of D in the initializer list, because initialization actually happens in a fixed order (parents first, then members in the order they are declared), not in the order their initializers are written.
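To see the ordering rule in action without stepping into undefined behavior, here is a minimal sketch that simply logs the order in which the pieces of an object get initialized. All the names (Base, Derived, note, observed_order) are made up for this illustration and have nothing to do with the real code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of sub-object names, in the order they actually get initialized.
static std::vector<std::string> init_log;

// Records `what` in the log and passes `value` through unchanged.
static int note(const char* what, int value)
{
    init_log.push_back(what);
    return value;
}

struct Base
{
    int value;
    explicit Base(int v) : value(note("Base", v)) {}
};

struct Derived : Base
{
    int first;   // declared first, so initialized right after Base
    int second;  // declared second, so initialized last

    // The initializer list is deliberately written in the "wrong" order.
    explicit Derived(int v) :
        second(note("second", v)),
        first(note("first", v)),
        Base(v)
    {}
};

// Builds one Derived object and returns the observed initialization order.
std::vector<std::string> observed_order()
{
    init_log.clear();
    Derived d(42);
    assert(d.value == 42 && d.first == 42 && d.second == 42);
    return init_log;
}
```

Calling observed_order() gives Base, first, second, regardless of how the initializer list is written. GCC and Clang can point out the mismatch between written and actual order with -Wreorder, which is exactly the kind of help I was missing.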

That single _ made me waste at least two days of work.

Now, this error is my fault —it's undoubtedly my fault, it's a clear example of PEBKAC. And yet, proper tooling would have caught it for me, and made it easier to debug.

  1. if you want to know, I'm talking about the nvcc compiler, i.e. the compiler that handles the single-source CUDA files for GPU programming.

10 digits

The question

How many digits do you need, in base 10, to represent a given (binary) number?

A premise

The C++ standard defines a trait for numerical datatypes that describes “the number of base-10 digits that can be represented by a given type without change”: std::numeric_limits<T>::digits10.

What this means is that all numbers with at most that many digits in base 10 will be representable in the given type. For example, 8-bit integers can represent all numbers from 0 to 99, but not all numbers from 0 to 999, so their digits10 value will be 2.

For integer types, the value can be obtained by taking the number of bits (binary digits) used by the type, dividing by $\log_2(10)$ (or multiplying by $\log_{10}(2)$, which is the same thing), and taking the integer part of the result.

This works because with $n$ bits you can represent $2^n$ values, and with $d$ digits you can represent $10^d$ values, and the condition for digits10 is that $d$ should be such that $10^d \le 2^n$. By taking the logarithm on both sides we get $d \le \log_{10}(2^n) = n \log_{10}(2)$, and since $d$ must be an integer, we get the formula $d = \lfloor n \log_{10}(2) \rfloor$.
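As a quick sanity check, the formula can be compared against what the standard library reports. This is only a sketch: digits10_from_bits and formula_matches_standard are names I made up, and the floating-point logarithm is assumed to be accurate enough for the small bit counts involved.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// floor(n * log10(2)): decimal digits guaranteed to fit in n value bits.
int digits10_from_bits(int n)
{
    assert(n > 0);
    return static_cast<int>(std::floor(n * std::log10(2.0)));
}

// Compares the formula, fed with the number of value bits of T
// (std::numeric_limits<T>::digits), against the standard digits10.
template <typename T>
bool formula_matches_standard()
{
    return digits10_from_bits(std::numeric_limits<T>::digits)
        == std::numeric_limits<T>::digits10;
}
```

For example, digits10_from_bits(8) is 2 and digits10_from_bits(64) is 19, matching digits10 for the 8-bit and 64-bit unsigned types.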

(Technically, this is still a bit of a simplification, since actually the highest representable number with $n$ bits is $2^n - 1$, and that's still only for unsigned types; for signed ones things get more complicated, but that's beyond our scope here.)

The answer

What we want is in some sense the complement of digits10, since we want to ensure that our number of (decimal) digits will be sufficient to represent all numbers of the binary type. Following the same line of reasoning above, we want $d$ such that $2^n \le 10^d$, and thus, skipping a few passages, $d = \lceil n \log_{10}(2) \rceil$, at least assuming unsigned integer types.

We're looking for the simplest formula that gives us the given result. With C++, we could actually just use digits10 plus one, but we want something independent, for example because we want to use this with C (or any other language that doesn't have a digits10 equivalent).

The first thing we want to do is avoid the logarithm. We could compute the actual value, or at least a value with sufficient precision, but in fact we'll avoid doing that, and instead remember that $2^{10}$ is pretty close to $10^3$, which puts the logarithm in question in the $3/10$ ballpark, an approximation that is good enough for the first several powers of $2^{10}$.

With this knowledge, we can approximate $d \approx \lceil 3n/10 \rceil$. In most programming languages integer division with positive operands returns the floor rather than the ceiling, but it can be turned into something that returns the ceiling by adding to the numerator one less than the denominator (see footnote 1). So:

$$d = \left\lfloor \frac{3n + 9}{10} \right\rfloor$$

is the formula we're looking for. In a language like C, where the size of types is given in bytes, that would become something like

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*3+9)/10)

where CHAR_BIT (defined in <limits.h>) is the number of bits per byte: 8, unless you're on an insane architecture.
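A small check of the macro against the standard fixed-width unsigned types, written as C++ so that the maximum values can be taken from std::numeric_limits. The helper max_value_digits is a name I made up for this sketch; it simply counts the decimal digits of the largest value of a type.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <limits>

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*3+9)/10)

// Counts the decimal digits of the largest value of an unsigned type
// by repeatedly dividing it by 10.
template <typename T>
unsigned max_value_digits()
{
    unsigned digits = 0;
    for (T v = std::numeric_limits<T>::max(); v != 0; v /= 10)
        ++digits;
    return digits;
}
```

For the 8, 16, 32 and 64-bit unsigned types both the macro and the brute-force count give 3, 5, 10 and 20 digits respectively.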


The C expression provided above isn't universal. It is better than the even more aggressive approximation sizeof(type)*CHAR_BIT/3, which for example fails for 8-bit types (gives 2 instead of 3) and overestimates the result for 64-bit data types (gives 21 instead of 20), but it still has its limits.

It works for most standard signed data types, because the number of base-10 digits needed to represent them is almost always the same as their unsigned equivalents, but for example it doesn't work for 64-bit data types (the signed ones need one less digit in this case).

Moreover, it actually starts breaking down for very large integers, because the $3/10$ approximation underestimates $\log_{10}(2) \approx 0.30103$ by about $10^{-3}$ (a relative error of roughly 0.3%), which starts becoming significant at 256 bits or higher: the formula predicts 77 digits, but 78 are actually needed.

We can extend this by taking more digits to approximate the logarithm. For example

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*301+999)/1000)

doesn't break down until 4096 bits, at which point it misses one digit again. On the other hand

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*30103+99999)/100000)

can get us reasonably high (in fact, by a quick check it seems this formula should work correctly even for types large enough to hold values up to $2^{2^{16}} = 2^{65536}$, i.e. 65536 bits, if not more). It also has a nice symmetry to it, even though I guess it would overflow on machines with smaller word sizes (but then again, you probably wouldn't need it there anyway).
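To compare the three approximations side by side, here is a sketch that takes the number of bits directly instead of a type. The function names are mine, and the "exact" value relies on the long double logarithm being accurate enough for the bit counts tested here, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>

// The three integer approximations of ceil(bits * log10(2)).
constexpr unsigned long long digits_3_10(unsigned long long bits)
{ return (bits*3 + 9)/10; }

constexpr unsigned long long digits_301(unsigned long long bits)
{ return (bits*301 + 999)/1000; }

constexpr unsigned long long digits_30103(unsigned long long bits)
{ return (bits*30103 + 99999)/100000; }

// Reference value computed with the floating-point logarithm.
unsigned long long digits_exact(unsigned long long bits)
{
    return static_cast<unsigned long long>(
        std::ceil(bits * std::log10(2.0L)));
}
```

At 64 bits all four agree on 20 digits. At 256 bits the first formula gives 77 while the others give 78; at 4096 bits the second one gives 1233 while the third and the reference give 1234.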

  1. If $a, b$ are non-negative integers with $b > 0$, then $\lfloor (a+b-1)/b \rfloor = \lceil a/b \rceil$: (1) if $a$ is a multiple of $b$, then adding $b-1$ doesn't go to the next multiple, and thus on both sides we have $a/b$ (which is an integer) and (2) if $a$ is not a multiple of $b$, adding $b-1$ will overtake exactly one multiple of $b$. More formally, we can write $a = kb + c$ where $k, c$ are non-negative integers, and $c < b$ ($c = 0$ if $a$ is a multiple, and $c > 0$ otherwise). Define $s = \mathrm{sign}(c)$, i.e. $s = 0$ if $c = 0$ and $s = 1$ otherwise. Then $\lfloor (a+b-1)/b \rfloor = \lfloor (kb+c+b-1)/b \rfloor = (k+1) + \lfloor (c-1)/b \rfloor = k + 1 - (1-s) = k + s$ and $\lceil a/b \rceil = \lceil (kb+c)/b \rceil = k + \lceil c/b \rceil = k + s$.
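The identity in the footnote translates directly into code. A minimal sketch, with ceil_div being a name of my choosing:

```cpp
#include <cassert>

// Ceiling of a/b for non-negative a and positive b, using only the
// truncating integer division: floor((a + b - 1) / b).
constexpr unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}
```

For instance, ceil_div(6, 3) is 2, while ceil_div(7, 3) and ceil_div(8, 3) are both 3. Note that a + b - 1 can overflow if a is close to the maximum of its type.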


Mixed DPI and the X Window System

I'm writing this article because I'm getting tired of repeating the same concepts every time someone makes misinformed statements about the (lack of) support for mixed-DPI configurations in X11. It is my hope that anybody looking for information on the subject may be directed here, to get the facts about the actual possibilities offered by the protocol, avoiding the biased misinformation available from other sources.

If you only care about “how to do it”, jump straight to The RANDR way, otherwise read along.

So, what are we talking about?

The X Window System

The X Window System (frequently shortened to X11 or even just X), is a system to create and manage graphical user interfaces. It handles both the creation and rendering of graphical elements inside specific subregions of the screen (windows), and the interaction with input devices (such as keyboards and mice).

It's built around a protocol by means of which programs (clients) tell another program (the server, that controls the actual display) what to put on the screen, and conversely by means of which the server can inform the client about all the necessary information concerning both the display and the input devices.

The protocol in question has evolved over time, and reached version 11 in 1987. While the core protocol hasn't introduced any backwards-incompatible changes in the last 30 years (hence the name X11 used to refer to the X Window System), its extensible design has allowed it to keep abreast of technological progress thanks to the introduction and standardization of a number of extensions, that have effectively become part of the subsequent revisions of the protocol (the last one being X11R7.7, released in 2012; the next, X11R7.8, following more a “rolling release” model).


Bitmapped visual surfaces (monitor displays, printed sheets of paper, images projected on a wall) have a certain resolution density, i.e. a certain number of dots or pixels per unit of length: dots per inch (DPI) or pixels per inch (PPI) is a common way to measure it. The reciprocal of the DPI is usually called “dot pitch”, and refers to the distance between adjacent dots (or pixels). This is usually measured in millimeters, so conversion between DPI and dot pitch is obtained with

DPI   = 25.4/pitch
pitch = 25.4/DPI

(there being 25.4 millimeters to the inch).
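Since DPI counts dots per inch and the pitch is the size of one dot in millimeters, the two are reciprocal up to the 25.4 factor. A tiny sketch of the conversion, with function names of my own choosing:

```cpp
#include <cassert>
#include <cmath>

constexpr double MM_PER_INCH = 25.4;

// Dots per inch for a given dot pitch in millimeters.
constexpr double dpi_from_pitch(double pitch_mm) { return MM_PER_INCH / pitch_mm; }

// Dot pitch in millimeters for a given DPI.
constexpr double pitch_from_dpi(double dpi) { return MM_PER_INCH / dpi; }

// Converting back and forth should give the starting value again
// (up to floating-point rounding).
bool roundtrips(double dpi)
{
    return std::fabs(dpi_from_pitch(pitch_from_dpi(dpi)) - dpi) < 1e-9;
}
```

For example, pitch_from_dpi(96) is about 0.26 mm.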

When it comes to graphics, knowing the DPI of the output is essential to ensure consistent rendering (for example, a drawing program may have a “100% zoom” option where the user might expect a 10cm line to take 10cm on screen), but when it comes to graphical interface elements (text in messages and labels, sizes of buttons and other widgets) the information itself may not be sufficient: usage of the surface should ideally also be taken into consideration.

To this end, the concept of reference pixel was introduced in CSS, representing the pixel of an “ideal” display with a resolution of exactly 96 DPI (dot pitch of around 0.26mm) viewed from a distance of 28 inches (71cm). The reference pixel thus becomes the umpteenth unit of (typographical) length, with exactly 4 reference pixels every 3 typographical points.

Effectively, this allows the definition of a device pixel ratio, as the ratio of device pixels to reference pixels, taking into account the device resolution (DPI) and its assumed distance from the observer (for example, a typical wall-projected image has a much lower DPI than a typical monitor, but is also viewed from much further away, so that the device pixel ratio can be assumed to be the same).

Mixed DPI

A mixed-DPI configuration is a setup where the same display server controls multiple monitors, each with a different DPI.

For example, my current laptop has a built-in 15.6" display (physical dimensions in millimeters: 346×194) with a resolution of 3200×1800 pixels, and a pixel density of about 235 DPI —for all intents and purposes, this is a HiDPI monitor, with slightly higher density than Apple's Retina display brand. I frequently use it together with a 19" external monitor (physical dimensions in millimeters: 408×255) with a resolution of 1440×900 pixels and a pixel density of about 90 DPI —absolutely normal, maybe even somewhat on the lower side.
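The densities quoted above follow from the pixel counts and the physical sizes. A sketch of the arithmetic (the function names are mine, and the comparison with 96 DPI deliberately ignores viewing distance):

```cpp
#include <cassert>
#include <cmath>

// Pixel density along one axis, from the pixel count and physical length.
constexpr double dpi(int pixels, double length_mm)
{
    return pixels * 25.4 / length_mm;
}

// Naive scale factor relative to the 96 DPI reference.
constexpr double scale_vs_reference(double density) { return density / 96.0; }

// Rough HiDPI test: the density rounds to at least twice the reference.
bool is_hidpi(double density)
{
    return std::round(scale_vs_reference(density)) >= 2.0;
}
```

With my numbers, dpi(3200, 346.0) comes out at roughly 235 and dpi(1440, 408.0) at roughly 90, so only the built-in panel passes the is_hidpi test.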

The massive difference in pixel density between the two monitors can lead to extremely inconsistent appearance of graphical user interfaces that do not take it into consideration: if they render assuming the standard (reference) DPI, elements will appear reasonably sized on the external monitor, but extremely small on the built-in monitor; conversely, if they double the pixel sizing of all interface elements, they will appear properly sized on the built-in monitor, but oversized on the external one.

Proper support for such a configuration requires every graphical and textual element to take up a number of pixels that depends on the monitor it is being drawn on. The question is: is this possible with X11?

And the answer is yes. But let's see how this happens in detail.

A brief history of X11 and its support for multiple monitors

The origins: the X Screen

An interesting aspect of X11 is that it was designed in a period where the quality and characteristics of bitmap displays (monitors) were much less consistent than they are today. The core protocol thus provides a significant amount of information for the monitors it controls: the resolution, the physical size, the allowed color depth(s), the available color palettes, etc.

A single server could make use of multiple monitors (referred to as “X Screen”s), and each of them could have wildly different characteristics (for example: one could be a high-resolution monochrome display, the other could be a lower-resolution color display). Due to the possible inconsistency between monitors, the classical support for multiple monitors in X did not allow windows to be moved from one X Screen to another. (How would the server render a window created to use a certain kind of visual on a different display that didn't support it?)

It should be noted that while the server itself didn't natively support moving windows across X Screens, clients could be aware of the availability of multiple displays, and they could allow (by their own means) the user to “send” a window to a different display (effectively destroying it, and recreating it with matching content, but taking into account the different characteristics of the other display).

A parenthetical: the client, the server and the toolkit

Multiple X Screen support being dependent on the client, rather than the server, is actually a common leitmotif in X11: due to one of its founding principles (“mechanism, not policy”), a lot of X11 features are limited only by how much the clients are aware of them and can make use of them. So, something may be allowed by the protocol, and yet certain sets of applications don't make use of the functionality.

This is particularly relevant today, when very few applications actually communicate with the X server directly, preferring to rely on an intermediate toolkit library that handles all the nasty little details of communicating with the display server (and possibly even display servers of different nature, not just X11) according to the higher-level “wishes” of the application (“put a window with this size and this content somewhere on the screen”).

The upside of this is that when the toolkit gains support for a certain feature, all applications using it can rely (sometimes automatically) on this. The downside is that if the toolkit removes support for certain features or configurations, suddenly all applications using it stop supporting them too. We'll see some examples of this, specifically about DPI, in this article.

Towards a more modern multi-monitor support: the Xinerama extension

In 1998, an extension to the core X11 protocol was devised to integrate multiple displays seamlessly, making them appear as a single X Screen, and thus allowing windows to freely move between them.

This extension (Xinerama) had some requirements (most importantly, all displays had to support the same visuals), but for the most part they could be heterogeneous.

An important downside of the Xinerama extension is that while it provides information about the resolution (in pixels) and relative position (in pixels!) of the displays, it doesn't reveal any information about the physical characteristics of the displays.

This is an important difference with respect to the classic “separate X Screens” approach: the classic method allowed clients to compute the monitor DPI (as both the resolution and the physical size were provided), but this is not possible in Xinerama.

As a consequence, DPI-aware applications were actually irremediably broken on servers that only supported this extension, unless all the outputs had the same (or similar enough) DPI.

Modern multi-monitor in X11: the XRANDR extension

Xinerama had a number of limitations (the lack of physical information about the monitors being just one of many), and it was essentially superseded by the RANDR (Resize and Rotate) extension when the latter reached version 1.2 in 2007.

A point of interest for our discussion: the RANDR extension took into consideration both the resolution and physical size of the display even when originally proposed in 2001. And even now that it has grown in scope and functionality, it provides all the necessary information for each connected, enabled display.

The RANDR caveat

One of the main aspects of the RANDR extension is that each display is essentially a “viewport” on a virtual framebuffer. This virtual framebuffer is the one reported as “X Screen” via the core protocol, even though it doesn't necessarily match any physical screen (not even when a single physical screen is available!).

This gives great flexibility on how to combine monitors (including overlaps, cloning, etc); the hidden cost is that all of the physical information that the core protocol would report about the virtual backend to its X Screen becomes essentially meaningless.

For this reason, when the RANDR extension is enabled, the core protocol will synthesize fictitious physical dimensions for its X Screen, from the overall framebuffer size, assuming a “reference” pixel density of 96 DPI.

When using a single display covering the whole framebuffer, this leads to a discrepancy between the physical information provided by the core protocol, and the one reported by the RANDR extension. Luckily, the solution for this is trivial, as the RANDR extension allows changing the fictitious dimensions of the X Screen to any value (for example, by using commands such as xrandr --dpi eDP-1, to tell the X server to match the core protocol DPI information to that of the eDP-1 output).
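To get a feeling for the size of the discrepancy, here is a sketch of the arithmetic involved. It only models the calculation, not any actual server code, and the names SizeMM and synthesized_size are made up for the illustration.

```cpp
#include <cassert>
#include <cmath>

struct SizeMM { double width; double height; };

// Physical size corresponding to a framebuffer of the given pixel size
// at the given density (96 DPI being the "reference" value).
SizeMM synthesized_size(int width_px, int height_px, double dpi = 96.0)
{
    assert(dpi > 0);
    return { width_px * 25.4 / dpi, height_px * 25.4 / dpi };
}
```

For a 3200×1800 framebuffer, the 96 DPI assumption yields about 847×476 millimeters, almost two and a half times the real 346×194 of my laptop panel, whereas synthesized_size(3200, 1800, 235.0) lands within a millimeter of the real width.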

Mixed DPI in X11

Ultimately, X11, as a display protocol, has almost always had support for mixed DPI configurations. With the possible exception of the short period between the introduction of Xinerama and the maturity of the RANDR extension, the server has always been able to provide its clients with all the necessary information to adapt their rendering, window by window, widget by widget, based on the physical characteristics of the outputs in use.

Whether or not this information is being used correctly by clients, however, is an entirely different matter.

The core way

If you like the old ways, you can manage your mixed DPI setup the classic way, by using separate X Screens for each monitor.

The only thing to be aware of is that if your server is recent enough (and supports the RANDR extension), then by default the core protocol will report a DPI of 96, as discussed in “The RANDR caveat”. This can be worked around by calling xrandr as appropriate during the server initialization.

Of course, whether or not applications will use the provided DPI information, X Screen by X Screen, is again entirely up to the application. For applications that do not query the X server about DPI information (e.g. all applications using GTK+3, due to this regression), the Xft.dpi resource can be set appropriately for each X Screen.

The RANDR way

On a modern X server with RANDR enabled and monitors with (very) different DPIs merged in a single framebuffer, well-written applications and toolkits can leverage the information provided by the RANDR extension to get the DPI information for each output, and use this to change the font and widget rendering depending on window location.

(This will still result in poor rendering when a window spans multiple monitors, but if you can live with a 2-inch bezel in the middle of your window, you can probably survive misrendering due to poor choice of device pixel ratios.)

The good news is that all applications using the Qt toolkit can do this more or less automatically, provided they use a recent enough version (5.6 at least, 5.9 recommended). Correctly designed applications can request this behavior from the toolkit on their own (QApplication::setAttribute(Qt::AA_EnableHighDpiScaling);), but the interesting thing is that the user can ask this to be enabled even for legacy applications, by setting the environment variable QT_AUTO_SCREEN_SCALE_FACTOR=1.

(The caveat is that the scaling factor for each monitor is determined from the ratio between the device pixel ratio of the monitor and the device pixel ratio of the primary monitor. So make sure that the DPI reported by the core protocol (which is used as base reference) matches the DPI of your primary monitor —or override the default DPI used by Qt applications by setting the QT_FONT_DPI environment variable appropriately.)

The downside is that outside of Qt, not many applications and toolkits have this level of DPI-awareness, and the other major toolkit (GTK+) seems to have no intention to acquire it.

A possible workaround

If you're stuck with poorly written toolkits and applications, RANDR still offers a clumsy workaround: you can level out the heterogeneity in DPI across monitors by pushing your lower-DPI displays to a higher virtual resolution than their native one, and then scaling this down. Combined with appropriate settings to change the DPI reported by the core protocol, or the appropriate Screen resources or other settings, this may lead to a more consistent experience.

For example, I could set my external 1440×900 monitor to “scale down” from a virtual 2880×1800 resolution (xrandr --output DP-1 --scale-from 2880x1800), which would bring its virtual DPI more on par with that of my HiDPI laptop monitor. The cost is a somewhat poorer image overall, due to the combined up/downscaling, but it's a workable workaround for poorly written applications.
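The effect of the scaling on the virtual density is easy to work out. A sketch under the assumption that the virtual picture is simply mapped onto the same physical width (function names are mine):

```cpp
#include <cassert>
#include <cmath>

// Density of an output whose picture is rendered at virtual_px pixels
// and then scaled down onto a panel that is length_mm wide.
constexpr double virtual_dpi(int virtual_px, double length_mm)
{
    return virtual_px * 25.4 / length_mm;
}

// How much larger the virtual resolution is than the native one.
constexpr double scale_factor(int virtual_px, int native_px)
{
    return static_cast<double>(virtual_px) / native_px;
}
```

With my external monitor, virtual_dpi(2880, 408.0) is about 179, against the native 90 or so: not quite the 235 of the laptop panel, but much closer.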

(If you think this idea is a bit stupid, shed a tear for the future of the display servers: this same mechanism is essentially how Wayland compositors —Wayland being the purported future replacement for X— cope with mixed-DPI setups.)

Final words

Just remember, if you have a mixed DPI setup and it's not properly supported in X, this is not an X11 limitation: it's the toolkit's (or the application's) fault. Check what the server knows about your setup and ask yourself why your programs don't make use of that information.

If you're a developer, follow Qt's example and patch your toolkit or application of choice to properly support mixed DPI via RANDR. If you're a user, ask for this to be implemented, or consider switching to better applications with proper mixed DPI support.

The capability is there, let's make proper use of it.

A small update

There's a proof of concept patchset that introduces mixed-DPI support for GTK+ under X11. It doesn't implement all of the ideas I mentioned above (in particular, there's no Xft.dpi support to override the DPI reported by core), but it works reasonably well on pure GTK+ applications (more so than in applications that have their own toolkit abstraction layer, such as Firefox, Chromium, LibreOffice).

Cross-make selection

I was recently presented, as a mere onlooker, with the potential differences that exist in the syntax of a Makefile for anything non-trivial, when using different implementations of make.

(For the uninitiated, a Makefile is essentially a list of recipes that are automatically followed to build some targets from given dependencies, and are usually used to describe how to compile a program. Different implementations of make, the program that reads the Makefiles and runs the recipes, exist; and the issue is that for anything beyond the simplest of declarations and recipe structure, the syntax they support is different, and incompatible.)

Used as I was to using GNU make and its extensive set of functions and conditionals and predefined macros and rules, I rarely bothered looking into alternatives, except maybe for completely different build systems or meta-build-systems (the infamous GNU autotools, cmake, etc). However, being presented with the fact that even simple text transformations could not be done in the same way across the two major implementations of make (GNU and BSD) piqued my curiosity, and I set off to convert the rather simple (but still GNU-dependent) Makefile of my clinfo project to make it work at least in both GNU and BSD make.



Since clinfo is a rather simple program, its Makefile is very simple too:

  1. it defines the path under which the main source file can be found;
  2. it defines a list of header files, on which the main source file depends;
  3. it detects the operating system used for the compilation;
  4. it selects libraries to be passed to the linker to produce the final executable (LDLIBS), based on the operating system.

The last two points are necessary because:

  • under Linux, but not under any other operating system, the dl library is needed too;
  • under Darwin, linking to OpenCL is done using -framework OpenCL, whereas under any other operating system, this is achieved with a simpler -lOpenCL (provided the library is found in the path).

In all this, the GNU-specific things used in the Makefile were:

  1. the use of the wildcard function to find the header files;
  2. the use of the shell function to find the operating system;
  3. the use of the ifeq/else/endif conditionals to decide which flags to add to the LDLIBS.

Avoiding wildcard

In my case, the first GNUism is easily avoided by enumerating the header files explicitly: this has the downside that if a new header file is ever added to the project, I should remember to add it myself.

(An alternative approach would be to use some form of automatic dependency list generation, such as the -MM flag supported by most current compilers; however, this was deemed overkill for my case.)

(A third option, assuming a recent enough GNU make, is presented below.)

Avoiding shell

BSD make supports something similar to GNU make's shell function by means of the special != assignment operator. The good news is that GNU make has added support for the same assignment operator since version 4 (introduced in late 2013). This offers an alternative solution for wildcard as well: assigning the output of ls to a variable, using !=.

If you want to support versions of GNU make older than 4, though, you're out of luck: there is no trivial way to assign the output of a shell invocation to a Makefile variable that works on both GNU and BSD make (let alone when strict POSIX compliance is required).

If (and only if) the assignments can be done ‘before’ any other assignment is done, it is however possible to put them into a GNUmakefile (using GNU's syntax) and makefile (using BSD's syntax), and then have both of these include the shared part of the code. This works because GNU make will look for GNUmakefile first.

In my case, the only call to shell I had was a $(shell uname -s) to get the name of the operating system. The interesting thing in this case is that BSD make actually defines its own OS variable holding just what I was looking for.

My solution was therefore to add a GNUmakefile which defines OS using the shell invocation, and then includes the same Makefile that is parsed directly by BSD make.

Conditional content for variables

Now comes the interesting part: we want the content of a variable (LDLIBS in our case) to be set based on the content of another variable (OS in our case).

There are actually two things that we want to do:

  1. (the simple one) add something to the content of LDLIBS only if OS has a specific value;
  2. (the difficult one) add something to the content of LDLIBS only if OS does not have a specific value.

Both of these would be rather trivial if we had conditional statements, but while both BSD and GNU make do have them, their syntax is completely incompatible. We therefore have to resort to a different approach, one that leverages features present in both implementations.

In this case, we're going to use the fact that when using a variable, you can use another variable to decide the name of the variable to use: whenever make comes across the syntax $(foo) (or ${foo}), it replaces it with the content of the foo variable. The interesting thing is that this holds even within another set of $() or ${}, so that if foo = bar and bar = quuz, then $(${foo}) expands to $(bar) and thus ultimately to quuz.

Add something to a variable only when another variable has a specific value

This possibility actually allows us to solve the ‘simple’ conditional problem, with something like:

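LDLIBS_Darwin = -framework OpenCL
LDLIBS_Linux  = -ldl
# append the value of the variable whose name ends with the content of OS
LDLIBS += ${LDLIBS_${OS}}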

Now, if OS = Darwin, LDLIBS will get extended by appending the value of LDLIBS_Darwin; if OS = Linux, LDLIBS gets extended by appending the value of LDLIBS_Linux, and otherwise it gets extended by appending the value of LDLIBS_, which is not defined, and thus empty.

This allows us to achieve exactly what we want: add specific values to a variable only when another variable has a specific value.

Add something to a variable only when another variable does not have a specific value

The ‘variable content as part of the variable name’ trick cannot be employed as-is for the complementary action, which is adding something only when the content of the control variable is not some specific value (in our case, adding -lOpenCL when OS is not Darwin).

We could actually use the same trick if the Makefile syntax allowed something like a -= operator to ‘remove’ things from the content of a variable (interestingly, the vim scripting and configuration language does have such an operator). Since the operator is missing, though, we'll have to work around it, and to achieve this we will use the possibility (shared by both GNU and BSD make) to manipulate the content of variables during expansion.

Variable content manipulation is another field where the syntax accepted by the various implementations differs wildly, but there is a small subset which is actually supported by most of them (even beyond GNU and BSD): the suffix substitution operator.

The idea is that often you want to do something like enumerate all your source files in a variable sources = file1.c file2.c file3.c etc and then you want to have a variable with all the object files that need to be linked, that just happen to be the same, with the .c suffix replaced by .o: in both GNU and BSD make (and not just them), this can be achieved by doing objs = $(sources:.c=.o). The best part of this is that the strings to be replaced, and the replacement, can be taken from the expansion of a variable!

We can then combine all this knowledge into our ‘hack’: always include the value we want to selectively exclude, and then remove it by ‘suffix’ substitution, where the suffix to be replaced is defined by a variable-expanded variable name: a horrible, yet effective, hack:

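# always include the value to be selectively excluded
LDLIBS = -lOpenCL
LDLIBS_not_Darwin = -lOpenCL
# strip the 'suffix' named by LDLIBS_not_<OS>, if such a variable is defined
LDLIBS := ${LDLIBS:$(LDLIBS_not_${OS})=}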

This works because when OS = Darwin, the substitution argument will be $(LDLIBS_not_Darwin) which in turn expands to -lOpenCL, so that in the end the value assigned to LDLIBS will be ${LDLIBS:-lOpenCL=}, which is LDLIBS with -lOpenCL replaced by the empty string. For all other values of OS, we'll have ${LDLIBS:=} which just happens to be the same as ${LDLIBS}, and thus LDLIBS will not be changed (see footnote 1).

Cross-make selection

We can then combine both previous ideas:


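LDLIBS = -lOpenCL

LDLIBS_Darwin = -framework OpenCL
LDLIBS_not_Darwin = -lOpenCL
LDLIBS_Linux  = -ldl

# add the OS-specific flags, then drop what the OS does not want
LDLIBS += ${LDLIBS_${OS}}
LDLIBS := ${LDLIBS:$(LDLIBS_not_${OS})=}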

And there we go: LDLIBS will be -framework OpenCL on Darwin, -lOpenCL -ldl on Linux, -lOpenCL on any other platform, regardless of whether GNU or BSD make is being used.

Despite the somewhat hackish nature of this approach (especially for the ‘exclusion’ case), I actually like it, for two reasons.

The first is, obviously, portability. Not requiring a specific incarnation of make is at the very least an act of courtesy. Being able to do without writing two separate, mostly duplicate, Makefiles is even better.

But there's another reason why I like the approach: even though the variable-in-variable syntax isn't exactly the most pleasurable to read, the intermediate variable names end up having a nice, self-explanatory name that gives a nice logical structure to the whole thing.

That being said, working around these kinds of portability issues can make a developer better appreciate the need for more portable build systems, despite the heavier onus in terms of dependencies. Of course, for smaller projects, deploying something as massive as autotools or cmake would still be ridiculous overkill: so to anyone that prefers leaner (if more fragile) options, I offer this set of solutions, in the hope that they'll help stimulate convergence.

  1. technically, we will replace the unexpanded value of LDLIBS with its expanded value; the implications of this are subtle, and a bit out of scope for this article. As long as this is kept as the 'last' change to LDLIBS, everything should be fine. ↩

Poi poi, mai mai

Note: this article makes use of MathML, the standard XML markup for math formulas. Sadly, this is not properly supported on some allegedly ‘modern’ and ‘feature-rich’ browsers. If the formulas don't make sense in your browser, consider reporting the issue to the respective developers and/or switching to a standard-compliant browser.

For the last three decades or so, non-integer numbers have been represented on computers following (predominantly) the floating-point standard known as IEEE-754.

The basic idea is that each number can be written in what is also known as engineering or scientific notation, such as 2.34567×10^89, where the 2.34567 part is known as the mantissa or significand, 10 is the base and 89 is the exponent. Of course, on computers 2 is more typically used as base, and the mantissa and exponent are written in binary.

Following the IEEE-754 standard, a floating-point number is encoded using the most significant bit as sign (with 0 indicating a positive number and 1 indicating a negative number), followed by some bits encoding the exponent (in biased representation), and the rest of the bits to encode the fractional part of the mantissa (the leading digit of the mantissa is assumed to be 1, except for denormals in which case it's assumed 0, and is thus always implicit).

The biased representation for the exponent is used for a number of reasons, but the one I care about here is that it allows “special cases”. Specifically, an encoded exponent of 0 is used to indicate the number 0 (when the mantissa is also set to 0) and denormals (which I will not discuss here). An exponent with all bits set to 1, on the other hand, is used to represent infinity (when the mantissa is set to 0) and special values called “Not-a-Number” (or NaN for short) when the mantissa is non-zero.
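To make the encoding concrete, here is a minimal Python sketch that splits a double-precision (binary64) value into its three fields. The helper name `fields` is my own choice for illustration, and the field widths (1 sign bit, 11 exponent bits, 52 fraction bits) are those of the 64-bit format specifically.

```python
import struct

def fields(x):
    """Split a binary64 value into (sign, biased exponent, fraction)."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # most significant bit
    exponent = (bits >> 52) & 0x7FF    # next 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)  # remaining 52 bits
    return sign, exponent, fraction

print(fields(1.0))            # (0, 1023, 0)
print(fields(-0.0))           # (1, 0, 0)
print(fields(float("inf")))   # (0, 2047, 0)

# For a NaN only the exponent (all ones) and a non-zero fraction are guaranteed.
_, nan_exponent, nan_fraction = fields(float("nan"))
print(nan_exponent, nan_fraction != 0)   # 2047 True
```

Zero uses the all-zeros exponent, while infinity and NaN both use the all-ones exponent (2047 for binary64) and differ only in the fraction.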

The ability of the IEEE-754 standard to describe such special values (infinities and NaN) is one of its most powerful features, although often not appreciated by programmers. Infinity is extremely useful to properly handle functions with special values (such as the trigonometric tangent, or even division of a non-zero value by zero), whereas NaNs are useful to indicate that somewhere an invalid operation was attempted (such as dividing zero by zero, or taking the square root of a negative number).
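A short Python sketch of how these special values behave in practice. One caveat: Python itself raises ZeroDivisionError for float division by zero instead of returning the IEEE-754 result, so the infinities below are obtained by overflow and from `math.inf`.

```python
import math

inf = math.inf

# Overflowing the largest finite magnitude yields a signed infinity.
print(1e308 * 10)    # inf
print(-1e308 * 10)   # -inf

# Infinity compares greater than every finite value.
print(inf > 1e308)   # True

# Operations with no meaningful result produce NaN.
indeterminate = inf - inf
print(indeterminate)                 # nan
print(math.isnan(0.0 * inf))         # True

# NaN is unordered: it is not even equal to itself.
print(indeterminate == indeterminate)   # False
```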

Consider now the proverb “later means never”. The Italian proverb with the same meaning (that is, procrastination is often an excuse to not do things ever) is slightly different, and it takes a variety of forms («il poi è parente del mai», «poi è parente di mai», «poi poi è parente di mai mai») which basically translate to “later is a relative of never”.

What is interesting is that if we were to define “later” and “never” as “moments in time”, and assign numerical values to them, we could associate “later” with infinity (we are procrastinating, after all), while “never”, which cannot actually be a “moment in time” (it is never, after all) would be … not a number.

(Actually, it's also possible to consider “later” as being indefinite in time, and thus not a (specific) number, and “never” having an infinite value. Or to have both later and never be not numbers. But that's fine, it still works!)

So as it happens, both later and never can be represented in the IEEE-754 floating-point standard, and they share the special exponent that marks non-finite numbers.

Later, it would seem, is indeed a relative of never.

Warp shuffles, or why OpenCL should expose low-level interfaces

Since OpenCL 2.0, the OpenCL C device programming language includes a set of work-group parallel reduction and scan built-in functions. These functions allow developers to execute local reductions and scans for the most common operations (addition, minimum and maximum), and allow vendors to implement them very efficiently using hardware intrinsics that are not normally exposed in OpenCL C.

In this article I aim to show that exposing such high-level functions, but not the lower-level intrinsics on which their efficient implementation might rely, results in lower flexibility and less efficient OpenCL programs, and is ultimately detrimental to the quality of the standard itself.

While the arguments I will propose will be focused specifically on the parallel reduction and scans offered by OpenCL C since OpenCL 2.0, the fundamental idea applies in a much more general context: it is more important for a language or library to provide the building blocks on which to build certain high-level features than to expose the high-level features themselves (hiding the underlying building blocks).

For example, the same kind of argument would apply to a language or library that aimed at providing support for Interval Analysis (IA). A fundamental computational aspect which is required for proper IA support is directed rounding: just exposing directed rounding would be enough to allow efficient (custom) implementations of IA, and also allow other numerical feats (as discussed here); conversely, while it's possible to provide support for IA without exposing the underlying required directed rounding features, doing so results in an inefficient, inflexible standard¹.

The case against high-level reduction operations

To clarify, I'm not actually against the presence of high-level reduction and scan functions in OpenCL. They are definitely a very practical and useful set of functions, with the potential of very efficient implementations by vendors —in fact, more efficient than any programmer may achieve, not just because they can be tuned (by the vendor) for the specific hardware, but also because they can in fact be implemented making use of hardware capabilities that are not exposed in the standard nor via extensions.

The problem is that the set of available functions is very limited (and must be so), and as soon as a developer needs a reduction or scan function that is even slightly different from the ones offered by the language, it suddenly becomes impossible for such a reduction or scan to be implemented with the same efficiency of the built-in ones, simply because the underlying hardware capabilities (necessary for the optimal implementation) are not available to the developer.

Thrust and Kahan summation

Interestingly enough, I've hit a similar issue while working on a different code base, which makes use of CUDA rather than OpenCL, and for which we rely on the thrust library for the most common reduction operations.

The thrust library is a C++ template library that provides efficient CUDA implementations of a variety of common parallel programming paradigms, and is flexible enough to allow such paradigms to make use of user-defined operators, allowing for example reductions and scans with operators other than summation, minimum and maximum. Despite this flexibility, however, even the thrust library cannot move (easily) beyond stateless reduction operators, so that, for example, one cannot trivially implement a parallel reduction with Kahan summation using only the high-level features offered by thrust.
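To see why Kahan summation does not fit a stateless binary operator, consider this sequential Python sketch (purely illustrative: it is neither GPU code nor anything thrust provides). Besides the running total, the accumulator carries a second value, the compensation, and a parallel version would have to reduce such pairs rather than plain numbers.

```python
def kahan_sum(values):
    """Compensated summation: the state is a (total, compensation) pair."""
    total = 0.0
    compensation = 0.0  # estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # re-inject what was lost earlier
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total

# One large term followed by many terms too small to change it on their own.
values = [1.0] + [1e-16] * 10000

# Plain accumulation, written as an explicit loop on purpose.
naive = 0.0
for x in values:
    naive += x

print(naive)              # 1.0: every small term was rounded away
print(kahan_sum(values))  # close to 1.000000000001
```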

Of course, this is not a problem per se, since ultimately thrust just compiles to plain CUDA code, and it is possible to write such code by hand, thus achieving a Kahan summation parallel reduction, as efficiently as the developer's prowess allows. (And since CUDA exposes most if not all hardware intrinsics, such a hand-made implementation can in fact be as efficient as possible on any given CUDA-capable hardware.)

Local parallel reductions in OpenCL 2.0

The situation in OpenCL is sadly much worse, and not so much due to the lack of a high-level library such as thrust (to which end one may consider the Bolt library instead), but because the language itself is missing the fundamental building blocks to produce the most efficient reductions: and while it does offer built-ins for the most common operations, anything beyond that must be implemented by hand, and cannot be implemented as efficiently as the hardware allows.

One could be led to think that (at least for something like my specific use case) it would be “sufficient” to provide more built-ins for a wider range of reduction operations, but such an approach would be completely missing the point: there will always be variations of reductions that are not provided by the language, and such a variation will always be inefficient.

Implementor laziness

There is also another point to consider, and it has to do with the sad state of the OpenCL ecosystem. Developers that want to use OpenCL for their software, be it in academia, gaming, medicine or any industry, must face the reality of the quality of existing OpenCL implementations. And while for custom solutions one can focus on a specific vendor, and in fact choose the one with the best implementations, software vendors have to deal with the idiosyncrasies of all OpenCL implementations, and the best they can expect is for their customers to be up to date with the latest drivers.

What this implies in this context is that developers cannot, in fact, rely on high-level functions being implemented efficiently, nor can they sit idle waiting for the vendors to provide more efficient implementations: more often than not, developers will find themselves working around the limitations of this and that implementation, rewriting code that should be reduced to one-liners in order to provide custom, faster implementations.

This is already the case for some functions such as the asynchronous work-group memory copies (from/to global/local memory), which are dramatically inefficient on some vendor implementations, so that developers are more likely to write their own loading functions instead, which generally end up being just as efficient as the built-ins on the platforms where such built-ins are properly implemented, and much faster on the lazy platforms.

Therefore, can we actually expect vendors to really implement the work-group reduction and scan operations as efficiently as their hardware allows? I doubt it. However, while for the memory copies an efficient workaround was offered by simple loads, such a workaround is impossible in OpenCL 2.0, since the building blocks of the efficient work-group reductions are missing.

Warp shuffles: the work-group reduction building block

Before version 2.0 of the standard, OpenCL offered only one way to allow work-items within a work-group to exchange information: local memory. The feature reflected the capability of GPUs when the standard was first proposed, and could be trivially emulated on other hardware by making use of global memory (generally resulting in a performance hit).

With version 2.0, OpenCL exposes a new set of functions that allow data exchange between work-items in a work-group, which doesn't (necessarily) depend on local memory: such functions are the work-group vote functions, and the work-group reduction and scan functions. These functions can be implemented via local memory, but most modern hardware can implement them using lower-level intrinsics that do not depend on local memory at all, or only depend on local memory in smaller amounts than would be needed by a hand-coded implementation.

On GPUs, work-groups are executed in what are called warps or wave-fronts, and most modern GPUs can in fact exchange data between work-items in the same warp using specific shuffle intrinsics (which have nothing to do with the OpenCL C shuffle function): these intrinsics allow work-items to access the private registers of other work-items in the same warp. While warps in the same work-group still have to communicate using local memory, a simple reduction algorithm can thus be implemented using warp shuffle instructions and only requiring one word of local memory per warp, rather than one per work-item, which can lead to better hardware utilization (e.g. by allowing more work-groups per compute unit thanks to the reduced use of local memory).
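The scheme can be sketched as a plain Python simulation. It only mimics the data movement, with lists standing in for the private registers of the lanes of a warp. The function names, the warp width of 32 and the "keep your own value when the source lane is out of range" rule are assumptions made for illustration, not an existing OpenCL interface.

```python
WARP_SIZE = 32  # assumed warp width (a power of two)

def shuffle_down(registers, delta):
    """Each lane reads the register of the lane `delta` positions above it;
    lanes without such a neighbour keep their own value."""
    n = len(registers)
    return [registers[i + delta] if i + delta < n else registers[i]
            for i in range(n)]

def warp_reduce(registers):
    """Sum across one warp using shuffles only, with no local memory."""
    offset = len(registers) // 2
    while offset > 0:
        neighbours = shuffle_down(registers, offset)
        registers = [a + b for a, b in zip(registers, neighbours)]
        offset //= 2
    return registers[0]  # lane 0 ends up holding the warp total

def work_group_reduce(values, warp_size=WARP_SIZE):
    """Reduce a whole work-group; returns (total, local memory words used)."""
    assert len(values) % warp_size == 0
    warps = [values[i:i + warp_size] for i in range(0, len(values), warp_size)]
    assert len(warps) <= warp_size
    # Each warp writes a single partial result: one word of local memory per warp.
    local_memory = [warp_reduce(w) for w in warps]
    # The first warp then reduces the partial results, padded with zeros.
    padded = local_memory + [0] * (warp_size - len(local_memory))
    return warp_reduce(padded), len(local_memory)

total, words = work_group_reduce(list(range(256)))
print(total, words)   # 32640 8
```

For a work-group of 256 items the simulation uses 8 words of local memory, where a classic local-memory reduction would use 256.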

Warp shuffle instructions are available on NVIDIA GPUs with compute capability 3.0 or higher, as well as on AMD GPUs since Graphics Core Next. Additionally, vectorizing CPU platforms such as Intel's can trivially implement them in the form of vector component swizzling. Finally, all other hardware can still emulate them via local memory (which in turn might be inefficiently emulated via global memory, but still): and as inefficient as such an emulation might be, it still would scarcely be worse than hand-coded use of local memory (which would still be a fall-back option available to developers).

In practice, this means that all OpenCL hardware can implement work-group shuffle instructions (some more efficiently than others), and parallel reductions of any kind could be implemented through work-group shuffles, achieving much better performance than standard local-memory reductions on hardware supporting work-group shuffles in hardware, while not being less efficient than local-memory reductions where shuffles would be emulated.


Finally, it should be obvious now that the choice of exposing work-group reduction and scan functions, but not work-group shuffle functions in OpenCL 2.0 results in a crippled standard:

  • it does not represent the actual capabilities of current massively parallel computational hardware, let alone the hardware we may expect in the future;
  • it effectively prevents efficient implementation of reductions and scans beyond the elementary ones (simple summation, minimum and maximum);
  • to top it all, we can scarcely expect such high-level functions to be implemented efficiently, making them effectively useless.

The obvious solution would be to provide work-group shuffle instructions at the language level. This could in fact be a core feature, since it can be supported on all hardware, just like local memory, and the device could be queried to determine if the instructions are supported in hardware or emulated (pretty much like devices can be queried to determine if local memory is physical or emulated).

Optionally, it would be nice to have some introspection to allow the developer to programmatically find the warp size (i.e. work-item concurrency granularity) used for the kernel², and potentially improve on the use of the instructions by limiting the strides used in the shuffles.

  1. since IA intrinsically depends on directed rounding, even if support for IA was provided without explicitly exposing directed rounding, it would in fact still be possible to emulate directed rounding of scalar operations by operating on interval types and then discarding the unneeded parts of the computation; of course, this would be dramatically inefficient. ↩

  2. in practice, the existing CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE kernel property that can be programmatically queried corresponds already to the warp/wave-front size on GPUs, so there might be no need for another property if it could be guaranteed that this is the work-item dispatch granularity. ↩

Colorize your man

The terminal, as powerful as it might be, has a not undeserved fame of being boring. Boring white (or some other, fixed, color) on boring black (or some other, fixed, color) for everything. Yet displays nowadays are capable of showing millions of colors, and have been able to display at least four since the eighties. There's a resurgence of “colorization” options for the terminal, from the shell prompt to the multiplexers (screen, tmux), from the output of commands such as ls to the syntax highlighting options of editors and pagers. A lot of modern programs will even try to use colors in their output right from the start, making it easier to tell apart semantically different parts of it.

One of the last strongholds of the boring white-on-black (or conversely) terminal displays is man, the manual page reader. Man pages constitute the backbone of technical documentation in Unix-like systems, and range from the description of the syntax and behaviour of command-line programs to the details of system calls and programming interfaces of libraries, passing through a description of the syntax of configuration files, and whatever else one might feel like documenting for ease of access.

The problem is, man pages are boring. They usually all have the same structure, with sections that follow a common convention both in the naming and in the sequence (NAME, SYNOPSIS, DESCRIPTION, SEE ALSO, etc.), and they are all boringly black and white, with a sprinkle of bold and italics/underline.

It must be said that bold and underline don't really “cut it” in the console: undoubtedly, things would stand out more if colors were used to highlight relevant parts of the manual pages (headings, examples, code, etc.) rather than simply bold and underline or italics.

Thanks to the power and flexibility of the pagers used to actually visualize those man pages, a number of people have come up with simple tricks that colorize pages by just reinterpreting bold/italics/underline commands as colors. In fact, there's a pager (most) that does this by default. Of course, most is otherwise inferior in many ways to the more common less pager, so there are solutions to do the same trick (color replacement) with less. Both solutions, as well as a number of other tricks based on the same principle, are pretty well documented in a number of places, and can be found summarized on the Arch wiki page on man pages.

I must say I'm not too big a fan of this approach: while it has the huge advantage of being very generic (in fact, maybe a little too generic), it has a hackish feeling, which had me look for a cleaner, lower level approach: making man itself (or rather the groff typesetter it uses) colorize its output.

Colorizing man pages with *roff

The approach I'm going to present will only work if man uses (a recent enough version of) groff that actually supports colors. Also, the approach is limited to specific markup. It can be extended, but doing so robustly is non-trivial.

We will essentially do three things:

  • tell groff to look for (additional) macros in specific directories;
  • override some typical man page markup to include colors;
  • tell groff (or rather grotty, the terminal post-processor of groff) to enable support for SGR escapes.

Extending the groff search path

By default, groff will look for macro packages in a lot of places, among which the user's home directory. Since cluttering home isn't nice, we will create a ~/groff directory and put our overrides in there, but we also need to tell groff to look there, which is done by setting the GROFF_TMAC_PATH environment variable. So I have in my ~/.profile the following lines:


(Remember to source ~/.profile if you want to test the benefits of your override in your live sessions.)

Overriding the man page markup

The groff macro package used to typeset man pages includes an arbitrary man.local file that can be used to override definitions. For example, in Debian this is used to do some character substitutions based on whether UTF-8 is enabled or not, and it's found under /etc/groff. We will write our own man.local, and place it under ~/groff instead, to override the markup we want to colorize.

Sadly, most of the markup in man pages is presentational rather than semantic: things are explicitly typeset in bold/italic/regular, rather than as parameter/option/code/whatever. There are a few exceptions, most notably the .SH command to typeset section headers. So in this example we will only override .SH to set section headers to green, leaving the rest of the man pages as-is.

Instead of re-defining .SH from scratch, we will simply expand it by adding stuff around the original definition. This can be achieved with the following lines (put them in your ~/groff/man.local):

.rn SH SHorg
.de SH
. gcolor green
. SHorg \\$*
. gcolor
..

The code above renames .SH to .SHorg, and then defines a new .SH command that:

  1. sets the color to green;
  2. calls .SHorg (i.e. the original .SH), passing all the arguments over to it;
  3. resets the color to whatever it was before.

The exact same approach can be used to colorize the second-level section header macro, .SS; just repeat the same code with a general replacement of H to S, and tune the color to your liking.

Another semantic markup that is rather easy to override, even though it's only rarely used in actual man pages (possibly because it's a GNU extension), is the .UR/.UE pair of commands to typeset URLs, and its counterpart .MT/.ME pair of commands to typeset email addresses. Both work by storing the URL or email address in the string m1, so all we need is to override its definition before it's actually used, in the second element of the pair; for example, if we want to typeset both URLs and email addresses in cyan, we would use:

.rn ME MEorg
.de ME
. ds m1 \\m[cyan]\\*(m1\\m[]\"
. MEorg \\$*
..

.rn UE UEorg
.de UE
. ds m1 \\m[cyan]\\*(m1\\m[]\"
. UEorg \\$*
..

(Keep in mind that I'm not a groff expert, so there might be better ways to achieve these overrides.)

Enabling SGR escapes

The more recent versions of grotty (the groff post-processor for terminal output) use ANSI (SGR) escape codes for formatting, supporting colors as well as emboldening, italicizing and underlining. On some distributions (Debian, for example), this is disabled by default, and must be enabled with some not-well-documented method (e.g. exporting specific environment variables).

Since we already have various overrides in our ~/groff/man.local, we can restore the default behavior (enabling SGR escapes by default, unless the environment variable GROFF_NO_SGR is set) with the lines:

.if '\V[GROFF_NO_SGR]'' \
.   output x X tty: sgr 1

Of course, you should also make sure the pager used by man supports SGR escape sequences, for example by making your pager be less (which is likely to be the default already, if available) and telling it to interpret SGR sequences (e.g. by setting the environment variable LESS to include the -R option).

Limitations and future work

That's it. Now section headers, emails and URLs will come out typeset in color, provided man pages are written using semantic markup.

It is also possible to override the non-semantic markup that is used everywhere else, such as all the macros that combine or alternate B, I and R to mark options, parameters, arguments and types. This would definitely make pages more colored, but whether or not they will actually come out decently is all to be seen.

A much harder thing to achieve is the override of commands that explicitly set the font (e.g. *roff escape sequences such as \fB, often used inline in the code). But at this point the question becomes: is it worth the effort?

Wouldn't it be better to start working on the cleanup and extension of the man macro package for groff to include (and use!) more semantic markup, with built-in colorization support?

Bonus track: italics

If your terminal emulator truly supports italics (honestly, a lot of modern terminals do, except possibly the non-graphical consoles), you can configure grotty to output instructions for italics instead of the usual behavior of replacing italics with underline. This is achieved by passing the -i option to grotty. Since grotty is rarely (if ever) called directly, one would usually pass the -P-i option to groff.

This can be achieved in man by editing your ~/.manpath file and adding the following two lines:

DEFINE  troff   groff -mandoc -P-i
DEFINE  nroff   groff -mandoc -P-i

And voilà, italicized rather than underlined italics.

R.I.P. Opera

Opera is dead. I decided to give it some time, see how things developed (my first article on the topic was from over two years ago, and the more recent one about the switch to Blink was from February last year), but it's quite obvious that the Opera browser some of us knew and loved is dead for good.

For me, the finishing blow was a comment from an Opera employee, in response to my complaints about the regressions in standard support when Opera switched from Presto to Blink as rendering engine:

You may have lost a handful of things, but on the other hand you have gained a lot of other things that are not in Presto. The things you have gained are likely more useful in more situations as well.

This was shortly followed by:

Whether you want to use Opera or not is entirely up to you. I merely pointed out that for the lost standards support, you are gaining a lot of other things (and those things are likely to be more useful in most cases).

Other things? What other things?

But it gets even better. I'm obviously not the only one complaining about the direction the new Opera has taken. One Patata Johnson comments:

There used to be a time when Opera fought for open standards and against Microsoft's monopol with it's IE. Am I the only one who us concerned about their new path? Today Google / Chrome became the new IE, using own standards and not carrying about open web standards that much.

The reply?

Opera is in a much better position to promote open standards with Blink than with Presto. It's kind of hard to influence the world when the engine is basically being ignored.

Really? How does being another skin over Blink help promote open standards? It helps promote Blink experimental features, lack of standard compliance, and buggy implementation of the standards it does support. That does as much to promote open standards as the Trident skins did during the 90s browser wars.

As small as Opera's market share was before the switch, its rendering engine was independent and precisely because of that it could be used to push the others into actually fixing their bugs and supporting the standards. It might have been ignored by the run-of-the-mill web developers, but it was actually useful in promoting standard compliance by being a benchmark against which other rendering engines were compared. The first thing that gets asked when someone reports a rendering issue is: how does it behave in the other rendering engines? If there are no other rendering engines, bugs in the dominant one become the de facto standard, against the open standard of the specification.

With the switch to Blink, Opera has even lost that role. As minor a voice as it might have been, it has now gone completely silent.

And let's be serious: the rendering engine it uses might not be ignored now (it's not their own, anyway), but I doubt that Opera has actually gained anything in terms of user base, and thus weight. If anything, I'm seeing quite a few former supporters switching away. Honestly, I suspect Opera's survival is much more in danger now than it was before the switch.

The truth is, the new Opera stands for nothing that the old Opera stood for: the old Opera stood for open standards, compliance, and a feature-rich, highly-customizable Internet suite. The new one is anything but that.

At the very least, for people that miss the other qualities that made Opera worthwhile (among which the complete, highly customizable user interface, and the quite complete Internet suite capabilities, including mail, news, RSS, IRC and BitTorrent support) there's now the open-source Otter browser coming along. It's still WebKit-based, so it won't really help break the development of a web monoculture, but it will at least offer a more reliable fallback to those loving the old Opera and looking for an alternative to switch to from the new one.

For my part, I will keep using the latest available Presto version of Opera for as long as possible. In the meantime, Firefox has shown to have the most complete support for current open standards, so it's likely to become my next browser of choice. I will miss Opera's UI, but maybe Otter will also support Gecko as rendering engine, and I might be able to get the best of both worlds.

We'll see.

Amethyst: essential statistics in Ruby

Amethyst is a small Ruby library/script to extract some essential statistics (mean, median, mode, midpoint and range, quartiles) from series of (numerical) data.

While it can be used as a library from other Ruby programs, possibly its most interesting use is as a command line filter: it can read the data series to be analyzed from its standard input (one datum per line), and it produces the relevant statistics on its standard output. Typical usage would be something like:

$ produce_some_data | amethyst

For example, statistics on the number of lines of the source files for one of my ongoing creative works at the time of writing:

$ wc -l oppure/ganka/ganka[0-9]* | head -n -1 | amethyst
# count: 43
# min: 48
# max: 274
# mid: 161
# range: 226

# mean: 102.3953488372093
# stddev: 42.266122304343874

# mode(s): 48 51 59 75 79 86 93 102 104

# median: 97
# quartiles: 79 97 110
# IQR: 31

When acting as a filter, Amethyst will check if its standard output has been redirected/piped to another program, in which case, by default, it will also produce commands that can be fed to gnuplot to produce a visual representation of the distribution of the dataset, including a histogram and a box plot:

$ produce_some_data | amethyst | gnuplot -p

Command line options such as --[no-]histogram and --[no-]boxplot can be used to override the default choices on what to plot (if anything), and options such as --dumb can be used to let gnuplot output a textual approximation of the plot(s) on the terminal itself.

Integer math and computers

One would assume that doing integer math with computers would be easy. After all, integer math is, in some sense, the “simplest” form of math: as Kronecker said:

Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk

The dear God has made integers, everything else is the work of man

While in practice this is (almost) always the case, introducing the extent (and particularly the limitations) to which integer math is easy (or, in fact, ‘doable’) is the first necessary step to understanding, later in this series of articles, some of the limitations of fixed-point math.

We start from the basics: since we are assuming a binary computer, we know that n bits can represent 2^n distinct values. So an 8-bit byte can represent 256 distinct values, a 16-bit word can represent 65536 distinct values, a 32-bit word can represent 4,294,967,296, and a 64-bit word a whopping 18,446,744,073,709,551,616, over 18 quintillion (short scale). Of course the question now is: which ones?

Representation of unsigned integers

Let's consider a standard 8-bit byte. The most obvious and natural interpretation of a byte (i.e. 8 consecutive bits) is to interpret it as a (non-negative, or unsigned) integer, just like we would interpret a sequence of consecutive (decimal) digits. So binary 00000000 would be 0, binary 00000001 would be (decimal) 1, binary 00000010 would be (decimal) 2, binary 00000011 would be (decimal) 3, and so on, up to (binary) 11111111 which would be (decimal) 255. From 0 to 255 inclusive, that's exactly the 256 values that can be represented by a byte (read as an unsigned integer).

Unsigned integers can be trivially promoted to wider words (e.g. from 8-bit byte to 16-bit word, preserving the numerical value) by padding with zeroes.
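As a quick check of the above, here is a small Python sketch that reads bit strings as unsigned integers and widens them by zero-padding. The helper names `as_unsigned` and `zero_extend` are made up for this example.

```python
def as_unsigned(bits):
    """Interpret a string of binary digits as a non-negative integer."""
    return int(bits, 2)

def zero_extend(bits, width):
    """Promote to a wider word by padding with zeroes on the left."""
    return bits.rjust(width, "0")

print(as_unsigned("00000011"))   # 3
print(as_unsigned("11111111"))   # 255

wide = zero_extend("11111111", 16)
print(wide, as_unsigned(wide))   # 0000000011111111 255

# n bits give 2^n distinct values, from 0 to 2^n - 1.
print(2 ** 8, 2 ** 16)           # 256 65536
```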

This is so simple that it's practically boring. Why are we even going through this? Because things are not that simple once you move beyond unsigned integers. But before we do that, I would like to point out that things aren't that simple even if we're just sticking to non-negative integers. In terms of representation of the numbers, we're pretty cozy: n bits can represent all non-negative integers from 0 to 2^n-1, but what happens when you start doing actual math on them?

Modulo and saturation

Let's stick to just addition and multiplication at first, which are the simplest and best defined operations on integers. Of course, the trouble is that if you are adding or multiplying two numbers between 0 and 255, the result might be bigger than 255. For example, you might need to do 100 + 200, or 128*2, or even just 255+1, and the result is not representable in an 8-bit byte. In general, if you are operating on n-bit numbers, the result might not be representable in n bits.

So what does the computer do when this kind of overflow happens? Most programmers will now chime in and say: well duh, it wraps! If you're doing 255+1, you will just get 0 as a result. If you're doing 128*2, you'll just get 0. If you're doing 100+200 you'll just get 44.

While this answer is not wrong, it's not right either.

Yes, it's true that the most common central processing units we're used to nowadays use modular arithmetic, so that operations that would overflow n-bit words are simply computed modulo 2^n (which is easy to implement, since it just means discarding higher bits, optionally using some specific flag to denote that a carry got lost along the way).

However, this is not the only possibility. For example, specialized DSP (Digital Signal Processing) hardware normally operates with saturation arithmetic: overflowing values are clamped to the maximum representable value. 255+1 gives 255. 128*2 gives 255. 100+200 gives 255.
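To make the two behaviors concrete, here is a minimal C sketch of 8-bit addition and multiplication in both flavors. The function names are made up for illustration, and the saturating versions are a plain software emulation (compute in a wider type, then clamp), not what DSP hardware actually does internally.

```c
#include <stdint.h>

/* Modular (wrapping) addition: only the low 8 bits of the sum survive. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Saturating addition: compute in a wider type, clamp to 255 on overflow. */
uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* Modular multiplication: again, the high bits are simply discarded. */
uint8_t mul_wrap_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a * b);
}

/* Saturating multiplication: clamp the wide product to 255. */
uint8_t mul_sat_u8(uint8_t a, uint8_t b) {
    unsigned prod = (unsigned)a * b;
    return prod > UINT8_MAX ? UINT8_MAX : (uint8_t)prod;
}
```

With these, add_wrap_u8(100, 200) gives 44 while add_sat_u8(100, 200) gives 255, matching the examples in the text.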

Programmers used to the standard modular arithmetic can find saturation arithmetic ‘odd’ or ‘irrational’ or ‘misbehaving’. In particular, in saturation arithmetic (algebraic) addition is not associative, and multiplication does not distribute over (algebraic) addition.

Sticking to our 8-bit case, for example, with saturation arithmetic (100 + 200) - 100 results in 255 - 100 = 155, while 100 + (200 - 100) results in 100 + 100 = 200, which is the correct result. Similarly, still with saturation arithmetic, (200*2) - (100*2) results in 255 - 200 = 55, while (200 - 100)*2 results in 100*2 = 200. By contrast, with modular arithmetic, both expressions in each case give the correct result.
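The expressions above can be checked with a small C sketch. The helpers below are hypothetical names for a software emulation of 8-bit saturating and modular operations; note that subtraction saturates at 0 just like addition saturates at 255.

```c
#include <stdint.h>

/* Clamp an intermediate (wide) result to the 8-bit range [0, 255]. */
static uint8_t clamp_u8(int v) {
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Saturating operations: every intermediate result is clamped. */
uint8_t sat_add(uint8_t a, uint8_t b) { return clamp_u8(a + b); }
uint8_t sat_sub(uint8_t a, uint8_t b) { return clamp_u8(a - b); }
uint8_t sat_mul(uint8_t a, uint8_t b) { return clamp_u8(a * b); }

/* Modular operations: every intermediate result is reduced modulo 256. */
uint8_t mod_add(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }
uint8_t mod_sub(uint8_t a, uint8_t b) { return (uint8_t)(a - b); }
uint8_t mod_mul(uint8_t a, uint8_t b) { return (uint8_t)(a * b); }
```

For instance, sat_sub(sat_add(100, 200), 100) yields 155, while mod_sub(mod_add(100, 200), 100) yields 200 even though the intermediate sum wrapped to 44.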

So, when the final result is representable, modular arithmetic gives the correct result in the case of a static sequence of operations. However, when the final result is not representable, saturation arithmetic returns values that are closer to the correct one than modular arithmetic: 300 is clamped to 255, in contrast to the severely underestimated 44.

Being as close as possible to the correct results is an extremely important property not just for the final result, but also for intermediate results, particularly in the cases where the sequence of operations is not static, but depends on the magnitude of the values (for example, software implementations of low- or high-pass filters).

In these applications (of which DSP, be it audio, video or image processing, is probably the most important one) both modular and saturation arithmetic might give the wrong result, but the modular result will usually be significantly worse than that obtained by saturation. For example, modular arithmetic might miscompute a frequency of 300Hz as 44Hz instead of 255Hz, and with a threshold of 100Hz this would lead to attenuation of a signal that should have passed unchanged, or conversely. Amplifying an audio signal beyond the representable values could result in silence with modular arithmetic, but it will just produce the loudest possible sound with saturation.

We mentioned that promotion of unsigned values to wider data types is trivial. What about demotion? For example, knowing that your original values are stored as 8-bit bytes and that the final result has to be again stored as an 8-bit byte, a programmer might consider operating with 16-bit (or wider) words to (try and) prevent overflow during computations. However, when the final result has to be demoted again to an 8-bit byte, a choice has to be made, again: should we just discard the higher bits (which is what modular arithmetic does), or return the highest representable value when any higher bits are set (which is what saturation arithmetic does)? Again, this is a choice for which there is no “correct” answer, but only answers that depend on the application.
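As a rough illustration of the two demotion strategies, assuming a 16-bit intermediate value that has to go back into an 8-bit byte (the function names are mine):

```c
#include <stdint.h>

/* Modular demotion: keep the low 8 bits, discard the rest. */
uint8_t demote_wrap(uint16_t v) {
    return (uint8_t)(v & 0xFF);
}

/* Saturating demotion: if any higher bit is set, return the maximum. */
uint8_t demote_sat(uint16_t v) {
    return v > 0xFF ? 0xFF : (uint8_t)v;
}
```

Both agree whenever the value fits (200 stays 200), and differ only when it doesn't: 300 becomes 44 with demote_wrap and 255 with demote_sat.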

To conclude, the behavior that programmers used to standard modular arithmetic might find ‘wrong’ is actually preferable in some applications (which is why it has been supported in hardware in the multimedia and vector extensions (MMX and onwards) of the x86 architecture).

Thou shalt not overflow

Of course, the real problem in the examples presented in the previous section is that the data type used (e.g. 8-bit unsigned integers) was unable to represent intermediate or final results.

One of the most important things programmers should consider, maybe the most important, when discussing doing math on the computer, is precisely choosing the correct data type.

For integers, this means choosing a data type that can represent correctly not only the starting values and the final results, but also the intermediate values. If your data fits in 8 bits, then you want to use at least 16 bits. If it fits in 16 bits (but not 8), then you want to use at least 32, and so on.
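A toy example of why the intermediate values matter: averaging two bytes. This is only a sketch with made-up function names; the ‘narrow’ version forces the intermediate sum back into 8 bits on purpose, to mimic hardware or code that never leaves the 8-bit type.

```c
#include <stdint.h>

/* Intermediate sum kept in 8 bits: it may wrap before the division. */
uint8_t average_narrow(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b);
    return sum / 2;
}

/* Intermediate sum kept in 16 bits: it always fits (at most 510). */
uint8_t average_wide(uint8_t a, uint8_t b) {
    uint16_t sum = (uint16_t)(a + b);
    return (uint8_t)(sum / 2);
}
```

The inputs and the final result both fit in a byte, but average_narrow(200, 100) returns 22 (the sum wrapped to 44), whereas average_wide(200, 100) returns the expected 150.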

Having a good understanding of the possible behaviors in case of overflow is extremely important to write robust code, but the main point is that you should not overflow.

Relative numbers: welcome to hell

In case you are still of the opinion that integer math is easy, don't worry. We still haven't gotten into the best part, which is how to deal with relative numbers, or, as the layman would call them, signed integers.

As we mentioned above, the ‘natural’ interpretation of n bits is to read them as natural, non-negative, unsigned integers, ranging from 0 to 2^n-1. However, let's be honest here, non-negative integers are pretty limiting. We would at least like to have the possibility to also specify negative numbers. And here the fun starts.

Although there is no official universal standard for the representation of relative numbers (signed integers) on computers, there is undoubtedly a dominating convention, which is the one programmers are nowadays used to: two's complement. However, this is just one of many (no less than four) possible representations:

  • sign bit and mantissa;
  • ones' complement;
  • two's complement;
  • offset binary aka biased representation.

Symmetry, zeroes and self-negatives

One of the issues with the representation of signed integers in binary computers is that binary words can always represent an even number of values, but a symmetrical amount of positive and negative integers, plus the value 0, is odd. Hence, when choosing the representation, one has to choose between either:

  • having one (usually negative) non-zero number with no representable opposite, or
  • having two representations of the value zero (essentially, positive and negative zero).

Of the four signed number representations enumerated above, the sign bit and ones' complement representations have a signed zero, but each non-zero number has a representable opposite, while two's complement and bias only have one value for zero, but have at least one non-zero number that has no representable opposite. (Offset binary is actually very generic and can have significant asymmetries in the ranges of representable numbers.)

Having a negative zero

The biggest issue with having a negative zero is that it violates a commonly held assumption, which is that there is a bijective correspondence between representable numerical values and their representation, since both positive and negative 0 have the same numerical value (0) but have distinct bit patterns.

Where this presents the biggest issue is in the comparison of two words. When comparing words for equality, we are now posed a conundrum: should they be compared by their value, or should they be compared by their representation? If a = -0, would a satisfy a == 0? Would it satisfy a < 0? Would it satisfy both? The obvious answer would be that +0 and -0 should compare equal (and just that), but how do you tell them apart then? Is it even worth it being able to tell them apart?

And finally, is the symmetry worth the loss of a representable value? (2^n bit patterns, but two of them have the same value, so e.g. with 8-bit bytes we have 256 patterns to represent 255 values instead of the usual 256.)

Having non-symmetric opposites

On the other hand, if we want to keep the bijectivity between value and representation, we will lose the symmetry of negation. This means, in particular, that knowing that a number a satisfies a < 0 we cannot deduce that -a > 0, or conversely, depending on whether the value with no opposite is positive or negative.

Consider for example the case of the standard two's complement representation in the case of 8-bit bytes: the largest representable positive value is 127, while the largest (in magnitude) representable negative value is -128. When computing opposites, all values between -127 and 127 have their opposite (which is the one we would expect algebraically), but negating -128 gives (again) -128 which, while algebraically wrong, is at least consistent with modular arithmetic, where adding -128 and -128 actually gives 0.

A brief exposition of the representations

Let's now see the representations in some more detail.

Sign bit and mantissa representation

The conceptually simplest approach to represent signed integers, given a fixed number of digits, is to reserve one bit to indicate the sign, and leave the other n-1 bits to indicate the mantissa, i.e. the magnitude or absolute value of the number. By convention, the sign bit is usually taken to be the most significant bit, and (again by convention) it is taken as 0 to indicate a positive number and 1 to indicate a negative number.

With this representation, two opposite values have the same representation except for the most significant bit. So, for example, assuming our usual 8-bit byte, 1 would be represented as 00000001, while -1 would be represented as 10000001.

In this representation, the highest positive value that can be represented with n bits is 2^{n-1} - 1, and the lowest (largest in magnitude) negative value that can be represented is its opposite. For example, with an 8-bit byte the largest positive integer is 127, i.e. 01111111, and the largest (in magnitude) negative integer is its opposite -127, i.e. 11111111.

As mentioned, one of the downsides of this representation is that it has both positive and negative zero, respectively represented by the 00000000 and 10000000 bit patterns.

While the sign bit and mantissa representation is conceptually obvious, its hardware implementation is more cumbersome than it might seem at first glance, since operations need to explicitly take the operands' signs into account. Similarly, sign-extension (for example, promoting an 8-bit byte to a 16-bit word preserving the numerical value) needs to ‘clear up’ the sign bit in the smaller-size representation before replicating it as the sign bit of the larger-size representation.
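A minimal C sketch of this representation for 8-bit bytes, with hypothetical helper names, might look as follows. The encoder assumes its argument is in the representable range [-127, 127] and always produces positive zero.

```c
#include <stdint.h>

/* Encode a value in [-127, 127] as sign bit + 7-bit magnitude. */
uint8_t sm_encode(int value) {
    return value < 0 ? (uint8_t)(0x80 | -value) : (uint8_t)value;
}

/* Decode: the low 7 bits are the magnitude, the top bit is the sign.
 * Both 00000000 and 10000000 decode to 0. */
int sm_decode(uint8_t bits) {
    int magnitude = bits & 0x7F;
    return (bits & 0x80) ? -magnitude : magnitude;
}

/* Promote to 16 bits: the sign bit moves to the new top position,
 * the magnitude stays in the low bits, and the old sign position is cleared. */
uint16_t sm_extend(uint8_t bits) {
    return (uint16_t)(((bits & 0x80) << 8) | (bits & 0x7F));
}
```

So sm_encode(-1) is 10000001, and sm_extend turns it into 1000000000000001 rather than just padding on the left.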

Ones' complement representation

A more efficient approach is offered by ones' complement representation, where negation maps to ones' complement, i.e. bit-flipping: the opposite of any given number is obtained as the bitwise NOT operation of the representation of the original value. For example, with 8-bit bytes, the value 1 is as usual represented as 00000001, while -1 is represented as 11111110.

The range of representable numbers is the same as in the sign bit and mantissa representation, so that, for example, 8-bit bytes range from -127 (10000000) to 127 (01111111), and we have both positive zero (00000000) and negative zero (11111111).

(Algebraic) addition in modular arithmetic with this representation is trivial to implement in hardware, with the only caveat that carries and borrows ‘wrap around’.

As in the sign-bit case, it is possible to tell if a number is positive or negative by looking at the most-significant bit, and 0 indicates a positive number, while 1 indicates a negative number (whose absolute value can then be obtained by flipping all the bits). Sign-extending a value can be done by simply propagating the sign bit of the smaller-size representation to all the additional bits in the larger-size representation.
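Here is a small C emulation of 8-bit ones' complement, again with helper names of my own choosing. The addition implements the ‘wrap around’ of the carry (often called end-around carry) on top of ordinary unsigned arithmetic.

```c
#include <stdint.h>

/* Negation is just a bitwise NOT. */
uint8_t oc_negate(uint8_t bits) {
    return (uint8_t)~bits;
}

/* Decode: if the top bit is set, flip all bits to get the magnitude. */
int oc_decode(uint8_t bits) {
    return (bits & 0x80) ? -(int)(uint8_t)~bits : (int)bits;
}

/* Addition: the carry out of the top bit is added back at the bottom. */
uint8_t oc_add(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + b;
    sum = (sum & 0xFF) + (sum >> 8);
    return (uint8_t)sum;
}

/* Sign extension to 16 bits: replicate the top bit into the new bits. */
uint16_t oc_extend(uint8_t bits) {
    return (bits & 0x80) ? (uint16_t)(0xFF00 | bits) : bits;
}
```

Note that oc_add(0x01, 0xFE), i.e. 1 + (-1), produces 11111111: negative zero.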

Two's complement

While ones' complement representation is practical and relatively easy to implement in hardware, it is not the simplest, and it's afflicted by the infamous ‘negative zero’ issue.

Because of this, two's complement representation, which is simpler to implement and has no negative zero, has gained much wider adoption. It also has the benefit of ‘integrating’ rather well with the equally common modular arithmetic.

In two's complement representation, the opposite of an n-bit value is obtained by subtracting it from 2^n, or, equivalently, by flipping the bits and then adding 1, discarding any carries beyond the n-th bit. Using our usual 8-bit bytes as example, 1 will as usual be 00000001, while -1 will be 11111111.

The largest positive representable number with n bits is still 2^{n-1}-1, but the largest (in magnitude) negative representable number is now -2^{n-1}, and it's represented by a high-bit set to 1 and all other bits set to 0. For example, with 8-bit bytes the largest positive number is 127, represented by 01111111, whose opposite -127 is represented by 10000001, while the largest (in magnitude) negative number is -128, represented by 10000000.

In two's complement representation, there is no negative zero and the only representation for 0 is given by all bits set to 0. However, as discussed earlier, this leads to a negative value whose opposite is the value itself, since the representation of the largest (in magnitude) negative representable number is invariant under two's complement.

As in the other two representations, the most significant bit can be checked to see if a number is positive or negative. As in the ones' complement case, sign-extension is done trivially by propagating the sign bit of the smaller-size value to all other bits of the larger-size value.
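The same kind of sketch for 8-bit two's complement, working on raw bit patterns stored in unsigned bytes so that nothing depends on how the C implementation itself represents signed integers (the function names are illustrative):

```c
#include <stdint.h>

/* Negation: flip the bits, add 1, discard the carry beyond bit 8. */
uint8_t tc_negate(uint8_t bits) {
    return (uint8_t)(~bits + 1u);
}

/* Decode: patterns with the top bit set represent (pattern - 256). */
int tc_decode(uint8_t bits) {
    return (bits & 0x80) ? (int)bits - 256 : (int)bits;
}

/* Sign extension to 16 bits: replicate the top bit into the new bits. */
uint16_t tc_extend(uint8_t bits) {
    return (bits & 0x80) ? (uint16_t)(0xFF00 | bits) : bits;
}
```

In particular tc_negate(0x80) returns 0x80 again: -128 is its own ‘opposite’, as discussed above.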

Offset binary

Offset binary (or biased representation) is quite different from the other representations, but it has some very useful properties that have led to its adoption in a number of schemes (most notably the IEEE-754 standard for floating-point representation, where it's used to encode the exponent, and some DSP systems).

Before getting into the technical details of offset binary, we look at a possible motivation for its inception. The attentive reader will have noticed that all the previously mentioned representations of signed integers have one interesting property in common: they violate the natural ordering of the representations.

Since the most significant bit is taken as the sign bit, and negative numbers have a most significant bit set to one, natural ordering (by bit patterns) puts them after the positive numbers, whose most significant bit is set to 0. Additionally, in the sign bit and mantissa representation, the ordering of negative numbers is reversed with respect to the natural ordering of their representation. This means that when comparing numbers it is important to know if they are signed or unsigned (and if signed, which representation) to get the ordering right. The biased representation is one way (and probably the most straightforward way) to circumvent this.

The basic idea in biased representation or offset binary is to ‘shift’ the numerical value of all representations by a given amount (the bias or offset), so that the smallest natural representation (all bits 0) actually evaluates to the smallest representable number, and the largest natural representation (all bits 1) evaluates to the largest representable number.

The bias is the value that is added to the (representable) value to obtain the representation, and subtracted from the representation to obtain the represented value. The minimum representable number is then the opposite of the bias. Of course, the range of representable numbers doesn't change: if your data type can only represent 256 values, you can only choose which 256 values, as long as they are consecutive integers.

The bias in an offset binary representation can be chosen arbitrarily, but there is a ‘natural’ choice for n-bit words, which is 2^{n-1}: halfway through the natural representation. For example, with 8-bit bytes (256 values) the natural choice for the bias is 128, leading to a representable range of integers from -128 to 127, which looks distinctly similar to the one that can be expressed in two's complement representation.

In fact, the 2^{n-1} bias leads to a representation which is equivalent to the two's complement representation, except for a flipped sign bit, solving the famous signed versus unsigned comparison issue mentioned at the beginning of this subsection.

As an example, consider the usual 8-bit bytes with a bias of 128: then, the numerical values 1, 0 and -1 would be represented by the ‘natural’ representation of the values 129, 128 and 127 respectively, i.e. 10000001, 10000000 and 01111111: flipping the most significant bits, we get 00000001, 00000000 and 11111111 which are the two's complement representation of 1, 0 and -1.
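In C, the 8-bit case with the ‘natural’ bias of 128 can be sketched as follows. The names are hypothetical, and the encoder assumes its argument lies in [-128, 127].

```c
#include <stdint.h>

enum { BIAS = 128 };  /* the 'natural' bias 2^(n-1) for n = 8 */

/* Representation = value + bias. */
uint8_t ob_encode(int value) {
    return (uint8_t)(value + BIAS);
}

/* Value = representation - bias. */
int ob_decode(uint8_t bits) {
    return (int)bits - BIAS;
}

/* With the natural bias, flipping the top bit gives two's complement. */
uint8_t ob_to_twos(uint8_t bits) {
    return bits ^ 0x80;
}
```

Since the encoding is just a shift of the whole range, comparing two encoded bytes as plain unsigned integers gives the same ordering as comparing the values they represent: ob_encode(-1) < ob_encode(1).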

Of course, the ‘natural’ bias is not the only option: it is possible to have arbitrary offsets, which makes offset binary extremely useful in applications where the range of possible values is strongly asymmetrical around zero, or where it is far from zero. Of course, such arbitrary biases are rarely supported in hardware, so operation on offset binary usually requires software implementations of even the most common operations, with a consequent performance hit. Still, assuming the hardware uses modular arithmetic, offset binary is at least trivial to implement for the basic operations.

One situation in which offset binary doesn't play particularly well is that of sign-extension, which was trivial in ones' and two's complement representations. The biggest issue in the case of offset binary is, obviously, that the offsets in the smaller and larger data types are likely going to be different, although usually not arbitrarily different (biases are often related to the size of the data type).

At least in the case of the ‘natural’ bias (in both the smaller and larger data types), sign extension can be implemented straightforwardly by going through the two's complement equivalent representation: flip the most significant bit of the smaller data type, propagate it to all the remaining bits of the larger data type, and then flip the most significant bit of the larger data type. (In other words: convert to two's complement, sign extend that, convert back to offset binary with the ‘natural’ bias.)
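The three steps just described translate almost literally into C. This sketch assumes the natural bias in both types (128 for the 8-bit source, 32768 for the 16-bit destination); ob_extend is a made-up name.

```c
#include <stdint.h>

/* Promote an 8-bit offset-binary value (bias 128) to a 16-bit one
 * (bias 32768), preserving the represented value. */
uint16_t ob_extend(uint8_t bits) {
    /* 1. flip the top bit: offset binary -> two's complement */
    uint8_t twos = bits ^ 0x80;
    /* 2. sign-extend the two's complement value */
    uint16_t wide = (twos & 0x80) ? (uint16_t)(0xFF00 | twos) : twos;
    /* 3. flip the new top bit: two's complement -> offset binary */
    return wide ^ 0x8000;
}
```

For example 01111111 (which is -1 with bias 128) becomes 0111111111111111 (which is -1 with bias 32768). Arithmetically this is the same as adding the difference between the two biases, 32768 - 128, to the unsigned reading of the byte.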

What does a bit pattern mean?

We're now nearing the end of our discussion on integer math on the computers. Before getting into the messy details of the first common non-integer operation (division), I would like to ask the following question: what do you get if you do 10100101 + 01111111?
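One way to explore this question is to decode the two patterns under each of the representations discussed so far and look at the values being added. The following C sketch does just that; the function names are mine, offset binary is assumed to use the natural bias of 128, and whether the sum is then representable (and with which pattern) is a separate matter.

```c
#include <stdint.h>

/* Read the same 8-bit pattern under five different conventions. */
int as_unsigned(uint8_t b)        { return b; }
int as_sign_magnitude(uint8_t b)  { return (b & 0x80) ? -(b & 0x7F) : b; }
int as_ones_complement(uint8_t b) { return (b & 0x80) ? -(int)(uint8_t)~b : b; }
int as_twos_complement(uint8_t b) { return (b & 0x80) ? b - 256 : b; }
int as_offset_binary(uint8_t b)   { return b - 128; }
```

With a = 10100101 and b = 01111111, the operands read as 165 and 127 (unsigned), -37 and 127 (sign bit and mantissa), -90 and 127 (ones' complement), -91 and 127 (two's complement), 37 and -1 (offset binary), so the mathematical sum is 292, 90, 37, 36 or 36 depending on the convention.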

Divide and despair

To conclude our exposition of the joys of integer math on the computers, we now discuss the beauty of integer division and the related modulus operation.

Since division of the integer e by the integer o only gives an integer (mathematically) if e is a multiple of o, the concept of ‘integer division’ has arisen in computer science as a way to obtain an integer d from e/o even when o does not divide e.

The simple case

Let's start by assuming that e is non-negative and o is (strictly) positive. In this case, integer division gives the largest integer d such that d*o ≤ e. In other words, the result of the division of e by o is truncated, or ‘approximated by defect’, however small the remainder might be: 3/5=0 and 5/3=1 with integer division, even though in the latter case we would likely have preferred a value of 2 (think of 2047/1024, for example).

The upside of this choice is that it's trivial to implement other forms of division (that round up, or to the nearest number, for example), by simply adding appropriate correcting factors to the dividend. For example, round-up division is achieved by adding the divisor diminished by a unit to the dividend: integer division (e + o - 1)/o will give you e/o, rounded up: (3+5-1)/5 = 7/5 = 1, and (5 + 3 - 1)/3 = 7/3 = 2.
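In C, for unsigned operands, the three variants can be sketched like this. The names are illustrative, and the code assumes that o is non-zero and that adding the correction to e does not itself overflow.

```c
/* Plain integer division: truncates, i.e. rounds down for non-negative operands. */
unsigned div_down(unsigned e, unsigned o) {
    return e / o;
}

/* Round up: add (divisor - 1) to the dividend before dividing. */
unsigned div_up(unsigned e, unsigned o) {
    return (e + o - 1) / o;
}

/* Round to nearest (halves round up): add half the divisor first. */
unsigned div_nearest(unsigned e, unsigned o) {
    return (e + o / 2) / o;
}
```

So div_down(2047, 1024) is 1, while div_up(2047, 1024) and div_nearest(2047, 1024) both give 2.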

Division by zero

What happens when o is zero? Mathematically, division by zero is not defined (although in some contexts where infinity is considered a valid value, it may give infinity as a result, as long as the dividend is non-zero). In hardware, anything can happen.

There's hardware that flags the error. There's hardware that produces bogus results without any chance of knowing that a division by zero happened. There's hardware that produces consistent results (always zero, or the maximum representable value), flagging or not flagging the situation.

‘Luckily’, most programming languages always treat a division by zero as an exception, which by default causes a program termination. Of course, this means that to write robust code it's necessary to sprinkle the code with conditionals to check that divisions will successfully complete.

Negative numbers

If the undefined division by zero may not be considered a big issue per se, the situation is much more interesting when either of the operands of the division is a negative number.

First of all, one would be led to think that at least the sign of the result would be well defined: negative if the operands have opposite sign, positive otherwise. But this is not the case for the widespread two's complement representation with modular arithmetic, where the division of two negative numbers can give a negative number: of course, we're talking about the corner case of the largest (in magnitude) negative number, which when divided by -1 returns itself, since its opposite is not representable.
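In C both problematic cases (a zero divisor, and the most negative value divided by -1) have undefined behavior, so robust code has to check for them before dividing. A possible guard, with a function name and error convention of my own invention:

```c
#include <limits.h>
#include <stdbool.h>

/* Divide e by o, storing the quotient in *result.
 * Returns false (leaving *result untouched) when the quotient is
 * undefined or not representable. */
bool checked_div(int e, int o, int *result) {
    if (o == 0)
        return false;             /* division by zero */
    if (e == INT_MIN && o == -1)
        return false;             /* -INT_MIN is not representable */
    *result = e / o;
    return true;
}
```

The caller then decides what to do on failure (report an error, saturate, and so on), which is exactly the kind of application-dependent choice discussed for overflow.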

But even when the sign is correct, the result of integer division is not uniquely determined: some implementations round down, so that -7/5 = -2, while others round towards zero, so that -7/5 = -1: both choices are consistent with the positive integer division, but the results are obviously different, which can introduce subtle but annoying bugs when porting code across different languages or hardware.
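C (since C99) rounds towards zero, so a rounding-down division has to be built on top of it. A sketch, with illustrative names, assuming o is non-zero and the quotient is representable:

```c
/* Truncating division: what the C '/' operator does since C99. */
int div_trunc(int e, int o) {
    return e / o;
}

/* Flooring division: like truncation, except that inexact
 * quotients with operands of opposite sign are decremented. */
int div_floor(int e, int o) {
    int d = e / o;
    if ((e % o != 0) && ((e < 0) != (o < 0)))
        d -= 1;
    return d;
}
```

Here div_trunc(-7, 5) is -1 and div_floor(-7, 5) is -2, while both agree on 7/5 = 1.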


The modulo operation is perfectly well defined for positive integers, as the remainder of (integer) division: the quotient d and the remainder r of (integer) division e/o are (non-negative) integers such that e = o*d + r and r < o.

Does the same hold true when either e or o is negative? It depends on the convention adopted by the language and/or hardware. While for negative integer division there are ‘only’ two standards, for the modulo operation there are three:

  • a result with the sign of the dividend;
  • a result with the sign of the divisor;
  • a result that is always non-negative.

In the first two cases, what it means is that, for example, -3 % 5 will have the opposite sign of 3 % -5; hence, if one would satisfy the quotient/remainder equation (which depends on whether integer division rounds down or towards zero), the other obviously won't. In the third case, the equation would only be satisfied if the division rounds down, but not if the division rounds towards zero.

This could lead someone to think that the best choice would be a rounding-down division with an always non-negative modulo. Too bad that rounding-down division suffers from the problem that -(e/o) ≠ (-e)/o.
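The three conventions can be written in C on top of the built-in % operator, which (since C99) gives a result with the sign of the dividend. The function names are made up, and all three assume a non-zero divisor.

```c
/* Sign of the dividend: the native C99 behavior. */
int mod_trunc(int e, int o) {
    return e % o;
}

/* Sign of the divisor: shift non-zero results of the 'wrong' sign by o. */
int mod_floor(int e, int o) {
    int r = e % o;
    if (r != 0 && ((r < 0) != (o < 0)))
        r += o;
    return r;
}

/* Always non-negative: shift negative results by |o|. */
int mod_euclid(int e, int o) {
    int r = e % o;
    if (r < 0)
        r += (o < 0) ? -o : o;
    return r;
}
```

For -3 and 5 the three functions return -3, 2 and 2; for 3 and -5 they return 3, -2 and 3.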


Integer math on a computer is simple only as far as you never think about dealing with corner cases, which you should if you want to write robust, reliable code. With integer math, this is the minimum of what you should be aware of:

Rounding modes in OpenCL

Introduction (history lost)

OpenCL 1.0 supported an OPENCL SELECT_ROUNDING_MODE pragma in device code, which allowed selection of the rounding mode to be used in a section of a kernel. The pragma was only available after enabling the cl_khr_select_fprounding_mode extension. Support for this extension and the related pragma(s) has been removed from subsequent versions of the standard, with the result that there is no way at all in the current OpenCL standard to have specific parts of a kernel use rounding modes different from the default, except in the explicit type conversion functions with the relevant _rt* suffix.

A consequence of this is that it is currently completely impossible to implement robust numerical code in OpenCL.

In what follows I will explore some typical use cases where directed rounding is a powerful, sometimes essential tool for numerical analysis and scientific computing. This will be followed by a short survey of existing hardware and software support for directed rounding. The article ends with a discussion about what must, and what should, be included in OpenCL to ensure it can be used as a robust scientific programming language.

Why directed rounding is important

Rationale #1: assessing numerical trustworthiness of code

In his paper How Futile are Mindless Assessments of Roundoff in Floating-Point Computation, professor William Kahan (who helped design the IEEE-754 floating-point standard) explains that, given multiple formulas that would compute the same quantity, the fastest way to determine which formulas are numerically trustworthy is to:

Rerun each formula separately on its same input but with different directed roundings; the first one to exhibit hypersensitivity to roundoff is the first to suspect.

Further along in the same paper, Kahan adds (emphasis mine):

The goal of error-analysis is not to find errors but to fix them. They have to be found first. The embarrassing longevity, over three decades, of inaccurate and/or ugly programs to compute a function so widely used as ∠(X, Y) says something bleak about the difficulty of floating-point error-analysis for experts and nonexperts: Without adequate aids like redirected roundings, diagnosis and cure are becoming practically impossible. Our failure to find errors long suspected or known to exist is too demoralizing. We may just give up.

Essential tools for the error-analysis of scientific computing code cannot be implemented in OpenCL 1.1 or later (at least up to 2.0, the latest published specification) due to the impossibility of specifying the rounding direction.

Rationale #2: enforcing numerical correctness

Directed rounding is an important tool to ensure that arguments to functions with limited domain are computed in such a way that the conditions are respected numerically when they would be analytically. To clarify, in this section I'm talking about correctly rounding the argument of a function, not its result.

When the argument to such a function is computed through an expression (particularly if such an expression is ill-conditioned) whose result is close to one of the limits of the domain, the lack of correct rounding can cause the argument to be evaluated just outside of the domain instead of just inside (which would be the analytically correct answer). This would cause the result of the function to be Not-a-Number instead of the correct(ly rounded) answer.

Common functions for which the requirements might fail to be satisfied numerically include:


sqrt

when the argument would be a small, non-negative number; to write numerically robust code one would want the argument to sqrt to be computed such that the final result is rounded towards plus infinity;

inverse trigonometric functions (asin, acos, etc)

when the argument would be close to, but not greater than 1, or close to, but not less than -1; again, to write numerically robust code one would want the argument to be computed such that the final result is rounded towards zero.

A discussion on the importance of correct rounding can again be found in Kahan's works, see e.g. Why we needed a floating-point standard.

Robust coding of analytically correct formulas is impossible to achieve in OpenCL 1.1 or later (at least up to 2.0, the latest published specification) due to the lack of support for directed rounding.

Rationale #3: Interval Analysis

A typical example of a numerical method for which support for different rounding modes in different parts of the computation is needed is Interval Analysis (IA). Similar arguments hold for other forms of self-verified computing as well.

Briefly, in IA every (scalar) quantity q is represented by an interval whose extrema are (representable) real numbers l, u such that ‘the true value’ of q is guaranteed to satisfy l ≤ q ≤ u.

Operations on two intervals A = [al, au] and B = [bl, bu] must be conducted in such a way that the resulting interval can preserve this guarantee, and this in turn means that the lower extremum must be computed in rtn (round towards negative infinity) mode, while the upper extremum must be computed in rtp (round towards positive infinity) mode.

For example, assuming add_rtn and add_rtp represent additions that round in the suffix direction, we have that C = A + B could be computed as:

cl = add_rtn(al, bl);
cu = add_rtp(au, bu);

In OpenCL 1.0, add_rtn and add_rtp could be defined as:

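#pragma OPENCL SELECT_ROUNDING_MODE rtn
gentype add_rtn(gentype a, gentype b) {
 return a + b;
}
#pragma OPENCL SELECT_ROUNDING_MODE rtp
gentype add_rtp(gentype a, gentype b) {
 return a + b;
}
/* restore default */
#pragma OPENCL SELECT_ROUNDING_MODE rte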

The same functions could be implemented in C99, in FORTRAN, in MATLAB or even in CUDA (see below). In OpenCL 1.1 and later, this is impossible to achieve, even on hardware that supports rounding mode selection.

Applicative examples

From the rationales presented so far, one could deduce that directed rounding is essentially associated with the stability and robustness of numerical code. There are however other cases where directed rounding can be used, which are not explicitly associated with things such as roundoff errors and error bound estimation.

Rounding down for the neighbors list construction in particle methods

Consider for example an industrial application of mesh-less Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH).

In these numerical methods, the simulation domain is described by means of ‘particles’ free to move with respect to each other. The motion of these particles is typically determined by the interaction between the particle and its neighbors within a given influence sphere.

Checking for proximity between two particles is done by computing the length of the relative distance vector (differences of positions), and the same distance is often used in the actual computation of the influence between particles. As usual, to avoid bias, both the relative distance vector and its length should be computed with the default round-to-nearest-even rounding mode for normal operations.

To avoid searching for neighbors in the whole domain for every operation, implementations often keep a ‘neighbors list’ of each particle, constructed by checking the proximity of candidate particles once, and storing the indices of the particles that fall within the prescribed influence radius.

Due to the mesh-less nature of the method, neighborhoods may change at every time-step, requiring a rebuild of the neighbors list. To improve performance, this can be avoided by rebuilding the neighbors list at a lower frequency (e.g. every 10 time-steps), assuming (only in this phase) a larger influence radius, taking into account the maximum length that might be traveled by a particle in the given number of time-steps.

When such a strategy is adopted, neighbors need to be re-checked for actual proximity during normal operations, so that, for maximum efficiency, a delicate balance must be found between the reduced frequency and the increased number of potential neighbors caused by the enlarged influence radius.

One way to improve efficiency in this sense is to round towards zero the computation of the relative distance vector and its length during neighbors list construction: this maximizes the impact of the enlarged influence radius by including potential neighbors which are within one or two ULPs. This allows the use of very tight bounds on how much to enlarge the influence radius, without loss of correctness in the simulations.

Directed rounding support

Hardware support for directed rounding

x86-compatible CPUs support setting the rounding mode through the appropriate flags in the control registers (either the x87 control word for the FPU, or the MXCSR control register for SSE). Similarly, on ARM CPUs with support for the NEON or VFP instruction set, the rounding mode can be set with the appropriate flags in the FPSCR.

AMD GPUs also have support for rounding modes selection, with the granularity of an ALU clause. As documented in the corresponding reference manuals, the TeraScale 2 and TeraScale 3 architectures support setting the general rounding mode for ALU clauses via the SET_MODE instruction; Graphics Core Next (GCN) architectures can control the rounding mode by setting the appropriate bits in the MODE register via the S_SETREG instruction.

Additionally, the following hardware is capable of directed rounding at the instruction level:


CUDA-enabled GPUs

as documented in the CUDA C Programming Guide, Appendix D.2 (Intrinsic Functions), some intrinsic functions can be suffixed with one of _rn, _rz, _ru, _rd to explicitly set the rounding mode of the function;

CPUs with support for the AVX-512 instruction set

the EVEX prefix introduced with AVX-512 allows the rounding mode to be set explicitly for any given instruction, overriding the MXCSR control register, as documented in the Intel® Architecture Instruction Set Extensions Programming Reference, section 4.6.2: “Static Rounding Support in EVEX”.

Software support for directed rounding

At the software level, the processor's rounding mode can be accessed in C99 and C++11 by enabling the STDC FENV_ACCESS pragma and using fesetround() (and its counterpart fegetround()).

In MATLAB, the rounding mode can be selected by the system_dependent('setround', ·) command.

Some FORTRAN implementations also offer functions to get and set the current rounding mode (e.g. IBM's XL FORTRAN offers fpgets and fpsets).

CUDA C exposes the intrinsic functions of CUDA-enabled GPUs that support explicit rounding modes. So, for example, __fadd_ru(a, b) (resp. __fadd_rd(a, b)) can be used in CUDA C to obtain the single-precision sum of a and b rounded up (resp. down) without having to change the rounding mode of the whole GPU; __dadd_ru and __dadd_rd are the double-precision counterparts.

Even the GNU implementation of the text-processing language Awk has a method to set the rounding mode in floating-point operations, via the ROUNDMODE variable.

All in all, OpenCL (from 1.1 onward) seems to be the only language/API that does not support directed rounding.

What can be done for OpenCL

In their present state, OpenCL versions 1.1 to 2.0 lag behind C99, C++11, FORTRAN, MATLAB and CUDA (at the very least) by lacking support for directed rounding. This effectively prevents robust numerical code from being implemented and analyzed in OpenCL.

While I can understand that core support for directed rounding in OpenCL is a bit of a stretch, considering the wide range of hardware that supports the specification, I believe that the standard should provide an official extension to (re)introduce support for it. This could be done by re-instating the cl_khr_select_fprounding_mode extension, or through a different extension with better semantics (for example, modelled around the C99/C++11 STDC FENV_ACCESS pragma).

This is the minimum requirement to bring OpenCL C on par with C and C++ as a language for scientific computing.

Ideally (potentially through a different extension), it would be nice to also have explicit support for instruction-level rounding mode selection independently from the current rounding mode, with intrinsics similar to the ones that OpenCL already defines for the conversion functions. On supporting hardware, this would make it possible to implement even more efficient, yet still robust numerical code needing different rounding modes for separate subexpressions.


When it comes to the OpenCL programming model, it's important to specify the scope of application of state changes, of which the rounding mode is one. Given the use cases discussed above, we could say that the minimum requirement would be for OpenCL to support changing the rounding mode during kernel execution, for the whole launch grid, to a value known at (kernel) compile time.

So, it should be possible (when the appropriate extension is supported and enabled) to change rounding mode half-way through a kernel, along these lines:

kernel some_kern(...) {
    /* kernels start in some default rounding mode,
     * e.g. round-to-nearest-even. We do some calculations
     * in this default mode: */
    do_something ;

    /* now we change to some other mode. of course this
     * is just pseudo-syntax, with placeholder names: */
    change_rounding_mode(some_other_mode) ;

    /* and now we do more calculations, this time
     * with the new rounding mode enabled: */
    do_something_else ;
}
The minimum supported granularity would thus be the whole launch grid, as long as the rounding mode can be changed dynamically during kernel execution, to (any) value known at compile time.

Of course, a finer granularity and a more relaxed (i.e. runtime) selection of the rounding mode would be interesting additional features. These may be made optional, and the hardware capability in this regard could be queried through appropriate device properties.

For example, considering the standard execution model for OpenCL, with work-groups mapped to compute units, it might make sense to support a granularity at the work-group level. This would be a nice addition, since it would allow e.g. to concurrently run the same code with different rounding modes (one per work-group), which would benefit applications geared towards the analysis of the stability of numerical code (as discussed in Rationale #1). But it's not strictly necessary.

Our lives are short

Some time ago someone on FriendFeed asked if anybody (else) would celebrate their kid's 1000th day. Among the negative answers, someone remarked that they'd rather celebrate the 1024th. And as a hardened computer geek, that was my first thought as well.

But why stop there? Or rather: if your baby is young, why start there?

So we started collecting the other power-of-two dates (power-of-two-versaries[1]?) for our baby, and after asking WolframAlpha about a few, I set out to write a Ruby script to do the computations for me, and the first question that arose was: what's the highest (integer) power of two that should be considered?

Since the script wasn't ready yet, I asked WolframAlpha again, to quickly discover that the best we can do comes significantly short of 2^16 days, which is almost 180 years (179 years, 5 months, and a few days, with the actual number of days depending on how many leap years are covered by the timespan)[2].
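The arithmetic is easy to reproduce. The sketch below is not the Ruby script mentioned above, just a small C approximation of the same computation; it assumes an average Gregorian year of 365.2425 days, so the results are approximate and ignore which leap years actually fall in the timespan. The function names are made up for the occasion.

```c
#include <stdio.h>

/* Average length of a Gregorian year, in days (an approximation: the
 * real calendar span depends on the leap years actually covered). */
#define DAYS_PER_YEAR 365.2425

/* Number of days in the n-th power-of-two-versary (n < 32). */
unsigned long pow2_days(unsigned n)
{
    return 1UL << n;
}

/* Approximate age, in years, at the n-th power-of-two-versary. */
double pow2_years(unsigned n)
{
    return (double)pow2_days(n) / DAYS_PER_YEAR;
}

/* Prints all power-of-two-versaries up to exponent max_exp. */
void print_pow2_versaries(unsigned max_exp)
{
    for (unsigned n = 0; n <= max_exp; ++n)
        printf("2^%u = %lu days, about %.2f years\n",
               n, pow2_days(n), pow2_years(n));
}
```

Calling `print_pow2_versaries(16)` lists every candidate date, and makes it plain that the exponent 16 is out of reach.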

Now, as it happens, 16 bits is the (minimum) width of the short data type in the C family of programming languages; in fact, on most platforms and systems, 16 bits is exactly the width of the short data type.

As it turns out, our lives are indeed (unsigned) short.

  1. If anybody has a better name for them, please tell.

  2. Most modern-day human adults can aspire to hit three power-of-two-versaries at best: 2^13 (22 years, 5 months, and the usual bunch of days) as a young adult, 2^14 (44 years, 10 months and a week, give or take a day) and finally 2^15 (89 years, 8 months and a half), for the lucky ones.

Free versus Open: the Maps case


(or skip this and jump straight to the actual content)

‘Free’ is an interesting word in English, particularly when pertaining to software and services. Although the original meaning of the word pertained to the concept of freedom, it has gained extensive use as a short form for “free of charge”, i.e. gratis, without requiring any (direct, obvious) payment.

Now, while context can usually help clarify which meaning is intended in modern usage, there are situations in which one of them is intended but the other is understood. For example, the sentence “that slave is now free” can mean that they have attained freedom, or that they are being given away (to another slave owner).

A context where the ambiguity cannot be resolved automatically is that of software and services; in fact, the double meaning has been plaguing the free software movement since its inception, which is why the FLOSS acronym is now often used: Free/Libre Open Source Software. The Libre is there to clarify what that Free is supposed to mean.

Of course, free (as in freedom) software and services tend to also be free of charge (and thus gratis), which is what makes them appealing to the general public, which is not particularly interested in the ideology behind the software freedom movement.

And specifically to highlight how important the difference between the two concepts is, I'm going to discuss two distinct approaches to making worldwide maps available to the public: one which is free (as in gratis), but essentially closed, and the other which is free (as in libre), and open.

An introduction

Recently, Google has started offering “tourism boards, non-profit, government agencies, universities, research organizations or other entities interested” the possibility to borrow Google's Street View Trekker to help extend Google Maps' Street View.

To clarify, Google is “offering” these subjects the opportunity to expend the subjects' human resources to expand Google's own database. A company whose business is data gathering is giving other entities the “opportunity” to contribute to its data gathering efforts, for free[1].

In simpler words, this is a bank asking people to donate money to them, but spinning it as an opportunity.

Google Maps

An objection that I'm sure will be raised is that this is not really like a bank asking for monetary donations, because banks' services are not free (of charge). For example, they loan money at a cost (the interest rate). By contrast, Google's services (and particularly Google Maps and its Street View feature) are free (of charge).

The objection is deeply flawed by a misconception about what Google services are, a misconception driven by the inability to realize the difference between Google's consumers and Google's customers.

Google's consumer services (most publicly known Google services: Search, Mail, Maps, Picasa, now Plus) are as much a service as the cheese in a mouse trap is a meal. Their purpose is not to provide the service to the consumer, but to gather data about him or her.

Google is in the business of data gathering. Gathering data about people, gathering data about things people might be interested in. Selling this data to its customers (directly or indirectly) is what Google makes money off: the main income stream is advertisement, targeted advertisement that relies on your usage of Google's consumer services to see what you might be interested in. (And I'm not even getting into the NSA debacle because that would really steer us off topic.)

The key point is understanding who owns and controls the data, which in the case of Google Maps and Street View is Google. While the data is (currently) being made available back to Google's consumers free of charge, in post-processed form, that data remains solidly in the hands of Google, which may choose to use it (or not) as it sees fit.

To the potential question “why would Google ever not make the data accessible?” (through its consumer services), the correct answer is, sadly, why not.

Google is in fact (in)famous for discontinuing services now and then, the most recent one being its Reader feed aggregator, the upcoming one being the personalized iGoogle page. But there is in fact one service that was discontinued so silently most people even failed to notice it got discontinued: wireless geolocation.

Google and wireless geolocation

Wireless geolocation is the computation of the location of a wireless device based on which wireless transmitters (of known location) it sees and the strength of the signal. This is typically used with cellphones, for example, based on which cell towers they see and the respective signal strength. It can be used with WiFi devices (laptops, smartphones, tablets, whatever) based on which wireless routers are visible —provided you know the position of the routers.
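To make the idea concrete, here is a deliberately naive C sketch that estimates a position as the signal-weighted centroid of the visible transmitters. This is only one simple textbook approach, assumed here for illustration; it is not a description of how Google's (or anybody else's) service actually works. The `sighting` and `location` types and the `estimate_location` function are hypothetical, signal strength is assumed to be given on a linear power scale, and averaging latitudes and longitudes is only reasonable over small areas away from the poles and the antimeridian.

```c
#include <stddef.h>

/* One visible transmitter: its known position and how strongly we see it. */
typedef struct {
    double lat, lon;    /* known transmitter position, in degrees */
    double signal;      /* received power on a linear scale (e.g. mW);
                         * a dBm reading would need converting first */
} sighting;

typedef struct { double lat, lon; } location;

/* Estimates the device position as the centroid of the transmitters,
 * weighted by received signal power: stronger signals pull the estimate
 * closer. Returns 0 on success, -1 if no usable sighting is available. */
int estimate_location(const sighting *seen, size_t count, location *out)
{
    double weight_sum = 0.0, lat = 0.0, lon = 0.0;

    if (seen == NULL || out == NULL)
        return -1;

    for (size_t i = 0; i < count; ++i) {
        if (seen[i].signal <= 0.0)
            continue;   /* ignore transmitters with no measurable signal */
        weight_sum += seen[i].signal;
        lat += seen[i].signal * seen[i].lat;
        lon += seen[i].signal * seen[i].lon;
    }

    if (weight_sum <= 0.0)
        return -1;

    out->lat = lat / weight_sum;
    out->lon = lon / weight_sum;
    return 0;
}
```

The important point for the discussion is the input: the estimate is only possible if somebody holds a database of transmitter positions.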

Now, as it happens, when Google was driving around in their Google Street View Cars snapping pictures for their Street View consumer service, they were also gathering information about wireless routers. The data thus gathered could be used by another Google consumer service, the geolocation API: you could ask Google “where am I if I see such and such routers with such and such signal strengths?” and Google would provide you with an approximate latitude and longitude.

(And let's skip the part where Google was collecting much more than the information needed for geolocation, something that got them in trouble in Germany, although it only led to a ridiculously low fine.)

The wireless geolocation service was available to the public, but that access has been discontinued since 2011, and access to a similar service is only available to businesses (essentially to allow Android to keep using it), with stricter controls on its access. So Google still has the data, it still uses it, but the services based on it are not available to the general public anymore. What would you think if something like this happened to data you were “offered” to contribute?

User contributions

In fact, Google's interest in external contributions is not limited to the recent offer to use a Trekker: Google Maps now has a number of options to allow users to contribute, ranging from changes and fixes to the map itself, to geotagged panoramic photos publicly shared on Picasa (which can be used for Street View).

I suspect that Google has learned from the experience of OpenStreetMap (which I will discuss later on) how powerful ‘crowdsourcing’ can be, while requiring far fewer resources on the company's side.

So you can contribute to make Google Maps better. The question is: should you? Or rather, would you? If you're willing to spend whatever small amount of time to contribute to global mapping, why would you do it for a company for which this is business?

OpenStreetMap

OpenStreetMap (Wikipedia article) was born in 2004 in the UK with the aim of providing a free (as in freedom) map of the world.

It's important to note right from the start the huge difference between how OSM is free versus how Google Maps is free: the latter provides a service that is available to consumers free of charge, the former provides mapping data which is not only available free of charge to anybody, but the use of which is also subject to very few restrictions (the actual license is the Open DataBase License, which, as explained here, essentially allows anyone to access, modify and make derivatives of the data, provided proper attribution is given and derivatives are shared with the same liberal terms).

So there are two distinctive differences.

The first difference pertains to what is made available. Google Maps provides a front-end to Google's data: the visualization (and related services), available mostly through their website and smartphone applications. By contrast, OpenStreetMap provides the actual mapping data, although a slippy map to access it in human-usable form is also provided on the website.

The second difference pertains to the terms of use of what is made available. Although Google allows (in fact, encourages) embedding of the maps in other websites (free of charge within certain limits), the Terms of Service are otherwise pretty restrictive. By contrast, the license under which OSM data is made available is quite liberal, in that it only prevents misappropriation of the data, or the imposing of further restrictions on its use (I debate the paradox of restricting restrictions elsewhere in Italian).

OpenStreetMap, as any other collaborative effort for a knowledge base, is such that it benefits from anybody's contribution, but in perfect reciprocity anybody can benefit from it. This is in contrast to situations (such as that of Google Maps) where there is one main entity with dominant interests and control on the data (Google benefits from user contributions, and Google again benefits from consumers using its services, and it can arbitrarily limit their use by third parties).

There are commercial interests in OpenStreetMap. While some are essentially unidirectional (Apple, for example, used OSM data in its photo application for the iPhone —at first without attribution, thereby actually violating the license), others try to build a two-way relationship.

For example, at Flickr they use OSM data for (some of) their maps, and they also introduced OSM-related machine tags that can be used to associate photos to the places they were taken at. Yahoo (the company that owns Flickr) and Microsoft allow usage of their satellite and aerial photos for ‘armchair mapping’ (more on this later). MapQuest (formerly, the mapping website) has an Open alternative that relies on OSM, and they have contributed to the open-source software that drives OpenStreetMap (the renderer, the geocoding and search engine, the online editor), and they have funded the improvement of the actual data.

In some nations, OSM data comes (partially) from government sources, either directly (government-sponsored contributions) or indirectly (through volunteer work from government data). In some ways, it's actually surprising that governments and local administrations are not more involved in the project.

Considering that OSM contribution is essentially voluntary, the amount of information that has been added is actually amazing. Of course, there are large inhomogeneities: places that are mapped to an incredible detail, others where even the most basic information is missing: this site maps the density of information present throughout the world, showing this discrepancy in a spectacular fashion.

Why use OSM

Many (most, likely) end users are not particularly interested in the ideology of a project, nor in the medium and long term consequences of relying on particular vendors. For the most part, what end users are interested in is that a specific product delivers the information or service they seek in an appropriate manner.

In this sense, as long as the information they need is present and accessible, a user won't particularly care about using OpenStreetMap or Google Maps or any other particular service (TomTom, Garmin, Apple Maps, whatever): they will usually go with whatever they have available at hand, or with whatever their cultural context tends to favor.

On the other hand, there are a number of reasons, ranging from the ethical to the practical, why using an open, free (as in freedom) service such as OpenStreetMap should be preferred over opting-in to proprietary solutions. (I will not discuss the ethical ones, since they may be considered subjective, or not equally meaningful to everybody.)

On the practical side, we obviously have a win of OSM over paid proprietary solutions: being open and free (as in freedom), the OSM data is available free of charge as well.

But OSM also wins —somewhat unexpectedly— over other free-of-charge services such as Google Maps, as I found out myself, in a recent discovery that brought me back to OpenStreetMap after my initial, somewhat depressing experience with it over four years ago: the Android application for Google Maps does not offer offline navigation.

Finding that such an otherwise sophisticated application was missing such a basic function was quite surprising. In my case (I own a Transformer Infinity without cellphone functionality) it also rendered Google Maps essentially useless: the application allows you to download map data for offline usage[2], which is useful to see where you are even when you can't connect to the Internet, but the functionality to get directions from one place to another is not actually present in the application itself: it's delegated to Google's servers.

I was amazed by the discovery, and I'm still wondering why that would be the case. I can understand that optimal routing may depend on some amount of real-time information, such as traffic conditions, that may only be available with an Internet connection, but why would the navigation features rely completely on the online service?[3]

Since the lack of offline navigation meant the Google Maps app on Android was useless for me, I started looking for alternatives, and this is how I found out about, and finally settled on, OsmAnd, an open source[4], offline navigator for Android that uses open data (from OpenStreetMap, but also e.g. from Wikipedia).

The existence of applications such as OsmAnd is an excellent illustration of the importance of open data: when Google Maps does not offer a particular service, it is basically impossible for anybody else to offer it based on their data. By contrast, OpenStreetMap offers no services by itself (aside from basic map rendering), but gives other projects the opportunity (and this time we really mean opportunity, not in the ironic sense we used when discussing Google's outreach to get manpower for free) to provide all possible kinds of services on top of their data.

There are in fact a number of applications, both commercial and not, that provide services based on OSM data. They all benefit from the presence and quality of the data, and they often, in one way or another, give back to OSM. The relevance of OSM is not just in it being a free world mapping website. It's also in the healthy ecosystem which is growing around it.

More interestingly, OpenStreetMap sometimes wins over any (gratis or paid) service also in terms of quality and amount of mapped data. This happens whenever the local interest for good mapping is higher than the commercial interests of large external companies providing mapping services. Many small, possibly isolated communities (an example that was pointed out to me is that of Hella, in South Iceland) tend to be neglected by major vendors, as mapping them tends to be high-cost with very little or no return, while local mappers can do an excellent job just being driven by passion.

Why not use OSM

For the end user there are some equally obvious reasons why they should not, or cannot, use OpenStreetMap, the most important being, unsurprisingly, the lack or low quality of the data.

Although the OSM situation has distinctly improved over time, it's quite evident that there are still huge areas where Google and other proprietary providers of mapping services have more detailed, higher quality data than OSM. Of course, in such areas OpenStreetMap cannot be considered a viable alternative to services such as Google Maps.

It should be noted however that the OSM data is not intrinsically ‘worse’ than the data available from proprietary sources such as Google. In fact, Google itself is well aware of the fact that the data they have is not perfect, which is why they have turned to asking users for help: the amount of manpower required to refine mapping data and keep it up-to-date is far from trivial, and this is precisely where large amounts of small contributions can give their best results.

(Of course, the point then is, who would you rather help refine and improve their data?)

Another important point to be considered, as highlighted by the disclaimer on OSM's own website, is that their data should not be considered the be-all and end-all of worldwide mapping; there are use cases for which their data, as complete and detailed as it may be, should still not be used, as its reliability cannot be guaranteed, and it's in no way officially sanctioned. (Of course, similar disclaimers also apply to other map service providers, such as Google itself and MapQuest.)

There are finally types of data which OSM does not collect, because they are considered beyond the scope of the project: things such as Street View, or real-time information about public transport, or even the presence and distribution of wireless transmitters (for geolocation purposes). For these OSM obviously can't be used, but this doesn't necessarily mean that Google is the only viable alternative. (More on this later.)

Why (and how to) contribute to OSM

There is a very simple, yet important reason to contribute to OpenStreetMap: the more people are involved, the more everyone benefits from the improvements in the amount and quality of the data. This is in sharp contrast to donating your time and efforts to a company that thereafter gains control of the data you provide, where the company itself is the actual beneficiary. In other words, if you plan on spending time improving map data, it is advisable to do it for OpenStreetMap rather than for a proprietary provider such as Google.

Moreover, contributing nowadays is much simpler than it was in the past, both because of the much more extensive amount of data already available (yes, this makes contributing easier) and because the tools needed to actually provide new data or improve the existing data are more generally available and easier to use.

I first looked into OpenStreetMap around 2008 or 2009, at a time in which the state of the database was still abysmal (in my whereabouts as in most of the rest of the world). Contributing also required nontrivial amounts of time and resources: it required a GPS device which satisfied some specific conditions in terms of interoperability and functionality, and the use of tools that were anything but refined and easy to use. I gave up.

Things now are much different: if you are in the northern hemisphere (or at least one of the ‘western’ countries), chances are that most of your whereabouts have already been mapped to a high level of detail, so that your efforts can be more focused and integrated. Moreover, dedicated tools such as JOSM or even in-browser editors are available and (relatively) user-friendly (considering the task at hand). Finally, data is much easier to collect, with GPS receivers built in most common smartphones and numerous applications specifically designed to assist in mapping.

Indeed, while trying out the aforementioned OsmAnd to see how viable a navigation app it would be, I found a couple of places in my whereabouts where the data was not accurate (e.g. roundabouts not marked as such) or was out of date (former crossings recently turned into roundabouts). This was what finally got me into OSM contribution, as fixing things turned out to be quite easy, when starting from the data already present.

There are a number of ways to contribute to OpenStreetMap, with varying degrees of required technological prowess, time investment and relevance of the changes.

The simplest way to contribute to OSM, Notes, has been introduced quite recently; in contrast to other methods it doesn't even require an account on OSM, although having one (and logging in) is still recommended.

The purpose of Notes is to leave a marker to report a problem with a specific location in the map, such as missing or wrong data (such as a one-way street not marked as such or with the opposite direction). Notes are free-form contributions that are not an integral part of the actual map data. Rather, more experienced mappers can use Notes to enact the actual necessary changes on the data, thereby ‘closing’ the Note (for example, fixing the one-way direction of the street).

Notes are a powerful feature since they allow even the less experienced users to contribute to OSM, although of course manual intervention is still needed so that the additional information can be merged with the rest of the data.

Any other contribution to OSM requires an account registered with the site, and the use of an editor to change or add to the actual map data. The website itself offers an online editor (two of them, actually), which can be practical for some quick changes; more sophisticated processing, on the other hand, is better done with external editors such as the aforementioned JOSM.

The simplest change that can be done to map data is the addition or correction of information about Points of Interest (POIs): bars and restaurants, hotels, stations, public toilets, newsstands, anything that can be of interest or useful to residents and tourists alike.

POIs are marked using tags, key-value combinations that describe both the kind of Point and any specific information that might be relevant. For example, amenity=restaurant is used to tag a restaurant, and additional tags may be used to specify the type of cooking available, or the opening hours of the business.

Tagging is almost free-form, in the sense that mappers are free to choose keys and values as they prefer, although a number of conventions are used throughout the map: such common coding is what allows software to identify places and present them to the end-user as appropriate. Most editors come with pre-configured tag sets, allowing less experienced users to mark POIs without detailed knowledge of the tag conventions.

In fact, tags are used everywhere around OSM, since the spatial data itself only comes in two forms: points, that mark individual locations, and ‘ways’, ordered collections of points that can mark anything from a road to a building, so that tags are essential to distinguish the many uses of these fundamental types[5].
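The data model just described can be sketched with a few C structures. This is only an illustration of the concepts (tags as key-value pairs attached to points and ways), not the actual OSM file format or API; all type and function names here are invented for the example, and `cuisine=italian` is just a sample tag alongside the `amenity=restaurant` mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* A tag is a free-form key-value pair, e.g. amenity=restaurant. */
typedef struct {
    const char *key;
    const char *value;
} osm_tag;

/* A point marks an individual location, described by its tags. */
typedef struct {
    double lat, lon;
    const osm_tag *tags;
    size_t tag_count;
} osm_point;

/* A way is an ordered collection of points; its tags tell whether it
 * represents a road, a building outline, or something else entirely. */
typedef struct {
    const osm_point *points;
    size_t point_count;
    const osm_tag *tags;
    size_t tag_count;
} osm_way;

/* Returns the value associated with key, or NULL if the tag is absent. */
const char *find_tag(const osm_tag *tags, size_t count, const char *key)
{
    for (size_t i = 0; i < count; ++i)
        if (strcmp(tags[i].key, key) == 0)
            return tags[i].value;
    return NULL;
}
```

Software consuming the data would then decide how to present a point or way based on lookups such as `find_tag(point.tags, point.tag_count, "amenity")`, which is why shared tagging conventions matter.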

Contributing to the insertion and improvement of POIs is mostly important in areas where most of the basic information (roads, mostly) has already been mapped.

In less fortunate places, where this information is missing, the best way to contribute is to roll up your sleeves and start mapping. This can be done in two ways.

The preferred way is to get ‘on the ground’ with some kind of GPS receiver (nowadays, most smartphones will do the job nicely) and some way to record your position over time, as you walk or drive around. The GPS tracks thus collected can then be imported into an OSM-capable editor, cleaned up, tagged appropriately and uploaded to OpenStreetMap.

Lacking such a possibility, one can still resort to ‘armchair mapping’, tracing satellite or aerial maps for which this kind of usage has been allowed (e.g. those by Yahoo and Microsoft). Of course, the information thus tracked is more likely to be inaccurate, for example because of incorrect geolocation of the imagery, or because the imagery is simply out of date. Such an approach should thereby only be chosen as a last resort.

Who should contribute to OSM

The obvious answer to such a question would be ‘everybody’, although there are quite a number of possible objections.

For example, an interesting paradox about OSM is that the ones better suited to generate the data are not necessarily those that would actually benefit from it: locals have the best ground knowledge about their whereabouts, but exactly because of this they are also the least likely to need it from OSM.

This is where the reciprocity in the benefits of using OpenStreetMap comes into play: with everyone taking care of ‘their curb’, users benefit from each other's contributions.

Of course, there are some parties, such as local administrations and tourism boards, for which accurate mapping is beneficial per se; yet, there aren't many cases in which they are directly involved in the improvement of OSM data. While this may seem surprising, there are many possible explanations for this lack of involvement.

There are, of course, legal reasons: aside from possible licensing issues that the administrations would have to sort out (due to the liberal licensing of OSM data), there is also the risk that an involvement of the administrations could somehow be misrepresented as an official sanctioning of the actual data, a dangerous connotation for content which still maintains a high degree of volatility due to possible third party intervention[6].

There is also the issue of knowledge about the existence of OpenStreetMap not being particularly widespread; as such, there is lack of a strong motivation in getting involved. (This, of course, is easily solved by spreading the word.) What's worse, even when the existence of OSM is known, the project is lightly dismissed as an amateurish knock-off of more serious services such as Google Maps.

As an aside, the latter problem is not unique to OSM, and is shared by many open projects in their infancy7 —think e.g. how the perception of Linux has changed over the years. The problem is that this triggers a vicious circle: the less complete OpenStreetMap is, the less it's taken seriously; the less it's taken seriously, the less people contribute to it, making it harder to complete.

This is another reason why every contribution counts: the need to break out of the vicious circle, reach a critical mass such that people will consider it normal to look things up in OpenStreetMap (rather than on other, proprietary services) and eventually fix or augment it as appropriate.

The easiest way to start getting involved is with the addition of Points Of Interest that are personally ‘of interest’. You have a preference for Bitcoins? Help map commercial venues that accept them. You have kids? Help map baby-friendly restaurants and food courts. Are you passionate about Fair trade? Guess what you can help mapping. You get tired easily while walking around? Map the benches. Did you just book a few nights in a hotel which is missing from the map? Add it.

And most of all, spread the word. Get people involved.

Other open map-related services

OpenStreetMap has a rather specific objective, which excludes several kinds of map-related information. For example, OSM neither provides nor collects street-level imagery, and thus cannot replace StreetView. It also doesn't provide or collect information about wireless transmitters, and thus cannot be used for wireless geolocation. It also doesn't provide or collect real-time information about traffic or public transport, and thus cannot be used for adaptive routing.

As such, OSM cannot be considered an integral replacement for Google Maps (or other non-open mapping services), even when the actual ground map data is on par or even superior (yes, it happens). This is where other services, similarly open and often integrated with OSM itself, can be of aid, although their current status and quality are often significantly inferior, both to those of OpenStreetMap itself and (of course) to the proprietary solutions.

Wireless geolocation

For wireless geolocation, there are actually a number of different solutions available. The largest WiFi mapping project (WiGLE) provides data free of charge, but under a very restrictive license, and thus cannot be considered open by any standard, so we will skip over that.

OpenBMap, active since 2009, can be considered open by most standards: it provides client and server software under the GPL, and it provides the collected data (both raw and processed) under the same Open Database License as OpenStreetMap.

At the time of writing, the OpenBMap database is not very strong (less than 900K data points are present in the processed files that can be downloaded from the website). In itself, this is an issue that is easily remedied, since data gathering for wireless networks is trivial (when compared to e.g. ground mapping) and can be fully automated: improving the database is therefore just a matter of spreading the word and having more people contribute.

As driven by the best intentions as the project can be, however, contributions to it are brought down by an overall amateurish presentation, both at the website level (the aesthetics and layout could use some refinement) and at the software level: albeit open source, its development is not managed as openly as it could be8, which makes collaboration harder.

A more recent project is OpenWLANmap. This project also provides open source software for wireless geolocation, but the openness of its database is more dubious. The license is not clearly indicated anywhere on the website (although the GNU Free Documentation License is distributed with the database downloads), and the downloadable database is only a subset of the data used by the website (about 1.6M of more than 5M data points), leading to the question about what happens to the user-submitted data. In this sense, its openness could be challenged, until these issues are resolved.

Yet another similar project is Geomena, started more or less at the same time as OpenBMap, but with a different license (Creative Commons instead of the ODbL). This project seems to be particularly focused on presenting an easy-to-use API both for querying and for contributing to the database. However, quite a few links are broken and the project doesn't seem to have moved forward much, either in terms of application development or in terms of database growth (at the time of writing, just about 25K access points are claimed to be available, and the link to download the data is not functioning).

This fragmentation, with application bits and partial databases spread out across different projects, none of which manages to provide a complete, well-organized, functional solution, is probably the most detrimental situation we could have. Getting these projects together, sharing the database as well as the efforts to provide accessible data and applications, would be beneficial to all.

Street-level imagery

While gathering information about wireless networks can be trivially automated thanks to the widespread diffusion of smartphones and similar devices, the kind of street-level imagery that would be useful to provide an open alternative to Google StreetView is quite laborious to take without specialized hardware. Photo stitching applications and camera software with automatic support for 360° photography can come in handy, but having to do this manually every few meters remains a daunting task.

Additionally, pictures taken may need to be cleaned up by masking out sensitive data such as faces, car license plates or whatever else might need masking depending on where the photo was taken.

These are probably —currently— the most significant obstacles to the creation of a competitive StreetView alternative. Despite them, a few projects that try to provide street-level imagery have been born more or less recently.

We have for example the most obviously-named OpenStreetView, with strong ties to OpenStreetMap and the aim to become a repository of open-licensed street-level imagery. Other projects, such as, use a different approach, acting as a mapping hub of open-licensed, geolocalized photos hosted by other services (Panoramio, Flickr, etc).

While the considerable lack of imagery and the difficulty in obtaining it are undoubtedly the biggest issues these projects face, the unrefined user interfaces and consequent reduced usefulness aren't exactly of assistance.

Real-time traffic and public transport

This is probably the data which is hardest (possibly impossible) to obtain without direct involvement of the interested parties. While routes, stops, and even expected timetables can be mapped and integrated into the standard OpenStreetMap database, real-time information such as actual bus departure times and route progress, or temporary issues such as strikes, abnormally high traffic conditions, or roadworks, is completely outside the scope of the OpenStreetMap data and impossible to maintain without a continuous stream of information coming from somewhere or someone.

Talking about openness for such volatile data which can almost only be provided by a central controller is also less important in some ways. A more interesting subject for this topic would be some form of common standard to have access to this data, in place of the plethora of proprietary, inhomogeneous APIs made available by a variety of transport systems throughout the world.

Still, it would be interesting if something was cooked up based on a principle similar to that used for Waze, the crowd-sourced crowd-avoidance navigation system recently acquired by Google9. In fact, it wouldn't be a bad idea if an open alternative to Waze was developed and distributed: enhancing it to include alternative transportation methods (on foot, by bus, by bicycle10) would have the potential of turning it into a viable tool, even surpassing Waze itself.

Heck, it would even be possible to use the last open-source version of the Waze source code as a starting point; of course, openness of the collected data would this time become a strong point in competing against the proprietary yet still crowd-sourced alternative, especially when combined with smart integration with OpenStreetMap and its flourishing ecosystem.

There's good potential there to set up the infrastructure for powerful, open routing engines with real-time information providers. It just needs someone with the courage to undertake the task.

Some conclusions

There is little doubt, in my opinion, on the importance of open mapping data. The maturity reached by OpenStreetMap over the years is also an excellent example of how a well-focused, well-managed, open, collaborative project can achieve excellent results.

The power of crowd-sourcing is such that OSM has often reached, when not surpassed, the quality of proprietary mapping services, and this has become so evident that even these proprietary mapping services are trying to co-opt their consumers into contributing to their closed, proprietary databases by disguising this racking up of free manpower as an opportunity for the volunteers to donate their time to somebody else's profit.

The biggest obstacle to OpenStreetMap, as with any collaborative project, is getting people involved. The improvements in technology have made participation much easier than it was at the project's inception, and the increasing amount and quality of base ground data also makes it much easier to get people interested, as the basic usability of the project is much higher (it's easier to get started by fixing small things here and there than by mapping an uncharted area from scratch).

There is also other map-related data which is not collected by OpenStreetMap, since it is deemed outside its scope. While other projects have tried stepping in to cover those additional aims, their success so far has been considerably inferior, due sometimes to fragmentation and dispersal of efforts, sometimes to hard-to-overcome technical issues. It is to be hoped that in time even these projects will find a way to come together and break through just as OSM managed, disrupting our dependency on commercial vendors.

  1. You will notice an Italian tag for this article. Sorry, I couldn't miss the chance. ↩

  2. update (2013-07-10): apparently in the just-recently released version 7 of the app, offline maps feature has been ‘almost removed’: you can cache the area which is being shown with a completely non-obvious command (“OK maps”, who the fsck was the genius that came up with this), but the sophisticated management of offline maps that was present in earlier versions is gone. ↩

  3. this cannot be about computational power of the devices, since other applications offering offline routing for Android are available; this cannot be about the algorithm depending on some pre-computed optimal partial routes, since these could be downloaded together with the offline data. One possible explanation could be that offline routing is more likely to find less optimal routes due to lack of useful real-time information and limited computational power of the devices, and Google would rather offer an ‘all or nothing’ service: either you get the optimal routing computed on Google servers, or you don't get any routing at all —but this sounds stupid, since a simple warning that the route could not be optimal when offline would be sufficient. Another possible explanation is that offline routing prevents the kind of data gathering that Google is always so eager about; sure, Maps could still provide that information on the next chance to sync with Google, but that would make the data gathering obvious, whereas Google is always trying to be discreet about it. ↩

  4. while the application is open source, it is distributed for free only with reduced functionality, and only the paid version has all features built in, unless you compile the application yourself. ↩

  5. points and ways can additionally be collected in ‘relations’, which are used to represent complex information such as bus routes or collections of polygons that represent individual entities, but delving this deep into the more advanced features of OSM is off-topic for this article. ↩

  6. in other words, even if the data contributed by official sources could be considered official, against the OSM disclaimer, it could still be as easily subject to editing from other users, and thus the metadata about who made changes and when would rise to a much higher importance than what it has now; this could be solved with approaches such as ‘freezing’ this kind of official contribution, but this would require a change in the licensing terms, and go against the hallmark of OpenStreetMap, its openness. ↩

  7. and make no mistake that OpenStreetMap is still in its infancy, although it has been running already for almost 10 years now and is now reasonably usable in large parts of the world. Mapping is a daunting task, and the amount of information that can still be collected is orders of magnitude higher than what has already been done. ↩

  8. the project seems to follow a strategy where the source code is released with (when not after) the public release of the applications, in contrast to the more open approach of keeping the entire development process public, to improve feedback and cooperation with external parties. ↩

  9. in fact, the acquisition of Waze by Google is another indication of how much Google values the opportunity (for them) to use crowd-sourced data gathering to improve the (online-only) routing services they offer to their consumers, thereby further locking in consumers into using their services. ↩

  10. of course, cyclists already have OpenCycleMap, the OpenStreetMap-based project dedicated to cycling routes. ↩

RaiTV, Silverlight and videos

Get the code for
UserJS/Greasemonkey to use HTML5 instead of Silverlight on RaiTV:


‘Mamma’ RAI, the holder of the public radio and television broadcasting concession in Italy, has entered the third millennium by making its programmes available online on the RaiTV website, both as live streams and as an archive.

When the RaiTV platform was built, the software chosen for distribution was Silverlight, a product with which Microsoft aimed to supplant the already existing (and much better supported) Macromedia Flash.

The main characteristic of Silverlight is that it relies on the Microsoft .NET platform which, despite the alleged potential intentions, is not cross-platform: it is true that there are incomplete and not perfectly working implementations of .NET for Linux and Mac, but on both of these other platforms, as it happens, Silverlight (or its equivalents) does not work well.

Without investigating the possible motivations behind the choice of this platform, then, the net result is that the site is not (perfectly) usable without Windows. It is interesting to note that this lack of general usability is not a direct result (as in other cases) of the web monoculture I have discussed elsewhere (the Silverlight plugin also works in other browsers, on Windows), but it is still tied to the (less and less) dominant monopoly of Microsoft, inside as much as outside the web.

The fact remains that the institutional website of a public service is not (‘universally’) usable, and while this is always something to object to, it is particularly so when the famous RAI licence fee is well on its way to becoming a ‘blanket’ tax, regardless not only of whether citizens want to watch RAI broadcasts, but even of whether they can: for example, in a home like mine, where there are no television sets and all the computers run Linux, RAI broadcasts are simply inaccessible.

The mobile effect

That the dependency on Silverlight in RaiTV is essentially spurious, at least as far as archived videos are concerned, is evident simply from looking at the code of the pages: the addresses of the videos, in various formats (WMV, MP4, H264), are in plain view, often right in the head of the document.

If one wants, then, one can proceed ‘by hand’: open the document source (which can be done in any browser), look for the appropriate videourl, and pass it to one's favourite program to finally watch the video. Not exactly what one would call an ‘accessible’ web.

On the ‘parent’ site, moreover, even a superficial study of the JavaScript code with which the pages react to some requests (such as watching the latest TV news or listening to the latest radio news) immediately shows that the code in question does foresee the possibility that Silverlight is unavailable, but only on mobile platforms, iOS (on Apple gadgets) or Android: if the browser claims to be on such a platform, the JavaScript code uses the video and audio tags introduced with HTML5 and available on these mobile platforms.

The question is therefore: why isn't this kind of access to the content made available on normal desktops too? Why impose the use of Silverlight in these cases? The choice is not only questionable because of the closed nature of Microsoft's distribution platform, but also senseless: if the content is there, why keep it hidden?

(Incidentally, one of the (alleged) benefits of Silverlight is the ability to adapt video streaming to the available bandwidth (lowering the quality to avoid skips or interruptions in case of “slow internet”): paradoxically, this feature would be far more useful on mobile (where Silverlight generally cannot be used; I don't remember offhand whether Windows on mobile supports it) than on desktop, where one is generally connected through a (supposedly) working ADSL line.)

It is also true that relying exclusively on modern HTML5 audio and video support would still exclude those who (for one reason or another) have an old browser that does not support these tags. The answer is simple, and comes from the possibility offered by HTML5 multimedia support to ‘fall back’ on other choices when the tags (or the formats!) are not supported by the browser.

The structure should therefore be the following: audio or video tags with the appropriate multiple sources, so that each platform (desktop or mobile) can choose the most suitable one, with a fallback to the current situation. This would allow the content to be usable almost universally, with the added benefit of more uniform code, without the need to manually impose ‘special cases’ as is currently done through the JavaScript on the pages.
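To make the suggested structure concrete, here is a minimal sketch that builds the markup for a video tag with multiple sources and a fallback nested inside it. The helper names (escapeAttr, buildVideoMarkup) and the example.com addresses are made up for illustration: this is neither RaiTV's actual code nor its actual URLs.

```javascript
// Hypothetical helper: escape a value for use inside a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build a <video> element with several alternative
// sources, followed by fallback content for browsers without <video> support.
function buildVideoMarkup(sources, fallbackHtml) {
  // The browser walks the <source> elements in order and plays the first
  // one whose format it supports.
  const sourceTags = sources
    .map((s) => `  <source src="${escapeAttr(s.url)}" type="${escapeAttr(s.type)}">`)
    .join("\n");
  // Anything else inside <video> is rendered only by browsers that do not
  // know the tag at all, so the existing Silverlight object can go there.
  return `<video controls>\n${sourceTags}\n  ${fallbackHtml}\n</video>`;
}

// Example with made-up addresses (assumptions, not real RaiTV URLs).
const markup = buildVideoMarkup(
  [
    { url: "http://example.com/video.mp4", type: "video/mp4" },
    { url: "http://example.com/video.wmv", type: "video/x-ms-wmv" },
  ],
  '<object type="application/x-silverlight-2"><!-- existing player --></object>'
);
console.log(markup);
```

Note that the nested fallback only covers browsers that ignore the video tag entirely; a browser that knows the tag but supports none of the listed formats will not show it, which is why offering several sources matters.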

User JavaScript

If (or as long as) Rai does not have the good sense to apply the suggestion above, watching Rai sites on Linux (and Mac) requires manual intervention (hunting down the video URLs, to download them or open them in external programs).

But we can do better, by exploiting the possibility offered by most modern browsers (Chrome and derivatives, Opera, Firefox and derivatives, the latter with the help of the GreaseMonkey extension) of letting user scripts manipulate the page.

This is the purpose of this user script, which acts (almost) exactly as suggested at the end of the previous section. Almost, because by personal choice the fallback to Silverlight is suppressed: the script therefore takes care of replacing the Silverlight object with the appropriate HTML5 multimedia tag, using the addresses advertised on the page itself.
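The general idea can be sketched roughly as follows. This is not the actual script, only an illustration under stated assumptions: that the video addresses are advertised in meta tags whose name starts with "videourl", that the player is an object element with a Silverlight MIME type, and that the pages live under the address given in @include. All of these should be checked against the real pages.

```javascript
// ==UserScript==
// @name     RaiTV HTML5 sketch (hypothetical, not the actual script)
// @include  http://www.rai.tv/*
// ==/UserScript==

// Pure helper: pick the video addresses out of a list of {name, content} pairs.
// Assumption: the relevant <meta> names start with "videourl".
function collectVideoUrls(metaEntries) {
  return metaEntries
    .filter((m) => /^videourl/i.test(m.name) && m.content)
    .map((m) => m.content);
}

// Replace the Silverlight object with a <video> tag listing every advertised
// address as a <source>. Returns true if a replacement took place.
function replaceSilverlight(doc) {
  const metas = Array.from(doc.querySelectorAll("meta[name]")).map((m) => ({
    name: m.getAttribute("name"),
    content: m.getAttribute("content"),
  }));
  const urls = collectVideoUrls(metas);
  if (urls.length === 0) return false;

  // Assumption: the player is embedded as an <object> with a Silverlight type.
  const player = doc.querySelector('object[type^="application/x-silverlight"]');
  if (!player || !player.parentNode) return false;

  const video = doc.createElement("video");
  video.controls = true;
  for (const url of urls) {
    const source = doc.createElement("source");
    source.src = url;
    video.appendChild(source);
  }
  // No Silverlight fallback is kept: the object is swapped out entirely.
  player.parentNode.replaceChild(video, player);
  return true;
}

// Only touch the page when running inside a browser.
if (typeof document !== "undefined") {
  replaceSilverlight(document);
}
```

Keeping the address extraction in a separate pure function makes it easy to adjust if the pages turn out to advertise their video addresses differently.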

With the current version of the script, not only the archived videos on RaiTV (at least on browsers that support the codecs used), but also some pages of the ‘parent’ site should finally be usable without hiccups.

Enjoy the show.

A horizontal layout

This is a pure CSS challenge: no javascript, no extra HTML

Webpages, mostly for historical reasons, are assumed to develop vertically: they have a natural horizontal extent that is limited by the viewport (screen, page) width, and an indefinite vertical extent realized through scrolling or pagination.

These assumptions on the natural development of the page are perfectly natural when dealing with standard, classic text layouts (for most modern scripts), where the lines develop horizontally, legibility is improved by limiting the line length and stacking of multiple, shorter lines is preferred to single long lines. In a context where it isn't easy to flow long texts into columns, the vertical development with constrained width is the optimal one.

However, these assumptions become instead an artificial constraint when columns with automatic text flow are possible, and this is exactly what this challenge is about: is it possible to achieve, purely with CSS, a layout that is vertically constrained to the viewport height, but can expand horizontally to fit its content?

The solution that doesn't work

Let's say, for example, that we have the following situation: there's a #page holding a #pageheader, #pagebody and #pagefooter ; the #pagebody itself normally contains simply a #content div that holds the actual page content. We want the #pageheader to sit on top, the #pagefooter to sit at the bottom, and the #pagebody to sit in the middle, with a fixed height, and extending horizontally to arbitrary lengths.

In principle, this should be achievable with the following approach: set the #content to use columns with a fixed width (e.g., columns: 30em) and to have a fixed height (e.g., height: 100%), which lets the #content grow with as many columns as needed, without a vertical overflow: by not constraining the width of any of the #content parents, I would expect it to grow indefinitely. The CSS would look something like this:

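/* constrain the vertical extent, leave the horizontal one free */
html, body, #page {
    height: 100%;
    width: auto;
    max-width: none;
}

/* fixed height, as many 30em columns as needed */
#content {
    width: auto;
    max-width: none;
    height: 100%;
    columns: 30em;
}

/* leave room for the fixed header and footer */
#pagebody {
    width: auto;
    max-width: none;
    height: 100%;
    padding-top: 4em;
    padding-bottom: 8em;
}

#pageheader {
    position: fixed;
    top: 0;
    width: 100%;
    height: 4em;
}

#pagefooter {
    position: fixed;
    bottom: 0;
    height: 8em;
}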

Why (and how) doesn't this work? Because there is an implicit assumption that the content will always be limited by the viewport. By setting the html, body, #page and #pagebody heights to 100%, we do manage to tell the layout engine that we don't want the layout to extend arbitrarily in the vertical direction, but even by specifying width: auto and max-width: none, we cannot prevent the width from being limited by the viewport.

What happens with this solution is that the #content overflows its containers, instead of stretching their width arbitrarily (which is what would happen instead in the vertical direction): this is clearly visible when setting up ornaments (e.g. borders) for #page and/or #pagebody. While this results in a readable page, the aesthetics are seriously hindered.

The challenge proper

How do you ‘unclamp’ the HTML width from the viewport width, with pure CSS? How do you allow elements to stretch indefinitely in the horizontal direction?

A potential alternative

In fact, there is a potential alternative to the unclamped width, which relies on the experimental ‘paged’ media proposed by Opera. In the experimental builds of Opera supporting this feature, it can be achieved by setting #content { overflow-x: -o-paged-x} or #content { overflow-x: -o-paged-x-controls}.

In this case, the page would still be clamped to the viewport width (and in this sense this cannot be considered an actual solution to the challenge), but the content would be browsable with ease, and the aesthetics would not be spoiled by the mismanagement of the overflow.

An Opera Requiem? Part II

The news this time isn't as bad as the previous rumor. And this time it's news, not rumors: an official announcement has been made on the Opera Developer News that the Opera browser will switch to WebKit as rendering engine and to the V8 JavaScript engine.

Despite the nature of Opera as a minority browser (a very small userbase compared to other browsers, despite the announcement of ‘over 300 million users’ made in the same article), this announcement has started making quite some noise on the web, with lots of people siding with or against the choice, for a variety of reasons.

Honestly, I think this is ‘bad news’, not so much for Opera itself, but for the (future, upcoming) ‘state of the web’ in general.

On its impact for Opera

Let's start from the beginning, and summarize a few things.

So far, Opera's rendering engine(s) (currently Presto) has been the fourth (third, before the surge of WebKit) major rendering engine, alongside Trident (the one in Internet Explorer) and Gecko (the one in Mozilla Firefox). It was developed completely in-house, independently from the others, and it has long held the crown as the fastest, most lightweight and most standards-compliant renderer (I'll get back to this later).

Of course, Opera is not just its rendering engine: there are lots of user interface ideas that were pioneered in this browser (but made famous by other browsers), and the browser itself is actually a full-fledged Internet suite, including a mail and news client, an IRC client, a BitTorrent client, and so on and so forth. Amazingly, this has always been done while still keeping the program size and memory footprint surprisingly small.

With the announcement of the migration to WebKit as rendering engine, one of the things that used to make Opera unique (the speed, lightness and compliance of its rendering engine) disappears. This doesn't mean that the reason for Opera to exist suddenly ceases: in fact, its UI, keyboard shortcuts, development tools etc are the things that make me stick with Opera most of all.

Opera can still maintain (part of) its uniqueness, and even continue innovating, because of this. Not everybody agrees, of course, because the general impression is that by switching to WebKit Opera just becomes “another Chrome skin” (forgetting that WebKit was born from Konqueror's KHTML engine, and became WebKit after being re-released by Apple, cleaned up and refactored, as Safari's rendering engine).

And in fact, its UI and tools will keep being for me a primary reason to stick to Opera.

While the argument “if Opera is just a skin over Chrome or Chromium, why would I choose to use that instead of the original?” is probably going to make Opera lose a bunch of its 300 million users, I feel that the focus is being kept on the wrong side of the coin, because it focuses specifically on the impact of the choice for Opera itself, which is not the major problem with the migration, as I will discuss later.

There are also a number of people that remark how the change is a smart move on Opera's part, because it means less development cost and higher website compatibility.

On its impact on the web

And this, which an astoundingly high number of people see as a good thing, is actually what is horribly bad about Opera's migration to WebKit. The “it's good” argument goes along the line that the reduction in the number of rendering engines is good for Opera itself (it breaks on fewer sites), and (more importantly) it's also good for all the web developers, who have to check their sites against one less rendering engine.

And this, that they see as good, is bad. Actually, it's worse than bad, it's horrible. It's the kind of lazy thinking that is fed by, and feeds, a vicious circle, monocultures (sorry, it's in Italian). It's the kind of lazy thinking that leads to swamping.

Unsurprisingly, this is exactly the same thing that IE developers warn about: now that they find themselves on the losing side of the (new) browser wars, they suddenly realize how standard compliance is a good thing, and how browser-specific (or engine-specific, in this case) development is bad. And they are right.

Monoculture, and the consequent swamping, is what happened when IE6 won the last browser war, and it is what will happen —again— now that WebKit is becoming the new de facto “standard implementation”. There are a number of websites that fail to work (at all or correctly) when not used with WebKit: this is currently due to their use of experimental, non standard features, but it's still problematic.

It's problematic because it violates one of the most important tenets of web development, which is graceful degradation. It's problematic because it encourages the use of a single, specific rendering engine. It's problematic because as this ‘convention’ spreads, other rendering engines will start to be completely ignored. It's problematic because getting out of monocultures is slow and painful, and falling into one should be prevented at all costs.

Some people think that the current situation cannot be compared with the IE6 horror because the new emerging monoculture is dominated by an open-source engine. And I have little doubt that an open-source monoculture is better than a closed-source one, but I also think that this barely affects the bad undersides of it being a monoculture.

Monocultural issues, take II

The biggest issue with monoculture is that it triggers a vicious circle of laziness.

This may seem paradoxical in the current context, because one of the reasons WebKit-specific websites choose WebKit is because it has lots of fancy, fuzzy, funny, interesting experimental features. People choose WebKit because it innovates, it introduces new things, it provides more power and flexibility! Which is true, as it was true about Trident when IE6 won the war.

But these experimental features don't always mature into properly-functioning, accepted standards. Building entire websites around them is dangerous, and wrong when no fallback is provided. (Unless the entire purpose of the website is to test those features, of course, but we're talking about production here.)

One of the complaints I've read about Opera is that the support of its Presto engine for the more recent CSS3 standard is lacking, as in it implements less of it than WebKit. This is quite probably true, but in my experience the compliance of the Presto engine to the parts of the standard that it actually implements is much better than any other rendering engine.

What is happening with WebKit is that hundreds of ‘new features’ are being poorly implemented, and their poor implementation is becoming the de facto standard. As people target WebKit, they only care about working around the WebKit bugs. This is bad for two reasons: one, it prevents other engines that don't have those bugs from presenting those pages correctly; two, it demotivates bugfixing in WebKit itself.

As I happen to use Opera as my primary browser, I tend to design for Presto first, and then check against WebKit (with Chromium) and Gecko (with Firefox) if the page still renders correctly. So far, whenever I've found unexpected rendering in either of the other browsers, it has been because those browsers violate the standard. In fact, checking the rendering of a webpage in other browsers is still the best way to check where the bug lies. (Just an example of a bug in Chromium that I came across while working on this page.) Even now, when reporting bugs, the first question that is often posed is “how does this thing behave in other browsers?”

If all browsers use WebKit, there is suddenly no more motivation in fixing the bug, even when the standard claims a different behavior should be expected, and the different behavior is the sane one. If there is only one implementation, even if it's wrong and this creates a problem, it is quite likely that the bug will go unfixed, especially if fixing it will break sites that have adopted the wrong behavior as being the ‘good’ one.

The loss of Presto as a rendering engine is not something that should be welcomed with relief. It's something that should be mourned in despair, because it's a nail in the coffin of web development.

My hope, at this point, is that Opera will do something really helpful, and it is to fix all of the horrible messy non-compliant stuff that currently cripples WebKit. They started already. Let's at least hope that this continues, and that their patches will make their way upstream: and even if they don't, let's at least hope that they will keep the fixes in their implementation.

(But if the switch is motivated (also) by the need to cut development resources, a hypothesis which is likely to be at least partially true, how much can we hope that this kind of contribution will continue?)

Transformer Infinity: an expensive toy

In late 2012 I decided it was finally time to gift myself with a Transformer Infinity, a piece of hardware (or rather a class of hardware) I had had my eyes on for some time. After a couple of months of usage, I can finally start writing up my thoughts on its hardware and software.

This ‘review’ should be read keeping in mind that I'm not an ‘average user’: I'm not the ‘intended target’ for this class of devices, I'm a power user that prefers control to “kid-proof interfaces”, and I have some very specific needs for what I could consider the ultimate device. This is particularly important, and will be stressed again, in discussing software. And all of this will come together in the conclusions (there's also a TL;DR version for the lazy).

The hardware

One of the things, if not the thing, I like the most about the Transformer line is the brilliant idea of enhancing the practical form of the tablet with the possibility of converting it to a netbook: after all, one of the things that had been bothering me about the whole tablet concept was the impracticality of the on-screen keyboard, stealing screen estate to offer something on which typing for long periods is not exactly the most comfortable experience.

While it is possible to use external (typically Bluetooth) keyboards with other tablets, the simple yet brilliant idea in the Transformer is to have the keyboard as an integral (yet detachable, and separately bought) part of the tablet, additionally acting as a cover. The idea was also copied, very poorly, by the Microsoft Surface, which however misses some of the key points that make the Transformer so good, such as the fact that the TF keyboard is actually a full-fledged docking station, also offering extra battery and additional connectors.

This feature was thus the major selling point of the TF while I was evaluating which tablet to get, once I was settled on getting one. The other factors that came into play were screen resolution and connectivity. The next competitor in line was Google's Nexus, which, while sporting a much better resolution than the TF700T, was missing any kind of external media support: the Transformer, in addition to a micro-SD slot on the tablet, also features a USB port on the docking station (yes, you can plug USB pen drives into it). Oh, and of course the fact that the 10" version of the Nexus was not actually available in my country also influenced the decision.

In the end I decided that the TF700T had a high enough resolution for my tastes. In fact, screen resolution is the reason why I finally got the Transformer Infinity rather than the Padphone, this other brilliant idea from Asus of having a smartphone that gets embedded in a tablet which is itself essentially like a Transformer (supporting the same keyboards/docking stations). If the tablet component of the Padphone didn't lag behind the actual Transformer line in terms of hardware, I would have definitely shelled out the extra euros to get it.

And while we're talking about screens, I'm among those who don't like the wide formats (16:9, 16:10) which are currently standard (if not the only option) for monitors; however, I do believe that these formats are a good idea compared to the 4:3 ratio of auld times and modern iPads when it comes to tablets.

Indeed, the most annoying part about widescreen monitors (for computers) is that a lot of the available screen estate is wasted for many common usages (everything that revolves around text, essentially), and while they do come in handy with multiple text windows side by side or when forced to read long-ass lines (such as some wide tables and stuff like that), they are not really a good alternative to just a bigger, higher-resolution (4:3) display, as the widescreen allows reading longer text lines at the expense of the number of text lines. And like it or not, much of our computer usage (even if it's just social networking) still revolves around text.

However, what is annoying in computer monitors is actually a bonus point on tablets: since these devices are often used in portrait mode, with the longer dimension being kept vertical, their widescreen format is actually a long screen format, keeping more text lines in view and requiring less page-flipping. And it's not only about text: when reading full-page comics, the widescreen (longscreen) format actually wastes less screen estate, typically, than the 4:3 format, at least in my experience (but then again it might depend on what format the comics you read are in).

Stuff I don't like

Although I'm overall pretty satisfied with the Transformer Infinity hardware, there are a few things I don't like.

The first issue I have is with the glossy display. I have issues with glossy displays in general, not just on this tablet. I hate glossy displays. I find it astounding that after years of efforts to make computer monitors anti-glare, no-reflection and overall less straining for the eyes of long-term users (when computers meant office space and work), the last 10 years have seen this fall back to displays that can only be decently used in optimal lighting conditions.

And no, no IPS (plus or nonplus) or other trick is going to solve the problem of a horrendously reflective surface. It helps, but it doesn't solve the problem. Even my colleague, die-hard Mac fan, finally had to acknowledge that the purportedly unreflective glossy display in the latest MacBook Pros is still more straining than the crappiest matte display in suboptimal lighting conditions, i.e. almost always.

But sadly, matte displays don't seem to be an option on tablets, that I can see (ebook readers do have them, though). So regardless of how much it bothers me, there seems to be no alternative to watching myself in the mirror when looking at darker content. Seriously, can some manufacturer come up and offer matte displays please? I'm even willing to spend a couple of extra euros for that (not 50, but up to 10 I would accept, even knowing that the process costs just a few cents per display).

The second thing I don't like about the Transformer Infinity is the connector. When I unpacked the TF700T (which I got before the docking keyboard), I was seriously pissed. What the heck Asus, I spend my days mocking iPad users for their ass proprietary connector and you play this dirty trick on me? Not cool.

Then I realized that the connector is actually the same connector that ties the tablet to the docking keyboard, and I realized that a standard USB port would not have made sense. While the external USB port of the dock, its SD card slot and the keyboard could possibly all have been made accessible to the tablet by presenting the dock as an unpowered USB hub, it would have been impossible to also allow charging the pad from the dock, which is in fact one of the most useful features the Transformer has.

Tightly related to the connector issue is the power issue: although the other end of the power cable for the Transformer is a standard USB plug, you can't typically charge it from a standard power source, except for slow trickle charging with the device off or at least in standby, due to the power draw. Asus' own “wall wart” (the wall plug/USB adapter), on the other hand, detects the presence of the Transformer and can feed it more power (15V, 2A if I'm not mistaken), thereby allowing faster charging and making it possible to charge the device even while in use (very useful for bedside use at the end of the day, when the device battery is likely to be close to exhaustion).

I honestly wouldn't mind if the industry came up with a common standard that worked around the current limitations of USB, at least for device charging: even the N900, Nokia's best although now obsolescent phone, is a little picky on its power source, refusing to charge from some low-cost ‘universal’ chargers (it does work correctly with the wall wart that shipped with an HTC smartphone we bought a couple of years ago, so it does work with non-Nokia chargers).

Finally, not really necessary but a bonus point of the docking keyboard would have been an Ethernet port: Asus themselves have found a very smart way to keep the port ‘thin’ when not in use, a solution that they use in their latest ultrabooks (or whatever you want to call thin, 11" laptops), and one that I believe could be employed also in the docking keyboard of the Transformer. Its absence is not really a negative point (how often are you going to need to use a network cable with a device as portable as a tablet?), but it would have been a nice addition for its use in netbook form.

The software

The Transformer Infinity is an Android tablet. Most of what I'm going to say here is therefore about Android in general, except for the few things that have been ‘enhanced’ or otherwise changed by Asus.

Android, for me, is interesting, because it's probably the first (successful) example of large-scale ‘macroscopic’ deployment of the Linux kernel beyond the ‘classic’ server or workstation use (only recently trickled down into domestic use with Ubuntu and related distributions). (By macroscopic I am here referring to the systems with which the user interacts frequently, thereby excluding embedded systems —think of the many Linux-based ADSL modem/routers.)

While Android shares a very important part of its core with ‘classic’ Linux distributions (and even there, not really, since the Linux kernel in Android is heavily modified and it has only been recently that its changes have started trickling upstream into the main Linux source), the userspace part of Android, and specifically the middleware, the software layer between the Linux kernel and the actual user applications, is completely different.

Because of this, Android is actually the first system that suddenly motivates the FSF insistence on having the classic Linux systems be called GNU/Linux rather than simply Linux. On the other hand, the userspace in classic Linux systems is not just GNU (and it's not like the X server, or desktop environments such as KDE, are insignificant components), so isn't GNU/Linux just a little arrogant?

But I digress. The fact that Android is not a classic Linux distribution, however, is an important point, especially for someone like me, for reasons that I'm going to explain in the following.

Android, much like iOS, is an operating system designed for devices whose main target use is (interactive) consumption rather than production. Sure, there are applications available for both systems that can exploit the device features in creative ways, but even these are focused more on personal entertainment than on anything else.

It's not like the operating systems actively prevent more sophisticated and heavy-duty usages: it's just that they don't particularly encourage it, since the usually limited hardware of the devices they run on wouldn't make productivity particularly comfortable.

After all, even I, a tinkerer and power user, finally bought the tablet having comic book reading in mind as its primary use (although admittedly 700€ is a little too much for just that).

For this intended target, Android is exceptionally well-designed. Thanks also to the very tight integration with the wide range of ‘cloud’ services offered by Google, it provides a very functional environment right from the start; and since it's “all in the cloud”, you don't even have to worry about synchronization among devices. All very fine and dandy, as long as you don't mind having all your data in the hands of a single huge company whose main interest is advertising.

As if selling your soul to Google wasn't enough, Asus adds some of its own, by keeping track of your device with ridiculously extreme precision, even when geolocation services are disabled. I would recommend not carrying your Asus-branded Android device when committing crimes (I don't actually know if this “phoning home” thing is Asus-specific or general for Android), but the feature could come in handy if somebody stole it.

Aside from these creepy aspects, as I was saying, Android is actually quite nice. The software choice so far has also been rather satisfactory: aside from the games that I bought from the Humble Indie Bundles, the must-have Simon Tatham's Puzzles collection (which I have everywhere) and a few others available for free (many with ads), the most important piece of software I took care of installing was Perfect Viewer, which I obviously used mostly as a comic book reader.

In fact, Perfect Viewer is an excellent example to introduce what I really hate about Android: control. Perfect Viewer has a very useful feature, which is the ability to access files on remote machines. Why is this feature useful? Because Android doesn't provide it by default.

This, in my opinion, is a horrible failure on the part of the operating system: it should be its duty, after all, to provide a unified method to access remote files, which would be transparently available to all applications. Why should every application reimplement this basic functionality? Result: you can't peruse your home-server-stored media collection from VLC on Android, but you can peruse your graphic novel collection because one application went the extra mile to implement the feature.

The failure of Android to actually provide a built-in method to access remote directories is particularly grave considering that there is no practical reason why this shouldn't be available: the kernel (Linux) is very adept at mounting remote filesystems with a variety of protocols, and Android itself is already designed to expose mount points transparently to applications (easily seen when making use of the (micro-)SD card slots available on devices that provide them).

So not providing the possibility to mount remote shares is actually a design choice that requires disabling features Linux can provide. And I find it interesting that the web offers a plethora of tutorials to guide people through the gimmicks necessary to make the feature available (gimmicks that include ‘rooting’ your device to gain complete control of it). I find this interesting because it shows that it's not just ‘power users’ like me that need this feature (unsurprisingly, as media collections on home servers are common, and growing in popularity, and tablets are good for media consumption, if you have a way to actually access the stupid media).

One is led to wonder why this feature is not available in Android's stock builds. Sadly, the only reason I can think of for this is that it forces people to unnecessarily use online services (such as, oh right, the ones offered by Google) that provide selective, targeted, and pricier alternatives to the widespread home server approach.

But I see this as a single instance of a more general problem with Android, i.e. the lack of control from the user. There's a lot in Android happening ‘behind the scenes’, and much of it is something which is intentionally hidden from the user, and which the user is actively prevented from operating on.

While the devices where Android runs are general purpose devices (like all computers), the operating system is designed to only allow exposure to selected features in selected ways. And even though it's not as bad as the competition (for example, in contrast to iOS, enabling “out of band” installations, i.e. installation of applications not downloaded from “official” channels like the Google Play Store, is a simple option in the settings), it's still a strong contribution to the war against general purpose computing (a topic on which I have a lot of things to say but for which I haven't yet found the time to patiently write them down).

Compare this with Maemo, the stock operating system of the N900 (ah, sorry, that's in Italian only for the time being): a full-fledged Debian-based Linux distribution; while the device is still very easy to operate, the underlying power and flexibility of the (GNU and more) classic Linux userspace remains accessible for those that want it. Maemo showed pretty clearly that you don't need to sacrifice power and flexibility to offer ease of use —unless that's what you actually want to do. And the fact that you do want to do that is for me an extremely negative sign.

There are efforts to make Android more power-user friendly, even without requiring hacks or rooting, the most significant probably being applications such as Irssi ConnectBot and the Terminal IDE. However, there's only so much they can do to work around some intrinsic, intentional deficiencies in the operating system.

Some conclusions

Ultimately, I was actually hoping to be able to exploit the convertible nature of the Transformer Infinity to make the device supplant my current ‘bedtime computer’, a Samsung N150 netbook that has been faithfully serving us since we bought it when we married.

At the hardware level, the TF700T could quite easily do it: it has the same screen size, with higher resolution (although the Samsung display does have the benefit of being matte), the keyboard is similarly sized, the battery (especially when docked) lasts longer, the CPU is better, the GPU is better, the webcam is better (and there are two of them), the amount of RAM is the same. There are some things in which the Transformer falls behind, such as having less storage than the netbook's hard disk, or fewer connectors, but these are things that don't normally have a weight in the usage I have for my netbook.

Where the Transformer really falls behind, though, is in the software space. The netbook has Windows preinstalled (and I'm keeping it that way for those rare emergencies where an actual Windows installation might be needed), but I have a nifty USB pendrive with a Linux distribution on it, which I boot from to have exactly the system that I need: all of my tools are there, it's configured to behave the way I want it, and so on and so forth. On the TF700T, I can't boot from the USB pendrive (I'd have to prepare a new one anyway, because of the different hardware architecture, but that wouldn't be difficult). I wouldn't even need to, in fact, if only Android wasn't such a crippled Linux.

It's no surprise that there are efforts underway to have both the Android and the more classical Linux userspaces at your fingertips, such as this one, which gives you both an Android and a Debian system running on the same (Android) kernel. It's not perfect (for example, it still relies heavily on the Android subsystem for keyboard handling, and the X server must be accessed ‘remotely’), but it's a step in the right direction.

My ideal system? A Dalvik virtual machine running on something like Maemo or Meego. There actually was a company (Myriad Group) working on something like this (what they call “Alien Dalvik”): not open source, though, nor universally accessible (it's for OEMs, apparently). Pity.


TL;DR

I like (with a couple of caveats) the hardware of the Transformer Infinity (TF700T) and its capability of becoming a netbook by adding the mobile dock. I wish the software (Android) were friendlier to power users, though, to better exploit this.

GPGPU: what it is, what it isn't, what it can be

A little bit of history

My first contact with GPGPU happened during my first post-doc, although for reasons totally unrelated to work: an open-source videogame (an implementation of the Settlers of Catan boardgame) I was giving small contributions to happened to have a header file which was (legitimately) copied over from the GPGPU programming project.

This was 2006, so the stuff at the time was extremely preliminary and not directly supported by the hardware manufacturers, but it did open my eyes to the possibility of using graphic cards, whose main development was geared towards hard-core gaming, for other computational purposes, and particularly scientific ones.

So where does GPGPU come from?

The term GPU (Graphics Processing Unit) emerged in the mid-90s, to describe graphics cards and other video hardware with enough computational power to take care of the heavy-duty task of rendering complex, animated three-dimensional scenes in real time.

Initially, although GPUs were computationally more gifted than their predecessors, whose most complex task was blitting (combining rectangular pixel blocks with binary operators such as AND, OR or XOR), their computational power was limited to a set of operations which is nowadays known as the “fixed-function pipeline”.
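
To make the blitting operation concrete, here is a minimal sketch in Python. The `blit` helper, the block contents and the use of plain integers as pixels are all assumptions made for illustration; real hardware works on framebuffer memory, not on nested lists.

```python
import operator

def blit(dst, src, op):
    """Combine two equally sized pixel blocks, pixel by pixel, with a binary operator.

    Hypothetical helper for illustration: blocks are lists of rows of integers.
    """
    return [[op(d, s) for d, s in zip(dst_row, src_row)]
            for dst_row, src_row in zip(dst, src)]

# Two tiny 2x2 blocks with 4-bit pixel values (made-up data).
background = [[0b1100, 0b1010],
              [0b1111, 0b0000]]
sprite = [[0b1010, 0b1010],
          [0b0101, 0b0011]]

anded = blit(background, sprite, operator.and_)
ored = blit(background, sprite, operator.or_)
xored = blit(background, sprite, operator.xor)

# XOR is its own inverse: applying the same sprite again restores the background.
restored = blit(xored, sprite, operator.xor)
```

The last line hints at why XOR was popular for this kind of work: drawing the same block twice erases it again.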

The barebone essentials you need to render a three-dimensional scene are: a way to describe the geometry of the objects, a way to describe the position of the light(s), and a way to describe the position of the observer. Light and observer positions are little more than points in three-dimensional space (for the observer you also need to know which way is ‘up’ and what his field of view is, but those are details we don't particularly care about now), and geometries can be described by simple two-dimensional figures immersed in three-dimensional space: triangles, squares. Of course, since simple colors will not get you far, you also want to paint the inside of these triangles and squares with some given pictures (e.g. something that resembles cobblestone), a process that is called ‘texturing’.

Once you have the geometry (vertices), lights and observer, rendering the scene is just a matter of doing some mathematical operations on them, such as interpolation between vertices to draw lines, or projections (i.e. matrix/vector products) from the three-dimensional space onto the two-dimensional visual plane of the observer. Of course, this has to be done for every single triangle in the scene (and you can have hundreds, thousands, hundreds of thousands or even millions of triangles in a scene), every time the scene is rendered (which should be at least as often as the screen refreshes, so at least some 50, nowadays 60, times per second).
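
As a rough illustration of the projection step, the sketch below projects the vertices of one triangle onto a visual plane with a matrix/vector product. It assumes the simplest possible setup, which is not necessarily what any real pipeline uses: the observer sits at the origin looking along the z axis, the visual plane is at distance `focal`, and the helper names `mat_vec` and `project` are made up for the example.

```python
def mat_vec(matrix, vector):
    """Plain matrix/vector product on lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def project(vertex, focal=1.0):
    """Project a 3D point onto the plane z = focal (pinhole model, an assumption)."""
    x, y, z = vertex
    projection = [[focal, 0, 0, 0],
                  [0, focal, 0, 0],
                  [0, 0, 1, 0]]
    # Homogeneous coordinates: the third component carries the depth,
    # and dividing by it gives the perspective effect.
    xh, yh, w = mat_vec(projection, [x, y, z, 1.0])
    return (xh / w, yh / w)

# One triangle; a real scene repeats this for every triangle, every frame.
triangle = [(1.0, 1.0, 2.0), (-1.0, 1.0, 2.0), (0.0, -2.0, 4.0)]
projected = [project(v) for v in triangle]
```

The arithmetic per vertex is tiny; the cost comes entirely from how many times it has to be repeated.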

Fixed-function pipelines in GPUs are therefore optimized for very simple mathematical operations, repeated millions (nowadays even billions) of times per second. But as powerful as you can get, there are limits to where simple triangles and a naive lighting model can get you: and this is why, by the end of the XX century, hardware support for shaders started popping up on GPUs.

Shaders are programs that can compute sophisticated lighting effects (of which shadows are only a small part). Since the effects that may be achieved with shaders are very varied, they cannot be implemented within the classic fixed-function pipeline. Dedicated computational hardware that could execute these programs (called kernels) had to be introduced.

And suddenly, video cards were not fixed-function devices anymore, but had become programmable, even though still with limitations and peculiar behavior: shader kernels are programs that get executed on each vertex of the geometry, or on each pixel of the scene, and only a limited number of computational features were initially available, since the hardware was still designed for the kind of manipulation that would be of interest for 3D rendering.

However, with all their limitations, GPUs now had a very interesting feature: you could tell them to do a specific set of operations on each element of a set (vertex, pixel). The essence of parallel programming, with hardware designed specifically for it. So why not abuse this capability to do things which have nothing to do with three-dimensional scene rendering?
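
The idea of running one program on each element of a set can be sketched in a few lines of Python. The `kernel` function and the pixel values are hypothetical, and Python runs everything sequentially, so this only shows the programming model, not the parallel hardware.

```python
def kernel(pixel):
    """Hypothetical per-element program: brighten a pixel, clamped to 255."""
    return min(pixel + 40, 255)

pixels = [0, 100, 230, 255]

# Straightforward version: apply the kernel to every element in order.
result = [kernel(p) for p in pixels]

# Each element is processed independently of the others, so the order
# does not matter, which is what lets hardware process them in parallel.
out_of_order = [None] * len(pixels)
for i in [2, 0, 3, 1]:
    out_of_order[i] = kernel(pixels[i])
```

Both lists end up identical, because no element's result depends on any other element.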

This is where GPGPU started, with some impressive (for the time) results. Of course, it was anything but trivial: you had to fake a scene to be rendered, pass it to the card, ask it to render it and manipulate the scene data with some shader kernels, and then get the resulting rendered scene and interpret it as the result of the computation. Possible, but clumsy, so that a number of libraries and other development tools started appearing (such as Brook) to make the task easier.

As usage of GPUs for non-graphical tasks spread, hardware manufacturers started to realize that there was an opportunity in making things easier for developers, and the true power of GPGPU was made available.

The first ‘real’ GPGPU solutions started appearing between 2006 and 2007, when the two (remaining) major GPU manufacturers (NVIDIA and ATi, the latter acquired by AMD shortly afterwards) realized that with minimal effort it was possible to expose the shader cores of the GPU and make them available beyond the simple scope of scene rendering.

Although buffers, texture engines and shader cores were now made accessible outside of the rendering pipeline, their functional behavior was not altered significantly, something that has a significant impact on their optimal usage patterns and some behavior peculiarities that inevitably arise during the use of GPUs as computing devices.

The GPU is (not) my co-processor

Before the Pentium, Intel (and compatible) CPUs had very limited (floating point) math capabilities, since they were deemed unnecessary for the common market. If you were a scientist or other white-collar worker that needed fast floating-point computations, you could however shell out money for an FPU (Floating-Point Unit), an auxiliary processor specialized in floating-point operations; these units were marked with a numerical code which was the same as the CPU, except for the final digit, a 7 instead of a 6: so you would have the 8087 next to an 8086, or a 387 next to a 386; and by ‘next’ I mean physically next to it, because the socket where the FPU had to be inserted was typically adjacent to the socket of the CPU.

The FPU as a co-processor started disappearing with the 486, which had two variants, whose high-level one (the 486DX) had an FPU integrated in the actual CPU. With the introduction of the Pentium, the FPU started being a permanent component of the CPU, and it started evolving (it had remained essentially unchanged since the inception) to support the famous extended ‘multimedia’ instruction sets (MMX, 3DNow!, the various SSE generations, up until the latest AVX extension) of subsequent CPUs. (And by the way, the fact that the FPU and MMX functionalities were implemented in the same piece of hardware had a horrible impact on performance when you used both at the same time. But that's a different topic.)

One of the tenets of GPGPU (marketing) is that the GPU can be used as a co-processor of the CPU. However, there are some very important differences between a co-processor like the FPU, and the GPU.

First of all, the FPU was physically attached to the same bus as the CPU, and FPU instructions were part of the CPU instruction set: the CPU had to detect the FPU instructions and either pass control to the FPU or decode the instructions itself and then pass the decoded instruction to the FPU. Secondly, even though the FPU has a stack of registers, it doesn't have its own RAM.

By contrast, the GPU is more like a co-computer: it has its own RAM, and its own instruction set of which the CPU is completely unaware. The GPU is not controlled by the CPU directly, as it happens with a co-processor, but rather the software driver instructs the CPU to send the GPU specific bits which the GPU will interpret as more or less abstract commands such as “load this program (kernel)”, “copy this data”, “execute this other program (kernel)”.

Since all communication has to go through the PCI bus, naively using the GPU as a coprocessor is extremely inefficient: most of the time would be spent just exchanging commands and data; this, in fact, was one of the reasons why the old GPGPU approach based on the graphics stack ended up consistently underperforming with respect to the expectable GPU speed.

The most efficient use of the GPU is therefore as an external machine, communication with which should be limited to the bare minimum: upload as much data as possible at once, load all the programs, issue the programs in sequence, and don't get any (intermediate) data back until it's actually needed on the CPU. It's not just a matter of offloading heavy computations to the GPU: it's about using a separate, complex device for what it was designed for.
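
A back-of-the-envelope cost model shows why bundling transfers matters. The numbers below (a fixed overhead of 10 microseconds per transfer and a bus bandwidth of 4 GB/s) are hypothetical values picked for the illustration, not measurements of any real bus or driver, and the model ignores everything except overhead and bandwidth.

```python
# Hypothetical figures, chosen only to illustrate the shape of the problem.
OVERHEAD_S = 10e-6        # assumed fixed cost per transfer, in seconds
BANDWIDTH_B_PER_S = 4e9   # assumed bus bandwidth, in bytes per second

def total_time(n_transfers, bytes_each):
    """Time to move data as n_transfers separate copies of bytes_each bytes."""
    return n_transfers * (OVERHEAD_S + bytes_each / BANDWIDTH_B_PER_S)

# The same 4 MB of data, sent as many small pieces or as one large upload.
many_small = total_time(1000, 4000)
one_large = total_time(1, 4_000_000)
ratio = many_small / one_large
```

Under these assumptions the chatty version takes roughly ten times longer for the same amount of data, because the fixed per-transfer cost dominates.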

How much faster is a GPU?

When the GPGPU craze found its way into marketing (especially with NVIDIA's push for their new CUDA technology), GPUs were touted as cheap high-performance co-processors that would allow programs to reach a speed-up of two orders of magnitude (over a hundred times faster!), and a large collection of examples showcasing these incredible benefits started coming up. The orders of magnitude of speed-ups even became almost the only topic of the first published ‘research’ papers on the subject.

Although such incredible speed-ups are quite possible when using GPUs, the reality is rather more complex, and a non-negligible part of these speed-ups is actually possible even on standard CPUs. To understand in more detail what practical speed-ups can be expected, we have to look at the fundamental areas where GPUs perform (potentially) much better than CPUs (computational power and memory bandwidth), and the conditions under which this better performance can actually be achieved.

Faster memory (?)

Let us look at memory first. It's undeniably true that GPU memory is designed to have a much higher bandwidth than the RAM normally mounted on the computer motherboard (hereafter referred to as ‘CPU memory’): in 2007, when GPGPU started being officially supported by hardware manufacturers, GPUs' memory had peak theoretical bandwidths ranging from 6.4 GB/s (on low-end GPUs using DDR2 chips for memory) to over 100 GB/s (on high-end cards using GDDR3 chips). By comparison, CPUs usually had DDR2 chips, whose performance ranges from 3.2 GB/s to 8.5 GB/s. Now (2012) GPUs can reach bandwidths of almost 200 GB/s with GDDR5 memory, whereas the best CPUs can hope for is less than 20 GB/s on DDR3.

Since the bandwidth is almost consistently an order of magnitude higher on GPUs than on CPUs, one should expect an order of magnitude in speed-up for a problem that is memory-bound (does a lot of memory access and very little computation), assuming it can get close to the theoretical bandwidth peak and assuming the data is already on the device.
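
The order-of-magnitude claim can be checked against the bandwidth figures quoted above. The sketch rounds "over 100", "almost 200" and "less than 20" GB/s to 100, 200 and 20, so the ratios are approximate, and they are upper bounds for a memory-bound problem that actually reaches peak bandwidth.

```python
# Peak theoretical bandwidths in GB/s as (GPU, CPU), rounded from the text.
peak_bandwidth = {
    "2007 high-end": (100.0, 8.5),
    "2012": (200.0, 20.0),
}

# Best-case speed-up of a purely memory-bound problem: the bandwidth ratio.
best_case_speedup = {
    label: gpu / cpu for label, (gpu, cpu) in peak_bandwidth.items()
}

for label, ratio in best_case_speedup.items():
    print(f"{label}: up to about {ratio:.1f}x")
```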

We'll talk about the problem of putting data on the device later on, but we can already mention a few things about reaching the peak bandwidth, without getting into too much detail.

The catch with the higher bandwidth on GPUs is latency. While CPU to (uncached) RAM access latency is usually less than 100ns, on GPUs this is 3 to 5 times higher; and the first GPUs had no cache to speak of (except for textures, but that's a different matter, since textures also have lower bandwidth). Of course, GPUs have specific methods to cover this high latency: after all, a GPU is optimized for moving large slabs of data around, as long as such data is organized ‘appropriately’, and the memory accesses are designed accordingly.

Therefore, memory-bound GPU algorithms have to be designed in such a way that they make as much use as possible of these latency reduction techniques (coalescing on NVIDIA GPUs, fastpath usage on AMD GPUs), lest they see their performance drop from being 10 times faster than on CPU to being no more than 2 or 3 times faster. These remarks are particularly important for the implementation of scatter/gather or sorting algorithms.

Faster computing (?)

Of course, where GPUs really shine is not in juggling data around, but in doing actual computations on them: gamer GPUs passed the (theoretical) teraFLOPS barrier in 2008 (Radeon HD 4850), when the best (desktop) CPU of the time fell short of achieving some theoretical 60 gigaFLOPS, and most common ones couldn't dream of getting half as much.

But from 20 gigaFLOPS to 1 teraFLOPS there's only a factor of 50: so where do the claimed two orders of magnitude in speedup come from? Unsurprisingly, the difference comes from a consistent underutilization of the CPUs. We'll leave that aside for the moment, though, and focus instead on the impressive (theoretical) performance sported by GPUs.

The first thing that should be mentioned about GPUs is that they are not designed to be fast in the sense that CPUs are fast. For years, CPU performance was strongly tied to the frequency at which the CPU operates, with a theoretical upper limit of one instruction per cycle, which would mean that a CPU running at 1GHz couldn't do more than one (short) billion operations per second. These days, the fastest desktop CPUs run at over 3GHz, while the fastest GPUs have computing clocks which are at about 1GHz, or even less.

However, GPUs are designed for massively parallel tasks, such as running a specific sequence of instructions on each element of a set of vertices or pixels, with each element being processed independently (or almost independently) from the others. The shaders in GPUs are made up of a large number of processing elements collected in multiprocessors, with each multiprocessor capable of executing the same single instruction (sequence) on a large number of elements at once.

In some sense, GPUs can be seen as a collection of SIMD (Single Instruction, Multiple Data) processors (typically, 10 or more), each with a very wide vector width (typically 32 for NVIDIA, 64 for AMD); while modern CPUs are also SIMD-capable, with their MMX and SSE instructions, and can also sport multiple cores, they have fewer SIMD lanes (typically 4 or 8), and fewer cores (2 to 6) than GPUs.

The GPU programming tools and languages expose this massive parallel computing capability, and make it very easy to exploit it. The simplest GPU programs consist of a kernel, i.e. a sequence of instructions that are to be executed (typically) on a single element of a set, which is given to the GPU to be run on all the elements of a given set.
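The kernel model can be illustrated even without a GPU. The following is a sketch in plain C, not actual GPU code: `saxpy_kernel` and `launch_saxpy` are names I made up, and the serial loop in the launcher merely stands in for what a GPU runtime would do by spreading the invocations across its processing elements.

```c
#include <stddef.h>

/* The 'kernel': what has to be done for ONE element, identified by its
 * index i. Here, a scaled vector addition: out[i] = a * x[i] + y[i]. */
static void saxpy_kernel(size_t i, float a,
                         const float *x, const float *y, float *out)
{
    out[i] = a * x[i] + y[i];
}

/* Stand-in for the GPU runtime (an assumption for illustration):
 * a real device would run the n invocations in parallel, while this
 * sketch simply runs them one after the other. */
void launch_saxpy(size_t n, float a,
                  const float *x, const float *y, float *out)
{
    for (size_t i = 0; i < n; ++i)
        saxpy_kernel(i, a, x, y, out);
}
```

The point to notice is that the kernel itself knows nothing about how many elements there are or in which order they are visited: each invocation is independent, which is exactly what allows the hardware to run them side by side.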

By contrast, exploiting the vector capabilities of modern CPUs and their multiple cores requires complex programming techniques, special instructions which are barely more abstract than their hardware counterparts, and complex interactions with the operating system to handle multiple threads to be distributed across the cores.

In other words, it's much easier to exploit the massively parallel nature of the GPUs than it is to exploit the available parallel computing capabilities of the CPUs. And this is where the two orders of magnitude in performance difference come from: CPUs are rarely used as more than single-core scalar processors.

Still, even when comparing well-designed CPU programs with well-designed GPU programs it's not surprising to see minutes of runtime be reduced to seconds. If you see less than that, you're probably doing something wrong, and if you're seeing much more than that, your CPU program is probably far from being optimal.

The question then becomes: how hard is it to write a well-designed GPU program as opposed to a well-designed CPU program? But this is a question I'll leave for later. For the time being, let's just leave it at: non-trivial.

Up and down (loads)

As previously mentioned, GPUs normally have their own memory, separate from the system memory (this is not true for integrated GPUs, but they deserve a separate paragraph). Therefore, using the GPU involves transferring data to it, and then retrieving the results when the kernel(s) have finished their operation.

The time spent uploading data to the GPU and downloading data from it is not necessarily insignificant: through a PCI Express 2.0 ×16 link you can hope for an 8 GB/s transfer rate, but 5 or 6 GB/s are more likely; and this is pretty close to being the top of the line. When compared to the GPU or even the CPU memory bandwidth, this can be a very significant bottleneck.

This, combined with the relatively small amount of memory available on GPUs (less than a gigabyte when GPGPU started, slightly over a gigabyte four years later), poses an interesting paradox on the convenience of GPUs.

On the one hand, GPUs are most convenient when processing large amounts of data in parallel: this ensures, together with well-designed algorithms, that the GPU hardware is used full-scale for an adequate amount of time.

On the other hand, there's a limit to the amount of data you can load at once on the GPU: desktops today are commonly equipped with 4 gigabytes of RAM or more (dedicated workstations or servers can easily go in the tens of gigabytes), so they can typically hold larger amounts of data. The only way to process this on standard desktop GPUs is to do it in chunks, which means uploading the first chunk, processing it, downloading the result, uploading the next chunk, and so on.
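The upload/process/download cycle can be sketched in plain C as follows. This is only an illustration under loud assumptions: the ‘device’ is just a second buffer of limited capacity, `memcpy` stands in for the host/device transfers, and `process_on_device` is a hypothetical placeholder for a real kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel: squares each element in place. */
static void process_on_device(float *dev, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dev[i] *= dev[i];
}

/* Process n host elements through a 'device' buffer that can hold at
 * most `capacity` elements. Returns the number of chunks processed. */
size_t process_in_chunks(float *host, size_t n, float *dev, size_t capacity)
{
    size_t chunks = 0;
    if (capacity == 0)
        return 0; /* nothing can be uploaded */
    for (size_t off = 0; off < n; off += capacity) {
        size_t len = (n - off < capacity) ? n - off : capacity;
        memcpy(dev, host + off, len * sizeof *host); /* 'upload' */
        process_on_device(dev, len);                 /* 'kernel' */
        memcpy(host + off, dev, len * sizeof *host); /* 'download' */
        ++chunks;
    }
    return chunks;
}
```

Note how each chunk pays for two transfers: with a strictly sequential loop like this one, the device sits idle during every copy.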

Luckily enough, GPUs are typically equipped with asynchronous copy engines, so the situation is not as dreary as it would otherwise be. In many cases, especially with modern devices, it is possible to overlap computations and host/device data transfers so as to hide the overhead of the data exchange. This, in fact, is one of the many ways in which GPGPU programming can become complex when optimal performance is sought.
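How much can overlapping buy? A rough estimate, as a sketch in C. The model is an idealization of my own choosing: all chunks cost the same, and uploads, computations and downloads of different chunks can all proceed concurrently (which assumes two copy engines), so that the stages form a pipeline paced by its slowest stage.

```c
/* Total time when each of the k chunks is uploaded, processed and
 * downloaded strictly one after the other. */
double serial_time(int k, double up, double compute, double down)
{
    return k * (up + compute + down);
}

/* Idealized total time when the three stages overlap across chunks:
 * the first chunk pays for all three stages, every further chunk
 * only adds the duration of the slowest stage. */
double overlapped_time(int k, double up, double compute, double down)
{
    double slowest = up;
    if (compute > slowest)
        slowest = compute;
    if (down > slowest)
        slowest = down;
    if (k <= 0)
        return 0.0;
    return (up + compute + down) + (k - 1) * slowest;
}
```

For 4 chunks with 1 time unit of upload, 2 of computation and 1 of download, the sequential approach takes 16 units and the overlapped one 10: the transfers are almost completely hidden behind the computation. Real devices will get less than this, of course.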

Is it still worth it?

Even if two orders of magnitude may not be possible to achieve without extensive programming efforts to produce extremely optimized code, the simple order of magnitude one can get for most trivially parallelizable problems is most often worth the time necessary to reimplement computationally-heavy code to use GPGPU.

One of the most interesting features of the shared memory parallel programming approach needed for GPGPU is that, when it can be employed, it's a much more future-proof way of coding than serial execution. The reason is clear: while serial execution can only improve by using faster processors (and there are upper physical limits which are getting closer by the year to how fast a scalar processor can go), parallel algorithms can get faster by ‘just’ adding more computational units. In theory, a perfectly parallel algorithm will take half the time to run on twice the cores, and while the reality is less ideal, the performance gain is still quite perceptible.
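One standard way to quantify how much ‘less ideal’ reality is comes from Amdahl's law, which I'm bringing in here as an illustration: if only a fraction p of the runtime can be parallelized, n cores give a speed-up of 1 / ((1 - p) + p / n). A minimal sketch in C, with a function name of my own choosing:

```c
/* Amdahl's law: speed-up on `cores` cores of a program in which a
 * fraction `parallel_fraction` (between 0 and 1) of the serial runtime
 * can be perfectly parallelized, while the rest stays serial. */
double amdahl_speedup(double parallel_fraction, int cores)
{
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

A perfectly parallel algorithm (p = 1) on 2 cores gets a speed-up of exactly 2, as stated above; with p = 0.9 the same two cores only give about 1.8, and no number of cores can ever do better than 10.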

The hidden benefits of GPGPU

In many ways, the most important contribution of GPGPU to the domain of computer science and software engineering has not been the actual performance benefits that a lot of fields have seen from the availability of cheap parallel computing platforms.

There's indeed a much longer-term benefit, that will be reaped over the next years, and it's precisely the shift from serial to parallel programming we just mentioned. Before GPGPU, parallel programming was left to the domain of expensive computational clusters and sophisticated programming techniques; GPGPU has shown that there are huge opportunities for shared-memory parallel programming even on the lower end of the market.

The reborn interest in parallel programming triggered by GPGPU is gradually leading to the development of an entirely new mentality both in terms of software development and hardware realities. Although it is still to be seen how it will ultimately pan out, there are significant signs that we're only starting to scratch the surface of technologies that can revolutionize computing to an extent that could only be compared with the effects of the commercialization of the Internet twenty years ago, and the introduction of the first microcomputers twenty years before that.

GPGPU is bleeding out of the GPU market, in an interesting combination of paradoxical feedbacks and returns. There are people that have implemented ray-tracing using GPGPU: the GPUs go back to their intended task, but through features that were designed to make them usable outside of their domain. At the same time, CPUs gain more powerful parallel computing features, and integrated CPU/GPU solutions bring the GPU more in line with the standard co-processor role marketing wanted to sell GPGPU for.

We are starting to see a convergence in technology. At this point, the only danger to the rich potential of this dawning era is the petty commercial interest of companies that would rather see the market collapse under fragmentation than prosper without their dominance.

Let us hope that this won't happen.

Linux and the desktop

Discussing this after the recent debate that involved big names such as Linus Torvalds and Miguel de Icaza may seem a little inappropriate, but I guess I'll have to count this against my usual laziness for writing stuff up when I think it instead of waiting for it to become a fashion.


The reason why the topic has recently re-emerged (as it periodically does) has been a write-up by the afore-mentioned Miguel de Icaza, titled What killed Linux on the desktop.

According to the author of the rant, there are two main reasons. The first was the general disrespect for backwards compatibility, in the name of some mystic code purity or code elegance:

We deprecated APIs, because there was a better way. We removed functionality because "that approach is broken", for degrees of broken from "it is a security hole" all the way to "it does not conform to the new style we are using".

We replaced core subsystems in the operating system, with poor transitions paths. We introduced compatibility layers that were not really compatible, nor were they maintained.

paired with a dismissive attitude towards those for whom this interface breakage caused problems. The second problem has been, still according to de Icaza, incompatibility between distributions.

Miguel de Icaza then compares the Linux failure with the success of Apple and its operating system, which apparently did things “the way they should have been done” (my words), highlighting how Mac OSX is a UNIX system where things (audio, video, support for typical content formats) work.

Karma whoring

There's little doubt that “Linux on the desktop” is a hot topic with easy polarizing and bandwagoning. By mentioning three of the most famous problematic areas in the most widely adopted Linux distribution(s) (audio support, video support and support for proprietary audio and video formats), de Icaza basically offered the best bait for “me-too”-ing around the Internet.

And unsurprisingly this is precisely what happened: a lot of people coming up with replies to the article (or referencing the article) with little more than “yeah! exactly! these were exactly the problems I had with Linux!”.

An optimist could notice how many people have reacted this way, combine them with those that have reacted the opposite way (“BS, I never had a problem with Linux!”) and be happy about how large the pool of (desktop) Linux users and potential users is. On the other hand, the whole point of the article is to (try and) discuss the reasons why many of these are only potential users, why so many have been driven off Linux despite their attempts at switching over to it.

Linus, Linux and The CADT model

The first point of de Icaza's critique is nothing new. It's what Jamie Zawinski coined the term CADT, Cascade of Attention-Deficit Teenagers, for. However, the way in which de Icaza presents the issue has two significant problems.

One is his use of «we», a pronoun which is somehow supposed to refer to the entire Linux developer community; someone could see it as a diplomatic way of not coming out with the specific names and examples of developers and projects that break backwards compatibility every time (which would be ‘bad form’), while at the same time putting himself personally in the number of people that did so.

The other is how he tries to trace the dismissive attitude back to Linus Torvalds, who by position and charisma may be considered the one that «sets the tone for our community», assuming that Linus (and the kernel development community) feeling free to break the internal kernel interfaces even at minor releases somehow gives userspace developers the entitlement to do the same about external interfaces.

These two points have sparked a debate in which Linus himself (together with other important Linux personalities) intervened, a debate that has made the news. And the reason why the debate was sparked is that these two points are among the most critical issues indicating what's wrong with the article. Since in the debate I find myself in the opposite camp to Miguel de Icaza (and, as I found out later, mostly in Linus' camp), I'm going to discuss this in more detail, in a form that is more appropriate for an article than for a comment, as I found myself doing so far.

Kernel, middleware and user space

I'm going to start this explanation with a rough, inadequate but still essential description of the general structure of a modern operating system.

First of all, there's the kernel. The kernel is a piece of software that sits right on top of (and controls and receives signals from) the hardware. It abstracts the hardware from the rest of the operating system, and provides interfaces to allow other pieces of the operating system to interact with the hardware itself. Linux itself is properly only the kernel, which is why a lot of people (especially the GNU guys) insist on calling it GNU/Linux instead; after all, even Android uses the Linux kernel: it's everything else that is different.

By application one usually means the programs that are executed by the user: web browsers, email clients, photo manipulation programs, games, you name it. These user space applications, which are what users typically interact with, don't usually interact directly with the kernel themselves: there's a rather thick layer of libraries and other programs that ease the communication between user space applications and the kernel. Allow me to call this layer ‘middleware’.

Example middleware in Linux and similar systems includes the first program launched by the kernel when it finishes loading (typically init), the C library (libc, in Linux often the one written by the GNU project) and things that manage the graphical user interface, such as the X Window System (these days typically provided by the server in Linux).

All the components of the various layers of the operating system must be able to communicate with each other. This happens through a set of interfaces, which are known as APIs (Application Programming Interfaces) and ABIs (Application Binary Interfaces), some of which are internal (for example, if a kernel module has to communicate with something else inside the kernel, it uses an internal kernel API) while others are external (for example, if the C library needs to communicate with the kernel, it does so using an external kernel API).

Interface stability and application development

Let's say that I'm writing a (user space) application: a photo manipulation program, an office suite, whatever. I'm going to develop it for a specific operating system, and it will be such a ‘killer app’ that everybody will switch to that operating system just for the sake of using my application.

My application will use the external interfaces from a number of middleware libraries and applications (for example, it may interface with the graphics system for visualization, and/or with the C library for file access). My application, on the other hand, does not care at all if the internal interfaces of the kernel, or of any middleware component, change. As long as the external interfaces are frozen, my application will run on any future version of the operating system.

A respectable operating system component never removes an interface: it adds to them, it extends them, but it never removes them. This allows old programs to run on newer versions of the operating system without problems. If the developers think of a better way to do things, they don't change the semantics of the current interface; rather, they add a new, similar interface (and maybe deprecate the old one). This is why Windows APIs have call names with suffixes such as Extended and whatever. This is why we still have the (unsafe) sprintf alongside the (safe) snprintf in the POSIX C library specification.
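The sprintf/snprintf pair makes for a compact illustration. In the following C sketch (`format_greeting` is a made-up helper) the newer interface is used because it takes the size of the destination buffer and reports how many characters it would have needed; the old `sprintf` is still there in the library for the programs that rely on it.

```c
#include <stdio.h>

/* Writes "Hello, <name>!" into buf, never writing more than `size`
 * bytes (terminator included). Returns 1 if the whole string fit,
 * 0 if it had to be truncated (or if formatting failed). */
int format_greeting(char *buf, size_t size, const char *name)
{
    /* snprintf returns the length the full string would have had */
    int needed = snprintf(buf, size, "Hello, %s!", name);
    return needed >= 0 && (size_t)needed < size;
}
```

With a 16-byte buffer and the name "Bob" the string fits; with an 8-byte buffer it gets truncated to "Hello, " and the function returns 0, where `sprintf` would have happily written past the end of the buffer.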

Let me take the opportunity to highlight two important things that come from this.

One: the stability of internal interfaces is more or less irrelevant as far as user space applications are concerned. On the other hand, stability of external interfaces is extremely important, to the point that it may be considered a necessary condition for the success of an operating system.

Two: it may be a little bit of a misnomer to talk about interface stability. It's perfectly ok to have interfaces grow by adding new methods. What's important is that no interface or method is removed. But we'll keep talking about stability, simply noting that interfaces that grow are stable as long as they keep supporting ‘old style’ interactions.
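What ‘growing without breaking’ looks like in practice can be sketched in a few lines of C. Both functions below are hypothetical, invented for this example: a library that originally only offered `count_char` later gains a more flexible `count_char_ex`, and keeps the old entry point alive as a thin wrapper around the new one.

```c
#include <ctype.h>
#include <stddef.h>

/* The new, extended interface: counts the occurrences of c in s,
 * optionally ignoring case. */
size_t count_char_ex(const char *s, char c, int ignore_case)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (ignore_case
                ? tolower((unsigned char)*s) == tolower((unsigned char)c)
                : *s == c)
            ++count;
    }
    return count;
}

/* The old interface: same name, same signature, same semantics as
 * before, so programs written against it keep working unchanged. */
size_t count_char(const char *s, char c)
{
    return count_char_ex(s, c, 0);
}
```

Old programs keep calling `count_char` and never notice that anything changed; new programs can opt into the extended behavior.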

Are Linux interfaces stable?

Miguel de Icaza's point is that one of the main reasons for the failure of Linux as a desktop operating system is that its interfaces are not stable. Since (as we mentioned briefly before) interface stability is a necessary condition for the success of an operating system, his reasoning may be correct (unstable interfaces imply unsuccessful operating system).

However, when we start looking at the stability of the interfaces in a Linux environment we see that de Icaza's rant is misguided at best and intellectually dishonest at worst.

The three core components of a Linux desktop are the kernel, the C library and the X Window System. And the external interfaces of each of these pieces of software are incredibly stable.

Linus Torvalds has always made a point of never breaking user space when changing the kernel. Although the internal kernel interfaces change at an incredible pace, the external interface is a prime example of backwards compatibility, sometimes to the point of stupidity. { Link to round table with Linus mentioning examples of interfaces that should never have been exposed or had issues, but were still kept because programs started relying on the broken behavior. }

A prime example of the interface stability is given by the much-critiqued sound support, which is an area where the Linux kernel has had some drastic changes over time. Sound support was initially implemented via the ironically-named Open Sound System, but this was —not much later— replaced by the completely different Advanced Linux Sound Architecture; yet OSS compatibility layers, interfaces and devices have been kept around since, to allow old applications using OSS to still run (and produce sound) on modern Linux versions.

This, by the way, explains why Linus was somewhat pissed off at de Icaza in the aforementioned debate: if developers in the Linux and open source worlds had to learn anything from Linus, it would have been to never break external interfaces.

Another good example of stability is given by the GNU C Library. Even though it has grown at an alarming pace, its interface has been essentially stable since the release of version 2, 15 years ago, and any application that links to libc6 has forward-compatibility essentially guaranteed, modulo bugs (for example, the Flash player incorrectly used memcpy where they should have used memmove, and this resulted in problems with audio in Flash movies when some optimizations were done to the C library; this has since been fixed).
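For those wondering what the difference is: the C standard leaves the behavior of `memcpy` undefined when source and destination overlap, so a program that relies on it ‘working anyway’ is relying on an implementation detail, not on the interface. `memmove` is the function that is guaranteed to handle overlapping regions. A minimal sketch in C (`shift_left` is a made-up helper, not what the Flash player actually did):

```c
#include <stddef.h>
#include <string.h>

/* Drops the first k bytes of a len-byte buffer by moving the remaining
 * bytes to the front, in place. Source and destination overlap, so
 * memmove (and not memcpy) is the correct call here. */
void shift_left(char *buf, size_t len, size_t k)
{
    if (k == 0 || k >= len)
        return; /* nothing to move */
    memmove(buf, buf + k, len - k);
}
```

Called on the string "abcdef" (terminator included in the length) with k = 2, this leaves "cdef" in the buffer, whatever copying strategy the C library happens to use internally.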

But what is the most amazing example of stability is the X Window System. This graphical user interface system is famous for having a client/server structure and being network transparent: you can have ‘clients’ (applications) run on a computer and their user interface appear on another computer (where the X server is running). X clients and server communicate using a protocol that is currently at version 11 (X11) and has been stable for 25 years.

The first release of the X11 protocol was in 1987, and an application that old would still play fine with an X11 server of today, even though, of course, it wouldn't be able to exploit any of the more advanced and sophisticated features that the servers and protocol have been extended with. Heck, Linux didn't even exist 25 years ago, but an X server running on Linux would still be perfectly able to support an application written 25 years ago. How's that for stability?

If the three core components of a Linux desktop operating system have been so stable, why can Miguel de Icaza talk about “deprecating APIs”, “removing functionality” and “replacing core subsystem”, and still be right? The answer is that, of course, there have been some very high-profile cases where this has happened.

The prime example of such developer misbehavior is given by GNOME, a desktop environment, something that sits on top of the graphical subsystem of the operating system (X, in the Linux case) and provides a number of interfaces for applets and applications to present a uniform and consistent behavior and graphical appearance, and an integrated environment to operate in.

Applications can be written for a specific desktop environment (there are more than one available for Linux), and for this it's important for the desktop environment (DE, for short) to provide a stable interface. This has not been the case with GNOME. In fact, the mentioned CADT expression was invented specifically for the way GNOME was developed.

We can now start to see why Linus Torvalds was so pissed off at Miguel de Icaza in the mentioned debate: not only is the Linux kernel one of the primary examples of (external) interface stability, so trying to trace CADT to Linus is ridiculous, but GNOME, of which Miguel de Icaza himself has been a prominent figure for a long time, is the primary example of interface instability.

The «we» Miguel uses to refer to the open source and Linux community as a whole now suddenly sounds like an attempt to divert the blame for a misbehavior from the presenter of the argument himself to the entire community, a generalization that has no basis whatsoever, and that most of all can't call for Linus as being the exemplum.

Ubuntu the Breaker

Of course, the GNOME developer community is not the only one suffering from CADT, and in this Miguel is right. Another high-profile project that has had very low sensitivity to the problem of backwards compatibility in the name of “the new and the shiny” is Ubuntu.

This is particularly sad because Ubuntu started with excellent premises and promises to become the Linux distribution for the ‘common user’, and hence the Linux distribution that could make Linux successful on the desktop. And for a few years, it worked really hard, and with some success, in that direction.

But then something happened, and the purpose of Ubuntu stopped being to provide a solid desktop environment for the ‘common user’, and it started being the playground for trying exciting new stuff. However, the exciting new stuff was brought forward without solid transition paths from the ‘old and stable’, with limited if any backwards compatibility, and without any solidification process that would lead the exciting new stuff to actually be working before gaining widespread usage.

This, for example, is the way PulseAudio was brought in, breaking everybody's functioning audio systems, and plaguing Ubuntu (and hence Linux) with the infamous idea of not having a working audio system (which it still has: ALSA). Similar things happened with the other important subsystems, such as the alternatives to the System V init traditionally used (systemd and upstart); and then with the replacement of the GNOME desktop environment with the new Unity system; and finally with the ‘promise’ (or should we say threat) of an entirely new graphical stack, Wayland, to replace the ‘antiquated’ X Window System.

It's important to note that none of these components are essential to a Linux desktop system. But since they've been forced down the throat of every Ubuntu user, and since Ubuntu has already gained enough traction to be considered the Linux distribution, a lot of people project the abysmal instability of recent Ubuntu developments onto Linux itself. What promised to be the road for the success of Linux on the desktop became its worst enemy.

Common failures: getting inspiration on the wrong side of the pond

There's an interesting thing common to the persons behind the two highest-profile failures in interface stability in the Linux world: their love for proprietary stuff.

Miguel de Icaza

Miguel de Icaza founded the GNOME project (for which we've said enough bad things for the moment), but also the Mono project, an attempt to create an open-source implementation of the .NET Framework.

His love for everything Microsoft has never been a mystery. Long before this recent rant, for example, he blamed Linus for not following Windows' example of a stable (internal) kernel ABI. At the time, this was not because it ‘set the wrong example’ for the rest of the community, but because it allegedly created actual problems for hardware manufacturers that didn't contribute open source drivers, thereby slowing down Linux adoption due to missing hardware support.

As you can see, the guy has a pet peeve with Linus and the instability of the kernel ABI. When history proved him wrong, with hardware nowadays gaining Linux support very quickly, often even before release, and most vendors contributing open source drivers (more on this later), he switched his rant to the risible claim that instability of the kernel ABI set ‘a bad example’ for the rest of the community.

It's worse than that, in fact, since the stability of the Windows kernel ABI is little more than a myth. First of all, there are at least two different families of Windows kernels, the (in)famous Win9x series and the WinNT series. In the first family we have Windows 95, Windows 95 OSR2, Windows 98, Windows ME (that family is, luckily, dead). In the second family we have the old Windows NT releases, then Windows 2000 (NT 5.0), Windows XP (NT 5.1), Windows Vista (6.0), Seven (6.1). And not only are the kernel families totally incompatible with each other, there are incompatibilities even within the same series: I have pieces of hardware whose Windows 98 drivers don't work in any other Win9x kernel, earlier or later, and even within the NT series you can't just plop a driver for Windows 2000 into Windows 7 and hope it'll work without issues, especially if it's a graphic driver.

However, what Windows has done has been to provide a consistent user space API (Win32) that essentially allows programs written for it to run on any Windows release supporting it, be it either of the Win9x family or of the WinNT family.

(Well, except when they cannot, because newer releases sometimes created incompatibilities that broke older Win32 applications, hence the necessity for things such as the “Windows XP” emulation mode present in later Windows releases, an actual full Windows XP install within Windows, sort of like WINE in Linux —and let's not talk about how the new Metro interface in the upcoming Windows 8 is going to be a pain for everybody. We'll talk about these slips further down.)

But WINE and Mono will be discussed later on in more detail.

Mark Shuttleworth

Mark Shuttleworth is the man behind Canonical and ultimately Ubuntu. Rather than a Microsoft fan, he comes out more on the Apple side (which is where Miguel de Icaza seems to have directed his attention now too). It's not difficult to look at the last couple of years of Ubuntu transformations and note how the user interfaces and application behavior have changed from a Windows-inspired style to one that mimics the Mac OSX user experience.

This is rather sad (someone could say ‘pathetic’), considering Linux desktops have had nothing to envy of Mac OSX desktops for a long time: in 2006 a Samba developer was prevented from presenting on his own computer, because it was graphically too much better than what Mac OSX had to offer at the time.

But instead of pushing in that direction, bringing progressive enhancements to the existing, stable base, Ubuntu decided to stray from the usability path and shift towards some form of unstable ‘permanent revolution’ that only served the purpose of disgruntling existing users and reducing the appeal to further increase its user base.

The number of Ubuntu derivatives that have started gaining ground simply by being more conservative about the (default) choice of software environment should be ringing all possible alarm bells, but apparently it's not enough to bring Canonical back on the right track.

The fascination with proprietary systems

So why are such prominent figures in the open source world so fascinated with proprietary operating systems and environments, be it Microsoft or Apple? That's a good question, but I can only give tentative answers to it.

One major point, I suspect, is their success. Windows has been a monopolistically dominant operating system for decades. Even if we only start counting from the release of Windows 95, that's still almost 20 years of dominion. And the only thing that has managed to make a visible dent in its dominance has been Apple's Mac OSX. There is little doubt that Apple's operating system has been considerably more successful than Linux in gaining ground as a desktop operating system.

While there's nothing wrong with admiring successful projects, there is something wrong in trying to emulate them by trying to ‘do as they do’: even more so when you actually fail completely at doing what they really did to achieve success.

Windows' and Mac OSX's success has been due (among other reasons which I'm not going to discuss for the moment) to a strong push towards consistency between different applications, and between the applications and the surrounding operating system. It has never been because of this or that specific aesthetic characteristic, or this or that specific behavior; it has been for the fact that all applications behaved in a certain way, had certain sets of common controls, etc.

This is why both operating systems provide extensive guidelines to describe how applications should look and behave, and why both operating systems provide interfaces to achieve such looks and behavior —interfaces that have not changed with time, even when they have been superseded or deprecated in favour of newer, more modern ones.

Doing the same in Linux would have meant defining clear guidelines for application behavior, providing interfaces to easily follow those guidelines and then keeping those interfaces stable. Instead, what both GNOME (initially under Miguel de Icaza's guidance) and Ubuntu (under Mark Shuttleworth's guidance) did was to mimic this or that (and actually worse: first this then that) behavior or visual aspect of either of the two other operating systems, without any well-defined and stable guideline, and without stable and consistent interfaces: they tried to mimic the outcome without focusing on the inner mechanisms behind it.

In the mean time, every other open source project whose development hasn't been dazzled by the dominance of proprietary software has managed to chug along, slowly but steadily gaining market share whenever the proprietary alternatives slipped.

One important difference between dominant environments and underdogs is that dominants are allowed to slip: dominants can break the user experience, and still be ‘forgiven’ for it. Microsoft has done it in the past (Ribbon interface anyone? Vista anyone?), and it seems to be bound to do it again (Metro interface anyone?): they can afford it, because they are still the dominant desktop system; Apple is more of an underdog, and it's more careful about changing things that can affect the user experience, but they still break things at times (not all applications written for the first Mac OSX release will run smoothly, or at all, on the latest one). But the underdogs trying to emulate either cannot afford such slips: if they're going to be incompatible as much as the dominants are, why shouldn't a user stick with the dominant one, after all?

Linux and the desktop

And this leads to the final part of this article, beyond a simple critique of Miguel de Icaza's article. Two important questions arise here. Can Linux succeed on the desktop? And: does it actually matter?

Does it matter?

There has been a lot of talk, recently, about whether the desktop operating system concept itself is bound to soon fall into oblivion, as other electronic platforms (tablets and ‘smart’ phones) rise into common and widespread usage.

There is a reason why the so-called ‘mobile’ or ‘touch’ interfaces have been appearing everywhere: the already mentioned Metro interface in Windows 8 is a bold move into the direction of convergence between desktop and tablet and mobile interfaces; Mac OSX itself is getting more and more similar to iOS, the mobile operating system Apple uses on its iPods and iPads; even in the Linux world, the much-criticized Unity of the latest Ubuntu, and its Gnome Shell competitor, are efforts to build ‘touch-friendly’ user interfaces.

Unsurprisingly, the one that seems to be handling this transition best is Apple; note that this is not because the Mac OSX and iOS user interfaces are inherently better, but simply because the change is happening gradually, without ‘interface shocks’. And there are open source projects that are moving in the same direction in the same gradual way, even though they don't particularly try to mimic the Apple interface.

The most significant example of an open source project that is handling the desktop/touch user interface convergence more smoothly is KDE, a desktop environment that in many ways has often tried (albeit sadly not always successfully) to be more attentive to user needs. (In fact, I'd love to rant about how I've always thought that KDE would have been a much superior choice to GNOME as default desktop environment for Ubuntu, and about how history has proven me right, but that's probably going to sidetrack me from the main topic of discussion.)

If everything and everyone is dropping desktops right and left and switching to phones and tablets, does it really matter if Linux can become ‘the perfect desktop operating system’ or not?

I believe it does, for two reasons, a trivial one and a more serious one.

The trivial reason is that Linux, in the sense of specifically the Linux kernel, has already succeeded in the mobile market, thanks to Android, which is built on a Linux kernel. I'm not going to get into the debate on which is better, superior and/or more successful between Android and iOS, because it's irrelevant to the topic at hand; but one thing can be said for sure: Android is successful enough to make Apple feel threatened and to let them drop to the most anticompetitive practices and underhanded assaults they can legally (and economically) afford to avert such threat.

But there is a more serious reason why the success of an open Linux system is important: when the mobile and tablet crazes have passed, people will start realizing that there were a lot of useful things their desktops could do that their new systems cannot do.

They'll notice that they can't just plug a TV to their iPad and watch a legally-downloaded movie, because the TV will not be ‘enabled’ for reproduction. They'll start noticing that the music they legally bought from online music stores will stop playing, or just disappear. They'll notice that their own personal photos and videos can't be safely preserved for posterity.

They will start noticing that the powerful capability of personal computers to flatten out the difference between producer and consumer has been destroyed by the locked-in systems they've been fascinated by.

The real difference between the information technology up to now and the one that is coming is not between desktop and mobile: it's between open and locked computing.

Up until now, this contrast has been seen as being about “access to the source”, ‘proprietary’ software versus open source software. But even the closed-source Windows operating system allows the user to install any program they want, and do whatever they want with the data; at worst, it allows them to replace the operating system with a different one.

This is exactly what is changing with the mobile market: taking advantage of the perception that a tablet or smartphone is not a computer, vendors have built the systems to prevent users from installing arbitrary software and doing whatever they want with their data. But the same kind of constraints are also being brought onto the desktop environment. This is where the Mac OSX market comes from, this is why Microsoft is doubling their efforts to make Windows 8 unreplaceable on hardware that wants to be certified: Secure Boot is a requirement on both mobile and desktop systems that want to claim support for Windows 8, and on the classical mobile architecture (ARM), it must be implemented in such a way that it cannot be disabled.

Why this difference between ARM and non-ARM? Because non-ARM for Windows means the typical Intel-compatible desktop system, and this is where the current Linux distributions have waged wars against the Secure Boot enforcement.

And this is specifically the reason why it's important for an open system to be readily available, up to date and user-accessible: it offers an alternative, and the mere presence of the alternative can put pressure on keeping the other platforms more open.

And this is why the possibility for Linux to succeed matters.

Can Linux succeed?

From a technical point of view, there are no significant barriers to a widespread adoption of Linux as a desktop operating system. The chicken and egg problem that plagued it in the beginning (it doesn't have much support, so it doesn't get adopted; it's not adopted, so it doesn't get much support), in terms of hardware support, has long been solved. Most hardware manufacturers acknowledge its presence, and be it by direct cooperation with kernel development, be it by providing hardware specifications, be it by providing closed, ‘proprietary’ drivers, they allow their devices to be used with Linux; even though Linux is far from being the primary target for development, support for most hardware comes shortly after, if not before, the actual hardware is made available.

There are exceptions, of course. NVIDIA, for example, is considered by Linus Torvalds the single worst company they've ever dealt with, due to their enormous reticence in cooperating with open source. The lack of support (in Linux) for the Nvidia Optimus dual-card system found in many modern laptops is a result of this attitude, but Linus' publicity stunt (“Fuck you, Nvidia!”) seems to have moved things in the right direction, and Nvidia is now cooperating with and kernel developers to add Optimus support in Linux.

In terms of software, there are open source options available for the most common needs of desktop users: browsers, email clients, office suites. Most of these applications are in fact cross-platform, and there are versions available also for Windows and Mac OSX, and the number of people using them on those operating systems is steadily growing: for them, a potential transition from their current operating system to Linux will be smoother.

Some more or less widespread closed-source applications are also available: most notably, the Skype VoIP program (even though its recent acquisition by Microsoft has been considered by some a threat for its continuing existence in Linux) and the Opera web browser.

The WINE in Linux

There are, however, very few if any large commercial applications. A notable exception is WordPerfect, for which repeated attempts were made at a Linux version. Of the three attempts (version 6, version 8, and version 9), the last is a very interesting one: rather than a native Linux application, as was the case for the other two versions, Corel decided to port the entire WordPerfect Office suite to Linux by relying on WINE, an implementation of the Win32 API that tries to allow running Windows programs under Linux directly.

The choice of using WINE rather than rewriting the applications for Linux, although a tactically sound strategy (it made it possible to ship the product in a considerably short time), was considered by many a poor choice, with the perception that it was a principal cause in the perceived bloat and instability of the programs. There are however two little-known aspects of this choice, one of which is of extreme importance for Linux.

First of all, WordPerfect Office for Linux was not just a set of Windows applications that would run under WINE in an emulated Windows environment: the applications were actually recompiled for Linux, linking them to Winelib, a library produced by the WINE project specifically to help port Windows applications to Linux. The difference is subtle but important: a Winelib application is not particularly ‘less native’ to Linux than an application written specifically to make use of the KDE or GNOME libraries. Of course, it will still look ‘alien’ due to its Windows-ish look, but no less alien than a KDE application on a GNOME desktop or vice versa, especially at the time (2000, 2001).

The other important but little-known aspect of Corel's effort is that it gave an enormous practical push to the WINE project. At the time when the port was attempted, the WINE implementation of the Win32 API was too limited to support applications as sophisticated as those of the WordPerfect Office suite, and this led Corel to invest in and contribute to the development of WINE. The results of that sponsorship are quite evident when the status of WINE before and after the contribution is considered. Since Corel was trying their own hand at distributing Linux itself with what was later spun off as Xandros, the improved WINE was to their benefit more than just for the ability to support the office suite.

In the Linux world, WINE is a rather controversial project, since its presence is seen as an obstacle to the development of native Linux applications (which in a sense it is). However, I find myself more in agreement with the WINE developers, seeing WINE as an opportunity for Linux on the desktop.

It's not hard to see why. Desktop users mostly don't care about the operating system; they could be running PotatoOS for all they care, as long as it allows them to do what they want to do, and what they see other people doing. What users care about are applications. And while it's undoubtedly true that for many common applications there are open source (and often cross-platform) alternatives that are as good as, if not better than, the proprietary applications, there are still some important cases where people have to (or want to) use specific applications which are not available for Linux, and possibly never will be. This is where WINE comes in.

Of course, in some way WINE also encourages ‘laziness’ on the part of companies that don't want to put too much effort into porting their applications to Linux. Understandably, when Linux support is an afterthought it's much easier (and cheaper) to rely on WINE than to rewrite the program for Linux. And even when getting started for the first time, it might be considered easier to write for Windows and then rely on WINE than to approach cross-platformness with some other toolkit, be it using Qt, whose licensing for commercial applications makes it a pricey option, be it using GTK, whose Windows support is debatable at best, be it using wxWidgets, one of the oldest cross-platform widget toolkits, or any less-tried option. In some sense, the existence of WINE turns Win32 into a cross-platform API, whose Windows support just happens to be extremely superior to that of other platforms.

It's interesting to observe that when LIMBO was included in the cross-platform Humble Indie Bundle V, it caused a bit of an uproar because it wasn't really cross-platform, as it relied on WINE. Interestingly, Bastion, which builds on top of the .NET Framework (and thus uses the aforementioned Mono on Linux), didn't cause the same reaction, despite being included in the same package. Yet, on critical analysis, an application written for the .NET Framework is not any more native to Linux than one written for the Win32 API.

If anything, the .NET Framework may be considered “not native” in any operating system; in reality, it turns out to be little more than a different API for Windows, whose theoretical cross-platformness is only guaranteed by the existence of Mono. It's funny to think that Mono and its implementation of the .NET Framework are seen in a much better light than WINE and its implementation of the Win32 API, even though in all respects they are essentially the same.

The lack of commercial applications

In some way, what Miguel de Icaza's rant tried to address was specifically the problem of the missing commercial applications, on the basis that no applications implies no users, and therefore no success on the desktop market. While the instability of the interfaces of some high-profile environments and the multitude of more or less (in)compatible distributions are detrimental and discouraging for commercial developers, the overall motivations are much more varied.

There is obviously the software chicken and egg problem: Linux doesn't get widespread adoption due to the lack of applications, applications don't support Linux because it doesn't have widespread adoption.

Another important point is the perceived reticence of Linux users to pay for software: since there are tons of applications available for free, why would Linux users think about buying something? Windows and Mac OSX users, on the other hand, are used to paying for software, so a commercial application is more likely to be bought by a Windows or Mac user than by a Linux user; this further reduces the relevance of the potential Linux market for commercial interests.

This line of reasoning is quite debatable: the Humble Indie Bundle project periodically packages a number of small cross-platform games which users can buy by paying any amount of their choice, and the statistics consistently show that even though there are significantly more bundles sold for Windows than for Linux, the average amount paid per platform is distinctly higher in Linux than in Windows: in other words, while still paying what they want, Linux users are willing to pay more, on average, which totally contradicts the perception about Linux users and their willingness to pay.

There's more: if piracy is really as rampant in Windows (starting from the operating system itself) as many software companies want us to believe, one should be led to think that Windows users are not particularly used to paying: rather, they are used to not paying for what they should be paying, in sharp contrast with Linux users who would rather choose applications and operating systems they don't actually have to pay for. In a sense, choosing free software rather than pirated software becomes an indicator of user honesty, if anything. But still, the perception of Linux users as tightwads remains, and hinders the deployment of applications for the platform.

It's only at this point, in third place so to say, that technical reasons, such as the instability of interfaces or the excessive choice in distributions and toolkits, become an obstacle. Should we target Linux through one of the existing cross-platform toolkits, or should we go for a distinct native application? Should this be targeting a specific desktop environment? And which toolkit, which desktop environment should be selected?

However, the truth is that these choices are not really extremely important. For example, Skype simply opted for relying on the Qt toolkit. Opera on the other hand, after various attempts, decided to go straight for the least common denominator, interfacing directly with Xlib. And of course, for the least adventurous, there's the possibility to go with WINE, in which case just contributing to WINE to help it support your program might be a cheaper option than porting your program to Linux; this, for example, is the way Google decided to go for Picasa.

Finally, of course, there are applications that will never be ported to Linux, Microsoft Office being a primary example of this. And for this there is sadly no direct hope.


There is one final issue with Linux on the desktop, and this is pre-installation. Most users won't go out of their way to replace the existing operating system with some other, because, as already mentioned, users don't usually care about the operating system.

This is the true key for Linux on the desktop: being the default operating system on machines as they are bought. However, none of the major desktop and laptop computer companies seem particularly interested in making such a bold move, or even in making an official commitment to full Linux support for their hardware.

One notable exception in this has been Asus, whose Eee PC series initially shipped with Linux as the only option for the operating system, even though later strong-arming from Microsoft led to it shipping both Linux and Windows machines (with the Windows ones having inferior technical specifications to comply with Microsoft's request that Windows machines shouldn't cost more than Linux ones).

It's interesting, and a little sad, that there are vendors that sell desktops, laptops and servers with Linux preloaded (see for example the Linux Preloaded website). The question is: why don't major vendors offer it as an option? And if they do, why don't they advertise the option more aggressively?

I doubt it's a matter of instability. It wouldn't be hard for them to limit official support to specific Linux distributions and/or versions that they have verified to work on their hardware: it wouldn't be different than the way they offer Windows as an option. And this makes me suspect that there's something else behind it; is Microsoft back to their tactics of blackmailing vendors into not offering alternatives?

Get Rid Of CD Images

A recent message on the Ubuntu mailing list proposing to drop the “alternate CDs” for the upcoming release of Ubuntu (12.10) had me thinking: why are modern Linux distributions still packaged for install in the form of CD images?

There are obvious historical reasons for the current situation. CDs and DVDs are the most common physical medium for distributing operating systems, and Linux is no exception to this. Before broadband Internet got as widespread as it is today, the only or the cheapest way to get Linux was to buy an issue of a Linux magazine, as they commonly offered one or more CDs with one or more Linux distributions to try out and/or install.

Today, “install CDs” are still the most common, or one of the most common ways, to do a clean Linux install, even though one doesn't usually get them from a Linux magazine issue, but rather by downloading a CD image (.iso) from the Internet and then either burning it on an actual CD or putting it on a USB pendrive. In fact, I'd gather that using a pendrive is more common than actually burning a CD these days, and there's even a website dedicated to the fine art of using a pendrive to hold one or more install images for a number of different Linux distributions.

In contrast to Windows installation media, Linux install images, be they on CD or on pendrives, are useful beyond the plain purpose of installing the operating system: most of them are designed in such a way that they can also be used ‘live’, providing a fully-functional Linux environment that can be used, for example, as a rescue system (there are, in fact, images that are tuned for this), or just to try out the latest version of the operating system without actually installing anything (useful to test hardware compatibility or whatnot).

This is actually one more reason why pendrives are a superior option to CDs. Not only are they usually faster and, being rewritable, reusable when a new install comes out (so you are not left with hundreds of old CDs lying around): since they are not read-only, they can also be used to store permanent data (and customizations) on the pendrive when the image is used as a ‘live’ system.

Finally, CDs (and CD images) have obvious, significant size constraints. This, for example, is one of the reasons why Debian, from which Ubuntu is derived, recently replaced GNOME with XFCE as the default desktop environment: the former was too large to fit on the (first) CD.

There is in fact one and only one reason I can think of why a CD would be preferable to a pendrive, and that's when you want to install Linux on a machine that is so old that its BIOS doesn't allow booting from USB disks (or on a machine that doesn't have USB ports at all). So why would a modern Linux distribution (that probably won't even run on such an old system) still design its install images around CDs?

I suspect that, aside from legacy (“we've always done it like this”, so habit, toolchains, and general knowledge are geared towards this), one of the most important reasons is simplicity: a user can download an .iso image, double-click on the downloaded file and, whatever their current operating system (Windows, Mac OS X, Linux), they will (probably) be offered the option to burn the image on CD. That is, of course, assuming the computer does have a CD or DVD burner, which a surprisingly high number of more recent computers (esp. laptops) don't actually have.

By contrast, using a pendrive requires tools which are usually not as readily available on common operating systems (and particularly on Windows), or available but non-trivial to use (for example the diskimage utility on Mac OS X). Compare, for example, the instructions for creating a USB image for Ubuntu with the ones about CD installs. Or consider the existence of websites such as the aforementioned PenDriveLinux, or LiLi (the latter being geared specifically towards the setup of live systems on USB keys).

There is also the matter of price: blank CDs and DVDs are still cheaper, per megabyte, than a USB flash disk, with prices that hover around 10¢ per CD and 50¢ per DVD. By contrast, the cheapest 1GB USB flash drives go for 1€/piece when bought in bulk: much more expensive for physical mass distribution.
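The per-megabyte comparison can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the capacities are nominal values I'm assuming (700MB for a CD, 4700MB for a single-layer DVD, 1000MB for a 1GB flash drive), and the prices quoted above are treated as if they were in the same currency.

```python
# Rough cost-per-megabyte comparison for the prices quoted in the text.
# Assumptions (not from the original figures): nominal capacities of
# 700 MB (CD), 4700 MB (single-layer DVD) and 1000 MB (1GB flash drive),
# and all prices expressed in the same currency unit.

MEDIA = {
    # name: (price per piece, capacity in MB)
    "CD": (0.10, 700),
    "DVD": (0.50, 4700),
    "USB 1GB": (1.00, 1000),
}


def cost_per_mb(price, capacity_mb):
    """Price of a single megabyte of storage on the given medium."""
    return price / capacity_mb


def cheapest_first(media=MEDIA):
    """Media names sorted from the cheapest to the priciest per megabyte."""
    return sorted(media, key=lambda name: cost_per_mb(*media[name]))


if __name__ == "__main__":
    for name in cheapest_first():
        price, capacity = MEDIA[name]
        # Print in hundredths of a currency unit per MB for readability.
        print(f"{name}: {cost_per_mb(price, capacity) * 100:.4f} cents/MB")
```

Under these assumptions the flash drive comes out several times more expensive per megabyte than either optical disc, which is the point being made about physical mass distribution.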

The situation is a little paradoxical. On the one hand, USB would offer the chance to reduce the number of images, but on the other hand it would make downloads heavier, making physical distribution more convenient.

Consider this: a distribution like Ubuntu currently makes available something between 5 and 10 different installation images: there's the standard one, the alternative text-based one (which they want to get rid of), the server one and the ‘Cloud’ one, for each of the available architectures (Intel-compatible 32-bit and 64-bit at least, without counting the ones that are also available for ARM); and then there are the localized DVD images.

A CD image is between 200 and 300MB smaller than the smallest USB flash drive that can hold it (CDs hold about 700MB, USB keys come in either 512MB or 1GB sizes); since CD images often share most of the data (especially if they are for the same architecture), an image could be specifically prepared to fit a 1GB USB disk and still be equivalent to two or three different images, and maybe even more, for example offering the functionality of both the standard and alternative CDs, or the server and Cloud images; I wouldn't be surprised if all four of them could fit in 1GB, and I'm sure they can all fit in a 2GB image.

Of course, this would mean abandoning the CD as a medium for physical distribution, since such an image could only be placed on a USB disk or burned on DVD. It would also mean that downloading such an image would take more time than before (at the expense of those who would be satisfied with the features of a single disk, but in favour of those that found themselves needing two different images).

If I remember correctly, Windows 98 was the first PC operating system that required a CD drive, since it shipped in the form of a floppy plus bootable CD (in contrast to the ridiculously large number of floppies its predecessor came in). Modern Windows versions, as far as I know, come in DVD format. I think it's time for modern Linux distributions to go beyond plain CDs as well, but in a smarter way than just “ok, we'll go to DVDs”.

I know that Linux can be made to run in the strangest of places, and it has often been used to revive older systems. But the question is: does the particular distribution you're still designing CD installation images for actually support such old systems? As an example, I think of two tightly related, but still significantly different, Linux distributions: Debian and Ubuntu. It may make sense to have a CD installation image for Debian, but I really don't see why it would be useful to have one for Ubuntu. Why?

Ubuntu is designed to be easily installable by the most ignorant user, its installer is heavily designed around graphical user interfaces, and the desktop environment it installs requires a decent video card: can a user without particular knowledge really install it (from CD) and use it on a system that doesn't have a DVD drive and doesn't support booting from USB disks? I doubt it.

On the other hand, Debian is a ‘geekier’ distribution, it has a less whizzy installer, and by default it installs a much lighter desktop environment, if you really want to install one. You're much more likely to have success in running it on older hardware ‘out of the box’, or with minimal care during installation. A CD image to install this on such a system is a sensible thing to have; even a floppy disk to allow installation on systems that have a CD drive but can't boot from it would be sensible for Debian. For Ubuntu? Not so much.

Now, Ubuntu development these days seems to be all about revolutionizing this or that part of the user experience or the system internals: Ubiquity, Unity, Wayland, you name it, often with debatable results. Why don't they think about revolutionizing the installation media support instead? (Actually, I think I know: not enough visual impact on the end user.)

Printers, costs and content distribution

I've recently come across a rant about printers. The rant mixes in a number of complaints ranging from technology (e.g. paper jamming) to economics (e.g. the cost of ink cartridges). Since it's a rant, it should probably not be taken too seriously, but it does raise some interesting points that I'd like to discuss. Of course, I'm going to skip the obviously ridiculous complaints (such as the need to get out of bed), and go straight to the ones worth answering.

Let's start easy.

The first complaint is about all the stuff that has to be done before you can actually get to printing, such as having to plug the printer into the laptop and wait for startup times and handshakes and stuff like that.

This is actually an easy matter to solve, and one that has been addressed by people who replied to the rant: get a network printer. The printer we have at home is not networked, but we have a small server, and the printer is plugged in there and shared with all the computers on our small home network.

The paper jam complaint also doesn't carry much weight. As far as I know, even professional printing equipment jams from time to time, although of course miniaturized mechanisms such as the ones found on desktop printers are bound to jam more easily. The frequency of jams could probably be reduced by using better materials, but I doubt they could be eliminated altogether.

The biggest complaint is, obviously, about ink cartridges and their cost (which the author of the rant hyperbolically, but not even that much, states as being ten times more than diesel fuel). The author of the rant also remarks (correctly) that this is

how Epson, Lexmark, Canon, Brother et al. make money: They make shitty low-end printers the break easily so they need to be regularly replaced and make the ink cost ten times more than diesel fuel (a hyperbole that is close to accurate, btw) so they can have a steady flow of cash from those printers that do work.

Except that printer manufacturers don't make shitty low-end printers only; they also have mid-range and high-end products. So one would be led to wonder, if the author of the rant is aware of this, why doesn't he invest a little more money in getting a better product that is going to give him fewer problems?

Getting a complex piece of hardware like a printer for less than 50€ (including the famous “initial cartridges”) and expecting it to last forever, with cheap consumables to go with it, is naive at best: the cheap printers are obviously sold at a loss, with money recouped by the sale of consumables. It's up to the buyer to be aware (and wary) of the mechanism, and choose accordingly, because there are alternatives; of course, they do require a higher initial investment.

The author of the rant goes on to compare the printer matter with video rental:

It’s bullshit the way Blockbuster was bullshit with late fees and poor customer service and high rental prices and yearly membership fees. Remember how Netflix and it’s similar services worldwide practically destroyed them? I really want some hipster engineer at Apple or Microsoft or anywhere to make a printer that Netflixes the fuck out of the consumer printing market.

The comparison, I'm afraid, is quite invalid. I'm not going to discuss the Blockbuster vs Netflix thing in detail, although I will mention that the Blockbuster model was not ‘bullshit’ (remember, you have to compare renting a DVD with buying a movie or going to the cinema; also, at least in my country Blockbuster had no membership fees, although the late fees were outrageous): it has been, however, obsoleted by the Netflix model.

The chief difference between the Blockbuster and Netflix models? The difference is that Blockbuster was on demand, while Netflix is a subscription. But is a subscription model intrinsically superior to an on-demand model?

The answer is, I'm afraid, no. The convenience (for the user) of one model over the other depends entirely on how the cost of the two services compares for the amount of ‘use’ the user actually makes of them. If I only watch a movie once every two or three months, I'm much better off renting stuff one-shot when needed and then forgetting about shelling out any money for the rest of my time. (Of course, most people watch way more movies per month, hence a subscription is usually better.)
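To make the comparison concrete, here is a small Python sketch of the break-even reasoning. The prices are purely hypothetical placeholders (3 per rental, 9 per month for the subscription), not actual Blockbuster or Netflix figures; only the shape of the comparison matters.

```python
# Break-even between an on-demand (pay-per-use) and a subscription model.
# The prices used in the example at the bottom are hypothetical
# placeholders, chosen only to illustrate the reasoning.


def cheaper_model(uses_per_month, price_per_use, monthly_fee):
    """Return which model costs less for a given monthly usage."""
    on_demand_cost = uses_per_month * price_per_use
    if on_demand_cost < monthly_fee:
        return "on-demand"
    if on_demand_cost > monthly_fee:
        return "subscription"
    return "either"


def break_even_uses(price_per_use, monthly_fee):
    """Uses per month above which the subscription becomes cheaper."""
    return monthly_fee / price_per_use


if __name__ == "__main__":
    PRICE_PER_RENTAL = 3.0  # hypothetical
    MONTHLY_FEE = 9.0       # hypothetical

    # One movie every three months versus ten movies a month.
    for uses in (1 / 3, 10):
        model = cheaper_model(uses, PRICE_PER_RENTAL, MONTHLY_FEE)
        print(f"{uses:.2f} movies/month -> {model}")
    print("break-even:", break_even_uses(PRICE_PER_RENTAL, MONTHLY_FEE))
```

Whatever the actual prices, there is always some usage level below which paying per use wins, and neither model is superior in the abstract.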

Now let's think about this for a moment: the way Netflix disrupted Blockbuster was by offering a subscription service that was more convenient than the on-demand service offered by Blockbuster, for a lot of people.

The question is: would such a model really be applicable to printing? How would it even work? For 10 bucks a month you get a new (used) printer delivered to your door?

In fact, I really think that the current cheap printing business is the closest you can get to a subscription model, considering the enormous differences that exist between simple content distribution (which is what Blockbuster and Netflix do) and the production of complex devices such as printers.

In other words, the “Netflix revolution” has already happened in the printing business, except that it went in totally the opposite way for the consumer, while still being extremely profitable for the provider (as most subscription models are).

So what the author of the rant should probably aim at is to break out of the subscription model and go back to something more on demand. Can this be achieved?

There are many ways to work in that direction.

The cheaper way is to make heavy use of the various DIY cartridge refill kits, or to resort to knock-off or ‘regenerated’ cartridges instead of buying the official ones. However, most people probably know from experience that the quality of prints degrades in this case, something which in my opinion shows that there is actually a reason why ink cartridges are expensive, regardless of how overpriced they are.

Another, possibly smarter but more expensive (especially in the short run) solution has already been mentioned: don't buy a 30€ printer every six months; buy a mid-range printer and save in the long run. You can get professional or semi-professional networked color laser multifunction printers (including fax/scan capability) for around 500€. If you're willing to sacrifice wireless and fax, even less than that. The toner cartridges aren't that much more expensive than the ink ones (especially in terms of price per page), and they are much more durable (no more throwing away a cartridge because you didn't use the printer for a month and the ink dried up).

And finally, Print On Demand. You send them the file, they mail you the printed stuff. I've always been curious about this particular kind of service, and I see it making a lot of sense for some very specific cases. But I doubt that the domestic use cases the author of the rant probably had in mind would fit this.

High resolution

When I got my first laptop about 10 years ago, I wanted something that was top of the line and could last several years with a modicum of maintenance. I finally opted for a Dell Inspiron 8200 that I was still using in 2008, and whose total price, including the additional RAM, new hard disk, replacement batteries and replacement cooling fans I chose or had to buy over the course of that long period, was in the neighborhood of 3k€, most of which was the initial price.

One of the most significant qualities of that laptop, and the one thing that I miss the most still today, was the display, whose price accounted for something like half of the money initially spent on the laptop. We're talking about a 15" matte (i.e. non-glossy) display at a 4:3 aspect ratio, with 1600×1200 resolution (slightly more than 133 dots per linear inch), 180 cd/m² brightness at maximum settings. In 2002.

When six years later I had to get a new one for work reasons (the GeForce 2 Go on that machine was absolutely not something you could use for scientific computing, whereas GPGPU was going to be the backbone of all of my future work), I was shocked to find out there was no way to get my hands on a display matching the quality of the one I was going to leave.

This was not just a matter of price: there was simply no manufacturer selling laptops with a high-quality display like that of my old Dell Inspiron 8200. All available displays had lower resolution (110, 120 dpi tops) and the only manufacturer that provided matte displays was Apple (which offered the option for an extra 50€ on the price tag).

I can tell you that dropping from the aptly named TrueLife™ Dell display to the crappy 1280×800 glossy display featured on my current laptop (an HP Pavilion dv5, if anybody is interested) was quite a shock. Even now, four years after the transition, I still miss my old display; and how could I not, when I see myself mirrored on top of whatever is on screen at the time, every time I'm not in perfect lighting conditions, i.e. most of the time?

Not that desktop (or stand-alone) displays were faring any better: somehow, with the ‘standardization’ of 1920×1080 as the ‘full HD’ display resolution (something that possibly made sense in terms of digital TV or home-theater digital media like DVD or BluRay discs, but is purely meaningless otherwise), all monitor manufacturers seemed to have switched to thinking ‘ok, that's enough’.

The dearth of high-resolution, high-quality displays has always surprised me. Was there really no market for them (or not enough to justify their production), or was there some other reason behind it? What happened to make the return on investment for research into higher quality, higher density, bright, crisp displays too low to justify ‘daring’ moves in that direction? Did manufacturers switch to just competing in a downward spiral, going for cheaper components all around, trying to scrape every single possible cent of margin, without spending anything on research and development at all?

Luckily, things seem to be changing, and the spark has been the introduction of the Retina display in Apple products.

The first time I heard the rumors about a Retina display being available on the future MacBook Pro, I was so happy I even pondered, for the first time in my life, the possibility of getting an Apple product, despite my profound hatred and contempt for the company overall (a topic which would be off-topic here).

My second, saner thought, on the other hand, was a hope that Apple's initiative would lead to a reignition of the resolution war, or at least push the other manufacturers to offer high-resolution displays again. Apparently, I was right. Asus, for example, has announced a new product in the Transformer line (the tablets that can be easily converted into netbooks) with a 220 dpi resolution; Dell has announced that HD displays (1920×1080) on their 15.4" products will be available again.

How I wish Dell had thought about that four years ago. My current laptop would have been a Dell instead of an HP. Now however, as the time for a new laptop approaches, I've grown more demanding.

How much can we hope for? It's interesting to note that the Apple products featuring the higher resolution displays actually have a decreasing pixel density, with the iPhone beating the iPad beating the just-announced MacBook Pro. This should not be surprising: when the expected viewing distance between user and device is taken into consideration, the angular pixel density is probably the same across devices (see also the discussion about reference pixels in the W3C definition of CSS lengths).

However, as good as the Retina is, it still doesn't match the actual resolution of a human retina. Will the resolution war heat up to the point where devices reach the physical limits of the human eyes? Will we have laptop displays that match printed paper? (I don't ask for 600 dpi or higher, but I wouldn't mind 300 dpi).

An Opera Requiem?

Opera has been my browser of choice for a long time. As far as I can recall, I started looking at it as my preferred browser with version 5, the ad-supported version, released around the turn of the millennium. It wasn't until two or three versions later, however, that I could stick to it as my primary browser, given the number of websites that had trouble with it (some due to bugs in Opera, but most often due to stupidity on the part of web designers and their general lack of respect for the universality of the web).

While a strong open-source supporter, I've always considered myself a pragmatist, and the choice of Opera as my preferred browser (together with other software, such as WordPerfect, which is off topic here) has always been a clear example of this attitude of mine: when a piece of closed-source proprietary software is hands-down superior to any open-source alternative, I'd rather use the closed-source software, especially when it's free (gratis).

One of the reasons why I've always preferred Opera is that it has been the pioneer, when not the inventor, of many of the web technologies and user interface choices that are nowadays widely available in more popular browsers. I've often found myself replying with “oh, they finally caught up with features Opera has had for years?” when people started boasting of “innovative” features like tabs being introduced in browsers such as Firefox.

Opera was the first browser to have decent CSS support, and except for brief moments in history, has always been leading in its compliance with the specification (to the point of resulting in ‘broken’ rendering when CSS was written to fit the errors in other, more common, implementations).

Opera has sported a Multiple Document Interface (a stripped-down version of which is the tabbing interface exposed by all major browsers today) since its earliest releases.

Opera had proper support for SVG as an image format before any other browser (still today, some browsers have problems with it being used e.g. as a background image defined by CSS).

Opera has a much more sophisticated User JavaScript functionality than that offered by the GreaseMonkey extension in Firefox. (This is partly due to the fact that the same technology is used to work around issues in websites that autodetect Opera and send it broken HTML, CSS or JavaScript.)

In fact, Opera didn't have any support for extensions until very recently, since it managed to implement most of the features provided by extensions to other browsers in a single package, while still remaining relatively lightweight in terms of resource consumption and program size.

When the Mozilla suite was being re-engineered into separate components (Firefox the browser, Thunderbird for mail and news, etc) in the hopes of reducing bloat, Opera managed to squeeze all those components in a single application that was smaller and less memory-hungry than anything that ever came out of Mozilla.

When Firefox stopped supporting Windows 98, Opera could still run (even if just barely) on a fairly updated Windows 95 on quite old hardware. When Firefox started having troubles being built on 32-bit systems, Opera still shipped a ridiculously large amount of features in an incredibly small package. Feed discovery? Built-in. Navigation bar? Built-in. Content blocking? Built-in. Developer tools? Built-in.

Opera pioneered Widgets. Opera pioneered having a web server built into the browser (Opera Unite). I could probably go on forever enumerating how Opera has constantly been one, if not several, steps ahead of the competition.

Additionally, Opera has long been available on mobile, in two versions (a full-fledged browser as well as a Mini version); and even the desktop version of the browser can render things as if it were on mobile (an excellent feature for web developers). Opera is essentially the only alternative to the built-in browser of the N900 (or, more generally, of Maemo). Opera is also the browser of the Wii, and the browser present in most web-enabled television sets.

Despite its technical superiority and its wide availability, Opera has never seen significant growth in user share on the desktop, consistently hovering around 2% usage worldwide (that still amounts to a few hundred million, possibly more than half a billion, users).

I'm not going to debate on the reasons for this lack of progress (aside from mentioning people's stupidity and/or laziness, and marketing), but I will highlight the fact that Opera users can be considered “atypical”. Although I'm sure some of them stick to the browser for the hipster feeling of using something which is oh so non-mainstream, I'd say most Opera aficionados are such specifically because of its technical quality and its general aim towards an open, standard web.

Although the overall percentage of Opera users has neither grown nor waned significantly in the last ten years or so, it's not hard to think of scenarios that would cause an en masse migration away from the browser (sadly, there aren't as many scenarios where people would migrate to it).

One of these scenarios is Opera being bought by Facebook, a scenario that may become a reality if the rumor that has been circulating these past days has any merit. I've even been contacted about this rumor by at least two distinct friends of mine, people that know me well as an Opera aficionado and Facebook despiser.

One of them just pointed me to a website discussing the rumor in an email aptly titled “Cognitive dissonance in 3, 2, 1, …”. To me, the most interesting part of that article was the comments, with many current Opera fans remarking how that would be the moment they'd drop Opera to switch to some other browser.

These are exactly my own feelings, in direct contrast with what the other friend of mine claimed with a Nelson-like attitude (“ah-ha, you'll become a ‘facebooker’ even against your will”), pointing to another website discussing the same rumor. But that would be like saying that I'd become a Windows user because of the Nokia/Microsoft deal, while I've just ditched Nokia for good.

In fact, an Opera/Facebook deal would have a lot of similarities with the Nokia/Microsoft deal that has sealed Nokia's failure in the smartphone market. For Facebook (resp. Microsoft), getting their hands on an excellent browser (resp. hardware manufacturer) like Opera (resp. Nokia) is an excellent strategic move to enter a market that they would have (resp. have had) immense trouble penetrating otherwise; for Opera (resp. Nokia), on the other hand, and especially for its users, the acquisition would be (resp. has been) disastrous.

In many ways, Facebook on the web represents the opposite of what Opera has struggled for; where Opera has actively pursued an open, interoperable web based on common standards and free from vendor lock-in, Facebook has tried to become the de facto infrastructure of a ‘new web’, with websites depending on Facebook for logins and user comments, and where the very idea of websites starts losing importance, when just the “Facebook page” is sufficient. This is essentially the server-side (‘cloud’) counterpart of what Microsoft did years ago with the introduction of ActiveX controls and other proprietary ‘web’ features in Internet Explorer.

I'm not going to spend too many words on how and why this is a bad thing (from the single point of failure to the total loss of control over content, its management and freedom of expression), but even just the rumor of Opera being bought by Facebook is enough to send chills down my spine: Opera was actually going in the opposite direction, by providing everybody with a personal web server to easily share content with other people from their own machine.

It's interesting to note that Opera has halted development of their Unite platform (as well as their Widgets platform) citing resource constraints (between Unite, Widgets, and the recently introduced Extensions, they did look a little as if they were biting off more than they could chew). And unless Opera extensions are developed to match the features formerly provided by the Unite and Widgets platforms, their end of life marks a very sad loss for the “more power to the user” strategy that has made Opera such an excellent choice for the cognoscenti.

There are possible forms of partnership between Facebook and Opera that could be designed to benefit both partners without turning into a complete loss for one to the benefit of the other, in much the same way as Opera has built partnerships with many hardware vendors to ship its browser. However, since these markets are exactly what Facebook is interested in, I doubt that they would settle for anything less than buying out Opera altogether.

On a purely personal level, I'm going to wait and see this rumor through. If it does turn out to be true, I'll have no second thoughts in considering the Opera I like at its end of life, and I'll start looking into alternatives more in tune with my personal choices of server- and client-side platforms. (And what a step would that be, from actually considering applying for a job at Opera!)

Judging from the comments to the rumor on the Opera fora, I'm not the only Opera user thinking along those lines. One would wonder if Opera would still be worth that much to Facebook after its market share suddenly drops to something around zero, but if the only thing Facebook is interested in is the market penetration obtained by Opera pre-installations on mobile devices and ‘smart’ TV sets, how much would they even care for the desktop users that actually switched to that browser by choice?

{ This article will be updated when the rumor is definitively confirmed or refuted. }

The CSS filling menu, part 2

This is a pure CSS challenge: no JavaScript, no extra HTML.

Consider the container and menu from the first part of this challenge. The idea now is to find a solution that not only does what was required in the first part, but also allows the menu to wrap when its natural width is wider than the width of the container (which, remember, is a wrapping container for other content: we don't want the menu to disturb its layout).

Of course, when the menu doesn't fit in a single line, we want each row to fill up the container nicely, and possibly we want the items to be distributed evenly across lines (for example, 4 and 4 rather than 7 and 1 or 6 and 2, or 3 and 3 and 2 when even more lines are necessary).

Can this be achieved without extra HTML markup hinting at where the splits should happen?

The CSS filling menu

This is a pure CSS challenge: no JavaScript, no extra HTML.

We have a container with a non-fixed width (for example, the CSS wrapping container from another challenge). Inside this container we have a menu (which can be done in the usual gracefully degrading unordered list), for which the number of items is not known (i.e. we don't want to change the CSS when a new item is added to the menu). The requirement is that the menu should be laid out horizontally, filling up the total width of the container, flexibly.

Extra credit

Additionally, we would like the menu entries to be uniformly spaced (we're talking here of the space between them, not the whitespace inside them, nor large enough borders), with the same spacing occurring all around the menu.

A (suboptimal?) solution

There is, in fact, a solution for this challenge. The idea is to use tables behind the scene, without actually coding tables into the HTML. If the menu is coded with the standard gracefully degrading unordered list, the ul element is set to display: table; width: 100% and the items are set to display: table-cell.

Note that this solution should not necessarily be considered ‘quirky’: the purpose of getting rid of tables from the HTML was to ensure that tables were not being used for layout, but only to display tabular content. There is no reason to not use the display: table* options to obtain table-like layouts from structural markup!
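The table-based layout described above can be sketched roughly as follows. This is only a minimal sketch: the class name .menu is a hypothetical hook for the unordered list, and the resets (margin, padding, list-style, text-align) are my own assumptions about what a typical menu would want, not part of the technique itself.

```css
/* Assumed markup: <ul class="menu"><li>…</li><li>…</li></ul>
   (.menu is a hypothetical class name) */
.menu {
    display: table;     /* the list behaves like a table… */
    width: 100%;        /* …that fills its container */
    margin: 0;
    padding: 0;
    list-style: none;
}

.menu > li {
    display: table-cell;   /* each entry becomes a cell of a single row */
    text-align: center;    /* assumption: centered labels */
}
```

Since the cells share a single anonymous row, adding or removing an entry needs no change to the CSS.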

Additionally, we can also get the extra credit by using something like border-collapse: separate; border-spacing: 1ex 0, which is almost perfect, except for the fact that it introduces an extra spacing (1ex in this case) left and right of the menu. This can be solved in CSS3 using the (currently mostly unsupported) calc() operator, by styling the ul with width: calc(100% + 2ex); margin-left: -1ex.

Of course, in this case, to prevent the mispositioning of the menu in browsers that do not support calc(), the margin is better specified as margin-left: calc(-1ex), which has exactly the same effect but only comes into action if the calc()ed width is supported as well.
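Putting the spacing and the calc() trick together, the extra-credit variant might look something like this. Again a sketch under assumptions: .menu is a hypothetical class name, and the plain width: 100% line is a fallback I added for browsers that drop the calc() declarations.

```css
/* .menu is a hypothetical class name for the <ul> */
.menu {
    display: table;
    border-collapse: separate;
    border-spacing: 1ex 0;      /* 1ex between entries, none vertically */
    width: 100%;                /* fallback if calc() is not understood */
    width: calc(100% + 2ex);    /* absorb the spacing at both ends */
    margin-left: calc(-1ex);    /* ignored together with the calc()ed width */
    padding: 0;
    list-style: none;
}

.menu > li {
    display: table-cell;
}
```

A browser that does not understand calc() discards both calc() declarations, so it falls back to the 100% width with the extra 1ex on each side rather than to a shifted menu.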

The challenge proper

While I'm pretty satisfied with this solution, it's not perfect:

  • there are people that cringe at the use of tables, even if just in CSS form; a solution without CSS tables would thus be better;
  • when the container is smaller than the overall natural length of the menu, the menu will overflow instead of wrapping.

Note that without using the CSS tables, the uniform spacing is quite easy to set up (something as trivial as margin: 0 1ex for the entries, for example), but having the entries adapt their size to make the ul fill its container is rather non-trivial.
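For comparison, a table-free version with uniform spacing could be sketched like this. It is only an illustration of the easy half of the problem: .menu is a hypothetical class name, and the use of floats (rather than, say, inline blocks) is my own choice to avoid stray whitespace between entries.

```css
/* .menu is a hypothetical class name for the <ul> */
.menu {
    margin: 0;
    padding: 0 1ex;      /* 1ex padding + 1ex entry margin = 2ex at the edges */
    list-style: none;
    overflow: hidden;    /* make the list contain its floated entries */
}

.menu > li {
    float: left;
    margin: 0 1ex;       /* adjacent margins add up to 2ex between entries */
}
```

The spacing is uniform, but the entries keep their natural width: nothing here stretches them to fill the container, which is exactly the non-trivial part.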

I'll actually consider the way to make it wrap nicely a different challenge.

The CSS shrinkwrapping container

This is a pure CSS challenge: no JavaScript, no extra HTML.

We have a container (e.g. an ‘outer’ div) and inside this container we have N boxes with constrained width (i.e. width or max-width is specified). We want to lay out the boxes side by side, as many as fit inside the viewport, and we want the outer container to wrap these boxes as tightly as possible (considering, of course, all padding and margins). The container (and thereby the collection of boxes inside it) should be centered inside the viewport.

The problem here is that we want to lay out the inner boxes (almost) without taking the outer box into consideration, and then lay out the outer box as if it had {width: 100%; margin-left: auto; margin-right: auto}.

A suboptimal solution

There is, in fact, a suboptimal solution for this challenge. The idea is to fix the container width based on the width necessary to wrap the actual number of boxes that would fit at a given viewport width, and then let the boxes fill the container as appropriate (I prefer display: inline-block, without floats, since this spaces out the boxes evenly).

For example, if we know that, considering padding and margins, the container would have to be 33em wide when holding only one box, and that only one box would fit with a viewport smaller than 66em, and that only two boxes would fit with a viewport smaller than 98em, etc., we could use something like the following:

@media (max-width: 98em) {
    #content {
        width: 66em;
    }
}
@media (max-width: 66em) {
    #content {
        width: 33em;
    }
}

Now, the reason why this is a horrible solution. To work perfectly, it requires the following things to be known:

  • the number of boxes (one media query for each ‘additional box’ configuration is needed),
  • the width of each box (note that the solution works regardless of whether the boxes have the same or different widths, as long as each box width is known, in the order in which they are to be laid out).

The challenge proper

The question is: is it possible to achieve this effect without knowing the number of boxes and without writing an infinite (nor a ‘reasonably high’) number of media queries? Even a solution in the case of fixed, equal-width boxes would be welcome.

The CSS challenges

In the beginning was HTML, and HTML mixed structure and presentation. And people saw that this was a bad thing, so an effort was made to separate content structure from layout and presentation.

This resulted in the deprecation of all HTML tags and tag attributes whose main or only purpose was to change the presentation of the text, and in the birth of Cascading Style Sheets (CSS), to collect the description of presentation and layout.

This was a very good thing. And in fact, CSS succeeded fairly well in achieving the separation of content from styling: it is now possible, using only structural (‘semantic’) HTML and CSS to achieve an impressive richness of colors, font styles and decorations.

However, while one of the purposes of CSS was to get rid of the use of ‘extra’ HTML (infamously, tables, but not just that) to control the layout, i.e. the positioning of elements on the page and with respect to each other, this has been an area where CSS has failed. Miserably.

So miserably, in fact, that sometimes it's not even sufficient to just add extra markup (container elements whose only purpose is to force some layout constraints): it might be necessary to resort to JavaScript just for the sake of obtaining the desired layout. And this, even before taking into consideration the various errors and deficiencies in the CSS implementations of most common layout engines.

I'm going to present here a number of challenges whose main purpose is to highlight limitations in the current CSS specifications: the things I'm going to ask for are going to be hard, if not impossible, to achieve regardless of the quality of the implementation, i.e. even on a layout engine that implemented everything in the current specification, and did it without any bugs whatsoever.

These challenges should be solved using only HTML and CSS, without any hint of JavaScript, and possibly without having to resort to non-structural markup in the HTML.

(Attentive people will notice that some of these challenges have a remarkably close affinity with some of the features of this wok. This is not by chance, of course: one of the purposes of this wok is to act as my personal HTML testing ground for sophisticated features.)

{ And here, I might add in the future some further considerations and remarks which would not be considered challenges. }

Vector graphics by hand

I have recently discovered the beauty of hand-editing SVG: a bit like writing the HTML of web pages by hand, but far more laborious and often much less gratifying, especially if, as in my case, there is no aesthetic sense to back up the technical ability.

Truth be told, writing these verbose markup formats by hand is extremely tedious, indeed tiring, and it weighs heavily on the wrists and fingers. This should not be surprising: these formats are meant more for production and consumption by machines than for direct editing by human beings. (Actually, the matter is slightly more complex for HTML.)

What's more, writing SVG by hand means doing graphics (SVG, after all, stands for scalable vector graphics) without seeing it. Accustomed as we are to a ‘point and click’ world, even for plain text[1], how much stranger, if not absurd, must it seem to do graphics without a graphical interface?

Of course, whether it makes sense to work without immediate graphical feedback depends heavily on the kind of graphics one has to produce (as well as, obviously, on individual inclination). For a quick impromptu sketch, a classic (vector) graphics program such as Inkscape is certainly the ideal tool; but there are some cases (which we will discuss shortly) where working ‘by hand’ is clearly superior.

Mind you, feedback is necessary even when working by hand, to make sure one has written things correctly, to check the result and possibly improve it; when I work on an SVG, for example, I always keep the file open in a browser window as well, refreshing it when I finish an iteration of changes.

So what are the cases in which writing verbose XML by hand is preferable to a graphical interface? There are two answers, and although antipodal they are closely tied together by two common threads: elegance and efficiency.

SVG can be regarded as an extremely sophisticated and complex language for describing two-dimensional figures (figures described by segments, circular arcs and cubic Bézier curves), with rich stylistic options for how these (geometrically described) figures should appear (colors, arrows, fills).

In fact, SVG is so complex that it is quite possible for the visual programs at our disposal simply not to support the full expressive richness of the language; in that case, the ability to edit the SVG by hand can prove invaluable (the aforementioned Inkscape, for example, which uses a bastardized version of SVG as its native format, also allows manual changes to the internal code of the image).

The opposite case is that of an extremely simple drawing: why bother waiting the long minutes that graphics programs often take to start up, when a simple text editor will do?

The biggest advantage of manual coding over a classic vector graphics program is the marked simplification of the file itself: even the simplest image, when saved by a graphics program, ends up buried under an immense and often unjustified ton of supplementary information that is inessential, but that mirrors the control structures used internally by the program itself.

Thus, for example, I was able to obtain a vector version of the Grammar Nazi logo that takes up less than half the disk space of the one that inspired it, without losing anything in either quality or information. In fact, my approach to describing the stylized G turns out to be far more comprehensible, since it is drawn ‘flat’ and then rotated/rescaled as appropriate.

This is precisely another advantage of writing by hand over drawing graphically: the possibility of expressing, right at the encoding level, the distinction between the design of the individual component and the geometric transformations needed to integrate it with the rest of the drawing.

Although visual programs often offer this possibility too, the information is often used on the spot to deform/reposition the components as required, but is not preserved when saving to file, and is therefore ‘lost’ after being applied: not essential for the final version of a project, but rather inconvenient during the design phase.

Writing by hand therefore results in files that are not only more efficient (which can have an impact for the average user, with shorter loading times or less work for the computer when rasterizing the image), but also more elegant: a somewhat ‘secret’ need (how many people habitually look at the source code of a file rather than at its result?) that generally has no impact on the average user (although it can sometimes work against efficiency, requiring more computation during rasterization).

Manual coding is obviously not a panacea: besides being (for some, unjustifiably) laborious, it cannot, for example, make up for the intrinsic limits of the format. SVG, for example, lacks the ability to express the sizes and positions of components in relation to one another, except in very simple cases and by resorting to sophisticated tricks with grouping and sizing done through scale factors; moreover, numerical constants must be expressed in decimal form and therefore, for values such as π/3 or the golden ratio, approximated.

SVG, on the other hand, is not the only language for vector graphics: programs such as MetaPost and its progenitor MetaFont were born as programming languages for vector graphics, were written with particular care for the numerical aspects of the mathematics of vector graphics, and do not suffer from the aforementioned limits of SVG; then again, a direct comparison between MetaPost and SVG is highly inappropriate, as much for their respective characteristics as for the respective application domains for which they were intended.

MetaFont was born from the madly brilliant mind of Donald Ervin Knuth with the aim of allowing the mathematical generation of font families for printing. The characters of a MetaFont are described by suitably parametrized and combined cubic Bézier curves, and this principle (turning characters into images and rasterized output into vector output in PostScript format) would also become the fundamental component of MetaPost.

MetaFont and MetaPost denote both the programs themselves and the programming language (very similar for both) that lets users develop font families or vector images, with descriptions of a mathematical and relational kind (descriptions such as the following are allowed: draw a curve from the intersection of these two other curves to two thirds of the way along that other curve). A MetaPost file is like the source in any programming language, and has to be compiled to produce one or more images.

By contrast, SVG was born as a language for describing vector images, and is aimed (though not exclusively) at use on the web, thus including features such as the ability to describe simple animations, possibly controlled through interaction with the user.

On the other hand, SVG integrates rather well with JavaScript, the dominant programming language on the web, and thanks to this it can take on a whole range of capabilities whose absence makes it inferior to MetaPost in certain cases; then again, I personally find it very annoying to have to resort to an auxiliary programming language to describe static images.

If in MetaPost this was a necessity tied to the intrinsic nature of the program (compensated by the immense flexibility offered by the possibility of expressing the salient features of an image relationally), the need to use JavaScript in SVG to achieve certain static effects continues to weigh as a limitation of SVG itself.

One might suppose that, had I not had previous experience with the powerful flexibility of MetaPost, I would never have felt the limits of SVG as such. I doubt it: I would still have very quickly found frustrating the impossibility of using ‘exact’ numerical quantities while leaving the computer the task of interpolating, and I would still have keenly felt the lack of a way to express, as such, the relations between different components of an image.

Rather, what I think could be an interesting compromise is something like Markdown (which lets you write HTML documents almost as if they were plain text), but for SVG. If MetaPost itself is deemed too complicated, one could start from something simpler, such as Eukleides (a package currently specialized for geometry).

Obviously, it is important that the SVGs produced by these programs be as minimalistic as possible, and thus that they somehow reflect, in the final product, that spirit of elegance, simplicity and efficiency that characterizes hand coding as opposed to the use of a graphical interface. And like Markdown, it should allow the insertion of ‘raw’ SVG code. I might just get to work on it.
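
As a rough illustration of the idea (a toy, not an existing tool), the following Python sketch builds a deliberately minimal SVG document from a relational description, and lets ‘raw’ SVG strings pass through untouched. All function names (`fmt`, `midpoint`, `line`, `svg`) are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def fmt(x: float) -> str:
    """Format a number compactly: at most 3 decimals, no trailing zeros."""
    s = f"{x:.3f}".rstrip("0").rstrip(".")
    return "0" if s in ("", "-0") else s


def midpoint(a: Point, b: Point) -> Point:
    """A relational primitive: the point halfway between a and b."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def line(a: Point, b: Point) -> str:
    """Emit a single <line> element with no superfluous attributes."""
    return (f'<line x1="{fmt(a[0])}" y1="{fmt(a[1])}" '
            f'x2="{fmt(b[0])}" y2="{fmt(b[1])}"/>')


def svg(width: float, height: float, elements: Iterable[str]) -> str:
    """Wrap elements in an <svg> root; raw SVG strings are passed through."""
    body = "".join(elements)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {fmt(width)} {fmt(height)}" stroke="black">'
            f'{body}</svg>')


# A base, a median dropped onto its midpoint, and a raw SVG element.
a, b, c = (0, 0), (100, 0), (50, 80)
doc = svg(100, 80, [
    line(a, b),
    line(c, midpoint(a, b)),
    '<circle cx="50" cy="80" r="2"/>',  # 'raw' SVG, inserted verbatim
])
print(doc)
```

The coordinates of the median never appear in the description: they are derived from the relation (“the midpoint of the base”), which is exactly what plain SVG cannot express on its own.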

  1. a notion I find appalling: I find it tiring even just to watch people take their hands off the keyboard to select some text with the mouse, and then go and click a button for the function of interest (bold, italic, delete, copy, whatever). ↩

RCS fast export

Get the code for
RCS fast export:


RCS is one of the oldest, if not the oldest, revision control systems (in fact, that's exactly what the name stands for). It may seem incredible, but there's still software around whose history is kept under RCS or a derivative thereof (even without counting CVS in this family).

Despite its age and its distinctive lack of many if not most of the features found in more modern revision control systems, RCS can still be considered a valid piece of software for simple maintenance requirements, such as single-file editing for a single user: even I, despite my strong passion for git, found myself learning RCS as late as 2010, for such menial tasks.

In fact, the clumsiness of RCS usage when coming from a sophisticated version control system like git was exactly what prompted me to develop zit, the single-file wrapper for git. And so I found myself needing to convert my (usually brief, single-file) RCS histories to git/zit.

I was not exactly surprised by the lack of a tool ready for the job: after all, how many people could have needed such a thing? Most large-scale projects had long since migrated to some other system (even if just CVS) for which quite sophisticated git conversion tools exist. So I set out to write the RCS/git conversion tool myself: I studied the RCS file format as well as the git fast-import protocol, and in a relatively short time sketched the first draft of rcs-fast-export, whose development can be followed from its git repository.
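
For the curious, the core of such a conversion can be sketched in a few lines. The following Python fragment is not the actual rcs-fast-export code: it is a minimal illustration, assuming the revisions of a single file have already been extracted from the RCS file in chronological order, of how a linear history can be written out as a git fast-import stream. The `Revision` type and the function names are hypothetical, and timestamps are assumed to be in UTC.

```python
import sys
from typing import Iterable, NamedTuple


class Revision(NamedTuple):
    author: str     # "Name <email>", as git expects it
    timestamp: int  # seconds since the epoch, assumed UTC
    message: str    # the log message of the revision
    content: bytes  # full file content at this revision


def data_block(payload: bytes) -> bytes:
    """A fast-import 'data' command: byte count, raw payload, newline."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def fast_import_stream(path: str, revisions: Iterable[Revision],
                       ref: str = "refs/heads/master") -> bytes:
    """One commit per revision, each one the child of the previous one.

    Assumes a simple path with no characters that need quoting.
    """
    out = []
    for mark, rev in enumerate(revisions, start=1):
        out.append(b"commit %s\n" % ref.encode())
        out.append(b"mark :%d\n" % mark)
        out.append(b"committer %s %d +0000\n"
                   % (rev.author.encode(), rev.timestamp))
        out.append(data_block(rev.message.encode()))
        if mark > 1:
            out.append(b"from :%d\n" % (mark - 1))  # linear history
        out.append(b"M 644 inline %s\n" % path.encode())
        out.append(data_block(rev.content))
        out.append(b"\n")
    return b"".join(out)


if __name__ == "__main__":
    history = [
        Revision("A U Thor <author@example.com>", 0, "first", b"hello\n"),
        Revision("A U Thor <author@example.com>", 60, "second", b"hello world\n"),
    ]
    sys.stdout.buffer.write(fast_import_stream("notes.txt", history))
```

The output is meant to be piped into `git fast-import` from inside a freshly initialized repository. Parsing the RCS file itself, which is where most of the real work lies, is left out entirely.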

The first release of the software was quite simple, supporting only linear histories for single files (after all, that was exactly what I needed), but I nevertheless decided to publish my work; who knows, someone else on the internet might have a need for it.

In fact, what has surprised me so far is the number of people who have needed this small piece of software. Since its public release, I've been contacted by some five or six different people (the most notable of whom is perhaps ESR) with suggestions, questions and patches, and as with all software developed on a need/use basis, the capabilities of the script have since grown to accommodate the needs of these other people.

In its current state it can handle files with branched histories as well as multi-file projects with a linear history. It does not, however, currently support multi-file histories with branching, which is, unsurprisingly, the “most requested” feature at present. I have actually been looking for a repository with these characteristics, to try and tackle the task, but it seems that finding one is nigh impossible; after all, how many people still have RCS repositories around?

Finnish design

As I've already said, the N900 is a really nice phone, whose user experience is however hampered by a few small malfunctions. The strangest is perhaps the proximity sensor problem.

To prevent contact with the face, or some careless movement of the hand, from ending calls or launching unwanted programs during a call, the phone application reads the proximity sensor and locks the phone if the sensor is covered.

The sensor works with infrared: it is therefore paired with an infrared LED whose light, if reflected for example by the user's face, returns to the sensor instead of dispersing into the environment.

What happens if the environment contains another source of infrared light on the same frequencies as the sensor? The sensor ‘thinks’ it is obstructed even when it is clear; the sensor must therefore be tuned to tolerate ‘typical’ interference from external sources.

Sources such as the Sun.

Except that the amount of (infrared) sunlight available in places like, say, Sicily is not exactly the same as in places like, say, Finland. Guess where the sensor was calibrated? (Hint: what nationality is Nokia?)

(Really: to make the sensor work correctly, it is enough to partially cover it so as to reduce the interference from sunlight.)
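
A toy model makes the problem easier to see. In the following Python sketch every number is made up for illustration (these are not actual sensor values): the sensor reports ‘covered’ when the infrared light it receives exceeds a threshold, and the threshold is assumed to have been chosen with a dim sun in mind.

```python
# All values are hypothetical, in arbitrary units of received infrared light.
THRESHOLD = 100.0        # above this, the sensor reports "covered"
DIM_SUN = 30.0           # ambient infrared where the sensor was tuned
BRIGHT_SUN = 150.0       # ambient infrared under a much stronger sun
FACE_REFLECTION = 200.0  # LED light bounced back by a nearby face


def is_covered(ambient: float, reflected: float,
               attenuation: float = 1.0) -> bool:
    """True if the light reaching the sensor exceeds the threshold.

    `attenuation` models a partial cover over the sensor, which dims
    ambient and reflected light alike (1.0 means no cover at all).
    """
    return attenuation * (ambient + reflected) > THRESHOLD


# Dim sun: works as designed.
print(is_covered(DIM_SUN, 0.0))              # nothing in front of the sensor
print(is_covered(DIM_SUN, FACE_REFLECTION))  # face in front of the sensor

# Bright sun: the sun alone is enough to trip the sensor.
print(is_covered(BRIGHT_SUN, 0.0))

# Bright sun with the sensor half covered: back to the expected behaviour.
print(is_covered(BRIGHT_SUN, 0.0, attenuation=0.5))
print(is_covered(BRIGHT_SUN, FACE_REFLECTION, attenuation=0.5))
```

With these invented numbers, halving the incoming light is enough to bring the bright-sun reading back under the threshold while still letting the reflection from a face go over it.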

N900, or how Nokia chose suicide after having paved its own way to the future

A couple of years ago Nokia released a product that, for the first time in my life, made me seriously consider getting a high-end smartphone. By then the iPhone was all the rage among the trendy, and the concept of the smartphone had extended well beyond that of a mere Personal Digital Assistant with a phone included, creating a new market that went well beyond the aspiring manager and his Palm (first) or his BlackBerry (later). Nokia's Symbian was at the time the most widespread mobile operating system, covering a range of products that went from the cheapest giveaway mobile phones to the most sophisticated communicators.

The product in question was the N900, released in late 2009. The only thing that held me back from getting one at the time was the price, which hovered around 600€ (a price not actually unusual for that class of device, but which Apple, for example, hid behind ‘offers’ with inextinguishable contracts tied to this or that connectivity provider). You can imagine my envy when I found out that, at an elective course held at the university, they were giving out this very model, for free, to the (computer engineering) students.

A few months ago, taking advantage of the ‘birthday zone’ and of an offer found online, I decided to get a second-hand one at a third of the price, and I finally had the chance to play with it to my heart's content. In short, I can say it has been perhaps the most satisfying (personal) purchase of recent years.

The N900 places itself in evident competition with the iPhone 3GS, released a few months earlier, with an interesting combination of pros and cons; a summary of the differences can be found on this page.

Actually, the only ‘con’ of the N900 compared to its Apple competitor is the touchscreen, which on the Nokia is resistive (this alone may bother some), but above all unable to track more than one finger, making the famous pinch and spread gestures for zooming impossible. For the rest, the Nokia wins on practically everything except thickness (some 60% more, not all of it due to the sliding physical keyboard, which is one of the N900's strengths): the Nokia's display has nearly twice the resolution; the Nokia has both a rear camera (with flash and autofocus, and a higher resolution than the iPhone's) and a front one (low resolution, for video calls); the Nokia's display can be used both with fingers and with a stylus (included); the Nokia has an FM receiver and an FM transmitter (although, who still uses old radios?); the Nokia has a physical keyboard (as already mentioned); the Nokia has a MicroSD card reader; the Nokia has a replaceable battery; the Nokia has a standard video output, and the cable for connecting it to a television is included.

Finally, the Nokia has Linux: not a semi-proprietary virtual machine like Android's Dalvik (on a Linux kernel), not a proprietary variant (iOS) of the open-source BSD kernel as on Apple's phones, but an ad-hoc Linux distribution (Maemo) made up almost entirely of open source software.

Since I got my hands on this little toy I have used it for countless things: playing games, reading books and comics, taking photos, shooting videos, administering my server, listening to music, reading and writing email, chatting and talking via Skype and Google Talk. Basically the only thing I haven't used it for is making phone calls, mainly because I haven't yet found a SIM with good rates for internet.

Many of the things I have used and still use the phone for required installing new programs; and although an OVI store is available, I have been able to find everything I needed in the official repositories (after all, it is still a complete, Debian-based Linux distribution). Basically, I haven't had to spend a cent beyond the cost of the phone.

One cannot say the N900 was perfect: in everyday use one easily runs into problems, some of them very annoying, ranging from a less than excellent antenna to some trouble with the proximity sensor. Yet it was certainly not these that decreed the deep failure of Nokia's attempt to join the new-generation smartphone game.

Rather, Nokia effectively committed outright suicide. The N900, which was nothing but the first step into a market Nokia was already entering considerably late, instead became, sadly, the peak of Nokia's smartphones.

The strategy to follow after the release of the N900 should have focused on refining and improving the kind of platform already tried out with the N900 and its Maemo 5, to produce in the shortest possible time a successor that would remedy the hardware and software limits of the first true Nokia smartphone.

On the hardware side there wasn't even much to do: adding multi-touch to the display, improving the internal antenna and fixing the proximity sensor problems should have been the main goals. In later hardware generations, more powerful processors, higher-capacity batteries and perhaps a thinner design would have made the N900's successors unbeatable.

But it is above all on the software side that Nokia fell into the most childish of mistakes: starting over from scratch, proposing a completely new platform, MeeGo, born in theory from the merger of Maemo with Intel's Moblin. In essence, this was a third alternative to the two existing systems, to be redesigned from the ground up, with the consequent, inevitable stretching of release times for new products in a market that certainly had no wish to wait for Nokia.

In mid-2010 Nokia thus found itself with interesting projects started (but not finished) on its hands, the most important of which was the acquisition of Trolltech and consequently control over Qt, the most important cross-platform interface toolkit around, and potentially the meeting point between the never-born Maemo 6 and Symbian4. In the same period, the company churned out a few more timid attempts at smartphones based on the by then moribund Symbian, and the only truly positive note of the year was the release of a series of software updates that made the N900, within the limits imposed by hardware from the previous year, the excellent smartphone I now find in my hands.

One would have to wait until 2011 to see the fruits of the new projects of 2010, and indeed it is only in June 2011 that Nokia makes available two new models (N950 and N9) and the new operating system (Harmattan, the link between Maemo and MeeGo) that could make it relevant again in the smartphone market.

Unfortunately, though, the latency introduced by the reinvention of Maemo as MeeGo and the collapse in sales of Symbian products lead to the decision to change the company's management, and in September 2010 Nokia's then CEO is replaced by Stephen Elop, former head of Microsoft's business division (essentially, the person responsible for Microsoft Office for the 2010 release), who decides to change course yet again: to break into the smartphone world, according to Elop, Nokia will have to rely on the most irrelevant of smartphone operating systems, Microsoft's Windows Phone (previously known as Windows Mobile).

Unfortunately for Elop, by the end of 2010 Nokia already has the successors of the N900 in the pipeline, and after the great anticipation built up over the year for these new models, it is impossible to prevent their release. The adopted strategy thus becomes that of making them irrelevant, thereby destroying any chance of serious competition with Apple's new iPhone models, the main single adversary Nokia has to fight.

The N950 (the true successor of the N900) is therefore released only as a developer preview for the N9: it is not put on sale, but made available only to developers, through odd selection procedures; the choice is moreover deeply questionable, since the two models have rather different hardware (for example, the N950 has a physical keyboard, the N9 does not).

To complete the suicide, the N9 is made available only in some markets (Finland, Hong Kong, Switzerland, India), and denied to the rest of the world. All this is accompanied by great fanfare to publicize the release of the Lumia line, the first Nokia phones with the new Microsoft operating system.

Despite these clumsy and desperate attempts to smother the alternative to Microsoft, the heir of the N900 is so sought after that various online retailers make the N9 available even in markets (such as the Italian one) that Nokia had excluded. Even other models based on the by then moribund Symbian keep outselling the Lumias.

Had Nokia's CEO not been a ‘Trojan horse’ sent by Microsoft, the choice for Nokia would have been obvious. Unfortunately, we find ourselves instead in a situation where the losers are both Nokia (which keeps losing market share at an incredible pace) and the users, who end up without a worthy successor to the N900, that excellent combination of Nokia hardware (always superior to the competition) and quality software that could have decreed its success.

It makes one want to set up a company to build a clone of the N950 and of its keyboardless companion, to carry on down the road that Nokia chose to abandon.

Asocial networks

A brief history of sociality on the Internet

Compared to other forms of mass communication, the Internet has always been distinguished by its “many-to-many” nature: both in synchronous form (IRC) and in asynchronous form (mailing lists, newsgroups, forums) the Internet has always offered everyone the possibility of reaching everyone. Until the late '90s, for most users this possibility was offered in contexts that had much of the public square and little of the individual: few could afford a permanent presence on the internet with a personal website.

What changed this was the birth of blogs (personal pages in diary form), the move from their manual upkeep to the development of more or less automatic tools for managing them, and finally the spread of platforms that offered ‘anyone’ the possibility of (and in particular the web space needed for) keeping one (LiveJournal, Blogger, the Italian Splinder or ilCannocchiale, and finally the currently hugely popular WordPress).

Thus begins a process of individualization of the Internet, in which the individual takes on (for the subject itself) ever greater weight and communities begin to break up. Blogs are joined by sites where publishing and sharing one's own work (in forms that are not only, or not mainly, textual) is the central point: drawings (DeviantART), photos (Flickr, Zooomr), videos (YouTube, Vimeo).

For the individual it becomes ever easier to publish, but ever harder to find and to be found: the role once played mainly by the communities that gather(ed) in well-defined virtual places (themed IRC channels, specific groups in the immense newsgroup hierarchy) is progressively taken over, in an exceedingly inefficient manner, by one's network of acquaintances (real or virtual).

The apex of this process is the birth of the so-called social networks, sites whose backbone is no longer made of content but of members and of the ways they are connected to one another, thus inverting the relationship between users and content that instead dominates the services mentioned above.

Social networks and other platforms

Social networks are not deprived of the ability to publish content; on the contrary, one of the strengths they leverage to attract users is the ease with which ‘everything’ (text, photos, videos) can be put online, and above all shared. The facilities offered for publishing content are often of markedly lower quality than those offered by dedicated platforms, but they are mostly good enough (and above all simple enough) for the preferred target audience of these services, with the added convenience of centralization.

The social network spreads by leveraging the social nature of the human animal, and the ability to share is the main tool of the new virtual sociality. Thus the social network extends its presence beyond the limits of its own site, and becomes the main means of spreading external content as well. Before the advent of social networks, it was important for a site to be well indexed by a good search engine; after the advent of social networks, it becomes important for a site to be shareable on social networks.

But the main feature that sets the social network apart from other platforms is the inversion of the relationships between users, products, services and customers. While other platforms offer services to their users, who are also the customers, in social networks the publicly offered services are just bait to attract users, whose networks of connections and shared content are the product to be sold to the customers (mainly, advertising agencies).

Yours truly in the social network

Even before the advent of social networks, the many steps in the evolution of the main forms of interaction on the Internet left me rather lukewarm. Not being what one would call an early adopter, I arrived late on IRC (where I still remain), on newsgroups and mailing lists (where I remain, only very moderately active, only in a few groups of very specific technical interest), and on forums (which I have almost entirely stopped following). I started a blog late, and my presence on social networks is next to nonexistent.

Each approach to a new form of interaction had well-defined origins and motivations: even my first accesses to the internet (well before the arrival of broadband, when one connected at 56k, if lucky, tying up the phone line) were motivated (I remember that I made my first connection, using one of those ‘try the internet for 15 days’ floppies, to look for a walkthrough for Myst, so we are talking about the early '90s).

Social networks, on the other hand, still escape my interest. Be it for my asocial (pardon: ‘selective’) nature, be it for my total estrangement (not devoid of a certain disgust) from that vacuous enthusiasm for the number of friends and from that almost obsessive-compulsive passion for the unbridled sharing of every aspect of one's own life as well as that of others, so amplified by social networks, be it for the faddishness intrinsic even in the choice of “environment” (yesterday everyone on MySpace, today everyone on Facebook, tomorrow everyone who knows where), I have always felt alien to these forms of virtual sociality.

Truly exceptional

I do not deny, however, that some aspects of social networking can be useful, in certain contexts or in particular forms.


In the workplace, for example, an individual's ‘social graph’, meaning the people they know (and their opinion of them) and the people who know them (and what those people think of them), often matters no less than the individual's own qualifications (especially when work is scarce in both quality and quantity).

With this in mind I signed up to LinkedIn, a work-centered social network whose users are defined by their profession, by the company they work for and those they have worked for, and by the education they have received. A bare-bones profile and a carefully curated selection of contacts make up my ‘participation’ in the social network.


The other aspect of social networks that can prove useful is centrality: not, however, embodied in a ‘single place’ for publishing content, but rather understood as a collection point for content published elsewhere.

Before social networks, the handiest tool for following content updates from the various blogs, photo sites and whatnot was the famous feeds; if one followed the same person's blog, their videos on YouTube, their photos on Flickr and so on, one subscribed to each of the feeds separately. Here, at least a centralization of the feeds of all the accounts scattered across the Internet would have come in handy.

This is exactly what FriendFeed bet on, and it was precisely this main feature, being an aggregator of content for which the choice of hosting platform remains in the users' hands, that attracted me.

A FriendFeed user's profile consists essentially of the list of feeds that the social network will take care of checking and grouping, to then spread them automatically to the followers, i.e. those who are ‘subscribed’ to the user's account. For those following people who are not on FriendFeed, this social network also allows the creation of ‘imaginary friends’: again, nothing but lists of external feeds gathered to represent a single virtual person.
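
The aggregation idea itself is simple enough to sketch. The following Python fragment is only an illustration of the concept, not FriendFeed's implementation (whose internals are not described here): a profile is modelled as a mapping from service name to the entries of the corresponding feed, and the aggregator merges them into a single stream, newest first. Fetching and parsing the feeds is left out, and the `Entry` type and the `aggregate` function are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    published: datetime  # when the item was published on its own platform
    source: str          # the service the item comes from
    title: str


def aggregate(feeds: Dict[str, List[Entry]]) -> List[Entry]:
    """Merge the entries of all feeds of a profile, newest first."""
    merged = [entry for entries in feeds.values() for entry in entries]
    return sorted(merged, key=lambda entry: entry.published, reverse=True)


# An 'imaginary friend' is just the same structure with somebody else's feeds.
profile = {
    "blog": [Entry(datetime(2011, 7, 1), "blog", "A new post")],
    "photos": [
        Entry(datetime(2011, 7, 3), "photos", "Holiday pictures"),
        Entry(datetime(2011, 6, 20), "photos", "An old picture"),
    ],
}
for entry in aggregate(profile):
    print(entry.published.date(), entry.source, entry.title)
```

The content stays where it was published; the aggregator only keeps track of where to look and in which order to present what it finds.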

The key point in all this is that FriendFeed does not control the content, but helps a lot in managing it. Then again, not all of its users see it this way, and quite a few use FriendFeed by writing messages and uploading photos directly to the social network.

FriendFeed's evolutionary history was as intense as it was brief, and ended with its acquisition by Facebook in less than two years, a move that, however favorable to FriendFeed's developers, essentially suspended indefinitely any hope of further development of its features.

Thus, for example, support for new services and new platforms is unlikely to be added to FriendFeed; privacy management, currently limited to the ability to make one's account private, is unlikely to be improved, for example by integrating it with the ability to group friends into lists; group management is unlikely to be improved, for example by adding the ability to publish certain services directly to specific groups.

While FriendFeed's approach to a social network remains, in my opinion, the most intelligent and correct one, it does not seem to convince those who instead see the traditional approach as the best way to exploit the ‘user resource’. Thus Google, in its umpteenth attempt at a social network, after the failure of Wave and Buzz, opts for the ‘à la Facebook’ strategy in Google+: and this is, in my opinion, the great mistake of this Internet giant that has had so many successes on other fronts.

The only novelty Google+ offers over Facebook is in fact the management of circles, and although it is surprising that it took so long for social networks to integrate the distinction between very different groups of contacts so well with the rest of the platform, this can never be the killer feature with which Google+ could become for Facebook what Facebook was for MySpace.

In my ideal world

{ How I would design a social network. }

Living without Windows(?)

Can it be done? It is a question that should come naturally to all those who continually find themselves facing viruses, malfunctions, unjustified slowdowns of the computer and so on.

Instead, sadly and with little surprise, one discovers that most people prefer to follow the propaganda that sees Windows as the system “for everyone”, setting it apart from the alternative chic of Apple and from the radical-communist (and little-known) Linux. They do so mainly out of ignorance, often also out of habit, and certainly also out of that form of conformist laziness that makes us prefer facing the problems faced by the majority to taking the extra steps needed to have far fewer, but different, ones.

If one does face the question, though, the answer (as natural as it is obvious) is “it depends”. It depends on the computer, on the kind of use one makes of it, and finally on the degree of interoperability required with others (and thus, ultimately, on the programs one intends to use).


From the hardware point of view, the problem, less and less frequent, is tied to the operating system's ability to make use of it. While manufacturers, for obvious market reasons, have always supplied installation discs with Windows drivers for their hardware, the situation with Linux is not always so rosy: it ranges from hardware with full support, which in some cases even surpasses in quality the support for Windows, to hardware for which one is lucky to manage to let the operating system know that the particular piece is present, passing through the whole possible range of variants.

The situation is actually less and less tragic, and by now it is quite rare to find hardware that is completely unsupported: at present, I believe fingerprint readers are roughly the only class of hardware that is almost totally unusable. More often it happens that a new model is not immediately supported when it comes out (I had a negative experience of this kind with the Wacom Bamboo Pen&Touch graphics tablet, which I now however use without problems), or that some advanced functions are not configurable with the immediacy of the "idiot-proof" interfaces often found in Windows (same Wacom example, for the multi-touch functions).

Obviously, the level of support for computer components and peripherals is closely tied to the manufacturer's willingness to cooperate with the Linux world. Four levels are typically found:

  1. manufacturers that actively contribute to the support with open source drivers and tools (examples: Intel, HP),
  2. manufacturers that actively contribute to the support with proprietary drivers and tools (examples: ATI, NVIDIA, Broadcom),
  3. manufacturers that provide the hardware specifications, thus making it possible to write open source drivers and tools, but do not actively contribute code of any kind (examples: ACECAD, Wacom),
  4. manufacturers whose hardware is supported only thanks to the patient reverse-engineering work of people with no ties to the manufacturer (examples: too many).

To be able to leave Windows behind it is therefore advisable to become a bit more careful in one's hardware choices, unless one enjoys tinkering. Luckily, it is increasingly hard to find things that do not work “out of the box”, and even harder to find things that cannot be made to work with a bit of patience and a quick search on the internet.


The most widespread use of computers (nowadays) is probably browsing the internet, closely followed by the use of an office suite, or at least of its word processor (read: Word from Microsoft Office). Then comes a bit of multimedia, perhaps listening to music and maybe watching a few movies, with savvier users interested in organizing their photos or videos (from the eleven-year-old daughter's dance recital to impure acts with one's partner).

The choice of application for each use is, again, mostly dictated by ignorance: there are not a few for whom “the blue e on the screen” is the internet (at best) or Facebook (the part of the internet they interface with 96% of the time, the remaining 4% being YouTube, which they perhaps reach from Facebook). Perhaps in this case it is not quite appropriate to speak of ‘choice’.

Skipping the usual matter of inertia (“this came preinstalled on the computer, so this is what I use”), another determining factor is interoperability, understood specifically as the need to exchange data with other people. While we have finally left the web monoculture behind, and very few sites are by now not properly usable without Internet Explorer, for text documents and spreadsheets we continue to depend heavily on the formats set by Microsoft's office suite, even though full access to these documents requires the suite itself¹.

This is somewhat paradoxical, because if one really aimed at interoperability one should turn to something more universally available, and thus to applications and formats that are not tied to a specific operating system. But again, inertia and the need for backward compatibility with years of monoculture (and the related legacy of documents in those formats) make the transition to more sensible solutions difficult.

What to use, and how

To ease the transition from Windows to another operating system, it is best to start using, already in Windows itself, the same applications one would find oneself using ‘on the other side’. Before diving headlong into the latest Ubuntu, for example, it is better to stay in the familiar environment (Windows) while abandoning our Internet Explorer (for those still using it), our Microsoft Office, etc, to get acquainted with equivalent programs that are also available on the other platforms. This often means turning to open source software, but not always.

So use, for example, a browser such as Opera, Firefox or Chrome to browse the internet. Use the aforementioned Opera, or Thunderbird, to manage your mail. Gimp may not be Photoshop, but it is a good starting point for photo retouching, and Inkscape easily outdoes Corel Draw, if not InDesign. As an office suite, LibreOffice (derived from the better-known OpenOffice.org) is a very valid alternative to Microsoft Office, barring particular cases. DigiKam is excellent for managing one's photos (though perhaps not trivial to install on Windows; an alternative could be Picasa, usable on Linux through Wine), VLC is something of a universal media player, and so on.

After all, applications are what one interfaces with most often, much more than with the underlying operating system, which is used mostly to launch the applications themselves and possibly for a minimum of management (copying files, printing).

Once you are familiar with the new programs, the transition to the new operating system will be much lighter, thanks also to the enormous efforts made in recent years (mainly under the push of Ubuntu) to make Linux more accessible to the average utonto [2].

But I need …

There are cases in which a specific program cannot be done without, either because no alternative is available on Linux, or because the existing alternatives are not good enough (for example, they do not read the documents being worked on correctly, or they lack essential features).

The best solution in such cases is virtualization, which on recent machines is a more efficient alternative to dual boot. With dual boot, both operating systems are kept on the same machine: you choose which one to use at each startup, and you have to reboot whenever the other is needed, even temporarily. Virtualization instead consists in assigning resources (memory, CPU, a chunk of disk) to a computer that is, precisely, virtual, emulated inside the real one.

This way, with Linux as the main operating system, one can ‘switch on’ the virtual machine, starting Windows in a window of its own that does not interfere with the rest of the computer except in the ways imposed by the virtualization itself.

Virtual successes

I have tried this setup personally and successfully, and it has come in handy on at least two occasions: when I needed Microsoft Office for the financial reporting of a project that had to follow a very specific template built in Excel, complete with macros and other features for which the alternative office suite of the time did not offer sufficient compatibility; and more recently to recover the address book and messages from my mobile phone, not quite dead but not quite working either.

But my greatest success in this sense has been a hardened engineer who has been using computers for work since the days when 64K was a luxury and he had to wait until night to use all 256 kilobytes of a machine normally partitioned for timesharing. We are talking about a man who moved to Lotus 1-2-3 when not another drop could be squeezed out of VisiCalc, then stuck with QuattroPro under DOS until the compatibility problems outweighed the benefits of habit, and who reformatted his new computer to put Excel 95 back on Windows XP so as to keep the changes from his previous machine to the bare minimum.

We are talking about a man who was persuaded to install Linux only after the third unrecoverable death of said Windows XP and my by then total and final (and rather pissed-off) refusal to offer him the slightest help with whatever problems might come up with his beloved configuration. (And frankly I could no longer stand being told, every time, how Windows crashed, how Excel refused to save, how this, how that.) A man who was persuaded only on condition that (1) I could find him something under Linux that could fully replace his only two big applications (Excel and AutoCAD) without losing any of the work done so far and (2) I would help him every time he had problems with the new operating system.

Having already helped other people migrate, the second point was no problem at all: those who know me know that I have never refused to help people with computer problems, but I have recently resolved to categorically refuse help to anyone having problems with Windows, with the specific aim of pointing out that the operating system in question is not at all more ‘friendly’ to the user.

The first point was a bigger problem: even the latest version of LibreOffice still has trouble with my father's extremely complicated Excel sheets, and no CAD available for Linux bears the slightest comparison with AutoCAD.

Virtualization was therefore the solution I proposed: a clean installation of Windows XP with only the programs in question, in a virtual machine managed by Linux; data saved in Linux, in a directory accessible from the virtual machine as a network drive; a backup copy of the virtual machine, to overwrite the one in use should problems develop.

Fortunately, hardware support for virtualization on his computer turned out to be sufficient for comfortable everyday use. Unfortunately, the engineer in question cannot find the patience to learn the Linux equivalents of the myriad of small utilities he used under Windows, so the clean installation gets dirty soon after each restore, and Linux is used almost exclusively for browsing the web (with Firefox).

The situation is nonetheless quite satisfactory, the only flaw being that he cannot use the complex Silverlight-based interface that the RAI websites offer for watching broadcasts (AnnoZero in particular). Silverlight, a Microsoft technology, has a partial Linux implementation in Moonlight, but apparently the plugin only allows the advertisements to be viewed, while the actual broadcast remains inaccessible.

Even leaving aside the fact that Microsoft itself is thinking of dropping .NET and the associated Silverlight for the next version of Windows, the choice made by RAI (or whoever on its behalf: who was contracted to build RAI's multimedia web platform? Telecom?) reeks from afar of the kind of choices that fostered, back in the '90s, the birth of that web monoculture whose damage I have already discussed.

Escaping the monocultures

We are now starting to see the possibility of emerging from the Windows monoculture, with Macs spreading on the trendy side (with the dangerous risk of a new monoculture developing to replace the existing one) and Linux on the other. One can hope that each system's share reaches levels that heal the ecosystem without risking new degenerations. Until then, there will always be a bit of a headwind to face, but nothing impossible.

Ubuntu has certainly helped a lot in this, as I have already said. Lately, however, the new releases have been significantly less attractive than the previous ones: between an ‘Apple-like’ drift in style and function and some choices that are too experimental for a platform that presents and promotes itself as ready for utonti, I strongly recommend holding firm on 10.04, waiting with a little patience for the ‘toy-like’ attitude with which Shuttleworth has lately been running Ubuntu to fade, or perhaps for a new, somewhat less ‘bold’ alternative to emerge.

Alternatives to Windows now exist, and they are sound. But above all, fortunately, everybody is noticing, which signals that the biggest obstacle to any progress has been overcome: the change in attitude, in the prevailing mindset.

  1. it is true that viewers (for Windows) are available that do not include the whole suite; it is also true that most of the formats, thanks to considerable reverse-engineering efforts, are by now mostly accessible in other suites as well; still, the problem of “full compatibility” remains, and although it is not guaranteed even across different versions of the MS suite, other suites can differ more noticeably in the most delicate points of formatting.

  2. this is not a typo, but the Italian term (a blend of utente, ‘user’, and tonto, ‘dim’) often used for users with little familiarity with computing tools.

Social APIs

While waiting to enable the interactive features of the wok (comments, freely editable pages, etc.), I have started working on a minor form of integration with social networks. I took the opportunity to extend the work already done for my UserJS that shows FriendFeed comments on any page, creating a version specifically targeted at the wok. Its current features are:

  • it works on every page of the wok (including my local copy), as well as on every permalink present in the page;
  • it looks up all the entries on FriendFeed and on Twitter that refer to the page/permalink in question;
  • it ignores my own entries, unless they in turn have comments or likes.
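The filtering rule in the last two points can be sketched as follows. This is only an illustration in Python, not the actual UserJS: the `Entry` type, its fields and the `visible_entries` function are hypothetical names, and the entries are assumed to have already been fetched from the respective services.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Entry:
    """Hypothetical, simplified view of a FriendFeed/Twitter entry."""
    author: str
    url: str          # the page or permalink the entry refers to
    comments: int = 0
    likes: int = 0


def visible_entries(entries: Iterable[Entry], page_url: str, me: str) -> List[Entry]:
    """Keep the entries that refer to page_url, dropping the owner's
    own entries unless they have gathered comments or likes."""
    return [
        e for e in entries
        if e.url == page_url
        and (e.author != me or e.comments > 0 or e.likes > 0)
    ]
```

The same function would be called once for the page itself and once for each permalink found in it.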

The next step I would have liked to take was to add references via Buzz as well, and here I hit the first serious snag: it is not possible to search for activities that refer to a precise address; the best approximation available is to search by title, but then the false positives obviously multiply on an industrial scale.

While this is certainly not what keeps Buzz from taking off as a social network, it is undoubtedly a limit that does not help it at all, especially when compared with the limits one stumbles upon when trying to use the FriendFeed [1] or Twitter [2] APIs.

The script is still in UserJS/GreaseMonkey format, downloadable from here while waiting to be integrated into the wok itself (repository). Testers welcome.

In the meantime I will also think of a smart way to integrate something else.

Update of 2011-02-08

Cribbing here and there (in particular from the social network links at the bottom of the articles on Metilparaben) I have managed to make a little headway, finding out which Buzz and FaceBook APIs can be used at least to obtain the number of comments/references/likes, if not their content. In the meantime I have also discovered that Twitter search only returns recent content, so after a while pages that I knew had references no longer show any.

Only FriendFeed remains my great champion of sociability. But then, I have always maintained that it was the social network “as it should have been done”.

  1. search on FriendFeed has not been working for quite some time, neither through the site nor through the API, and I am lucky to only need the API read-only and without authentication, because according to those who have to work with it seriously, it is in really bad shape.

  2. Twitter sanitizes the callback parameter for JSONP, which makes it very cumbersome to call a multi-parameter function while pre-assigning the values of the parameters that are not the data, something that is needed to load the data of each permalink while clearly telling their origins apart.
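One possible way around this kind of restriction, sketched here in Python purely to illustrate the idea (the real script is JavaScript and may well do something different), is to register one plainly-named callback per permalink and let a closure carry the extra parameter. The assumption that the sanitizer lets plain identifiers through is mine, and `CallbackRegistry`, `register` and `dispatch` are hypothetical names.

```python
import re
from typing import Any, Callable, Dict

# Assumption: only plain identifiers survive the callback sanitization.
SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


class CallbackRegistry:
    """Map simple callback names to handlers bound to a permalink."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Callable[[Any], None]] = {}

    def register(self, permalink: str, handler: Callable[[str, Any], None]) -> str:
        """Bind `permalink` to `handler` and return the generated name,
        which is what would be passed as the JSONP callback parameter."""
        name = f"cb{len(self._callbacks)}"
        assert SIMPLE_NAME.match(name)
        # The closure pre-assigns the permalink; only the data remains.
        self._callbacks[name] = lambda data: handler(permalink, data)
        return name

    def dispatch(self, name: str, data: Any) -> None:
        """Simulate the service invoking the named callback with its data."""
        self._callbacks[name](data)
```

Each response then reaches a handler that already knows which permalink it belongs to, regardless of the order in which responses arrive.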

Provveditorato agli Studi di Enna

The website of the Provveditorato agli Studi di Enna (the provincial education office of Enna) is one of those institutional sites that, as such, should be universally accessible: it should, in other words, be (easily) viewable from any browser, be it textual, aural or graphical, on Windows, Linux, Mac OS, Wii, mobile phone or whatever else.

Instead, child as it is of the web monoculture of the beginning of the millennium, it is horrible and dysfunctional. In particular, one of its most important functions (the presentation of the “latest news”) is not only aesthetically offensive, but on top of that only works in Internet Explorer. In addition, the side menu (needlessly) requires the very heavy Java and is, again, dysfunctional: the links to the respective contents work (again) only in Internet Explorer.

To see whether these limits could be remedied, I had to take a look at the code that makes up the page: nauseatingly offensive to any web developer, it is evidently the typical child of the “copy-paste it quick and dirty, as long as it works in IE6” culture fed by the aforementioned, and fortunately now bygone, web monoculture.

Luckily, I was also able to fix at least the biggest of the page's problems: a script that can be used both with Opera and with the GreaseMonkey extension for Firefox, and that finally makes the news visible.

The menu suffers from dysfunction too, both because it needs Java and because the links to its entries are specified in Windows form and are therefore unusable outside the monoculture. The script remedies this as well, replacing the Java menu with a simple HTML menu with suitable CSS styling and correcting the target addresses.
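As an illustration of what correcting the target addresses might involve, here is a minimal Python sketch. It assumes that the ‘Windows form’ means backslash-separated paths with unescaped spaces, which is a guess on my part; the `fix_windows_href` name and the sample path are hypothetical, and the real script (a UserJS) may do something different.

```python
from urllib.parse import quote


def fix_windows_href(href: str) -> str:
    """Turn a Windows-style relative path into a usable URL path.

    Backslashes become forward slashes and characters such as spaces
    are percent-encoded; "%" is left alone so that addresses that are
    already encoded are not encoded twice.
    """
    return quote(href.replace("\\", "/"), safe="/:#?=&%")
```

For example, `fix_windows_href("news\\2010\\circolare 12.htm")` gives `news/2010/circolare%2012.htm`, while an address that is already well formed comes back unchanged.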

This is the best that can be done for now. In particular, I would have liked to replace the menu before Java gets loaded, but unfortunately that is not possible with a GreaseMonkey script (I could have done it had I limited myself to Opera's UserJS).

(Development of the script can be followed from its git repository)

Monoculture on the web

The end of the first browser war towards the close of the last millennium led to a solid monoculture: “everybody” had Internet Explorer 6 (IE6) on Windows (generally without even knowing exactly what it was, other than vaguely “the internet” for the more sophisticated).

Microsoft's apparent, or at least temporary, victory quickly became a problem not only for the few aliens who did not use IE [1], but also for all those homogenized into the dominant monoculture, thanks to the gigantic security holes offered by Microsoft's browser: the insecurity of Windows, whose main attack vectors were precisely the browser and the associated email program [2], is a burden that Microsoft, and above all its users, still have to reckon with.

But the legacy of the early-millennium monoculture does not end there. The possibility of “easily” creating web pages, if not whole sites, with ill-suited tools (such as Microsoft's own office suite) and without much technical knowledge (the ability to use a search engine and copy-paste code often being enough) led to the spread of web pages of very poor technical quality, and above all barely usable on browsers other than the dominant one, generally only for aesthetic reasons, but too often for functional ones as well. (Why bother, after all, for that measly 10% of outsiders?)

Microsoft's scant interest in the web as a platform meant that the monoculture it dominated stalled its development, especially in terms of interactivity: for a good five years (a very long time in computing), the potential offered by ever more widespread broadband stayed outside the web's own language (HTML), remaining the almost uncontested domain of “supplementary” technologies: first Java, then Flash, available more or less to everybody, and the notorious (and dangerous) ActiveX controls specific to IE6.

Meanwhile the few champions of that measly 10% of outsiders, rather than giving up and throwing in the towel, worked first to cover the short distance that separated them from the features offered by Microsoft's notorious browser, and then to add new ones. From the ashes of the last millennium's losers a solid and ever more capable web was born, soon leaving web developers facing a choice: create content following the new standards, with an ever more promising future, or limit themselves to the possibilities offered by the browser that was currently dominant but whose position was ever more uncertain?

With the start of the transition to the new, the stagnant monoculture began to show itself ever more plainly as the heavy ball and chain it had really always been: technically inferior and limiting, a burden for developers, and a danger for users.

Creating universal web pages increasingly revealed its absurd nature: writing code once for “everybody else” and then resorting to painful and complex tricks for something that belonged to the last century not only chronologically. Even Microsoft itself, when the growing success of the alternatives [3] finally forced it to resume development of Internet Explorer, found its main obstacle to be precisely the users who, having relied back then on programs developed as ActiveX controls for IE6, found themselves unable to update the browser without losing the use of those programs, often needed for work.

And while reluctance to upgrade is the most serious problem the party responsible for the monoculture has to fight, the legacy of the “lazy thinking” that came with it is paid for by those users who still find themselves battling sites that, by their very nature, should always have been universally accessible, but that unfortunately, and seriously, are not.


  1. either by choice, or because they were prevented by the use of platforms other than Windows, such as Linux or Mac OS

  2. Outlook Express

  3. mainly Firefox, soon followed by Safari and more recently by Chrome (for some reason, the share of Opera users has never changed much, despite its technical superiority)

Why a Wok?

One starts a blog by giving the medium its etymological meaning: web log, a diary on the net. One discovers other uses, from emotional outpouring to a commonplace book of reflections, from philosophy to literary criticism, from historical, political and sociological analyses to fiction.

One generally stops keeping a blog gradually, and for a variety of reasons. But above all, in one form or another, for lack of time: either because one has so many things to do that there is no time left to write about them; or because one does not have many things to do, and so devotes oneself to looking for them (and in any case it is not as if the urge to write abounds then: what would one write about, anyway?); or because one simply loses interest in keeping this window open on the world, and prefers a simpler and more immediate way to communicate, a Twitter or a Tumblr or an abused FriendFeed, but above all FaceBook; a way, above all, that does not make us feel, with every post, the pressure that can come from the idea of having readers, from the need, conscious or unconscious, to offer them something of quality.

In my case, besides perhaps a mix of the above, it was largely a sense of the narrowness of the format under consideration. And although the limits of my previous blog's platform were a strong incentive to look for alternatives, none of those I looked at (from Splinder to LiveJournal, by way of the now ubiquitous WordPress) seemed to be “what I was looking for”.

What I was looking for

Many of my requirements for my blog's replacement are rooted in my very geeky nature as a mathematician and, above all, a programmer.

For example, the need to be able to work on the contents with my usual writing tools: Vim in a terminal, a restful black screen with white text, with no frills (other than the occasional nicety such as syntax highlighting) and no distractions.

For example, the ability to publish simply and immediately, without even having to bother opening a browser.

For example, the ability to keep track of the history of the contents: every stage of editing, every revision.

Finally, on the more outward-facing side, something that offered more than the blog's classic writer-reader interaction. More than once in my past life as a blogger I found myself with things on my hands that called for richer interaction, more in-depth exchanges, proposals or requests that could more easily be met by “the others”.

The wok was born from these needs.

The Wok

I steal the term from the name of the traditional pan of Chinese origin for at least two good reasons. The first, phonetic in nature, is the resemblance between the word “wok” and a suitable blend of “wiki” and “blog”. The second, functional in nature, is tied to the great flexibility of the wok in the kitchen, where it can be used for kinds of cooking that range from boiling to frying by way of browning and steaming. As a content management platform, a wok can be expected to offer the same flexibility, and hence the ability to host (all) the (main) forms of (textual) expression on the web:

  • the (locally) individual outpouring, perhaps even “lyrical”, for which contributions from second parties, even when welcome, remain external to the main body; that is, the contents that substantially characterize a blog;
  • the debate among several parties, once the domain of mailing lists and newsgroups and now dominated by forums, where these survive;
  • the contents that are born, grow and get refined thanks to the collaboration of several participants, whose natural habitat is a Wiki;
  • plain, classic, static “1.0” web pages.

The technical foundation of this wok is ikiwiki, a wiki compiler with various possible uses (including forums and blogs), whose contents are plain files and which relies on existing revision control systems (including my favourite, git) to preserve their history. It is not hard to see why I chose it, even though customization and breaking-in (which will be documented here) are needed before this foundation can reach the true nature of the wok in the form and manner that suit me best.

Current technical limits

The official ikiwiki distribution lacks the following capabilities for the wok to be technically satisfactory to me:

  • multiple categorizations: by default ikiwiki only supports tags, so any additional categorization (columns, etc.) has to be implemented with external plugins; things this could be useful for:
    • better management of collections; the new trail system helps with managing collections, but namespaces for tags would still help to mark chapters in a more ‘discreet’ way;
    • draft and wip should be categories distinct from tags;
  • an index of the current page in the sidebar (this can be solved with a bit of javascript, as I did for the main page of the wok, but a solution without javascript would obviously be preferable);
  • multiple functional bars (left, right, top);
  • nested comments;
  • a way to specify which pages have a stylesheet (e.g. demauro.css for pages tagged demauro, but also for all the pages that include them!);
  • specifying which language each article is in;
  • the ability to specify, for a set of pages, that links should be resolved by looking in specific subdirectories (for example, Postapocalittica should look in its own glossary); this feature is implemented as linkbase, but not yet integrated into official ikiwiki.

Other problems encountered:

  • footnotes do not work correctly in pages that collect several documents, because MultiMarkdown uses the same anchor for footnotes that have the same number but belong to different documents. This is easily worked around: MultiMarkdown uses the reference code given by the user as the anchor, so it is enough to use unique reference codes across documents: not perfect, but functional;
  • autogenerated pages get added to the revision control system, in the main branch. This problem has been solved by the transient mechanism introduced in recent versions of ikiwiki.
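The footnote workaround can also be applied mechanically. The following Python sketch, which is not part of the wok and only illustrates the idea, prefixes every footnote reference code in a document with an identifier of the document itself; the `uniquify_footnotes` name and the choice of prefix are hypothetical, and the `[^label]` syntax is the one MultiMarkdown uses for footnotes.

```python
import re

# Matches a footnote label both where it is referenced ("[^1]")
# and where it is defined ("[^1]: text").
FOOTNOTE_LABEL = re.compile(r"\[\^([^\]\s]+)\]")


def uniquify_footnotes(text: str, doc_id: str) -> str:
    """Prefix every footnote label in `text` with `doc_id`, so that
    labels from different documents no longer share an anchor."""
    return FOOTNOTE_LABEL.sub(lambda m: f"[^{doc_id}-{m.group(1)}]", text)
```

Note that the function is not idempotent: running it twice on the same text adds the prefix twice, so it should be applied once, to the source of each document.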

MultiMarkdown and ikiwiki

Those who use MultiMarkdown with ikiwiki normally rely on the Perl version (2.x). In its original form, this has some problems:

  • it does not support HTML5 correctly, with problems that typically show up in the HTML produced by nested inlines (page A includes page B, which includes page C; as a result, the markup of page C inside page A is full of misplaced p tags);
  • it does not correctly support footnotes with multiple references in the text: in that situation the notes get duplicated, always with the same identifier (but different numbers).

Both of these problems are solved in the MultiMarkdown fork maintained by yours truly.

Netted invisible signifiers

A human being reading the calendar of the Letture di San Nicolò l'Arena would have no great difficulty in recognizing it as such. Until a few days ago, a program ‘reading’ that same page would instead have been unable to extract its essential data (namely the dates and topics of the events).

In these terms, what sets the machine apart from the human is not so much a different balance between quality and quantity of information as the different form: the human mind is more at ease with verbal communication (spoken or written) composed in a natural language, which is instead notoriously hard to process automatically (not to mention visual information).

I dwell on the form rather than the quality of the information because a qualitative assessment of informal communication can only be contextual, and is intrinsically subjective (but are there any qualitative assessments that are not?). For example: does the (potential, and sometimes deliberate) ambiguity of natural language increase or decrease the quality of the information communicated?

A vision of the Net (and here we are talking about something that would surely interest the Sposonovello, and perhaps Tommy David too, but certainly not, say, Yanez) as a universal medium for data, information and knowledge (in the words of its founding father Tim Berners-Lee) must therefore come to terms with the fact that human and automatic consumption have quite distinct requirements; and for a long time (and for obvious reasons) human consumption has had high priority, making life hard for, say, those search engines (automatic mechanisms for collecting and processing (indexing) information) that human beings themselves rely on to find the information they would like to consume.

If human beings have to go through computers to find information written by other human beings, and computers are not (easily) able to ‘understand’ that same information, there is clearly a problem. It is just as clear that, while waiting for the technological singularity to bring about an artificial intelligence (one that hopefully does not degenerate into Skynet) capable of interpreting humanly consumable forms of information on its own, those who produce the information (that is, human beings themselves) need to present it in a form that machines can consume. But if the end user is always another human being, it is clear that the humanly hostile forms offered by certain proposals for building the ‘semantic’ Net are no less problematic than the current ones.

A promising solution in this sense is to hide the information for machines inside the mountain of meta-information already present (for other reasons) in pages that offer content for human beings: thus microformats are born, which allow that same calendar to be consumed by the machine.
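To give an idea of what ‘consumed by the machine’ means in practice, here is a minimal Python sketch that pulls dates and topics out of a page marked up with the hCalendar microformat. It is only an illustration under stated assumptions: each event is an element with class vevent, the topic is in an element with class summary, and the date is in the title attribute of an element with class dtstart. The `HCalendarParser` and `extract_events` names are hypothetical, and the actual markup of the calendar page may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCalendarParser(HTMLParser):
    """Collect dtstart and summary of each hCalendar vevent in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Dict[str, str]] = []
        self._event: Optional[Dict[str, str]] = None
        self._depth = 0                    # nesting depth inside the vevent
        self._field: Optional[str] = None  # property whose text is being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self._event is None:
            if "vevent" in classes:
                self._event = {}
                self._depth = 1
            return
        self._depth += 1
        if "dtstart" in classes and attr.get("title"):
            # machine-readable date stored in the title attribute
            self._event["dtstart"] = attr["title"]
        elif "summary" in classes:
            self._field = "summary"
            self._event["summary"] = ""

    def handle_data(self, data):
        if self._event is not None and self._field:
            self._event[self._field] += data

    def handle_endtag(self, tag):
        if self._event is None or tag in VOID_TAGS:
            return
        self._field = None  # simplification: stop reading at the first closing tag
        self._depth -= 1
        if self._depth == 0:
            self.events.append(self._event)
            self._event = None


def extract_events(html: str) -> List[Dict[str, str]]:
    """Return one {"dtstart": ..., "summary": ...} dict per vevent found."""
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

The human reader still sees an ordinary calendar, while the program only looks at class names and attributes that were in the markup anyway.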

And now I am itching to microformat my blog, but the only notable thing I have managed to do is add XFN tags to the blogroll. (I defer a dissertation on their usefulness to another occasion.)