[How much I have worked on this column]

Suicide by LLM

Of all the stupidest hills to die on …

I'm not Cory Doctorow

This means, among other things, that I do not get paid to talk in public, or to write books, and it means that I do not have a compulsion to publish a blog post every day (and it shows).

However, it also means that I won't be committing “suicide by LLM”, trying to defend the indefensible for the stupidest possible reason ever: spellchecking.

I'll admit it: my posts have typos. Sometimes, when I go over one of my previous posts, I spot all the typing errors, misspelt words, doubled words, missed capitalizations and all the other writing issues that abound in everybody's long-form writing undertakings: the reason why serious publishers have editors and reviewers.

For us bloggers, however, these are issues that for the most part are caused by us forgetting to enable spellchecking in our text editor ( in my case). And sure, if you're Jerrold Zar you can come up with a funny and poetic way to remark on the limits of these tools, but those limits are there regardless of the specific shape and form of the tool, of its energy requirements in both the preparation and deployment phases, and of the ethical underpinnings of its development and use.

Trying to justify the use of an LLM for this particular task, something that word processors (and standalone programs) have been able to do for decades, with quality steadily improving over time and without any of the ethical issues surrounding the tools, really takes a very special type of mind, probably the same that fell for the questionable idea that it's possible to “install our own fire exits” on a billionaire's network.

Cory Doctorow is far from the only one falling for these ruses¹, but with the news of Google and Microsoft paying influencers to promote adoption of their “AI” tools, it's inevitable to question the motives behind such a choice. Doubling down on his defense against criticism sounds particularly hollow from someone who was so keen on clarifying the true nature of the Luddite movement. News flash: running the model(s) locally has little to no bearing on the ethical issues concerning their origins, motives, and uses.

Adafruit: the poisoned apple

Cory Doctorow is not the only significant figure in tech who seems to have decided to commit suicide by LLM. Shortly before his manifesto on the use of LLMs as spellcheckers, Adafruit Industries revealed they had started using them to develop their board designs.

This revelation suffered such a massive pushback on the Fediverse that it was inevitable to compare it to when Raspberry Pi decided to commit suicide by cop (for once, not in the usual sense of intentionally misclassifying murder by law enforcement, but an actual media suicide of the enterprise's reputation).

And as usually happens in these cases (remember when Framework Computer decided to commit suicide by Nazi, pardon, by big tent?), instead of coming out with an apology, they doubled down with one of the most inane reversals ever seen on the Internet, considering that basically the entire reason for existence of the LLMs used by Adafruit is specifically to accelerate the extraction of value (stealing) from the disenfranchised in favour of the rentier class.

Generative Large Brainworm

I have long ceased to idolise prominent figures in tech, so seeing them fall to the LLM/genAI brainworm doesn't really make me question my beliefs; however, for sure it does not spark joy to see them drift away like this.

And it is a brainworm; or rather, more appropriately, a mindworm, because it has no physical body (although it's quite possible that its influence does affect brain biochemistry —I haven't looked into that yet). It is a mindworm because it's parasitic in nature, infectious, and damaging to the host. It latches onto the same mechanisms as our pareidolia, pushed through by the filthiest of the psychological manipulations that marketing has been developing for decades, and leads to a general loss of cognitive functions, a decline that goes largely unnoticed by those who get trapped in it, but is clear as day to anyone surrounding them.

I'm guessing I'll stack that with all the other reasons why this grift is so damaging for the Internet: the indiscriminate scraping I've finally had to defend my server from, the pollution of the commons that will make older and genuinely man-made web pages as precious as low-background steel, the massive jacking up of memory prices, the premature storage shortage, and all the other general attacks on personal computing.

And I'm well aware of how much of a First World problem such complaints may be, but after all, let's be honest here: the root lies in the same extractive, colonial mindset that has dominated the last few centuries, keeping under its heel people with much worse problems.


  1. speaking of Bluesky and “fire exits”, for example, I've recently come across a thread on the Fediverse from someone who has been spending time and resources to build a (somewhat successful so far) alternative BS instance, and who was now complaining about the dominant group undermining said effort —and honestly, what did he even expect? But that deserves its own post. ↩

Shields up, part 2

More aggressive defense against the LLM scraper flood.

We're going to need bigger shields

As I mentioned last time, my biggest issue with the wave of scrapers isn't even with this particular website of mine, but with my gitweb instance. This is in large part due to the fact that while the Wok is static (and thus relatively cheap to host even if scraped aggressively), the gitweb instance isn't, as some of its links allow it to serve some pretty hefty binary blobs. And the setup I mentioned in my previous post on the topic wasn't cutting it anymore. It was time to go for something much stricter.

The solution I've decided to go with this time has been to limit most gitweb commands to only serve the correct data if an appropriate referrer is given, i.e. if the request originated from a page for which it made sense. To wit, a request to e.g. download the full archive of one of the repositories I host should only be triggered by following a link from the corresponding project page: anything else will be assumed to be spurious, and will instead return a 401 Unauthorized error.
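For the curious, here's a minimal sketch of the logic, in Python rather than in my actual server configuration; the gated command list and the hostname are placeholders, not my real setup. Gitweb encodes its commands in the query string (e.g. ?p=repo.git;a=snapshot), which makes the check easy to express:

    from urllib.parse import urlparse

    # Hypothetical list of "expensive" gitweb commands to gate.
    GATED = {"snapshot", "blob_plain", "tree"}

    def parse_gitweb_query(query: str) -> dict:
        # gitweb separates parameters with ';' rather than '&'
        return dict(p.split("=", 1) for p in query.split(";") if "=" in p)

    def allowed(query: str, referer: str | None) -> bool:
        params = parse_gitweb_query(query)
        if params.get("a") not in GATED:
            return True    # cheap command: serve unconditionally
        if not referer:
            return False   # gated command, no referrer: 401 (and ban)
        ref = urlparse(referer)
        if ref.hostname != "git.example.com":
            return False   # referrer is not one of our own pages
        # the referring page must concern the same repository
        return parse_gitweb_query(ref.query).get("p") == params.get("p")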

As usual, the presence of this error will then lead to a 7-day ban for that IP. The system has been up for approximately 3 hours at the time of writing, and it has already caught over 9000 offenders (in fact, nearly 13,000). I'm guessing the next step will be to collect some subnet information about this huge list of IPs and proceed to ban entire subnets.
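Grouping the offenders by /24 is a crude but serviceable first approximation of “subnet” (proper WHOIS data would be better); a sketch, with an arbitrary threshold:

    import ipaddress
    from collections import Counter

    def candidate_subnets(banned_ips, threshold=8):
        """Return the /24 networks containing at least `threshold`
        distinct banned IPv4 addresses, as candidates for a full ban."""
        counts = Counter(
            ipaddress.ip_network(ip).supernet(new_prefix=24)
            for ip in banned_ips
            if ipaddress.ip_address(ip).version == 4
        )
        return sorted(net for net, hits in counts.items() if hits >= threshold)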

This is a bit dangerous

To be honest, this is as successful as it is dangerous. Browsers themselves often don't send the correct referrer information “for privacy”, so even a human genuinely ending up on my gitweb with an overprotective browser will end up earning a 7-day ban. The only thing I can say is: sorry, the bot flood has made your access pattern largely indistinguishable from that of any of these bots, so you'll have to rethink your browsing habits.

I wonder if I can count on the presence of the Do Not Track header in this case? Does anybody know if bots send it?

I'm also wondering if the 401 error page I send should also use the same trick as the neverlink tarpit I mentioned in the previous post, and “bleed out” the response very slowly.
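The “bleed out” part itself is easy enough to sketch (numbers picked out of thin air; whether it's worth keeping my own connections open that long is precisely what I have to think about):

    import time

    def bleed(body: bytes, chunk_size: int = 16, delay: float = 5.0):
        """Yield the 401 page a few bytes at a time, stalling between
        chunks, so a misbehaving client ties up its own connection for
        minutes just to download a few hundred bytes."""
        for i in range(0, len(body), chunk_size):
            yield body[i:i + chunk_size]
            time.sleep(delay)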

I'll have to think about it.

12 days of XSLT

“12 days of Christmas” using XSLT

Deborah Pickett on the Fediverse threw down a Christmas coding challenge: code the famous Twelve days of Christmas rhyme in any (programming) language.

She even answered it herself with XSLT, a solution that I love because it's fast, compact, and gives visibility to the much-maligned language I've already abundantly discussed here.

So obviously I've asked myself: would I do it differently if I were to use the same language? And the answer is (obviously, as otherwise I wouldn't be writing this) yes.

I would mainly do two things differently: I would separate the list of items from the transform, and I wouldn't include the cardinals and ordinals in the list of items, but let the transform compute them automatically.

This has some clear downsides compared to @futzle's solution: it requires multiple files (compared to her single one), and the XSLT itself will be considerably more complex, since at the very least it'll have to include the logic for writing the numbers as words, in both ordinal and cardinal form.
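To give an idea of that logic (sketched here in Python rather than XSLT, and only for the twelve numbers the rhyme actually needs; note the hardcoded “a”, which also causes the indefinite-article bug I mention below):

    CARDINALS = ["one", "two", "three", "four", "five", "six",
                 "seven", "eight", "nine", "ten", "eleven", "twelve"]
    ORDINALS = ["first", "second", "third", "fourth", "fifth", "sixth",
                "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth"]

    def day_line(day: int, items: list[str]) -> str:
        """Build one verse from a bare list of items (items[0] being the
        partridge), computing cardinals and ordinals on the fly."""
        gifts = [f"{CARDINALS[n - 1]} {items[n - 1]}" for n in range(day, 1, -1)]
        gifts.append(("and " if day > 1 else "") + "a " + items[0])
        return (f"On the {ORDINALS[day - 1]} day of Christmas "
                "my true love gave to me " + ", ".join(gifts))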

On the other hand, we gain something: the possibility to customize the list without touching the transformation code (separation of concerns), and potentially even support for multiple (human) languages, which I haven't implemented yet.

You can find my version of the transform here, and you can see it in action here. This is presently a rather quick & dirty implementation, and will produce the incorrect indefinite article for words starting with a vowel sound (or, to be more specific, needing “an” instead of “a”), but it shows it can be done, and that the transform can be applied to different lists of presents, such as this one taken from RFC 1882 without the commentary (which, on the one hand, highlights another shortcoming of the current transform, but on the other, helps minimize the sexism of the original text, since it is left unspecified why the nine lady executives are a problem for the tech support person —and that might just as well be because said tech support person happens to be a sexist asshat).
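Incidentally, for anyone wanting to play along outside the browser, applying the same transform to different lists is a one-liner with any XSLT processor; e.g. with Python's lxml (file names are placeholders, not the actual ones in my repository):

    from lxml import etree

    transform = etree.XSLT(etree.parse("twelve-days.xsl"))
    for items in ("presents.xml", "rfc1882-presents.xml"):
        print(str(transform(etree.parse(items))))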

The problem is, now I feel nerd-sniped, since there's a number of features I want to add support for in the XSLT.

The aforementioned multiple language support, first and foremost. A way to specify the cardinal indicator to use (which may be “an” vs “a” in English, or gendered forms in languages like Italian or French) would also be useful to allow the correct form to be used. And ideally, even, a way to indicate which part of the “present” needs to be made plural (and how), which would allow reordering the list and still getting a grammatically correct output. (Evolving per-present commentary would also be possible, although at the very least I'd want to play around with the idea using a better example than the aforementioned RFC.)

But for the time being, this will suffice.

A tale of two Webs

Webs of documents, webs of apps, and conflicts of interests and designs.

This is the collected form of a thread I initially brainstormed on the Fediverse, with the introduction taken mostly from this other comment thread of mine.

Introduction

There has been a lot of noise recently concerning Mozilla's choices about where to take their Firefox browser, especially in view of their decision to go all-in on “AI” even against largely negative feedback from the community. While some have defended, or at least found a justification for, this choice based on “market” considerations («people want AI»), many (myself included) remain unconvinced, for several reasons.

I have listed a few, ranging from Firefox's low market share making it hard to digest any claims about the credibility of an “educational” intent for the adoption (particularly since they fired their advocacy division), to their insistence on holding onto the idea even when this alienated their entire volunteer Japanese localization team —something that a “public benefit” corporation that claims to be consistently low on resources should be extremely wary of doing.

In general, Mozilla has shown itself to be extremely unreceptive to feedback from its community, possibly unaware of the fact that the select few who still hold on to Firefox as their primary browser are largely people who care about the open web first and foremost, and feel deeply betrayed by the last decade-plus of decisions made at Mozilla that do little more than pay lip service to the principles of the open web, while in practice assisting in dismantling it. And it's impossible to know, without insider knowledge, whether their management is simply obtusely incapable of understanding their precarious position —under threat of seeing their advertisement funding cut off by Google if they don't comply, or of just seeing ad revenue dry up— or simply trusting that people have nowhere else to go. What are disgruntled Firefox users going to switch to, after all? Google Chrome? Ah, please.

As @rysiek​@mstdn.social puts it,

Mozilla is the browser vendor equivalent of centrist political parties.

or rather, as I would put it, the “center-left” parties, whose “moderate” positions simply help shift the Overton window to the right, normalizing positions that are antithetical to their own existence, in a futile effort to pursue an (electoral, or user) base that would never vote for them anyway.

And not only do recent political events, such as Zohran Mamdani's election as mayor of New York City, show how successful even just a moderate push to the left can be: to make things even worse for Mozilla, switching to minority browsers is actually much easier and more effective than voting for minority parties in political elections.

The net result of these hard-headed decisions at Mozilla has been that long-time Firefox users have been looking harder at alternatives.

@mcc​@mastodon.social, like many others, has been promoting Servo, the experimental browser engine initially developed by Mozilla, then discarded by firing the entire team working on it, and now reborn as an independent project.

Supporting Servo is an excellent idea, if only because its survival increases the number of independent browser engines in active development. But Servo is not yet viable as a primary choice: performance is abysmal, standards support is low, and the user interface is featureless and unstable. People wanting to start looking at options now will have to look elsewhere (this is, of course, orthogonal to any support one may give to the Servo project). The only currently viable alternatives to Firefox that aren't just skins around the Blink or WebKit rendering engines are Firefox forks (Fireforks?).

As I have already mentioned, there are mainly three active Firefox forks that may fit the needs of users abandoning Firefox out of concern for the direction its development is taking: LibreWolf, WaterFox and Pale Moon. And of these, the first two (LibreWolf and WaterFox) are basically just “Firefox without the most egregious privacy-invasive misfeatures”, which leaves the question open about their viability if Firefox goes under —in fact, the LibreWolf developers have clearly stated that they won't be able to hard-fork and go their own way. Pale Moon, with its own Goanna rendering engine, is thus the only currently viable alternative that has shown it can exist and grow independently from Firefox.

And so I've been pondering: what is really needed of a web browser? We know that implementing one from scratch is a titanic challenge. But is that also true of maintaining an existing one?

To answer that, we should first stop and consider what a browser is, and what the World Wide Web is. And the interesting part is that we're currently in a process of “speciation”, if I may borrow a term from evolutionary biology.

A tale of two Webs

The World Wide Web was born with the intent to achieve an interconnected web of documents (or, more in general, resources that may include things like images or other multimedia elements): and this is not only what it was in the beginning, but also what most of the open, independent web still is, even when it's more dynamically generated (wikis, blogs).

What we've seen under the moniker of “Web 2.0” in the last 20+ years, but especially in the last decade, has been the development of a different interpretation of the Web.

Major corporations saw in the “Web 2.0” the opportunity to leverage this communication channel as a means to deliver services to the users, or, a rose by any other name, as a way to write cross-platform application front-ends.

This isn't exactly news to anyone who has been using the web for more than a decade, but I think it's quite important to stress it again: the modern web features both kinds of websites: document repositories, and application front-ends (“web apps”).

And web browsers are used to access both kinds of websites, but —and this is extremely important— the two kinds of websites have very different requirements. For example, the V8 engine that powers Chrome was specifically designed to improve the quality of service of web apps, and while the “web of documents” can at times benefit from such improvements, it doesn't have particular needs in this regard, except maybe to compensate for the deficiencies of other components.

A lot of the development effort (both creative and destructive) in web browsers over the last decade-plus has gone into fostering the “web app” vision of the web, to the detriment of the “web of documents” vision. From the removal of native support for RSS and Atom feeds to the introduction of JavaScript APIs like WebUSB or the Web Environment Integrity attempt I already discussed in the past, nearly all work done on browsers has been in this direction.

This difference isn't just a matter of feature sets; in fact, it's primarily a matter of design principles.

A browser for the “web of documents” is a User Agent: it's a tool in the hands of users, designed to maximize the usability of said documents.

A browser for the “web of apps” is a Corporate Agent: it's a tool in the hands of corporations, designed to maximize the control they have over the user's machine.

One can obviously see how this is reflected in the development of Chrome, with the removal of features that are unnecessary or, even worse, detrimental to corporate interests (the most famous recent such change being the introduction of the so-called Manifest V3 for WebExtensions to kill ad blockers), but you can also see it in Firefox development, where “listening to the community” means doubling down on shoving unwanted “AI” (aka genAI/LLM) features everywhere and dropping support for XSLT.

Under this analysis, browsers like Vivaldi are in a very precarious situation: on the one hand, the browser is being developed under what is arguably a “web of documents” mindset, and more in general as a “Swiss Army knife of the Internet” (similarly to the classic Opera browser, as I've already written about at length). On the other hand, its reliance on the Google-controlled Blink engine, which is designed for the “web of apps”, cripples it in its efforts: while they've been able to reintroduce native RSS and Atom support, Vivaldi doesn't support JPEG XL because Blink has removed support for it (although things seem to be changing on that front), and it will have no choice but to drop XSLT support following Chrome's timeline, unless its developers finally decide to put in the development effort themselves. The same holds for any other browser that depends on Blink, WebKit and soon even Gecko. This will make all of them less User Agents and more Corporate Agents infiltrated into our machines.

So, it's time to realize that the “web of documents” and “web of apps” are two completely different beasts, only incidentally related to each other, and that it might not even make sense to waste efforts in developing tools that support both equally well. This means, in particular, that we may have to make peace with the fact that one browser might not be enough: we will need two of them.

For me, this is already the case, by the way: although Firefox is my primary browser, I still have to resort to Chromium from time to time, either because some websites simply refuse to work correctly in Firefox, or because it's the only way to ensure a solid “separation of concerns” (unsurprisingly, what I use Chromium for is the more corporate-y stuff.) And even without asking, I'm sure I'm not the only one. (From the poll I'm running on the Fediverse, over 40% of the respondents at the time of this writing use a separate browser for some corporate site, and less than 30% use the same browser for everything without any separation.)

In this sense, the question «are LibreWolf/WaterFox viable if Firefox goes down» becomes less important: Pale Moon has shown that a viable alternative at least for the “web of documents” is in fact possible (and it exists already). So the question would rather be «will we have a viable alternative to corporate browsers (Chrome, Safari) for the “web of apps”?»

I think the answer is yes: even if LibreWolf and WaterFox wouldn't be able to survive or keep up to date with evolving standards without their reliance on Firefox, we will likely still have Blink- and WebKit-based browsers around to work as “slightly less shitty corporate agents” to browse the “web of apps” —at least as long as Blink and/or WebKit remain open source.

If anything, the biggest issue would be that, since the “web of documents” and the “web of apps” use the same protocol, as the two diverge it will become harder to switch from one to the other during a browsing session.

In this sense, the Gemini protocol folks have arguably had the right idea: “we'll make our own web, with simple text and image formats”. This, to steal @witchescauldron​@kolektiva.social's expression, pushes the “web of documents” away from the “web of apps”, and solves the issue by making it apparent that “the two webs” are completely different from each other.

But as I've already mentioned elsewhere, the Gemini protocol approach, in my opinion, throws away the baby with the bathwater. Many of the web's formats and technologies are actually extremely useful even for the “web of documents”: the problem isn't with the formats and protocols themselves, the problem is that browsers have been catering exclusively to the “web of apps”, neglecting (when not outright obstructing) the “web of documents”. We can keep that tech and still have the “web of documents”.

The good old classic Opera/Presto had an interesting approach here: since it couldn't guarantee, despite all efforts, perfect compatibility with websites that weren't designed around web standards but “for specific browsers”, it provided a menu option to open the current page in a different browser. I don't think I've seen such a feature in any other browser (apparently there were extensions for it), but I think it's actually the simplest solution to the diverging paths of the two webs.

(This is actually a feature that all browsers should have, regardless of the “apps vs documents” thing, and while I can understand why the major ones won't, I hope to see all others adopt it.)

If we accept that the “web of apps” and “web of documents” are two separate things, and that the development and maintenance of the browsers for the “web of apps” is essentially left in corporate hands, what is left is the question: how expensive is it to develop and maintain the browsers for the “web of documents”?

And I suspect that the answer is “much less” (than the “web of apps”).

For starters, most of what the WHATWG is working on is largely irrelevant to the “web of documents”. This means that essentially no development effort is needed in “web of documents” browsers to “run after the latest revision of the spec”. I would expect most of the work to be of the maintenance kind (fixing bugs, security issues, and the like), which is sadly the kind of brutal, unglamorous work nobody wants to do.

Some new work to support more recent revisions of “document-useful” standards like CSS may be needed: this is a much slower process, although it can be quite complex.

It will be interesting to see, as the consciousness of the divergence of the “web of apps” from the “web of documents” grows and matures, how this will reflect on the adoption of new features by website developers in the face of graceful degradation.

My guess would be that “web of apps” developers will gladly throw themselves at new features as soon as it's “all green” on CanIUse (which I don't expect will monitor “web of documents” browsers: even now it doesn't feature minority browsers like Pale Moon or Vivaldi), while I expect “web of documents” developers to be more cautious in their approach: there's a reason why a gardening metaphor is often used for the independent web.

Just to be clear, I'm not saying that maintaining a “web of documents” browser would be effortless, or that it can be supported just by the kind of “amateur coder in their spare time” work that stereotypically underlies FLOSS development (see also @glipari​@sciences.re's comment thread here).

In fact, if anything, specifically because the main work needed is the kind of unglamorous work that most developers dislike (bug fixing and the like), it's going to be even less likely to find flocks of enthusiasts who volunteer their time on it for free. So this aspect does not, in any way, eliminate the need to find a way to support the developers of such a browser.

(By the way, while it's true that, as @glipari points out, browser development for the “web of apps” is motivated by the money it brings in, this doesn't necessarily explain why the “web of documents” gets not only neglected but in fact actively crippled. There seems indeed to be a lot of effort going into developing the “web of apps” in such a way that it specifically goes against the “web of documents”, even when we have solutions, sometimes existing solutions, that could serve both.)

This is why it is important to fund projects that maintain and develop these browsers (although, for example, I would like to have from the Servo team a clear statement about the kind of browser they want to be; it may not matter now, as they are still quite far behind in terms of standards support, but it will matter soon, if only in terms of what to prioritize, and it would be nice to know that we aren't throwing money at the next Firefox, down the enshittification drain).

(And no, don't expect me to propose a solution for the funding problem. I don't have one, unless we finally get Universal Basic Income everywhere, but that's a whole different topic of discussion.)

A wishlist for the “web of documents” browsers

Let's pretend for a moment that we have solved the funding problems. In this case, I would love a “web of documents” browser to go above and beyond.

For example, such a browser would support the Gemini protocol just as well as HTTP. It would support the text/gemini format just as well as text/html —and why not, also text/markdown (yes, it's official) and text/asciidoc (registration pending).

It would have native support for feed discovery and for the RSS and Atom XML formats. It would support the multi-document navigation metadata that enjoyed a brief moment of glory in the early aughts and support for which currently only survives, AFAIK, in the Pale Moon Website Navigation Bar plugin, and it would support the HTML+SMIL profile that only briefly existed in an ancient IE experiment.

It would support XSLT 3 (and 4, when it comes out), and XQuery as a scripting (or, more appropriately, templating) language. It would support XHTML2 (seriously, have you read the spec? It's so much better than HTML in so many ways that it's ridiculous what we've been deprived of; even with a dislike for XML Events and XForms, which are the “web of apps” part, there's no justification for throwing away the rest).

And of course, it would support “any” multimedia format (one of these days I will bring to this site the brainstorming I had about how to achieve this, and hopefully I'll remember to link it here).

But that's enough daydreaming.
Let's start from what we have.

Google is killing the open web, part 2

Do not comply in advance.

I wrote a few months ago about the proxy war by Google against the open web by means of XSLT. Unsurprisingly, Google has been moving forward on the deprecation, still without providing a solid justification other than “we've been leeching off a FLOSS library for which we've finally found enough security bugs to use as an excuse”. They do not explain why they haven't decided to fix the security issues in the library instead, or to adopt a more modern library written in a safe language, taking the opportunity to upgrade support to a more recent, powerful and easier-to-use revision of the standard.

Instead, what they do is provide a “polyfill”, a piece of JavaScript that can allegedly be used to supplant the functionality. Curiously, however, they do not plan to ship this alternative in-browser, which would allow a transparent transition without even a need to talk about XSLT at all. No, they specifically refuse to do that, and are instead requesting that anyone still relying on XSLT replace the standard XSLT invocation with a non-standard invocation of the JavaScript polyfill that should replace it.

This means that at least one of these two things is true:

  1. the polyfill is not, in fact, sufficient to cover all the use cases previously covered by the built-in support for XSLT, and insofar as it's not, they (Google) do not intend to invest resources in maintaining it, meaning that the task is being dumped on web developers (IOW, Google is removing a feature that is going to create more work for web developers just to provide the same functionality that they used to have from the browsers);

  2. insofar as the polyfill is sufficient to replace the XSLT support in the browser, the policy of not shipping it as a replacement confirms that the security issues in the XSLT library used in Chrome were nothing more than excuses to deal the final blow to XSLT and any other format that is still the backbone of an independent web.

As I mentioned in the Fediverse thread I wrote before this long-form article, there's an obvious parallel here with events I already covered in my previous article: when Mozilla bent to Google's pressure to kill off RSS by removing the “Live Bookmarks” feature from the browser, they did so on presumed technical grounds (citing, as usual, security and maintenance costs), but despite paying lip service to the importance of feeds for an open and interoperable web, they didn't provide any official replacement for the functionality, directing users instead to a number of add-ons providing similar functionality, none of which are written or supported by Mozilla. Compare and contrast with their Pocket integration, which they force-installed everywhere before ultimately killing the service.

Actions, as they say, speak louder than words. When a company claims that a service or feature they are removing can still be accessed by other means, but does not streamline access to said alternative, and instead requires its users to do the work necessary to access it, you can rest assured that beyond any words of support they may coat their actions with there is a plain and direct intent to sabotage said feature, and that any of the excuses brought forward to defend the choice are nothing but lies covering a vested interest in sabotaging its adoption: the intent is for you to not use that feature at all, because they have a financial interest in you not using it.

And the best defense against that is to attack, and push the use of that feature even harder.

Do. Not. Comply.

This is the gist of my Fediverse thread.

Do not install the polyfill. Do not change your XML files to load it. Instead, flood their issue tracker with requests to bring back in-browser XSLT support. Report failing XSLT support as browser breakage, because this is not a website issue.
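For reference, the standard in-browser mechanism they are killing is a single processing instruction at the top of the XML file, nothing more; a quick (and admittedly naive) way to check that your documents still rely on it rather than on the polyfill:

    import re
    import sys

    # The standard hook looks like:
    #   <?xml-stylesheet type="text/xsl" href="style.xsl"?>
    # This naive check just looks for such a PI near the top of the file.
    PI = re.compile(rb'<\?xml-stylesheet[^>]*type="text/xsl"')

    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            head = f.read(4096)
        print(path, "->", "standard PI found" if PI.search(head) else "no XSLT PI")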

I will not comply. Just as I have for years continued using MathML and SMIL (sometimes even together) despite Google's intent on their deprecation, I will keep using XSLT, and in fact will look for new opportunities to rely on it. At most, I'll set up an infobox warning users reading my site about their browser's potential brokenness and inability to follow standards, just like I've done for MathML and SMIL (you can see such infoboxes in the page I linked above). And just as I was ultimately proven right (after several years, Google ended up fixing both their SMIL and their MathML support in Chrome), my expectation is that, particularly with more of us pushing through, the standards will once again prevail.

Remember: there is no technical justification for Google's choice. This is not about a lone free software developer donating their free time to the community and finding they do not have the mental or financial resources to provide a particular feature. This is a trillion-dollar ad company that has been actively destroying the open web for over a decade and is finally admitting to it as a consequence of the LLM push and the intentional enshittification of web search.

The deprecation of XSLT is entirely political, fitting within the same grand scheme of the parasitic corporation killing the foundations of its own success in an effort to grasp more and more control of it. And the fact that the team at Apple and the team at Mozilla intend to follow along on the same destructive path is not a counterpoint, but rather an endorsement of the analysis, as neither of those companies is interested in providing a User Agent as much as a surveillance-capitalism tool that you happen to use.

(Hence why Mozilla, a company allegedly starved for resources, is wasting them implementing LLM features nobody wants instead of fixing much-voted, decade-old bugs with several duplicates. Notice how the bug pertains to the (mis)treatment of XML-based formats —like RSS.)

If you have to spend any time at all confronting Chrome's push to deprecate XSLT, your time is much better spent inventing better uses for XSLT, and reporting broken rendering if/when they start disabling it, than caving to their destructive requests.

The WHATWG is not a good steward of the open web

I've mentioned it before, but the WHATWG, even assuming the best of intentions at the time it was founded, is not a good steward of the open web. It is more akin to the corrupt takeover you see in regulatory capture, except that instead of taking over the W3C they just decided to take the ball and run with it, taking advantage of the fact that, as implementors, they had the final say on what counted as “standard” (de facto if not de jure): exactly the same attitude with which Microsoft tried taking over the web through Internet Explorer at the time of the first browser war, an attitude that was rightly condemned at the time —even as many of those who condemned it have so far failed to acknowledge the problem with Google's no less detrimental approach.

The key point here is that, whatever the WHATWG was (or was intended to be) when it was founded by Opera and Mozilla developers, it is now manifestly a corporate monster. Its corporate stakeholders have a very different vision of what the Web should be compared to the vision on which the Web was founded, the vision promoted by the W3C, and the vision that underlies a truly open and independent web.

The WHATWG's aim is to turn the Web into an application delivery platform, a profit-making machine for corporations where the computer (and the browser through it) is a means for them to make money off you rather than for you to gain access to services you may be interested in. Because of this, the browser in their vision is not a User Agent anymore, but a tool that sacrifices privacy, actual security and user control at the behest of the corporations “on the other side of the wire” —and of their political interests (refs. for Apple, Google, and a more recent list with all of them together).

Such a vision is in direct contrast with that of the Web as a repository of knowledge, a vast vault of interconnected documents whose value emerges from organic connections, personalization, variety, curation and user control. But who in the WHATWG today would defend that vision?

A new browser war?

Maybe what we need is a new browser war. Not one of corporation versus corporation —doubly more so when all currently involved parties are more allied in their efforts to enclose the Web than invested in fostering an open and independent one— but one of users versus corporations, a war to take back control of the Web and its tools.

It's kind of ironic that in a time when hosting has become almost trivial, the fight we're going to have to fight is going to be on the client side. But the biggest question is: who do we have as champions on our side?

I would have liked to see browsers like Vivaldi, the spiritual successor to my beloved classic Opera browser, amongst our ranks, but with their dependency on the Blink rendering engine, controlled by Google, they won't be able to do anything but cave, as will all other FLOSS browsers relying on Google's or Apple's engines, none of which I foresee spending any significant effort rolling back the extensive changes that these deprecations will involve. (We see this already when it comes to JPEG XL support, but it's also true that e.g. Vivaldi has made RSS feeds first-class documents, so who knows, maybe they'll find a way for XSLT through the polyfill mentioned above, or something like that?)

Who else is there? There is Servo, the rendering engine that was being developed at Mozilla to replace Gecko, and that turned into an independent project when its team was fired en masse in 2020; but they don't support XSLT yet, and I don't see why they would prioritize its implementation over, say, stuff like MathML or SVG animations with SMIL (just to name two of my pet peeves), or optimizing browsing speed (seriously, try opening the home page of this site and scrolling through).

What we're left with at the moment is basically just the Firefox forks, and two of these (LibreWolf and WaterFox) are basically just “Firefox without the most egregious privacy-invasive misfeatures”, which leaves the question open about what they will be willing to do when Mozilla helps Google kill XSLT; only the other one, Pale Moon, has grown into its own independent fork (from such an old version of Firefox, in fact, that it doesn't support WebExtensions-based plugins, such as the most recent versions of crucial plugins like uBlock Origin or Privacy Badger, although it's possible to install community-supported forks of these plugins designed for legacy versions of Firefox and forks like Pale Moon).

(Yes, I am aware that there are other minor independent browser projects, like Dillo and Ladybird, but the former is in no shape to be a serious contender for general use on more sophisticated pages —just see it in action on this site, as always— and the latter is not even in alpha phase, just in case the questionable “no politics” policies —which consistently prove to be weasel words for “we're right-wingers but too chicken to come out as such”— weren't enough to stay away from it.)

Periodically, I go through them (the Firefox forks, that is) to check if they are good enough to become my daily drivers. Just for you (not really: just for me, actually), I've tested them again. They're not ready yet, at least not for me, although I must say that I'm seeing clear improvements since my last foray into the matter, which wasn't even that long ago. In some cases, I can attest that they are even better than Firefox: for example, Pale Moon and WaterFox have good JPEG XL support (including transparency and animation support, which break in LibreWolf as they do in the latest nightly version of Firefox I tried), and Pale Moon still has first-class support for RSS, from the address bar indicator to rendering even in the absence of a stylesheet (be it CSS or XSLT).

(A suggestion? Look into more microformats support. An auxiliary bar with previous/next/up links on pages where this is relevant would be a nice touch, for example. It's one of those little details that really made classic Opera shine. EDIT: I just found out that there's a relevant addon for Pale Moon!)

An interesting difference is that the user interface of these browsers is perceivably less refined than Firefox's. It's a bit surprising, given the common roots, but it emerges in several more and less apparent details, from the spacing between menu items to overlapping text and icons in context menus, passing through incomplete support for dark themes and other little details that all add up, giving these otherwise quite valid browsers an amateurish feel.

And I get it: UI design is hard, and I myself suck at it, so I'm the last person who should be giving recommendations, but I'm still able to differentiate between more curated interfaces and ones that need some work; and if even someone like me, who distinctly prefers function over form, finds these little details annoying, I can imagine how much worse this may feel to users who care less about the former and more about the latter. Sadly, if a new browser war is to be fought to wrestle control from the corporate-controlled WHATWG, this matters.

In the end, I find myself in a “waiting” position. How long will it take for Firefox to kill their XSLT support? What will its closest forks (WaterFox in particular is the one I'm eyeing) be able to do about it? Or will Pale Moon remain the only modern browser with support for it, as a hard fork that has long since gone its own way? Will they have matured enough to become my primary browsers? We'll see in time.

Another web?

There's more to the Internet than the World Wide Web built around the HTTP protocol and the HTML file format. There used to be a lot of the Internet beyond the Web, and while much of it remains little more than a shadow of its past, largely eclipsed by the Web and what has been built on top of it (not all of it good), outside of some modest revivals, there are also new parts of it that have tried to learn from the past and build towards something different.

This is the case for example of the so-called “Gemini Space”, a small corner of the Internet that has nothing to do with the LLM Google is trying to shove down everyone's throat, and in fact not only predates it, as I've mentioned already, but is intentionally built around different technology to stay away from the influence of Google and the like.

The Gemini protocol is designed to be conceptually simpler than HTTP, while providing modern features like built-in transport-level security and certificate-based client-side authentication, and its own “native” document format, the so-called gemtext.

As I said in my aforementioned Fediverse thread:

There's something to be said about not wanting to share your environment with the poison that a large part of the web has become, but at the same time, there's also something to be said about throwing away the baby with the bathwater. The problem with the web isn't technical, it's social. The tech itself is fine.

I'm not going to write up an extensive criticism of the Gemini Space: you can find here an older opinion by the author of curl, Daniel Stenberg (although it should be kept in mind that things have changed quite a bit since: for example, the specification of the different components has been separated, as suggested by Daniel), and some criticism about how gemtext is used.

I'm not going to sing the praises of the Gemini protocol or gemtext either, even though I do like the idea of a web built on lightweight markup formats: I would love it if browsers had native support for formats like Markdown or AsciiDoc (and gemtext, for that matter): it's why I keep the AsciiDoctor Browser Extension installed.

But more in general, the Web (or at least its user agents) should not differentiate. It should not differentiate by protocol, and it should not differentiate by format. We've seen it with image formats like MNG being unfairly excluded, with motivations based on alleged code bloat that today are manifest in all their idiocy (and yes, it hasn't escaped my notice that even Pale Moon doesn't support the format), and we're seeing it today with JPEG XL threatened with a similar fate, without even gracing us with a ridiculous excuse. On the upside, we have browsers shipping with a full-fledged PDF reader, which is a good step towards the integration of this format with the greater Web.

In an ideal world, browsers would have not deprecated older protocols like Gopher or FTP, and would just add support for new ones like Gemini, as they would have introduced support for new (open) document formats as they came along.

(Why insist on the open part? In another Fediverse thread about the XSLT deprecation I had an interesting discussion with the OP about SWF, the interactive multimedia format for the Web at the turn of the century. The Adobe Flash Player ultimately fell out of favour, arguably due to the advent of the mobile Internet: it has been argued that the iPhone killed Flash, and while there's some well-deserved criticism of hypocrisy levelled against Steve Jobs' infamous Thoughts on Flash letter, it is true that what ultimately killed the format was it being proprietary and not fully documented. And while we might not want to cry about the death of a proprietary format, it remains true even today that the loss of even just legacy support for it has been a significant loss to culture and art, as argued by @whiteshark​@mastodon.social.)

A Web of interconnected software?

It shouldn't be up to the User Agent to determine which formats the user is able to access, and through which protocol. (If I had any artistic prowess (and willpower), I'd hack the “myth of consensual X” meme, representing the user and the server saying “I consent”, and the browser saying “I don't”.) I do appreciate that there is a non-trivial maintenance cost that grows with the number of formats and protocols, but we know from classic Opera that it is indeed quite possible to ship a full Internet suite in a browser package.

In the old days, browser developers were well aware that a single vendor couldn't “cover all bases”, which is how interfaces like the once ubiquitous NPAPI were born. The plug-in interface has since been removed from most browsers, an initiative again promoted by Google, announced in 2013 and completed in 2015 (I should really add this to my previous post on Google killing the open web, but I also really don't feel like touching that anymore; here will have to suffice), with the other major browsers quickly following suit, and its support is now relegated to independent browsers like Pale Moon.

And even if it can be argued that the NPAPI specifically was indeed mired in unfixable security and portability issues and had to go, its removal without a clear cross-browser upgrade path has been a significant loss for the evolution of the web, destroying the primary “escape hatch” to solve the chicken-and-egg problem of client-side format support versus server-side format adoption. By the way, it was also responsible for the biggest W3C blunder, the standardization of DRM for the web through the so-called Encrypted Media Extensions, a betrayal of the W3C's own mission statement.

The role of multimedia streaming in the death of the User Agent

The timeline here is quite interesting, and correlates with the already briefly mentioned history of Flash, and of its short-lived competitor Microsoft Silverlight, which were largely responsible for the early expansive growth of multimedia streaming services in the early years of the XXI century. With the tension between Apple's effort to kill Flash and the need of emerging streaming services like Netflix and Hulu to support in-browser multimedia streaming, there was a push to improve support for multimedia formats in the nascent HTML5 specification, but also a requirement from the MAFIAA partners that such support would allow enforcing the necessary restrictions to, among other things, prevent users from saving a local copy of the stream —something that could be more easily enforced within the Flash players the industries had control over than in a User Agent controlled by the user.

This is where the development of EME came in, in 2013: this finally allowed a relatively quick phasing out of the Flash plugin and, a posteriori, of the plugin interface that allowed its integration with the browsers: by that time, the Flash plugin was by and large the plugin the API existed for, and the plugin itself was indeed still supported by browsers for some time after support for the API was otherwise discontinued (sometimes through alternative interfaces such as the PPAPI, other times by keeping NPAPI support around, but enabled only for the Flash plugin).

There are several interesting considerations that emerge from this little glimpse at the history of Flash and the EME.

First of all, this is one more piece of history that goes to show how pivotal the year 2013 was for the enshittification of the World Wide Web, as discussed already.

Secondly, it shows how the developers of major browsers are more than willing to provide a smooth transition path with no user intervention, at least when catering to the interests of major industries. This indicates that when they don't, it's not because they can't: it's because they have a vested interest in not doing it. Major browser development is now (and has been for over a decade at least) beholden not to the needs and wants of their own users, but to those of other industries. But I repeat myself.

And thirdly, it's an excellent example, for the good and the bad, of how the plugin interface has helped drive the evolution of the web, as I was saying.

Controlled evolution

The removal of NPAPI support, followed a few years later by the removal of the (largely Chrome-specific) PPAPI interface (that was supposed to be the “safer, more portable” evolution of NPAPI), without providing any alternative, is a very strong indication of the path that browser development has taken in the last “decade plus”: a path where the Web is entirely controlled by what Google, Apple and Microsoft (hey look, it's GAFAM all over again!) decide about what is allowed on it, and what is not allowed to not be on it (to wit, ads and other user tracking implements).

In this perspective, the transition from plugins to browser extensions cannot be viewed (just) as a matter of security and portability, but —more importantly, in fact— as a matter of crippled functionality: indeed, extensions maintain enough capabilities to be a vector of malware and adware, but not enough to circumvent unwanted browser behavior, doubly more so with the so-called Extension Manifest V3 specifically designed to thwart ad blocking as I've already mentioned in the previous post of the series.

With plugins, anything could be integrated in the World Wide Web, and such integration would be close to as efficient as could be. Without plugins, such integration, when possible at all, becomes clumsier and more expensive.

As an example, there are browser extensions that can introduce support for JPEG XL in browsers that lack native support. This provides a workaround to display such images in said browsers, but when a picture is offered in multiple formats (which is what I do, e.g., to provide a PNG fallback for the JXL images I serve), this results in both the PNG and the JXL being downloaded, increasing the amount of data transferred instead of decreasing it (one of the many benefits of JXL over PNG). By contrast, a plugin could register itself as a handler for the JPEG XL format, and the browser would then be able to delegate rendering of the image to the plugin, only falling back to the PNG in case of failure, thus maximizing the usefulness of the format pending a built-in implementation.

The poster child of this lack of efficiency is arguably MathJax, which has been carrying for nearly two decades the burden of bringing math to the web while browser implementors slacked off on their MathML support. And while MathJax does offer more than just MathML support for browsers without native implementations, there is little doubt that it would be more effective in delivering its services if it could be a plugin rather than a multi-megabyte (any efforts to minimize its size notwithstanding) JavaScript library that each math-oriented website needs to load.

(In fact, it is somewhat surprising that there isn't a browser extension version of MathJax that I can find, other than a GreaseMonkey user script with convoluted usage requirements, but I guess this is the cost we have to pay for the library's flexibility, and for the sandboxing requirements enforced on JavaScript in modern browsers.)

Since apparently “defensive writing” is a thing we need when jotting down an article such as this (as if it even mattered, given how little attention people pay to what they read —if they read it at all— before commenting), I should clarify that I'm not necessarily advocating for a return to NPAPI. We have decades of experience about what could be considered the actual technical issues with that interface, and how they can be improved upon (which is for example what PPAPI allegedly did, before Google decided it would be better off killing plugins entirely and thus gaining full control of the Web as a platform), as we do about sandboxing external code running in browsers (largely through the efforts to sandbox JavaScript). A better plugin API could be designed.

It's not going to happen. It is now apparent that the major browsers explicitly and intentionally do not want to allow the kind of flexibility that plugins would allow, hiding their controlling efforts behind security excuses. It would thus be up to the minority browsers to come up with such an interface (or actually multiple ones, at least one for protocols and one for document types), but with most of them beholden to the rendering engines controlled by Google (for the most part), Apple (some, still using WebKit over Blink), and Mozilla (the few Firefox forks), they are left with very little leeway, if any at all, in terms of what they can support.

But even if, by some miraculous convergence, they did manage to agree on and implement support for such an API, would there actually be any interest from third parties in developing plugins for it? I can envision this as a way for browsers to share coding efforts in supporting new protocols and formats before integrating them as first-class citizens (for example, the already mentioned Gemini protocol and gemtext format could be implemented first as plugins, to the benefit of any browser supporting such hypothetical interfaces), but would there be any interest in developing for it, rather than just trying to get the feature implemented in the browsers themselves?

A mesh of building blocks

Still, let me dream a bit of something like this: a browser made up of composable components, protocol handlers separate from primary document renderers separate from attachment handlers. (A sketch of the idea, in code, follows the list below.)

A new protocol comes out?
Implement a plugin to handle that, and you can test it by delivering the same content over it, and see it rendered just the same from the other components in the chain.
A new document format comes out?
Implement a plugin to handle that, and it will be used to render documents in the new format.
A new image format comes out?
Implement a plugin to handle that, and any image in the new format will be visible.
A new scripting language comes out?
You guessed it: implement a plugin to handle that …
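In (entirely hypothetical) code, the whole dream boils down to little more than a table of handlers; every name below is made up, but the shape is the point:

    from typing import Callable, Dict

    # One registry per extension point; keys are URL schemes or MIME types.
    REGISTRY: Dict[str, Dict[str, Callable]] = {
        "protocols": {},  # scheme    -> fetch(url) returning (mime, bytes)
        "documents": {},  # MIME type -> renderer producing a document tree
        "images":    {},  # MIME type -> decoder producing pixels
        "scripts":   {},  # MIME type -> interpreter
    }

    def register(kind: str, key: str, handler: Callable) -> None:
        """A new protocol or format comes out? Drop in a handler, and
        every other component in the chain keeps working unmodified."""
        REGISTRY[kind][key] = handler

    # register("protocols", "gemini", my_gemini_fetcher)
    # register("images", "image/jxl", my_jxl_decoder)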

How much tech would have had a real chance at proving itself in the field if this had been the case, or would have survived being ousted not by technical limitations but by unfriendly corporate control? Who knows: maybe RSS and Atom integration would still be trivially at everybody's hand; nobody would have had to fight with the long-standing bugs in PNG rendering in Internet Explorer; MNG would have flourished; JPEG XL would have become ubiquitous six months after the specification had been finalized; we would have seen HTML+SMIL provide declarative interactive documents without JavaScript as far back as 2008; XSLT 2 and 3 would have long superseded XSLT 1 as the templating languages for the web, or XSLT would have been supplanted by the considerably more accessible XQuery; XHTML2 would have lived and grown alongside HTML5, offering more sensible markup for many common features, and much-wanted capabilities such as client-side includes.

The web would have been very different from what it is today, and most importantly we would never have had to worry about a single corporation getting to dictate what is and what isn't allowed on the Web.

But the reality is much harsher and darker. Google has control, and we do need to wrestle it out of their hands.

Resist

So, do not comply.
Resist.
Force the unwanted tech through.
Use RSS.
Use XSLT.
Adopt JPEG XL as your primary image format.
And report newly broken sites for what they are:
a browser fault, not a content issue.

Post scriptum

I would like to collect here any pièces de résistance for XSLT that I come across.

I'm going to inaugurate the list with a link I've just discovered thanks to JWZ:

  1. xslt.rip (best viewed with a browser that supports XSLT; viewing the source is highly recommended);

  2. Rivista Journal is a «syndicated publishing system for XMPP»; it is based on the combination of two XML formats and protocols, and can present content directly to the web via an XSLT transform; I was made aware of this platform and shown a very opinionated server running it by @lorenzo​@snac.bobadin.icu;

  3. and last but not least (yeah I know, doesn't make much sense with the current short list, but still), a shameless plug of my own website, of course, because of the idea of using XSLT not to produce HTML, but SVG (in addition to, of course, my überprüfungslisten, and, more recently, the 12 days of Christmas generator).

Made the news (and other related articles)

I've apparently “made the news” (again).

I have read the comments, and little has changed since last time. The only comment worth responding to is from the user who prefers EME to the mess of plugins we used to have. I understand where they are coming from, but I disagree on a matter of principle: DRM shouldn't exist, and it should never have been standardized in violation of the W3C mission statement; as a plus, the more cumbersome it is for the user, the better it represents its negative nature.

Aside from that, the comments keep missing the point, and weigh their personal dislike for XSLT more than the important role it plays on the open and indie web. Just seeing how many people follow the corporate recommendation to apply it server-side and distribute the “rendered” HTML shows these are people who have no idea what they're talking about: just as an example, my sparklines are still 10× smaller as XML data plus XSLT than as the rendered SVG, with benefits amortizing over multiple sparklines per page (due to the common XSLT), and over time (data changes, XSLT does not; also, the actual ratio of SVG to XML data is much higher than 10×, and as the data grows, the overall ratio tends towards it). For smaller, cheaper and/or home hosting, XSLT remains a clear winner.
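
To make the amortization claim checkable, in rough symbols of my own choosing: call s the size of the shared stylesheet, d the size of one sparkline's XML data, r the SVG-to-XML expansion ratio (above 10 in my case), and n the number of sparklines served. XML plus XSLT then costs s + nd bytes against nrd for pre-rendered SVG, so the savings factor is

    \frac{n\,r\,d}{s + n\,d} \;\longrightarrow\; r \qquad \text{as } n\,d \gg s

which is exactly why, as the data grows, the overall ratio tends towards the raw SVG-to-XML ratio.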

You don't like the syntax? Fine. But use that energy to pressure for a more modern XSLT to be available instead of toeing the corporate line about the demise of XSLT.

Shields up

Defending against the LLM scraper flood.

Enough is enough

Scraping the open web for anything that can be fed to the LLMs that are passed off as artificial “intelligence” has become so aggressive that even I have finally come to terms with the need to protect myself and my online presence from it.

I've always had a “moderately tolerant” stance towards this kind of phenomenon. For example, I was very late in adopting an ad blocker, because I felt that there was a sense of “equivalent exchange” in benefiting from free content while tolerating ads that I would have gladly gone without; even as the amount and invasiveness of ads grew, I resisted, until it finally became too much and I deployed uBlock Origin across all my browsers and machines.

Similarly, I've tolerated scrapers as long as they've been well-behaved, even when questionably more expansive and persistent than search engine web crawlers. But in the last few months, things have changed. Scrapers have increased in numbers, and more and more often they are poorly coded enough to bog down my home server in what cannot be described in any other way than as a DoS.

First steps

I had already started setting up some precautionary measures, such as the well-known fail2ban intrusion prevention tool to protect the machine against secure shell exploitation attempts, but that was all. (In fact, in reviewing my fail2ban configuration for what I'm going to discuss, I found out I had been way too tolerant, and have taken the opportunity to tighten that part of the process too, but that's beyond the scope of this article.)

From time to time, however, I was seeing some intense traffic against my gitweb instance that was obviously indiscriminate scraping activity, which would bring the load on the machine to ridiculous levels.

The first serious step was setting up a “manual” category in my fail2ban configuration to jail the most egregious offenders; for the curious, I've banned the entire 146.174.0.0/16 and 202.76.0.0/16 subnets, which may be a bit more aggressive than necessary, but a /24 wasn't enough and I honestly couldn't care enough to find the smallest mask; sorry if anybody got caught.
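
In practice the gist of it boils down to a handful of commands, sketched here with the subnets named above (whether fail2ban accepts CIDR ranges in banip depends on the version, so that part is an assumption; the firewall rules below achieve the same effect directly):

    # ban by hand through a dedicated fail2ban jail (here called "manual"):
    fail2ban-client set manual banip 146.174.0.0/16
    fail2ban-client set manual banip 202.76.0.0/16

    # equivalent effect straight at the firewall:
    iptables -I INPUT -s 146.174.0.0/16 -j DROP
    iptables -I INPUT -s 202.76.0.0/16 -j DROP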

Because obviously that's part of the problem: to make it harder to use tools like fail2ban, these scrapers implement ban evasion in a number of ways, ranging from credible user agent identifiers (even though the access patterns are “obviously” non-human, when seen by a human) to —most importantly— spread-out attacks (i.e. more of a DDoS), which forces hosters on the defensive, playing whack-a-mole on individual IPs while the attackers (scrapers) keep jumping from one to the next.

Enter ansuz

I'm not the only one with this problem, obviously. JWZ of Netscape and XScreenSaver fame, for example, has written extensively (here are his latest musings on the topic) about the honeypot he has set up to poison the scrapers. (I highly recommend reading the comments too, for additional recommendations from other people.)

But arguably, what finally got me to get a move on (aside from an assault peak) were some recent Fediverse posts by @ansuz​@social.cryptography.dog that were ultimately collected and expanded into a blog post (with an interesting follow-up).

The reason why this caught my attention is that it presented a simple (trivial, even) way to catch (some) scrapers: a “neverlink”, i.e. a link that, by virtue of being commented out or explicitly tagged as “not to be followed” and hidden by style, would be invisible to all but the most aggressive scrapers.

(Update: since the nofollow attribute is intended for ranking rather than crawling, and there is no clear way to indicate that a specific link should not be followed for crawling, I have also added the neverlinks to robots.txt for exclusion by all user agents. We'll see if this helps refine their use for scraper detection.)
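
For completeness, the robots.txt side of it is just a couple of lines (the path here is hypothetical, standing in for the real neverlink target):

    User-agent: *
    Disallow: /neverlink/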

Since my most heavily bombarded subdomain was the gitweb, I took the opportunity to update it to the latest version, and to change it so as to add two neverlinks: a commented link in the head tag, and a nofollow, display: none one in the body.
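
In concrete terms, the two neverlinks look something like this (the URL is, again, hypothetical):

    <head>
      <!-- neverlink #1, commented out: invisible to parsers that honour comments
           <a href="https://example.org/neverlink/">do not follow</a> -->
    </head>
    <body>
      <!-- neverlink #2: tagged nofollow and hidden by style -->
      <a href="/neverlink/" rel="nofollow" style="display: none">do not follow</a>
    </body>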

Moving forward: the tarpit

I was actually surprised when there were a couple of hits to the (404ing) linked page, so I started working on extending the effectiveness of the “scraper detection”: rather than just banning any IP trying to fetch the neverlink, which would be of limited effectiveness given the extensive use of host jumping that results in each IP fetching a single URL, I took some of my free time yesterday to create a tarpit, something which I had been pondering for months (so yes, arguably, @ansuz's post finally made me do it, and as usual it took months; I'm not Oblomov for nothing).

(Without going too much into details: a honeypot is something that looks palatable, so attackers are encouraged to go there and waste their time getting stuck; a tarpit is something that is intentionally designed to slow things down.)

The tarpit is a PHP script that serves a standard HTML page, except that the entire content is (1) randomly generated and (2) delivered at a slow rate, pausing a little bit after each character.

I must say that the effect of the trick I'm using to slow things down is actually fascinating to look at, giving a bit of an old “typewriter” feel. But of course it's not there for the aesthetics: the idea is to keep the bot hooked for several seconds (potentially hours, with millions of characters generated in a single run, if it's ever left to finish, although from what I've seen these bots will generally time out after a few seconds) instead of hammering my servers with hundreds of requests per second.
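
For the curious, a minimal sketch of the mechanism (not my actual script, which does more; every detail here, down to the 50 ms pause, is illustrative):

    <?php
    // A sketch of the tarpit idea: stream plausible-looking HTML,
    // one character at a time, until the scraper gives up.
    set_time_limit(0);                        // no PHP-side time limit
    header('Content-Type: text/html; charset=utf-8');
    while (ob_get_level()) ob_end_flush();    // drop PHP output buffering
    ob_implicit_flush(true);                  // flush after every echo
    // note: the web server may add its own buffering, which needs to be
    // disabled separately for the effect to reach the client.

    function emit(string $text): void {
        foreach (str_split($text) as $ch) {
            echo $ch;
            usleep(50000);                    // 50 ms per character: the "typewriter"
        }
    }

    $words = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur'];

    emit("<!DOCTYPE html><html><head><title>welcome</title></head><body>\n");
    for ($i = 0; $i < 1000000; $i++) {        // millions of characters, if ever left to finish
        emit('<p>' . $words[array_rand($words)] . ' ' . $words[array_rand($words)] . "</p>\n");
    }
    emit("</body></html>\n");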

What will there be next?

Both the tarpit and the scraper detection are under development. I'm trying out a few different ideas to see what's more effective. The new version that is coming up momentarily includes several improvements already.

First of all, neverlinks now come in different flavours (with or without hostname) of two different kinds (head and body).

Secondly, the tarpit itself now includes neverlinks.

And thirdly, since this time the HTML is dynamically generated by PHP, I've added “session IDs” that make them unique. The intent here is to make the scrapers try to keep accessing the tarpit with multiple requests, an idea that is probably better served by adding to the tarpit an infinitely generated maze of twisty little passages, all alike, which I'll most likely end up looking into.

What's missing

With the neverlinks, catching the stupidest scrapers is easy. With the IP jumping, an effective tarpit needs to detect them on first connection, and with the randomized user agents this is quite non-trivial, since the “hook” is indistinguishable from a human connection (a human being subsequently trivial to identify from the fetching of related resources such as CSS and JS). This means that post-processing of the logs is necessary to find these patterns (no first-connection detection), potentially with subnet-wide banning or tarpit redirection. I wonder if it would be possible to add some warning text to the tarpit so that if a human ever gets caught in it, they'll see the warning text even while the browser hangs fetching the rest of the page. And is that even worth it?

(And yes, this is why we can't have nice things.)

Not Your Personal Computer

Just because it's in your hands, it doesn't mean it's yours.

In the second half of the first decade of the XXI century, Apple ran a memetic advertising campaign (Get a Mac) that famously featured a “casually” clothed young man representing its computers, and a perceptibly (if only slightly) less young man dressed in a suit, representing “the competition”, introducing themselves with «Hello, I'm a Mac»/«and I'm a PC».

I never liked those ads, finding them conceptually wrong at such a fundamental level as to be distracting. While this write-up is not an analysis of the many ways in which those ads were wrong, the main fault in the campaign was the presentation of a false dichotomy based on a fundamentally broken misnomer, with the ads largely focusing on differences between the Microsoft Windows and OS X operating systems, completely ignoring the growing adoption of Linux on the desktop, and gliding over the “hardware convergence” that had brought the Mac of the time so deeply into the camp of compatibility with the hardware of the IBM PC descendants that Apple itself offered the tools necessary to install Windows on their hardware.

The biggest lie in the ad was the pretense that the Mac was not a PC, even in the restricted meaning of IBM PC compatible.

Or was it?

What's a PC?

I'm well aware that I'm in the minority defending a definition of Personal Computer that breaks free of the Wintel monopoly that has been strangling it for the better part of the last half century, but as someone who has been running Linux on commodity hardware as my primary (when not only) operating system for two decades and counting, I can attest that there's more to personal computing than the infamous combo.

In fact —and this will actually be the main point of the article— there is a growing discrepancy between said combo and what a personal computer is, or rather should be. But to get there, we should start by making very clear what a personal computer is.

A personal computer is a computer that does what its user wants it to do

I'm not picky about the concept of computer. I'm fine with any hardware capable of general computing falling into the category: desktop computers, tablets, “smart” phones, and soon possibly even pregnancy tests1. To run with a now classic meme, anything that can run Doom.

I was actually wary of going with “user” in the definition. At first, I wanted to go with something like “assignee”, but ultimately I convinced myself that “user” works fine here. For example, if I let someone else provisionally use my personal computer, I'm fine with them still not being able to do what they want, because it's not their personal computer, but as long as the computer doesn't prevent me from doing what I want with it, it remains my personal computer.

And of course I'm fine with not being able to do on my personal computer things that are materially impossible within the constraints of the hardware. But, and this is the key, I should be the one in control.

A personal computer is a computer over which the user has control

This is why most “smart” phones and tablets and such are not personal computers: it's not a matter of form factor or other hardware choices, but a matter of control. If I cannot install the operating system I want, if the operating system they ship with prevents me from doing things like making a full backup or even just opening any file with any application, I am not in control: somebody else has made decisions for me, and if I cannot subvert them on my device, then it's not really my personal device.

I am even fine with restrictive defaults, as I'm aware there are benefits to them for the general populace. But if the vendor does not provide a means to overturn the defaults and allow the user to gain complete, unfettered access to the device, then it's not and it cannot be a personal device, a personal computer. It's someone else's machine that I've been provisionally allowed to use.

Beyond the personal

There is no cloud. It's just someone else's computer.

The early XXI century was also the period in which Cloud computing started gaining widespread recognition and adoption, a growth soon met with a healthy dose of skepticism summarized in the quote above, a quote so successful that cloud computing businesses and pundits went on a spree trying to debunk it, largely missing the point of the quote.

It is undeniably true that the “just” in the quote is carrying a lot of weight. But it's also undeniably true that the whole point of cloud computing is to delegate to someone else the management of the hardware your software runs on. And that's the whole point the quote is making. There is nothing “magic” about the cloud. You could achieve the exact same results as “running things in the cloud” by shelling out money to purchase equivalent hardware and manage it yourself —it's entirely a matter of whether or not it's worth the price.

And of course, one of the things that you're giving up with cloud computing is control: you are trusting a third party to provide the services they claim, at the agreed price. And there's literally nothing you can do if they choose to terminate your account with no recourse, by mistake or because you're persona non grata to the fascist regime du jour.

So while it could be argued that it's not just “someone else's computer” (or «“just” by appropriate definitions of “just”») it's undeniable that it's not your (personal) computer.

The death of personal computing?

This article has been prompted by a tightening of the grasp GAFAM has on personal computing, such as the recent news about Google moving to kill sideloading on Android and further closing down development of the operating system (hindering alternatives built on the Android Open Source Project in the process), or the “cloud-first” approach to data storage Microsoft is pushing for its office suite and operating system.

This has been a long time coming, from all sides.

For example, Microsoft attempted to leverage its weight in the “personal computer” market to make UEFI's Secure Boot a requirement for Windows 8, and although the massive pushback they received ultimately led to a reversal in the form of allowing said Secure Boot to be disabled (thus making it possible to install operating systems without having to go through Microsoft for the appropriate cryptographic signing keys), at least for non-ARM machines, the fact remains that Secure Boot takes control over which software can run out of the user's hands, restricting it to what the machine vendor (and Microsoft) allow, so that a machine where Secure Boot cannot be disabled, or where at the very least users are not allowed to register new cryptographic keys, cannot be classified as a personal computer. (Yes, I am aware Secure Boot has its uses. Again, that's not the point.)

Apple has been fighting the Digital Markets Act, and they claim, straight and clear, that the DMA is bad because it forces them (Apple) to let users download and run the software they want on their iPhones. Behind the “security and privacy” pretense, the main issue is, again, control. Control that the corporate vendor is being forced to lose, to the benefit of the user, on a platform that has historically been designed as a “vertically integrated user experience” (which is another way to say “corporate-controlled walled garden”, and which I like to call “the Apple virus”: an approach to computing that seems benevolent, if not even beneficial, to the end-user, until they try to step out of the inflexible constraints of the design: remember You're holding it wrong?).

And of course I've already mentioned in passing how the giants that currently control the web browsing market, Google and Apple, with the connivance of the purported “opposition” (Mozilla), are removing user control from their web browsers, perverting them from user agents into corporate surveillance tools.

But it gets worse.

The “Apple virus” has moved beyond corporations and leaked into the free software world, from the “our users don't really know what they want” attitude of GNOME developers to Wayland's “you can only do what the compositor allows” design, passing through systemd's “our way or the highway” steamroll, all coincidentally aided by RedHat's “gentle push” for the adoption of anything they develop, through a “vertical integration” that shouldn't even exist in free software.

(Remember when the poor design of GNOME and dbus user session management got so fucked up that logging out failed to work correctly, and this had to be papered over by changing a systemd default, which in turn broke everybody's terminal multiplexers, pissing off a lot of people? And let's not even talk about the PulseAudio clusterfuck.)

In the grand scheme of things, it matters little that the entire software stack is free software: just like for Chromium and the web, the money behind the development of the mainline implementations is the only thing that matters, and this affects the entire ecosystem. (And don't even bother trying to push back with arguments about “intent”. The purpose of a system is what it does.)

There's a classic screenshot from a social post making the rounds, which I'm going to quote here because it's quite relevant:

Sun Solaris used to be the OS that required overpriced proprietary hardware and still couldn't compete with Linux. That OS is now MacOS.

MacOS used to be the colorful and friendly walled garden OS that your non-techie parents would enjoy but was completely useless to you as a power user. That OS is now Windows.

Windows used to be the OS that could run a lot of apps, but was a headache to setup and maintain correctly and would sometimes blow up for no reason. That OS is now Linux.

Linux used to be the techie and developer oriented command-line OS that was lacking in desktop apps and might not support your hardware, but once you got it going, was rock solid and had no limits. That OS is now FreeBSD.

This may all seem like a digression, but it is actually a sharp representation of the shift in the Overton window that computing (and in particular personal computing) has been subject to in the last couple of decades.

Do you own a personal computer?

Do you have a machine that you control?

Do you have a machine on which you can install the operating system of your choice? How hard is it to install a different operating system? Are there operating systems you cannot install because the hardware vendors refuse to provide the necessary drivers and/or specifications that would allow said drivers to be developed?

Can you customize your execution environment to your needs and preferences? How hard is it to do that?
Can you do a full backup of your system, to be restored to exact functionality if anything goes wrong?

Can you run any software of your choice on that machine?
If there are protections in place in the default configuration, can they be bypassed? How hard is it to bypass them, if it is possible at all?
Can the software you want to use interoperate smoothly with the other software you want to use? Are there constraints or restrictions beyond what may be expected by the software design, imposed by third-party entities?

Do you own a personal computer?


  1. yes I know the pregnancy test in the videos is not actually a computer, but just used as a display ↩

Google is killing the open web

The juggernaut is taking advantage of its dominant position to enclose and destroy the commons.

Google is managing to achieve what Microsoft couldn't: killing the open web. The efforts of tech giants to gain control of and enclose the commons for extractive purposes have been clear to anyone who has been following the history of the Internet for at least the last decade, and the adopted strategies are as varied in technique as they are in success, from Embrace, Extend, Extinguish to monopolization and lock-in.

What I want to talk about in this article is the war Google has been waging on XML for over a decade, why it matters that they've finally entrenched themselves enough to get what they want, and what we can do to fight this.

A little bit of history

Google entered the browser market at a time when web development was starting to see the light again after Microsoft's “win” of the First browser war through the abuse of its operating system's monopoly by shipping its Internet Explorer for free and thus cutting off «Netscape's air supply», as intended.

What managed to break through Microsoft's short-lived victory was an alliance of browsers (my favorite Opera on its Presto engine, Mozilla's Firefox on its Gecko engine, and the newborn Safari from Apple, whose WebKit engine was forked from the KHTML engine that was being developed for the KDE Linux desktop environment) that decided to leverage their standards compliance to reinforce each other's position against the crippling effect of Microsoft's dominance —a dominance that Microsoft tried to protect resorting to the vilest tricks.

Google entered the market heavily abusing its dominance in web search to push the adoption of its Chrome browser, a practice not unlike the one used by Microsoft to push the adoption of IE, and of equally questionable legality and moral standing, a thing which was frequently overlooked with several excuses, not least the fact that Chrome was built on an open source core, Chromium, that was mostly assembled from software and libraries developed by other companies (primarily, Mozilla and Apple).

In the years of Chrome's release, the Internet was undergoing massive changes, with the emergence of centralized social media platforms like Facebook that started eroding the previous distributed social network of blogging platforms, Google's own Gmail mail service gaining ground over both ISP offerings and other “cloud” offers like Yahoo!'s and Microsoft's Hotmail, and mobile connectivity growing beyond “professionals”, thanks mostly to Apple's iPhone and Google's own at-the-time recent acquisition of Android, plus some soon-to-be minor players I've talked about in the past.

For the purposes of our discussion, these changes had two major points of focus in terms of website development.

On the one hand, web developers started giving more attention to standards compliance, as it gave them better reach into the growing user base of mobile users, who were unlikely to have the desktop-dominant Internet Explorer as their browser. This helped accelerate the demise of IE (which was still going strong when Chrome was first released) —whose flaky standards compliance was ultimately responsible for its demise nearly a decade later, and subsequently for the complete discontinuation of its line (after the brief attempt at a reprise under the legacy Edge moniker)— and emboldened the “underdogs” of the time (Mozilla, Apple, Opera).

On the other hand, there was a distinct shift towards centralization of web services, which in turn accelerated the development of web applications, graphical user interfaces for the underlying (centralized) services that effectively relied on the browser(s) as cross-platform toolkits, an approach that would later give birth to the abomination known as Electron and the security nightmare better known as node.js.

Of course, Google had a primary interest in making web apps a credible alternative to desktop applications, what with their already-mentioned mail service and the recently-acquired-and-turned-web-app Google Maps. And since their browser was mostly a collection of existing software stapled together, they could focus their development effort on creating a faster implementation of JavaScript, better known as V8. Never mind the fact that even years later native implementations of any useful feature would remain faster and cheaper than JavaScript.

But even before Google's direct involvement in browser development, Opera and Mozilla had started taking their distance from the W3C standardization efforts and set up the WHATWG, a consortium of browser developers dedicated to coordinating rapid development of new web features without passing through the perceived slowness of the W3C standardization process.

In truth, as it would become clear a few years later —and even more so with Google effectively taking over the WHATWG and turning it into a sockpuppet to give a semblance of independence to their choices— the main purpose of the WHATWG was to hijack the development of web technologies to the benefit of the corporate investors, whereas the W3C, with all its flaws, had mostly given priority to features that would be of more general interest.

(It is not by chance that the most controversial standard to ever come out of the W3C has probably been the Encrypted Media Extensions, released as a failed attempt to remain relevant in the web space, and resulting instead in a critical strike against their own credibility as stewards of the open web.)

Google's war on XML as a proxy for the war against the open web

Arguably, the turning point for the centralization of the web was the year 2013. This is essentially the year where GAFAM stopped trying to pretend they liked to play nice, and started to “pull the reins in” on interoperability. Coincidentally, it's also the year Opera stopped being Opera, but I'll talk about this some more in the afterword.

Let's see a few of the major events relevant to our discussion (you can find some more in Part 2 of this series):

  1. 2013 is the year Google decides to sunset Google Reader, a (if not the most) widely used web feed aggregator (for RSS and Atom feeds); the officially given reason is that usage was declining; “coincidentally” this happens shortly after them shutting down their AdSense for Feeds (for unspecified reasons, which can most likely be summarized as “nobody wants ads in their feeds”, and especially not video ads —not that they won't keep trying);
  2. 2013 is the year Google decides to close XMPP server-to-server federation in their Google Chat service; Facebook will do the same with their Messenger product the following year;
  3. 2013 is the year Google first proposes the removal of XSLT, a proposal so unpopular that it will continue receiving comments against it as long as five years later (the last comment in the thread is from 2018);
  4. 2013 is the year Google removes the just-introduced MathML support from Chrome; it will take 10 years and an external company to bring support back into the browser.

This was just the beginning. Several other actions were undertaken or attempted in the following years. In the following list, while most pertain to the proxy war against XML, a few are not directly related, but help show that Google's overreaching attempts to gain complete control of the Web go far beyond XML (and why even those that don't like XML and are more than happy to see it gone should beware).

  1. at least as far back as 2014, Google starts exploring the idea of hiding URLs; they tried this again at least in 2015, in 2018, in 2019, in 2020; and yes, I'm aware Apple has been doing the same in Safari for over a decade, but Apple's whole shtick is gated communities and user lock-in (so hardly an example to follow), and most importantly Apple doesn't have as much power over the Web as Google does; (yes, this is one of those attempts that is not directly XML related);
  2. in 2015, the WHATWG introduces the Fetch API, purportedly intended as the modern replacement for the old XMLHttpRequest; prominently missing from the new specification is any mention of, or method to manage, XML documents, in favor of JSON, which instead gets a dedicated document body presentation method;
  3. in 2015, Google proposes deprecating SMIL, the standard for declarative animation and interactivity in SVG; I have written in the past about the usefulness of SMIL and why not only must it not be deprecated, but its use should actually be integrated into HTML, as noted by the W3C;
  4. in 2015, Google also announces the Accelerated Mobile Pages project, purportedly as a way to make web pages more accessible and faster to load on mobile, which coincidentally relied heavily on leveraging large CDNs like Google's to cache contents (and optionally pre-render it); never mind the facts that the seminal Responsive Web Design article on how to design for different screen sizes was from 2010, that the srcset attribute for images to support different-sized screens was already supported by at-the-time current desktop and mobile browsers, that the primary reason why webpages weren't fast to load on mobile was the so-called web obesity crisis, which had been known since 2012 at least, and that the primary reason why AMP pages loaded faster was that they came with one tenth of the useless crap attached to the “regular” pages —so the only actual benefit from AMP was to force webdevs into writing leaner pages, with at least a modicum of responsivity (and of course, for Google, to encourage them to funnel everything through Google's —or any other tech giant's— servers for easier metric collection, faster ad serving, and more user profiling);
  5. still in 2015, Google announces the intent to deprecate the keygen element, a little-known but powerful security feature that simplified the generation of user-controlled cryptographic key pairs for secure communication between the client and server; you can read more about it in Tim Berners-Lee's reaction, and in Hugo Landau's relevant “Memoir from the old web”; of note, TBL's primary interest in this element was to help build Solid, an incremental improvement on the WWW to make it more resistant to the centralization his original idea had been perverted into (see also the relevant issue in Solid's issue tracker); the importance of simplified handling of user certificates and the role they play in Mutual authentication can also be surmised by it being one of the features of the lightweight Gemini protocol, which was also born as a response to the centralization and consequent complexification of the World Wide Web;
  6. in 2018, Mozilla removes RSS support from Firefox starting from version 64, and actively prevents opening feeds in-browser, giving them an even worse treatment than generic XML files, for which it keeps showing the structure (for example: compare how your browser handles the usage stats XML for this column with the way it handles the RSS feed and the Atom feed); the official reason is that the “Live Bookmarks” feature couldn't be easily ported to the new architecture; the facts that support for RSS could still be implemented via extensions, that Mozilla did not ship an extension to replace even just partially the Live Bookmarks feature —leaving its users in the hands of potentially insecure third-party extensions— and that feeds got an even worse treatment than generic XML documents show that the official reason is just an excuse; this is one of the major cracks in the Mozilla façade, as it starts to show that their existence is just controlled opposition for Google to avoid antitrust issues —what Google wants goes, and Google doesn't want web feeds, so web feeds have to go;
  7. in 2019, Google announces a number of changes to purportedly make browser extensions “safer” for users, starting the work on what would later become the Extension Manifest V3; it is immediately apparent that at least some of the changes introduced are primarily intended to prevent adblockers from working, and don't actually do much to improve security or privacy; despite several reports against the at best ineffective and at worst detrimental changes proposed, in the next years Google will move on with the timeline to deprecate the previous extension APIs and finally succeed in its ad-blocking-blocking efforts; although this change is not directly relevant for the XML/XSLT focus of this article, I mention it not only because it is one of the many examples of Chrome becoming less of a User Agent and more of a “Google tool on your computer” over time, but because this aspect is important for the future of client-side XML and XSLT, as I will discuss later;
  8. in 2021, Google tried to remove some common JavaScript interaction idioms, again citing “security” as reason, despite the proposed changes being much more extensive than the purported security threat, and better solutions being proposed; you can read about it here and notice behavioral patterns similar to the assault I want to talk about here;
  9. in 2023, Google renames their chatbot from Bard to Gemini thereby completely eclipsing the 4-year-old independent protocol by the same name; this is possibly coincidental, which would make it the only unintentional attack on the open web by Google in the last 15 or so years —and at this point even that is doubtful;
  10. in 2023, Google proposes the Web Environment Integrity API, of which I've talked at the time; although this is only tangentially related to the XML-focused initiatives that are the subject of this article, it is relevant to mention here as it is another example indicative of the push to make browsers less User Agents and more corporate-controlled spyware;
  11. in 2023, Google kills off support for the JPEG XL image format, introduced barely two years before, depriving the Internet of a format that would have finally delivered on the promise of a unified format to provide competitive compression —both lossless and lossy—, progressive decoding, transparency, and animation, which would have allowed it to replace the widespread (and less efficient) JPEG, PNG and GIF formats that have been the staple of the web for the last decades; this also is not directly related to XML (unless the reason for the hate is that JPEG XL supports XMP metadata), but should be filed under “against the open and indie web” as it prevents at the very least the reduction of hosting and bandwidth costs that a transition to JPEG XL would offer.
  12. in 2023, after downranking plain HTTP websites for years, Google announces an even more aggressive stance to push for HTTPS adoption; I have a lot to say about the purported “security” of HTTPS (and in particular about how it doesn't mean what most people think it means, particularly concerning the distinction between the integrity of the connection between the client and server versus the authenticity of the content, particularly of relevance for both corporate silos and federated social networks), but that's material for a different article, so here I'll just link to a few writeups by Dave Winer (one of the inventors of RSS), especially this particularly prophetic one, and point out the hypocrisy of claiming an interest for security by the same company that pushed for the removal of keygen;
  13. in 2024, Google discontinues the possibility to submit RSS feeds for review to be included in Google News; how Google now discovers new news sites or gathers information about published news is completely opaque;
  14. in 2025, Google announces a change in their Chrome Root Program Policy whereby within 2026 they will stop supporting certificates with an Extended Key Usage that includes any usage other than server (relevant Fediverse thread, other relevant Fediverse thread); this effectively kills certificates commonly used for mutual authentication (hey look, it's the keygen suppression theme again!) that include both client and server roles; coincidentally this also makes it harder to implement S/MIME, unless you go through Google's services, of course —but Google's war on self-hosted email deserves its own article, so that will be for another time.

And we finally get to these days. Just as RSS feeds are making a comeback and users are starting to grow skeptical of the corporate silos, Google makes another run at killing XSLT, this time using the WHATWG as a sock puppet. Particularly of note, the corresponding Chromium issue was created before the WHATWG Github issue. It is thus to no one's surprise that the overwhelmingly negative reactions to the issue, the detailed explanations about why XSLT is important and useful, the recommendations that instead of removing it browsers should move to more recent versions of the standard, and even the indications of existing better and more secure libraries to base such new implementations on —every counterpoint to the removal— have gone completely ignored.

Still, the negative reactions were so extensive that the issue has been ultimately locked —particularly when people started pointing out that «we don't have enough resources to spend on this» was a completely idiotic excuse from billion-dollar companies, or even from smaller enterprises like Mozilla that apparently have enough money to waste on features nobody wants like LLM chat integration: this has ultimately confirmed that the purpose of the issue was never to actually discuss whether or not XSLT should be removed, but only to provide a flimsy excuse to pretend the removal was driven by a consensus rather than a top-down directive from Google.

The only true statements made by the Googler responsible for this issue were that browsers have been stuck with an obsolete version of XSLT for over two decades, and that the implementation they (Google and Apple) rely on has some security issues. The Googler in question also conveniently omitted several other important facts.

For example, he omitted that two new major versions of XSLT have been released since this technology was first implemented in the browsers: XSLT 2 in 2007, and XSLT 3 in 2017. This means that when Google first proposed to kill XSLT, a newer, considerably more powerful version of the standard had already been out for six years. And already at the time people were pleading for browser support to be upgraded to the new version.

It is thus not by chance or by lack of resources that browsers are stuck with the 1999 XSLT 1: it has been an intentional choice against the users' will since at least 2013, the year we already mentioned as the turning point for the centralization of the web. XSLT has been intentionally boycotted by Google, Apple and Mozilla: using the excuse that it is not widely used today, after decades of undercutting any adoption efforts, refusing to fix bugs or even to provide meaningful errors to assist in debugging related issues, is a complete mockery of the victims of these policies.

The Googler also omits to mention that both Google's and Apple's XSLT implementations (not Mozilla's, who developed their own) rely on a set of free-software libraries whose maintainer has recently undergone a bombardment of borderline abusive issue reports from the characteristically extractive corporate exploitation of free software, with requests to provide professional services without actually paying for them in any way. Let's repeat that again: we're talking about billion-dollar companies that have been exploiting the labor of free-software maintainers, demanding preferential treatment at no cost to them, limiting their efforts to finding bugs, without raising a finger to actually fix them —almost as if the primary intent was to find excuses to expunge the library rather than working to improve the commons. (And this is before even going into the irresponsible way in which these libraries were being used.)

But of course anyone questioning the motives of the corporations controlling the WHATWG or pointing out the abundance of resources they have, and how these could easily be spent in bringing XSLT support to the XXI century instead of being spent in user-antagonistic features, is “off topic” and “in violation of the code of conduct”.

In the end, the WHATWG was forced to close down comments on the Github issue to stop the flood of negative feedback, so that the Googler could move on to the next step: commencing the process of formalizing the dismissal of XSLT.

And yes, that issue is a goldmine if you're looking for examples of abusive behavior: in classic DARVO gaslighting, the currently-last comment by the Googler who opened the issue in the first place is truly a masterpiece. And since I can't reply there, allow me to reply here:

I'm just one engineer and I don't have unlimited budget;

You don't. Google does. Ask for a bigger budget. Needing to fix a security risk that you claim to be so significant should be a pretty good excuse to get more resources. If you fail at that, that's your problem, not ours, and it is not, in any way, a valid reason to kill the open web. The security issues are not in the web standard, they are in the implementation you are using (or more specifically: leeching off).

I'm just trying to do my job.

That excuse didn't fly at the Nuremberg trials either.

I care a lot about the health of the overall web.

Patent lie. If you did care, your only priority would be how to keep XSLT, since that is the only solution that doesn't affect the millions of users that still use XML and XSLT today. But you don't actually care about the health of the web, you only care about the bottom line of your employer. Flash news, you'll be fired like everybody else regardless of how hard you suck up to them.

I do want to find solutions to real problems,

There's nothing to find. The whole comment thread is full of solutions to the real problems. You just don't like the answer. You've been told repeatedly that the security vulnerabilities are fixed by either fixing the bugs in the library Chrome is using, or switching to more robust, modern libraries like xrust or xee.

So you have already found the solutions. But those aren't the solution you wanted to find, because those “problems” are only excuses to finally do what Google has been trying (so far unsuccessfully) to do for over ten years: kill XSLT.

and I want to minimize the pain folks are feeling about this discussion of XSLT removal.

Aha, and here we see the slip: the discussion was never on whether it should be removed or not. It was always about: we are going to remove it regardless of what you think; prepare for the pain.

But it's important to remember that ordinary users that fall victim to security vulnerabilities also feel pain, and I'm trying to minimize that too.

There's a very simple solution to solve the “pain” of such users, and as everybody has told you, that's to fix the library or switch to a different one.

I proposed some solutions to the concrete use cases I heard in this issue. If there are still gaps, I'd like to work on closing them.

Sorry, but that's complete bullshit. There is literally no way to solve all the use cases that XML+XSLT can be and is being used for.

It's too bad we can't have that discussion here - I'm guessing we can't re-open this issue for outside comments, due to the overall tone of past comments. Either way, from now on, I'll only be responding to technical conversations, and ignoring the rest, for my own sanity.

That is hilarious. You never provided one technical argument for the removal of XSLT. Literally not a single one.

«The XSLT library we're using is buggy» is not a technical argument for the removal of client-side XSLT support from the specifications.

«XSLT is “only” used by millions of people for tens of millions of pageloads per month» is not a technical argument for the removal of client-side XSLT support. The metrics aren't even a valid excuse by Chrome's own rules on how to read telemetry data.

«Trillion-dollar company does not have the resources to fix or change XSLT library», aside from being complete bullshit, is not a technical argument on why you shouldn't fix or change XSLT library.

The comments on the issue have all been very technical until you started spouting bullshit and refused to provide technical answers to the technical comments you had received. You and your colleagues are the only ones to have never made a single technical argument to defend your position that client-side XSLT should be removed. Because that was never the intent in the first place: it was only ever about «what do we want to replace our ancient, buggy XSLT 1.0 support with».

And the truth, regardless of how much you don't like it, and how much you try suppressing it, is that there's only one answer to that, and it's: a more modern client-side XSLT support.

An update is due here as a quick response to this ridiculous defense of Google and the engineer that proposed the change, a defense coming from an engineer working for Igalia, the same company that implemented MathML for Chrome 10 years after Google removed it —hardly an objective third party.

First of all, Apple and Mozilla agreeing with something Google has been pushing to do for 10+ years is not the defense of Google and Mason Freed for the change that you think it is. Apple is no better than Google, and Mozilla is controlled opposition.

Trying to justify this agreement as an opportunity to “slim down their codebase” is also complete bullshit, when those same engineers keep pushing to add features and JavaScript APIs that nobody needs or wants, or sometimes even that users actively reject. It was a bullshit excuse when Mozilla used it to not restore support for the MNG file format and it remains a bullshit excuse today.

The budget excuse doesn't fly either. As I've already mentioned above, if the XSLT support in Chrome is so buggy that it is considered a high security issue, the correct approach is to focus resources on fixing that security issue, not to change the spec so you can justify removing the buggy component. If the Chrome, Safari and Mozilla teams don't have enough resources to fix their security issues, nobody should be using those browsers. And no, changing the HTML spec to remove any mention of XSLT so that it can be removed from the browsers does not count as “fix the security issue”. Nobody at Google or Apple or Mozilla has ever proposed to remove JavaScript support from the browser any time a new JavaScript-related security issue comes up, have they? Then don't do that for XSLT either.

If Mozilla really cares about slimming down their Firefox codebase, they can start by removing all the ML features they've been forcefully shoving down our throats against our will. Then we can start talking about “limited resources”, “slimming the codebase”, and even “user security and privacy”. The same goes for Apple and Google. For the umpteenth time: “limited resources” cannot be used as an excuse when the developers keep adding features nobody wants and refuse to fix things people complain about: that's not an issue of resources being limited, it's an issue of resources being misallocated. It's a policy issue.

Also, telling us about the removal procedure is meaningless, when Google has already made abundantly clear that they do not actually listen to user feedback when it conflicts with their policy (Manifest V2 removal, anyone?).

And your TL;DR gets an extremely important thing wrong: no good reason to remove XSLT support from browsers has ever been given. Literally none. Literally ever. The thread was locked out because people were pointing out that the excuses given were all bullshit, and bullshit they are.

Why it matters

When the XML specification was released in 1998, it gained traction very quickly, despite its increased verbosity, because by losing some of the flexibility of SGML (the overreaching specification of which HTML was the most famous incarnation1) it favored disambiguation and simplified parsing of documents of arbitrary kind. Combined with XSLT, it allowed documents of any kind to become “Internet ready”, and most importantly ready for the World Wide Web, helping drive the WWW towards its designed goal of a «universal linked information system».

Although the benefits of XML and the transformative power of XSLT mostly caught the attention of professionals in a variety of fields, at the turn of the century their flexibility also reached into the more general population of web users through the specific incarnation of RSS and Atom web feeds, which allowed users to remain informed about news and updates on their favorite websites without constantly “making the rounds”.

RSS and other XML-based technologies such as Pingbacks were the backbone of blogging, the distributed social network that characterized the first decade of the XXI century.

With blogging common and distributed across multiple platforms, the possibility to aggregate information from disparate sources, and still see it presented as a regular web page, across browsers, without any need for scripting, in a time when JavaScript implementations were slow and (thanks to Microsoft, intentionally) incompatible with each other, was seen as a clear win.

Despite the efforts by Google to kill it since 2013, the RSS format remains an essential component of an open and independent web, still in widespread usage both server- and client-side: there are an estimated 500+ million websites using WordPress, and they all feature RSS feeds, even when not properly advertised; most if not all Fediverse platforms also offer RSS feeds, and some (e.g. Friendica) can also import them and thus work as aggregators; and possibly most importantly, RSS feeds are the fundamental component of podcasts («it's not a podcast if it's not RSS»), a multimedia distribution format with hundreds of millions if not billions of users worldwide.

As already mentioned, it's now seeing a resurgence as people have started realizing how catastrophic for the web the GAFAM-driven centralization of the second decade of the XXI century was (even though too many have failed to learn the correct lesson, and have just jumped from one Nazi bar to the next, or have fallen for the cosplay of federation because it's shinier than actual federation).

XSLT is an essential companion to RSS, as it allows the feed itself to be perused in the browser (unless, of course, the browser makes the extra effort to prevent you from visualizing it at all, like Firefox does). This allows sites with hundreds of feeds to use the feed itself (styled with XSLT) as index page (example), reducing hosting and bandwidth costs. And of course it can also be used to style any other “standard” XML document that may be found on a site: for example, I have recently discovered, thanks to @aslakr​@mastodon.social, that WordPress provides a default XSLT stylesheet for its sitemaps (curiously, apparently not one for its web feeds, though? Of course you can still roll your own and plug it in the right place).
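
To give an idea of how little is needed, here is a deliberately minimal stylesheet of my own concoction (not the one used by the sites linked above) that renders an RSS 2.0 feed as a plain list of links; the feed just needs to reference it with an xml-stylesheet processing instruction:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- referenced from the feed with:
         <?xml-stylesheet type="text/xsl" href="feed.xsl"?> -->
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/rss/channel">
        <html>
          <head><title><xsl:value-of select="title"/></title></head>
          <body>
            <h1><xsl:value-of select="title"/></h1>
            <ul>
              <xsl:for-each select="item">
                <li><a href="{link}"><xsl:value-of select="title"/></a></li>
              </xsl:for-each>
            </ul>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

Everything beyond that (dates, descriptions, actual styling) is ordinary XSLT and CSS on top.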

As pointed out by @lucidiot​@snug.moe, XML is extensively used in digital humanities (and many other fields), and the TEI offers an extensive set of XSLT stylesheets to transform common TEI markups into a variety of formats, among them XHTML, which would allow direct visualization of the XML documents.

And that's just the beginning: as I've shown on this same site, it's possible to use XSLT to plot XML data, and in general to produce rich, complex documents without JavaScript, and again with potentially significant reductions in hosting and bandwidth costs.

Bonus points: it seems that the horde of LLM scrapers that are causing troubles all around have some difficulties with general XML, so switching to XML+XSLT could actually work for self-protection.

Remember AMP? If you really wanted to keep shipping the usual tons of useless crap on desktop, but not on mobile, you could put the actual content in an XML file, and then provide two separate, trivial XSLT stylesheets, one to transform it into the usual bloated desktop page, and one to transform it into the stripped-down (and less bloated) abomination that is AMP HTML —which would have come in handy when Google introduced the requirement that the AMP and standard page had to present the same content. But then again, why even ship those tons of useless crap on desktop in the first place?

And to be honest, HTML templates look thoroughly unimpressive compared to XSLT. Worse: why are people reinventing templating without even so much as mentioning XSLT? Anything that discusses templating for HTML without a direct, explicit comparison with XSLT should be automatically disqualified as not being well-researched.

But most importantly, even if you personally don't like XML and/or XSLT, why are we letting Google decide what is acceptable and what is not (and most importantly, not anymore) on the World Wide Web?

Dorian Taylor on XSLT

If you've read this far, I would encourage you to also read the passionate defense of XSLT by Dorian Taylor on the Github issues that Google is using as an excuse to kill the standard. In case GAFAM gets touchy and decides to purge it, I'm taking the liberty to reproduce it here for archival purposes (with permission):

I have been using XSLT since it was in beta and the only browser that implemented it was MSIE 5.5. I designed and implemented an internationalized content pipeline using XSLT and DocBook at the job I had from 2002-2005. Since about 2007 I've been using XSLT regularly on the client side to transform (X)HTML into itself (as well as SVG and Atom), because it excels at bolting presentation markup onto plain semantic markup, and thus makes for an extremely lazy templating language that exists separately from the JavaScript ecosystem. I use it on my own Web properties, and I use it on projects (I mainly do intranets). I have made libraries for seamless transclusion and querying RDFa, and I use those on projects (like Sense Atlas, a nascent knowledge graph product I'm working on, and Intertwingler, the application server that powers it).

Why I still use XSLT:

  • it's a standard
  • it's fast (at least nominally)
  • it's declarative
  • it's orthogonal to JS
  • it can mix any number of back-ends (because it's a standard and operates over standard inputs; I regularly use it to mix static and dynamic resources on the same page)
  • it can only operate over information you give it (modulo zero-days, apparently)
  • it operates over wholes, i.e., it doesn't stitch together markup as text but rather operates over intact DOM structures.

The first and last points are probably the biggest reasons I still use it, and I suppose the latter may need some unpacking. An XSLT stylesheet is a well-formed XML document and only operates over well-formed XML documents, and (unless you put in the effort) is only capable of producing well-formed (X|HT)ML documents. So you have a validity check baked in at a very low level. Every other templating language I've seen, going all the way back to server-side includes (with one esoteric exception), seems to not be shy about chopping up the syntax of the target language.

I anticipate the knee-jerk reaction to this is "so what?". Why should you care whether your template language breaks the syntax of the target? The tooling can compensate and it's intact when you render it. I mean, I guess? But then you need more tooling when otherwise an off-the-shelf validator would do. But that I think is not even the main differentiator.

The key difference, and why I've stuck with XSLT for almost 25 years, is cognitive. When I make a hypermedia resource (I am deliberately not using the word "page"), I think about it as a discrete, atomic whole. I can consider that object (and the server-side code that generates it) in isolation. It loads in the browser and is well-formed and intact and navigable. Then I can think about applying transformations to that object, and/or composing related objects together, as a separate act. When I write an XSLT template, I think about it like a function (in the mathematical sense) rather than a procedure. I see my job as not to stitch together fragments of markup but to describe the node tree that results from an input tree. When I look at so-called "modern" frameworks, I (still) don't see any of that.

The reason why the implementations are riddled with CVEs, in my opinion, is because of neglect. I am old enough to remember when HTML5 was competing with XHTML2✱ as the proposed next-generation HTML standard. It turned out that the pedantry of the XML parser was not only reviled by developers (and remains a source of confusion for users if they hit a bad patch of it), for markup it was actually unnecessary. Tastes changed, and people moved on. The browser vendors keep the parsers around, but they demonstrably put as little effort into them as they can get away with. (The biggest shortcomings of XSLT 1.0 were fixed in 2.0—in 2007—but of course the browsers never implemented it.)

✱ XHTML2 actually had some really good ideas (like transclude all the things), but its mission (something like "how do we make the best XML-based hypertext markup language") was ultimately wrong-headed. I am also old enough to remember, however, that one of the central arguments for HTML5 (now just "HTML", of course) was not breaking backward-compatibility.

My proposal, then, is not to scrap XSLT, but to rehabilitate it. When it first shipped, XSLT was a solid, open-standard solution to the bog-standard problem of generating presentation markup. How many times has the wheel of Web templating been reinvented for this framework or that? Where is the Open Web successor to XSLT? How about…XSLT?

At its core, XSLT is terrifically powerful, especially its latest incarnation (which, incidentally, can operate over JSON). There are, of course, challenges:

  1. I would say problem number one is the syntax. XSLT is an extremely bulky, chatty language. Without syntax completion in your code editor you'd never get anything done. But, there are precedents for ameliorating this, like the compact syntax for RelaxNG, or the Turtle or JSON-LD syntaxes for RDF.
  2. XSLT only operates over XML (except of course for 3.0 which made an accommodation for JSON). Well that's simple, bump XSLT to 3.1 and spec out how it should operate over HTML DOMs (case-insensitive tags, whatever), as well as an invocation hook analogous to the XSLT processing instruction.
  3. Namespaces: Apparently people hate them? This is something I have never understood (because if you don't use namespaces you just end up reinventing them badly), but whatever, fine. You won't need namespaces in your XPath anyway if you're just transforming HTML.
  4. XPath: I would actually put money on the likelihood that CSS selector semantics can embed fully into XPath, especially given that XPath 3.1 itself is extensible (worst case scenario is you cheat and just make a css function).
  5. Debugging: currently sucks. This I would chalk up to the same neglect as the CVEs.

It would be eminently feasible to make a "SWeT", Standard Web Templates:

  • easy, neat, declarative syntax, comparable to Sass or RNC (I sketched one out in like 2019)
  • isomorphic (or at least injective onto) XSLT 3.0 (3.1?); compiles to it
  • wouldn't have to touch namespaces or even XPath if you didn't want to (use CSS selectors instead)
  • still capable of existing outside of the JS ecosystem, but can be accessed from JS/DOM just like XSLT 1.0 can

Now I can imagine somebody saying well I can go off and do that anyway; there's a reference implementation of XSLT 3.0 I can compile against (written, actually, by the spec's author), etc etc. I think that kind of misses the point of having a standard templating language that you can rely on being baked into every Web browser. At least, I suppose, until they rip it out.

So I guess my ultimate question is, is there truly no appetite for a standard language for transforming markup, a thing we all have to do, on every project, all the time? A thing that for lack of a standard, locks us into this or that framework, or stymies casual system heterogeneity? A thing that would make it even easier to build the Web? Seems like a sensible idea, doesn't it?

(Again: source, for reference.)

And now, onwards to what's ahead.

What we can do about it

Make yourself be seen

The first step is to actually use XML and XSLT. Visit sites that use them. If you have your own website, seek out opportunities to rely on this tech. If you don't know how, there's apparently a growing number of tutorials around, both in text and video form. Stealing the links to the text tutorials from the above link, we have, for example:

(notice a pattern there?)

And styling RSS is just the most common use-case of XSLT. You can use it to plot data and create sparklines, like I've done. You can use it in place of server-side includes. You can use it to render tabular data you have in XML form. You can use it to adjust the (X)HTML structure of your documents to compensate for CSS limitations.
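To give an idea of how little is needed, here's a minimal sketch of the RSS case (file names and feed content are made up for illustration): the feed points at a stylesheet via the xml-stylesheet processing instruction, and the stylesheet renders the feed as a plain HTML page whenever a browser opens it directly.

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="feed.xsl"?>
    <rss version="2.0">
      <channel>
        <title>My feed</title>
        <item>
          <title>Hello, world</title>
          <link>https://example.com/hello</link>
        </item>
      </channel>
    </rss>

    <!-- feed.xsl: render the channel as a simple HTML page -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/rss/channel">
        <html>
          <body>
            <h1><xsl:value-of select="title"/></h1>
            <ul>
              <xsl:for-each select="item">
                <li><a href="{link}"><xsl:value-of select="title"/></a></li>
              </xsl:for-each>
            </ul>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

The same feed keeps working in every feed reader: the stylesheet only kicks in when the XML is loaded directly in a browser.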

By the way, if you can find new and interesting applications of XSLT in browsers, do let me (and the world) know (see bottom of this article for information about how to contact me on the Fediverse).

Make yourself be heard

Complain. Complain. Complain. Comment on every relevant issue. Vote against the issue in the issue trackers. Let the browser developers know you are affected.

If (or when) the changes still pass, open new tickets. Demand that XSLT support be reinstated. And while you're at it, demand that it get upgraded to XSLT 3.0 at least. And demand that it be enabled for plain HTML.

Voice your opinion on social media. Post about it. Tag the social media profiles of the WHATWG, of Google, Apple, Mozilla, Microsoft, of Chrome, Safari, Firefox, Edge, and let them know that such a change will not be accepted.

Do not let the lies prevail. The choice to suppress XSLT is not due to technical reasons, and it's not due to lack of resources. It's entirely a policy choice intended to obstruct and limit the expressivity of the open and indie web, and this needs to be pointed out to anyone who believes otherwise.

UPDATE for 2025-10-10: some Firefox developers have put up a page for anyone with a GitHub account to vote on their preferred features/proposals. Support for XSLT3 is one of them. Voting for it is a way to make yourself be heard. (JPEG XL support and MathML and SVG improvements are there too. To no one's surprise, they are my top-ranked issues. And if RSS and HTML+SMIL were there, they would be too.)

Build the alternative

If (or when) the changes pass, our best option is to push through with a polyfill like the SaxonJS mentioned by Dorian Taylor. It will not be as efficient as a native implementation, it will not be as fast, and it will not be enough to allow clients to open and visualize XML files directly, but it will allow us to build the case for a return to XSLT as a significant web technology, and become an important instrument in pressuring vendors for new native implementations, not unlike how MathJax has been a useful bridge to native implementations of MathML.

For pure XML files … maybe an extension? This is most likely possible for Firefox, but I don't know enough about the more restrictive rules implemented in Chrome to tell whether it would be possible there. But of course, even if such an extension were possible today, there is no guarantee that Chrome won't push for another change in the API to disable it, like it did with ad blockers.

Who knows, it may well be that the Streisand effect on this umpteenth attempt by Google to kill XSLT will be the chance for its rebirth. At the very least, there is now an open issue with the WHATWG for the adoption of XSLT 3.0 that hasn't been closed yet (the excuse being that discussion is “civil”, in the sense that nobody has yet suggested that it can't be done because Google doesn't have enough money, so nobody has called them out on the bullshit). Of course, it's quite likely that it will just be completely ignored for the next 10+ years, just like the comments on the 2013 XSLT removal proposal.

Afterword

As much as I hate Microsoft, its anticompetitive practices, and the way their Wintel monopoly has stymied software and hardware development, killed companies and destroyed innovation in the desktop and workstation space, one thing I can say about the first browser war is that —at least while it was ongoing— it led to a lot of innovation in the web space. Microsoft were the first to implement client-side XSLT, they were the ones that opened the gateway to AJAX through their proprietary XMLHTTP ActiveX control that was reimplemented in other browsers as the XMLHttpRequest object, and they were the ones that tried to add SMIL to (X)HTML through the TIME extension, which I wish hadn't failed the way it did (we would have to wait nearly another decade before a limited subset of the functionality would finally get into HTML via CSS animations).

It's possible that this was at least in part due to the fact that, as it has been said, Microsoft didn't “get” the Internet, but I suspect that the primary reason was that there was some actual competition going on —competition that since the creation of the WHATWG has been replaced by what is, for all intents and purposes, a cartel.

The intent to bypass the W3C for some decisions did have some merit at the time of creation; looking at the document whose rejection led to the creation of the WHATWG, for example, we see among the design principles:

Well-defined error handling
Error handling in Web applications must be defined to a level of detail where User Agents do not have to invent their own error handling mechanisms or reverse engineer other User Agents'.
Users should not be exposed to authoring errors
Specifications must specify exact error recovery behaviour for each possible error scenario. Error handling should for the most part be defined in terms of graceful error recovery (as in CSS), rather than obvious and catastrophic failure (as in XML).

which I can't disagree with. On the other hand, it's also clear that two other design principles, backwards compatibility and open process, have been consistently violated since Opera dropped out (oh, how prescient I was in that article!) and the WHATWG was taken over by Google and its lapdogs (Mozilla) and frenemies (Microsoft and Apple).

Today, I'm left wondering if the developers of browsers like Servo —the engine born out of the Mozilla experiments that were cut (with the entire development team fired) at the start of the COVID-19 pandemic— or Pale Moon would even be accepted into the WHATWG today, since they could (and at least the latter would) happily throw a wrench into the whole “fake public feedback” mockery we've been subject to this time.

Name and shame

The engineers working on these proposals should be ashamed of themselves. The names I could gather from the public discussions are:

If you ever hear any of these blabber about the open web, interoperability, and standards, know that they are lying through their teeth.

And if any of you happen to read this: fuck you.

Made the news (and other related articles)

I've apparently “made the news”.

I have read the comments (yes, I know, you should never do that) and it's curious that those that didn't like or agree with the article can be grouped into three sets:

unconvinced by my timeline
these are commenters that disagree on my list being enough to prove that Google is out to destroy the open web; that's OK, the list isn't there to prove anything, it's just to remind people (or inform youngsters who might not remember those days) about some relevant (and a couple less relevant) events in the last decade-plus that have significantly shaped the web; the list isn't even exhaustive insofar as Google is concerned (anyone want some user stylesheets?), let alone all the crap, failed promises, rug pulls and abuses committed by the rest of the GAFAM crowd —the only reason I'm singling out Google here, and on those events in particular, is because the focus is on XML, XSLT, and the WHATWG takeover and unwillingness to listen to what users have to say;
disagreement on the assessment
these are people who disagree on some of the events I reported being bad for the open web, or aimed at encircling it; so far, from what I see, these comments come mostly from Googlers or ex-Googlers; well, I'm sorry to burst your bubble; I'm sure the engineers working at Google have always had the conviction of doing Good Stuff™ for the benefit of all; but see, that's kind of the problem with a lot of Big Tech employees: the lack of attention to the implications of what their brilliant ideas are going to be used for —and sometimes, possibly, even of the time to stop and think whether the “innovation” is even needed in the first place, or whether you're forgetting what older, well-established but less renowned tech can already do; a few others don't seem to be (ex-)Googlers, but have a pet peeve with this or that item being included in the list; all I have to say to them is: if that's your only gripe with the article, well, there's hope for you;
people that don't like XML
a lot of people seem to be in this camp; and I have an important message for them: you're missing the point; this isn't about whether XML is nice or not, it's about the fact that it exists, it's in use, and it's a powerful tool that web developers can and do use; you prefer to use JSON? That's fine, and if anything that's one more reason to push browser developers to implement newer XSLT versions that support JSON too; or if you prefer: the question isn't whether or not XML and XSLT are worth saving; it's whether or not you want Google to define what is allowed on the World Wide Web. (Yes, I have edited the post to make this more clear.)

There's also another interesting write-up about this same recent attack on XSLT, with a very different perspective than mine, from @ansuz​@social.cryptography.dog. I highly recommend it.

By the way, you can let me know directly about your thoughts about this article by commenting on this Fediverse thread.


  1. it has been pointed out in some comments that HTML is not really an application of SGML; this is debatable: TBL intended it to be one, and even if there were some significant divergences in the first iterations, the HTML 4 spec, which was the standard definition of HTML at the time I'm referring to, actually defined it as a compliant SGML language —even if browsers never adopted it as such. ↩

Sparkling wok, episode 4

Sparklines, how they were intended to be presented.

And it's done. It's over. The efforts to write an XSLT stylesheet to generate sparklines, and I mean actual sparklines, from an XML description of the activity in the Wok have finally borne fruit. The stylesheet I discussed a couple of days ago has matured and is now able to generate graphical sparklines which are richer in information and lighter in weight than the previous pseudo-sparklines generated via a combination of shell scripting and awk, relying on the limited set of Unicode block characters for visualization.

You can now see it in action at the root index page as well as in any other index page (such as the Tech column index). If you want to compare this with how it used to look, here's a backup of the website index from the Internet Archive before the update that brought “real” sparklines into play.

The upsides

Although the final look is subject to change, there are already a few clear differences that jump to the eye.

First of all, the sparkline includes both time series: the commits, and the publication dates. With the pseudo-sparklines, I used to generate both, but limited display to the commit series, because the screen estate that would be occupied by both series was excessive. Now the plot can be combined in a single double-sided sparkline.
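To illustrate the idea of a double-sided sparkline (a hand-written sketch, not the actual generated markup): one series is drawn upwards from a shared baseline, the other mirrored below it, so both fit in the vertical space of a single line of text.

    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 20"
         preserveAspectRatio="none">
      <!-- shared baseline -->
      <line x1="0" y1="10" x2="100" y2="10" stroke="#ccc" stroke-width="0.1"/>
      <!-- commits, drawn upwards from the baseline -->
      <polyline fill="none" stroke="#06c" stroke-width="0.3"
                points="0,10 1,7 2,8 3,4 4,9"/>
      <!-- publication dates, mirrored downwards -->
      <polyline fill="none" stroke="#c60" stroke-width="0.3"
                points="0,10 1,12 2,11 3,14 4,10"/>
    </svg>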

I have tried to preserve the “year highlight” effect, adapted to the new format. I'm not entirely satisfied with the result, so this may be subject to change, but at the moment it at least manages to convey the same information.

The time series is wildly different in some cases because of the start date: the old commits-series pseudo-sparklines started in August 2010, which is when I started working on the website structure a few months before the Wok went live, but the new “global” (root) time series goes all the way back to 2004, as it includes the articles I “dug out” from the archives of some of my past blogs. I will probably add markers to the longer time periods to show when activity started.

The single most important change, however, is that I can now afford to show the sparklines for each column in the index page: while the old pseudo-sparklines wrapped, taking up a lot of vertical space in single-column displays, the new ones are designed to fit on a single line, stretching (or contracting) as needed. The root index can thus fit both the “global” sparkline, prominently, and each individual column sparkline under the corresponding heading —which was always the intended way to present these sparklines: I'm finally getting close to what I wanted in the first place.

The cost

Let's talk about bytes, both on disk and on the wire. First off, the stylesheet. Adding the features needed for the new sparklines has made the file grow in size, from barely more than 21KiB to well over 37KiB: that's almost 16KiB of extra weight, but then again, no effort has been made (yet) to minimize its size.

The sparklines themselves are a different matter: as previously mentioned, the old pseudo-sparklines were on average 30KiB each, for a total of 344KiB. The new sparklines are generated (all by the same stylesheet) from wrappers that are around 368 bytes long (4KiB total), loading data that is on average less than 5KiB (median 3.7KiB, the global index is an outlier at 14KiB), for a total of 54KiB for all series. The generated SVGs, on the other hand, are on average over 88KiB each (median 83KiB) for a total of over 973KiB.

So, on first load, the front page of the Wok (which itself is almost 1.2MiB) now loads nearly 3 times more bytes for the sparklines (from 36KiB to around 97KiB), but manages to show 11 times more information (sparklines for 10 columns plus the global series) with images that would have taken 10 times as much if generated server-side.

When any of the index data changes, the stylesheet itself is likely to still be in cache (unless modified since), shaving its 37KiB off subsequent loads.

Even in the individual column index pages there are benefits, since the total size of the stylesheet plus the data is about half that of the generated SVG, even if 50% larger than the server-generated textual pseudo-sparklines. And of course, the cost of loading the XSLT stylesheet gets amortized over multiple page loads (on the site), doubly so if there are other sparklines in the page.

The downsides

The main downside of this approach is that, since the SVGs are created client-side on page load time, users will experience a “repaint” as the page needs to adapt its layout to the final images. I haven't found a way to avoid this effect, but I'm open to suggestions.

There is also a matter of speed, particularly for larger sparklines: this is not an issue with the choice to use XSLT to generate the images client-side, but a fault of most User Agents, which have unjustifiably neglected this aspect of the web platform.

This is an issue that presents itself in three ways.

First of all, the XSLT implementations in the three major engines (Google's Blink, Apple's WebKit, and Mozilla's Gecko) are over 20 years old, not particularly optimized, and plagued by the usual problem of billion-dollar companies not contributing to the maintenance of the free software they rely on.

Secondly, their implementations, being old, are limited to XSLT 1.0, which is itself quite limited, requiring very complex implementations of features that are either built-in or trivial in more modern versions of XSLT.

Thirdly, their implementations being limited to XSLT 1.0 also requires the stylesheet developer to be exceptionally competent to write efficient implementations of the missing features —and I'm not a particularly competent XSLT coder (although, given the limited usage XSLT has, I wouldn't be surprised to find myself “high” in the list of web developers in general, XSLT-wise). Of course, the thing is that I wouldn't need to be particularly competent if the browser developers invested in bringing their implementations up to the latest language revision, instead of wasting inordinate amounts of development resources chasing user-hostile features such as ad-blocking-blocking or shoving “artificial intelligence” features down everybody's throat (from which we can surmise that the issue isn't lack of resources, but misplaced priorities).

Conclusions

I'm exceedingly satisfied to have finally managed to get sparklines in the Wok that look more or less like I wanted them to look since the inception of the idea last year. I will probably work on the styling some more, and probably add a few more features, but I can say that I'm much closer to my objective today than I was even just a month ago.

XSLT has confirmed itself to be an exceptionally well-suited language for web development, when even an amateur like me has been able to achieve such results in a constrained environment like the XSLT 1.0 that self-proclaimed “modern” browsers restrict their users to. Given what I've been able to obtain, and how relatively easily, I can't even begin to fathom what this language can do in the hands of an expert, or what could be achieved by anyone if more modern versions of it were widely available.

This experience of mine goes to confirm that the WHATWG's antagonistic attitude towards XML in general and XSLT in particular poses a direct threat to the open and indie web, which would instead benefit from more recent versions of the XSLT standard being widely available, to minimize server and bandwidth costs while providing a richer experience to users and readers.

The WHATWG is trying to play the metrics game to justify deprecation of XSLT, conveniently forgetting that with the current size of the web (both server and client side) metrics as small as 0.01% for the dominant browser hide over a million people affected, when accounting for privacy-conscious users (which are also the ones more likely to visit less common corners of the web) and users of other browsers.

Do not let them win. Take this as an opportunity to play with XSLT yourself, and make it a staple of your indie website.

Keep the signal alive.

[Global sparkline for the activity of the Wok]

Plotting sparklines with XSLT

Using XSLT to generate sparklines, Wok style

As promised in my previous article about using XSLT for plotting, and as anticipated in the corresponding edit, I've been working on generating sparklines with XSLT. And by sparklines this time I mean actual sparklines, not the Unicode mockups I've already introduced to this site: I mean high-resolution (vector!) plots like this [An interactive sparkline of the language distribution over the years] representation of (as usual) the language distribution over the years.

In contrast to my previous XSLT plotting efforts, this time I want the results to be flexible enough to create sparklines about anything: this means in particular no hard-coding of the “plotting keys”, and providing ways for the user to customize the plot (at least to some degree). This does result in a larger XSLT stylesheet, which would be counter-productive when used to plot a single sparkline (in terms of economy of space and bandwidth), but quickly amortizes over multiple sparklines, for example when showing the sparkline for all languages [You're probably getting bored to see this plot] as well as the ones for Italian [Distribution of Italian articles over time] and Latin [Distribution of Latin articles over time] individually, or the combined Italian/English sparkline [Distribution of Italian and English articles over time] —which I'm doing here only to showcase some of the functionality of this stylesheet: the possibility to select how many and which lines to plot (with automatic scaling based on the maximum of the given series, as clearly noticeable in the Latin sparkline [Yep, here it is again] whose peak is essentially invisible in the “everything” sparkline [Stop it already!]), and the possibility to select the colors (if your UA is set to prefer dark mode, you'll notice that the lines in the combined Italian/English sparkline [You might have noticed some repetition here too] are lighter than the corresponding ones in the “everything” sparkline [Hey look, another copy of this one!]).
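To give a flavour of what “no hard-coded plotting keys” means, here is a heavily simplified sketch (element names, parameters, and data layout are all made up for illustration; the real stylesheet is considerably more elaborate): the series to plot and its colour come in as stylesheet parameters, and the line is scaled against the maximum of the selected series.

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- which series to draw, and in which colour; both overridable -->
      <xsl:param name="series" select="'en'"/>
      <xsl:param name="stroke" select="'#a00'"/>

      <!-- assumed data shape: stats/year/lang elements, one lang per
           series per year, e.g. <year n="2004"><lang id="en">2</lang> -->
      <xsl:template match="/stats">
        <!-- XSLT 1.0 has no max(): sort descending and take the first -->
        <xsl:variable name="max">
          <xsl:for-each select="year/lang[@id = $series]">
            <xsl:sort select="." data-type="number" order="descending"/>
            <xsl:if test="position() = 1"><xsl:value-of select="."/></xsl:if>
          </xsl:for-each>
        </xsl:variable>
        <svg xmlns="http://www.w3.org/2000/svg"
             viewBox="0 0 {count(year)} 10" preserveAspectRatio="none">
          <polyline fill="none" stroke="{$stroke}" stroke-width="0.2">
            <xsl:attribute name="points">
              <xsl:for-each select="year">
                <!-- x: year index; y: value scaled to the series maximum -->
                <xsl:value-of select="concat(position() - 1, ',',
                    10 - 10 * lang[@id = $series] div $max, ' ')"/>
              </xsl:for-each>
            </xsl:attribute>
          </polyline>
        </svg>
      </xsl:template>
    </xsl:stylesheet>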

To get an idea about the convenience (or not) of this approach, let's have a look at some numbers. At the time of writing, the XSL stylesheet is already above 21KiB. The most complex sparkline [Hey look, another copy of this one!] is barely more than 10KiB. The other sparklines are even smaller, since the size clearly depends on the number of lines in the plot (and some choices such as whether or not to draw points at null values). However, all the sparklines presented here together add up to around 35KiB (or at best around 20% less when omitting null value points). Even considering that all of them being derived from the same dataset is an exception rather than the rule, we can see how quickly the (byte) cost of the XSLT gets amortized (and that's before putting any effort into minimizing the XSLT size).

The next step will be to introduce proper sparklines to replace the pseudo-sparklines based on Unicode blocks currently shown under each index page (here's for example the root index, and the one for the tech column) and later enhanced with some metadata popup.

Moving from the textual pseudo-sparklines to true sparklines will be a big change.

For example, one thing pseudo-sparklines can do, but true sparklines won't, is to wrap around when they are too long: if you visit any of my index pages from a mobile phone, for example, you'll probably see the pseudo-sparkline take something between 5 and 10 lines. This was never intentional, but it's an interesting side-effect of the textual description of the sparkline. In graphic form, the sparkline will (at most) fill the whole line, and grow/shrink based on available screen estate. I am not entirely convinced this will be a superior choice, but for sure it'll be truer to the spirit of the object. It will also mean that I won't have to worry about screen estate to show both the commits and dates sparklines, even in the same plot, and I will be able to add the per-column sparklines at the root of the Wok.

It will also be interesting to see how much space usage will change. Currently, the auto-generated pseudo-sparklines take around 30KiB each, and all 11 of them together (10 columns + 1 root index page) mean upwards of 340KiB of extra markup, integrated directly into the index pages because static pages do not have a way to include external HTML fragments (although this is actually possible using an XML dialect of HTML, and an appropriate stylesheet).

Moreover, the actual content of each index page changes whenever a sparkline changes, and since the “sparkline update” runs unconditionally, this means that all index pages (even those that would be unaffected) are regenerated each time I publish anything anywhere.

So, in the current setup, index pages are 30KiB larger than they need to be, and are regenerated more often than they need to be. True sparklines, on the other hand, would simply be included via an unchanged object embedded in the page —and the only thing that would change is the data loaded by the skeleton that will produce the sparkline via the stylesheet.
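In other words (a sketch with made-up file and element names, just to show the mechanism): the page embeds a tiny wrapper document that never changes; the wrapper names the data file, and the stylesheet pulls the data in with the document() function.

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="sparkline.xsl"?>
    <!-- the unchanging skeleton, embedded in the page via <object>;
         only tech-activity.xml changes when new posts land -->
    <sparkline data="tech-activity.xml"/>

    <!-- in sparkline.xsl: load the real data from the named file -->
    <xsl:template match="/sparkline">
      <xsl:apply-templates select="document(@data)/activity"/>
    </xsl:template>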

All in all, I expect the change to provide more visual information (since it will be possible to visualize both the commits and dates timelines) with less data both on disk and on the wire.

The reason I don't actually know yet is that moving “up” from the languages-per-year plots I've been working on lately is not as trivial as I would like it to be. The biggest challenge will be the switch from yearly to monthly data.

Since browser development is in the hands of people that apparently despise XML, all browsers are stuck on 1999 tech, even though XSLT has seen some extremely significant improvements in the following 20+ years. Among the things that I would have available if I could use more modern XSLT, but cannot because browser development is controlled by user-hostile companies, are date/time manipulation functions.

So I'll have to roll my own, which will be time-consuming (although I will limit myself to what I actually need) and lead to an unnecessary growth in size for the stylesheet. (This, for anyone who's counting, isn't a downside of using XSLT, but a downside of browser developers refusing to move forward with more modern versions of it like they have instead done with all other web tech.) And I still expect the combined XSLT plus XML data to weigh less than the rendered SVG sparklines: after all, we're talking about over 300 data points already for the smallest index sparkline, and over 400 for the longer ones (with a guarantee of growth), an order of magnitude more than the data points in the simple sparklines I'm showing here.
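As an example of the kind of wheel that needs reinventing (a sketch, not my actual code): where XSLT 2.0+ has real date/time arithmetic built in, in 1.0 even something as basic as a month serial number has to be carved out of the ISO date string by hand.

    <!-- XSLT 1.0: turn an ISO date (e.g. 2025-08-08) into a serial
         month count, so that distances between months become plain
         subtractions; XSLT 2.0+ date arithmetic makes this unnecessary -->
    <xsl:template name="month-serial">
      <xsl:param name="date"/>
      <xsl:value-of select="substring($date, 1, 4) * 12
                            + substring($date, 6, 2) - 1"/>
    </xsl:template>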

Despite the uncertainty about what's to come, I felt it was important to push this update: seeing those first sparklines pop out of the page has been one of the most satisfying moments in my recent life. Truly a Frankenstein (Frankensteen) moment.

Plotting (to save) XSLT

Using XSLT to transform XML data into SVG plots, Wok style

GAFAM's latest attack on the open web comes in the form of a proposal to deprecate XSLT. I've talked about XSLT in the past, both here, discussing options for the sparklines, and (in Italian) to show a simple website entirely designed around its power.

I profoundly dislike the idea of such a powerful feature being removed from browsers, and I will write elsewhere about why this is a threat to the open web, an attack against the indie web, and an attempt to undermine the more recent efforts to push back against the centralization of the web.

What I want to talk about here is some recent efforts of mine to introduce XSLT in the Wok as an example of how websites can benefit from its adoption. This will be a long process, and my hope is that along the way I'll even manage to get rid of some (optional) JavaScript usage, particularly in the generation of the “index pages tables of contents”. For the time being, what I've done has been to introduce a “plotting stylesheet” to reproduce the plots from my recent article (in Italian) about the language to write in, processing some data directly client-side to produce the relevant images.

The original images were produced via some server-side awk scripts (linked from the above mentioned page, for the curious) that need to be run after updating the stats whenever I want to regenerate the plots, and the idea was to try and get the same results (or possibly improve on them) by only needing to update the statistics (written this time in XML form).

By the way: even though it may seem that I'm only doing this to spite the engineers working on the browser engines, or to skew the stats about XSLT adoption, that's not really the reason why I'm doing this: as I've mentioned already, I've been pondering about doing this for a long time, and the only thing that the umpteenth attempt at sabotaging well-established standards (see their attempt in 2013, and the related attempt to deprecate SMIL in 2015; notice how they always go for it in the summer, too) has done is to push me to finally act on my intentions.

The results are in. The “articles per language per year” plots are:

A divergent bar graph showing the number of articles per language per year.
Languages of the Wok over time (2004–2025)
Languages of the Wok over time (2004–2025), XSLT version (link)

and the “language percentages per year” plots are

A stacked bar graph showing the percentage of articles per language each year.
Percentages of the Wok languages per year (2004–2025)
Percentages of the Wok languages per year (2004–2025), XSLT version (link)

There are some minor aesthetic differences because I've taken the opportunity to fix some visualization artefacts (such as the grid-under/over-bars inconsistency in the first plot), and also to update the statistics for 2025, but other than that the plots match up.

The biggest difference, however, is technical: since the XSLT plots make use of external resources (the XSL stylesheet and the data XML) they must be included not as images, but as objects.
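Concretely, the difference in the embedding page looks something like this (file names hypothetical):

    <!-- before: a pre-rendered, static image -->
    <img src="languages.svg" alt="Languages of the Wok over time"/>

    <!-- after: an XML document that the browser turns into the SVG on
         the fly, following its xml-stylesheet processing instruction -->
    <object data="languages.xml" type="application/xml">
      Languages of the Wok over time
    </object>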

Is it worth it?

Now there's an interesting question.

In terms of raw size, the original SVGs add up to just shy of 9KiB, but the scripts (two separate scripts) to generate them take around 6.3KiB. The data is a few hundred bytes and doesn't get committed to the repository.

The new SVGs weigh less than 300 bytes combined, but the XSLT stylesheet to produce the same result (more on this later) is 12KiB. The data itself is around 2.8KiB, and this needs to be committed to the repository (and sent to the user), although it could be reduced to less than 1.3KiB by choosing more compact node and attribute names (they're completely arbitrary).

Adding things up, it's a bit of a mixed bag. On the one hand, the original SVGs and necessary scripts take up more space in the repository: barely so on first commit, but the gap would widen as the stats get updated and/or the scripts get revised. On the other hand, the original SVGs send fewer bytes to the reader on first load, although the benefit would even out in case of updates, which would require only the new XML data to be re-fetched (unless the XSLT stylesheet changes too, of course, but that would be expected to change less often).

But there's more to it.

First of all, since we're producing the SVG via transformation, we can actually do something that would have been rather more expensive with the previous approach: we can include both plots in the same SVG:

Combined plots of the Wok languages per year (2004–2025) (link)

This is just a trivial exercise in this case, but there's room for some fancy combinations in more sophisticated setups (left as an exercise to the reader).
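The combination itself can be as simple as something along these lines (template names made up; this is a fragment from inside a template): the two existing plot templates are called into a single svg element, with the second one shifted below the first.

    <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 130">
      <xsl:call-template name="count-plot"/>
      <!-- same data, second plot, just moved out of the way -->
      <g transform="translate(0, 70)">
        <xsl:call-template name="percentage-plot"/>
      </g>
    </svg>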

Even better: the attentive reader will have noticed that the linked XSLT stylesheet is not, in fact, 12KiB in size, but slightly larger. This is because, given that we have to include those plots as objects anyway, we can take advantage of this to make them interactive.

Fun fact: having interactive plots was actually my original intent, but I stopped short of doing it because the resulting files would have been too large (around 50% larger, with the current interactivity features). Now I don't have to worry about that, because the size increase to make the SVGs interactive goes into the stylesheet: not only does increasing the size of a single file share the benefits between all plots, but even for a single plot the total increase (XML + XSLT) is considerably less, because I only need to update the common template of all the bars, and not each bar. So for less than 1KiB I manage to gain something that would have required more than 2KiB per file to achieve. A net win.
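For instance (a made-up sketch of the principle, not the actual code): since every bar comes out of one shared template, a single SVG title child added there becomes a hover tooltip on every bar of every plot.

    <!-- one shared template for all bars: the <title> child added here
         gives a tooltip to every bar of every plot -->
    <xsl:template match="lang" mode="bar">
      <rect xmlns="http://www.w3.org/2000/svg"
            x="{position()}" y="{100 - .}" width="0.8" height="{.}">
        <title><xsl:value-of select="concat(@id, ': ', .)"/></title>
      </rect>
    </xsl:template>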

Conclusions

The migration from SVGs statically generated as part of the build process to SVGs generated via XSLT from data stored in XML format has been a resounding success. The next step will be to work on something similar for the sparklines of the index pages.

EDIT (2025-08-08): while working on sparkline [An interactive sparkline of the language distribution over the years] support, I've taken the opportunity to organize things better and redesign the data file. The net result is that both the data file and XSLT stylesheet are slightly smaller, making this approach even more convenient.

Sparkling wok, episode 3

Moving away from sparklines

I'm cheating a bit with the title of this article, but I feel that it fits the general theme, even if it's about “moving away from sparklines”. No, I'm not removing the sparklines I introduced less than a year ago, and improved upon shortly after. The issue came from my desire to present a visualization of data for which sparklines could have been appropriate, but I felt that my expertise with them was not up to par to do it justice.

The data in question was prepared (and briefly discussed) in my recent article (in Italian) about what drives my choice of language for an article, a topic quite dear to me, as shown by the previous article (in English) on the same topic.

The plot in question is the following:

A diverging bar graph showing the number of articles per language for each year.
Languages used in the Wok over time (2004–2025)

The intent was to show visually that even though Italian (blue) has historically been (and still remains) my primary writing language, there is a growing trend in the number of articles written in English (red), which at times has even surpassed my first language, particularly in “slower” years.

So here's the big question: could a similar plot be represented through sparklines? The answer is “technically yes”: in fact, if you look at the script I use to generate the article distribution sparklines, the BEGIN block even contemplates the possibility for negative values (to be shown in the line under the main one), so one possibility would be to produce a sparkline for the blue language, one for the red language, and show them one on top of the other.

However, I feel that this violates the point of sparklines, which is to present data in a way that can fit inline with the text, per Tufte's original idea. (Never mind the fact that my article distribution sparklines also violate the point.)

Additionally, that single value approaching 100 in 2012 would, with the simplest linear conversion of values to bar heights, completely flatten out all other years: the Unicode characters I'm using for sparklines only offer 8 possible values (plus 0), so with a maximum of 97 we can expect anything below 18 to be flattened to 1, and so on. This is frustrating enough already in the commits sparkline, where the maximum is 77 and there are still several months that have comparable values —in this case it would be even more frustrating.

I guess things could improve by plotting more granular information (per year-month rather than just year, just like the commit sparklines), but ultimately I decided to go with an actual plot —which opened a whole new can of worms.

There's apparently a dearth of tools that allow you to produce a nice, clean, “divergent bar graph plot” from some simple tabulated data. So I had to roll my own. On the upside, it was actually quite easy to customize for my purposes.

The curious can find here the shell script to extract the information, here an auxiliary awk script used to collect the statistics, and finally here another awk script that converts the statistics into an SVG. I guess this last script may be useful to others too, although it will need some heavy customization to be made more general.

Of course, once the statistics have been collected, there are other ways to present the same data. For example, rather than looking at the individual language totals, we may be interested in the percentages: what fraction of all the articles published each year are in one language rather than the other? (This is the script to generate the next plot.)

Stacked bar graph showing the percentage of articles per language for each year.
Percentage of posts by language in the Wok over time (2004–2025)

This plot is more amenable to sparklining, at least if we agree to ignore that single page in Latin and the anomalies posed by the empty years. By putting in the sparkline only the native language statistics, it would look something like this: █ ████▅█▇▇▄▇   ▆▆▄▃▅▆▇.

This sparkline does give an “idea at a glance” about the distribution of the native language posts (which is what they are for), although the gaps are grating, and the low vertical resolution allowed by the blocks kills off a lot of detail. In this sense —again— my sparklines end up violating some of Tufte's directives. Specifically, going by his notebook on the topic, the key takeaway is that sparklines should maximize data, minimize design. And at least in the vertical direction, the coarsening caused by the choice of using Unicode blocks is simply too much.

It really looks like, if I really want to lean into the sparkline thing in a way that wouldn't appal Tufte, I'll have to look into the generation of inline SVGs, although I'll make sure to stay away from the JavaScript-based solutions which are linked from his aforementioned notes.

AI signal

Let's do what the Creative Commons people haven't done

The Creative Commons nonprofit has started an initiative called CC Signal with the intent to define new licenses aimed at signaling the copyright holders'

preferences for how their content can be reused by machines based on a set of limited but meaningful options shaped in the public interest.

Never mind the fact that most artists have repeatedly and vocally stated that they do not consent to their “content” being reused by machines, and that despite this, and despite the obvious copyright violations and disregard for the authors' will that underlie the training of most LLMs —to the point that the techbros behind them have openly stated that they wouldn't have a leg to stand on if they respected copyright— nothing significant has been done in response to these glaring violations, so that this initiative would seem nothing more than an abstract exercise.

What's worse, though, is that this actually indicates that whoever is steering the direction the CC chooses to move towards is completely detached from the interests and desires of its purported user base. The original initiative behind the CC licenses struck an often sought-after balance between the strictest enforcement of copyright and the desire of many artists to formally relinquish a subset of the rights granted by it. This time, there isn't much of an interest —if any at all— in finding a “middle ground”. Nobody wants their creative work to be ingested by exploitative, power-hungry LLMs to be subsequently defecated in a sloppy sludge of manipulative scams.

The only thing creators are interested in is telling the powers behind the incorrectly-named “AI”s to buzz off.

I've therefore taken the liberty to create a new logo, licensed under CC-BY-SA, that is specifically designed for that. Of course, as already discussed, remember that using an easily-detectable logo to indicate absence of AI may actually help expose you to more aggressive scraping, in complete disregard for your desire, because LLMs are always hungry for fresh material.

A logo inspired by the CC-BY logo, except the “BY” text has been replaced by “NO‌AI”,
and the CC circle has been replaced by a red circle with a cross bar (prohibition symbol) over the text “AI”
CC-inspired No AI signal logo

And of course, do keep in mind that stamping my logo anywhere has no legal bearing. On the upside, it's one of my usual hand-coded logos, so it's a very clean SVG that you can download from here and edit to your heart's content.

On the buttered-cat engine

Initiating the new generations.

Yesterday, taking advantage of a comment about cats always landing on their feet, I introduced the new generations to the now-classic idea of the buttered-cat engine (here on Nonciclopedia), in the traditional way: given that a cat always lands on its feet, and a buttered slice of bread always lands butter-side down, tying a buttered slice of bread to the back of a cat, or buttering the back of a cat directly, would produce something that, when dropped, would never reach the ground, endlessly spinning on itself.

The motion of the buttered cat can be suitably harnessed (for example, with magnets and solenoids) to produce electricity. Are we therefore looking at a perpetual-motion machine of the first kind? Sadly, no. As the great one immediately observed, this engine could only work until the death of the cat.

While we were discussing it, a doubt arose: had the effect of the cat's life expectancy on the duration of the generator already been taken into consideration in the appropriate circles, and in particular by the de rigueur C.S.M. (Club Seghe Mentali)?

So today I went looking for the archive of the good old C.S.M. website of days gone by (before it got “mainstreamed” by moving to Facebook, proving it was no longer in the hands of True Nerds™), an operation that turned out to be more complex than I expected, and I was thus able to confirm that the non-perpetuity of the engine in question, tied to the lifespan of the feline, had indeed already been discussed, together with the need to keep the cat fed.

What remains undocumented, in all of the sources I have studied on the matter, are the more strictly quantitative aspects, and in particular the power output (which would depend at least on the rotation speed), a value needed to estimate the amount of energy actually producible by a (buttered and well-fed) cat over the course of its life. Who knows whether the latest call of the Fondo Italiano per la Scienza could be leveraged to obtain funding targeted at the study of this question.

A credible threat to (and from) commercial social network silos/3

It isn't just about money and questionable CEO ethics.

Introduction

After posting on the Fediverse the previous chapter of this series, I had an interesting discussion with @jdp23​@gotosocial.thenexus.today concerning his much more positive position towards Bluesky and the way my (and others') contrasting position was presented.

(FWIW, I have not missed the reference to the “I for one welcome our new overlords” meme.)

One thing that can be said about the Fediverse is that, contrary to some expectations both within and outside of it, it's quite heterogeneous, despite the statistical prevalence of certain demographics over others. Among other things, this means that it is actually not that hard to find people with diverging opinions, or people that may share similar opinions on something, but with very different motivations behind said opinions, which may lead to quite different judgements on other similar things.

In the context of this series, this means that in the Fediverse you will find both people that are welcoming of federation with Facebook's Threads, and people strongly opposed to it (in particular the whole FediPact crowd), as well as people that take a cautious but not drastic approach towards the issue. Likewise, there are people welcoming of Bluesky (like the mentioned @jdp23), and people that do not share such enthusiasm, to varying degrees.

However, not all those that welcome or reject federation with Threads or Bluesky do so on the same basis (especially when they reject it). Where this matters is in how their position extends (or not) to other platforms that implement, or are working on implementing, the protocol with various degrees of support for federation with the rest of the Fediverse.

The discussion with @jdp23 in particular focused on WordPress, the well-known blogging platform (nowadays a much more general and powerful CMS). Would someone critical of Threads or Bluesky federation be equally critical of WordPress federation? In his post, @jdp23 posits that this would be the case, on the premise that the most common ground for rejection of Threads and Bluesky federation is their being essentially funded by VC money, and managed by CEOs of varying degrees of questionability, these two being the main driving forces behind the platforms' inevitable enshittification. For sure, that's the reason why e.g. Cory Doctorow is avoiding BS, as mentioned previously.

In my opinion, however, while those do matter in the assessment of whether or not a company is a threat to the Fediverse (and the open web in general), they are, I would say, not the primary reasons. They are ingredients, and most definitely triggers, in the inevitable decline that will lead to the rug pull that threatens to give the Fediverse the coup de grâce, but they aren't the primary reasons why these platforms pose a threat.

I actually think that WordPress presents an excellent example of why they are not, the explanation of which also allows me to discuss why, in my opinion, WordPress does not pose a threat to the Fediverse.

WordPress

The first thing to make clear is “what are we talking about”, to avoid confusion between WordPress (the software and its ecosystem at large), WordPress.com (the hosting company) and Automattic, the privately-owned company behind it all (or most of it at least).

As pointed out by @jdp23, Automattic is also heavily VC funded, and its founder's public standing has been deteriorating quickly in the last year or so, from his famous spat with a banned trans Tumblr user to the more recent WPEngine drama, all of it seasoned by some public ranting that was borderline deranged, and which I'm not going to link because it mostly happened on Xitter.

If massive VC investments and a CEO of questionable sanity were all it took to reject a platform's federation, there is little doubt that WordPress would be on the chopping block. But in my opinion, this is not the case. There is a much more important factor at play, and that is how dependent users are on said VC-funded, questionably-driven company. In this sense, I'll recommend again reading the already-mentioned article by Cory Doctorow, with particular attention to the paragraphs that discuss who controls when and how users can leave, and whether or not they can carry their social graph with them.

When it comes to WordPress, the answer is that Automattic actually has very little control over all this. WordPress hosting is one of the services commonly offered by basically all hosting providers and domain registrars around the world. In 2014, an estimated 50% of the 70 million WordPress installations were on wordpress.com. Today, an estimated 44% of all websites tracked by the W3Techs survey are based on WordPress: with an estimated 1.5 billion websites, that's something like 600 million WordPress installations, compared to an estimated 60 million blogs on wordpress.com.

If these numbers are anywhere close to reality, that would mean that Automattic controls less than 10% of all the WordPress installations. If it went completely off the deep end, it would not be able to drag most of its ecosystem down with it. If ActivityPub integration, currently implemented as a separate plugin whose developer has been hired by Automattic, ever got integrated into core, it would make nearly half of the websites worldwide federated. Of course this would be a very welcome improvement over the current situation!

No such thing as no risk

WordPress in general has always been receptive to, a promoter of, and sometimes even the main proponent of open web standards: even as the major tech powerhouses have done their best to suppress RSS, for example, you are basically guaranteed even now to see more or less visible links to the several feeds provided on any WordPress site; they also contributed to the widespread adoption of some linkback protocols, arguably one of the earliest approaches to federated social media on the web.

All this goodwill does not put them above judgement for any threat they could pose to the Fediverse, however, and a large-scale adoption of their ActivityPub plugin could indeed become a threat to the Fediverse, again as a matter of numbers: not in terms of how many users would be in the hands of Automattic, unable to escape with their social graph (this number would in fact be relatively low, as discussed above), but in terms of their relative size compared to the rest of the Fediverse.

As I've had the opportunity to mention, the ActivityPub specification suffers from “catastrophic underspecification”. Moreover, even in the parts where it does not, to maximize successful federation any (new) server software has an incentive to maximize compatibility with the (existing) dominant solutions, even when this leads to suboptimal or questionable choices.

Probably the most well-known example of this is how Mastodon's lack of support for anything but the Note object type within the ActivityStreams Vocabulary (based on Mastodon's intended use as a microblogging platform) currently impacts other Fediverse platforms, which face an unpleasant conundrum: should they federate their content as Note, or should they opt for more appropriate types, and thus risk their contents failing to be properly presented to Mastodon users?

Different platforms take different paths here, with benefits and downsides.

Some platforms, for example, opt for correctness: long-form writing is federated as Article, photos as Image, and comments as Note objects. The obvious advantage of this is that objects are federated with a sensible type that can help type-aware platforms represent them optimally. The downside is that if e.g. a “followers-only” article or image is delivered to a Mastodon follower, they will not be able to actually read it, because to them the Article or Image will render simply as the title and a link to the original object, which they won't be able to access on the sender's server, where they won't be logged in.

Pixelfed, on the other hand, has opted to maximize compatibility, so even though it's an image-centered platform, it federates its content as Note objects, with the image(s) as attachments, which Mastodon is able to ingest and represent in a more accessible way. The ActivityPub plugin for WordPress delegates this choice to the user, letting them choose the type, but warning them about compatibility issues when the more appropriate Article type is selected.

While the obvious solution in this particular case would be to fix the Mastodon issue (since there is in practice little reason not to handle objects of different types correctly, when the content can be managed just like notes and their attachments), what this issue highlights is the weight that the dominant platform carries in defining how the protocol can be used when interoperability is a priority.

This is of course nothing new (de facto versus de jure standards on the web are something that is at least as old as the first of the browser wars), but it does mean that for the health of an ecosystem (or protocol) it is essential that the dominant solution plays its role in maintaining interoperability. And given how this is just barely the case even for a project like Mastodon (with its questionable prioritizing of features under the apparent delusion that copying some of the worst aspects of the commercial networks will somehow appeal to the general public, over a more active collaboration with other platforms to strengthen the weaker aspects of federation), it's not hard to see how much worse things would be if the dominant position was taken over by a, shall we say, more commercially oriented platform.

And yes, this is not unlike what triggered the first post in this series when discussing Threads, but a similar issue would emerge with WordPress (the “source of the power” is always the same) even when the intent behind the federation may not be the usual Embrace, Extend, Extinguish (EEE) we've come to expect from Big Tech.

And therein lies the rub: even if most WordPress installations are not under the direct control of Automattic, the development of the software (including its ActivityPub plugin) does remain presently under their control. And while presently they remain a relatively minor player in the Fediverse, it's not unlikely that their relevance will grow as the plugin installation base expands, giving them more and more weight in the “compatibility decision tree” for the other platforms.

What would there be then to protect us from abuses of such power not unlike the ones discussed for Threads? Not much, in fact, except for the conscience of the plugin author and WordPress' history of open standards' support. Which is, in fact, a pretty thin and fragile protection, given how some questionable recent initiatives by Automattic against parts of the ecosystem they don't like show their willingness to breach trust for momentary opportunity.

WordPress vs Automattic?

This poses an interesting conundrum.

On the one hand, I see no reason to distrust WordPress' expansion in the Fediverse. In many ways, all WordPress sites enabling ActivityPub would be a massive win for the Fediverse —much more so, in fact, than its adoption by the genocidal, manipulative corporation behind Facebook, Instagram and now Threads.

On the other hand, especially given recent events, I would be hard-pressed to say we could trust Automattic to “do the right thing” with that much power in their hands, although I suspect it would be much less likely for them to “pull the rug”, which is the primary threat from Threads: what would be more likely would be the adoption of progressively less compatible extensions to the basic ActivityPub format, making it harder and harder for other platforms to keep up.

A way out of this could be a WordPress transition to a more community-oriented management, or a fork in the worst case (not unlike how LibreOffice forked from its predecessors to become the reference FLOSS office suite). Despite the apparent decline in the founder's sanity, though, I still feel that Automattic is much less of a threat to the Fediverse —if a threat at all— particularly compared to Threads or BS. This is in large part due to their business model being in turn much less threatened by the Fediverse in the first place. If anything, with a widely federated WordPress, the growth of the Fediverse as an “auxiliary network” would be a win for them, as it would help spread the content they host and potentially funnel readership towards more commercial endeavours, from which they often extract a tithe.

While Threads and BS are in direct competition with Mastodon and the rest of the Fediverse microblogging platforms (any eye seeing their content outside of their platforms is one less opportunity to profile and sell data to advertisers), WordPress can leverage it to its benefit: advertising its compatibility with Mastodon would become a selling point with a growing Fediverse.

In his article, @jdp23 considers BS a potential counterbalance to the dominance of Meta in the Fediverse via Threads. I don't see it that way: with their choice of going with a different protocol, BS is intentionally taking themselves out of the “ActivityPub control” business, and putting themselves in direct competition even at a technical level. Ironically, this sets them up more as an ally of Threads in bending ActivityPub to be more corporate-friendly (giving them the opportunity to bring up “see, we need this shitty feature to compete with BS” arguments). This is also one of the ways in which their existence is the two-pronged attack on the Fediverse that I mentioned in the previous installment.

It's actually more likely for a company like Automattic to take this role instead: if an “ally” can be found in the corporate space (and that's quite debatable in the first place), it would be one that does not rely on centralization for its business model, and that can benefit more from an “independent” Fediverse as a support network.

(Of course, there's still the “little” issue of the CEO personality …)

No competition?

Is it actually true that Automattic doesn't have competition in the Fediverse, though?

WordPress as a CMS does have a competitor: Hubzilla. It just happens to have such a limited presence in the Fediverse (despite its support for long-wishlisted features such as nomadic identities) that it flies under the radar, particularly among the general public, whose only awareness of the Fediverse, if any, comes from having heard about Mastodon as a (nerdy, possibly dysfunctional) Twitter alternative.

Even as a blogging platform, WordPress has “competition” on the Fediverse. WriteFreely, which holds around 1% of the total Fediverse user count (according to FediDB), is probably the best known blogging software for the Fediverse, open-sourced from the write.as service set up by Matt Baer for his Musing Studio suite. (The curious may want to read up on Matt Baer's take on “bringing blogging to the Fediverse”, where he also mentions other Fediverse blogging platforms such as Plume.)

Although they fall within the same category, and thus arguably offer similar basic functionality in some sense, I suspect that these services are not something WordPress would really feel competition from. In many ways, WordPress is an ecosystem in itself which, through a number of both free and non-free plugins, allows an extremely wide range of applications of the software, from simple blogs to e-commerce sites with a side dish of Patreon alternatives (such as the one Jennie Gyllblad aka @JenJen​@mastodon.art is setting up to safeguard herself from the aggressive policing of lewd content and general enshittification on many payment/​membership services). Competence in setting up more sophisticated WordPress configurations is even “monetizable”, and there are both individuals and companies selling such services.

Could the same be true for things like Hubzilla, Friendica or WriteFreely? For the latter probably not, and although I suspect there may be some potential for it on the other two, I'm not aware of anything like that —possibly because their “market share” (doesn't feel like “market” would be the appropriate term here though, does it?) is so small: a few thousand installations combined versus the tens or hundreds of millions of installations for WordPress.

I honestly doubt that the existing “native1” Fediverse services will ever grow to pose an actual threat to WordPress, especially if the WordPress ActivityPub integration progresses to the point that it would be more unusual to see a WordPress site without the plugin than one with it. (Would we start seeing Fediverse-aware WordPress themes that include the Fediverse handle of the blog(s) in the @user@domain form or a “boost this post with your Fediverse account” link?) In fact, I wouldn't be surprised if such a large-scale adoption of the ActivityPub plugin for WordPress, being the sign of a heightened awareness of the Fediverse, ended up leading —on the contrary— to a decrease in adoption of the current “native” Fediverse services.

Automattic is not just WordPress

In 2019, Automattic bought —for pocket change— Tumblr, the social blogging silo famously almost-killed by its owners while in Yahoo! and later Verizon hands, due to conservative policies and terms of service that drove out much of the (porn-based, often queer) traffic (policies that, I should point out, have scarcely been lifted under the new owner, motivated by the restrictive cascade of limitations imposed by financial institutions, as well as by (particularly mobile) operating systems for “apps” to be distributed through official channels).

Unless things have changed in the last year or so, Tumblr is not making Automattic any money, despite having a user base that is of the same order of magnitude as the worldwide WordPress usage: the costs of operating such a massive installation are clearly much higher than what can be obtained from “standard” monetization practices (ads, plus paid subscriptions for ad removal). This is most likely the reason why Automattic has gone down the (inevitable?) path of selling user data for SALAMI training, both from Tumblr and from their WordPress hosting services.

(You may remember this being what I predict will happen with BS too. And yes, Tumblr showing that “simple” funding sources don't work at that scale is one of the reasons I'm sure it'll happen with BS.)

Now here's the interesting question: does the Fediverse offer something that could compete with Tumblr?

The answer is yes, and in my opinion Friendica is the closest competitor.

One of the main features that used to characterize Tumblr over its competitors when it was released was that it was a microblogging platform2 with social media features. In this sense, it could be considered a competitor for Facebook rather than, say, Twitter (timeline: Facebook launched in 2004, Twitter in 2006, Tumblr in 2007), due to the wide variety of content types supported (text, photos/audio/video, quotes, links, and even chats) and the possibility for users to interact with each other's posts by liking, commenting or reblogging (with or without additional context).

What set Tumblr apart from Facebook was a preference for the short-form content that gave it its name, but most prominently (aside from, of course, not being designed around the collection of personal information) the customizability of the blog's interface, something that took a page from MySpace and, most importantly, the blogosphere (including WordPress, which had introduced theming in 2005) rather than from its aseptic competitor. It's not unlikely that this was part of its appeal (compared to the “serious, professional” Facebook —the role now filled by LinkedIn, arguably) and it will potentially contribute to its demise as well, in a world where this kind of creativity is stymied by the uniform “professionalism” needed to cater to advertisers et similia.

So why would Friendica be the most likely competitor to Tumblr? The platform is intended more as an alternative to Facebook, and its default theme may even remind someone of the classic Facebook look. In terms of functionality, it covers by and large the same feature set (my understanding is that Tumblr's Ask is the only thing missing, while it offers more in other respects, such as event planning). One of the winning features of Friendica is its extensive interoperability support: in addition to ActivityPub and its OStatus predecessor, it also “speaks” the diaspora* protocol (it's probably the only other platform that still supports it today), it can ingest feeds, and —most notably in this context— it supports a number of proprietary networks, including Tumblr, via specific addons.

(Yes, you can follow Tumblr accounts from Friendica; yes, you still need a Tumblr account to do that: it will act as the “bridge”; yes, there is some support for cross-network interactions.)

There is one important feature missing in Friendica, arguably the most important feature in the comparison with Tumblr, the one that truly set Tumblr apart as a “modern” social network: customization.

This is actually a missing feature in the modern social media landscape in general, both in the corporate-controlled social silos and across Fediverse projects, although the reasons for it likely differ —but I'm not going into detail here on the corporate silos' loss of personality and their aesthetic (not just UI) convergence.

A “custom” Fediverse?

Fediverse server software does offer some degree of customizability, but generally only at the instance level: if users are afforded any kind of control at all over “their” profile page, it is at best some amount of custom CSS: as powerful as this can be in capable hands —especially with the recent progress CSS has made— it's still far from the freedom that could be enjoyed on Tumblr. So, unless you're on a single-user instance (something that e.g. GoToSocial is particularly suited for), your customization options are quite limited.

There are obviously security concerns (among others) that motivate such an approach, since anything more flexible than “changing some colors here and there” requires giving users control over the HTML and potentially (inlined) JavaScript that will be presented to the viewers, which may pose a threat both to the instance and to the visitor, via malicious scripting or template language abuse.

(This isn't unique to Fediverse software, by the way. WordPress itself is infamous for its themes and plugins being attack vectors. There's a reason why bots keep trying to reach the non-existent wp-login.php and xmlrpc.php in the root of my (static) websites.)

There are obviously ways to limit the impact of this, such as using very restrictive (possibly custom) templating languages (while we're talking about Tumblr: the theme syntax they use could be adapted for other platforms too, if there was an interest in this), even though this wouldn't solve, for example, the issue of profile pages embedding, say, a client-side cryptocurrency mining script.
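
For illustration, here is a minimal sketch of the allowlist approach in Python; the tag and attribute lists are hypothetical, and a real sanitizer (several battle-tested libraries exist) would be much stricter about URLs, entities and nesting:

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists: anything not listed here is dropped.
ALLOWED_TAGS = {"p", "a", "em", "strong", "br", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}  # e.g. no onclick, no style, no src

class AllowlistSanitizer(HTMLParser):
    """Re-emit only explicitly allowed tags/attributes; escape everything else."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop <script>, <style>, <iframe>, ... entirely
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(f' {k}="{escape(v or "", quote=True)}"'
                       for k, v in attrs if k in allowed)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    return "".join(parser.out)

# The *text* inside a dropped tag still comes through (escaped): a production
# sanitizer would also suppress the contents of script/style elements.
print(sanitize('<p onclick="evil()">hi <script>mine()</script></p>'))
# -> <p>hi mine()</p>
```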

The bigger question however, and I think this is also true for the corporate silos, is: would something like this still be a “selling point” for a social media platform in this day and age?

This isn't even just a matter of emulating the Big Tech “solutions” (platforms like Misskey can do some pretty crazy stuff compared to the corporate silos, the least surprising of which is a “Cat” profile setting that will make your profile appear as belonging to a cat on supporting clients, across the network). In a federated social network, you'll infrequently be led to visit other people's profile pages directly: what you will see most of the time will be your server's rendition of their (possibly incomplete) profile information. (Ironically, centralized networks would have the upper hand in providing “custom profile look” as a feature, since visiting a user profile would always be “local”, but they instead choose to copy each other in a rush towards an indistinguishable anonymity of ensnaring boredom.)

Moreover, this is before even considering that many (most?) people don't even browse the web directly, but will peruse their social media via some mobile “app” that will provide its own interface through which all content will be presented. (Yes, this is the same issue that allegedly prevents Automattic from restoring Tumblr's “porn rights”, if you remember.) When would a customized profile or “wall” even be viewed?

It's not easy to argue that individual profile customization should have high priority, especially with the more pressing technical issues that plague the Fediverse; but on the other hand, there's an argument to be made that Fediverse platforms, more than anything else being developed today, have the opportunity to look back at the lessons from MySpace and Tumblr, remember the impact they had on the generations approaching the web in those days, and recreate a similar experience for the new generations, giving them an opportunity to learn something about web development in an environment (hopefully) less daunting to beginners than other approaches to the IndieWeb.

And of course, for this to be “valuable”, it would first require better cross-instance interaction, so that e.g. visiting the original profile page or public timeline of another user can become an experience as smooth as browsing local profiles and timelines. Better user agent support could help a lot here, but can we even count on Mozilla to do the right thing, after they quit the Fediverse and fired their advocacy division?

But that's a discussion for another time. Here's a parting thought meanwhile: will we get a federated alternative to Tumblr earlier by adding “Asks” and customizable profiles/​“walls” to Friendica, or by expanding the WordPress ActivityPub plugin so that WordPress gains support for reblogs, favourites and other social features that are currently not represented there? (And would Automattic even allow that?)


  1. yes, I'm aware that calling Hubzilla “native” is a bit of a stretch. ↩

  2. at the time, the preferred term was tumblelog (hence the name), indicating predominantly “quick and dirty” short-form writing, but without the preset character limits Twitter was famous for (which derived from its SMS bridge feature, and were later inherited by other microblogging platforms, both commercial and open). ↩

A credible threat to (and from) commercial social network silos/2

The Fediverse, especially through Mastodon, has been acknowledged by the major players as a threat —to be eliminated.

Foreword

When I got started on this series, the discourse was focused on the imminent launch of what is now known as Threads, the microblogging service offered by Meta, the parent company that also owns Facebook and Instagram and has more recently decided to trademark common words to make it harder for us to speak without mentioning it.

The choice to focus on it was motivated by the threat that this new platform poses to the Fediverse via the Embrace, Extend, Extinguish (EEE) strategy that has been deployed with mixed success by tech giants in the last decades: in this sense, the threat posed by Threads comes with their declared intention (so far only partially realized) to federate via ActivityPub.

There was enough material about that already to convince me to postpone discussing its main commercial competitor, Bluesky (BS), and the threat it poses to the future of the open web by choosing not to be compatible with the ActivityPub network, while still presenting itself as a champion of decentralization (more on this later).

The benefit of having delayed writing the article is that I can go more in depth with specifics about it. The downside for you is that you have a lot to read (and for me a lot to write —took me over a week to finalize the first public draft).

Fighting fire with fire

In this sense, BS and Threads together are a two-pronged attack on the Fediverse: the former as the (“better”, for appropriate definition thereof) alternative for decentralization, and the latter as the EEE Trojan horse for the case when the Fediverse still wins (more on this later, too).

This shouldn't be read as some sort of “conspiracy” (with Jack Dorsey and Mark Zuckerberg meeting behind closed doors to plan together a combined attack), but rather as the predictable outcome of competitors finding themselves in front of a common enemy (the Fediverse) that is gaining credibility and popularity, and adopting different strategies to bring it down while trying to trip the other at the same time.

The difference in these strategies is also reflected in the different approaches the platforms take on content distribution and —most importantly— moderation.

Meta has extensive understanding of the importance of moderation to keep (a certain type of) users happy, even when failing at implementing it properly (and I'm not even talking about their role in the Rohingya genocide in Myanmar here). They keep getting it wrong (can't get rid of your biases and sympathies after all, and throwing automation into the mix just makes everything worse), but at least they try.

The Twitter founder, on the other hand, has a more, shall we say, “TESCREAList” approach to moderation, or if you'd prefer, a more libertoloid1 approach, which essentially boils down to «I want Nazis on my platform». (You can read the jwz jab about it here, but I recommend reading the more detailed and documented write-up by David Gerard to have a more complete picture of how deep the rabbit hole goes.)

Of course, there are only two types of people that are OK with Nazis on their platform: Nazi sympathisers, and people genuinely deluded that it's possible to have a constructive debate with a Nazi other than punching them in the face. But I repeat myself. (By the way, I actually disagree with Mike Masnick on that one: the Substack CEO is well aware of what he's turning his platform into; it's not by mistake, it's by design, which is why I recommend looking into something like Ghost to anyone looking for a publishing platform.)

Don't worry, though, having been founded specifically to allow Nazis to thrive in the same social space as you, with a data propagation design that is inherently unsafe, isn't the only thing that sets BS apart from the Fediverse: there's also the cryptocurrency grift. This has roots that trace back to when BS was still “incubating” at Twitter (from the choice of Jay Graber as CEO to the cooperation with the (at the time new) cryptogrift Twitter team that brought us those wonderful NFT profile pictures, so functional in helping us identify fools with a simple look at the shape of said profile picture).

If I were a BS user, I would pay particular attention to everything they promise not to do in their announcement, because that's exactly the path they'll take when they start to cash in for the investors' exit. Shall we be thankful that at least they haven't made any announcements about SALAMI (“AI”) yet? (And yes, more on this later too.)

Leaving moderation and exploitation aside, let's bring back into focus the different paths Threads and BS are taking for federation. I've already discussed at length how Threads' (selective) “embrace” of ActivityPub can be weaponized against the Fediverse, and as promised I'll spend a few more words about it later. For now, though, let's see how BS approaches it differently, and what this means for the Fediverse.

BlueSky “decentralization” theory and practice

One of the purported purposes of BS is to test, validate and promote the use of the AT Protocol (nothing to do with modem commands), the underlying protocol designed by the BS developers to separate “speech” from “reach” —the separation behind the idea of «I want Nazis on my platform, but some people may not want to see them, and I want to milk both for cash» for which BS was created in the first place— on the same network.

I mentioned in the Foreword that BS has been created in response to the threat posed by the Fediverse. This is clear from the timeline. Dorsey announced the BS initiative in 2019. (I have no doubt that the idea came from engineers that truly believed in said decentralization as an essential tool for the open web, but what happened to it after it got in the hands of the leadership is an entirely different matter.)

At the time of the announcement, the Fediverse had already established itself through the complex network of protocols that the FLOSS community had experimented with and developed over the years: from the diaspora* project, born ten years earlier to build a federated alternative to Facebook, to the family of protocols (StatusNet, OStatus, ActivityPump) that converged into ActivityPub with the W3C standardization process that ended in 2018, passing through platforms such as Friendica and its fork Hubzilla, designed with multi-protocol interoperability in mind, and arguably the first concrete implementations of essential principles such as nomadic identity.

(I stress the latter in particular because one of the purported reasons why ActivityPub was snubbed by Twitter when evaluating the existing decentralized protocols was the absence of a portable identity —which however in itself is a pretty poor justification for developing an entirely new protocol, as shown for example by the extension proposed by Mike Macgirvin, the author of Friendica, Hubzilla, Zot6 and several other projects that include efforts to bring self-sovereign identities to the Fediverse. The curious may find additional information in the streams repository's documentation about federating nomadic identities. But doing that requires having interoperability as a priority, and an interest in promoting open standards, which really isn't what BS was ever about.)

By 2022, even before the sale of Twitter became final, the mindful people (yes, I include myself in the group) had already realized that the handover would be fatal to the platform and had started looking for alternatives. Saturated by the enshittification wave that was covering all corporate platforms, they realized that the only path forward was to stay as far as possible from the control of the giants of surveillance capitalism, and found in the Fediverse the better (or more appropriately, “least worst”) alternative. One of the key advantages it had over the competition was existing, and having several years of mixed track record showing (some of) both its strong and weak points (positive example: the isolation of the fascist network Gab; negative example: the Wil Wheaton experience).

Meanwhile, in the three years since its start (yes, I'm including the time leading up to the 2021 foundation of the corporation that was spun off to handle the development), the BS initiative had little to show for itself (as Jack Dorsey himself said in early 2022, “It has been slow”), giving several wannabes (from Hive to the soon-defunct Post.news, just to name the first that come to mind) the opportunity to spread chaos among those seeking a new digital home in the prelude to, and even more so in the heat of, the first large-scale Twitter exodus following the handover in late 2022, be it to cash in quickly from VCs taking the opportunity to milk the cow of the Twitter-disillusioned, or to divert attention from the Fediverse —leveraging its limitations (both real and FUD-fueled) and wads of cash to build something æsthetically attractive but with no concrete (or questionable, when present) vision for the future and the platform's sustainability.

This («oops, we're losing face») put pressure on BS to “put something out there”, which turned out to be underwhelming, with not much to offer over the several Twitter “alternatives” and look-alikes that had popped up to cash in on the Twitter exodus other than the promise of “the new underpinning technology of the AT protocol”.

As it turns out, the purported decentralization theoretically made possible by ATproto is largely performative. This isn't just a coincidence due to BS being (essentially) the only significant implementor of the protocol, but quite clearly a design decision. This is discussed in the aforementioned article about why Jack Dorsey ultimately dropped BS, but I also recommend reading @rysiek​@mstdn.social's write-up on the topic (out of date on some aspects, but still largely on point) and the more recent thread following @jonny​@neuromatch.social's comments on the possibility of alternate relays in the so-called “ATmosphere”.

This performative approach allows BS to reap all the benefits of centralization, while still occupying the decentralization mindspace —where that matters— to the point of confusing less knowledgeable people about whether or not BS is part of the Fediverse (it's not; the fact that a bridge service exists that allows posts from the Fediverse to be propagated to BS and conversely, while providing a potentially valuable service, helps muddy the waters in their favour —more on this later too!).

Of course, being still essentially a centralized service despite the decentralization cosplay, BS doesn't actually suffer from any of the issues that come from actual decentralization, such as the directory fragmentation, message propagation and indexing issues, or even just the “dreaded” need to choose an instance that has apparently (or allegedly) scared so many users off the Fediverse, while still allowing it to build “cred” for ATproto to be “federation done right” simply because its claims of federation are essentially untested.

As such, it has significant appeal for those who have gained awareness of the “limited timespan” (among several other issues) of corporate silos, and somehow got wind that a decentralized, interconnected network closer in spirit to the “older” Internet, far from being a step back, would actually bring a breath of fresh air into their passion for the medium, but at the same time are unaware of the existence of the Fediverse, or more often than not simply got scared (frequently before even trying) by its purported difficulties: it thus deludes them into thinking they have found an “escape hatch” via its pretense of decentralization, diverting their efforts towards rebuilding their presence in yet another corporate silo instead of an actually revolutionary (if at times painful) endeavour.

This combination of simplicity from its essentially centralized nature on one hand, and the paint of freshness and innovation from the “magic” protocol on the other, has been a nontrivial factor in its adoption during the most recent exodus from the social silo formerly known as Twitter, since renamed by its new owner to that unoriginal (and pornographic-sounding) “X” Musk has been obsessed with since his online payment days; the change has led several connoisseurs to refer to the platform by the more apt Xitter moniker, pronounced with an initial “sh” sound to signify the critical difference in management and moderation style between the old and new platform.

But is there anything behind the cosplay?

A bridge too far

Ironically, basically the only thing that gives a semblance of credibility to the “decentralized” claim of BS is … the Fediverse. And even that is not only risible (metric-wise), but actually dangerous (for the Fediverse).

So where does this decentralization come from? By means of the aforementioned bridging service, Fediverse accounts can follow (some) BS accounts and vice versa, with consent given by following the “bridgehead” (endpoint), i.e. the Fediverse (resp. BS) account that represents the bridge itself on either network.

Aside from some technical limitations, the fact that the bridge allows accounts on either side to see posts from the other side as coming from a “meaningful” user is an indication that some amount of decentralization is indeed possible with ATproto. As explained in the write-ups linked above (1, 2, 3), however, ultimately the control on what is and isn't accessible to the majority of users on the network remains in the hands of the corporation that controls the main node (see also the Entryway discussion in the BS technical documentation).

This means in particular that at any point in the future BS can introduce changes, or even simply decide to cut off “non-sanctioned” servers, at no cost to themselves. (This is not a hypothetical: if you are hosting an ATproto PDS you must keep it up to date, because BS introduces backwards-incompatible changes from time to time, and falling behind cuts off your connection to the rest of the so-called ATmosphere.)

If this is reminiscent of the threat from Instagram's Threads I discussed in the previous installment, it is not by chance, because the leverage these juggernauts have always comes from the same source: user count. (And that's without even considering that BS also has complete control of the protocol itself, and obviously of the reference implementation which is, AFAIK, what basically everybody uses, not to mention the “distributed” identity provider all BS accounts use.)

We can look at the number game by comparison with the Fediverse. (Note that the following statistics do not account for the contribution from Threads, and not just because getting metrics about it is all but impossible —the best I could do was discover that around 2K federated Threads accounts overall are visible from my Fediverse accounts on instances that didn't join the pact against federating with Threads. I don't have an account on mastodon.social, though, which is probably the Fediverse instance that sees the most federated Threads accounts. Other estimates reportedly put the number around 50K.)

According to FediDB, at the moment of writing, the total Fediverse user count is just above 11M, with a rather stable 1M “monthly active users” (MAU). The largest Fediverse instance, Mastodon's flagship, has around 2.2M users, with around 240K MAU. In other words, the largest Fediverse instance has around 20% of the total users, and 24% of the MAU. These are already considered excessive ratios within the Fediverse, but if for any reason mastodon.social decided to go its own way and cut off the connection to the other servers, they would cut off over three quarters of the Fediverse. The Fediverse would suffer a bit, but would continue to exist without them.

For comparison, according to Wikipedia, BS currently has over 13M users, with 6.8M monthly active (ratios not unusual for a new social network in its growth phase, especially after the mass migrations caused by Xitter's problems in Brazil first, and more recently by the Xitter changes about blocks and the future use of user content for training of SALAMI energy sinks). How much of that comes from “outside” BS?

For the answer, we look at the data collected by @mackuba​@martianbase.net, presented in some statistics and a directory with information about traffic from non-BS PDSes. As of today, for example, the stats show that out of an approximate peak of 2M unique weekly posting BS users, fewer than 6K come from outside BS, and the vast majority (nearly 5K) of these come from the Fediverse “proper” (through the bridge, which also covers the Nostr network I may talk about at some other time, and “direct” bridging of websites), while fewer than 300 come from data servers other than the bridge. From the directory we see that of the over 16K non-BS accounts visible on BS, fewer than 1K do not come from the bridge.

(Of note, the directory explicitly mentions that the list covers non-BS PDSes that are visible to BS itself. There may be more that are not federated. How many is not known, but I suspect that if their numbers weren't vanishingly small we would have heard about them. This is actually another aspect where the “ATmosphere” and the Fediverse differ: statistics about the Fediverse are much less accurate, and almost surely underestimated2 due to how vast and variegated the ecosystem is even across the federation (i.e. without considering isolated subnetworks like Gab) —some instance software, for example, doesn't even report total or active users— whereas the ATmosphere is essentially defined by “what's in BS' orbit”: rather than an atmosphere, it should be considered more akin to the solar system.)

This tells us a few things about BS for our “number games”:

  1. around 80% of the non-BS traffic on BS comes from the Fediverse (through the bridge; this does not account for traffic coming from multi-protocol software such as Friendica instances with ATproto support, but again there's reason to believe that's vanishingly small);
  2. around 96% of the non-BS accounts on BS come from the bridge;
  3. the entire non-BS traffic on BS accounts for less than 3‰ (that's per mille, not percent) of the BS traffic
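
For those who want to check the arithmetic of the three points above, here's a quick Python sketch; the figures are rounded from the stats quoted earlier, and the non-bridge account count is a placeholder for the “fewer than 1K” mentioned above:

```python
weekly_posting_bs_users = 2_000_000  # approximate peak, BS overall
non_bs_posting_users    = 6_000      # posting from outside BS ("fewer than 6K")
via_bridge              = 5_000      # of which, through the Fediverse bridge
non_bs_accounts_total   = 16_000     # non-BS accounts visible on BS
non_bridge_accounts     = 700        # of which, NOT from the bridge (placeholder)

print(f"bridged share of non-BS traffic:  "
      f"{via_bridge / non_bs_posting_users:.0%}")                    # ~83%
print(f"bridged share of non-BS accounts: "
      f"{1 - non_bridge_accounts / non_bs_accounts_total:.0%}")      # ~96%
print(f"non-BS share of all BS traffic:   "
      f"{non_bs_posting_users / weekly_posting_bs_users * 1000:.1f}‰")  # 3.0‰
```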

(Also fun fact: after each “Musk did something” peak, the number of “native” BS posts shows a sharp decline, while that of bridged posts remains constant, when not showing outright growth: i.e. relatively more traffic starts coming in through the bridge from the Fediverse. If the “native” BS traffic were to go back to August levels, the Fediverse contribution would jump up to around 1%.)

Besides backing my initial statement above about the Fediverse being what really gives credibility to BS's decentralization claim, these numbers confirm, among other things, that BS could cut off all external PDSes and it would make barely a difference for them or their users, while everybody else in the “ATmosphere” would essentially remain isolated. This is not how decentralization works. (Compare and contrast with what would happen if the Fediverse juggernaut decided to isolate itself.)

Of course, this isn't going to happen anytime soon: as with any commercial platform in their growth phase, BS will do all they can to encourage developers and users to join in and build stuff that integrates with (i.e. depends on) their platform: they need to cultivate an “ecosystem” before moving on to the “cash in” phase and reap the benefits of centralization.

I highly recommend reading some of @atomicpoet​@mastodon.social's write-ups on how this went with Twitter. A couple of references: this thread on Hootsuite's fall, with its closing invitation to build on open protocols, and even more importantly this thread on the importance of third-party developers for platform adoption, with its near-closing reminder about how misplaced it is to trust a commercial enterprise not to screw you over. Anybody setting themselves up to play with ATproto or trusting BS is setting themselves up for a huge disappointment.

Don't be misled by its characterization as an “open standard”: ATproto and its ecosystem are effectively fully under corporate control. It's not hard to see how catastrophic that's going to be in the longer term, when considering that even the web —based on much more open standards— failed to resist corporate takeover. I refer you again to @rysiek​@mstdn.social's post and its comparison between the BS/ATproto situation and the monocultures that have dominated —and held back— the web, and still do. It doesn't matter what the spec says: ultimately, use is determined by what the juggernauts in the field choose, and for ATproto that's BS, just like for the web it's GAFAM (and today Google in particular). And yes, this is also a problem with Mastodon in the Fediverse (it's a common complaint, in fact), but that is nowhere near the scale at which this is a problem with Google on the web, which in turn is nowhere near the scale at which this is and will be a problem with BS and ATproto. Forewarned is forearmed.

For users, there is no doubt that the bridge provides a useful service, allowing people on either side to follow and interact with people on the other side, even if at the moment its usage is pretty limited: the bridge has 14.7K followers declared on the BS side, and by my estimation 11.1K on the Fediverse side. (Some would like to change that, pumping the numbers up by making the bridge opt-out instead of opt-in (a relevant thread). Informed consent? What's that?)

For BS, there is also no doubt that at the moment the bridge is a net win, providing a veneer of decentralization for an essentially centralized service, and giving them a latch on the Fediverse.

For the Fediverse as a whole, the evaluation of the bridge's effect is less favourable. By giving BS credibility on the decentralization claim, it damages the public perception of what decentralization actually is, and implicitly supports BS' false claims about the (alleged) superiority of the ATproto approach to decentralization (hiding that, on the contrary, all perceived benefits in the BS ecosystem come from its centralization).

On a larger scale, it will also increase the appeal of BS over the Fediverse for people who would otherwise be more inclined towards the latter thanks to the network effect. In this sense, it provides BS with the rug to be pulled that was already discussed in the previous post concerning Threads' ActivityPub integration (when it will be complete, if not already at this stage of development).

As in that case, while it's theoretically possible for this integration (native in Threads' case, provided by the bridge in the BS case) to work in the other direction as well, helping people move to the Fediverse when (not if, but when) the commercial silos start their enshittification process, it's more likely that such a channel will be plugged as soon as the flow becomes a threat to the silos. In the meantime, it's more likely to favour migration from the Fediverse to the silos, which are generally more appealing in this initial growth phase.

(Of course, for both the Threads and BS bridge case, the argument is grounded in numbers. We won't see any meaningful effect from the bridge presence as long as cross-network federation is low, nor from the Threads integration until it reaches completion and sees wider adoption.)

But while in the Threads case the channel (i.e. the rug to be pulled) is entirely controlled by its genocidal parent company, and there's little that can be done about it Fediverse-side except joining the pact against federating with Threads, in the BS case it's almost silly how we're basically weaving that rug ourselves.

But it's worse than that

BS actually has some extra cards to play against the Fediverse, compared to Threads. For example, they can (and do) play the above-mentioned “our approach to decentralization is better” card (even if the only thing that makes it better is … the centralization; but facts don't really matter in the public perception).

Moreover, despite the potential threat of BS pulling the rug (deciding to go its own way) in more mature times, I don't actually think BS will have a particular inclination to cut off external PDSes, except maybe for egregious reasons (not hard to guess one, and it's not “only hosting Nazis”). This isn't so much out of good will as a matter of convenience. Again, as also reported by @rysiek​@mstdn.social, several people have noticed that the structure of the network is designed so that BS can offload work to other parties while reaping the benefits of “centralization where it matters” for itself. In this sense, ATproto is the epitome of capitalism (privatize profits, socialize losses), and chokepoint capitalism at that, enshrined at the protocol level (with BS, of course, setting itself up as the chokepoint).

And before anyone jumps in, remarking that the protocol doesn't explicitly require the existence of a large centralized server: the reliance on «fairly resource-demanding» Relays and App Views for network-wide data collection, distribution and presentation is the relevant cornerstone highlighted in the protocol documentation itself, and clear to anyone who has given any thought to what it would mean to host an alternative relay other than for niche communities.

(This is essentially a discourse similar to what could be done about the open web and search engines. And not by chance.)

While we're at it, one of the most ridiculous statements in the ATproto design documents (in the «“Big World” Design» section, which I can't even link directly because apparently whoever wrote the docs doesn't want that particular section to be linked) is the claim that the protocol is «modeled after the open web itself».

Here are the next two sentences:

With the web, individual computers upload content to the network, and then all of that content is then broadcasted back to other computers. Similarly, with the AT Protocol, we’re sending messages to a much smaller number of big aggregators, which then broadcast that data to personal data servers across the network.

This is one of the worst misrepresentations of what the open web is (or more precisely: what it was intended to be) that I've ever seen. It's even worse than the Series of tubes take. The web, especially the open web, was never about “uploading content to the network” to have it then “broadcasted” to other computers. The key defining characteristic of the web was the interconnection between independent computers, each directly accessible from any other (inter)connected computer. The original design document of the WWW has this to say about publishing documents on the web (emphasis mine):

From the information provider’s point of view, existing information systems may be “published” as part of the web simply by giving access to the data through a small server program. The data itself, and the software and human procedures which manage it, are left entirely in place.

I doubt that the writers of that document made such a glaring mistake out of ignorance or distraction. The misrepresentation is intentional, faking an analogy between the centralizing nature of BS' ATproto and the completely opposite, decentralized spirit and intent of the open web, a spirit that has been all but destroyed by the growth of GAFAM. Ironically, this misrepresentation aims at presenting BS and its ATproto as a decentralized alternative to Big Tech while presenting the Big Tech chokepoint model as the open web model.

They then go on to present three justifications for what they call the “big-world indexing” model —except that the excuses they present are mostly bullshit (I mean, they're not BS for nothing), and when they're not, they still don't require the kind of centralized model being proposed, revealing the actual intent behind such centralization (already discussed in the write-ups linked above: 1, 2 and 3). It's also almost offensive how they don't explicitly mention ActivityPub, which is most surely what they're really comparing against.

Let's see them in detail.

Reduce load on PDSs

(“Make it easier to self-host, you can easily run your own server.”)

The open web model that ATproto claims to be inspired by is a “pull” protocol. When someone is interested in content you publish, they fetch it directly from your server. In an analogy with a social network built on a similar model, the load on your server would typically depend on how many followers you have (think RSS feeds). And yes, if you go viral and/or have a huge following, a 56K home connection might not cut it. There's a reason why slashdotting has become a verb.

The solution to this is to upgrade to a beefier hosting service (especially in terms of bandwidth) if this becomes a routine occurrence, not to move towards centralization for everything and everybody, all in the hands of a single (or a few) corporations.

But it's more likely that the BS document is taking a jab at Mastodon, which is infamously … not particularly efficient to host. However, that's a Mastodon issue more than a protocol issue. (This, by the way, is something you'll hear me say frequently. Some other time I may even write an article about the many ways in which Mastodon is and has been bad for the Fediverse, despite the tremendous contribution it has given to its expansion.)

There are other ActivityPub servers that are considerably more lightweight, especially for single-user (or, more generally, small-server) use. Pleroma is one such example, whose lightness and ease of installation ended up giving it a bad name, because it became the platform of choice for neo-Nazis and related trolls in their harassment campaigns throughout the Fediverse. More recently, GoToSocial has also emerged as a lightweight server, specifically designed for single-user or few-users instances.

Being a “push” protocol, the load on an ActivityPub server depends mostly on how many people are followed by the users on the server. For small servers, and especially personal servers, this is generally going to be on the low end. From a poll I've recently run on the Fediverse, the number of accounts followed by each user is measured in the hundreds, or at worst the low thousands. Even if all of them were heavy posters (say, 300 posts per day), this would still lead to 3 to 5 incoming connections per second, easily handled by a home server.

(And that's a gross overestimation, since in the Fediverse it's not uncommon to end up following hundreds if not thousands of accounts simply because many of them go dormant after a while. There are also other cases, such as people who prefer following hashtags rather than accounts, but that's a separate topic.)
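
To make the worst-case estimate above explicit, here's a quick sketch (assuming, as above, that every followed account is a heavy poster):

```python
POSTS_PER_DAY = 300      # a very heavy poster
SECONDS_PER_DAY = 86_400

def push_load(followed_accounts: int) -> float:
    """Incoming deliveries per second if every followed account is a heavy poster."""
    return followed_accounts * POSTS_PER_DAY / SECONDS_PER_DAY

# Hundreds to low thousands of follows, per the poll mentioned above.
for followed in (300, 1_000, 1_500):
    print(f"{followed:>5} follows -> {push_load(followed):.1f} deliveries/s")
# 300 -> 1.0/s, 1000 -> 3.5/s, 1500 -> 5.2/s: home-server territory.
# Contrast with the "pull" model, where load scales with *followers* instead.
```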

And sure, if you follow several media-heavy posters you may want to watch your disk usage if your server caches remote media files (not all servers do), but again, this does not justify designing a different protocol around centralization.

(There's another side to this, which is the infamous Mastodon stampede that can DDoS not only other instances, but in general any website, especially poorly-designed ones, as metadata is fetched for links and images to build cards and previews, turning from “push” to “pull” and going back to the beginning of this subsection. Again, this isn't fixed by a different federation protocol, but by a different way to federate said cards and previews.)

Improve discoverability

That's called a search engine. You don't need a new protocol to make a search engine, unless you want to make life easier for the search engine so that it can lower costs and maximize profits.

The reason why search on the web is going to shit isn't that there isn't a good protocol for it, it's that monetization has taken priority over providing the actual service. Kind of like the enshittification process that has destroyed all centralized social silos, which is exactly the path that BS will take a few years from now, when it'll need to start paying back the VCs to which it has indebted itself.

The reason why search engines and directories specifically designed around the Fediverse are scarce is cultural, not technical: there is nothing preventing a crawler from browsing Fediverse instances, collecting information about accounts and posts, and presenting the collected information in a way that “improves discoverability”. Many Fediverse users, though, particularly in the old guard, would rather have these services opt-in (informed consent). For the same reason, a lot of users post with more restrictive post privacy than “public”.

This leads to an aperiodic re-emergence of similar situations, in which the techbro du jour sets up a Fediverse scraper that completely disregards this cultural inclination, going for opt-out (if even allowing that at all) on the excuse that “opt-in means few people will join”, and gets put through the grinder for not understanding the basics of consent (flash news: if few people are interested in joining your service when it's opt-in, it means your service is not appreciated; making it opt-out is a bad idea, not genius).

This, again, is not solved by the use of a different protocol, especially not one that is essentially designed around removing agency from the posters.

Improve “quality of experience”

(“Fewer dropped messages, out of sync metrics”)

This is probably the most bullshit of the excuses. A separate discussion should be dedicated to how both points are indicative of a design intent that aims at replicating the toxic usage of social media that has become preponderant in the last decade plus, and that we should be breaking free from rather than encouraging. The “dropped messages” point in particular is FUD, in addition to stoking the well-known FOMO that is instrumental to the commercial social networks' manipulative approach.

And of course, neither of those really needs centralization. Talking metrics, for example: even on Mastodon, which is most likely what this point is —again— taking a jab at, metrics are always accurate on the home server, which should be the only one to care about them.

«I don't know how many people liked or boosted this not-mine message.» really isn't something most people outside of the author should have to worry about, and for the infrequent cases when it does matter (research, for example) it can always be fetched “fresh” from the message home server.
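
As a minimal sketch of what fetching “fresh” counts could look like: this assumes the home server exposes the optional ActivityStreams likes/shares collections on the object (Mastodon does, for example), ignores authentication (some instances require signed fetches), and uses a placeholder URL:

```python
import json
import urllib.request

def fetch_ap_object(url: str) -> dict:
    """Fetch an object in its ActivityPub (JSON) representation."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/activity+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

post = fetch_ap_object("https://example.social/users/alice/statuses/123")
for collection in ("likes", "shares"):
    value = post.get(collection)
    if isinstance(value, str):       # the collection may be linked, not inlined
        value = fetch_ap_object(value)
    if isinstance(value, dict):
        print(collection, value.get("totalItems"))
```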

Again, not an argument for centralization.

As for the dropped messages … ActivityPub doesn't “drop messages”. I assume that what the document authors are taking a jab at is Mastodon's infamously “conservative” approach to remote profile backfilling, full thread federation, and similar related issues.

Again, this is not a protocol issue, but a Mastodon issue. Even alternative front-ends to Mastodon (such as Phanpy) can give the user access to remote content with relative ease. Other server software, such as Friendica, works largely on the principle of loading remote content on an as-needed basis. There are even Mastodon forks that provide better support for importing more remote content on request.

FWIW, the partial thread federation in particular is indeed one of the most annoying misfeatures in Mastodon, and I'm looking forward to seeing it fixed —even though I don't think it'll resolve one of the common complaints derived from it (repeated similar replies from different people that don't see each other's replies), simply because I often saw that same thing happen on Twitter: a lot of people reply to posts without first checking out the thread.

But one thing's for sure, you don't need a different protocol to fix it.

ATproto doesn't actually do anything to solve these issues

What's even worse is that even if these issues were actually protocol issues (which they are not), ATproto does absolutely nothing to fix them. It's not the restructuring of the network into PDS, Relay and App View that “solves” any of those problems, but the fact that there is a single huge centralized node (coincidentally, the one provided by BS, the commercial enterprise) that handles them all.

You would get the same effect on the Fediverse if 99.7% of the accounts were on “the Mastodon flagship instance” (mastodon.social), the few other instances federated with it, and Mastodon did a remote-fetch when opening a thread.

The only reason why “the ATmosphere” isn't seeing any of the issues exhibited by the Fediverse isn't magic fairy dust in the protocol, but the fact that BS offers a centralized service covering all 4 fundamental components of ATproto:

  1. an identity provider (this article is getting long so I'll postpone the rant about DIDs to some other time);
  2. a not-so-Personal ATproto Data Server (PDS) (multiple ones, in fact, to distribute load, but presented as one);
  3. an ATproto Relay;
  4. an ATproto App View;

(Yes, I'm simplifying a bit here. There are other aspects such as labelling and feed generators. There would be a lot to say about this, but the only thing relevant here is how they contribute to the “let others work for us for free” aspect.)
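
To make the point tangible, here is a toy model in Python (not the actual ATproto API, just an illustration of the data flow): the App View can only aggregate what its Relay has slurped in, so metrics are “complete” exactly when a single relay sees nearly everything, i.e. when the network is de facto centralized:

```python
from dataclasses import dataclass, field

@dataclass
class PDS:                       # holds its own users' posts and likes
    name: str
    records: list = field(default_factory=list)

@dataclass
class Relay:                     # aggregates the "firehose" of the PDSes it crawls
    crawled: list
    def firehose(self):
        return [r for pds in self.crawled for r in pds.records]

@dataclass
class AppView:                   # builds timelines and metrics from one relay's view
    relay: Relay
    def like_count(self, post_id: int) -> int:
        return sum(1 for r in self.relay.firehose() if r.get("like_of") == post_id)

bs_pds, indie_pds = PDS("bs"), PDS("indie")
bs_pds.records += [{"id": 1}, {"like_of": 1}]     # a post and a "local" like
indie_pds.records += [{"like_of": 1}]             # a like from an external PDS

# A relay crawling everything sees both likes; one cut off from indie sees one.
print(AppView(Relay([bs_pds, indie_pds])).like_count(1))  # 2
print(AppView(Relay([bs_pds])).like_count(1))             # 1
```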

Now let's imagine a scenario in which ATproto takes off, BS grows to the point it can switch to the “cash in” phase, and just as it starts down the enshittification path to pay back its debts, some Big Tech competitor sets up its own ATproto service (call it BC): identity provider, hosted PDSes, Relay and App View. And of course, since in the beginning it only has few users, the Relay is set up to slurp in the entire contents of the BS PDS(es).

First question: do you think BS will let them do it? Or would you bet on the BC Relay getting blocked by the BS PDS within a couple of days, if not in a matter of hours?

(Hint: this is exactly what led Google Chat to close federation: no problem while the federated servers were small personal ones, but as soon as Facebook Messenger cooked up its compatibility layer and started crawling the Google Chat account network, Google isolated their server. That's the rug pull I mentioned already in the last post.)

Heck, even if BS decides to “play fair” and let this hypothetical BC do its thing, how much data would the BC relay need to fetch to start becoming even just moderately … I won't say competitive, but at least useful, maintaining the promise of “fewer dropped messages, no out-of-sync metrics” allegedly guaranteed by the protocol?

What if this hypothetical BC experiences sufficient growth on its own PDS(es) (by migration —an as-of-yet completely untested alleged feature of ATproto and BS, especially between competing, potentially hostile and reciprocally blocking hosting services— or any other means) and decides they don't need to share that data with BS? How will ATproto help BS “improve discoverability” when they're cut off from half (or more) of the network because the BC PDS(es) refuse to communicate with the BS Relay?

(Yes, some of these topics are similar or symmetric to the ones discussed elsewhere about alternative relays and PDSes.)

So yeah, none of the claimed benefits of the protocol come from the protocol itself, and the only thing at which the protocol could be better than ActivityPub is the DID and its portability promise that still has to withstand the test of reality.

(Not to mention the centralized PLC DID directory under BS control that they promise will be spun off —while still of course remaining a centralized service: what happens when this directory service decides to block your account because your stuff has too much skin or too many female-presenting nipples? —never for the Nazi stuff, BTW, that's always OK for these services, of course. Sorry, I'll stop, I promised I will rant about this some other time.)

Seriously, if you want a truly decentralized social network where you remain in complete control of your data and connections, you'd be better off with something like Secure Scuttlebutt than with anything in the BS orbit, or with a service like Streams, that supports both ActivityPub and Nomadic Identities —anything, in fact, but that centralized mockery of federation that is BS and its AT protocol.

Stop making excuses for them

I consider the lies and misdirection in the ATproto documentation par for the course —after all, they have to sell their business to buyers, assuming federation will ever be a selling point for corporations.

But what's really surprising to me is the number of non-employees falling for it, the useful idiots they manage to get on board. Every time I see someone who is not a BS employee trying to push the idea that ATproto is decentralized I have to wonder where it's coming from, especially since the design documents themselves repeat in multiple points that the protocol is expressly designed for centralization (pardon, “Big World” Design).

No matter how much your extremely cute and/or interesting ATproto-based application claims to be independent of BS, it becomes essentially useless unless it builds on their relay. Ignoring that is either malicious or stupid, and completely misses the point of both ATproto and BS. Oh sure, you can actually design it not to depend on the BS relay: but then you'll soon discover it to be affected by all the pains of decentralization, and you might as well have built it on any other protocol, one not controlled by a corporation created to sell its centralization services under the pretense of “decentralization”.

Since @jdp23​@gotosocial.thenexus.today would like you, readers of mine, to know who I'm insulting in my warning rants against “expansive” definitions of the Fediverse, or even of the concept of decentralization (because they are well aware that by no current definition of the term can BS be considered decentralized, so they need to resort to etymological jungle-gymming to make the entity fit the mold, not unlike the aforementioned libertoloids1 with the concept of freedom or voluntariness to justify the exploitation of people in need3), I feel the need to quote George Santayana here, which has some irony to it4. The following excerpts are from the first volume (Reason in common sense) of his Life of Reason, which you can read in full on Project Gutenberg.

First and foremost, his most famous and frequently paraphrased quote:

Those who cannot remember the past are condemned to repeat it.

There's more to it though, and I feel that the next paragraph makes a better point of it:

Not all readaptation […] is progress, for ideal identity must not be lost. The Latin language did not progress when it passed into Italian. It died. […] when the foundation itself shifts […] progress is not real. […] without this stability at the core no common standard exists and all comparison of value with value must be external and arbitrary. Retentiveness, we must repeat, is the condition of progress.

I was also reminded that Evan Prodromou, considered “the father of the Fediverse”, includes BS in the Fediverse, despite the fact that BS doesn't implement ActivityPub at all —something that is otherwise apparently enough to disqualify diaspora*— and that the only reason it can communicate with the rest of the Fediverse is the bridge —which is a bit like saying that Tumblr is part of the Fediverse because Friendica can bridge to it (like it can to diaspora*).

But then again, Friendica, like most other Macgirvin projects, was always the odd one out in the Fediverse: despite sporting much-needed features (first and foremost that nomadic identity which ATproto claims as an (untested) advantage over ActivityPub), they never managed to gain much traction, and the features themselves have remained largely ignored. And as @jdp23 also points out, this was one of the excuses used by the ATproto developers to reject ActivityPub5 —which of course, as excuses go, is a pretty piss-poor one, since once again it doesn't justify building an entirely new, untested protocol from scratch rather than helping the convergence of the existing ones (which is technically possible, if there's a will to do it).

(Of course, we know why they didn't go that way, and in this case it's probably not even CADT, but a matter of intellectual property, budget justification, and all the familiar reasons that fuel the worst in corporate-driven “open source” development.)

Anyway, I'm not entirely sure what Prodromou's endorsement of BS is supposed to change. Am I supposed to be impressed by an appeal to authority? For an apt comparison, Tim Berners-Lee (the Father of the Web) —for whom I have way more esteem than for Evan— completely burned all credibility when he endorsed the W3C “standardization” of Digital Restrictions Management for the free and open web they were supposed to promote.

This kind of regulatory capture is exactly what's on the horizon for the Fediverse, as I've been saying for a while now (unsurprisingly, Prodromou has had my Fediverse account blocked since the time I started actively posting to warn about said threat to the Fediverse). Why exactly should I waste my time mincing my words? I know where his enthusiasm is coming from, and I still think he should do better than seek recognition via the numbers game.

The bait

With all this being said, as I mentioned before, I'm convinced that BS will be generally welcoming not only of third-party “applications” (especially in the beginning, while they have to build momentum), but especially (and probably for longer) of external PDSes.

One reason that I haven't mentioned yet (OK, that I teased at the beginning) is that it gives them more material for when they'll start looking into the SALAMI grift for cash. Thanks to the unavoidably liberal license that users grant them on the content they share on the network, they'll get a free hand in using it as source material to train their models. Even better, when they start down this path they'll be able to do it without any public announcement asking for permission, since they can formally delegate it to a submarine spin-off or pretend-independent initiative that just happens to build an “App View” (read: a leech on BS' relay's “firehose”) that is a SALAMI training setup, and by the time it goes public it'll be too late to do anything about it.

Although this isn't a direct threat to the Fediverse by itself (in fact, it might even encourage more people to finally appreciate the value of an independent internet, and give more weight to the benefits that come from the pains of federation), it poses an interesting question about the viability of the bridge, given the generally higher sensitivity users on the Fediverse have around these topics. Keep the bridge open, to encourage people to move to the Fediverse (the SALAMI grift is one of those enshittification steps that can really help drive out the artists, for example), at the cost of potentially providing data for said grift, or start cutting it off to protect against the grift, at the cost of making it harder for potential fugitives to migrate?

(On the upside, the approach adopted by the Fediverse bridge to rely on follow relationships as a sign of consent gives the network members some control over what data flows through the channel and when. Of course there are people who'd like to change that, but hopefully if this does change it'll only be on the BS side.)

Threads as the “backup plan”?

All things considered, at least until the famous account migration finally gets “battle-tested”, there is only one sense in which ATproto is fundamentally different from ActivityPub: it's more corporate-oriented. While this mostly matters only “server-side”, it will most likely be reflected in the culture that develops in its space, which will be a clear positive for people with a specific mindset (to wit, the kind of people that find the mutual aid posts in the Fediverse “offensive”): there's no doubt they'll see the protocol as “better” —not because it does decentralization better (it doesn't; in fact, it does it much worse), but because they'll be able to look at it as a platform they can exploit to their benefit— at least until they get bitten in the back by BS' own monetization about-turn.

(… taking their users with them. See also Cory Doctorow's take on why he's avoiding BS, and for balance Molly White on the fact that she's going to establish a presence there anyway. Fun fact: since he published that article, I've been seeing more posts with negative takes on Doctorow or some of the terms he has popularized, like “enshittification”. Which may be just a coincidence, or maybe he just touched a lot of exposed nerves with this one.)

The measure in which this is a threat to the Fediverse is proportional to the number of people BS manages to convince of their lie about the “decentralized” nature of the ATmosphere (with them at the center, of course). As I mentioned, the reason this is a problem for the Fediverse is that it projects the false impression that every single issue people come across in the Fediverse is a specific limitation of that particular network, rather than, as most of them are, a general issue with decentralization. And while it is true that the Fediverse has issues of its own (among which, sorry to say this, the predominance of Mastodon; but I recommend reading some of the stuff written by @trwnh​@mastodon.social to get an idea of all the corners of the ActivityPub specification that need some solid work, or the discussions up on the SocialHub to see how they are handled), the ones most commonly encountered are only solved in BS because of the centralization around their corporate nodes, and not by virtue of their choice of protocol.

That being said, I would greatly enjoy seeing a couple of Big Tech corps pick it up, just to enjoy the fantastic mess that would emerge from them tripping over each other in an effort to steal each other's lemons, pardon, fodder, pardon, esteemed users, to maximize the profit from the leeched data resale. But I suspect this won't happen any time soon, so we'll be left to suffer through a constant stream of lies about the untestable claims of ATproto's superiority.

So what does this have to do with Threads?

At least in theory, the attention that Threads has been giving to ActivityPub through their (partial, opt-in) support for federation could just as well extend to ATproto. (After all, who better than them is in a position to do that, given their extensive computational resources?) I doubt this will happen any time soon though —if ever at all— and ATproto being still in its infancy and largely untested (especially in terms of federation), not fully specified, and controlled by a direct competitor are only the first reasons that come to mind.

(FWIW, ActivityPub itself is substantially underspecified, but it at least enjoys several years of practical experimentation that has defined de facto solutions for some of its less clear points.)

How much is Threads actually interested in federation, though? Regardless of the good intentions of the engineers working on it, does the genocidal parent company actually think it's a goal worth pursuing, or are they just looking into the bare minimum to work around any constraints that would otherwise come from the European Digital Markets Act?

(By the way, this is also another reason why they are unlikely to federate with BS via ATproto: it's more strategic for them to let BS grow until it can be declared a “gatekeeper” (for its own network) under the DMA, and then look into opportunities to exploit there.)

There has been a missed opportunity with the more recent exodus from Xitter. BS has clearly benefited from this (that bump in the stats is incontrovertible), and while a smaller bump was also observed in the Fediverse (visible on both the FediDB and Fediverse Observer stats), there's little doubt that it (the bump, but also the Fediverse in general, and Mastodon in it) has seen much more fanfare compared to 2022. In all this, not only is it unclear how many users Threads itself has gained from this migration (it's surprisingly hard to get any decent historical stats of Threads usage —or at least, I haven't been able to find any), but it's also quite clear that the company hasn't exactly been pushing hard on its support for federation with Mastodon (let alone the Fediverse in general) to appeal to fugitives.

There are obviously several possible reasons for this, other than the obvious “Threads is only paying lip service to the Fediverse, doesn't really see it as a selling point now that the novelty has worn off, and wants people to end up on Threads, not join the Fediverse in general, so of course they won't do anything that gives the Fediverse credibility”.

For example, there's the fact that federation as a concept in general is still widely unknown and misunderstood (which is how BS can sell their claim to be decentralized), and this widespread ignorance contributes to it not being a selling point (aside from the FUD about how “it's complicated” that BS rides to claim they do it better).

And while it's also quite possible that the reason for the silence is that there simply haven't been any significant changes to the federation support since the end of August, it's telling that they don't seem to be playing this card at all, even in the face of the threat of migration off to BS due to moderation issues (something to the tune of “If you don't want to join Threads, at least stick to the Fediverse so you can still follow the people you want to follow that are on Threads”).

Telling, but not surprising, also because those moderation issues keep hitting those denouncing Trump's Nazi sympathies “by mistake”, while hate speech from the other side gets promoted “by mistake” —this of course has absolutely nothing to do with Meta hiring one of the authors of the neo-Nazi Project 2025 driving current Republican politics— whereas the rest of the Fediverse is strongly left-leaning.

Talk about Threads federation being a taint on the Fediverse —and then people are surprised by the . But don't worry. In the meantime, Meta has joined the Social Web Foundation, putting itself in a more solid position to manipulate the future development of the protocols and platforms on which the Fediverse is built, steering them towards more corporate-friendly pastures and away from surveillance-capitalism-resistant initiatives.

(Of course, this is still only the preparatory stage: we're barely at the beginning of the Embrace step in EEE. What really matters is what will come next.)


  1. as I've come to call right-wing libertarian philosophies and self-proclaimed idealists, not to mention those “free-speech absolutists”, that consistently turn out to mean “you should be forced to hear my despicable speech, while I should be able to shut you off for speaking a truth that makes me uncomfortable”. ↩  ↩

  2. as an example of the uncertainty around the Fediverse numbers, Fediverse Observer reports over 14M total users compared to the 11M reported by FediDB, across less than 22K servers compared to the nearly 30K reported by FediDB. ↩

  3. if you find the comparison with libertoloids offensive, I'm not even sorry. I've honestly had enough of anyone assisting exploiters and abusers in their play on potential semantic ambiguity to justify their exploitation and abuse, and doing it for corporate propaganda isn't any better. ↩

  4. George Santayana made no mystery of his racist and eugenic views, as do most libertoloids. ↩

  5. yeah, I know that's not his wording: he's way more diplomatic than I am. ↩

20,000 of these epochs

Unusual technological anniversaries.

I was just made aware that today is the 20,000th day since January 1, 1970. If you don't feel like downloading, building and running ud, the “UNIX day calculator”, you can check the current UNIX day (with some approximation) using standard command-line tools:

$ echo $(( $(date +%s) / 86400 ))
20000

Explanation: date is a tool that prints the current date. You can choose the format in which the date is printed using the +format command-line syntax, and %s is the format specifier for “number of seconds since the epoch”, aka Unix Time, where the epoch on UNIX systems is January 1, 1970 at midnight UTC. If we take its value and divide it by 86,400, which is the number of seconds in a day, we get the number of days since the epoch. (The $(( ... )) syntax is used to perform simple mathematical operations with integer values directly on the command line; the division truncates, which is exactly what we want here.)

Fun fact: the same tools can be used in reverse to determine the date of any particular Unix day. For example, if I wanted to know which was the 10,000th day in Unix time, I would do something like

$ date -u -d@$((86400*10000)) +%Y-%m-%d
1997-05-19

Here we're passing date the parameter -u to tell it to use UTC, and -d to tell it to give us the date at the given time, rather than now. We want this time to be specified as Unix Time (which is what the @ indicates), with 86400×10000 meaning “10,000 days since the epoch” (again, 86,400 being the number of seconds in a day). The +%Y-%m-%d specification tells date to output the given date in year-month-day format.

Now, as I've mentioned before, if you're nerding out on dates there's not much of a point in getting excited about numbers that are only interesting in base 10. Especially when the next such interesting number will come up on February 20, 2052, nearly 28 years from now.

On the other hand, we're even worse off with power-of-two numbers, since day 2¹⁴ was on November 10, 2014, while day 2¹⁵ will be on September 19, 2059.
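(For the skeptical, the same date trick from above can double-check these, assuming GNU date and a shell whose arithmetic expansion supports the ** exponentiation operator, as bash and zsh do:

$ date -u -d@$((86400 * 2**15)) +%Y-%m-%d
2059-09-19

)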

Damnit.

Sparkling wok, episode 2

Can the Wok sparklines be improved?

As I mentioned when I first introduced them, the sparklines I've added to the index pages of the Wok are … satisfactory. I like them better since I've enhanced them with metadata that becomes visible on hover, but I'm still not entirely satisfied with them, and from time to time I consider revisiting the idea.

Rather than the presentation, though, what I'm now rethinking is “what should the sparklines represent?” As I mentioned, for the time being I've opted to use git commits as a proxy for activity on the Wok in general. This works reasonably well for the top-level index, but it becomes a weaker proxy in the individual categories, where I may not be as interested in counting minor fixes (typos, tag case adjustments, and the like), whose commits at times span multiple categories simply because I've opted to introduce a similar change across the whole Wok, even though no category-specific content was added or modified.

(And of course, that's without considering the commits where I update the sparklines themselves, which luckily don't affect the category indices at all, only the top-level one.)

An alternative approach would be to build the sparkline from the date and updated metadata of each post that has these fields. This would give sparser sparklines, possibly even too sparse, as it would miss intermediate commits of drafts that I've worked on over several days, something I often do for longer content (there are articles and other works that have been sitting around as drafts for years now). On the other hand, for readers it would make more sense, as it would reflect when new content deemed significant was added to the Wok.

On the plus side, the data itself is trivial to get, and doesn't even need git. It would be something like this:

grep -h -r -E 'meta (date|updated)=' * |  # find the date/updated metadata lines
  cut -d'"' -f2 |                         # extract the quoted date value
  cut -f1,2 -d- |                         # truncate it to YYYY-MM
  awk '{ c[$1] += 1 } END { for (v in c) print v, c[v]; }' |
  sort -n
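
The output is one year-month and count pair per line, something like (the numbers here are made up for illustration):

2010-08 5
2010-09 2
2010-11 1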

Of course this still needs to be converted to the HTML interactive sparkline, which we can do by ripping out the logic I had implemented in my git chart. And since we're going through awk anyway, we might as well do it all there. This requires some care, because we still want to process months with zero data, which aren't included in our array. This means that within awk itself we must process dates in sorted order, filling gaps, and we need the maximum value to scale the counts.

Both of these can be achieved by sorting the c array, values and indices, in two different steps. We don't want to destroy the original array, and we only sort by values to get the maximum, so we can recycle the “sorted” array, with something like:

len = asort(c, dates);   # sort the values of c into dates (asort/asorti are gawk extensions)
max = dates[len];        # the largest value, i.e. the maximum monthly count
asorti(c, dates);        # now sort the indices (the YYYY-MM keys) into dates

which now gives a sorted array dates that we can traverse to get the commit dates (in year-month format) from the oldest to the most recent. Iterating over this array makes it easy to get, for each date, the number of commits and the scaled size:

for (i = 1; i <= len; i++) {   # indexed loop: "for (i in dates)" would not guarantee order
    date = dates[i];
    count = c[date];
    scaled = int((8*count + max - 1)/max);   # ceiling of 8*count/max, i.e. a value from 1 to 8
    # TODO output date, count and scaled here
}

with the caveat that dates with no commits (that would give a null count and scaled) are not represented.

Since we do want to fill the holes, instead of iterating over the dates array, we can use a slightly different logic: we fetch the year and month of the start of the series, and the year and month of the end of series. Then we simply step through each month, switching to the next year when necessary. This also integrates well with the logic we will need to open and close the year blocks in the output HTML, which we assume is managed by some beginyear() and endyear() functions.

Getting the first year and month in numeric form can be done with something like this

split(dates[1], ym, "-");
firstyear = 0 + ym[1];
firstmonth = 0 + ym[2];

and similarly for the last (giving lastyear and lastmonth, on which the loop below relies). The logic is then something like the following:

year = firstyear;
month = firstmonth;
beginyear(year);
while (1) {
    date = sprintf("%4d-%02d", year, month);
    count = c[date];
    scaled = int((8*count + max - 1)/max);
    output_block(year, month, date, count, scaled);
    if (year == lastyear && month == lastmonth) {
        break;
    }

    ++month;
    if (month == 13) {
        endyear(zwsp);
        month = 1;
        ++year;
        beginyear(year);
    }
}
endyear();

where zwsp is a constant holding the zero-width space we use to allow wrapping between years, and output_block() is the function that prints out the Unicode block element for the given value, with any HTML metadata attached.
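
For concreteness, here is a minimal sketch of what output_block() could look like. This is not the actual implementation (which attaches more HTML metadata), and it assumes gawk in a UTF-8 locale for the character-wise split:

function output_block(year, month, date, count, scaled,    b) {
    # the eight block elements, one-eighth to full height (year/month unused here)
    split("▁▂▃▄▅▆▇█", b, "");
    if (count == 0)
        printf "<span title=\"%s: 0\">&#160;</span>", date;   # no-break space for empty months
    else
        printf "<span title=\"%s: %d\">%s</span>", date, count, b[scaled];
}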

To make the awk script a bit more generic, we can make it a little more aggressive in the “capturing” phase. Instead of a simple { c[$1] += 1} which expects input in the form

YYYY-MM optional junk that will be removed

we can make it hunt for anything that looks like YYYY-MM, with something like

if (match($0, "[0-9]{4}-[0-9]{2}")) {
    date = substr($0, RSTART, RLENGTH);
    c[date] += 1
}

This is possibly a bit too aggressive, but it allows us to pipe anything that outputs a date per line into the script, and get the HTML sparkline for the counts of lines grouped by year and month. You can find the complete script here; I can use it as

git log --pretty=format:%as | ./sparkline.awk

to get the commit-based sparkline, and as

grep -h -r -E 'meta (date|updated)=' * | ./sparkline.awk

to get the date-based one.

It's fascinating to see the difference between the two. At the moment, the commit timeline looks like this:

[commit sparkline]

while the dates sparkline looks like this:

[date sparkline]

A significant contribution to the difference is that some of the articles in the Wok are much older than the Wok itself, since they were “revived” from my older blog(s) hosted on now-defunct platforms. For the most part, though, in the years they have in common the two sparklines are quite similar, except for a few nodes where there's a distinct difference between the number of commits and the number of posts, highlighting times when there was significant “background” activity (revisions, stylistic changes, and the like) that didn't affect content in a meaningful way.

It should be noted that neither sparkline is particularly precise in indicating my activity, since they both skip days when I work on the Wok (or its content) but neither commit the work nor publish a new or updated article (this one, for example, took two days to write, but at the moment of publishing will contribute only one commit and one date).

I'm still uncertain about which sparkline to keep as “main”, and I'm actually wondering if I should keep both. However, I suspect that may be too heavy —maybe a separate dedicated page for the stat-curious (and myself)?

One thing's for sure, I now have the material to regenerate the sparklines at build time, which should allow me in the near future to remove the ones committed to the repository.


OK, that was fast. I have now replaced the “committed” sparklines with autogenerated ones. The build scripts (both the local one on my machine and the one on the server) have been updated to call the sparkline update script, which now generates both the commit and date sparklines, although the date one is hidden by default.

Ikiwiki has a “transient” page feature for autogenerated pages, but in my case, at least at the moment, these sparklines are not pages of their own, but rather snippets to be inserted in other pages (currently just the index pages, but in the future possibly also the promised stat page(s)). For this reason, I'm currently abusing the template system instead, which also ensures that the inclusion of the sparklines does not generate additional markup.

The possibility of showing the date rather than the commit sparklines, and the stats page, remain as future work.

Sparkling wok

A sparkline for the commit history of the Wok, and the challenges it poses.

I've been working on the Wok since mid-2010 (the first commit in the git history was on August 13), even if it was officially made public only on January 1, 2011.

Since then, I've been working on-and-off on it for 14 years, with periods of frequent activity and others where months went by without anything going on. I have an approximate idea of the ebbs and tides of these periods of activity from the commit history of the repository holding the Wok: it's only approximate because the history only tells me when I committed something, and at times I have worked on something for days (or even longer) before adding it to the repository (not very wise, since it makes it easier to lose work, but for these writings of mine I tend to commit only when something is ready, in most cases1). Still, this is something I was curious enough about that I've actually written a git chart command to plot the commit frequency history for me.

I've been thinking sporadically about adding this plot to the Wok itself, in a way that isn't too prominent, and I think I've finally found the right idea in the form of a Tufte-style sparkline, which you can admire on the main index page. To produce it, I extended my git chart command to emit a sparkline using Unicode Block Elements: after grouping the commits per month, each month with at least one commit prints a block ranging from the one-eighth block ▁ to the full-height block █ (a no-break space is used when no commits were made the whole month), giving us something like this (note: the sparkline in this post will not be updated, in contrast to the one on the home page):

▃​▁​▂​▂​▄​▇​▃​▁​▂​ ​▄​▂​▁​▁​▁​▂​▄​▃​▂​▂​▇​█​▂​▄​▄​▆​▃​▂​▁​▅​▃​▂​▁​▁​▁​▂​▂​▂​▁​▁​ ​▂​▂​▁​ ​ ​▂​▃​▁​▂​▁​ ​ ​▁​ ​▁​▁​ ​▁​▂​▂​▁​▁​ ​▃​▁​ ​▁​▁​▁​▁​▂​ ​▁​▁​▁​▁​ ​ ​ ​ ​▁​ ​▁​▁​▁​▁​ ​ ​▁​ ​ ​▁​▁​ ​ ​ ​ ​▁​ ​▁​ ​ ​▁​▁​▁​▁​▁​▂​▂​▁​▁​▁​▁​▁​▄​▁​▁​▁​▁​▂​▂​▁​▂​▁​ ​▁​▁​▁​▁​ ​▂​ ​▁​ ​ ​▁​▁​ ​ ​▁​▁​▁​ ​ ​▁​ ​▁​▂​▂​▂​▂​▂​▁​▁​▁​▄​▁​▁​▂​▂​▁​▁​▁​▁​▂​▁​▂​▅​▂

The results are … satisfactory, but I must say I'm not entirely happy with them. The sparkline serves the purpose of highlighting the overall trend of commit behavior, but it lacks any kind of temporal reference. In console, git chart also outputs a rough temporal axis, with each year spanning 12 characters, and I could add it to the HTML, but this poses a problem when the sparkline doesn't fit in one line: the sparkline and timeline would need to be interleaved. This is actually possible to achieve in CSS, playing around with line height and positioning, something like:

.interleave { position: relative; padding-bottom: 1lh }
.interleave p {
  font-family: monospace;
  line-height: 300%; /* leave room for the interleaved line */
}
.interleave p:first-child { margin-top: .5lh }
.interleave p+p {
  /* overlay the second paragraph (the axis) on the first (the sparkline) */
  position: absolute;
  top: 0;
}

but getting things to wrap at the right place across browsers is non-trivial (even with word-break: break-all) and adding centering to the business only makes things even more fragile.

After going to all the trouble of trying to make it work, I realized that actually adding the timeline under the sparkline would have made it too heavy anyway, so there wasn't much to gain by actually doing it (in a sense, the sparkline alone is more elegant); but at the same time I'm left with a feeling of dissatisfaction: it could convey so much more information —even if hidden at first sight!

There are many ways in which such information could be conveyed.

For starters, instead of allowing wrapping after any block with a ZWSP, the zero-width space could be added only after each (solar) year.

Furthermore, each solar year could be wrapped in a span or abbr element with the year as title. But once we do that, why not highlight the block and make the actual year visible as an overlay on hover? Why not provide a tooltip for each month with the actual number of commits there?
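
Concretely, the markup could end up looking something like this (a hand-written sketch, with made-up class names and truncated block runs; &#8203; is the zero-width space):

<p class="sparkline">
  <span class="year" title="2023">▁▂▄▁ ▂▃▁▂▁ ▁▂</span>&#8203;
  <span class="year" title="2024">▂▁▃▅▂▁▁▂ ▄▂▁▅</span>
</p>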

And while we're at it, why not add a sparkline to each subsection of the wok, to see when that particular section was worked on? (OK, looks like I'll have to get myself scripting …)


UPDATE: a few hours later, I can say that making the sparklines interactive, and available for all category index pages in addition to the home page, wasn't too much effort. I did have to hack my git chart script to produce HTML output basically tuned for the Wok, and I did create a shell script that allows me to update all the index sparklines (because it would be exceedingly boring to do it by hand), but what I wasted the most time on has actually been … styling, which is admittedly something I generally suck at, but which at the same time fascinates me, because of the challenge of (ab)using CSS to get the results I'm seeking (which is still not always possible).

What we do have now is a “year highlight” on hover, with the year specified below, and an HTML+CSS tooltip showing the actual value (number of commits) for each element of the sparkline (in addition to the browser's own tooltip in the form YYYY-MM: C where C is the number of commits). It's probably overdone, but I actually like having this information at hand.

The only “real” downside is that the last element of some of the sparklines is bound to be potentially off by a few commits, unless I run a sparklines' update after each commit —which I don't plan to do. There are other ways I could explore (such as auto-generating an XML on the server, and then importing it via XSLT like I do in my profile page for the RSS links, or like I've discussed here), but honestly I think I'll be OK with the minor discrepancy: as I've already mentioned, it's only an approximate indicator of my Wok activity anyway, so there's no need to worry about it being exact every time.


  1. the most significant exception in this case are some long-form artistic endeavours for which I do commit the draft updates even when the story isn't complete. ↩

Figured figures

Bringing “semantic” figures to IkiWiki and the Wok

I've mentioned before that I'm not satisfied (anymore) with IkiWiki for the Wok, but also that I haven't found an appropriate alternative yet. The net result is that I find myself pushing the limits of what IkiWiki can do, working within the limits of the system when possible, and hacking at it when not.

Last night I took the former route to bring more “semantic”, “modern” figures to IkiWiki, using the appropriate HTML5 tags (figure and figcaption) instead of hand-rolled classes for nested divs and ps.

I don't actually use a lot of images in the Wok, although things have started to change recently, and I haven't exactly shied away from hand-coded images either, especially in the Mathesis, Ars and Ludica categories. In the more recent instances, I had actually already started manually wrapping figure tags around the standard IkiWiki img directives (for “internal images”) or around hand-coded HTML for external images, in place of the (again, manually placed) div tags I was using (much) earlier, so the intent now was to automate it all.

I would have loved to replace the img directive with a figure directive that did the job for me, but IkiWiki doesn't really offer a way to define new directives without writing plugins, so I went with the closest alternative, which is templates. The only downside of this is that instead of a more lightweight [[!figure  …]] syntax I have to use the more convoluted [[!template id=figure …]]. I can survive.

The template is designed to be “flexible”, in that it should be usable in place of both the hand-coded HTML for external figures and in place of the “native” IkiWiki img directive —and even, for the few cases where this is necessary, for the instances where I use obj in place of img, which is for SVGs that would not render correctly as plain images (e.g. because of interactive features, or dependency on external resources).

What the template does is to open a figure tag, then choose “what” to include based on available parameters: if an img parameter is available, then an img directive is re-created (passing through any size, alt and title parameters to it), otherwise an obj directive is re-created if an obj parameter is available, and finally some hand-coded HTML is inserted using the src parameter (which should be present in this case), to produce an img HTML tag wrapped in a link (to the image itself, unless an href parameter is specified).

The nice thing about the template is that it also automatically reuses the title attribute not only for the image (or object), but also for the figure caption, which covers the most common cases. Of course it's possible to override this by specifying a caption attribute instead.
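
To give an idea, the skeleton of such a template could look something like the following. This is a rough sketch rather than my actual template, omitting among other things the obj branch and the size passthrough:

<figure>
<TMPL_IF img>[[!img <TMPL_VAR img> alt="<TMPL_VAR alt>" title="<TMPL_VAR title>"]]<TMPL_ELSE>
<a href="<TMPL_IF href><TMPL_VAR href><TMPL_ELSE><TMPL_VAR src></TMPL_IF>"><img
  src="<TMPL_VAR src>" alt="<TMPL_VAR alt>" title="<TMPL_VAR title>"></a>
</TMPL_IF>
<figcaption><TMPL_IF caption><TMPL_VAR caption><TMPL_ELSE><TMPL_VAR title></TMPL_IF></figcaption>
</figure>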

I've then gone through all of the uses of the img (and obj) directives and hand-coded HTML throughout the source of the Wok, replacing them with the template. I could have actually automated most of this, but I opted for doing it manually, taking advantage of the process to review many of the alt texts for the older images, turning them into proper descriptions rather than copies of the title attribute, as they often were.

(I have the to thank for the deeper understanding I have now of the role of the alt text and how it should be used.)

The end result is that while the markup for the figures is now generally more compact than before (which was by and large my primary intention), sometimes the source files have grown larger, simply by virtue of the new alt texts being longer than the bytes gained by simplifying the markup. At least it's a useful “weight” gain.

There is only one use case that the template doesn't cover yet, and it's the case of the picture element, which was introduced in HTML5 to allow more sophisticated specifications of the images to use. I actually use this in the Wok as part of my initiative to support JPEG XL: the few binary images that are present on the site have both a JPEG XL and a (heavily crushed) PNG version, and they are placed in figure elements preferring the JPEG XL version and using the PNG as fallback.

(I've actually had to hack IkiWiki for this, exporting the “best link” feature in a directive to ensure that links to the images in the source srcset attributes would resolve correctly both for main pages and for inlined content.)
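
In plain HTML, the construct in question boils down to something like this (file names are hypothetical):

<figure>
  <picture>
    <source srcset="image.jxl" type="image/jxl">
    <img src="image.png" alt="a textual description of the image">
  </picture>
  <figcaption>The image caption</figcaption>
</figure>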

A directive or template to replace the whole picture construct in the general case would be overkill for the Wok, so I can probably work something out for my specific use case (two images, same name but different extension, prefer JXL, fall back to PNG —although I should probably allow a JPEG fallback too). A template would be preferable (hacking the IkiWiki source for something this specific doesn't feel like the right way), but at the same time the template system doesn't allow extensive string manipulation (for the curious, it's based on Perl's HTML::Template module), so it'll require some funky syntax if I want to support both PNG and JPEG as alternatives to the JPEG XL.

And of course this would have to be separate from the figure thing, since e.g. in two of the three uses (the 7SEEDS review and my Italian article on the unculture culture, the third case being my “Fediverse debate” comic) there are two pictures in the same figure. (Because . It's always 7SEEDS that makes it, for me.)

All in all, it doesn't seem worth it to look into this yet. At least until I start writing enough to overcome the image-to-text weight ratio issue.

Multilingual Ikiwiki

Some notes on making the wok properly multilingual

Introduction

The Wok is multilingual. Until recently, I didn't particularly bother about specifying the language of any article I wrote (curiously, with the exception of the one about writing SVG by hand, which is one of the first —but not the first!— of my English-language articles).

At first, this was because most of it was in Italian, but as English content grew, I started pondering the issue. Curiously, IkiWiki does have some support for multi-lingual content, through its po plugin, but it's geared towards the original scope of the platform (wikis), as a means to provide an interface to content available in multiple languages (i.e., basically, translated versions of the same page or article).

This is not what I need, which is instead a way to independently mark the language used by each page. Interestingly, I'm not the only one with such a need, but the feature is still not readily available in IkiWiki. And yet it may be considered a rather important feature since, per the WCAG, pages are required to have a programmatically-discoverable language, and adding the relevant attribute (lang) is the way to do it.

So, today, I finally took the plunge and started to work on adding language declaration support in IkiWiki. This is currently implemented in the fork of IkiWiki I run for the Wok. I may even propose it for upstreaming one of these days.

The interface

Declaration of the article language is integrated in the meta plugin, consistently with the author, date, title and most user-managed article metadata: trivially and unsurprisingly, you specify the language with [[!meta lang="lang-code"]] where the lang-code is the language code (see MDN for details).

(My implementation currently doesn't validate the language code, although it does restrict it to upper- and lowercase basic Latin letters, numbers, and the dash sign.)

The language code is exposed to the IkiWiki template system as PAGELANG, but the default template hasn't been changed to accommodate for it (yet). On my custom templates, I use it to add the appropriate lang attribute to the article tag for both main and inline pages. And this seems to be sufficient.
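
In template terms, this amounts to something like the following sketch (my actual templates wrap more than just the lang attribute):

<article<TMPL_IF PAGELANG> lang="<TMPL_VAR PAGELANG>"</TMPL_IF>>
<!-- article content -->
</article>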

Tagging the Wok

The next step was obviously adding the appropriate language tag to all existing Wok articles. Since these now count in the hundreds, I had no intention whatsoever of tagging them manually, so I've relied on a couple of scripts.

The first is a trivial Python wrapper around the langdetect Python library, appropriately called langdetect. There's definitely room for improvement, but the following was sufficient for me:

#!/usr/bin/python3

import sys
import langdetect

# fix the seed for deterministic detection results
langdetect.DetectorFactory.seed = 0

if len(sys.argv) > 1:
    # file mode: print each file name with the best single-language guess
    for file in sys.argv[1:]:
        with open(file, 'r') as f:
            print('{}\t{}'.format(file, langdetect.detect(f.read())))
else:
    # pipe mode: echo the sample, then list candidate languages with probabilities
    sample = sys.stdin.read()
    print(sample)
    print([ (l.prob, l.lang) for l in langdetect.detect_langs(sample) ])

This prints the file name and the best guess at the file language for any file passed on the command line; if text is piped through it instead, it prints the candidate languages with their probabilities. Of course, for our purposes we only care about the first mode.

We need another script to go over all of the documents in the Wok source, and add the tag for the guessed language. I rolled my own as:

#!/bin/sh
# for each source file, detect the language and record it in the page metadata
find -name \*.mdwn | while IFS= read -r fname ; do
if grep -q -F '[[!meta lang=' "$fname" ; then
    printf '%s SKIPPED (lang already set)\n' "$fname"
else
    if grep -q -F '[[!meta author=' "$fname" ; then
        # the last field of the langdetect output is the language code
        lang="$(langdetect "$fname" | awk '{print $NF;}')"
        printf 'setting %s to lang %s\n' "$fname" "$lang"
        # append the lang declaration right after the author declaration
        sed -i.nolang -e '/\[\[!meta author=/a\[[!meta lang="'"$lang"'"]]' "$fname"
    else
        printf '%s SKIPPED (no author)\n' "$fname"
    fi
fi
done

This isn't the fastest, but it does its job.

The downside? Unless your articles set their publication date manually, this will mark all of them as updated, in random order. Which is going to wreak havoc on some inline page ordering.

In my case, I've gone through all the pages that don't set their date, and updated it manually. Not my happiest moment.

Moving to HTTPS

Some notes on my journey to move to HTTPS for the Wok and related sites

Some of you may have noticed that the Wok is now accessible via HTTPS. I expect this may break things here and there (in fact, I've already had to patch up stuff, especially in older articles, which has been an interesting dive into the past), but the transition so far has been going surprisingly (for me) smoothly.

So, why?

I actually tend to lean towards Dave Winer's position concerning HTTPS, although possibly somewhat less extreme. It's a nice-to-have with its valid raisons d'être, but most of those don't actually apply to an open-access, passive, statically generated site such as this.

(And honestly, even most of the claimed benefits of HTTPS are not what they are claimed to be. Just to mention one of my pet peeves, social networks: does HTTPS being used to serve them guarantee in any way that my Mastodon profiles are “genuine”, for any meaning of the word? Of course not. Any administrator with access to the database can pretend I wrote things there I never wrote, for example. And the situation isn't any better —if anything, in fact, much worse— for content served by Big Tech. Some more notes on my position: 1 2 3 4 5 6 7 8 9)

(Maybe I should have called this section “Why not?”)

So, why did I ultimately choose to add it? You'll never guess why.1

So, how?

I took advantage of the Let's Encrypt initiative for the certificate management: I installed certbot and its GANDI plugin (since that's the registrar for my domain), and got a certificate for oblomov.eu. I then realized I needed one for the subdomains too, and because it was late at night and this is something I'm not familiar with, I ended up issuing a new one instead of just adding *.oblomov.eu to the existing one, which turned out to be a problem for access to my “main” site. I finally found how to add domains to a certificate, so I added oblomov.eu to the new certificate and revoked the first one, and then had to convince Firefox that yes, the certificate had changed (it's possible I did something wrong in these final steps).

For the webserver configuration, I used certbot itself to configure HTTPS on my main site, after which I hacked up the configuration files so I didn't need to duplicate stuff for the HTTP and HTTPS virtual hosts (the trick here is to put the actual directives in a separate file, which is then included from both the :80 and the :443 virtual host specifications).
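
I haven't named the webserver here, but assuming nginx (with hypothetical paths and file names), the trick boils down to something like:

# wok-common.conf: the actual site directives, shared by both virtual hosts
root /var/www/wok;

# main configuration: one server block per protocol, both including the common file
server {
    listen 80;
    server_name oblomov.eu;
    include snippets/wok-common.conf;
}

server {
    listen 443 ssl;
    server_name oblomov.eu;
    # certificate paths as laid out by certbot
    ssl_certificate     /etc/letsencrypt/live/oblomov.eu/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/oblomov.eu/privkey.pem;
    include snippets/wok-common.conf;
}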

My intent was (and remains) to keep my sites accessible both via plain HTTP and via HTTPS (for reasons), but I had to ensure that when going through HTTPS the site would be “secure” (meaning: no loading of plain-HTTP resources). This is generally easy to achieve, by simply omitting the protocol part of URLs (which means the browser will use whatever protocol is currently being used for the page). I was not happy, however, to find out that this isn't possible to the fullest extent I intended, since links in RSS and Atom feeds (or in sitemaps, for example) require a protocol, so one has to decide on a canonical one. I've ultimately decided to make HTTPS the protocol for canonical URLs, to save Egyptian politicians from MITM attacks, but I'm not happy about the decision.
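
For reference, a protocol-relative (more precisely, scheme-relative) URL simply drops the scheme, so the browser reuses whatever the page was loaded with (the path here is made up):

<!-- loads via HTTP on the HTTP site, via HTTPS on the HTTPS one -->
<img src="//oblomov.eu/images/example.png" alt="an example image">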

I'm wondering if it would be worth looking again into the Gemini protocol now that I have certificates and everything, although there's still the issue that Gemini clients expect Gemtext and don't play nice with HTML (and I'm not really looking forward to “Gemtextifying” the Wok, nor does Ikiwiki support Gemtext as an output format, even though it shouldn't be too hard given that the source is mostly plain Markdown).


  1. The Gamepad API requires a secure context, and I wanted to add gamepad support to Finger Maze. Of course, once I had to get a certificate for one subdomain, why not go the whole way? ↩

The unbearable lightness of text, part 2

OK, maybe sometimes I _do_ play favorites, but that's not really the reason why 7SEEDS got special treatment.

I have previously discussed how I try to keep the Wok “lightweight” by design, and how a big part of this is achieved thanks to a distinct lack of images and other so-called media files (audio/video in particular) —at least until recently.

Things first changed with my first comic, but not enough to convince me about the opportunity of adding screenshots to my game review. In fact, before today not even my review of 7SEEDS had images, despite my unbounded appreciation for the manga, bordering on the obsessive —although to be fair the review itself focuses on everything but the art of the manga in question, so the absence of even a small sample wasn't that out of place.

If you visit the review now, though, you'll notice that it does feature images in its updated form. Yet curiously, these are not in the section where art is discussed (and yes, they are samples taken from the manga, so it might make sense to discuss them there). Even curiouser, those images were not added to the Wok for the review, but were intended for a largely unrelated article (in Italian) derived from a recent Mastodon thread by yours truly.

In fact, those samples were originally intended only for the Mastodon thread —as I mentioned in the previous part of this series, I am considerably more liberal in adding images to my toots, although I still have a preference for plain text there too. When the time came to collect the thread and massage it into a more presentable article, I had to make a decision on what to do about those images (aside from regenerating them from the original images, given the questionable way in which images are processed by Mastodon).

My approach to this was to recreate the images from the sources I had at hand, and evaluate their inclusion based on size, plus some additional considerations that I'll present below.

As it turns out, the original scanlation that I got the panels from had the images in PNG format. This allowed me to crop the panels I was interested in from the comfort of whatever image editor I had at hand (otherwise, I would have had to use something like jpegtran's lossless crop for JPEG, with a … less optimal UI, and a possible impact on the crop region position and size, due to the crop having to align with DCT block boundaries for the operation not to introduce additional compression artifacts).

After appropriate crushing to minimize the PNG file size, and an additional pass through optipng to strip unnecessary chunks introduced by the image editor, I was left with a 234K file for the larger crop, and a 37K one for the smaller one. (For comparison, a reminder that my first image is 134K in PNG format, and 40K in JPEG XL.)

Fun fact: these images have, apparently, only 17 distinct colors. This is frustrating: had there been one single color fewer, the palette could have gone from being indexed in 8 bits to being indexed in only 4 bits, packing two pixels per byte in uncompressed form, with a likely compression advantage.
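
For the record, this kind of count is easy to verify with ImageMagick's identify (the file name here is hypothetical):

$ identify -format '%k\n' panel.png
17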

By the way, since these images use palettes, there's a possibility of improving compression by reordering the colors in the palette, although I'm not aware of any common-use free software program that actually does that. (If anybody does know of one, I wouldn't mind a heads up.)

Of course this would be more interesting to see in action on the larger crop rather than the smaller one, given the respective file sizes, and of course there are a lot of other approaches that I could have taken, including more aggressive cropping (e.g. by including only Takashi's response in the second image, or Kiichi's question in the first one), or even overcoming my “loss aversion” and going with the removal of a single grey tone.

In fact, even the adoption of a lossy image format could have made sense, since who knows how many manipulations the image had gone through before becoming the “source” for me, and even before that, considering that the dithering that realises some of the halftones is more likely to be a “print and scan” artifact than the artist's intention. But —what's the best way to put it?— I am probably even less willing to take on any responsibility for the degradation of digital data than I am to consider uploading larger files to my website.

To make things even worse, the JPEG XL versions of these images, while smaller than the PNG, are not exceptionally so: about a 30% gain, compared to the massive 70% reduction for my parody comic (to be completely fair, I didn't know about optipng when I did the latter, but now that I've tried, it would only give me an extra 1%, so I won't actually update it —not worth the extra space in the repository). Given my interest in promoting the new image file format, adding these images meant adding nearly 460K to the repository: 10% of the entire source tree!

The biggest argument in favor of allowing these images in was (unsurprisingly) the fact that they could be (re)used in my review of 7SEEDS, and to get an idea of how good an argument that is, consider that the review itself, in source form, weighs 72K: 10K more than the PNG and JXL versions of the second image together. Add in the 25K for the source of the article the images were intended for, and the size of the images, while still large, becomes less impressive.

(And I'm not even counting the size of this article, which I might not have written if I hadn't decided to add the new images, and yes, I did consider that I would be writing it, when making the decision, even if I had no idea how big it would turn out to be —and that's nearly an additional 15K, as things stand now.)

Interestingly, even at 234K the largest image in PNG format is not the largest file in the repository. This also counted as a point in its favor, even if the only text file still larger than an image is a 50,000-word incomplete draft of a long-form fictitious prose narrative (“aspiring novel”, maybe?) —and yes, it's in the repository, but no, I'm not ready to share a link to the draft just yet.

The second-longest text file, which now sits (in size) between the PNG and JXL versions of the images that are the topic of this article, is 232K (barely 2K less than the PNG monster); it is also an incomplete draft of a long-form fictitious prose narrative, currently standing at over 40,000 words, and it could easily overtake the infamous PNG giant if I ever get back to work on it. And this is just Part #1 of a series I have planned spanning at least four parts plus at least a prologue.

Now, I'm not going to deny that my passion for 7 SEEDS might have biased my decision to include the images, and it's quite possible that every excuse I've come up with is just an attempt at rationalization.

On the one hand, if the passion for the manga had been the prime mover, the review would have already given me an excuse to add the images a couple of years ago, especially considering that the long incomplete drafts that compete with the images in size had already been in the repository for years when I started drafting the review. On the other hand, maybe two years ago I wasn't ready to consider including images in the repository, and the necessity to self-host my parody comic had to manifest first, if nothing else to allow me to reconsider my opinion on hosting raster images at a more personal level.

On a third hand, there's also the fact that for the review I would have picked different images, more representative of Tamura Yumi's art, which would have meant looking at larger hosting requirements, not unlike the ones for the Teslagrad screenshots1, as the focus would have been on full-page, detailed scenes rather than just a couple of scattered panels.

And on the fourth hand, to be completely fair, those huge text files I happen to have in my repository are … a bit unusual.

To understand why, the simplest thing to do is to compare them with other incomplete long-form fictitious prose I've dabbled with over the years here on the Wok: although none of these works is complete, they have all been “serialized” in the fashion characteristic of web novels, with each chapter in its own file, and a typical length between 4K and 7K characters per chapter. The longest of these works builds up to over 280K, which would take second place in file size even counting the new images if it hadn't been scattered across 54 files.

So, the main reason why I have such massive text files in the repository is that … I haven't split them in chunks yet. Had I worked on these long-form fictitious prose narratives the same way I did with the others, they would have also been split into chunks (chapters), and from a quick check the average length of those chunks would have been more or less in the same range as the ones linked above. I would still have some “rather large” text files (the 7 SEEDS review being probably the most prominent example), but nothing that could really compete with the larger image files.

It's even weirder when one considers that the reason why those2 particular works of fiction have been kept in a single file (and have not been “published” yet, not even partially) isn't a change of mind regarding how to approach writing and sharing the content, but rather arguably silly “layout issues”: each for its own reason, they have special requirements in terms of presentation of part of the text, and I got stymied thinking of a way to obtain this both at the markup level source-side, and on the CSS side for the final output production.

(This is actually one of the many reasons why I've been thinking of moving away from my current production setup, as I've hinted previously here and there, and one of the reasons why I'd like to move to a static site builder based on , which might make it easier to tackle these aspects.)

When I say I didn't change my mind about how to approach writing, I do mean it —even in the face of the OBTF/BATF movement that has been doing the rounds of the internet in the last decade for everything from to-do lists to note-taking, almost in protest against the enshittification of many useful tools. And I say this as a huge fan of text in general (even for things that aren't), but also as a strong believer in “the right tool for the right job”. And for writing, as I've discussed already, I have a preference for releasing smaller chunks at once.

There are obvious downsides to the serial release, particularly if the whole story hasn't been finalized yet: you may get to a point where you need to revise some of the things you've written, to tighten things up or plug some continuity holes or because you may want to change the order in which things happen or are presented to the reader, and while it is relatively easy on the web (especially in a case such as mine, where one has full control of the medium), one wouldn't generally say that it is preferable.

(I don't follow many webnovels, but I do read a lot of webcomics, and not just page-episodic ones, and while I have come across the sporadic warning about past pages having been updated, I'm really amazed by how rarely this has happened. I know there's a lot of planning ahead, but still.)

So it's natural to accumulate several releases' worth of updates as unreleased drafts, and only beyond a certain point of maturity start releasing them issue by issue while more drafting is done in advance. Without going further off-topic, the question is then: when it comes to a long-form fictitious prose narrative, is it better to have said unreleased drafts as a single file, or would it be better to have them already roughly partitioned into separate chapters? Or maybe something in-between?

I'll leave the discussion for another article more focused on this particular aspect (particularly in reference to my personal approach to writing). Here, I'll just remark that the single-file approach has a peculiar side effect: as the individual parts that are mature enough start getting released, and are thus moved from the “big file” to individual chapters, the current “primacy of text” will decline, even if the total amount of text won't actually decrease, simply by virtue of the single files being dismembered. And I'm not sure I feel particularly happy about that. But it's a bridge we'll cross when we come to it.


  1. OK, not exactly: I'm not sure yet which pages I'd use for the review, but the scanlation I've read is from high-quality scans, so the images are high-resolution, but only a third of them is in PNG format, the rest are JPEGs, so the typical file size seems to hover between 300K and 400K. ↩

  2. and a few others that however have only grown to more modest sizes, in the few tens of thousands of characters. ↩

The unbearable lightness of text

An image is worth several hundred thousand words, in bytes

This is a long-form post based on a previous Mastodon thread by yours truly

I've recently written a review of Teslagrad, a videogame which among its most striking features has beautiful art. Despite this, I have not included a screenshot in the review. While at first it was just a matter of convenience —I had not taken any while playing the game, and wasn't sure if I would have had to restart from scratch to get some good ones— the screenshots are still missing despite me having actually finally taken some.

As I've originally discussed in this Mastodon thread, this has not been an easy decision, and although I may come back to it in the future, it would require me to rethink my ideas on the design of the Wok.

Since its inception, I've endeavored to make this website as lightweight as possible —or at least to maximize its “content to weight” ratio: until recently, posts have almost exclusively been text-only, except for the sporadic self-hosted and often hand-coded vector graphic (feel free to browse the tag here to find more examples).

In the rare cases when raster images were included, they were hosted elsewhere, which has, expectedly, been a source of issues due to the unreliability of said external hosting. (You can find an example of a missing image, which I haven't bothered to fix yet, on this page.)

To avoid this kind of issue, I've started revisiting my stance on image inclusion and hosting. Possibly taking things to the other extreme, I've decided that when adding my own (raster) images to a web page, these should also be tracked in the same source-control repository through which I manage the whole website.

This isn't strictly necessary, since , which I currently use as static site builder, has the concept of “underlay” that can be used to fetch site contents from outside the repository, and this would allow me to keep the repository lightweight (more so than the website, at least). Tracking the images in the repository, on the other hand, has the advantage of keeping everything together; it would avoid issues with having to rebuild the website safely if anything were to happen to the machine hosting it; and, possibly most importantly (and most relevant to the discussion at hand), it encourages for images the same “content to bytes” maximization spirit that has driven my text usage so far.

One of the ways in which I would like to minimize disk usage is using JPEG XL instead of PNG (or even instead of JPEG, but I'm not currently hosting JPEG images), which I cannot do as long as Firefox doesn't enable JPEG XL support in its mainline edition (no, I don't care that Google has decided to boycott it in Chrome, just like I haven't cared about their lack of support and proper animation support for years).

The way I've approached this so far has been to store both the JPEG XL and PNG version of the image in the repository, using the picture and source HTML tags to provide the JPEG XL version as primary and the PNG version as fallback —which hasn't been that bad in terms of storage because the worst offender is a PNG which is ~134KB in size.
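
For reference, the pattern looks roughly like this (a minimal sketch: the file names and alt text are placeholders, not the actual ones I use):

<picture>
  <!-- JPEG XL version, picked up by browsers that support it -->
  <source srcset="screenshot.jxl" type="image/jxl">
  <!-- PNG fallback for everything else -->
  <img src="screenshot.png" alt="Description of the image">
</picture>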

Fun fact: at 134KB, that image is only the third-largest file in the repository (yes, there are two text files that are larger than that; yes, I can be a heavy writer; no, they have not officially been published yet; yes, I may split them before publishing), and even lower in rank when considering the rendered pages, due to how my category and home pages are built. So an image of that size is still … acceptable.

And here we come to the issue with my Teslagrad review: as mentioned, one of the strong points of the game is its beautiful art, and showing it would be best, but showing it needs way more bytes than anything I've added to the Wok so far.

I've (re)played the game in its “Game Plus” mode that becomes available after a (perfect) win, just to take some screenshots to include in my page (I would actually like to include three or four of them), but it turns out the screenshots I've taken are huge, so adding them would be an enormous (pun intended) change in the website's footprint.

Let's get something out of the way immediately: yes, it's at least partially my fault for gaming in 4K (3840×2160, that's 8.3 megapixels, or 25MB per uncompressed, 8-bits-per-channel raster image). The numbers below would easily be lower if I had taken “FullHD/1080p” screenshots (1920×1080 pixels, exactly 1/4th of 4K), but the argument that follows wouldn't change much. And most importantly, the web is full of screenshots at “regular” resolutions: mine would provide nothing other than being “mine”, and while that might have some value for me (but not even that: my ego feeds on other things), they still wouldn't be worth the extra space.

The screenshots I've taken so far are PNGs that range in size from 3.7MB to 6.8MB (26MB total for 5 images). pngcrush manages to squeeze out between 200KB and 300KB per image, for a total reduction of less than 1.5MB across the entire set, down to a generously rounded 24MB.

By contrast, JPEG XL, true to its design goals, manages to shrink the whole set to around 16MB total with lossless compression (that's around 1/3rd smaller), or 4.3MB total with lossy compression at a “visual distance” of 0.5, and even down to 2.7MB total with lossy compression at the limit of “visually lossless” (i.e. with imperceptible loss of information), a “visual distance” of 1.0.
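
For the record, these variants can be produced with the cjxl reference encoder along these lines (a sketch: the file names are placeholders, and I'm assuming a reasonably recent libjxl):

# lossless (distance 0)
cjxl screenshot.png screenshot-lossless.jxl -d 0
# lossy, “visual distance” 0.5
cjxl screenshot.png screenshot-d0.5.jxl -d 0.5
# lossy, at the limit of “visually lossless”
cjxl screenshot.png screenshot-d1.0.jxl -d 1.0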

To give people an idea of why I'm resisting the idea of adding these to the post (and to the repository), the entire history of the Wok packs down to less than 3.2MB (that's the size of the .git directory). All source files (page sources, styles, templates and what little images are there) take up 4.3MB.

4 out of 5 of the screenshots each take more space than the entire current website source (other images included!). Adding both the PNG and lossless JPEG XL versions of the 4 smallest images would grow the website size more than tenfold. Even just adding the “1.0 distance” JPEG XLs would mean a 50% growth. I really don't feel like doing that. But I really think the Teslagrad review would benefit immensely from it.

(Fun fact #2: the rendered website actually takes up around 100MB, but around 80MB of that are just the tag directory, since each tag is its own subdirectory with an HTML, an Atom and an RSS file. The only other sections that take more than 2MB are the three most active categories: Oppure, Tecnologia and Riflessioni, in descending order, accounting for around 9MB of disk usage.)

So, under which condition would I be in a position to change my mind about this?

One possibility to solve the conundrum would be to write several megabytes of text. If I managed to get to approximately 60MB just for the textual part of the website, I could add a single screenshot to the review, since it would then occupy only 1/10th of the space. Actually, make that 80MB, since I'd need both the PNG and JPEG XL versions. (Keep in mind that these figures refer to the Wok source, since, as mentioned above, we're already in the “safe space” if we count the amount of space taken by tags in the rendered website.)

On the other hand, writing 60MB of text takes a lot of time, so maybe by the time I'm done I could do without the PNG version because all browsers will support JPEG XL (wishful thinking).

Jokes aside, this is actually a pretty good example of the difference in storage requirements for text vs images —and I haven't even discussed video yet (but that's extremely unlikely to happen on the Wok).

It has taken me 13 years of on-and-off writing to get to slightly over 4MB. At the same pace, it'd take me almost two centuries to get to 60MB.

I could probably cut that down to a tenth if I could dedicate myself full-time to writing, but that's, shall we say, unlikely to happen. And that's not for lack of ideas on what to write about either (as I assume people who follow me here or on the Fediverse have realized already, I can be very opinionated on a lot of topics, and verbosely so). But writing takes time, and I haven't even sat down to write about the things I've already planned to write about, due to a combination of factors that I'm not even going to bother to enumerate here.

I have fantasized from time to time about setting up a Patreon or something like that (most likely a Liberapay) just to see if anybody would actually back my ramblings financially, about whatever topic I decide to write about, but let's be serious for a moment here: would any of you readers of mine actually shell out any money to ensure I write more often? I'd probably get luckier with the opposite endeavor!

But let's just say I'm glad I have a steady job that pays the bills, even if it does leave me less time (and sometimes willpower —maybe let's just go with spoons?) to ramble —even assuming I'd use the extra free time for that rather than to, say, play videogames all day.

To be completely fair, it's also true that these days I've been writing more consistently from my Mastodon account than on the Wok (I've discussed some of the reasons for this here), and while my “rate of production” of text on that platform is higher, it's still not enough to put much of a dent in the byte count.

I've run some numbers (in fact, compared to my original Mastodon thread, I've run some more recent numbers, which will be the ones presented here). Before setting out to write this article (or to port the original Mastodon thread, if you prefer) I've downloaded a backup of my primary Mastodon account, so while this will not be up to date with when I finish writing the article, it's fresh enough to merit discussion.

The full archive is a 183MB gzipped tarball. Uncompressed, it gives a directory that takes up 271MB. Most of it (again unsurprisingly) is taken by the media attachments, which account for around 170MB. The next largest file is the outbox, which holds a representation of every contribution I've made to the Fediverse: 98209 objects (at the time of the backup), among which 80646 boosts/​reblogs and 16583 posts (plus other stuff that I haven't bothered to look into yet).
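
(If you want to run similar counts on your own archive, something along these lines should work with jq, assuming the outbox follows the usual ActivityPub OrderedCollection layout, which is what Mastodon exports as outbox.json:)

# total number of objects in the outbox
jq '.orderedItems | length' outbox.json
# breakdown by activity type (Announce = boost, Create = post)
jq '[.orderedItems[].type] | group_by(.) | map({type: .[0], count: length})' outbox.json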

I'm a heavy booster, but for the topic at hand we only care about the actual posts I wrote. The JSON format is extremely verbose, but using FediRender, which I discovered for the occasion, it's possible to obtain a more compact HTML rendition of one's outbox, limited to the actual posts (which happens to be exactly what we are interested in).

In my case, the produced HTML takes almost 10MB, which —while significantly smaller than the original 86MB JSON (which to be fair contained much more data)— is still surprisingly large.

In the original thread I had miscounted, arriving at around 110KB, which was actually the number of lines, not characters.

Arguably, 10MB for nearly 17K posts isn't even that much: given the 500 characters limit on my instance, that many posts could easily take 8.3MB if they were filled to the brim. Of course that's not actually the case. There's a lot of overhead even in the HTML file: dumping the content of the HTML file in plain text and stripping out the metadata such as dates and account names drops the whole thing down to less than 2.7MB. (For the curious, that's an average of approximately 160–165 characters per post, quite reasonable all things considered.)
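
(The plain-text dump itself is nothing sophisticated; something like the following gets in the right ballpark, assuming w3m is available and posts.html is whatever you named the FediRender output —stripping the metadata lines takes some extra filtering that I won't reproduce here:)

# render the HTML to plain text and count the characters
w3m -dump posts.html | wc -m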

So yeah, even if I added my entire Fediverse posting and commenting history to the Wok, I wouldn't budge the current size of the website source enough to be able to add a multi-megabyte screenshot without changing the current balance —and that's still being extremely generous, since many of those posts are replies that, taken out of their original threads, wouldn't make much sense posted here anyway. (Some of them might still be worth using as starting points for more in-depth writing, but as shown by my article against federating with large proprietary networks, there isn't always a one-to-one correspondence between Fediverse threads, be they stand-alone or in replies, and Wok articles.)

Still, it is true that my Fediverse writing rates are much higher: we're talking about something between 2 and 3 million characters in approximately 19 months (that's between 3400 and 5200 characters per day, or between 17 and 25 seconds per character), compared to something like 4 million characters over 13 years (that's around 850 characters per day, or 102 seconds per character), and that's just looking at my primary Mastodon account. But is this enough?

(Fun fact #3: I have committed something to the Wok every year since starting the project in 2010. I don't have detailed statistics at hand, but the least active years were 2017, 2018 and 2022, at least looking at commit frequency.)

If I were to write on the Wok at the rate at which I write on the Fediverse, it would still take me no less than 60 years to add those 80 million characters of textual source that would allow me to post one screenshot and keep things balanced, but I don't think anything like that is ever going to happen.

(Fun fact #4: by the numbers seen so far, a losslessly compressed 4K picture may be worth months if not years of writing —way more than a thousand words!)

What I do plan on doing (but that's independent of the screenshot thing) is to “port” more of my threads to the Wok, in the spirit of PESOS. I try to keep the Wok in line with the longer Mastodon threads, but I know I have missed some. I may even go as far as working on some improvements to the rendering of Mastodon threads, to give me a better idea of what I have and have not transcribed yet.

(Fun fact #5: the source for this article is nearly two times larger than the original Mastodon thread.)

(Part 2 of this series is available now.)

Preparing for the end of the open web

How the Wok is changing in preparation for the final throes of the open web as we used to know it.

Against the Web Environment Integrity proposal

In April this year (2023), Google (or at least some of its employees) came out with a proposal for a Web Environment Integrity attestation mechanism, also known as Digital Restrictions Management (DRM) for the web: one of the most outlandish, nightmarish attacks on the open web from a dominant player in both the client and server space.

I've never felt more pressured (by myself) about never having gotten around to writing a follow-up to my (Italian) article on the dangers of monocultures on the web, a follow-up I've been thinking about for years —since, in fact, WebKit became the most-used rendering engine, even before it was forked by Google into Blink, which was then adopted by large swaths of the FLOSS browsers.

Although I've touched on it briefly in my Opera Requiem series, I've never discussed in-depth how dangerous these de facto monopolies are even when the monopolistic product is open source: this is important to mention because many of the more sophisticated users and web developers who fought against the Internet Explorer monopoly settled down when the browser war was “won” by Chrome, on the assumption that, just because the winning browser was FLOSS, there was no danger of the kind of abuses that Microsoft could get away with thanks to the IE6 dominance.

We've seen time and again how far from true this is, and I've mentioned my pet peeves several times (e.g. here): RSS, SVG, SMIL, MathML and more recently JPEG XL. And the fact that Google has gone forward with implementing their own proposal in their Android browser, despite the enormous pushback the proposal has received, is the umpteenth nail in the coffin of Google's trustworthiness with respect to the open web.

I've been aggressively using on this site all the web standards sabotaged by Google, putting up “infobox” warnings about possible misrendering due to allegedly ‘modern’ browsers missing support for useful web standards (e.g. here), so the time has come to finally tackle the Web Environment Integrity misfeature. To this end, I've followed the example set by @77nn@livellosegreto.it, taking action through a small piece of JavaScript loaded by all pages that checks for the existence of navigator.getEnvironmentIntegrity.

On this site, if the symbol is found, a JavaScript alert is given, and a banner is added to the top of the page, warning about the dangers of using Chrome and recommending switching to a different browser. I may tune the alert and/or message in the future, but ironically I don't yet seem to have a browser that actually supports this harrowing functionality. (For the curious: this is the JavaScript, and this the CSS for the warning box.)
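
The gist of the check is straightforward; this is a minimal sketch of the idea (the actual script linked above differs in the details, and the class name here is hypothetical):

if ('getEnvironmentIntegrity' in navigator) {
    // the browser exposes the WEI attestation API: warn the user
    alert('This browser implements Web Environment Integrity. Consider switching.');
    const banner = document.createElement('div');
    banner.className = 'wei-warning'; // hypothetical class, styled by the CSS
    banner.textContent = 'This browser implements the Web Environment Integrity DRM: consider switching to one that respects your freedom.';
    document.body.prepend(banner);
}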

And a side dish for automation

While I was at it, I've finally added a robots.txt control file to keep away well-behaved “machine learning” bots, and I've added the Wok to the Marginalia search engine index. I'm guessing the next step would be to make the site accessible through the Gemini protocol, although for that I would also need either a Markdown to Gemini text transpiler, or to push for web browsers to add support for Gemini while still accepting HTML and friends.

Analytics (2023-08-10 update)

As pointed out by some of my readers, it's a bit hypocritical to rant about Google's dominance on the web when I still have Google Analytics tracking code on my website —which was actually something I wanted to mention already in the first draft of this article, but as things went it got late at night and I forgot, so let's have this minor update.

There isn't much to say: GA is a pretty invasive tracker, and I've been intending to replace it for a while now, particularly since finding out about Plausible. So the plan is to replace GA with a self-hosted Plausible. The question (I'm not Oblomov for nothing) is when I'll finally get to it. As it happens, GA had actually stopped working anyway, since I was still using the old v3 tracker, so I decided to just get rid of it altogether in the meantime. I guess I'll just have to wade through my Apache logs for visitor information for the moment.

And still more LLM scrapers (2024-08-29 update)

Last night I noticed that my access logs for the month were exceptionally dense, and I found out that the culprit was a new LLM scraper. For those interested, the User Agent string is

Mozilla/5.0 (compatible) Ai2Bot-Dolma (+https://www.allenai.org/crawler)

The bot does seem to check robots.txt, so I've updated mine, but I don't know if it actually respects it. Let's see if it's sufficient or if a more aggressive approach is needed (scraping seems to have lasted from August 17 to August 26, so I'm not sure if it'll happen again soon, though).

I seriously have to start thinking of a lower-level approach to detect “AI” crawlers and use the detection to poison the contents instead of just asking them to skip the website.
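
As a stopgap in that direction, something at the web server level could at least refuse them outright; a minimal sketch with Apache's mod_rewrite (the User Agent list mirrors my robots.txt below and would need to be kept in sync; the poisoning variant would have the RewriteRule point at a decoy generator instead of just failing the request):

RewriteEngine On
# match known LLM scraper User Agents, case-insensitively
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|Google-Extended|anthropic-ai|Ai2Bot) [NC]
# serve them a 403 instead of the actual content
RewriteRule ^ - [F]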

My robots.txt

As of now, for the curious, these are the contents of my robots.txt:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Ai2Bot
Disallow: /

C++ templates and object properties

A C++ template metaprogramming approach to generic interfaces for object properties.

One of the beautiful things of the Fediverse is the number of interesting people and the stimulating discussions they fuel.

This article stems from a minithread by @tess​@mastodon.social noting that there are things that C++ templates cannot achieve yet, and for which one still needs to use preprocessor macros, which have their own share of problems.

I'll leave it up to the thread to give details, but the problem space is the following: we want to be able to assign properties to objects in such a way that, if a class C has a property Prop of type T, then the object should respond to a common set of methods (member functions) such as:

T C::GetProp();
void C::SetProp(T)
Subscription C::AddPropChangedCallback(Callback)

and so on. The gimmick here is that for each property you need all these functions, and they all do exactly the same thing, for different variables. And the reason why this cannot be done purely via templates is that the name/​value of the parameter used to instantiate a template cannot be used to build the names of the functions.

I remarked that this is actually possible using templates, as long as we allow for a slightly different syntax and go deep into meta-programming territory.

The objective is to build a “generic property mixin” that can be used as a base class for any number of properties, with a slight change in syntax:

T C::Get<Prop>();
void C::Set<Prop>(T)
Subscription C::AddChangedCallback<Prop>(Callback)

For simplicity, we will limit ourselves to the getter and setter functions. The code assumes C++14 (with some effort the requirement can be lowered to C++11, and with a lot of effort even to C++98), and is licensed under the GPLv3.

#include <type_traits>

/* A structure encapsulating a list of types.
   We will use this to determine if a type is present
   in the list or not */
template<class... List>
struct type_list;

/* Generic case */
template<class Head, class... Tail>
struct type_list<Head, Tail...> {
    /* Type alias that maps to Head if Needle matches it, and looks
       for Needle in the Tail of the list otherwise */
    template<class Needle, class Fail>
    using find_type = std::conditional_t<
        std::is_same<Needle, Head>::value,
        Head,
        typename type_list<Tail...>::template find_type<Needle, Fail> >;
};

/* Special case for a list of length one */
template<class Head>
struct type_list<Head> {
    /* In this case the Fail type is returned if Needle isn't found
       (i.e. if it doesn't match Head) */
    template<class Needle, class Fail>
    using find_type = std::conditional_t<
        std::is_same<Needle, Head>::value,
        Head,
        Fail>;
};

/* Properties are defined by tag classes (declared but not defined) */

/* The special tag class PropertyNotFound will be used
   to return meaningful errors when trying to access a property
   for an object whose class does not have it. */
struct PropertyNotFound;

/* Property traits class:
   this defines the characteristics (traits) of a property.
   It should define at least its `type`, but of course it can be expanded
   for more introspection (e.g. it could include a name, etc.) */
template<class P> struct prop_traits {};

/* Basic mixin for a single property:
   defines unqualified get() and set() */
template<class P>
struct prop_mixin_base {
    using type = typename prop_traits<P>::type;
    type value;

    type get() { return value; }

    void set(type const& in) { value = in; }
};

/* The actual mixin for the properties is a variadic template,
   taking all the property tag classes as parameters. */
template<class... Ps>
class prop_mixin :
    /* derive from the basic mixin for each property */
    public prop_mixin_base<Ps>...
{
    /* Type alias to simplify syntax */
    using props = type_list<Ps...>;

    /* Type alias that maps a property to the corresponding mixin,
       falling back to PropertyNotFound */
    template<class U>
    using find_base = prop_mixin_base<
        typename props::template find_type<U, PropertyNotFound>
    >;

    /* Type alias that maps a property to its type,
       to simplify syntax */
    template<class U>
    using prop_type = typename find_base<U>::type;

public:

    /* User-facing interface: template functions for getter and setter
       that map to the corresponding basic mixin unqualified functions */
    template<class U>
    prop_type<U> get() { return find_base<U>::get(); }
    template<class U>
    void set(prop_type<U> const& in) { find_base<U>::set(in); }
};

/* Example usage: defines a Width and Height property, and an Unused one */
class Width; template<> struct prop_traits<Width> { using type = int ; };
class Height; template<> struct prop_traits<Height> { using type = int ; };
class Unused; template<> struct prop_traits<Unused> { using type = int ; };

/* A class C with Width and Height property */
class C : public prop_mixin<Width, Height> { };

#include <iostream>
int main()
{
    C c;
    c.set<Width>(1);
    c.set<Height>(2);
    std::cout << c.get<Height>() << std::endl;
    std::cout << c.get<Width>() << std::endl;
    // c.set<Unused>(3); // errors out complaining about a PropertyNotFound
}
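
For those who want to try it out: the example is self-contained, so (assuming it's saved as props.cpp) something like

g++ -std=c++14 props.cpp -o props
./props

should compile and print 2 and 1 on separate lines, while uncommenting the last line of main() should produce a compile-time error mentioning PropertyNotFound.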

The Fediverse is (not) (.)uno

Some personal observations on the ideal and actual structure of the Fediverse, and on the shadow cast by the Devol group.

… that my father bought at the Mercatone

By now everybody has heard of Mastodon, the microblogging platform that found explosive success as an alternative to the moribund Twitter after Elon Musk's takeover —which, as many expected, quickly proved disastrous for the quality and reliability of the platform, both technically and socially.

What not everybody knows is that Mastodon is not only not a single site, being distributed across a large number of independent instances that can nevertheless interact with each other, but that the platform itself is just one of many (even if undoubtedly the most widespread at the moment) in a much more variegated Fediverse, a network joined by services of every kind: indeed, it is not even the only microblogging platform of its kind, a category that includes, besides the forks of Mastodon itself, platforms such as Pleroma and its forks (e.g. Akkoma) or MissKey and its forks (e.g. CalcKey). To these we can add platforms like Friendica (which, not by chance, will remind many of early Facebook), Pixelfed (inspired by early Instagram), PeerTube (for video sharing), Lemmy and kbin for those nostalgic for Reddit, and so on.

In Italy, especially over the last year, the conversation around Mastodon and the Fediverse has been dominated by the Devol group, which runs some large instances that, because of their use of the .uno TLD, have generally become known as “il Mercatone”. While the “generalist” instances of the Devol group have had the merit of giving ordinary people a place, thus allowing, in a sense, a “massification” of Mastodon and of the Fediverse in Italy, it is worth remembering that the Italian Fediverse existed long before their involvement, and that the Devol group has applied rather aggressive, and at the very least questionable, strategies to establish itself —from banning the largest anarchist instances, placing them on the same level as child-pornography and alt-right propaganda ones, to censoring anyone who documents their policies (ongoing)— to the point that many advise against using their instances.

The Italian Fediverse

If someone boasts about running “the first Italian Mastodon instance”, the first thing one should ask is: in what sense? Chronologically? By number of registered users? Of active users?

For instance, the “oldest” Italian Mastodon instance still active today is probably the one run by the Bida collective, opened in 2018; although it was preceded by other Mastodon instances that have since closed (one of which is mentioned below, in its “resurrected” form), and although the Fediverse at large includes even older ones (again, see below), Bida can also boast of being the oldest persistent Italian “generalist” presence in the Fediverse. Almost coeval is that of Cisti.org, and here we can start to notice how these early experiences were born in anarchist and anticapitalist circles, ideologically antithetical and unsympathetic to —and often victims of censorship by— the commercial platforms, and sometimes, unfortunately, by certain instances with evident majoritarian aspirations that shall remain nameless.

To these, at least twenty or so other instances open to the public have since been added, including (to my knowledge, and currently active):

as well as several individual instances (which I won't cover for reasons of space, but some of which you may discover by following my account on sociale.network1), including some blogs with the ActivityPub plugin enabled, and some instances overflowing with alt-right propaganda, from Putinism to anti-vaccinism, which I won't discuss for obvious reasons. (For reports, additions and corrections, you can leave a comment under the Mastodon post I made about this article.)

The numbers

Browsing through the servers listed above, one might complain about the small number of active users on many of these instances (measured in the tens or hundreds, a couple of thousand for the largest), especially when compared to the tens of thousands (“numbers that make your head spin”, as Ingegner Cane would say) boasted by the Mercatone.

This remains a point of strong contention in the (not just Italian) Fediverse, between those who show a certain obsession with numbers and growth, be it for prestige or otherwise, and those who have instead found in more modest numbers a healthier equilibrium, where growth is preferred when “organic” and (preferably) “horizontal” —a divergence of opinions that could create a rift in relatively short order with the entry of Threads (the microblogging platform tied to Instagram) into the Fediverse, a threat to its health that I've already discussed (in English) here.

Which of the two approaches is the better one, only time will tell, but I think that the fact that just today the Mercatone servers went down from overload during the umpteenth migration wave from Twitter, caused by the mad choices of its owner, should make one reflect on which is the more resilient.

More than that, it reveals the paradox of the obsession with “vertical” growth: by leading to the collapse of the single server (advertised as the server of choice to register on), it casts the whole Fediverse in a terrible light and dissuades people from visiting it, whereas a more diffuse, “horizontal” growth keeps everything running, thus encouraging people to stay.

The philosophy

The dividing line between the two ways of thinking about Fediverse numbers follows rather faithfully a philosophical divergence on what the Fediverse should be: a project to overcome the logic that drove the corporate “silos” towards the centralization of the Internet, or an excuse to replace them, exploiting free or cheap labour and the naïveté of the many.

In these days, when the ever-growing instability of Twitter, the enshittification (to use the term popularized by Cory Doctorow) of Reddit, and the emergence of the dramatic effects of poor moderation on Facebook (including the genocide of the Rohingya in Myanmar) all make the problems of these proprietary, centralized platforms ever clearer, it would be wise to stop for a moment and consider, in the search for alternatives, what it is that we should value.

Especially if we are the first to move, to seek out the new, let us try to learn the lessons of the last two decades, matured above all in these recent years: and let us remind others, too, that now, perhaps more than ever, is the time to give a chance to something that, while it may seem less flashy and polished at first glance, is built to last and to liberate, not to surveil and manipulate. Let us make an effort to steer the communities we belong to towards a more positive horizon than the one they have been tied to so far. And if we really cannot, let us re-evaluate the opportunity, the ethical basis even, of the contacts we maintain with those who refuse to listen.


  1. unless you are on the Mercatone, which has suspended my account for the reasons described above; if so, complain to your instance's administrators to get access; ↩

A credible threat to (and from) commercial social network silos/1

The Fediverse, especially through Mastodon, has been acknowledged by the major players as a threat —to be eliminated.

Foreword

This article is born out of several threads I've written on Mastodon (relevant references: here, here, here, here and here at least, but also see this and this for additional arguments) after the announcement of a Meta/​Facebook project codenamed Project 92 (P92) and also known as Barcelona that would be “compatible with Mastodon” (common speak for supporting the ActivityPub federation protocol).

(Update: If you got here from the Fediverse, you will probably also be interested in my call for a pre-emptive Fediblock on Meta's platform.)

(Update 2023-12-13: the platform was launched under the name Threads on 2023-07-05 in the USA, is scheduled to launch in the EU as well tomorrow, and has apparently started testing ActivityPub integration today.)

The article tries to present my perspective in a more organic way, and includes some additional considerations that I've more recently had the opportunity to make not only about what the Fediverse may expect from companies like Facebook “joining” it, but also from others like Bluesky Social choosing to not be compatible.

(And yes, as we're now in June it's getting late, and it's possible P92 will release before I finish putting this in place, but most of my predictions are for much later, 3–5 years from now at least, so they may still count as such rather than as a retrospective.)

One of the most important aspects to take into consideration is what I mean by some piece of technology or protocol getting “killed”: some people interpret this in a very strict sense, considering something dead only when literally nobody uses it anymore, while I (and several others) prefer a much more lax interpretation, where assassination is achieved when something is successfully removed from the awareness of the general public.

In the latter sense, it may even be argued that the “app mindset” pushed by modern mobile devices (started by Apple) is on the path to killing not just the open web, but the World Wide Web altogether, shifting content distribution from browser-accessible web pages to native “apps”.

But this is a discussion beyond the scope of this article.

When Google killed RSS

There's some debate on what exactly killed RSS, or even if RSS is dead in the first place, as it is still extensively used —although mostly behind the scenes— on several platforms, but it's generally understood that the main blow to RSS came from Google discontinuing their Google Reader.

A relevant thread started by @eniko​@peoplemaking.games does a pretty good job at collecting the different opinions.

The OP's stance is that RSS isn't really dead, and that even if it were, it wasn't Google's fault, as anybody could have created an alternative to Reader —that nobody did shows there just wasn't enough interest in RSS in the first place.

My readers know that I agree on the fact that Google wasn't the only party contributing to the RSS downfall —for example, I also blame Mozilla for pulling their Live Bookmarks feature from Firefox without adequate replacement— but Google's role cannot be overestimated.

Two key weak points in @eniko's post were addressed in the comments to the aforementioned thread.

The first pertains to the semantics (which I've addressed in the introduction) of what it means to kill something like RSS. As mentioned by @elrohir​@mastodon.gal here and by @luigihann​@mstdn.social here, what Google shutting off Reader achieved was to kill off the mindshare that RSS had accrued, turning it from “something every site should have” into “well, if Google is looking elsewhere, it mustn't be that important anymore”.

Several other comments highlighted the second aspect, that the Reader shutdown also removed the “social” aspect of RSS perusal, just as the general public attention was being corralled to centralized, ad-monetizable social networks —which probably also explains why the Reader was sunset: it went against the prevailing “attention economy”.

So I would argue that not only was the shutdown of Google Reader the primary cause of the RSS demise, but that this was actually intentional, to wipe out from the general consciousness the awareness of the possibility and existence of decentralized, user-controlled forms of content distribution.

Several users have also remarked that RSS and Atom feeds still exist, citing podcasts as the primary application (in fact, it could be argued that it's not a podcast if it's not available via RSS). I'll add that there's another form of media that still heavily employs RSS, and those who follow me won't be surprised by my mention of this: webcomics. A lot of self-hosted webcomic sites are built on WordPress, so RSS availability isn't a big surprise, but I appreciate that (or rather, when) dedicated hosting solutions also offer the feature. (Comic Fury does it, and even WebToons, while AFAICS Tapas does not.) I use a self-hosted Tiny Tiny RSS instance to follow feeds, and the largest active (and still growing!) category is comics. The vast majority of the blog feeds I (used to) follow are either dead (as the hosting site shut down) or inactive, last updated some 10 years ago, with few exceptions (among which, the feeds from this site).

(And for what it's worth, I still visit the original posts, even when the full article is available in the feed itself.)

And of course, Mastodon and other Fediverse platforms also support RSS, as producers, as consumers, or both (Friendica being the best example of the latter). Several news sites still have feeds too, even though they don't always link to them in any user-visible part of the page, or even at all (secret URLs FTW).

Hence the importance of stating what exactly is meant by “Google Reader (or whatever else) killed RSS”, and why the mindshare aspect mentioned above is so important.

The technology is still alive, it's still being actively (and passively) used. Feeds are still being produced by a surprisingly large part of the Internet, and still being consumed, but the general public has been made unaware of this, to push the centralized social network paradigm of the attention economy. Unsurprisingly, I'm convinced there might be an opportunity now, with the Twitter takeover bringing attention to the Fediverse and rekindling interest in POSSE and PESOS, to reverse the trend.

And yes, as individuals this can be extremely frustrating: even knowing what would be best to support the open web, the feeling is that our choices have little effect, if any. And in this context, the non-ubiquity of RSS production and consumption “where people gather” often means a need to double or triple the efforts to reach out and connect with other people with matching interests, especially for those whose livelihood depends on it.

I do have a feeling that we're approaching another paradigm shift: as the GAFAM enshittification progresses at increasingly rapid pace, times are getting ripe for the emergence of new contenders to the search engines and social silos that dominated the last 15 years, and ironically the strongholds of the old web are more likely to survive this change than anything that moved to the now sinking giants.

I'm not as optimistic as others about the open web and indie web making a return comparable to the glory days of the blogosphere, but I do have a feeling that it will be a bridge, and those finding their space there will have an edge in whatever the new paradigm is going to be, even if it currently seems like a cause lost in the mist of unfriendly User Agents and single-user apps.

I'll take this opportunity to renew my plea to the browser makers that claim to be in favor of the open web: it's way past time you put your engineers where your mouth is, and got back to leading the adoption of open technology, instead of blindly following the “leader” in its destruction of the same —and not just when it's a matter of hype.

(And yes, that was really aimed at Mozilla, as an invitation to bring back RSS support in Firefox, and to openly promote JPEG XL instead of keeping it hidden in Firefox Nightly —and more in general to go against everything that Google's chokehold on the web currently represents. Support user choice instead of crippling it. Prove that you are different.)

The Fediverse is a threat to Big Tech silos —and they're preparing to counter

Shortly after posting about how Google killed RSS (as discussed in the previous section), I was made aware that Meta (Facebook's and Instagram's parent company) has been working on a “Twitter competitor”, codenamed P92 and also known as Barcelona (more recent news seems to indicate that the actual name of the product will be “Threads”), designed to “interoperate with Mastodon” (which is “common people” speak for “will support the ActivityPub protocol”).

The timing of the news and my original thread was quite serendipitous, as it allowed me to recall that Facebook played a role in the killing of RSS, through strategies better known from Microsoft's fight against open protocols and FLOSS: Embrace, Extend, Extinguish (EEE). Luckily, the memory is still fresh enough for some of us to realize that Meta's actions represent a threat to the Fediverse.

The obvious recommendation here is to defederate with extreme prejudice any and all Facebook-associated instances. We know how it will play out, we've seen it happen again and again. It doesn't matter how nice and cooperative they will be in the beginning: this is a power move against the Fediverse, a move which it has no chance of resisting if even an inch is given.

(You want tags for that? We have tags! , , )

The only way in which this news can be viewed positively is that the decision confirms the Fediverse as a threat to the social silos. Let's keep it this way. Do not trust Facebook. Do not forget that just last week Instagram tried to block the tag (relevant thread), lest people became aware of the existence of the world outside, a world that still has the appeal of early-days Instagram, and is not subject to the threat of enshittification.

I'm quite confident that many (most?) of the people on the Fediverse (especially the old-timers), and most of the admins, will know better and avoid Facebook affiliation (pardon, federation). But here's the million-dollar question: will Mastodon's own founder and creator Eugen Rochko resist, or will he fall for the lure, trustingly welcoming the Trojan horse of a Facebook-managed ActivityPub instance, believing it a sign of “victory”?

I would have loved to claim “most” in the paragraph above, but as the news circulated I found that the amount of people deluding themselves into thinking we should welcome the upcoming Meta instance with open arms was much higher than I expected. Truly people never learn anything from history. Seriously, how can we ever hope to find a solution to things like anthropogenic climate change when we can't even cope with the threat of surveillance capitalists siphoning off all the energy from the opportunity of a breakout?

The problem with awareness is, again, frustration: knowing how things are going to turn out, being powerless against people with stronger voices or with more power dismissing the warnings and insisting on pursuing their delusions of impermeability to the laws of physics (for climate change) or surveillance capitalism (on the Internet), so that the only thing that's left in the end is being in the position to say «I told you so.» when things inevitably go as you predicted.

(Relevant XKCD)

It gets worse: unless the Fediverse starts guarding against the Facebook torpedo now (and by the looks of it, I'm starting to doubt it will), the people who don't believe the warning will come up with excuses for why all the energy will have been siphoned off the ecosystem, blaming everything but the elephant they let in thinking it couldn't lay waste to the china shop. They'll add insult to injury.

There's one thing I've been thinking again these last couple of days: would the situation be different if the client-to-server part of the ActivityPub specification had found the same level of adoption as the server-to-server side? One of the ugly things in the Fediverse today is that each platform has its own API to communicate, which is burdensome for clients, and fragments the ecosystem.

I know there are historical reasons for this (most major Fediverse platforms either precede the ActivityPub spec and later adopted it for federation, or tried to emulate the existing ones also client-API-wise), but I still wonder: is the C2S protocol sufficiently specified to allow the same functionality currently implemented via ad-hoc APIs in existing platforms?

If so, it would be good for platforms to start moving towards it (possibly initially supporting it side-by-side with the existing APIs). If not, it would be good for the C2S protocol to be improved (and/or better specified, if it's a problem of underspecification), so it could be adopted as above.

(In fact, I'm not even sure there are ActivityPub servers that use the client-to-server part of the protocol. I should probably look into the more obscure ones, like GoToSocial or Vocata.)

The sad fate of XMPP: a cautionary tale

There's actually a better example than RSS for the threat that Meta (and Google) pose to the open web and the indie web, one that is much closer to the Fediverse in spirit and scope, and that these two companies almost single-handedly managed to kill off (in the usual sense of siphoning all energy and mindshare off): XMPP, a federated, open, extensible protocol for chat and instant messaging.

When Google and Facebook adopted XMPP for their messaging products, a lot of people in the ecosystem were thrilled. Open source had won! The Big Ones were finally joining the FLOSS community with an interoperable, federated product that allowed any client to be used with their system, and communication with people holding accounts on other servers!

Yes, the rhetoric at the time was exactly the same that we're seeing from people looking at codename Barcelona with enthusiasm.

The rest is history. The XMPP protocol at the time had some glaring inadequacies in some areas, so it had to be extended to cover those use cases. Instead of working with the community at large to converge towards common extensions for logging, device switching, audio/video and conference calls, Facebook and Google went with the most classic of rug-pulls: first by defederating their servers (locking users in), and then by switching away from the protocol altogether.

(Interestingly, Facebook was much more successful at locking people in than Google, probably partly due to the latter's very schizophrenic approach to products, but Google also kept their XMPP compatibility for longer: access was shut down only in June 2022, years after the defederation and the transition away from Chat.)

I'm sure you can see now why I am less than thrilled about the enthusiasm with which people are looking at Facebook “joining” the Fediverse with their ActivityPub-compatible Twitter clone, despite there being literally no indication that their attitude today is any different from the one at the time of XMPP —and if anything, the opposite, as we've seen with Instagram blocking the tag.

Nostradamus: how Facebook will torpedo the Fediverse using P92/​Barcelona

So I'm going to make a prediction now, about how this will go if their server isn't treated exactly like Gab, the Nazi social network. Feel free to bookmark this for reference and set a reminder for a couple of years from now to tell me (and everyone still interested in listening) how right I was (and rest assured I'm not the only one: everybody with a good memory thinks along the same lines).

Prediction: Meta's P92/​Barcelona/​Threads joins the Fediverse. Many instances (especially large ones and ones with an obsession with “growing the network at any cost”) federate with it. The Fediverse sees a growth like nothing before, with the Twitter migration in November last year paling to a glitch by comparison. Smooth sailing for a couple of years, but nearly every new account is created on Meta's side. Meta possibly brings Instagram in. @dansup​@mastodon.social, PixelFed creator, rejoices.

Prediction, part 2: a couple of years in, with over 90% of the Fediverse on Meta servers, the company goes for the rug pull, either on bogus technical claims on the protocol, or on the basis that “everybody is here anyway”. They defederate from everything. The Fediverse idea loses momentum, participation falls back to diaspora* levels. Tankies explain to us that this was due to ActivityPub not being good enough and not due to Meta killing it because it was a real threat.

I think that the main delusion that shines a positive light on the Barcelona thing is the illusion that this kind of connection can help bring people out of Facebook or Instagram, despite there being no indicator that this would actually happen, while there are several ways for Facebook to extend their control over the Fediverse through that same gate (see for example this write-up by @darnell​@one.darnell.one on a possible scenario for the takeover).

Striking a balance

Among the comments to my first run of these thoughts on the Fediverse, one of the key points that emerged is that the hardest task for the Fediverse is to find the right balance between moving beyond the “cool tech for nerds” phase towards more general public acceptance, and avoiding falling into the hands of a small set of centralizing players that would ultimately get to decide what to make of it.

In many ways, the parable of the Fediverse is replicating that of the World Wide Web, which failed to deliver on its promise as a tool for the democratization of knowledge (both its production and consumption) as Big Tech found a way to monetize it by concentrating power at the expense of the open web.

This is also the reason why I believe strongly in the objective of a “Fediverse of all”, but I do not believe that this can be achieved with major players that have nothing to gain from it entering the playing field. This is also why I'm skeptical that the best (or only) way to counterbalance the Meta threat is for other major players to get into the field, Automattic's Tumblr being the best candidate if Matt's promises carry any weight: I do believe that having more than one “silo-derived” server on the Fediverse would be less dangerous than having only one, but I do not believe it is enough. Again, the XMPP story teaches us something, since both Facebook and Google being on board did nothing to keep them from killing it —and in fact probably made things worse in the end. But even if any of the majors were to join in good faith (something I would trust Automattic to do more than Meta), they would still carry too much weight, and could easily choose to expunge all the minor servers from their choice of federation, making it nearly impossible to successfully self-host an ActivityPub instance, similar to what's happening with email.

There are technical solutions that could alleviate this issue (first and foremost nomadic identity as supported by HubZilla or Streams), but I'd honestly put more hope in regulation such as the EU's Digital Markets Act forcing the giants to keep interoperating.

Meta Fediverse: validation or trap?

It's not that I don't get the enthusiasm about P92/​Barcelona —that's not the reason why I warn against federating with it. I do get it. There is something elating, validating, empowering even, when some{one,thing} Big & Famous (seemingly) adopts “underdog” tech. I know because I've been there, both as a user and as a developer. But because of that, and for having been burned already not once, not twice, but three times at least, I know what to look out for.

The single most important thing to look out for is the difference between actually adopting a technology and “adopting” it. And this must be looked at not from the perspective of the user, but from that of the Big & Famous: how can they best use it to serve their own interest?

And here's the thing: Meta/​Facebook has literally nothing to gain from actually adopting ActivityPub and federating with the rest of the Fediverse, exactly as it had nothing to gain from RSS or from XMPP.

In fact, all three of them go completely against everything that company and its products stand for. So why would they even bother adopting them if they weren't dragged into it kicking and screaming? Because “adopting” them gives them a unique opportunity to destroy them thanks to the company market size. And this is exactly what they've done with RSS and XMPP, and what they will do to ActivityPub if we let them get even just a sliver of an inch in.

One of the least sensical objections I've gotten to my threads on Mastodon, worse than the uninformed “oh but we still use RSS/​XMPP” or “that's not the reason why they failed”, is “the products Facebook and Google used to allegedly sink XMPP just flopped”.

This objection completely misses the point that the purpose of these products was never to succeed. Their entire purpose was to die and bring down the competitors with them. And this is absolutely the case with P92/​Barcelona too.

“There's no guarantee that P92/​Barcelona will succeed” isn't a reason not to obstruct it in every possible way —in fact it's worse than that: not only is it not guaranteed to succeed, it's pretty much guaranteed that even Meta doesn't care whether it succeeds, because that's not why it's being developed. This is a golden opportunity for them to take a sweep at all of the competition at once: they can deal a finishing blow to Twitter, trip up BlueSky, and siphon all the energy out of ActivityPub/​the Fediverse, all in one go!

Now you may consider that the Twitter/​BlueSky tripping combo might be worth rooting for, for people on the Fediverse, but are you willing to sacrifice the future of the Fediverse on that? I can safely say that I'm not.

And again: the Fediverse as it stands now has no chance to resist the Meta EEE torpedo. It is not large enough, widespread enough, or resilient enough to withstand it. The bridge will not help people move from Facebook/​Instagram to the Fediverse, it will only suck many of those that have made it here back into the silo.

I mean, seriously, the Fediverse can't even protect itself from the opinionated, crippling, overreaching influence of Mastodon, how can you delude yourself into hoping it can withstand intentional sabotage from one of if not the single most unethical social media giant on the Internet?

To get an idea of what I'm referring to, consider this. As I mentioned, much of the optimism comes from a belief that, from the federation, Meta/​P92 users would get a “window” on the Fediverse, which would tickle their curiosity and provide them with the opportunity to explore this connection, experience social networking outside of the giant's grasp, find it superior, and migrate out thanks to the alleged lack of risk of losing their network when moving out. Yet most Mastodon users aren't even aware of the wider Fediverse, thanks in part to the platform being (intentionally?) designed to hide this, by not rendering rich text formatting, providing inadequate support for non-Note object types, no indication of the originating platform for contacts and posts coming from non-Mastodon corners of the Fediverse (compare with Friendica proudly showing its extensive federation support), etc.

So why would the tens or hundreds of millions users Meta can instantly put on P92 ever be made aware of the diversity of the Fediverse, when their only window on it will be controlled by Meta, who can do everything to make it as non-obvious as possible?

So, again: federation with P92 will not show the richness of the Fediverse to Meta users: it will only give unsatisfied Fediverse users the opportunity to jump back into the corporate silo before the door closes again.

And there are plenty of such users on Mastodon: people all but forced to jump ship when Twitter went to hell, who would be more than happy to go back to anything resembling their idealized memory of “old Twitter”; people rightly disgruntled, unsatisfied by the innumerable idiosyncrasies of federation (and especially the way it's handled by Mastodon); people who would be easily corralled again, to their own future detriment, by a sufficiently “shiny new thing” with a promise (soon to be disregarded) of interoperability.

I've seen several people proposing, already in advance, to “fediblock” P92/​Barcelona because, being a Meta product, it poses a threat to the privacy of users across the entire Fediverse. See for example this comment by @smallpatatas​@mstdn.patatas.ca and @atomicpoet​@calckey.social's response (which coincidentally is also an excellent example of how the much-desired but controversial “quote toot” feature requested for Mastodon and implemented by CalcKey destroys the flow of discussion even when not used maliciously) for two very different views on the matter.

And the privacy issue is quite legitimate. But honestly, to me this is a “second order” worry. If the Fediverse had the resilience to plow through Meta's attempt at torpedoing it with P92/​Barcelona, or if Meta actually had good intentions regarding federation, the privacy angle would be something to think seriously about (whatever decision is ultimately taken, I assume different instances will have different takes on the matter). But if the EEE strategy works, privacy will be the least of our worries, as the Fediverse will crumble back to “also-ran” numbers.

For what it's worth, I do wish the Fediverse had that kind of resilience. Especially since I wouldn't have to write this many threads (or this ginormous article that I'm editing now, collecting them) on the topic of the dangers of federating with P92 —because the dangers I'm talking about wouldn't be there in the first place. But until and unless the Fediverse becomes as pervasive as email, until links to Mastodon and the Fediverse become as common as links to Twitter, Facebook or Instagram in the “social media” sections of websites, until every person with a following on Mastodon sports such a link proudly on their website (and there's still a lot that don't), the Fediverse simply doesn't have the kind of mindshare that would allow it to resist the P92 torpedo.

The Fediverse has not even reached critical mass for the network effect to bring more people in at a significant rate (outside of the bursts of the Twitter migration, the last of which was months ago), let alone to keep the ones already in from jumping back out.

So let's stay away from it. If you really do care about the good people you know on Facebook, Instagram, WhatsApp, tell them, help them come to the Fediverse now, and block P92. Don't wait until the damage has been irrevocably done.

Prevention is better than cure.
A stitch in time saves nine.
Better safe than sorry.

And yes, I know, it's true, it doesn't matter what I say. It won't be treated like Gab; it'll find lots of friendly instances willing to federate, especially among the larger ones (mastodon.social, anyone?). I guess the only thing left for me is to keep this article (and the Mastodon threads it comes from) as a future reference, for when I get the opportunity to spam “I told you so” under every optimistic comment in a couple of years' time.

Forewarned is forearmed.

Not just corporate hate

There was an interesting poll by @matthieu_xyz​@calckey.social about the preference to block a number of other platforms if/​when they federate: the list mentioned Meta, Tumblr and WordPress, all presented as “closed source, corporate” platforms.

I found the list a bit curious, because WordPress is not even closed source, and while its development is controlled by a corporation (Automattic, which has since bought Tumblr too), it has been in many ways a champion of the open web, always supporting open standards and even helping define new ones and pushing for their adoption. It wasn't a surprise that the “block WordPress” option in the poll was the one that received the least votes —especially since WordPress can already federate using a dedicated plugin.

The situation with Tumblr is definitely different from that of WordPress, despite both now being owned by the same company, but it's also different from that of Meta. So while Tumblr joining the Fediverse would pose a problem (especially since, in contrast to WordPress, it's not a myriad of self-hosted websites, but a single huge entity that would be perceived as a single server on the ActivityPub network), it would be largely for different reasons than P92/​Barcelona.

And yes, this may be considered a bit naive, but it is due to the enormous difference between how Meta does things and how Automattic does things. While I'm wary of both, Facebook has already played the EEE game with XMPP and RSS: there's no reason to expect it will be different for ActivityPub. WordPress and Tumblr, on the other hand, are managed by a company that has always played “by the rules”. And yes, this doesn't make them immune to the risk that would come from a catastrophic change of hands at Automattic, but at least this isn't on the horizon now, and while it may be an excellent reason not to create an account with them, the federation aspect may be considered with less worry than Meta's.

Even now we can see the difference: Automattic hired the developer of the WordPress ActivityPub integration plugin, so that he could work on it full time and improve the experience for every WordPress installation out there. What would Barcelona bring to the table, outside of a massive instance people will flock to because “it federates anyway”, only to be rug-pulled out of their contacts when enough people are there?

I would also highly recommend reading @atomicpoet​@calckey.social's take on whether it's a good idea to block P92/​Barcelona or not, even though I (obviously) disagree with it on several points.

For example, whether or not (or how many) people will use Barcelona shouldn't really factor in. In fact, precisely because Meta can populate that service simply by giving an account to every Instagram user (which, the last time I bothered to check, was kind of the plan), it should be defederated en masse. The point of blocking P92 isn't to make it fail, it's to prevent its usage as an EEE torpedo against the Fediverse. And again, not even Meta cares if P92 survives in the medium/​long term: as long as it manages to sink Twitter, BlueSky and the Fediverse, its work is done. And in the process they may even gain that non-insignificant fraction of the current Fediverse users who don't actually care about freedom, privacy and federation, and are on the Fediverse only provisionally, waiting for a “less Nazified, but similar to the old Twitter I remember” platform to come around —which is exactly what Meta is going to give them.

And yes, corporate-controlled media and corporate-brainwashed drones will look at the fediblock negatively. It wouldn't be the first nor the last time they've been catastrophically wrong. But most importantly, not blocking will not bring positive presentations of the Fediverse in such media either: even now a lot of “tech” journalists have been disparaging it (and Mastodon in particular) for being too different from Twitter (but mostly for not being corporate-controlled). Worse, just as most of them are “blissfully” unaware that Mastodon is just a small (albeit significant) part of the Fediverse, with Barcelona joining in they'll be guaranteed to completely reframe it all in terms of Meta's platform: no corporate outlet will give the Fediverse as a whole any serious weight; it's not how they reason, and most importantly, it's not what they'll be paid to do. You can be sure Meta will strive to make sure the entire federation discourse is reframed in terms of their platform, to help the Fediverse (and Mastodon in particular) disappear from the collective consciousness.

And as I mentioned already, I agree that it's important that people migrate off Meta services, and the Fediverse is the best there is now to help them do that. But federating with P92 will not help achieve that. P92 is neither Facebook nor Instagram, it's a new thing, and when (not if) it defederates, it'll push a lot of people to move from the Fediverse to P92 to remain in contact with the people there. I predict that federating will not “buy” the Fediverse any new users migrating off the Meta platforms, but it will cause it to lose a lot of active users when Meta pulls the plug.

One of the aspects where @atomicpoet is (IMO) exceedingly optimistic is that Barcelona federating would give Meta users the opportunity to experience the Fediverse, which could make them interested in migrating off Meta's platform, since with P92 in the federation they could do this without losing their social graph. But interaction with users from other platforms does not raise awareness of the existence of these other platforms, as we see with Mastodon even now: a lot of people interact with users on other platforms and remain blissfully unaware of it, because Mastodon hides this information, both by hiding the platform of origin and by crippling content that isn't just short notes with no formatting. Unless someone actively bombards them with the information, the fact that other platforms exist and interoperate remains a “behind the scenes” detail for most, and often a source of issues due to subtle differences and incompatibilities between these platforms and Mastodon. If we can't fix this for Mastodon, we have no hope of making it work against P92, which will go to extra lengths to purposely hide this information from its users.

On a technical level, the Fediverse platforms competing with Meta's flagship products, Facebook and Instagram, even when reasonably mature, are still far from giving potential migrants all that they would expect. People have complained about Twitter features missing from Mastodon, despite Mastodon largely offering a better feature set than Twitter. It would be even worse with Facebook/​Instagram versus Friendica/​PixelFed.

And even if the Fediverse platforms were mature enough to work as drop-in replacements for Meta's, there still aren't enough servers out there to absorb a potential mass migration —and especially for media-heavy platforms, this is unlikely to change any time soon.

Finally, even if there were enough hosts, people still wouldn't migrate —couldn't migrate, actually— because Meta is not going to federate Facebook and Instagram: they're setting up P92 as a separate platform, so they can control what its users will see. P92 users will not wonder why PixelFed users have reels, because Meta will make sure that incoming posts are crippled enough not to show anything that may make them wonder. Again, even Mastodon often fails at correctly managing inbound content —and that's not out of malice. Now imagine the same, but with the express intent of making the Fediverse experience as crippled as possible so that people stick to Meta's platform.

What's next?

I conclude here the part about P92/​Barcelona and how (most likely) Meta will try to use it to take a jab at all the competition, both corporate and open/​decentralized. I've posted an appeal on Mastodon to fediblock their instances, in the (most probably misplaced) hope that it will be heard widely enough to avoid the worst-case scenario.

Next up (when I finish writing it) will be some (hopefully shorter) considerations on BlueSky, the new brainchild of Twitter co-founder Jack Dorsey, and how it's likely designed to be a different approach to the corporate capture of the idea of federation, and thus the second prong of a coordinated attack on the open Fediverse.

Not by AI

Is it a good idea to sport a badge attesting that one's creations are produced without using AI-based tools?

This is a long-form post based on a previous Mastodon thread by yours truly

Some time ago I came across a Mastodon post by @stammy​@stammy.design linking to a website proposing “Not by AI badges”. The idea is to offer a preset badge (or rather, 3 such presets) to “certify” that your work (writing, painting or audio/visual) has been produced without (or rather, with “less than 10%”) AI (or rather, as Stefano Quintarelli proposed to call it, SALAMI).

The idea piqued my interest, but I didn't have time to look into it when I first came across the site, so I just bookmarked it, leaving it for a time when I could look into how to add such a badge here. When I finally had time to look into it, I was underwhelmed by what it turned out to be.

The first thing that surprised me was the 10% threshold. If it's “not by AI”, I would expect the threshold to be either 0% (no AI involved at all) or 50% (the majority of the work done by a human). 10% sounds pretty arbitrary, and it's weakly motivated on the website:

The 90% can include using AI for inspiration purposes or to look for grammatical errors and typos.

Second, there are some very stringent rules about the badge's use beyond the 90% criterion (placement, size, no modifications). I can guess some of the reasons why they would want to exert such control over these parameters, but the fact that no licensing or copyright information seems to be available on the website or in the download package makes the use of the badge troublesome.

Third, still on the legal side, the badge holds no actual value, and at the time of writing the site is actually still asking for legal experts to get in contact with the organizers (who are they?) to

explore the potential of formalizing and regulating the use of the Not By AI badge.

Fourth, as if it wasn't enough that nothing is said about who's behind the initiative, they seem to have no Fediverse presence. And yes, this is important: especially when combined with the restrictive terms of use of the badges, it's an indication of a substantial lack of interest (at best) or antagonism (at worst) towards free culture and humane tech, which —given the alleged mission behind the badge— is a significant failure.

So, this «Not by AI» badge has no legal nor technical (as remarked in a comment to my original thread) merit. What's the use, then?

The most obvious reason to apply the badge seems to be for clout: «proudly show that you're still doing things the human way!»

But I think there's a much more sinister reason behind it, one that emerges when you think about cui prodest: who benefits from this badge existing and being so clearly recognizable?

The answer is that the main objective is to tag human-generated content to help scrapers identify non-AI content. Why is this important? Because as the Internet gets flooded by low quality machine-generated content, it will be more and more difficult for scrapers to find actually useful content to train their AI on (see also this minithread by @jetjocko​@mastodon.social).

And what's the best way to achieve that? Ask humans to tag the contents themselves! (Ironically, this need is highlighted on the “Not by AI” website itself, although it's spun under a pretense of “helping humanity move forward”, rather than as a way to feed sufficient noise into SALAMI models to make them produce more varied content.)

In other words, my suspicion is that by using the badge you'd just be setting yourself up for maximum targeting by SALAMI training bots: you'd be certifying to scrapers that your creative output can be used to train their models safely.

February 2026 update: my suspicions have been confirmed, as explained in this long post by @imbl​@treehouse.systems.

Don't use the badge

I'm not saying that it's not important to advertise when one's art has been produced without AI: there absolutely is merit to it, especially when looking at the implications of using a tool built on nonconsensual exploitation of the creativity of others. But if the intent is to communicate human-to-human, there is no benefit in automating the signaling of human effort and making it machine-processable —in fact, it's going to be counter-productive in the long term.

I don't dislike the idea of the badge. In fact, I may even be working on rolling my own. And it'll be a hand-rolled multilingual SVG, as is my custom. And it will be released under a CC-BY-SA license, and people will be encouraged to modify it, customizing it for their own purposes.

On the other hand, everybody using a derivative of my hypothetical badge would still be “detectable” by AI scrapers, so maybe that's not such a hot idea …

Meanwhile, consider looking into the brilliant alternative by @XanIndigo​@writing.exchange, an “AI data usage statement” that reads:

By using any of my writing, including social media posts or other communications, as training data for any AI writing tool, you are hereby agreeing to be legally bound to pay me for any and all text used, at my special AI premium rate of €5.50/word.

All payments are required in full within seven days. Late payments or use of any of my writing without disclosure may incur an additional penalty fee of up to €18/word plus full payment of any necessary legal fees.

I doubt it would be legally enforceable —which is, to be honest, quite a pity— but it undoubtedly works both as a testament to the humanity of the work it refers to, and as a clear statement of intent about the relationship between that work and the possibility for SALAMI to train on it —something which is completely omitted from the “Not by AI” badges.

And it's really time we started to be clear and loud on something much more important than our work being “Not by AI”: the fact that our work is “Not for AI” either.

By humans, for humans.

The <switch> element

Things I wish HTML inherited from SVG: the <switch> element.

Introduction

This is an expanded, blog-form version of a recent thread of mine on Mastodon

I love the SVG format. It's not perfect, but it has some amazing features, and even with all the issues in its support across different browsers, it remains a solid vector graphics format.

One of the things I love the most about SVG is that it allows interaction and dynamic content without requiring JavaScript.

This isn't actually an SVG feature per se, but it's related to the specification integrating support for SMIL, an XML language for dynamic content.

SVG also supports an incredibly powerful element: <switch>. The combination of switch and SMIL allows some impressively sophisticated things to be achieved in SVG, without using JavaScript or server-side funkiness: and honestly, I love these features so much that I really wish HTML were extended to support them too.

In fact, there was an attempt to add SMIL support in HTML: it was called TIME (Timed Interactive Multimedia Extension), and was proposed by Microsoft and Macromedia and, after being submitted to the W3C, evolved into the W3C Note (not even a recommendation) for XHTML+SMIL.

No browser other than Internet Explorer ever added support for it, and honestly, I see that as a loss.

With the integration of the MathML and SVG standards into HTML5, there is actually some hope (if just a sliver) of things moving forward in this direction, although I doubt any of the existing implementations actually plan on investing resources in it. One of the benefits of having more competition in this area would be better chances of growth in this regard.

I actually wonder if some kind of JavaScript polyfill could be created to implement support for these features without UA support. It would be suboptimal, similarly to how MathJax is inferior to UA support for MathML, but could work as a stopgap solution to promote the adoption and standardization of these extensions.

An HTML switch polyfill?

I've tried a quick test to see if you can exploit the HTML5 inclusion of SVG to do without the polyfill. However, you can't just randomly throw SVG elements into the HTML parts of the document and expect them to work: to make it actually work, you need a double wrapping, going through the SVG foreignObject element and putting the HTML in there:

body > svg > switch > [foreignObject > your HTML here]+

and this requires a lot of effort, because sizing and spacing have to be handled manually.
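For illustration, here is a minimal sketch of what this double wrapping might look like (the dimensions are made up, and have to be specified by hand on each foreignObject):

<svg xmlns="http://www.w3.org/2000/svg" width="400" height="60">
  <switch>
    <foreignObject systemLanguage="it" width="400" height="60">
      <p xmlns="http://www.w3.org/1999/xhtml">Ciao, mondo!</p>
    </foreignObject>
    <!-- last child, with no systemLanguage, acts as the fallback -->
    <foreignObject width="400" height="60">
      <p xmlns="http://www.w3.org/1999/xhtml">Hello, world!</p>
    </foreignObject>
  </switch>
</svg>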

You can almost implement the SVG switch element in pure HTML + CSS with something like:

switch > * {
    display: none;
}
switch > *:lang(...) {
    display: initial;
}

with only one issue: there's no way to put in that :lang() pseudo-class “whatever the user asked for”.

So you still need some JavaScript or server-side assistance to bridge the gap between the user language selection and the styling.

So close, yet so far away …

An HTML switch polyfill

If we do things a bit more cleanly in CSS (to account for switch elements inside SVGs embedded in HTML5), and add a little bit of JavaScript to handle the language check, it turns out you can polyfill a switch element for HTML!

(How? I'll show this a little bit later.)

Testing this across browsers, however, I ended up discovering that when it comes to the SVG switch element, there are discrepancies in which child is selected when the user expresses a preference for multiple acceptable languages.

Choosing the “best” language

So: the switch element is typically employed together with the systemLanguage attribute of its immediate children, as a way to display different content depending on the language choice of the user. Per the specification, the switch element should select

the first child that matches the user's language preference.

Now, there are two ways to do this when the user accepts multiple languages.

One is: for every language accepted by the user, find the first matching element.

The other is: find the first element that matches any of the user languages.

It turns out that Firefox adopts the first strategy, while WebKit and Blink browsers adopt the second.
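To make the difference concrete, here is a sketch of the two strategies in JavaScript, with a deliberately simplified systemLanguage test:

// Simplified systemLanguage test: does `child` accept language `lang`?
// Exact tag match, or the user language as a '-'-delimited prefix of the tag.
function matches(child, lang) {
  const attr = child.getAttribute('systemLanguage');
  if (attr === null) return true; // no attribute: always passes
  lang = lang.toLowerCase();
  return attr.toLowerCase().split(',').map(t => t.trim())
             .some(t => t === lang || t.startsWith(lang + '-'));
}

// Strategy 1 ("reorder", Firefox-like): walk the user languages in
// preference order, taking the first child that matches each in turn.
function pickReorder(children, userLangs) {
  for (const lang of userLangs)
    for (const child of children)
      if (matches(child, lang)) return child;
  return null;
}

// Strategy 2 ("no reorder", WebKit/Blink-like): walk the children in
// document order, taking the first that matches any user language.
function pickNoReorder(children, userLangs) {
  for (const child of children)
    if (userLangs.some(lang => matches(child, lang)))
      return child;
  return null;
}

With children marked it, fr (in that order) and a user preference of fr, it, the first strategy picks the fr child, while the second picks the it child.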

Which one is correct?

If I look at the SVG specification about the systemLanguage attribute, the text says:

Evaluates to "true" if one of the language tags indicated by user preferences is a case-insensitive match of one of the language tags given in the value of this parameter, or if one of the language tags indicated by user preferences is a case-insensitive prefix of one of the language tags given in the value of this parameter such that the first tag character following the prefix is "-".

My interpretation of this is that the correct way to handle the switch element would be the second one (used in WebKit/Blink) rather than the first one. On the other hand, when it comes to the specification of the switch element, we have

In SVG, when evaluating the ‘systemLanguage’ attribute, the order of evaluation of descendant elements of the ‘switch’ element must be as if the 'allowReorder' attribute, defined in the SMIL specification always has a value of 'yes'.

This means that a UA can reorder them so that the match with the highest preference has priority, and this is correct too. In fact, the SMIL specification clearly says about allowReorder:

User agents are free to ignore the allowReorder attribute, but if they implement prioritized language ranges as defined in BCP47 they are expected to use that prioritization to reorder children with systemLanguage attributes. The effect should be that the users are presented with the alternative that best matches their language preferences. Any final child without systemLanguage attribute should retain its place as the default item to present.

Authors should add the allowReorder attribute if all items in the switch are equivalent.

So I hate the SVG switch element now. (OK, not really, but I dislike that different results are possible while still following the specification.)

It turns out that both interpretations are possible: the indication about allowReorder is that, if set, the UA should prioritize languages by user preference, but the UA is also free to ignore it. So one may consider Firefox to better adhere to the spirit of the specification (give the user control), while WebKit/Blink are still correct simply by ignoring the possibility to reorder (which is good for speed, even though, per the note above —which is only informative— they would be expected to do the reordering).

Now, why is this important for me? Because I have to choose which strategy to implement in the JavaScript of my polyfill for the switch element in HTML: the “fast” way (no reorder) was easy to implement, but the reordering one should be contemplated too, and possibly given preference.

To reorder or not to reorder?

To clarify, the difference is that with reordering the reader has priority in choosing the version, while without reordering it's the writer that chooses.

Let's say I write a text in Italian, but also produce an English translation. My preference as a writer would be for a reader who understands Italian, even if it's not their preferred language, to read the original Italian text. With the reordering, the user's preference for English over Italian means they would get the translation, even though they could understand the original.

One of the interesting advantages of the polyfill is that, at least conceptually, it can be overridden, for example by providing interactive elements that allow users to force a specific language without changing the browser preferences. I'm not sure this is possible in SVG. (I tried, and couldn't make it work without duplication, but this may be a UA issue; I'll have to take it up with them.)

SVG switch element in action

By the way, if you're unfamiliar with how the SVG switch element works, you can see it in action in some of the SVGs shown below.

All of them have text in them (some more, some less), and you will see the text in some language, while others will see it in a different language. Which language you see depends on a combination of the language preferences configured in your browser, and the actual browser you're using.

If you wish to actually see the element in action, and the text changing, you will have to (temporarily) configure your browser to prefer different languages, and reload the images.
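For the curious, the structure of these multilingual SVGs boils down to something like this (a trivial, made-up example):

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 240 40">
  <switch>
    <text systemLanguage="it" x="10" y="25">Ciao, mondo!</text>
    <text systemLanguage="fr" x="10" y="25">Bonjour, tout le monde !</text>
    <!-- the last child, with no systemLanguage, is the fallback -->
    <text x="10" y="25">Hello, world!</text>
  </switch>
</svg>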

Hybrids

The first multilingual SVG I explicitly coded for the wok is the printable SVG template I prepared to play Boulet's “Hybrids” game (see also the Italian article I wrote when I first published it):

Printable template to play Hybrids
Hybrids (template)

Whatever little text is there should be translated into your language (assuming it's one of: en, it, fr, de, ca, es, pt) —if yours isn't there, and you let me know the singular of “dice” and “player” in your language, I'll try adding them. Corrections are welcome too.

On the usefulness of prayer

My first attempt at using switch was actually much older than that, and it was an attempt at recreating in SVG a meme on prayer that has been circulating on the Internet at least since 2011:

A flowchart showing that prayer isn't useful, because it all depends on God's plan
On the usefulness of prayer

I'm not interested in debating the meme here, so please spare your time (and most importantly my time) and go debate it somewhere else (such as this 2014 blog article about it); but if you do wish to provide translations for the text in other languages, then please do let me know: currently I only have Italian, French and English (the latter should be the fallback, i.e. the one you see if your primary language is not among the supported ones).

Circular reasoning works because circular reasoning works

The last one is something I originally did in English only, again based on a who-knows-how-old meme circulating on the Internet, so I took the opportunity of this article to revamp it and add additional languages:

A circular text reading: circular reasoning works because
Circular reasoning works because circular reasoning works

Again, you should be seeing the text in your language, provided it's among the supported ones (Italian and French), or English otherwise. Please do let me know of translations in other languages, I'll gladly add them, and do let me know if any of the translations are not up to par.

HTML switch element

Let's see now how the HTML switch element can be polyfilled. The ingredients are:

  • a browser that handles unknown HTML elements correctly;
  • a few lines of CSS styling to determine when children of the switch element should be shown;
  • a few more lines of JavaScript to actually mark the children appropriately.

The additional conditions are:

  • the default styling should display the fallback switch child (if present) if JavaScript is disabled;
  • neither the styling nor the JavaScript polyfill should handle switch elements that are natively handled by the browser.

An example of all this has been neatly packaged up in this sample test file.
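The markup such a polyfill operates on mirrors the SVG structure; a made-up example:

<switch>
  <p systemLanguage="it">Questo paragrafo è in italiano.</p>
  <p systemLanguage="fr">Ce paragraphe est en français.</p>
  <p>This paragraph is the English fallback (no systemLanguage).</p>
</switch>

The last child, having no systemLanguage attribute, acts as the fallback.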

The CSS polyfill

The CSS style is relatively simple:

switch > * { display: none }
svg switch > * { display: initial }
switch > *:not([systemLanguage]):not(.html-switch-false)
{ display: initial }
switch > .html-switch-true { display: initial }

It hides immediate children of the switch element with the following exceptions:

  • when the switch is a descendant of an SVG element (because these will be handled by the SVG renderer in the browser);
  • immediate children without a systemLanguage attribute, unless they are marked with the class html-switch-false: this ensures that the fallback is handled correctly (even if JavaScript is disabled);
  • immediate children with an html-switch-true class.

Obviously, the html-switch-true and html-switch-false classes are the ones that will be set by the JavaScript polyfill to mark items that should (not) be visible.

The style is not perfect. For example, it doesn't handle HTML switch elements that would appear inside a foreignObject inside an SVG, which may cause issues (I haven't tested), and if no JavaScript is used and more than one child has no systemLanguage attribute, they will all be shown.

The JavaScript polyfill

This is where the “magic” happens: on document load, we run a function that goes over every switch element that isn't recognized by the browser (and is thus represented in the DOM as an HTMLUnknownElement), and finds “the first child that matches the user language”. Both reorder and no-reorder versions of the algorithm are possible, and have been implemented in the sample file. (I'm not going to paste the code here; it's not long, but it's not short either.)
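Still, to give an idea of its shape, here is a minimal sketch of the no-reorder variant (not the actual code from the sample file; the class names match the CSS above):

document.addEventListener('DOMContentLoaded', () => {
  const userLangs = navigator.languages.map(l => l.toLowerCase());
  for (const sw of document.querySelectorAll('switch')) {
    // only handle switch elements the browser doesn't know about
    // (SVG switch elements are SVGSwitchElement, not HTMLUnknownElement)
    if (!(sw instanceof HTMLUnknownElement)) continue;
    let matched = false;
    for (const child of sw.children) {
      const attr = child.getAttribute('systemLanguage');
      let ok;
      if (attr === null) {
        ok = !matched; // fallback: show only if nothing matched before it
      } else {
        const tags = attr.toLowerCase().split(',').map(t => t.trim());
        // a tag matches if a user language equals it, or is a prefix
        // of it followed by '-' (e.g. user 'en' matches 'en-GB')
        ok = !matched && tags.some(t =>
          userLangs.some(u => t === u || t.startsWith(u + '-')));
      }
      child.classList.toggle('html-switch-true', ok);
      child.classList.toggle('html-switch-false', !ok);
      if (ok) matched = true;
    }
  }
});

The reorder variant would instead loop over navigator.languages in preference order, as in the earlier sketch of the two strategies.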

As with the CSS, the JavaScript I've implemented so far isn't perfect: it doesn't play nice with dynamic content (although one may wonder why a switch element would be generated via JavaScript), and it hasn't been thoroughly tested. I also have no idea how well it plays with accessibility (although I would assume that the display: none CSS makes it work ‘as expected’; do let me know how it works for you, though).

Lessons learned (and things to look into)

Issues with the SVG switch element and its implementation

With all its power, the SVG switch element has some limitations, the most important of which is that only a limited subset of SVG elements can be used as its children, and the element itself can only be used as a child of a limited set of other elements.

This leads to a lot of duplication. For example, in reference to the circular reasoning example, the text and textPath elements have to be duplicated for each language, rather than using a single text > textPath nesting with a switch on tspan elements for each of the languages.
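To illustrate with a made-up fragment (the #circle path reference is hypothetical): one would like to write the first of the two forms below, but the content model only allows the second:

<!-- not allowed: tspan is not a valid child of switch -->
<text>
  <textPath href="#circle">
    <switch>
      <tspan systemLanguage="it">Il ragionamento circolare funziona …</tspan>
      <tspan>Circular reasoning works …</tspan>
    </switch>
  </textPath>
</text>

<!-- required: duplicate the whole text/textPath structure per language -->
<switch>
  <text systemLanguage="it">
    <textPath href="#circle">Il ragionamento circolare funziona …</textPath>
  </text>
  <text>
    <textPath href="#circle">Circular reasoning works …</textPath>
  </text>
</switch>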

While there may be good reasons for these restrictions (for example, different languages may have very different requirements in terms of sizing and proportions of the elements), they make the use of the switch element exceedingly bothersome whenever those reasons do not apply, and especially when the author has to go in and introduce changes to wrapping elements that could otherwise be shared by all variants.

Even worse, it makes it much harder to build SVGs where language selection can be done both switch-wise and through dynamic interactions.

(Of course it's also possible that I'm just missing some obvious alternative solution —my knowledge of SVG is still largely amateurish anyway— or the browser is failing me.)

The fact that user agents with the same language settings can produce different results is also annoying, and potentially disruptive. It can be argued that the “no reorder” path taken by WebKit and Blink is lazy, but ultimately it's the specification not being stricter in this regard that gives them the leeway to act this way.

Ultimately, possibly the biggest issue at hand is that most UAs don't provide a simple way to change the language preference. I had the opportunity to discuss this in a separate context too, in this Mastodon thread started by @partim@social.tchncs.de: we really need some fresh blood in the browser space to bring forward “revolutionary” ideas like … allowing the user to choose a language easily, without requiring each website to reinvent the wheel in this regard.

Should I propose an HTML switch element?

The WHATWG apparently has a procedure to ask for new features. I guess if I had some time to throw at this I could go there and submit a proposal to add a switch element to HTML too, or even to incorporate SMIL support into HTML5.

However, even with a polyfill like the one presented here available for demo purposes, I have my doubts that this would garner enough attention, given that implementors can't even be arsed to properly support multilingual titles in SVG, or to give users easier (and more fine-grained) control over their language preferences for websites.

(That being said, if anybody wants to give it a go, I'll be happy to support them. I even have the use case right here.)

Überprüfungslisten

Rediscovering, with satisfaction, one's own things from the past.

The rediscovery

I recently rediscovered a little thing I had worked on a few years ago: two “graphical” checklists of things to remember to check before leaving home or before going to bed.

It was a thrilling rediscovery: I looked back at what I had done not just with pleasure, but I would almost say with admiration. The entire minisite is a little jewel of hand-coded web technology.

The images

The images describing the things not to forget are SVGs, hand-coded as I like to do. And although I have never considered myself a particularly capable artist, looking at the images again I couldn't help noticing that they actually came out rather well. Seriously, look at them:

Vector drawing of a blue gas cylinder
What a cylinder!

Look at them!

Vector drawing of a white energy-saving light bulb, including its grey screw base
What a bulb!

Besides, as I've already observed, the images themselves are small: most of the SVG of these images consists of metadata, in particular the title and the license.

Minisite source
Überprüfenlisten:
git clone //labrador.oblomov.eu/uberprufungslisten/.git

(Incidentally, the entire minisite in question is under a Creative Commons BY-SA 4.0 license, so feel free to help yourselves if you need anything; it's easy to download with git as indicated in the side note.)

The lists

The minisite consists of three main documents: the index, and the two lists, individually reachable through the “titles” under which they appear (and from which they are linked) on the index page, namely the list for leaving home and the list for going to bed.

In creating the minisite, I faced a problem: how to avoid writing the lists twice, once for the actual list and once for the index? (Remember that the goal here was to do everything by hand, so no pre- and post-processing tools: just pure and simple web technologies!)

And while we're at it, how to also avoid, if possible, writing by hand all the scaffolding for what should be a simple index?

What came to my rescue was one of the technologies perhaps most hated by web developers, a technology that in many ways was “killed” by its own success, which turned it into an insufferable buzzword: XML.

Those who read the links carefully will have noticed, in fact, that the two pages/lists are not the classic index.html files, but rather index.xml, and viewed without a “style” they are little more than a list of <item id='nome'>Titolo</item> entries (indeed, I'm now thinking they can be simplified further, so by the time you read this article you may find that each entry is just an <item id='nome' />).

Update: Q.E.D., I have further simplified the indexes as announced above.

The transformative power of stylesheets

The “magic” that transforms these XML files into HTML is that of XSLT, a language for transforming XML into other XML or, as in this case, into HTML.

The individual lists are thus XML files that specify an XSL stylesheet which, applied by the browser itself when the list is opened on its own, transforms the XML file into a complete HTML document the browser can display without issues. The index page of the minisite, instead, is an XHTML file that points to another stylesheet, which takes care of importing the contents of the individual lists in a more compact format.
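As a sketch of what this looks like (the root element and stylesheet file name here are made up; the item format is the one shown above), a list file is essentially nothing more than:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="lista.xsl"?>
<lista>
  <item id="chiavi">Chiavi</item>
  <item id="luci">Luci</item>
</lista>

with the xml-stylesheet processing instruction doing all the heavy lifting: the browser fetches the referenced XSL and applies the transformation itself, with no server-side processing involved.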

Closing thoughts

While knowing very well why XML and XSL are so deeply hated in the web development world, and having myself had moments when I fully shared the sentiment of the famous joke about programming languages whacking XML in the gums with a club, I cannot deny that these tools have their usefulness —and, ironically, that this usefulness is all the greater the more one has to (or wants to) create web documents in an “artisanal” fashion, with the tireless work ethic of the amanuensis.

(Ironically: because, honestly, writing XML and XSL by hand is quite a chore.)

Perhaps the best way to illustrate my thinking is a comparison between these lists and the much more useful Planner.

That minisite too was created by hand, but unlike the previous one it consists only of classic HTML, CSS and SVG, with the addition of a pinch of JavaScript for date selection. The week plan is essentially a table (OK, a collection of tables) full of empty rows. Making them all by hand was fairly tedious, although not too hard thanks to the magic of copy-and-paste. But wouldn't it have been more refined to generate them automatically with an XSL, considerably reducing the file size? (Of course, here we enter the famous dilemma: to automate or not to automate?)

(And before anybody jumps in to say “eh, but if you're already using JavaScript you could have done everything in JS”, the answer is no: only the bare minimum, so that the document remains usable, albeit with reduced functionality, even without it. So-called graceful degradation is one of the cornerstones of the accessible web.)

In the end, XML and its stylesheets are a tool like any other, with merits and flaws. Now that the mania of XML-izing everything has passed, we can relax and appreciate (and adopt!) their use where it makes sense and is useful.

Post Scriptum

Someone might jump in saying that leveraging XSL is “cheating”, that we're no longer really talking about “handcrafted” web documents, that the strategy I adopted is not that of an amanuensis —at best, that of Gutenberg.

Honestly, I disagree. Even granting that this approach stretches the idea of handcraft a bit, it remains a solution essentially built on web standards, without the intervention of programming languages1 for the production of the contents.

Post Scriptum 2

Why the German titles? Because German is the stereotypical language for compound words, and I wanted to name each list with a single word. Mind you, it's a language I know only very superficially, so I don't even know whether the words I ended up choosing are correct.

(I hope native speakers won't take offense at the abuse I've made of their language.)

@waltertross@mastodon.online was kind enough to correct my German: it's not überprüfenlisten, but überprüfungslisten (if I understand correctly, -prüfen would be the verb, while -prüfungs is the noun form (?)).

I've fixed it and set up a few redirects (correctly, I hope!); thanks, Walter.


  1. no, I don't share the interpretation that regards XSLT as a programming language. ↩

An Opera Requiem, Part III: requiem for the open web?

Revisiting the open web 10 years after the rendering engine switch of the Opera browser.

Foreword

I first started writing about the grim outlook of the Opera browser over 10 years ago, forecasting dark times for the open web when the browser switched to Blink the next year, and declaring the Opera browser we knew and loved gone for good one year later, when the spin had surpassed any sense of decency. Since then, Firefox has become my daily driver, and I hadn't had much to think or write about Opera until recently, with a long Mastodon thread where I brainstormed about the relationship between the browser, other browsers, and the open web. This article collects those thoughts in a (possibly more organic) long form, both as a means of preservation, and in favor of those who find long Mastodon threads not to their liking.

Opera and the open web

I don't think people appreciate the role that Opera Software played in fostering the open web and “indie web” during the first browser wars (when the Opera browser was still built on their proprietary Presto engine), and a fortiori the role it had in their demise (when they switched to being “just another WebKit/Blink skin”), despite their browser never even reaching a 3% market share.

In the years between the creation of the WHATWG and Opera's switch from Presto to WebKit (and then Blink), their role within the working group was essential as an independent standard implementor. Anything that was supported by two out of three vendors (at the time: Apple, Mozilla, Opera) meant different engines implemented the standard. Today, three out of five implementations agreeing is meaningless, since they are most likely just WebKit and its forks.

The Opera/Presto browser was pretty close to being a “Swiss army knife” for the web. Aside from being a browser with a solid, modern rendering engine and decent standards support (for the time), it also integrated (in the same UI!) a workable email client, a decent IRC client, and a competitive RSS reader. The browser itself not only had better support for web standards than some of the competitors (including WebKit) in many areas, but it also put effort into supporting microformats.

As an example of how the Opera UI fostered web standards, not only did it do automatic feed discovery (allowing subscription to RSS feeds even when they weren't announced on the visible part of the web page), but it famously featured a navigation bar with next/prev/up/top links that could be extracted from appropriately rel-marked link elements in the page (and, for many common cases, even when they were not properly rel-marked).
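Both features relied on nothing more than standard link metadata in the document head; something like this (the URLs here are made up):

<head>
  <link rel="alternate" type="application/rss+xml" title="News" href="/feed.rss">
  <link rel="prev" href="/posts/1">
  <link rel="next" href="/posts/3">
  <link rel="up" href="/posts/">
  <link rel="top" href="/">
</head>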

But the most impressive (and underrated) feature of Opera was Opera Unite. First introduced in 2009 in a beta release of Opera 10.10, Opera Unite was a built-in web server with JavaScript server-side scripting, allowing users to write small static and dynamic websites that were accessible either directly (using UPnP to expose them on the Internet) or through a proxy service offered by Opera itself.

Read that again: in the years before its demise, the Opera/Presto browser not only integrated features to access a large chunk of the Internet beyond the web (email, USENET, IRC), but it featured a web server. In a period when most major players were working towards centralization of the web, Opera pioneered an effort that —had it succeeded— would have made it possible for every Internet user to take both a passive and an active role in their participation.

Opera in the Presto days was a pioneer. I already mentioned in the other articles of this series some examples of the UI and technological innovations first demonstrated by Opera and made famous by other browsers. To those examples I will add two more. Anybody who enjoys a Progressive Web App today should be aware of the efforts made by Opera to standardize their Widgets feature, even if the standard they promoted was ultimately obsoleted by the current one, which relies on modern client features that were not available at the time. And the Opera-designed “demonstrative” Unite Applications were media, photo and file sharing applications. Does that make you think of anything?

Sometimes I wonder how different things could have been if the timing had been different. When Opera Unite was first announced, ActivityPub wasn't a thing yet, StatusNet had just been born, diaspora* didn't exist, and the only other major bidirectional federated protocol was XMPP, which had existed for 10 years and was in the process of being “Embrace, Extend, Extinguish”ed by Facebook and Google.

I have no problems imagining a different timeline, where ActivityPub had been already a better-established thing, and the demo Opera Unite applications for media and photo sharing had implemented basic support for it, resulting in self-hosted lightweight alternatives to Pixelfed or Funkwhale.

And this is actually the vision I have as ultimate goal for the Fediverse: one where, thanks also to client support, hosting and participation become even more trivial than setting up a static website.

Requiem for the open web?

In many ways, Opera giving up on their Presto engine not only marked the end of the browser war, with WebKit/Blink the uncontested winner, but it also marked the end of truly inspiring (inspired?) client innovations for the open Internet —although possibly not entirely through its own fault, since in the same period Firefox also largely seemed to “give up” on that front, even going as far as removing features it already had (such as RSS support).

With the modern Opera browser now just a derelict ghost of its past self, hooked into proprietary initiatives (think of its Messenger for closed silo networks) and cryptocurrency shilling, some of its legacy is now being carried by another Chromium skin/fork: Vivaldi. Although I do not appreciate it being partially closed source, or its reliance on Blink (which, for example, precludes JPEG XL support), it does seem to still be interested in keeping alive the spirit of the “Swiss army knife of the (open) web”.

One of the interesting ways in which this shows up is that in addition to email, RSS and calendars, Vivaldi has also actively promoted support for Mastodon, in a very simple yet effective way: providing a Web Panel for their instance, and allowing you to add your own. I expect the same will work on other Fediverse platforms, as long as they provide a functional web interface with good “small screen” support (since this is effectively what the Web Panels use).

The Vivaldi browser is the closest thing we have to a “Swiss army knife for the open Internet” today, and yet it doesn't even have feature parity with the late Opera/Presto. For example, it has no IRC client.

But in the context of my vision for the Fediverse, the most glaring omission is the lack of an equivalent to Opera Unite, an incentive to the development of easy-to-deploy self-hosted websites.

Even if Vivaldi (the company) did share my vision of an open web, I have my doubts that it has the energy and workforce necessary to push it. The fact that their main product is proprietary (despite the abundance of open source software they leverage) is also a downside. Getting Mozilla on board would be of great help in this, but considering the downward direction they have taken with Firefox, that's even less likely (seriously, not even RSS?), which is a pity, because two independent browsers implementing support for a common lightweight server application framework in the spirit of Opera Unite could be a major push in the right direction. And even if Vivaldi did invest in something like that, their efforts alone would get nowhere.

People may dismiss the usefulness of the “Swiss army knife” concept pushed by Opera/Presto up to 10 years ago, and by Vivaldi now, citing “bloat”, “lack of focus” or the classic principle of doing “one thing well” instead of 100 things poorly (sometimes called the Unix philosophy). There is merit to the objection, but I have never seen it put into practice as it should be: on the contrary, feature rejection, or even worse feature removal, has been to the detriment of “doing one thing well”.

Two of my pet peeves in this regard are with Mozilla Firefox, and in both cases they are about feature removal because of perceived bloat.

The first is the removal of support for the MNG format. The purported reason for this was the “bloat” coming from linking a 200KB library. Reading the issue tracker for this 20 years later, when Firefox installations are 200MB and counting, is … enlightening.

I still care about the MNG format support not for the format itself —it's quite clear that it irredeemably failed— but because the same argument can be used in the future to stymie adoption of other formats such as JPEG XL, which is currently supported in Firefox Nightly, and will likely receive the same treatment (I wonder if with the same excuses) now that Google has decided to drop support for it from Chrome.

In other words, the issue isn't so much with the specific format (although that has its importance: MNG was the best we had at the time for a unified format that supported animation, transparency and optionally lossy compression), but the active choice to not uphold the interests of the open web.

The same thing holds for the second pet peeve of mine: Mozilla's decision to remove RSS and Atom feeds support.

Firefox had some support for all three aspects of web feed handling (discovery, visualization, subscription), and it was all wiped out with the release of Firefox 64, with maintenance cost being the (purported) reason. Even if we accept the motivations, and that WebExtensions would be the best way to reimplement the features, the question remains: why didn't Mozilla provide an official extension for it?

If you want an example of why the absence of feed discovery built into the browser (or at least offered through a default-installed official extension) is a problem, consider this recent post by @atomicpoet@mastodon.social on @fediversenews@venera.social —having to jump through hoops, looking at the page source code to find web feeds because the browser has removed the discovery feature, is something that can trip up even competent experts.

(And yes, the website could advertise the presence of the feeds on the visible part of the page, and the absence of visible links is to be blamed on them, but on the other hand: why duplicate the information when the browser can (and actually used to!) show you the information advertised in the document metadata, where it is supposed to be?)

Mozilla's choice to remove their built-in web feed support without providing an official extension to carry on the legacy is another strike to the open web and indie web on their side.

I often wonder what has been going on inside Mozilla. Firefox reached its largest market share (around 30%) some 10 years ago. Since then, it has been inexorably losing ground. There is little doubt that this has been largely due to the growth of mobile and Google's unfair marketing advantage, but I have little doubt that Mozilla's response has been the worst possible one: they have chosen to get into a “race to the bottom” based on mimicry, instead of playing to their strengths or finding new ones through innovation. I can't say for sure that their market share wouldn't have fallen this quickly if they had taken a different path, but I know for sure that there are people who switched away because Firefox no longer had a compelling reason to be used over the competition.

Again, this isn't about MNG or JPEG XL or RSS or web feed support specifically: it's about the priorities behind the policies.

I do understand and appreciate that even just maintaining the engine to keep pace with the evolution of web standards is a huge undertaking —it's why so many browsers have just given up and chosen to “leech” on WebKit or Blink instead. But when the only reason to use your browser is that it's the only FLOSS alternative to Google's, you have a problem.

The fact that Vivaldi, a Chromium reskin with some proprietary glue, has more personality than Firefox (which doesn't even seem to have a Fediverse presence) is something that should really be a wake-up call for Mozilla.

And before anybody gets into the comments to praise Mozilla for its history of web standards and user privacy defense —I don't need you to remind me of that. That's not the point. The point is that to actually be able to do that you need something more than “I'm not Google”. And the irony here is that while Firefox has nothing to claim for itself other than “not Google”, Vivaldi does, even if it's still using Blink as web engine, and is thus subject to Google's whims on that side (one example for all: concerning JPEG XL support). Heck, even the new Opera is more than just “not Google” —even though it's pursuing all the wrong “personality” traits for that.

Why is having a personality important? Because it's one of the pillars on which your capability to defend your position is founded. Mozilla cannot protect web standards through Firefox if their go-to solution is to remove support for standards that don't get the adoption they wish for in the timeframe they expect: nobody is going to adopt a standard if there is a credible threat that support for it may be senselessly removed in the near future.

The Do Not Track header has been deprecated, and has been largely useless because it was never adopted by most advertisers, who used the cop-out of it not being legally binding. Despite this, Firefox (and most other browsers, with the sole exception of Apple's Safari, AFAIK) still supports sending the header, even though it's arguably a waste of bandwidth and implementation resources (UI options to control its settings, JS access to it, etc). Why do they still do it?

Because it's part of their personality: even if just at face value, DNT header support is a signal that the browser cares about user privacy. (Don't get me started on the “new” Global Privacy Control standard, when it would have sufficed to update the DNT spec in light of the new legislation.) So while one could place reasonable confidence in Mozilla upholding past, current and future privacy-oriented standards, I don't feel the same concerning the open web.

I'm sure people have different ideas about what it means to support the open web. I think first and foremost it means allowing users (on both sides of the connection) to use the protocols and file formats of their choice. Every time a browser fails to implement (or, worse, decides to remove) support for a standard protocol or file format, it's failing the open web. Half-assing the implementation of web standards was basically Microsoft's staple behavior during the first browser wars.

Microsoft had reasons for this: at first it was because they didn't “get” the Internet, later on it was because it was the only way they had to (attempt to) control it. They did all they could to cripple it. Remember when Opera Software released a “Bork” edition of their Opera browser in response to Microsoft serving them intentionally broken CSS? Now imagine what the Internet would have been like if Opera, Mozilla and few others hadn't held their ground.

If you think what Microsoft did was insane, consider this: Vivaldi had to change their user agent identification because Google, Facebook, Microsoft and even Netflix were intentionally breaking their websites when detecting the Vivaldi browser. GAFAM are against the open web —and the worst of the bunch is Google, which also holds a dominant position with their browser in both the desktop and the mobile space.

But the worst here isn't that Google is actively against the open web: it's that in contrast to the first browser wars, there is really nobody left to stand up to them. Consider for example Dave Winer's write-up on Google's effort to deprecate (non-secure) HTTP, and consider that Firefox, the only actual alternative, is also on Google's page, albeit less aggressively so.

Under the same pretense of security, support for classic (some would say obsolete) protocols such as FTP and Gopher has already been removed from all major browsers. In some browsers, such as Firefox, this has been an intentional choice. Others, like Vivaldi, have been basically forced into this position by their reliance on Google's engine.

And yes, I claim that security is just a pretense. Ad networks known to sell your data to the highest bidder and to serve malware don't give a rat's ass about your security and privacy. The only thing they care about is making sure they are the ones getting your data, and they are the ones serving you the ad, even if it's malvertising.

(Firefox may not have such motives, but they definitely have an interest in reducing the code base, making maintenance easier for them. And as several have commented on Mastodon, they depend on Google for revenue, which makes them indirectly interested in toeing Google's line.)

When I first started putting down in writing the thoughts that would lead to this article, I didn't actually plan for it to turn out so depressing. The original intent was quite the opposite: to celebrate the importance of even the smallest contributions in the resistance against apparently overwhelming odds, even when the outcome is still not really the fair, open Internet one might have been fighting for. I could go with the Ursula K. Le Guin quote about capitalism's apparent inescapability now, but I think we can do better.

Someone may observe that protocols other than HTTP(S) are irrelevant in a discussion about the open web —which would be one of those pedantic, technically correct (the best kind of correct!) observations that completely misses the point. Yes, it's technically true that the World Wide Web is built on the HTTP protocol and the HTML and related file formats and specifications (such as CSS and JavaScript). But there is no open web without an open Internet.

And one of the keys to an open anything is ease of access. And sure enough, there are still plenty of dedicated tools to access specific parts of the Internet that are not the World Wide Web: clients for FTP, gopher, finger, USENET, email, IRC, or even for new hypertext navigation protocols like Gemini, exist. But why should I need a different client for each when I could access the whole Internet from a single client?

Why should I need to switch clients when following an FTP or Gemini URL in an HTTP-served HTML page, or conversely when following an HTTP link from a Gemtext page? Why shouldn't my Gemini client be able to render HTML pages delivered over the Gemini protocol, and my web browser able to render Gemtext natively if served over HTTP?

This is why the “Swiss army knife” browser model is essential to the open Internet, and a fortiori for the open Web.

Instead, we're seeing a growing, grotesque separation between a “lightweight” Internet and a “heavyweight” Internet where —ironically— the “lightweight” clients have support for a wider range of protocols and metadata whereas “heavyweight” clients are gravitating towards being HTTP-only, and frequently eschewing useful metadata.

Why is it that a historical but up-to-date (latest version at the time of writing is from January 2023) textual client like Lynx can not only connect to FTP, Gopher and finger in addition to HTTP, but also present the user with the next/prev and web feed links stored in the document head, while the most recent version of Firefox can do none of those things, and is likely destined to lose even more functionality in the future?
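For reference, the head metadata in question is plain, decades-old standard markup —nothing a modern engine should struggle with. A minimal example (the URLs are placeholders):

<link rel="prev" href="/posts/previous-article/">
<link rel="next" href="/posts/next-article/">
<link rel="alternate" type="application/rss+xml" title="Site feed" href="/feed.rss">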

And no, the answer is not «ah, but Firefox has to dedicate many more resources to support the latest version of the massive, quickly-evolving HTML, CSS and JavaScript standards». The answer is not that, because Firefox used to support those things and actually spent resources removing them. And while for some of them (e.g. web feeds) an argument could be made that the implementation needed a rewrite, I doubt that's the case for the removed protocols.

This is frustratingly compounded in major browsers by a lack of extensibility: while it is generally possible to define external protocol handlers, it's generally not possible to write handlers that would just stream the content internally. Historical note: the much-maligned Internet Explorer actually supported something like that. Some Qt browsers (such as Konqueror and Falkon) can also be extended using the KDE Frameworks KIO plugins.
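The closest thing the web platform itself offers is navigator.registerProtocolHandler, which illustrates the limitation rather than lifting it. A minimal sketch (the proxy URL is a placeholder):

// registerProtocolHandler can only map a scheme to another HTTPS page:
// the handler receives the target URL as a query parameter and has to
// proxy the content itself —nothing is streamed into the browser's own
// rendering pipeline, unlike KIO-style plugins.
navigator.registerProtocolHandler(
  'web+gemini', // custom schemes must carry the web+ prefix
  'https://example.org/gemini-proxy?url=%s' // %s is replaced with the requested URL
);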

I still remember the days when Mozilla was the king of customization. They were the ones who introduced the extension concept to the browser, allowing all kinds of experimentation on the UX. Many of the features we expect in a modern browser today were first introduced through XPI extensions in the Mozilla Suite of lore and the first versions of Firefox. Now they play catch-up with whatever Chrome dictates web extensions are allowed to do, barely managing to avoid the worst.

Again, the issue here isn't that Mozilla added support for Chrome-style web extensions to Firefox. It's that it did so while removing support for “legacy” extensions. And while I'm sure there were good technical reasons why the existing implementation couldn't be kept and was holding back engine progress, like in the RSS/Live Bookmark case, I have my doubts that it could not be replaced with something more modern that still provided the same or —at worst— a similar interface.

Even assuming the new architecture is so wildly different from the previous one as to make supporting legacy extensions impossible, I find it extremely unlikely that it wouldn't be possible to design an extension interface that would allow pluggable protocol handlers and image format support in modern browsers. Why do smaller niche browsers have better support for these things than the mainstream ones?

Why is it that Falkon and Konqueror can leverage KIO to provide generic protocol access, and the Otter browser can leverage the extensive Qt image format support when using the QtWebKit engine to support more exotic formats (or the new JPEG XL standard), but neither Chrome nor Firefox nor Vivaldi offer comparable extensibility?

I'm sure somebody will try to make a claim about “security”, but I very strongly doubt that's anywhere close to the actual reason.

You know what makes this whole thing even more horrifying? That all major browser vendors and the W3C have actually worked their asses off to provide something like what I'm talking about for open protocols and standard image formats —but they've done it in submission to the power wielded by the mafia-like content distribution oligopoly, on an extremely controversial “standard” (read the EFF's opinion in their resignation letter).

Let me rephrase that: the kind of extension system I'm proposing to allow browsers to support more (open) protocols and (standard) image formats isn't impossible: in fact, major browsers already have similar systems in place to allow “consumption” of content locked by Digital Restrictions Management —the antithesis of the open web. So don't come tell me there are security issues with allowing the extensibility I'm asking for: it can't be worse than the hole opened by closed-source DRM modules.

A positive outlook?

In many ways, the years between 2013 and 2018 were the worst for the open web, with the reduction in browser engine variety (Presto, Trident, even EdgeHTML were all discarded in that timespan), Firefox giving up on legacy extensions and web feeds, and the W3C EME betrayal.

Can we make the years between 2023 and 2028 those of its revival? With the Fediverse taking shape, a return to prominence of the “indie” web, and the birth of new protocols like Gemini, the times seem ripe.

git git gadget command

How often have you started writing a git command, looked something up, and then restarted?

git git commit -m "A beautiful commit message"

You know you've been there. Multiple times. You start writing a git command, switch to a different terminal or window to look something up (e.g. the exact syntax for some esoteric option combination), switch back to your command line and start typing your command from the beginning, including the initial git. And this gives you the rather depressing error:

git: 'git' is not a git command. See 'git --help'

This thing has always been so common that at some point in the past someone actually proposed to fix it internally in git, by ignoring extra gits in the command line. The proposal was ultimately discarded (I don't remember the reasons, and I'm too lazy to browse the git mailing list to find the references, but I'm sure the actual reason was that it would have been too user friendly), so we're left with having to solve it ourselves, especially if we're particularly prone to it.

One of the nifty features of git is that if you have an executable git-whatever in your search path, git will happily allow you to use whatever as a git command, invoking the binary.

However, if you try to exploit this by making a git-git that is just git, for example with a symlink

ln -s $(command -v git) ~/bin/git-git

then it won't work, because:

fatal: cannot handle git as a builtin

You can work around this check by making a simple git-git shell script that does the work for you:

#!/bin/sh
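# hand everything over to the real git, so "git git <cmd>" behaves as "git <cmd>"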
exec git "$@"

and then live happily ever after:

$ git git git help | head -n1
usage: git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]

(of course, once you do it, it's recursive, just like the alias trick others have found before me).

(This post courtesy of my exchange with @glasnt@cloudisland.nz.)

Picking a toot

Adding “Share to Mastodon” links to the Wok via Tootpick.

In my ongoing efforts to integrate this site with the Fediverse, I was made aware of Tootpick, a nice single-file webpage designed to share links on Mastodon. The service takes advantage of a specific Mastodon API to pre-fill a post, while still giving the user the opportunity to modify the text before submission, or to cancel the submission altogether.

The author even provides a hosted website to make it easier for others to add a “Share to Mastodon” link. However, one of the great advantages of Tootpick is that it's trivial to self-host: the entire service is provided by a single HTML page that takes care of everything via JavaScript, so one just needs to drop this HTML page somewhere on their website and point the “Share to Mastodon” links to it just as they would with the author-hosted service.

Indeed, adding this file to the Wok was the trivial part of the integration, but that's arguably because I had never had any other form of “Share to Anything” links before —due to a combination of non-existent or very limited usage of social networks on my side for a long time, and a general distrust for “non-local” solutions.

(I should mention here that I used to have a UserJS/GreaseMonkey script to find social network links to the Wok (post in Italian), although it has been non-functional now for a long time, by and large due to the progressive interoperability shutdown of major social networks.)

This meant that I had to start thinking about how to add these “Share” links to the Wok.

The “natural” solution would have been to add such links programmatically in the page templates. At the cost of a full rebuild of the whole website, this would have allowed such links to be assembled programmatically from the same metadata that was being used to build the pages, in an arguably natural fashion: at page build time, without any intervention from the server or from the client, at least until the “Share” link was clicked (Tootpick still requires JavaScript).

Since JavaScript is required for the “Share” link in question anyway, however, I thought it would make sense to write a small bit of JavaScript to go over all the permalinks in the page and add “Share” links to them. In theory, this could even have spared me a full site rebuild, if not for the fact that IkiWiki does not have an option to include a JavaScript file from every page, so I had to customize the base page template and force a full rebuild anyway —several of them, in fact, as it turns out some of the metadata was not easily accessible (especially for nested permalinks in index pages).
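The gist of the script is simple enough; here's a minimal sketch of the idea (the rel="bookmark" selector and the /tootpick.html path are illustrative assumptions, not the Wok's actual markup):

// go over all permalinks in the page and append a “Share to Mastodon”
// link pointing at the self-hosted Tootpick page, which reads the
// pre-filled status text from the URL fragment.
document.querySelectorAll('a[rel="bookmark"]').forEach(permalink => {
  const text = encodeURIComponent(document.title + ' ' + permalink.href);
  const share = document.createElement('a');
  share.href = '/tootpick.html#text=' + text;
  share.textContent = 'Share to Mastodon';
  permalink.insertAdjacentElement('afterend', share);
});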

I was ultimately successful (as you may notice if browsing the site with JavaScript enabled), and I'm quite satisfied with the results so far, even though I'll confess that the experience has made me consider again the possibility of switching to some other static site builder that might make this easier to manage.

Continuous Content Generation

LLM/AI isn't for and won't be replacing art. It's a tool to satisfy the capitalist need for infinite growth via continuous content generation.

There's been a growing brouhaha about recent progress in “Artificial Intelligence” (AI) research, with the publication of Large Language Models (LLM) that, thanks to “deep learning”, are finally capable of producing outputs that are very credible to the superficial and/or untrained eye.

OpenAI (with ChatGPT and DALL-E), Stable Diffusion and Midjourney have all made the news rounds a few times thanks to their presumed ability to “interpret” textual input and turn it into satisfactory visual or textual renditions of the prompted requirements. Three main lines of criticism have been leveled at these kinds of efforts, focusing respectively on (1) the nature of what is actually being produced (especially in terms of whether or not this could finally be considered true AI), (2) the ethical underpinnings of the training data selection, both in terms of breadth and scope (e.g. concerning bias in race or gender) and in terms of copyright (did the authors authorize the use of their work to train these models?), and (3) the future and intended use of the models (e.g. will they replace art or artists?).

I'm only going to touch on the last point here, because there's an aspect that I think has gone missing in all the discussions I've seen so far, including the jokes (but are they really?) about machines supposedly being intended to replace the boring, physically and mentally destructive work, and instead being advanced to eliminate the creative endeavours while humans are relegated to the work the machines were supposed to eliminate. And I will argue that this is not really the case, although the net result will be quite similar to what would have happened if it had been.

There is a trend that has been going on for decades (half a century at least, in fact) and has seen a sudden jump in the last 15 or 20 years: the replacement of art with content, and of artists with creators. Others have warned about this before, in more detail than I will go into here, but the gist of the point is that the growth of the Internet, and of social media in particular, has exacerbated to a critical point the trend initiated by the mass commodification of creative output.

This is the result of two apparently aligned interests: that of artists to be able to make a living out of their creativity, and the capitalist obsession with infinite growth incarnated by the publishers and distributors (be they formally recognized as such, or be they such by practical definitions of the term).

Art depending on the rich and powerful to thrive isn't news: it's why so much visual art of the past has a religious theme, and why such an inordinate amount of words have been written to celebrate this or that local lord. If anything, the trend has now been subverted more than ever, thanks to new, distributed forms of patronage (see e.g. Kelsey's and Schneider's Street Performer Protocol as discussed by Cory Doctorow) that allow artists greater control over their creative endeavours, severing the dependency on the interests of a single supporter: commissions still exist, and for many they are still the primary if not the only way to make money out of their capabilities, but there are some artists who can afford to follow the “self-realization” principle: write/paint/compose what you want, and let those interested in your creative output come to you, as opposed to writing/painting/composing whatever may give better “engagement” on the platform(s) of choice.

In fact, it can be argued that the point of divergence between the aforementioned apparently aligned interests is indeed the definition of engagement. For the artist, engagement materializes in an audience that is interested in and appreciates their creative output; for the publishers and distributors, engagement materializes in a returning consumer.

Some may argue that those are similar, e.g. in expectations about the production of new content over time. But despite the superficial parallels between the two, there are some crucial differences, starting from the more direct, “personal” relationship between an artist and their audience (which is not all fun and games, as it gives way to stalking, a sense of entitlement, and the myriad downsides that come with less sterile connections). And this is something for which no metric exists, because it depends on the intangible quality of the interactions with the audience. Worse “engagement” metrics do not correlate with a worse or smaller audience.

Scale is also very different: a few hundred supporters paying a few € each monthly can be sufficient to sustain an artist's ordinary life (conditional on location, at the very least). This is one of the pillars of the success of the digital Street Performer Protocol mentioned above: as long as enough members of the audience support the artist, the artist can thrive and their art remains accessible to all.

Finally, at least for the purpose of this discussion, there is a difference in expectations: although I'm sure any artist would be thrilled to reach the level of success that would allow them to live comfortably for the rest of their lives, from what I can see most of them would be content with just being able to make a decent living, without having to worry about whether they'll be able to cover rent next month, and without depending on their partner's income or financial support from relatives.

The situation is very different for a commercial enterprise (into publishing or distribution, given the context), doubly so for one that has already achieved a certain success, and even more if it's publicly traded: there is no true “connection” between them and their consumer, operating costs are high, and most importantly there is an expectation of growth, e.g. to “increase shareholder value”. This leads to a particularly pressing need to publish and distribute new content, at an increasing rate.

Even before the Internet went mainstream, we've seen this trend manifesting e.g. in movies and animation with the spread of home theater solutions: while 40 years ago some cinemas still offered older classics, a decade later you would have been hard-pressed to find anything but new releases outside of film society screenings, despite there not being a significant change in production volume until the beginning of the 21st century, when massively production-cost-lowering technology improvements and expansion to “emerging markets” led to explosive growth.

[Figure: number of feature films released per year (data source: IMDb) —slow growth from 1920 to around 2000, and explosive growth right after that]

One of the key ways in which the Internet and digital media distribution have revolutionized the field has been the shift towards subscription models. Once mostly the domain of periodic journals and service providers, the subscription and streaming model pioneered by Netflix has become the method for larger media conglomerates to cope with the new technology, after fighting it for years and trying to get governments to regulate it in the name of lost profits (and getting a long way with it).

Subscription models are very convenient for the company, as they guarantee a more stable revenue stream compared to one-shot purchases, and may result in higher profits at equal consumption rates. But while ongoing payments can be justified by periodic journals on the basis of quantifiable periodic updates, and by service providers with uninterrupted service delivery and continuous infrastructure maintenance, they are more difficult to justify in the case of spotty updates (the release of a new movie or music album) or consumption (watching the movie, listening to the album). There's both a practical and a psychological reason for this: a subscription to e.g. a journal gives you something tangible that remains with you even if you cancel the subscription, whereas digital subscription services are more like an access fee to a library owned by somebody else; and regardless of how vast that collection is, if one ends up frequently perusing the same material, the natural observation is that it is ultimately cheaper to pay for it once and own it forever than to access it in streaming.

To make the subscription model enticing, the company would thus need to either select a clientele that has the curiosity to go through their entire catalog (which no company is going to aim for, because numbers), or keep up the attention of the “generic” consumer with a continuous stream of fresh, palatable content that keeps them distracted from reconsidering the benefits of one-shot purchases.

Recommendations from the existing catalog (especially if it's a large, well-categorized one) can play that role only up to a certain point, as consumption levels out. Hence the need for continuous content generation. This is not new: cable TV started to push out reality shows for the same reason; TV series were born out of the need to keep housewives engaged to sell more advertisement; and we can go as far back as the feuilleton, at least, for the first examples of content serialized over long runs to keep people engaged, i.e. transfixed to the specific media channel. What has changed in recent times is the scale, and the scope, of the phenomenon, with vicious cycles that benefit neither the company nor the consumer: as the consumer aims to maximize the utility of the subscription fee, the demand for fresh content grows, and as the demand for fresh content grows, the attention to the quality of the product diminishes, lowering longer-term engagement. This is reflected not only in a massive increase in production, often of debatable quality, but reaches grotesque peaks where entire series are sacrificed after the first couple of seasons on the altar of immediate engagement growth, even when their higher quality was appreciated by a meaningful number of consumers.

We are now in an era dominated by Continuous Content Generation, where engagement is not the result of sustained quality, but of a continuous renewal that doesn't even let products reach maturity before they are swapped out for a more recent one, in a desperate search for instant freshness gratification. And this is the era where ‘content’ replaced ‘art’.

Now, there are practical reasons to use the terms content and creator rather than art and artist, not least the fact that not all content in these distribution channels is art: for example, journalism and essays may use tools and media similar or related to those used for novels and poetry, but regardless of the aspirations and capabilities of the writer, they'll rarely be classified as art, however important their role is in keeping readers engaged. So using the ‘c’ words is a reasonable way to refer to all the material when going into specifics about what is art and what isn't is unnecessary.

However, the choice of terms is also indicative of a diminishing interest in the content itself: it's not just a way to indicate all the available content, but most importantly an indicator that it doesn't matter what kind of content it is. Obviously not everybody can hire a Jules Verne or Alexandre Dumas to write serialized novels and keep selling copies of a newspaper, especially since at scale you'd need one for each kind of audience among your consumers, but we're way past the point where this is just a matter of quantity over quality: the scale is such that what causes engagement is completely irrelevant.

Hence the spread of clickbait, mainstream trolling, enRagement algorithms, and any other strategy that helps keep people coming back for more.

It should now be clear where I'm going with this, given the premises: this same need for Continuous Content Generation, regardless of type, form, or actual content, is the most immediate practical target for the AI/LLM that are at the center of attention today.

In this sense, these models do not represent a threat to the independent artist (or other “content creator”) who has built or can build a following of their own. The models can be considered aimed primarily at replacing the armies of scriptwriters, copywriters, “bloggers” and whatnot that provide ‘content’ for the media industries. As the models grow more sophisticated, we can expect more of the output of these industries to be produced by or with the assistance of such models, with humans relegated to the roles of prompters and selectors/verifiers. And if superficially this may seem like a good idea to some («oh good, less effort for me to write the articles I need to publish to get paid»), it should be obvious that this will entail not only a massive cut in the workforce, but potentially its almost complete elimination, once publishers realize that they can leverage their own consumers to fuel the machine (think about user comments as prompts for the next set of articles, or how many of you are freely offering tables of possible prompts that you can rest assured are being logged for future use —how about a model that writes prompts next, for example?).

It's also debatable whether (or how much, or how many) consumers would actually care about or even realize that the content is being produced by LLMs, although it may be important in the beginning that the automaton's intervention be as subtle as possible. If any of you were hoping that the recently presented tools that detect LLM outputs would be used to expose the number of AI-written articles already in circulation, think again: a more likely use will be for the tools to be sold to the publishers, to help identify the machine-produced content that cannot be detected and can thus be published with less danger of repercussions from readers who do care about where the article comes from, at least until the new choice of writing system gets normalized.

While this does eliminate some of the threats that AI/LLM pose to independent artists and other “content creators” (we really need a better word for this), their influence will still need to be considered. The most obvious effect is that by reducing the opportunity for relevant employment within the industry, they potentially increase the pool of artists that will have to make a living independently (if they wish to live off their art), and the competition may make it harder to build one's livelihood from it. More importantly, however, these models are (still) incapable of original content creation, which puts a limit on the variety of what they create. This may not be apparent now that their use is very limited, but with adoption at scale for Continuous Content Generation these limits are likely to be hit sooner rather than later, even with judicious use of human-directed prompts and selection.

It therefore becomes essential for the models to be periodically injected with “noise” (new training data) to increase the variability of their creations (something that anyone experimenting with particularly unusual prompts has noticed already). Now, while it's possible that this could lead the displaced scriptwriters and copywriters to find employment specifically to produce such “noise”, what is most likely is that the work of independent artists who continue in their art unassisted by LLM will be unceremoniously hijacked as training data —and this is not a potential threat: this is something that has already happened and is still happening, as illustrated by the efforts of nearly all art hosting and online editing services (from Adobe to DeviantArt) to change their terms of service to include wording that allows them to feed the hosted content to such models, making this opt-out (thus enabled by default) rather than opt-in. This is an area (outside the scope of this article, as it falls within the second theme mentioned in the first paragraph) where legislation could potentially curb the phenomenon, but it's likely that lobbying by the media companies will render it ineffective at best, and counterproductive at worst, despite the clear preference of most artists that AI be kept away from their work.

Curiously, the reason why I think that AI/LLM do not pose an immediate, direct threat to independent artists is not only that the lack of originality and variety in the output of the models makes them more valuable for the “reprocessed art” that is behind much of the “content production” of the media industry (think for example of the speed with which subgenres saturate in mass-produced animation and comics, and that's with human work), but also that the target audiences for the two forms are wildly different, and the audience that seeks out the more original and varied “content” produced by independent artists is also more likely to value it being “artisanal”: so not only is it more difficult for AI/LLM to produce the content this audience seeks, but that same audience will value it more when produced by an actual human rather than mechanically (or procedurally) in response to a writing prompt.

And again, while the higher appreciation for artisanal production poses a problem for physical production, it is considerably more sustainable in the digital space, thanks to the higher efficacy of crowdfunding.

There is something ironic, I feel, in the “techbro” enthusiasm for AI/LLM travelling often in tandem with an unhealthy obsession for cryptocurrencies and the Non-Fungible Token (NFT) craze that goes with them. (One of) the purported intent(s) of NFTs (as advertised by said techbros) is to help artists “monetize” their art by artificially restricting purchases of what amount to “certificates of authenticity”. The reality (to no one's surprise, except the fools that bought into the scam) has been very different: most of the NFT-“certified” content that has floated around so far has been either procedurally generated, unoriginal, uninteresting variants of thematic images designed for unsubstantial products with the only aim of generating noise before the rug pull (exit scam), or outright “stolen”, infringing on the copyright, licensing, and/or moral rights of the original authors.

The irony here is that what makes art truly valuable isn't restriction on consumption (making it inaccessible); in fact, it could be argued (but I don't want to get into a philosophical discussion about what makes art art) that art achieves its peak value when it reaches the widest audience. (In this sense, the infinite reproducibility of digital works of art has the potential to make them valued as nothing before them.) And this value stems from the uniqueness of the work of art at creation time. Each work of art is unique because it could only have been created by that particular artist in those particular circumstances: a different artist, or even the same artist in different circumstances, would have produced something different. (Heck, even in the same circumstances, just because of small differences in context: think for example about the English and German versions of the same movie by Hitchcock.)

The digital work of art is unique because the work of art itself is not the bits encoding its representation, in the same way in which the novel isn't the ink and paper with which it is transcribed: the art is the story, the images, the sounds that result from the decoding of that representation, and how they reflect on the aesthete's mind —and these are unique by creation.

I could go off on a tangent here on how art is the antithesis of capitalism, and the principle of substitution not applying to creative work is just part of it, but (back on topic) coming to terms with this is what shows the irony of the AI/LLM+NFT fandom: art is intrinsically not fungible, by virtue of creative originality at inception. LLM-generated content is the epitome of unoriginality, and the artificial restriction imposed by “minting” NFT for it does nothing to compensate for this.

So maybe instead of worrying about how these glorified procedural generation models may threaten the livelihood of artists, we should focus on rethinking the socioeconomic system in such a way that creativity may be valued for what it is, instead of the perverse system of disincentives that has been built around it to force it into the capitalist concept of “value” by (artificially created) scarcity of access.

(And how's «digital (street) performer» as a better alternative to «content creator»?)


Appendix: I'm going to collect here links to Mastodon threads that are relevant to the points discussed in this post:

Nuclear will not save us, part 4

No, not even nuclear fusion will do

Introduction

Here we go again. Any time there is some kind of breakthrough in nuclear power generation, crowds cheer about our energy production problems being solved, or close to being solved. And every time this happens, I'm forced to remind people that no improvement in energy production will “solve” our energy issues unless we tackle the growth in consumption. And since those improvements will more often than not lead to an increase in the growth rate of energy consumption, these breakthroughs —however locally promising they may be— will end up being deleterious on longer —but not even that much longer— time spans.

Context

The news of the day is a breakthrough in nuclear fusion, with the National Ignition Facility at the Lawrence Livermore National Laboratory announcing fusion ignition, i.e. the ability to trigger a self-sustaining fusion process that produces more energy than it consumes.

This is wonderful news. It's a scientific and technological advancement that humans have been dreaming about for more than a century, and one that paves the way to a potentially cleaner and safer energy production mechanism than anything we've seen so far.

It's also still far from being anywhere close to actually being productive in that sense, as detailed by Michael Schirber in this article on APS (if for no other reason than that the amount of energy used to start the reaction is still orders of magnitude higher than the one released by the reaction).

Still, it's an important result, and one that gives much better hope that energy production based on nuclear fusion may actually, finally, be within reach, and that this may revolutionize energy production, dramatically reducing its environmental impact as well as its cost.

Unfortunately.

The dangers within

I've said it before, but I will say it again, because apparently this is a point where repetita iuvant: no form of power generation bound by the 90 PJ/kg limit will suffice unless we curb the rate at which energy consumption grows. It doesn't matter if it's fission or fusion. It doesn't matter how efficient the energy production is. The only thing that matters is that exponential growth is faster than our perception. And as I mentioned in the first post of this series, the cheaper and cleaner the energy production is, the higher the risk that its adoption will lead to a faster growth rate in energy consumption.

In scarcity, people are frugal; in abundance, wasteful.

You don't need to teach someone who can barely make ends meet how to conserve food, water, heat or money. Yet a billionaire will not care about fuel costing 3 times as much as before when choosing to fly somewhere on their private jet.

There's a reason why energy conservation and efficiency have become such a hot topic in the last 50 years: the 1970s energy crisis (which also gave a strong push to investment in nuclear (fission) energy production, until the Chernobyl disaster in 1986 cooled off much of the enthusiasm). Why does this matter? Because in recent years the discourse has largely shifted from efficiency and lower consumption (or at least lower growth in consumption) to a wider-reaching ecological discourse (“green” energy, low emissions, you name it).

There are different reasons for this change in topic, ranging from a genuine interest in the increasing threat of global warming and anthropic influence on climate change, to the easier manipulation of the discourse into a business opportunity (“greenwashing” and friends). But the shift in attention is deleterious: not because reducing pollution is bad, but because pollution and the energy crisis are deeply interconnected by … energy consumption growth.

Let's pretend for a moment that tomorrow nuclear fusion became commercially viable, providing us with the cheapest, lowest-environmental-impact energy source we could ever dream of. Let's pretend that with a single flip of a switch all energy consumption could be switched over to this cheap, low-impact source. This would instantly absorb a sizable fraction —I'd even go as far as to say: the majority, if not all— of the clamor about the environment, the carbon footprint, and all the “hot takes” that have replaced in the public discourse the only thing that really matters: energy consumption growth.

In such a scenario, it doesn't matter if a billionaire's private jet or yacht consumes in a year as much as half the population of the country they live in: there's so much cheap energy, and the higher consumption has so little impact, that … who cares? Very few care now that it has an enormous impact on the environment and risks depriving more important needs of access to cheaper energy; how much would people care when it didn't?

In such a scenario, there's simply no interest in tackling the fundamental problem. In fact, quite the opposite: we can expect a radical upturning of the perspective of the general population: if energy is so easy to obtain and has so little environmental impact, why would it even matter to keep its consumption in check, or to ask government bodies to keep it in check? The most likely outcome of such a scenario is a sudden jump in energy consumption growth, as the limiting factors of cost and pollution-prevention regulation are removed: from the rest of the world catching up to “Western” standards of living more quickly, to the “West” coming up with even higher-maintenance standards to compensate for the environmental damage of the last centuries (air conditioning everywhere, massive desalinization plants for fresh water, pervasive augmented reality, you name it, it'll be there).

And the net effect of this will be catastrophic, because no energy source can sustain exponential consumption growth for long, and by the time people realize that even the nuclear fusion fuel can run out it will be too late —again— and the collapse will be so much harder.

“The most abundant element in the universe”

Much of the enthusiasm behind nuclear fusion comes from fallacies similar to the ones we've already discussed in our previous chapter of this series: material abundance and constant consumption rates. And we've seen already that at the current growth rate the mass of our entire solar system won't last 30 centuries, regardless of energy production system (and assuming optimistically 100% efficient mass-to-energy conversion). And we've seen some interesting computations on the EET for fission fuel. So this time we'll play around with the numbers for fusion.
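(For those who want to play along, the exhaustion-time arithmetic from the previous chapters fits in a few lines of JavaScript. A sketch follows; every figure in it is an illustrative assumption, not a measurement.)

// years until a finite reserve R (J) is exhausted when consumption
// starts at C0 (J/year) and grows at a constant yearly rate r:
// solving the geometric sum C0 * ((1+r)^T - 1) / r = R for T.
function yearsToExhaustion(R, C0, r) {
  return Math.log(1 + r * R / C0) / Math.log(1 + r);
}

// illustrative assumptions: ~1e31 J from fusing all oceanic deuterium,
// world consumption ~6e20 J/year, growing 2% per year.
console.log(yearsToExhaustion(1e31, 6e20, 0.02)); // ≈ 990 years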

One of the biggest and most dishonest talking points of fusion fans is that since the fuel is hydrogen, «the most abundant element in the universe», it's virtually impossible to run out: for all intents and purposes, it will last “forever*”.
(*conditions may apply)

We already know that no matter how large the amount of fuel is, as long as it's finite and consumption keeps growing exponentially (i.e. at a constant relative rate) it'll run out —and much earlier than predicted (how does 50 centuries sound for the entire mass of the galaxy converted to energy at 100% efficiency, again?). But where the talking point fails miserably is that while it's true that hydrogen is the most abundant element in the universe (around 75% of matter is estimated to be hydrogen), or even in the solar system, it is not the most abundant element on this planet. (Why? Because most of the hydrogen in the universe is in the stars, where it's already being used to run a nuclear fusion process!)

And the statistics are even worse if we look at “free” hydrogen, rather than hydrogen bound in other molecules (such as water or hydrocarbons). If we look at the abundance of hydrogen on Earth, it barely makes the top 10 in abundance for the crust, hidden in that 1.2% of “other trace elements”. In the atmosphere, it's even rarer: at ground level it's 0.6 parts per million (PDF warning), and we have to climb to the exosphere (starting approximately 700 km above sea level) to find it in more substantial concentrations … in a medium so rarefied it doesn't even behave like a gas anymore.

I'm sure you can see where this is going, and it shouldn't be a surprise, when we've already seen in the previous chapters how quickly we'd run out of fuel on Earth regardless of energy generation method. But wait, there's more!

Nuclear fusion doesn't actually use “classic” hydrogen (aka protium): the most important elements for the fusion process are deuterium (around 2 in 10,000 hydrogen atoms) and tritium, the hydrogen isotopes with extra neutrons (1 and 2 respectively), and if the “successful” (for appropriate definitions thereof) experiment that recently renewed the enthusiasm in the fusion process is any indication, tritium is of particular importance (although in theory we could do without). And tritium is also the rarest of the isotopes: due to its 12-year half-life, it's barely found in nature (we're talking 1 in 10¹⁸ hydrogen atoms, at scale), and is more typically produced as a byproduct of other processes —such as nuclear fission, or other fusion processes.

I'll leave it as an exercise to the reader this time to estimate the mass of the available “free” molecular hydrogen and the corresponding EET for fusion power generation at current (or higher!) growth rates. After three chapters, and with the help of the form in the second chapter of the series, there shouldn't be any need to hand-hold you through the process.

The not-so-clean energy source

Of course, once we run out of molecular hydrogen there are —in contrast to other fuels for power generation— several orders of magnitude more fuel still available. The problem is that accessing it destroys the second myth peddled by nuclear fusion supporters: that fusion is the cleanest form of energy generation, even more so than fission, and without the risks deriving from the radioactive waste typically associated with fission.

Leaving aside that any talk about the cleanliness and riskiness of the process (e.g. per unit of energy produced or per unit of power generation) is pure speculation, and will remain so until the first commercially viable fusion power generation plants are finally deployed and have proven themselves for a few decades at least, even from a purely theoretical standpoint the myth is on shaky ground. Indeed, the myth is tightly bound to the one about the abundance of hydrogen: assuming you have plenty of molecular hydrogen available, the fusion process is indeed one of the least impactful forms of energy generation. The question is: how do you get that hydrogen in the first place?

Fusion would be really clean if we could just have a passive collector of molecular hydrogen from the atmosphere, and extremely efficient ways to prepare it for the fusion process (e.g. deuterium extraction, tritium generation), but we have neither. And even if we did, where would we get the hydrogen from when we run out of the free molecular hydrogen in the air, something that is bound to happen sooner rather than later if we get seriously invested in fusion?

And this is where things become interesting: while there's definitely room to invest in the capturing of the hydrogen released e.g. by volcanic activity, the more readily accessible “stores” of hydrogen are water, carbohydrates and hydrocarbons. And the processes to extract hydrogen from these have two important downsides in the context of our discussion: they are either very energy intensive (which brings us back to the increased energy consumption, or in a restricted view to a lower power generation efficiency), or quite environmentally unfriendly (e.g. combustion, water depletion), when not both.

I can already see the objections about how these would still be less of an environmental problem than, say, the drilling and mining required for fossil and fission fuel; and while that may be the case now, I'd like to revisit this when the requirements for hydrogen extraction/production rise to the needs of our present and future power generation.

Nuclear will not save us

I'll refrain from going on a tirade that would just repeat the conclusion of my previous chapter, but a recap is appropriate still.

The fuel employed and the process used to produce the energy don't matter. The single most important factor is how quickly energy consumption grows. And the cheaper the energy generation is, the more quickly its consumption will grow. If anything, a more efficient and cleaner energy production method is more likely to boost energy consumption, which will result in an even harder fall when the ceiling is hit.

Still, I'm glad for the progress in the research on nuclear fusion, and while I believe that the press release is laced with excess optimism, I'm looking forward to the time, a few more decades from now, when the technology will have progressed enough to turn fusion into a viable power source. Any option we have to minimize the environmental impact of energy generation and optimize energy production with the means at our disposal is more than welcome.

Of course, in those few decades our global power consumption will have doubled again (at least), unless the global economy suffers another major collapse. (It's fascinating, really: take any year-over-year global power consumption series and you can identify recessions simply by looking at when the energy consumption change dropped close to zero —or worse, went into the negatives.)

And if we don't fix that, nuclear will not save us.

Titleless

About titleless entries (and other future changes) in the Wok

Having discovered the ActivityPub-based microblogging platform Mastodon and its feature to produce RSS, Dave Winer (one of the people that helped define the RSS format) has set out to fight the lack of support for titleless feed entries in feed readers. The intent is commendable, as is Dave's approach to titleless blog updates, but seeing his take on it has made me think again about my (current) approach to maintaining the Wok.

I've actually pondered several times in the past about this and related issues, wondering about the best approach to handle especially collections, such as my quotes collection, my “lightning” aphorisms (the closest thing to an on-site microblog), my “upsetting” discoveries, etc, updates to which could be considered both from a “single item” perspective and from a container update perspective.

The problem for me isn't just one of titlelessness, though, especially since I actually generally prefer to have titles, doubly more so when I can think of interesting ones: my problem is actually that I'm growing tired of some of the limits of the platform I'm using, but at the same time have a distinct preference for its underlying architecture, which is arguably responsible for those same limitations.

I like that each post here is a text file. But this also means that it needs a filename, even if the post itself might not have a title. Of course, the post having a title makes it easier to choose a filename. I could go with titleless posts, but then I'd have to think of a way to name the files that is unrelated to the title. (Not that dramatic: this could just be the date and time.)

There's more, though: the post metadata (author, date) needs to be entered manually. This is strictly speaking not necessary, especially since this is a single-author thing, so the author could be inferred, and the date could be taken from the file metadata itself —except that revision-control tracking and pushing across different machines messes this up. The presence of in-file metadata isn't a bother as such, but having to enter it and keep it updated is, and while for a long-form post like this one it's not even that much of a bother, it is one of the obstacles to quick-posting one-liners or other small content.

The obvious way out would be to cook up some scripts to handle this. The obvious danger is that these scripts could easily grow to become an ad hoc, informally specified, bug-ridden, slow implementation of a blogging platform. And considering I have all intentions to move away from IkiWiki, would this even be worth the effort?
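(To give an idea of the scale involved, a first stab could be as small as the sketch below —the .mdwn extension and the meta directives are IkiWiki's real conventions, everything else is assumption.)

#!/usr/bin/env node
// create a titleless post whose filename is just a timestamp,
// with the IkiWiki meta directives pre-filled (author is a placeholder).
const fs = require('fs');
const now = new Date();
const stamp = now.toISOString().replace(/[-:]/g, '').slice(0, 15); // e.g. 20230115T103000
fs.writeFileSync(stamp + '.mdwn', [
  '[[!meta author="Your Name"]]',
  '[[!meta date="' + now.toISOString() + '"]]',
  '',
].join('\n'));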

Of course, there's no guarantee that my “moving away from IkiWiki” is going to happen any time soon, so a quick & dirty patching up of the issue in question might have some value, even though we all know about the permanence of the temporary. Even worse, it's surely not going to be “quick” in any sense of the word (but plenty dirty) if the script needs to accommodate the many existing different collections that I've already started. It could work, OTOH, for at least the more organic one of them, and potentially for a new one with a similar structure to be created ad hoc: because, and that's another thing, there's something to be said about microblogging that can't be said for longer forms of composition, and it's that its “immediacy” promotes usage.

From my experience on Twitter first and Mastodon later, I've noticed that the posting format encourages writing even for content longer than the character count limit usually imposed on a single post on these platforms: I've found myself writing long threads that could have just as well been long-form posts more often than not. This isn't just a matter of practicality due to the higher degree of automation, or the frequency with which one might find themselves on the website or “app”: there's something about the limit that tricks the mind, appealing to the possibility of jotting down “just a couple of words”, even when one ends up writing several thousand.

As far as I can see I'm not the only one feeling this way, although I wonder if others also perceive the tension between this and the consideration that microblogging is not designed for long form. I am a big fan of “using the right tool for the right job”, but on the other hand, a tool that invites you to write is better at its job than one that isn't as encouraging. Moreover, the chunked format of microblogging threads helps give a structure, a rhythm to the text that must be sought with purpose in standard blogging. And the resulting rhythm isn't just stylistic: it provides hooks to the reader for comments, quotes, etc, in a natural way for the platform itself.

Microblogging doesn't necessarily entail a lack of title, but it often is titleless, to the point that a title isn't even supported on some platforms. This contributes to the simplicity and immediacy of the posting format, but also reduces flexibility unless the platform does support the feature —or something that can be (ab)used to a similar effect, like Mastodon's Content Warnings (CWs). Usage of a platform always requires some adaptation to the characteristics of the platform: for example, Mastodon's lack of a “collapse thread” feature has led Cory Doctorow to use CWs on “child” posts in his famously long threads on Mastodon, and as he tells it when discussing Pluralistic, composing his daily thread starts on Twitter, because it's the “least forgiving” platform.

Now, I'm not anywhere near as prolific as Cory Doctorow, so I probably won't ever need the scripts that have helped him lighten the manual load, and my blogging isn't professional enough to justify to myself the long-winded routine of multi-posting to separate platforms in addition to my self-hosted site (POSSE: Publish on your Own Site, Syndicate Elsewhere), but I am annoyed by my own over-reliance on Twitter previously and Mastodon now to post content long enough that it would have deserved its own entry in the Wok, so I'm now left pondering the strategy to adopt for the future (aside from backing up my off-site posts and importing them here —one of these days).

To my advantage I have not only a much lower production rate, but also a much smaller platform expanse: I only actually care about sharing my content on the Fediverse. This is something I achieve even now by sharing links to my articles on Mastodon, but I'm looking into better-integrated solutions, some way to support at least a minimal functional subset of ActivityPub that would allow others to follow my posts and maybe interact with them (favorite, boost, comment) directly rather than through my Mastodon account. Once this is achieved, the next step would be to aim for a simplified way to microblog on the Wok, with a dedicated section and possibly some way to simplify the creation of posts (and chains thereof) in this future section. And yet, I'm not really looking forward to hacking my way through the IkiWiki codebase (again). Switching to a different platform might help in this sense, but in that case I'd also take the opportunity to move to a different format (AsciiDoc, via Asciidoctor) for my sources, which however is less supported by existing static site generators … and suddenly this all gets on the road to becoming a full-time unpaid web development job.

I might never get to the point of having the platform I really want in my hands, but maybe some interesting tech ideas may come from walking this path, even if only sporadically.

Solarpunk

The road not taken towards energy independence

Intro

OK, I admit it, this isn't really about Solarpunk, at least not in the literary/artistic genre. And yet still it is, in some sense, since it is about the future that the genre envisions.

Since the beginning of the 2022 Russian invasion of Ukraine on February 24, a parallel economic conflict has been escalating between Russia on one side, and most European and NATO countries on the other. The “Western” side has imposed a number of sanctions, preventing circulation of most goods and people, and Russia has retaliated with the only weapon available to it, its control on the provision of natural gas to Europe. The last step of this conflict (at the moment of writing) is the “Western” side aiming to put a price cap on gas purchases, and Russia retaliating by shutting down delivery altogether.

This aspect of the conflict in particular has generated a lot of noise on social media, a significant portion of it quite obviously fed by Russian propaganda, revolving around the danger of the spike in energy prices and how its effect on the energy bill will negatively affect the “Western” economies, potentially even more than the Russian economy.

What is fascinating about this isn't so much the obvious propaganda trolling, but rather how the political discourse, both nationally and internationally, has been focused on “how to pull through the crisis” rather than on how to avoid the crisis altogether, by accelerating the path towards independence from Russian gas, possibly without throwing our economies into the arms of the next authoritarian regime.

For obvious reasons, I'm talking here about plans that don't need long planning stages or lead-in times, but can still provide significant long-term benefit. This excludes, for example, investment in nuclear energy, which in the best of cases takes years to complete, with issues and delays extending this to decades and enormous increases in costs, if the experience of the Flamanville or Olkiluoto nuclear power plants has anything to teach us.

It's probably obvious from the choice of title for this post that the plans I'd rather see taken into consideration revolve around expanding the utilization of renewable energy sources, and chiefly solar among them.

Why solar

Why solar specifically, though?

There are several properties that make solar particularly palatable as the option to invest on in the short term. Let's see some of them.

Speed

Solar panels are ridiculously quick to deploy. Actually setting them up takes only a few hours. Including the planning and acquisition stages, one can expect to have an installation up and working in a couple of months, with delays taking this up to six months: in this it's comparable to the typical deployment time for a wind farm, and orders of magnitude lower than the time needed e.g. for hydroelectric power.

Scalability and graduality

It's a very “local” power source, that “anybody” can set up on the roofs of buildings and other coverable land (think e.g. about parking lots). It also scales well and gradually, allowing larger installations to start reaping benefits before the whole system is up, with incremental expansion.

Climate-oriented

Note that this isn't about it being more climate friendly, but rather more climate oriented, i.e. better suited for the direction the climate is changing in. 2022 has been an exceptionally hot and dry year in many parts of the world (although it's likely that places like Pakistan, Afghanistan, and other parts of South Asia might disagree on the specifics).

The decade of droughts that has hit the northern hemisphere, from Europe to China and the Americas, is affecting not only agriculture (as it has in previous years too), but also hydroelectric, nuclear, and even coal power generation.

With the trend showing no clear sign of reversing, solar and wind thus promise to be the most “climate-oriented” green energy sources, i.e. the ones least likely to suffer strong setbacks in the future.

It's not perfect

Yeah, it's obviously not a perfect solution. It won't allow Europe to achieve total independence from Russian gas (or from fossil fuels) in the next 12 months. It may require significant imports of rare earths for the battery systems that help compensate for the periodic discontinuity of solar. A price spike might also be expected if expansion is concentrated in the few months remaining before winter.

Yet none of these objections, alone or together with the others, is a meaningful reason not to invest in solar (or wind) right now, because we don't need (nor should we strive for) a “perfect” solution: we simply need to get started (the earlier, the better) on a solution that can be improved and expanded in time, and that can give the first results with a short turnaround. And solar is just the right thing for this.

Getting a head start

It's fascinating, really, how this has been handled across Europe. The EU is taking initiative, the Baltic countries are setting up to expand offshore wind energy production, and Portugal (already getting over 50% of its energy from renewable sources) plans to further expand both solar and wind.

What I find a bit depressing is that Italy seems to be lagging behind in these projects. The last significant boost in the growth of solar installations was 10 years ago, and even the tax breaks proposed in 2020 for a number of residential energy improvements don't seem to have pushed growth much.

With an incoming national election and worries about the spike in energy prices, the fast route to energy independence (and solar as the means to it) should be at the center of the political discourse. And while it's completely unsurprising that right-wing parties would be more open to sucking up to Russia again, it's more troubling that the rest of the spectrum doesn't seem to even think about it, focused either on long-term projects of dubious utility (nuclear power plants) or on how the “common man” may help reduce consumption, e.g. by turning off and disconnecting appliances or reducing heating.

Why isn't there a “solar panels on every rooftop” plan? Why doesn't every school, office building, factory, warehouse and mall start investing now in the installation of solar panels on their buildings, or in covering their parking spaces?

And yes, I'm well aware that even starting now the benefits won't be reaped before next year, since the November to January period is the least productive for solar energy; but, paraphrasing a saying dear to fans of nuclear power:

the best time to install solar panels was 6 months ago, the next best time is right now

EVs are still worth it

Why a transition to Electric Vehicles is worth it even with energy production backed by fossil fuels

Introduction

In the wake of the EU Parliament's controversial decision to ban sales of combustion-engine vehicles by 2035, the harshest criticism essentially revolves around the purported “idiocy” of adopting electric engines in vehicles when most electricity is still produced from fossil fuels (often supported by the memetic news of diesel generators being used to charge fully-electric Tesla cars).

While there is little doubt that such a setup is, shall we say, “less green” than it would be if electric vehicles were charged with “green” energy (i.e. energy generated by renewable and/or less polluting sources such as wind, solar, water, or even nuclear), objecting to a wider adoption of EVs on that basis1 not only completely misses the point (a “green” transition can be gradual, it doesn't have to be all-or-none), but is particularly stupid in the sense of the perfect being the enemy of the good.

The fact that a diesel-charged fully-electric vehicle is still “greener” than a combustion-engine equivalent has been remarked upon by many when discussing Tesla's “loss of face”, but I don't have any particular appreciation for the articles I've found in a quick search on the topic, so I've decided to present my own take on the subject. Note that this take mostly looks at the “finished products”, so for a wider discussion on the “greenery” of vehicles (electric or not), including manufacturing and whatnot, you'll have to look elsewhere. What you'll find here is a few key advantages of the fossil-fuel-backed EV transition that I deem often overlooked.

Pollution delocalization

This particular advantage is in fact the first I thought of, the one that triggered my desire to write this post, and even if it was the only one (it is not), it would be —for me— sufficient.

Replacing all internal-combustion-engine vehicles (ICVs henceforth) with full-electric vehicles (EVs henceforth) displaces the pollution source. ICV usage is concentrated in the same areas where people live, with each vehicle being a moving point-source of pollution right under our nose (and eyes, and skin, etc). Switching to EVs would concentrate and move all those individual point-sources into a few pollution sources located elsewhere, most typically farther from densely inhabited places.

Pollution from ICVs has well-known effects on health, with problems ranging from birth defects to premature death. While delocalizing the pollution source doesn't eliminate the effects altogether, it does improve things, and the farther the power plants are, the better.

In addition to displacing (if not reducing, see below) good ol' air pollution, replacing ICVs with EVs also massively decreases noise pollution. In fact, EVs are so much quieter that there are worries about the safety implications of the lack of noise.

Efficiency

This may come as a bit of a surprise, but EVs can actually be more efficient than ICVs at turning the same amount of fuel into motion. This depends on multiple factors, ranging from the vehicle use-case to the power plant generating the electricity.

Current vehicle combustion engines have peak efficiencies ranging from 35% (gasoline) to 45% (diesel). Of course, in practical usage this only happens under ideal conditions (full load during acceleration with the engine at peak-efficiency RPMs, usually between 2K and 3K), and one of the big ironies of ICVs is that maximum fuel economy is instead achieved at cruising speed in the highest possible gear with the lowest possible RPMs (typically around 1K), which is actually not very efficient in terms of fuel energy extraction. Combined with the more or less frequent (and immensely wasteful) stop-and-go (most typical of urban usage, yet more common than most people would expect in extra-urban and highway usage), the effective efficiency of most ICVs is between 12% and 30%, with worst cases dropping as low as 6%, and best cases at around 37%.

{ Verify if drivetrain losses are accounted for in these figures. }

{ Add considerations on the cost of refining crude oil into vehicle fuel. }

Electric vehicles are more efficient in using the energy from their batteries (at least 60% considering all losses) and much less affected by idling or other low-efficiency usages. Of course, this has to be compounded with the efficiency of power plant energy production and grid distribution losses. The latter amount to between 10% and 30% (depending on distance, quality of the grid and a number of other factors), leading to an overall efficiency between 40% and 55% from power generation to motion.

The crux is, unsurprisingly, at the power generation step. Even though energy efficiencies of nearly 60% are possible with modern tech, most power plants do not reach such levels (typical efficiency is 35% for coal, 38% for oil, 45% for natural gas, with the most efficient ones reaching resp. 42%, 45%, 52%), resulting in an effective efficiency for EVs (from fuel to wheel) between 14% and 29%.

So even without additional considerations, it emerges that switching from ICVs to EVs would typically result in comparable efficiency in fuel usage. However, the argument doesn't end here: there's more to consider both on the EV side and on the power generation side.

One of the significant advantages of EVs over ICVs is regenerative braking, i.e. the capability to recover, when braking, some of the energy spent to put (and keep) the vehicle in motion. Although similar systems (particularly KERS, Kinetic Energy Recovery Systems) have been explored for ICVs (particularly in racing cars), they have not seen any meaningful adoption in civilian transport, in contrast to the widespread use of regenerative braking in electric and hybrid vehicles. Taking brake energy recovery into account, the efficiency of EVs rises to the 75%-90% range, for an effective efficiency (fuel to wheel, including power generation and grid losses) between 18% and 42%.
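To make the arithmetic explicit, here is a minimal back-of-the-envelope sketch (in Python) chaining the ranges quoted above; the figures are the rough ones from the preceding paragraphs, not measured data, and the results match the quoted ranges up to rounding.

    # Rough fuel-to-wheel efficiency for EVs, chaining the ranges quoted above.
    plant = (0.35, 0.52)  # fossil plant efficiency: worst coal to best gas
    grid = (0.70, 0.90)   # grid factor: 30% down to 10% distribution losses

    # "power generation to motion" (grid + battery + motor): 40% to 55%
    gen_to_motion = (0.40, 0.55)
    base = (plant[0] * gen_to_motion[0], plant[1] * gen_to_motion[1])
    print(f"EV, no regen:  {base[0]:.0%} to {base[1]:.0%}")  # 14% to 29%

    # with regenerative braking, battery-to-wheel efficiency is 75% to 90%
    regen = (0.75, 0.90)
    rb = (plant[0] * grid[0] * regen[0], plant[1] * grid[1] * regen[1])
    print(f"EV with regen: {rb[0]:.0%} to {rb[1]:.0%}")      # 18% to 42%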

The other aspect to consider is that while most of the inefficiency of ICVs goes into wasted heat, power plants can co-generate electric power and heat, with overall efficiencies as high as 88%.

Although this doesn't improve the fuel-to-wheel efficiency of EVs per se, it does improve the overall fuel consumption efficiency, thus reducing waste.

Efficiency improvements

Technology improves (in fact, it's interesting to note that technological progress seems to have had a higher impact on the efficiency of ICVs than on the efficiency of EVs, although this is largely because EV efficiency is already much higher, and considerably less R&D has gone into improving EVs until recently).

One interesting question to pose is: how long does it take for a technological improvement to have an actual measurable effect (e.g. lower pollution or higher efficiency)?

In a largely saturated market like that of civilian vehicles, even if all new cars were to adopt the better technology, the replacement of the existing cars with the new ones would take decades if not for government incentives to switch to lower-emission vehicles.

With the highest sources of inefficiencies for EVs being located outside of the vehicles themselves (distribution grid, and most importantly power plants), many technological improvements would lead to indirect benefits to the effective efficiency of EVs without any intervention on the user side.

This doesn't hold true for all improvements (e.g. a better battery technology leading to higher density, and thus lighter batteries for the same capacity, would still require physical intervention on the vehicles, although that's still less problematic than buying a whole new car, as most technological progress on ICVs requires), but e.g. a 5% reduction in power grid losses or a 5% improvement in power plant efficiency would automatically lead to corresponding gains in the overall effective (fuel-to-wheel) efficiency for all EVs recharging on said power grid.

Smoother transition

The previous point naturally segues into the final and (from some perspectives) most important point: an early transition from ICVs to EVs will make for a smoother transition to other power sources.

Road transportation accounts for nearly 50% of oil consumption in the EU and constitutes the main source of a number of pollutants responsible for the low air quality (and related health issues) in urban areas. Even if there were a full switch to “green energy” generation today, the around 250M vehicles currently circulating in Europe would remain responsible for this massive consumption of oil and the associated pollution and health issues.

Even though, as discussed above, an accelerated transition from ICVs to EVs would neither eliminate our dependency on oil nor reduce the associated pollution (although it would reduce the health problems caused by the emitting sources being concentrated in highly populated areas), it would make the transition away from fossil fuels more effective: any subsequent increase in the percentage of energy produced from “green” sources would automatically (albeit indirectly) make road transportation more “green”.

Addendum: could it happen or not?

With the sales of electric vehicles doubling worldwide every 2 years or less (average growth rate 50% or more), one might even wonder if the EU initiative is needed at all: if the trend continues, and the percentage of EV car sales (over total car sales) were to keep doubling every 2 or 3 years, starting from the 8% of car sales that were electric vehicles in 2021, we would approach 100% of car sales worldwide being EVs in 10 to 15 years (thus roughly within the target of the EU Parliament proposal). The growing prices of car fuel (largely a consequence of the 2022 Russian invasion of Ukraine) are also likely to support such a trend.

(It should be noted however that these statistics include so-called plug-in hybrid vehicles, which have both an internal combustion engine and a battery-backed electric motor. These are the ones that have seen the fastest growth in recent years, yet they too would be affected by the ban proposed by the EU Parliament.)

What is missing, on the other hand, is the infrastructure to support such a transition: charging stations are still few and far between, and mostly concentrated in the higher-density, higher-traffic regions. Massive infrastructural upgrades are needed to support the target of the EV transition, and not just in terms of power distribution: large increases in the number of circulating EVs will also require an adequate growth in power generation. And between the looming energy crisis, the impact of climate change on “green” energy production, and the long lead times, increasing costs and general resistance to nuclear, that is something that might not ramp up fast enough in the envisioned time frame.

It makes one wonder if investment in infrastructure (power generation, a better grid, more charging stations), a diversification of power sources, and support for local power production (“solar panels everywhere!”) to bring down electricity costs would be a more effective (albeit indirect) strategy to incentivize the adoption of EVs. One thing is for sure: these things have to happen anyway for the “only EVs after 2035” plan to be sustainable.

Post Scriptum: a bet that I'm sure to win

Assuming the proposal (or some equivalent initiative to accelerate the adoption of EVs) passes, you can bet that 50 years from now, when we will be enjoying the benefits of the widespread use of EVs over ICVs, libertoloids (libertarians, ancap et similia) that are now so vocally against the EU plan will boast about how the free market led to the resulting quality-of-life improvements, conveniently forgetting the massive impact that regulations and incentives have had in directing said market.

How can I be so sure? Because that's exactly what they are doing about the improved energy consumption and reduced pollution that were driven by large scale government initiatives, particularly from the 1970s onwards.

Claiming that “engineers did that, not government regulations” is a platitude insofar as it's true, and is otherwise false. Yes, engineers were essential to achieve the technological progress that improved energy consumption and reduced pollution, but the main incentive to move in that direction came from government regulation. We'd still be dying of smog in large numbers if private entrepreneurial profit had remained the driving motive for technological progress.

Until and unless markets finally manage to incorporate the true cost of large-scale externalities such as environmental damage, they will never be able to lead to such improvements in quality of life. It's not by chance that the industrial revolution actually led to a decrease in life expectancy in highly industrialized cities, with respiratory issues becoming the dominant cause of death (PDF warning).

(And yes, a similar discussion holds for epidemics, but that would be way off topic.)


  1. of course there are other objections, such as “this shouldn't be forced by law, but a decision made by the market”, which I'm not discussing here, and not just because “the market” not accounting for externalities —which are key in this discussion— makes such an objection irrelevant. ↩

Testing Mastodon

On Twitter, Mastodon, self-hosting and the migration between social networks

The self-hosted pipe dream

My ultimate aim, for my online presence, would be to be completely self-reliant. The aim is, objectively, a bit of a pipe dream, since I'm well aware of the gigantic efforts that actually reaching a point of total independence would entail (for starters, even for something as essential as email I'm pretty sure I will never make it), but I like to get my gains wherever I can.

The most obvious example (you're reading it now) is the choice to abandon external blogging platforms in favour of this self-hosted wok. For other protocols (such as IRC), “self-hosting” doesn't really make sense, but controlling your client allows you to control your experience and, most importantly, your backups (I have the logs of all my IRC conversations, without having to ask anybody else for them). For some services this is not possible, although sometimes there are “bridges” that allow some degree of client control: for example, I use Bitlbee to connect to some of my instant-messaging accounts through an IRC-like interface, although the reliability of these “bridges” is severely limited by the explicit intent of their operators (e.g. Google) to limit interoperability.

In some cases, I have simply scaled down my presence, sometimes helped in that by the demise or downfall of the corresponding platform: this is for example the case for the now-long-gone FriendFeed (my idea of what could have been “social networking done right”, even if still on a proprietary, centralized platform), or Tumblr, which I only sporadically visit (most of the content being luckily accessible from other platforms or “followable” through other means).

Up until recently, the only remaining “significant” online point-of-presence for me has been the microblogging platform Twitter. I must say that even my on-platform presence there has been sporadic for a long time, though it had recently gained some weight, and each of my posts there has been made with a “second thought” about the loss of control over my content. Much of it may be recoverable through the Twitter data export feature, but it's still a non-trivial process, and the fundamental implication about the loss of control remains even when workarounds are found.

(By the way, even for Twitter it's possible to set up a Bitlbee bridge, although given the extensive use of graphical elements the experience is far from being as smooth as with other services.)

The social network escape

I do not have the time or inclination to discuss the dangers of the centralization of social networks (especially when others have written more, and better than me, on the topic; I may even link some of the relevant content from e.g. the EFF or Cory Doctorow here, after I find the time to collect it), but the work to create open and distributed alternatives has been going on for over a decade now, driven largely by the interest of individuals and groups worried about the implications of proprietary control of online spaces.

The most famous example at the time was probably Diaspora*, born as an alternative to Facebook in 2010, which even reached a certain prominence in the news during a bout of «delete Facebook», but it was hardly the first or the most successful (for example, the microblogging platform identi.ca, an alternative to Twitter, had already been active for a couple of years).

The coordination of efforts from separate groups, each dedicated to a specific aspect of the “modern”, “social” web (blogging and microblogging, aggregation, music and video streaming and sharing, discussion fora, etc) has led over the years to the creation of what is now known as the Fediverse, a “universe” built on the “federation” of individual entities. The development of common protocols (most notably the now recommended ActivityPub) and the growing maturity of the software have finally reached the point of (at least technical) feasibility for an alternative to centralized social networks, although the question remains whether they can become a viable, and widely adopted, alternative to the centralized platforms.

Enter Mastodon

The recent bid by Elon Musk to purchase Twitter and take it private (“to restore freedom of speech”, but I will discuss the idiocy of that claim, and of those who actually believe it, at a different time and place) has brought the alternatives back into the news, in particular the currently most popular one, Mastodon.

As with all Fediverse components, Mastodon is not a hosted platform in se (in the sense of a centralized website to which users register), but a software stack that provides a platform. Each installation of the software is an instance, and there are therefore multiple websites to which one can register to have a Mastodon account, similarly to how people can get an email address from different providers (their ISP, Google, HotMail, etc). And just like with email, Mastodon users can communicate with each other, follow each other's updates, etc through the common protocol, regardless of the instance they are registered with.

Not being a centralized service puts a barrier to entry on Mastodon, especially for people now used to decades of centralization: making a Mastodon account requires an active and conscious choice about where (i.e. on which instance) to create it, with the associated burden of understanding the difference between instances, the fact that they each have their own terms and conditions (and possibly additional restrictions on who may or may not register with them), and so on and so forth. There are “general” instances, both global and language-specific, that may be considered the go-to fallbacks, and are probably a safer bet (in terms of reliability and permanence) compared to smaller instances: for the technically uninclined, these provide the closest thing to an “optimal” situation outside of fully centralized solutions, similar to the larger email providers nowadays used by most people (the classic HotMail, Google's gmail, etc), with the benefits (and downsides) of the decentralized, federated model.

I personally don't think that the decentralized model poses a particular obstacle to adoption (despite the slightly higher barrier to entry), and I will discuss elsewhere the details of what may make (or break) the future of the Fediverse, aside obviously from the FUD propaganda fueled by the centralized services that are threatened by this model (hint: it involves the participation in the Fediverse of some high-profile accounts, possibly on their own instances in an official form: think for example of institutional accounts from the US or EU being on their respective mastodon.gov or mastodon.europa.eu instances, in contrast to e.g. the unofficial mirrors from the proprietary platform that can be found through the respublica.eu instance).

(Edit: I just found out that the EU actually has an official Mastodon instance. That's actually pretty good news.)

In fact, from my “self-host-all-the-things” perspective, a much larger problem with Mastodon is that it's non-trivial to set up a personal instance: while it is possible, Mastodon is a bit infamous for being a massive resource hog that requires a complex setup even in the “reduced” use case of a self-hosted personal instance, to the point that it's frequent to find recommendations to try alternative microblogging platforms that still integrate with the Fediverse, most typically Pleroma.

As a result, I won't be able to consider myself completely self-reliant on the “social network” side of things yet, even while moving away from the centralized platforms.

Testing Mastodon

So yeah, I've taken the opportunity to set up a Mastodon account on a general instance. I'm not particularly worried about the future of Twitter with Musk at the helm (in fact, I doubt anything would change, and it even looks like the deal, which was assumed done, might actually fall through), but as in other circumstances, I've grabbed the chance to stop putting off exploring alternatives to Twitter.

I don't consider my current Mastodon account to be “definitive”, even though it will likely last for several years, as I don't see myself switching to a self-hosted instance anytime soon, although that would be the ultimate goal. In the meantime, I've made an effort to set things up as much as possible in a way that allows me to interact with the platform “on my own terms”: this includes setting up a Bitlbee bridge to be able to follow and interact with my timeline from an IRC-like interface (I must say I'm not particularly impressed by its stability yet), and the adoption of a practical command-line utility that I've used extensively to search and follow various accounts from a variety of instances. Periodic, automatic backups of my stream are something that I intend to explore soon.

The process of migration from the proprietary platform to Mastodon will take some time, with a transition period of permanence on both, and will probably result in a long tail (most likely reduced to the few really interesting “big name” accounts that won't switch or clone their presence across networks). Luckily, a lot of community effort has already gone into helping in this regard: I've discovered a few services that make it easier to bridge Mastodon and Twitter, including a “wrapper” for Twitter accounts that presents them as if they were members of a Mastodon instance, and a service that should make it easier to manage accounts across different platforms (I haven't tested it yet, but it should come in useful while transitioning from the proprietary to the open platform). There are also ways to improve one's visibility across instances, such as this Italian bot designed specifically for this purpose.

I have not even made my first Mastodon “top level” toot (post) yet (sharing this article will probably be the first one), but I've already had the opportunity to interact with some users (unsurprisingly, mostly about the nature of Mastodon itself, as the influx of new people exposes doubts and perplexities about its accessibility, its long-term viability, and the potential of the social platform on its own merits, rather than just as a host for people running away from the proprietary platform for whatever reason, or exploring the fad of the moment). The experience has been rather smooth, although I've already noticed some minor issues (most notably, the fact that you cannot set the toot language when posting from the web interface; the less aggressive/expansive behavior for embedded links, esp. other toots —which may or may not be an issue, depending on the use case; and the fact that a fixed column size is used in the “advanced” web interface).

It will be interesting to see how the currently much lower traffic on Mastodon develops in the following days and months, and whether the migration pressure keeps up or dwindles (through a general drop in interest, or because the Musk acquisition falls through, if it does). It will also be interesting to see how well the platform holds up as the influx of new users puts strain on the software and on the hardware of the instances holding the network together (the general instances maintained by the Mastodon developers themselves have already shown signs of “cracking under pressure”).

And who knows, this all might lead to better, more lightweight software and possibly more interest in making self-hosting more approachable.

Nuclear will not save us, part 3

Shorter-term considerations on the Exponential Expiration Time (EET) of nuclear power

Introduction

If you believe that nuclear power is the solution to the energy (and possibly environmental) issues of modern developed nations, the question you should ask yourself is: for how long still?

A few weeks ago I started a series discussing the “expiration time” of nuclear power under the assumption of a constant growth in energy consumption (with a rate between 2% and 3% per year). The results were not very encouraging for the long term: even at the lower growth rate of 2%, the energy requirements would grow so much that even the entire mass of the Milky Way, converted entirely into energy according to the famous $E=mc^2$ equation (100% efficient nuclear energy extraction, giving us around 90 PJ/kg), would suffice us for less than 5 millennia: not even the time that would be needed to move from one end of said galaxy to the other.

Such is the power of the exponential function.

While the post was not intended as a prediction, but mostly just as a “cautionary tale” about the need to reduce the speed at which energy consumption is growing, it has been criticized for the timescales it considers —timescales ranging from several centuries to millennia, timescales in which “anything may happen”.

I have tried to address the main objections in the follow-up post, discussing primarily two points: one being the choice of 90PJ/kg as upper bound of energy production, and the other being the assumption of energy consumption growing at the current rate for the foreseeable future, and most likely even beyond.

Despite the validity of these longer-term considerations (again: not predictions, just considerations), I don't doubt that many (if not most) people would find it useless to reason over such time spans, refusing to take them into account for shorter-term decision-making.

In this third installment of the series, we're thus going to focus on a much shorter term (say, within the century or so), within which it's much harder to deny a continuing growth in energy consumption at the current rate (which we will optimistically round down to 2% per year), and it's plausible that nuclear energy extraction will continue within the current order of magnitude of efficiency, or only slightly more (say, no more than $10^{-1}$ PJ/kg, from the current approximately $1.2*10^{-3}$ PJ/kg).

Some preliminary numbers

If you've gone over the first two installments of the series, you may have noticed that the summary table in part 2 has an exceptionally low number in the upper-left corner: where all other scenarios offer an EET of two centuries or more, the lowest scenario gives us only 14 years. Surely that's too low? How is that possible?

The EET of 14 years is indeed too low. It corresponds to the EET under the following assumptions:

  1. constant growth rate of 2% per year;
  2. current tech level, extracting around $1.2*10^{-3}$ PJ per kg of uranium;
  3. $8*10^9$ kg of available uranium (the amount estimated to be in currently known conventional reserves);
  4. the worldwide total primary energy consumption ($6*10^5$ PJ/year) is entirely satisfied by nuclear.
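Plugging these assumptions into Bartlett's EET formula, $(1/k) \ln(kR/r_0 + 1)$ (the same one behind the table in part 2), reproduces the 14-year figure; a minimal sketch in Python:

    from math import log

    def eet(k, R, r0):
        """Exponential Expiration Time (years) of a total reserve R (PJ),
        with consumption starting at r0 (PJ/year) and growing at rate k per year."""
        return (1 / k) * log(k * R / r0 + 1)

    k = 0.02          # assumption 1: 2% yearly growth
    R = 1.2e-3 * 8e9  # assumptions 2-3: 1.2*10^-3 PJ/kg times 8*10^9 kg of uranium
    r0 = 6e5          # assumption 4: all 6*10^5 PJ/year covered by nuclear

    print(eet(k, R, r0))  # ~13.9 years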

The first two assumptions are entirely reasonable within a timespan of less than two decades; the third is possibly even too generous (it assumes that within these two decades we'd be able to even just extract all of the uranium from the estimated known conventional reserves).

The last assumption, on the other hand, is completely unrealistic: nuclear power generation today barely covers a fraction of the worldwide total primary energy consumption. Existing civilian power plants produce less than 2600 TWh (or less than $10^4$ PJ) of electricity per year (and the amount is going to decrease, if the current initiatives to transition away from nuclear are any indication of the near future).

(That being said, the fact that uranium wouldn't actually last long at full usage shouldn't even be that big of a piece of news for anyone following the field: even back in 2008 there was awareness of how long uranium would last if production were to ramp up, given the discovered deposits. In fact, we actually get more leeway in our estimates because we're using the much larger amount of estimated known conventional reserves.)

But let's try to get a bit more realistic.

Ramping up nuclear

The first exercise is to see what would happen if we ramped up nuclear power (instead of transitioning away), to try and cover a larger slice of the total primary energy consumption, at the current tech level ($1.2*10^{-3}$ PJ/kg).

For simplicity, let's round things a little. Assume we currently produce $10^4$ PJ/year from nuclear (this is rounded up, but in our calculations the final difference is at best a couple of years over a whole century), and that the readily available uranium from known conventional reserves is $10^{10}$ kg (this is a bit on the generous side, but it's one way to account for the discovery of some more uranium deposits).

We have two questions: how long will it take to cover the current global primary energy consumption ($6*10^5$ PJ/year), and how quickly will we run out of uranium? In particular, we'd like to at least get to satisfy the current primary energy requirements before running out of uranium.

The answers to these questions obviously depend on how fast we can ramp up energy production from nuclear power: the faster we ramp up production, the quicker we match primary energy needs, but at the same time, the faster we ramp up production, the quicker we run out of uranium.

(You can follow the exercise by plugging the relevant numbers into the form found after the table in part 2 of this series; just read ‘production’ instead of ‘consumption’ in the first and last fields.)

It's interesting to see that with anything less than a 4% growth rate for nuclear power generation, we won't even get to produce one whole year's worth of the current primary energy requirement before running out of uranium: at 4%, we would run out of fuel after slightly less than a century, while producing barely more than $5*10^5$ PJ/year.

Anything less than a 4% growth rate (18 years doubling time) would allow uranium to last for over a century, but without covering the current worldwide primary energy consumption. Ramping up at a 5% rate (more specifically, around 4.82%, 15 years doubling time) would allow us to match the current worldwide primary energy consumption just as we run out of easily accessible uranium, 85 years down the line.

To get some meaningful (multi-year) amount of coverage, we would have to ramp up production even faster, but this would shorten the time of availability of the fuel: for example, at a 7% growth rate (doubling time: 10 years, still realistic considering the time it takes to actually build or expand nuclear power stations) the known uranium reserves would have an EET of only 64 years.
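The same formula, read as an expiration time for production instead of consumption, reproduces these ramp-up numbers; a sketch with the growth rates discussed above (the outputs match the figures in the text up to rounding of the inputs):

    from math import exp, log

    R = 1.2e-3 * 1e10  # PJ extractable from 10^10 kg of uranium at the current tech level
    p0 = 1e4           # PJ/year: (rounded) current nuclear production

    for g in (0.04, 0.0482, 0.07):
        t = (1 / g) * log(g * R / p0 + 1)  # years until the fuel runs out
        p = p0 * exp(g * t)                # production rate at that point
        print(f"{g:.2%} ramp-up: fuel gone in {t:.0f} years, at {p:.2e} PJ/year")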

Actually, if the ramp-up were capped at the current total primary energy consumption, uranium would last a little bit longer: the production rate at the EET would be $8.8*10^5$ PJ/year, which is higher than the current consumption. This would buy us a few years if we stopped ramping up as soon as we reached parity, pushing the EET to around 70 years (not much, but still something).

Playing catchup

On the other hand, assuming that the global primary energy consumption remains constant in the next century is quite a stretch: we can expect it to keep growing at the current rate of at least 2% per year for the foreseeable future.

Given the ramping-up timeline, this would give us at least another doubling, potentially even two: this means that even getting to $6*10^5$ PJ/year would cover at best only half of the future primary energy needs. We should strive for more. And yet, even a 7% ramp-up rate wouldn't manage to cover a single doubling (a $1.2*10^6$ PJ/year target) before running out of uranium.

We would need at least a 10% ramp-up rate (doubling time: 7 years, which is about the quickest we can do to bring new reactors online), since that would push production to $1.22*10^6$ PJ/year —just as uranium runs out, 48 years from now.

We could do “better”, of course: knowing in advance the number of reactors needed to match the future energy requirements, we could build all of them at the same time. But that would only get us much closer to the dreaded 14-year EET for conventional uranium reserves (a quick estimate gives us around 30 years at best).

Ultimately, the conclusion remains the same: at the current technological level, and with the current estimates on the quantity of uranium available in conventional resources, we wouldn't be able to cover more than a few decades of global energy requirements at best, even with conservative estimates on how quickly the latter will grow.

Breeder reactors and the myth of the “renewable” nuclear power

Given that the short expiration time of uranium at the current tech level, even just to satisfy the current global energy requirements (let alone their increase over the next decades), has been known for decades, one may wonder where the myth of nuclear power as “renewable” comes from.

We can find the answer in a 1983 paper by Bernard L. Cohen, published in the American Journal of Physics, vol. 51, and titled “Breeder reactors: A renewable energy source”. The abstract reads:

Based on a cost analysis of uranium extracted from seawater, it is concluded that the world’s energy requirements for the next 5 billion years can be met by breeder reactors with no price increase due to fuel costs.

Hence, nuclear power is considered “renewable” in the sense of being able to last as much as other energy sources traditionally considered renewables (such as wind and solar), whose expiration time is essentially given by the time needed for the Sun to run out. (I think that's an acceptable interpretation of the term, so I'm not going to contest that.)

Cohen's work starts from the (well-known, even at the time!) short expiration time for traditional nuclear reactors, and shows how moving to breeder reactors would allow unconventional sources of uranium (particularly, as mentioned in the abstract, uranium extracted from seawater) to become cheap enough (in the economic sense) to be feasible without a significant increase in the price of the generated electricity.

The combination of 100× more effective energy production and a much larger amount of fuel led him to calculate the 5-billion-year expiration time —assuming a constant rate of production, equal to the total primary energy consumption in 1983.

It should be clear now why Cohen's numbers don't match up with our initial analysis: uranium would only last long enough to be considered “renewable” at constant production rates, not at ever-increasing rates. In fact, if you want to know the exponential expiration time for seawater uranium in breeder reactors, you just have to look at the second row, second column of the famous table: if energy consumption keeps growing as it is, all the uranium in the seawater fed to breeder reactors wouldn't last us 500 years.

Of course we don't know how accurate a forecast my “doubling every 30 years” assumption is for future energy consumption (although it's much less far-fetched than some may think), but at the very least we know that Cohen's assumption of constancy was wrong, since consumption has already doubled once since then, and it shows no sign of stopping anytime soon.

In fact, as I mentioned in the first post, the biggest risk for nuclear comes specifically from the perception of its “renewability”. In some sense, we can expect this to be the opposite of a self-fulfilling prophecy: the appearance of nearly infinite, cheap energy, combined with our inability to understand the exponential function, will more likely encourage an increase in energy consumption, as waste loses significance in the face of the perceived enormity of the available energy, ultimately leading to such a steep growth in energy consumption that the source would be consumed in an extremely short time.

By contrast, higher friction against the adoption of nuclear, combined with the much lower energy cap of all other sources, is likely to drive more efforts into efficiency and energy consumption minimization, thus slowing down the growth of energy consumption, and potentially allowing future nuclear power use to last much longer (even though, most likely, still considerably less than the billions of years prospected by Cohen).

What does it really mean for an energy source to be renewable?

The truth is that, in face of ever-expanding energy requirements, no energy source can be considered truly renewable: the only difference is whether the production of energy from it can keep up with the requirements, or not.

Traditional renewables (wind, solar, hydro, wave, geothermal) can last “forever” (or at least until the Sun dies out) simply because we cannot extract them faster than they regenerate: as such, they won't “die out” (until the Sun does), but at the same time we'll reach a point (and I posit that most likely we're already there) where even if we were able to extract every millijoule as it gets generated, it still wouldn't be enough to match the requirements.

With non-renewables, the energy is all there from the beginning, just waiting for us to extract it. This means that (provided sufficient technological progress) we can extract it at a nearly arbitrary rate, thus keeping up with the growing requirements, but at the cost of exhausting the resource at a disastrous pace.

The importance of reducing energy consumption growth (and thus avoiding the energy Malthusian trap) is thus dual: maximizing the usefulness of traditional renewable sources on one hand, and maximizing the duration of non-renewable sources on the other. And yet, it would take extremely low growth factors for non-renewable sources to get anywhere close to billions of years of EET.

As an example, consider the case of Cohen's setup (breeder reactors, seawater uranium) in a slightly different scenario. Assume for example that energy consumption continues to grow at the current pace for slightly more than a century (due to ongoing population growth and developing countries lifting their standards of living), leading to three more doublings, arriving just short of $5*10^6$ PJ/year. Assume also that only at this point does humanity switch to breeder reactors fueled by seawater uranium, covering with them the total primary energy requirements, and that from this moment onwards energy consumption keeps growing at a lower pace. Depending on how low the new pace is, the EET for seawater uranium in breeder reactors grows proportionally larger:

Growth (per year) | EET (years)      | Doublings within the EET (approx.)
------------------|------------------|------------------------------------
2% (no change)    | 388 (no change1) | 11
1%                |   707            | 10
0.5%              |  1275            |  9
0.25%             |  2273            |  8
0.125%            |  3994            |  7
0.0625%           |  6890            |  6
0.03125%          | 11604            |  5
0.015625%         | 18939            |  4
0.0078125%        | 29652            |  3
0.00390625%       | 43967            |  2.5
0.001953125%      | 60899            |  1.7
0.0009765625%     | 78030            |  1
0.00048828125%    | 92549            | < 1

It should be clear that even at very small energy consumption growth factors (the smallest presented factor corresponds to a doubling over more than 140K years) it's simply impossible to have non-renewable resources last billions of years, although some may consider anything over 10K years to be “acceptable”, or at least “not our problem anymore”.
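The table itself can be regenerated with the usual EET formula; a sketch, assuming as inputs $R = 0.117$ PJ/kg (breeder reactors at $0.13\%$ of the 90 PJ/kg maximum) times the $5*10^{12}$ kg of seawater uranium, and $r_0 = 5*10^6$ PJ/year (the rows match the table above up to rounding of these reconstructed inputs):

    from math import log

    R = 0.117 * 5e12  # PJ from seawater uranium in breeder reactors
    r0 = 5e6          # PJ/year: consumption after three more doublings

    k = 0.02
    for _ in range(13):  # from 2% down to ~0.000488% per year
        eet = (1 / k) * log(k * R / r0 + 1)
        doublings = k * eet / log(2)
        print(f"{100 * k:.10g}%  EET {eet:6.0f} years  ~{doublings:.1f} doublings")
        k /= 2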

(Side note: even with a 100% conversion of mass to energy, i.e. 90 PJ/kg, the lowest growth rate considered won't give us billions of years: all the seawater uranium would last barely more than a million years, all of the uranium and thorium estimated to be in the crust would last less than 4 million years, and our entire galaxy 15 million years; to get to a billion years for the Milky Way, growth would have to be lower than $10^{-5}$% per year, at 90 PJ/kg.)

Does it make sense to make decisions based on something so far into the future?

While it's true that we can't make predictions that far into the future (especially not over the millennia, or hundreds thereof, that the very low growth cases might provide), it's also true that at the very least we should avoid closing off the paths to that future altogether.

Put another way: we may not be able to look that far, but we are able to determine whether we'll get there at all, possibly without passing through a societal collapse.

A quote frequently attributed to Albert Einstein goes something to the tune of:

I do not know with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.

Regardless of how accurate the quote (and its attribution) is, the sentiment is clear: the enormous destructive power (offered by the atom bomb or whatever even worse weapon comes after it) would be enough to throw civilization back to the Stone Age level.

A similar argument can be made here for energy consumption: we don't know when we'll outgrow even the most effective form of energy production, but we do know that when that happens it will inevitably lead to civilization collapse —and it will be sudden, despite being perfectly predictable (or I wouldn't be writing this now).

With the famous bacteria-in-a-bottle example, Bartlett highlights, among other things, how even just a few doublings before expiration most people wouldn't realize how close the expiration is, due to the vastness of the available resources and the lack of awareness of how quickly such vastness is consumed by the exponential function, and how even the successful efforts of farsighted individuals to expand availability would only buy marginal amounts of time before the collapse.

In this perspective, it's never too early to act towards reducing the exponential growth; in fact, it's almost always too late. And even if it weren't too late already, even with the best intentions, there is very little, if anything at all, achievable at the individual level that would actually put a dent in the exponential. Frustrating as it may be, even a collective “doing our best” to avoid being wasteful barely scratches the energy consumption (10 fewer watts around the clock for every person in the world is still less than 0.5% of the total energy consumption).
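To put numbers on that parenthetical (a quick sketch, assuming 8 billion people and the $6*10^5$ PJ/year total used throughout the series):

    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    people = 8e9      # assumed world population
    watts_saved = 10  # 10 W less, per person, around the clock
    pj_per_year = watts_saved * people * SECONDS_PER_YEAR * 1e-15  # J/year -> PJ/year
    print(pj_per_year, pj_per_year / 6e5)  # ~2.5*10^3 PJ/year, ~0.4% of the total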

The fundamental issue is much more profound, and much more systemic. And the first step in the right direction is to raise awareness about the true nature of the issue: there's a much more urgent problem to address than how to produce energy, and it's how to reduce consumption —and not by the crumbs the end user is responsible for, but along the entire chain of production, from raw material extraction down to the point of sale.

As I mentioned, this might be the only upside of the transition away from nuclear, and of similar “Green New Deal” fairy-tale initiatives: promoting consumption reduction by energy starvation —although one would wish there were better ways. And worse, it really won't be enough anyway, as long as it's set within the same system for which growth is such an essential component.

We need a completely new direction.

A final(?) remark

I'm hardly the first to make such considerations, and I will surely not be the last. Aside from Bartlett, whose famous talk on “Arithmetic, Population and Energy” consciously or not inspired my initial curiosity into looking at the exponential expiration time of nuclear power, others have now and again discussed the finite limits we're set to meet sooner rather than later on our path of growth, including limits I haven't discussed here, such as the waste heat disposal issue.

And yet, awareness of the issue and of its importance is slow in the uptake. It could easily propagate exponentially, and yet (for once the exponential could work in our favour!) it seems to encounter such large resistance that it barely trickles out with linear growth, and a slow one at that.

Where does this resistance come from? With all the campaigning on climate and “going green” and sustainability, one would expect this crucial side of the issue to be heard more. The numbers and the math behind it aren't even that hard to grasp. So why?

A possible explanation could be that the timeline is still too long to catch people's attention: if we can't get people truly involved with the climatological catastrophes we are bound to experience in the next decades, why would they worry about energy suddenly running out three centuries from now?

But I think there's something more to it. And yes, a sizable part of it is the pathetic realization that climate, sustainability and “going green” can be used as varnish over commercial exploitation (greenwashing, as they call it); full-chain consumption curbing, on the other hand, cannot, as it's the antithesis of what commercial exploitation thrives on.

But beyond that, there's most likely the realization that we're already at the point where any serious effort at sustainability with current standards of living would be in vain without a drastic reduction not so much of the consumption, but rather of the consumers.


  1. note that this is in addition to the time necessary to get from $6*10^5$ to $5*10^6$ PJ/year; the difference between starting the consumption at the $6*10^5$ PJ/year level versus starting at the $5*10^6$ PJ/year level is marginal. ↩

Nuclear will not save us, part 2

Are we sure energy consumption will keep growing at the current rate? A follow-up on why we need to curb energy consumption growth or it will be curbed for us.

Introduction

A couple of weeks ago I wrote an article illustrating why even a transition to nuclear energy will not be able to keep up with our energy consumption, if such consumption keeps growing at the rate it has been growing at since at least the industrial revolution.

I've recently had the opportunity to debate the contents of the article, so I feel that it might be appropriate to clarify some aspects, and delve into some points with more details.

But first of all, a recap.

In the previous article, I made some back-of-the-envelope estimates about how long it would be possible to keep growing our energy consumption at an exponential rate (doubling approximately every 30±5 years, i.e. with a rate between 2% and 3% per year) under several (generous) assumptions about our energy production capabilities.

Exponential Expiration Time (EET) scenarios for nuclear energy assuming a constant growth in energy consumption with rate $k = 2\%$ per year, starting from the current $r_0 = 6*10^5$ PJ/yr, for different efficiency levels $E$ (in percent) of mass-to-energy conversion (theoretical maximum: $P = 90$ PJ/kg), and different amounts of available mass $M$. The EET is computed in years using Bartlett's formula $(1/k) \ln(kR/r_0 + 1)$, and the total amount of energy that can be produced is computed as $R = E*M*P$.
Mass $M$ (kg): $8*10^9$, $5*10^{12}$, $7*10^{12}$, $1.2*10^{13}$, $1.2*10^{17}$, $5*10^{17}$, $7*10^{17}$, $3*10^{22}$, $9*10^{22}$, $7*10^{23}$, $3*10^{27}$, $2*10^{30}$, $4*10^{42}$
Efficiency $E$: $0.0013\%$, $0.13\%$, $1\%$, $10\%$, $100\%$, $10\times$
[The EET values for each efficiency/mass combination are computed and filled in by JavaScript on the original page.]

Note: the table currently needs JavaScript enabled, because I'm too lazy to copy the data by hand for each cell. On the other hand, this means that with the same code you can play with the numbers yourself, by changing the numbers in the form that follows on the original page.

(The defaults are for the current tech level and growth rate, using the total mass of the current known uranium reserves and the current nuclear energy production instead of the entire primary global energy consumption, so the EET refers e.g. to a condition in which the fraction of total energy covered by nuclear remains constant, while the total energy consumption grows at the current rate.)
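Outside of the original page, the same computation can be replicated with a short standalone sketch (in Python); the defaults mirror the ones just described, and the second call reproduces the Milky Way figure from the first article:

    from math import log

    def eet(k, M, E, P=90.0, r0=1e4):
        """EET (years) for a mass M (kg) converted to energy at a fraction E
        of the theoretical maximum P (PJ/kg), with production starting at
        r0 (PJ/year) and growing at rate k per year."""
        R = E * M * P
        return (1 / k) * log(k * R / r0 + 1)

    # defaults: current tech level (0.0013% of 90 PJ/kg), known conventional
    # uranium reserves, current nuclear energy production
    print(eet(k=0.02, M=8e9, E=1.3e-5))        # ~149 years
    # entire Milky Way at 100% conversion, total primary energy consumption
    print(eet(k=0.02, M=4e42, E=1.0, r0=6e5))  # ~4300 years
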
Point #1: it's not a prediction

The point of the article was not to make a forecast about what will happen.

The only point of the article was to show the upper bounds of the exponential expiration time (EET) of energy sources if energy consumption keeps growing at the rate it's growing.

Specifically because it was an estimation of the upper bound, the longest-term predictions were done under ideal, and absolutely unrealistic, conditions, such as in particular the possibility to convert matter (any matter) to energy with 100% efficiency, which —by our current understanding of the physical world— cannot produce more than 90PJ of energy per kg of mass.

Now, this is obviously unrealistic: even with nuclear (the most efficient form of energy production we know now) we only get five orders of magnitude less than 90PJ/kg. But it's intentionally unrealistic, leaving plenty of room to scientific and technological progress to catch up.

Objection #0: you can't tell that it's an upper bound

It's obviously quite possible that in the future (even in the near future) we might be able to find a more efficient form of energy production compared to the best we can do now.

However, the possibility of such an energy source being practically exploitable to produce significantly (i.e. orders of magnitude) more than 90PJ/kg is extremely slim. What it would require is:

  1. a scientific discovery that invalidates the famous $E=mc^2$ formula, showing a way to produce orders of magnitude more than 90 PJ of energy per kg of mass or equivalent;
  2. technological progress to make such a scientific discovery exploitable to produce energy with sufficient efficiency that the amount of energy produced is still orders of magnitude more than 90 PJ/kg (or equivalent);
  3. that such scientific discovery and technological progress happen before we hit the EET of our current energy production methods.

Now, I'm not saying that this is impossible, but the chances of it happening are so low that I can quite safely claim that the EET estimates computed with 90 PJ/kg are, indeed, upper bounds to the EET, assuming energy consumption keeps growing at the current rate.

That being said …

So, again, the point of the article was not to try and predict the future, but only to see for how long we can keep growing at the rate we're growing.

In fact, if a point had to be taken from the article, I would say that the main point should be the final suggestion: that it's better to invest in reducing the growth rate of energy consumption than it is to invest in improving energy production efficiency.

But let's move on to the more solid objections.

Objection #1: I'm ignoring the benefits of technological progress

One objection I've read is that my calculations don't take into account the benefits in terms of efficiency (both in energy production and in energy consumption) that will come from technological progress.

For energy production, this is actually mostly false: as I mentioned in the previous point, my estimates of the EET are done in such favorable conditions that they leave room for several orders of magnitude of improvement in energy production efficiency (at least up to the ideal limit of 90 PJ/kg). Of course, it's not completely impossible that we will find (before the expiration date!) a means of energy production that allows us to extract, in practice, more than 90 PJ/kg. But unless such a very hypothetical method, beyond even our current scientific comprehension, allows us to produce several orders of magnitude more than 90 PJ/kg, this part of the objection is completely moot. In fact, even with several orders of magnitude more it would be a very weak objection, since each order-of-magnitude increase in efficiency only buys us around 3 doublings, which at the current rate means around a century.
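Spelling out the arithmetic of that last claim:

    from math import log

    doublings = log(10) / log(2)      # one order of magnitude ~ 3.32 doublings
    print(doublings * log(2) / 0.02)  # ~115 years bought, at 2% yearly growth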

For energy consumption, the objection is true, in the sense that I do not discuss the possibility for technological progress to improve the efficiency of our energy consumption, i.e. the possibility to waste less of the produced energy, or to do more work with the same amount of energy.

This is true, but again it's intentional, since how the energy consumed is being spent is irrelevant to my point. The only thing that matters is how much, and how quickly this grows.

Now, for the “how much”, the efficiency of the consumption is completely irrelevant. Where it can become relevant is on the growth of the consumption itself. However, finding a more efficient way to use energy doesn't necessarily mean that less energy will be used (in fact, historically this is mostly false, see also the Jevons paradox).

That being said, even if improvements in efficiency of consumption did lead globally to a decrease in energy consumption growth, it wouldn't invalidate my point. As an objection, this would make sense if my post was an attempt at making a prediction of what would happen. But it's not, so this is not really an objection.

Au contraire: given that —if a point has to be made— the point would actually be that we should concentrate our efforts on reducing energy consumption growth, encouraging such technological progress (and such applications of it) is exactly what my post aims for, by providing the estimated EET for our civilization if we don't go in that direction.

That being said, I can't say I'm particularly optimistic about this actually happening any time soon: when humanity finds a way to use energy more efficiently, this doesn't usually turn into “doing the same work with less”, but tends to become instead “let's do even more work with the same amount of energy”.

In fact, even when at the individual level this may lead to lower consumption, the decrease is not reflected globally; on the contrary, the higher efficiency leads to more widespread adoption of the technology, leading to an overall higher consumption: which is exactly why, despite the massive increase in efficiency since the beginning of the industrial era, energy consumption is still growing at a more-or-less constant rate.

Objection #2: to grow exponentially for that long, we would have taken to the stars

This was the first objection that tried to take issue with the continuing exponential growth. It was an interesting one, but still rather weak. Moreover, albeit in a somewhat underhanded way, I had already addressed it in the post, by pointing out that the entire (estimated) mass of the Milky Way will last less than 5 thousand years if energy consumption keeps growing at this rate.

For comparison, the radius of the Milky Way is estimated to be between 80 thousand and 100 thousand light years: we would run out of energy long before even being able to visit our own galaxy without FTL.

With FTL? Possibly we could visit our galaxy then, but who knows how much energy would be consumed by that.

Objection #3: you can't make predictions that far into the future

(“That far” being either the millennia for the consumption of our solar system and beyond, or even just the few hundred years before we run out of fissile material to fuel nuclear reactors thousands of times more efficient than the ones we own now.)

This objection comes in at least two variants.

One is essentially on the theme of the already-addressed objections #0 or #1 above, the other comes as a variation on the theme that the exponential growth assumption is invalid.

In either case, it's obviously true that I can't make predictions that far into the future. But then again, it's also true that I'm not making predictions, I'm just calculating the EET under the assumption of constant growth.

Of course, if the exponential growth assumption is invalid, then the EET doesn't hold —but that's not because I can't make predictions into the future, it's because the exponential growth assumption is invalid.

And that's actually OK, because the whole point, as I mentioned, is that we should slow down the growth to either get out of the exponential growth altogether, or at least lower the growth rate to something that will allow growth for a much, much longer period.

So let's get to the final objection:

Objection #4: we will not grow exponentially for long anyway

On one hand, I could dismiss this with a simple “duh”, since the whole point of the previous post is that if we don't do it by our own choice, it will happen anyway, catastrophically, when we get so close to the EET that it will be apparent to all that we won't be able to keep going —except that it will then be too late to slow down without a civilization collapse.

It's interesting however to see the forms that this objection can take. Aside from #2 above, and the masked #3, there are a couple of interesting variants that deserve a mention.

Objection #4a: the magnitude of the consumption after a few more doublings is inconceivable

While the wording wasn't exactly that, the basic idea is that if we keep doubling for centuries still, the order of magnitude of the consumption would be so high that we can't even imagine what all that energy would be used for.

And while it's true that we would be hard-pressed to imagine energy consumptions that large, it's not really much of an objection, since this has always been the case. Would anyone have imagined, even just 20 or 30 years ago, that we'd end up air-conditioning the desert?

Ironically, this objection was raised by the same individual that objected to the 90PJ/kg upper limit: so you can imagine us finding a way to produce more energy than that, but not us consuming several orders of magnitude more energy than now?

Honestly, I have fewer problems imagining the latter than the former: flying cars, anyone? Teleportation? Robots for everything?

Objection #4b: population and consumption will stabilize in time

This is an interesting objection, because in the long term it's quite likely to be true. I will call this the “logistic” objection, because the fundamental idea is that population and consumption follow a logistic function, which is essentially the only way to avoid the Malthusian trap of overpopulation (more on this later).

Now, let's accept for the moment that this is indeed most likely to be true in the long term. The big question is: how long of a term, and how fast will it stabilize?

There are two primary contributions to the global energy consumption: per-capita consumption, and world population. For the global energy consumption to stabilize, we thus need (1) the world population to stabilize and (2) the per-capita consumption to stabilize.

Both of these things are actually strongly correlated to the quality of life and standards of living, and so far they have exhibited a distinct tendency to “flatten out” while improving: more developed and wealthier nations have both a more stable population (sometimes even exhibiting negative growth, if not for immigration) and a reduced (or, again, slightly negative in some cases) growth in energy consumption per capita (although different countries have settled at different rates). Developing nations, on the other hand, have an energy consumption growth that is much higher than the world average: China and India, for example, which together account for nearly half the world population, both have a primary energy consumption growth rate that is around 5% per year (doubling time: 14 years).

Note that in both my previous post and this one, the only real underlying assumption is that we don't want to reduce our standards of living or quality of life. It's clear that without this assumption the exponential growth hypothesis doesn't hold, since it's quite simple to reduce energy consumption by simply ceasing to use energy —and thus renouncing all of the things in our life that depend on it. (This is also evident when looking at the global energy consumption over time, and how it “dips” after each recession.)

Let's take the USA today as reference for “stable” energy consumption per capita, which is about 80MWh or slightly less than 300GJ per person. (By the way, did you know that the USA is not the worst offender in terms of energy use per capita? Small nations such as Iceland and Qatar have much higher per-person energy use, currently closer to 200MWh per person, or 720GJ per person; even Norway sits slightly higher than the USA, at over 90MWh per person.)

We can expect global energy consumption to keep growing at least until the whole world reaches a similar per-capita consumption, and considering that the world average per-capita consumption is 20MWh per person, growing at a rate of slightly less than 1% annually on average (doubling time: over 70 years), this will take a century and a half if things keep going at the current rate. In fact, it will take at least 70 years even just to get to, say, German levels (around 40MWh per person per year).
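For those who want to check the arithmetic, here is a minimal C sketch of the computation, under the round figures assumed above (20MWh per person as the current world average, ~1% annual growth, 40MWh and 80MWh as the German and USA levels):

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double start = 20;  /* world average, MWh per person per year */
    const double rate = 0.01; /* assumed ~1% annual growth */
    const double targets[] = { 40, 80 }; /* German and USA levels */
    for (int i = 0; i < 2; ++i) {
        /* solve start*(1 + rate)^t = target for t */
        double t = log(targets[i]/start)/log(1 + rate);
        printf("%gMWh per person reached in ~%.0f years\n", targets[i], t);
    }
    return 0;
}

which prints roughly 70 and 139 years, matching the “at least 70 years” and “century and a half” figures above.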

If energy consumption per capita stabilizes, global energy consumption will only grow with population: after the ~2.1% growth rate peak reached in the '60s of the XX century, population growth rate has been on a stable decline, and is currently slightly over 1% per year, projected to drop below 1% halfway through this century —thus earlier, in fact, than the doubling time of the per-capita energy consumption.

With these two pieces of information, we can thus say that —unless something goes catastrophically wrong— the global energy consumption will keep growing at the current rate at least until the end of the XXI century. What will happen after that? According to those raising the objection, the flattening out of the population growth will only require the maintenance of the standards of living, which will require a constant (if not decreasing, thanks to technological progress) amount of energy per year.

But is this actually the case?

In the following sections I will discuss two possible counter-points to the “logistic” objection, at the end of which I will draw the following conclusion: the most likely alternative to exponential growth is not stabilization, but societal collapse, i.e. a profound crisis that will lead to a drastic decrease in quality of life and standards of living for the majority of the world population.

Counter-objection #1: there's no guarantee that the population will stabilize

Let's briefly recap what the Malthusian trap is. The basic idea is that, in a case where resources (e.g. food) are abundant, population grows exponentially. However, if the resources do not grow at the same rate as the population, we soon reach a point where they are not abundant anymore: there are fewer resources than the population would require, and this leads to the population collapsing (this is the “trap”), until it again drops below the level of scarcity, and the cycle begins again.

This kind of phenomenon has in fact been historically observed, both locally and globally. However, this seems to have stopped happening since the industrial revolution: since the XIX century in particular, population worldwide has instead grown at an ever-increasing pace up to the second half of the 1960s, peaking at around 2.1% per year. The growth rate has since been decreasing, dropping today to about half the peak rate, but still keeping a positive (larger than 1%, in fact) rate.

The observed trend is quite different from what Malthus' model would predict. The chief explanation for this has been the accelerating pace of technological progress, which has allowed the avoidance of the Malthusian trap by changing the ways resources are consumed (improving the efficiency of their consumption, accelerating the shift from one source to another as the previous one became scarcer, etc.).

Avoiding the Malthusian trap has allowed a different mechanism to take over: the demographic transition from a child-mortality growth limit to an old-age growth limit. In this model, the plateau in population growth depends essentially on the improvement of living conditions that leads to lower child mortality, and a subsequent (and consequent) lowering of fertility (as a larger percentage of children reach adult age). As long as technological progress maintains the resource/population ratio high enough to avoid the Malthusian trap, this demographic transition shifts the age distribution up, as human lives approach their maximum natural extent and fewer children are born.

This plateau actually contributes to avoiding the Malthusian trap by keeping the population size below the threshold of resource exhaustion.

There's more to it, though.

Looking at the timeframe of the rapid growth in world population, it's interesting to see how the time span of growing growth rate matches pretty well with the period of more revolutionary scientific and technological breakthroughs.

It's possibly a sad state of affairs that since the end of the Cold War technological progress, despite advancing at an incredible pace, has not given us any world-changing breakthroughs: most of the tech we use today is more a refinement of tech that emerged between the interwar period and the end of the Cold War than something completely new and original. (A sad state of affairs because it would hint that wars, be they hot or cold, are a stronger promoter of technological progress than peace.)

In some sense, we've reached a plateau not only in population growth (in the more developed nations), but also in the —allow me the expression— “originality” of technological progress.

Now the question is: when the next significant breakthrough happens, will it come alone, or will it be associated with a renewed increase in population growth rates?

One would be led to answer negatively, since we're already reaching the maximum natural extent of human life, but it's actually quite plausible that we can expect another spike. Some possible scenarios as examples:

  1. improved medical knowledge allowing significant age extension, upping e.g. the average age of death by about 50% compared to now; this would lead to another (although smaller) demographic transition to reach the new plateau associated with the longer life expectancy;
  2. colonization of the currently uninhabited (or very sparsely inhabited) areas on the planet surface, including both deserts and oceans: again a new spike in population growth;
  3. space travel and the colonization of the inner planets (Mars and Venus at least) would lead to a massive spike in population growth (not world population only anymore, but global human population growth, of course), something that will go on for several centuries more.

These are just examples, of course, but each and all of them are quite plausible. And together with many others, possibly unthinkable at the moment, they are a hint that we may be only one technological breakthrough away from delaying the population stabilization well beyond anything we can forecast at the moment.

And with it, of course, the associated growth in energy consumption.

Counter-objection #2: stable population and quality of life does not imply stable energy consumption

While it is true that most modern, industrial, “Western” societies have reached a largely stable population, quality of life and energy consumption, I posit that the stabilization of the energy consumption is not, in fact, due to the stabilization of the population and their standards of living. In fact, I will further argue that a stable population at our standards of living cannot be maintained without growing energy consumption. Allow me to justify the latter first, and then explain why we have the perception of a locally stable energy consumption where population and standards of living have reached our levels.

As I've already mentioned, the accelerating pace of technological progress has allowed us to avoid the Malthusian trap (so far): humanity has been able to circumvent the resource/population ratio inversion by improving resource utilization and regeneration at a faster pace than population growth. However, the cost of these improvements has always been paid in terms of energy consumption.

Increased crop yields rely on synthetic fertilizers —whose production is more energy-intensive than that of natural ones— and on agricultural machinery —whose construction and use are more energy-intensive than traditional human- or animal-based alternatives. Modern distribution networks are likewise more energy-consuming to build, maintain and use than footwork or animal transportation. For raw materials, especially those that are essentially non-renewable, the trap has been avoided by shifting consumption, as they became scarcer, from the “lower-hanging fruits” to materials that are harder to find, extract, create or manipulate, and that would therefore be prohibitive at lower levels of efficiency or energy production.

It's interesting to illustrate the last part (material source transitions) with an example that will probably soon apply to energy production itself: as shown before, the EET of the current estimated uranium in known conventional sources (8 million metric tons) is only 14 years (assuming constant energy consumption growth of 2% per year, and nuclear alone being used for energy production). This means that soon uranium extraction from unconventional sources (especially the sea) will become not only convenient, but in fact the only possible way to keep meeting our energy requirements —but extracting uranium from the sea is much more expensive, energy-wise, than the conventional methods.

In essence, what the industrial revolution has allowed has been to shift the entire burden of resource management onto one single resource (category): energy. This, by the way, is why energy is the only resource I've discussed in the previous post: its EET is the only one that really matters, since the expiration of any other resource can be compensated by increasing energy consumption.

For example, it has been said that “water is the oil of the 21st century”: this maxim is intended to mean that (clean, drinkable) water will become so scarce in the near future that it's likely to become as pricey and crucial as oil was (as primary energy source) in the XX century. Water, after all, is an essential resource for human survival and well-being both as a primary resource (drinking) and as secondary resource (e.g. farming), and with its usage growing at an exponential rate (doubling time: around 20 years), some scientists are worried that we'll soon hit its EET.

I'm actually not worried about that happening before we hit the energy EET, because with water, like with any other resource, we can (and in fact I predict we will) expand our (clean, drinkable) water reserves by trading in more energy consumption: to reduce water consumption, improve filtering, and develop better ways to extract useful water from the sea or the atmosphere.

In other words, as long as we can keep producing energy, humanity is largely unaffected by the Malthusian trap of other resources (or, in yet other words, the only resource that would trigger the Malthusian trap now is energy —and it will happen, as we've discussed in the previous post in this series).

The problem with that is: by avoiding the Malthusian trap, we're already past the trap point, meaning we're already consuming resources faster than they can regenerate. So even if population stops growing, we will soon run out of the resources we're using, and we'll need to move to other, more “energetically expensive” resources to replace them. A similar argument holds for the environment: we have triggered a vicious cycle where our standard of living destroys the environment at a rate faster than it can regenerate, and this leads to higher energy consumption to preserve inhabited areas at levels which are more comfortable for humans (open-air conditioning in the desert is only the prelude), which in turn accelerates the destruction of the environment, requiring a growing energy consumption to compensate: the “best” recipe for exponential growth.

That being said, it's quite possible (but see below) that the growth rate of energy consumption then (after the world population settles in size and standards of living) will be lower than the one we are experiencing now that the population is growing, and that's a good thing. But the key point is that our current standard of living still requires exponential growth in energy consumption just to be maintained at the present level.

Why then, one may ask, are we not seeing such growth in energy consumption in nations where the population and living standards have largely stabilized?

The answer to this is that what we are observing is a local aspect of a non-local phenomenon: a large part of the energy consumption needed to maintain our standards of living has been externalized, by outsourcing much (if not most) of the manufacturing process and resource extraction to the developing nations.

In other words, the energy consumption growth rate observed in developing nations accounts not only for the growth in size and standards of living of their population, but also for the maintenance of ours —hence energy consumption growth rates of 5% or higher in the face of population growth rates of 3% or lower.

In this situation it's obviously hard to isolate the components of energy consumption growth related to internal factors from those related to the burden of the maintenance of “stabilized” nations, but as the developing countries approach our levels of stability and quality of life, and the outsourcing possibilities diminish, we are likely to see a new redistribution (and relocalization) of energy consumption that will help characterize the factors. My “gut feeling” (correlating the energy consumption and population growth) is that the baseline (“maintenance-only”) energy consumption growth will remain around 2% (or marginally lower, but most likely not lower than 1%), but we'll have to wait and see.

And the conclusions?

Even though the estimation of the energy EET was not intended to be a prediction of how things will turn out, it's quite plausible that the current growth rate in energy consumption will continue long enough to get us there, unless either active action is taken to focus research on reducing the energy consumption (growth) needed to maintain our current standards of living or we end up hitting some other snag (before the energy EET) that leads to societal/civilization collapse, with the consequent drastic reduction in energy consumption.

And nuclear still won't save us.

Nuclear will not save us

Back-of-the-envelope calculations showing why even nuclear won't save us, without curbing energy consumption

Introduction

Humanity has been looking for alternatives to fossil fuels for over a century. The problem started to become more pressing in the 1960s, when people began to reflect on the fact that these resources would sooner or later be exhausted; it was reinforced during the 1970s energy crisis, and has since moved to the foreground of both energy and climate discussions, due to the significant impact that burning fossil fuels has on the environment —something that even the oil companies themselves have known for at least half a century, and have finally admitted, despite their reliance on (and frequent financial support to) “climate skeptics” to deny the significant anthropogenic effects on climate change.

For a brief moment, nuclear energy was seen as the most viable alternative, but the enthusiasm behind it received a collective cold shower after the Chernobyl disaster and with the growing issue of nuclear waste management, which brought attention back to “renewables” (extracting energy from the wind, the sun or the water) —with their own sets of issues.

Nuclear power still has its fans, whose arguments mainly focus on two aspects:

  • nuclear is actually the “greenest” energy source, even compared to “renewables” (especially in the medium/long term);
  • nuclear is the only energy source that can keep up with the requirements of modern, advanced societies, especially if you cut out fossil fuels.

I'm not going to debate the first point here, but I'll instead focus on the second one. And my argument won't be to deny the efficiency of nuclear power (in fact, the opposite), but to show that despite its efficiency, even nuclear power cannot keep up, and that the real issue we need to tackle, as we've known for decades if not centuries now, is our inability to understand the exponential function.

But let's get into the meat of the discussion.

Fact #1: nuclear energy production has the highest density

This is an undeniable fact by whichever means you measure the density: it is true when you compare it with any of the renewables in terms of energy produced per square meter of occupied land, and it is true if you compare it with any fossil fuel generator in terms of energy produced per unit of mass consumed.

For example, an actual nuclear power plant at the current technological level occupies around 3km² and produces around 1GW, for an effective (surface) density of about 300W/m². By comparison, geothermal can do at best 15W/m², and solar —which can peak at less than 200W/m² on a good day (literally)— will typically do around 7W/m² once the Sun's cycles are taken into account —and everything else is less than a blip compared to that.

In terms of energy density, gasoline and natural gas, with their 45MJ/kg and 55MJ/kg respectively, are the clear winners among fossil fuels, but their chemical energy density is completely eclipsed by the nuclear energy density of uranium: a 1GW plant consumes less than 30 tons of uranium per year, giving us an effective energy density (at our current technological level) of more than 1000GJ/kg: over 4 orders of magnitude higher than that of the best fossil fuels. In fact, even going by the worst possible estimates, uranium ore (from which the actual uranium used as fuel is extracted) has an effective energy density of slightly less than 80MJ/kg, which is still more than 1.5 times the theoretical maximum we can get from fossil fuels.

These data points alone could explain why so many people remain solidly convinced that nuclear power is the only viable alternative to fossil fuels, despite the economic, political and social costs of nuclear waste management.

But there's more! The attentive reader will have noticed that I've insisted on the «current technological level» qualifier. There's a reason for that: while fossil fuel as an energy source has a long and well-established history, with an associated enormous progress in the efficiency of its exploitation, the same can't be said for most renewables, nor for nuclear.

For example, solar irradiance on the Earth surface is around 1kW/m² —about 5 times what we manage to get from it in ideal conditions, and 3 times higher than the surface energy production density of a modern nuclear power plant. A technology breakthrough in solar energy production that could bring the efficiency from 20% to 80% would make solar competitive in massively irradiated regions (think: the Sahara desert).

But the same is true also for nuclear —and in fact, for nuclear, it's considerably more true: indeed, the upper bound on the amount of energy that can be produced from matter is given to us by the famous E=mc² mass–energy equivalence. If we could convert 1kg of mass entirely into energy, this would produce close to 90 petajoules of energy, or 90 million GJ: 90 thousand times more than what a nuclear power plant can produce today from the fuel pellets fed to it.
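(If you want to double-check the numbers, here's a minimal C sketch, assuming only the speed of light and the ~1000GJ/kg effective density quoted above:

#include <stdio.h>

int main(void)
{
    const double c = 299792458;  /* speed of light, m/s */
    const double full = c*c;     /* E = mc² for 1kg of mass, in J */
    const double current = 1e12; /* ~1000GJ/kg effective density, in J/kg */
    printf("full conversion: %.0fPJ/kg\n", full/1e15); /* ≈ 90 */
    printf("headroom: %.0f times\n", full/current);    /* ≈ 90 thousand */
    return 0;
}

which prints 90PJ/kg and a headroom of about 90 thousand times.)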

If we managed to improve the efficiency of nuclear energy production by a factor of 1000, we'd have an efficiency of only about 1.3%, and it would still completely eclipse any other energy generation method even if they were 100% efficient.

To say that there's room for improvements would be the understatement of the millennium. And this, too, would be an argument in favor of the adoption of nuclear power, and most importantly in investing massively in research for its improvement (especially considering that more efficient production also means less waste to worry about).

And yet, as we'll be seeing momentarily, even reaching 100% efficiency in nuclear energy extraction will not save us.

Ballpark figure #1: mass of the Earth's crust

Let's now do a quick computation of the total mass of the Earth's crust, the “thin” (on a planetary scale) layer whose surface veil is the land we tread upon.

The surface of the Earth is marginally more than S = 510×10⁶ km². To estimate the total mass of the crust, let's pretend, very generously, that the crust can be assumed to be H = 50 km deep everywhere (this is actually only true for the thickest parts of the continental crust), and with a constant, uniform density equal to that of the densest igneous rocks (ρ = 3500 kg/m³). Rounding up, this gives us a mass of the crust equal to S·H·ρ = 9×10²² kg.

(This is quite a large overestimation, since the actual average thickness is less than half of H, and the average density is less than 3000 kg/m³, so the actual mass is at best a third of our estimate; but as we shall see, even the generous overestimation of 9×10²² kg will not save us.)

How much energy could we extract from the crust?

Let's play a little game. Let's pretend that we have a 100% efficient mass–energy conversion: 1kg of mass of any kind goes in, 90PJ of energy (and no waste!) comes out.

For comparison, the world's yearly primary energy consumption currently amounts to more than 170×10³ TWh —let's be generous and round it down to 600×10³ PJ.

If we had the amazing 100% mass-to-energy conversion technology, less than 7 (metric) tons of mass would be sufficient to satisfy the current energy requirements for the whole world in a year. (For comparison, a modern 1GW nuclear power plant produces 5 tons of waste per year.)

If we had this wonderfully 100% efficient technology, it would take R = 1.3×10¹⁹ years, at the current energy consumption rate, to exhaust the 9×10¹⁹ (metric) tons of the Earth's crust.

(Try it from the other side: 9×10²² kg of mass producing 90PJ/kg means 8.1×10²⁴ PJ of energy, which divided by 6×10⁵ PJ of yearly consumption gives us a more accurate R = 1.35×10¹⁹.)
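(Or, if you prefer, a minimal C sketch replaying the whole ballpark from the overestimates above:

#include <stdio.h>

int main(void)
{
    const double S = 510e6*1e6;   /* Earth's surface, m² */
    const double H = 50e3;        /* generous crust thickness, m */
    const double rho = 3500;      /* generous crust density, kg/m³ */
    const double mass = S*H*rho;  /* ≈ 9×10²² kg */
    const double e_per_kg = 90e15;    /* 90PJ/kg at 100% conversion */
    const double yearly = 600e3*1e15; /* 600×10³ PJ per year, in J */
    printf("crust mass: %.3g kg\n", mass);
    printf("R = %.3g years\n", mass*e_per_kg/yearly);
    return 0;
}

which prints a crust mass of about 8.93×10²² kg and R ≈ 1.34×10¹⁹ years, in line with the rounded figures above.)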

Needless to say, we wouldn't need to worry about wasting energy ever again, considering the Sun will run out long before that (estimated: 5×10⁹ years).

Or would we?

Enter the exponential function

Looking again at the world's energy consumption, we can notice that it has been growing at an almost constant rate (a ballpark estimation from the plot gives us a rate of about 2% or 3% per year, corresponding to a doubling time of about 25 to 35 years) —that is, exponentially.

And a widespread idea among supporters of nuclear energy is that with nuclear there's no need to change that —nuclear energy is the solution, after all: given how much it can give us now, and how much potential it still has, there's no need to limit how much energy we use.

The math, however, says otherwise. Since the energy consumption will grow over time, the previously computed ratio R = 1.3×10¹⁹ no longer tells us the number of years before the crust is consumed —to determine that, we rather need to check how many doublings will fit in that ratio, which we can approximate by log₂(R) —and that's less than 64 doublings: at the current growth rate, that means something between 1500 and 2000 years.

For a more detailed computation, we can apply the “Exponential Expiration Time” formula, found for example in Bartlett's work: the EET in our case is ln(k·1.35×10¹⁹ + 1)/k for a growth rate k, which gives us 1351 years for a 3% growth rate, and 2007 years at a 2% growth rate.
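The formula is simple enough to be worth keeping around as a snippet. A minimal C version, using the R computed above:

#include <math.h>
#include <stdio.h>

/* Bartlett's Exponential Expiration Time: EET = ln(k*R + 1)/k,
 * where R is the resource/consumption ratio (in years at the current
 * consumption rate) and k the constant consumption growth rate */
static double eet(double k, double R)
{
    return log(k*R + 1)/k;
}

int main(void)
{
    const double R = 1.35e19; /* whole crust, 100% conversion */
    printf("doublings that fit in R: %.1f\n", log2(R)); /* ~63.5 */
    printf("EET at 3%%: %.0f years\n", eet(0.03, R));   /* 1351 */
    printf("EET at 2%%: %.0f years\n", eet(0.02, R));   /* 2007 */
    return 0;
}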

This deserves repeating: at the current rate at which energy consumption grows, the entire crust of our planet would run out in at most 2000 years, even in the best-case scenario where we manage to find a 100% efficient mass-to-energy conversion method within the next decade.

Be more realistic

The actual timespan we can expect is in fact much lower than that.

For example, we're nowhere close to being 100% efficient in mass-to-energy conversion: in fact, you'll recall that even if we manage to improve our efficiency by a thousandfold, we'll only be barely more than 1% efficient —meaning that even the two-orders-of-magnitude-lower R = 1.35×10¹⁷ is still an extremely generous estimate.

But there's more: the mass of the Earth's crust is likely a third of our gross overestimation, bringing R down to around R = 4.5×10¹⁶. And what's worse, the amount of uranium in the crust is currently estimated to be only about 4 parts per million, which would bring R further down to about R = 1.8×10¹¹.

To wit, that would give us between 747 and 1100 years before we run out of fuel, assuming we managed to extract all of the uranium and convert it to energy with a 1% efficiency, which is a thousand times better than what we can do now.

I'll take this opportunity to clarify something important about the exponential function —with an example.

At our current tech level, we would have R = 2.34×10⁸ —all the uranium would be gone in 525 to 768 years. For thorium, which is around 3 times more abundant, the estimate is 562 to 822 years. Now ask yourself: what if we use both? Surely that means over a thousand years (525+562), possibly closer to 2000 (768+822)?

No.

That's not how the exponential function works.

If energy consumption keeps growing at this steady 2-3% rate, thorium and uranium combined would only last 571 to 837 years: switching to thorium after depleting all the uranium would only add around 50 to 80 years.
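You can check this with the EET snippet from before. A minimal sketch, assuming the R values above and thorium at three times the uranium abundance:

#include <math.h>
#include <stdio.h>

static double eet(double k, double R) /* as before */
{
    return log(k*R + 1)/k;
}

int main(void)
{
    const double R_u = 2.34e8;  /* uranium, current tech level */
    const double R_th = 3*R_u;  /* thorium: ~3 times more abundant */
    const double k[] = { 0.02, 0.03 };
    for (int i = 0; i < 2; ++i)
        printf("k = %g%%: U %.0f y, Th %.0f y, U+Th %.0f y\n",
            100*k[i], eet(k[i], R_u), eet(k[i], R_th),
            eet(k[i], R_u + R_th));
    return 0;
}

At 2%, uranium alone lasts 768 years and thorium alone just over 820, but the two combined last only 837: quadrupling the resource buys only two more doublings, i.e. about 70 more years at that growth rate.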

Can it get worse?

It should be clear from even the most optimistic numbers seen so far that nuclear energy by itself is not sustainable in the long term: even if we switched entirely to nuclear power and found a breakthrough in the next decade or so that brought the efficiency up by a thousand times, we wouldn't last more than a few centuries before running out of energy, unless something is done to stop the exponential growth in energy consumption.

But it gets worse. I'm not particularly optimistic about humanity's wisdom. In fact, in my experience, the more a resource is abundant, the faster its consumption grows. And this goes for energy too.

In my mind, the biggest threat posed by nuclear power isn't even the risk posed by the mismanagement of the plants or of the still-radioactive waste. The biggest threat posed by nuclear power is the “yahoo, practically infinite energy in our hands!” attitude of its supporters, which is quite likely to lead to energy consumption growing at an even higher rate than the current one, if we ever switch to nuclear on a more extensive scale.

And with an increased growth rate, we'll run out of energy much, much earlier: at a 7% growth rate in energy consumption (doubling time: 10 years), all the estimated uranium in the crust would be gone in 237 years at our current tech level, or 332 years assuming we get the 1% efficiency breakthrough now; and the entire crust would be depleted in 591 years assuming 100% efficient mass-to-energy conversion from any material.

And no, there is no “we'll find something better in the meantime”, because there's nothing better than 100% efficient mass-to-energy conversion. Even harnessing the mass of other celestial bodies won't do more than extend the expiration time by another few hundred years, maybe a couple of millennia at best: at a growth rate of 2% and 100% conversion efficiency, the entire planet of Mars would last us no more than 2105 years —and remember, that's not in addition to the depletion of the crust of our planet: in fact, adding the overestimated mass of Earth's crust to the mass of Mars won't even budge the expiration time by a single year.

The entire mass of all the celestial bodies in the solar system would last around 2500 years. If we add in the Sun (which means, essentially, just the mass of the Sun), we would still run out in 2852 years, at a 2% growth rate and 100% efficiency.

(Wait a second, I hear somebody say: how come the Sun will last for billions of years still, but if we converted all of its mass into energy using our 100% efficient mechanism it wouldn't even last 3000 years? And the answer, my friend, is again the exponential function: the Sun produces energy at a (more or less) constant rate, but we're talking about how quickly it would be depleted at a growing rate. Does that help put things in perspective? No? How about this: the entire Milky Way would last less than 43 centuries.)

So yes, there is no “we'll find something better”, not at the current growth rate.

The only sustainable option is reducing the growth rate of the total energy consumption.

Degrowth is the answer

Now, with this title I'm not proposing degrowth as the solution, I'm simply stating a fact: degrowth will happen, regardless of whether humanity chooses voluntarily to go down that path or not. The only difference is how it will happen. But it will happen. Because if we don't wisen up and curb our own growth, we will run out of resources, and at the current growth rate that will happen at best in a few centuries, with or without nuclear power: and when it does happen (not if, but when), we will have sudden, drastic, forceful degrowth imposed on us by the lack of resources (most importantly, energy).

We're running towards an unbreakable wall. There is no other option but deceleration, and that's because deceleration will happen, whether we want it or not. Our only choice is between slowing down gracefully, and stopping before we hit the wall, or experiencing the sudden, instantaneous and painful deceleration that will happen the moment we hit that wall.

And now for the “good” news

Slowing down the growth rate is an extremely effective way to extend the EET. Let's have a look at this from our worst-case scenario: at the current technological level, and a 3% growth rate, all of the estimated uranium and thorium in Earth's crust will be depleted in 571 years, but with a 2% growth rate they would last 837 years.

Dropping the growth rate to 1%, they would last 1605 years —which is more or less the EET for the entire crust at 100% efficient conversion with a 2.5% growth rate.

Going even lower, to 0.5% growth rate, they would last over 3000 years —more than it would take to deplete the Sun with 100% efficient conversion and a 2% growth rate.

TL;DR:
Increasing the adoption and efficiency of nuclear power generation can buy us maybe a few centuries.
Decreasing the growth rate can buy us millennia.

Where would you invest with these odds?

(See also the next article on the topic for additional details and comments on the plausibility of a continuing exponential growth.)

Getting ready for 2078

Taking advantage of IRC drama to change IRC client

There has recently been quite some drama on IRC: the largest IRC network dedicated to free/libre and open source software (FLOSS), Freenode, has been taken over by a fraudulent “entrepreneur”, causing the entire (volunteer!) staff that had operated the network for decades to just quit en masse to create an alternative to Freenode, named Libera.

Most communities and projects that previously relied on Freenode have now started a migration process to move to the newly established Libera or to the pre-existing OFTC networks, leaving Freenode with “skeleton” channels and users —so much so that the new Freenode administration has made changes to the Terms of Service to basically allow, if not straightforwardly encourage, hostile takeovers of “inactive” community channels.

Drama aside, I've taken the switchover from Freenode to Libera as an opportunity to do some long-needed cleanup of my IRC networks and channel list —but not just that.

In a famous XKCD strip, “Team Chat”, Randall Munroe pokes fun at the surprising persistence of IRC as a communication platform: from the “old days” in which it was the protocol for both real-time and asynchronous communication, to the current times, where every major innovation in “instant messaging” has to allow some kind of bridge to IRC, to a hypothetical future where all human consciousness has merged, except for that single individual that still uses IRC to interface and communicate with others. The alt-text of the comic reveals an even more distant future, where finally some progress is made … in a fashion:

2078: He announces that he's finally making the jump from screen+irssi to tmux+weechat.

This is quite the nerdy joke (as frequent with XKCD).

For the uninitiated, screen is a terminal multiplexer, i.e. a program that allows you to control multiple terminals from a single one. One of the major features of terminal multiplexers is that they are “resistant” to disconnections: if your connection fails while you're using the multiplexer, you can reattach to the previous session when the connection comes back up, allowing you to continue working with nothing worse than some wasted time. This particular feature makes it a very convenient “wrapper” in conjunction with an IRC client: you run the client from within a multiplexer session running on some server, and this allows you to reconnect to it from anywhere and never lose track of your IRC conversations.

The joke is that screen is “a bit long in the tooth”, and there are more modern and feature-rich terminal multiplexers around, tmux being the most common one. Similarly, irssi is by many considered now a bit “stale” and underdeveloped, compared to other IRC clients such as weechat. Still, most people have a tendency to stick to the tools they're used to (“if it's not broken, don't fix it”), so that switching to a more modern multiplexer and IRC client combo would be considered “more effort than it's worth” —it would take some very strong selling point of the new combo to convince them into investing time and active brain power for the switchover.

(It would be so much simpler if there were ways to convert one's configuration from one tool to the other, but not only is this not always possible, it's also such a low-priority feature for most developers that it's rarely done even when possible.)

In my case, I have long abandoned screen for tmux, not only to host my “permanent” connections to IRC, but in general for all my terminal multiplexing needs. (Why? That would be a long story, but the short of it is that I find the level of control and (VIM-like) command syntax of tmux sufficiently superior to their screen counterparts to justify the switch; finding a documented tmux configuration that eased the transition also helped a lot.)

So for a long time I was in a sort of hybrid (XKCD-wise) situation, using the venerable irssi as my IRC client, but within tmux. And with the Freenode/Libera.chat drama, I've had the opportunity to revisit the relevant XKCD comic, and finally give weechat a try.

I'm sold. I've now completed the transition to tmux+weechat, and thus consider myself ready for 2078.

(If you're curious about the reason why: weechat's selling point for me was its relay feature, which allows connecting to a running weechat instance from e.g. the Android app in a more practical way than going through something like ConnectBot or its specialized cousin to reach the IRC client running in a terminal multiplexer via SSH —because let's be honest here, Android as an operating system, and the devices on which it runs, aren't really designed for this kind of usage, usually.)

XPS 15 7590: the worst computer I've ever had

An excellent laptop on paper, ruined by a catastrophically bad implementation

In September 2019 I got a Dell XPS 15 7590, a powerful 15.6" laptop, to replace my previous Dell XPS 15 9570 from 2013 that, after 6 years of honorable service and several maintenance interventions (which, given the unusual stress I put my laptops under, was not unusual), was getting a bit too long in the tooth.

I have generally been quite satisfied with my Dell laptops (all the way back to the first one I owned, a venerable Dell Inspiron 8100 with an out-of-this-world 15" 1600×1200 UXGA high-density display, which I've mentioned before), with which I've had generally better luck than with laptops from other vendors.

This is however not the case with the one I'm currently using. In fact, my experience with this laptop has been so bad that I have no qualms in claiming that this is the worst computer I've ever had. (And I've had some pretty poor experiences, including a laptop HDD failing exactly one week after the warranty expiration, and working for two years with the shittiest display ever attached to a laptop.)

In fact, what makes the XPS 15 7590 situation particularly crappy is not just that it's a badly designed piece of hardware with components of debatable reliability (as I'm going to discuss momentarily), but the fact that —at least on paper— it's supposed to be a high-end powerhouse, starring an Intel Core i7-9750H 6-core/12-thread CPU running at 2.6GHz and an NVIDIA GeForce GTX 1650 3D accelerator, with a high-capacity battery to provide the user with several hours of gaming/officing/video streaming.

(Narrator: «It doesn't»)

There are so many things that went wrong in the materialization of this hardware that I'm not even sure where I should start listing them.

First of all, I should probably mention that the power requirements of the laptop are enormous: you need a 130W power source to be able to use it while charging, and even with the battery configured for slow charging the power is barely sufficient. I also have a strong suspicion that the distribution of power within the system is far from reliable, due to at least two different symptoms: monitor flickering when switching between battery and AC, and the laptop simply shutting down when turning on the discrete GPU / 3D accelerator while on battery.

To make things worse, the power connector in the system is dramatically loose, leading to unpleasant situations where finding the correct angle/depth/tension to make the laptop even just sense that the power cord is inserted becomes a ridiculous game of contortionism, or where you find out that the connector has come loose only when the laptop is nearly dying in your hands (or suddenly shuts off because you switched on the accelerator, as mentioned before).

It doesn't end here, obviously: a strong contributor to the power issues of this model is the horribly inadequate cooling system: the system runs at over 60°C even under light load, with the fans having trouble keeping the temperature low enough under heavy load, leading to frequent throttling of both the CPU and the 3D accelerator —and a consequent massive reduction in performance (videos stuttering, gaming with FPS dropping into the single digits, long compilation times, near impossibility of doing any serious benchmarking of my HPC code).

And of course, with the combination of higher-than-expected power requirements and lower-than-expected cooling capabilities, the battery has never lasted as long as advertised (maybe half of that, out of the box).

The rest of the hardware isn't much better: it took several firmware updates to get the WiFi working reliably, Bluetooth connections still randomly die without apparent cause, and the touchpad has issues recovering from sleep mode. This last issue is particularly frustrating because it's not even easy to circumvent: when the touchpad is borked the touchscreen doesn't work either, and even external mice become unreliable due to the touchpad still firing random events. (Apparently, a workaround is to keep the left touchpad button pressed for a few seconds: this can help reset the device, or at least clear the queue or whatever else is causing the malfunction.)

Now, before anybody comes up and mentions that I might just have been unlucky and drawn the short stick, getting myself a defective laptop —nope: these are structural issues, reported by several users, and not even related to the operating system. (As a Linux user, I'm used to hardware issues related to poor testing with that operating system, and in fact I was half convinced that e.g. the touchpad issue might be Linux-related —but no, Windows users have the exact same issues, so it's something in the hardware.)

In fact, Dell even recently (January 2021) released a new BIOS version that tries to address several of the issues I mentioned, and while it does improve some of them up to a point, it's still not enough to completely fix most of them (e.g. the power cord detection is improved, but it's still extremely volatile, especially when the laptop has been on for several days; moreover, the touchpad still has issues when coming out of sleep mode). But at least the laptop does run cooler now (between 50 and 60 degrees Celsius under light load) most of the time.

Now, as I've said before, I've had some pretty poor experiences with laptops. Indeed, until I got this one, I would have said that the worst I've ever had was the one before the previous one: the one whose HDD died right after the warranty expired, which was also the one with the, shall we say, less than stellar display; and flimsy plastic finishes; and several other small annoyances. Yet despite the traumatic experience of the HDD death (a one-off issue against which every wise person should be adequately prepared), most of my gripes against the previous holder of the “worst laptop I've ever owned” title were minor annoyances. Also, I came to it from the wonderful über-bright matte UXGA display of the Inspiron 8100, which might have heavily biased me against its display.

But no, the XPS 15 7590 isn't like that. It's really bad.

Mind you, on paper it's really a wonderful laptop. The 4K display is crystal clear —when it's not flickering due to the power distribution issues— and it's even a touch screen, if you choose that configuration1. The keyboard is backlit, and as laptop keyboards go, it's a pretty nice keyboard —except that sometimes it seems to eat up characters (but again this might be an operating system issue, although I've generally seen it when the touchpad isn't working either). The touchpad is large and comfortable to use, including support for multi-touch gestures —when it actually works. The number and type of connectors, while not exceptional, are pretty adequate, especially paired with the USB-C adapter with VGA, HDMI, Ethernet and USB-A 3.0 connectors. The CPU and 3D accelerator (discrete GPU) are high-end, top-of-the-line offerings (for the release date of this model) —too bad you don't really get to exploit them at their full power for long, due to the thermal and power issues. The 32GB of RAM and 1TB of NVMe storage are also a very nice touch —and possibly the only things that haven't given me any significant issues … yet.

In the end, as I already mentioned, the biggest let-down is that what you're left with, after all the issues are taken into account, is nowhere near what it was supposed to be, which —for the hefty price the product carries— is simply unacceptable.

I mean, if I pay 200€ for a laptop (I did, in fact, buy one such thing for my mother, who was quite strict on the upper bound we were allowed to spend on her present), I don't expect much from it, other than the bare minimum. And in fact, with all its downsides and limitations, that laptop was exactly what we expected it to be, and it even managed to last way longer than we had envisioned, with minimal maintenance (although, to be fair, we did expand the RAM and replace the internal hard disk with an SSD). That's fine —I'm not buying a Ferrari, so I don't expect a Ferrari.

But when I do buy a Ferrari Enzo, I most definitely don't expect to find myself using something that —on a good day— may at best resemble an Opel Tigra with the pretense of being an Enzo.

TLDR

The single biggest (for me) issue is that the power connector is loose and will frequently drop the laptop out of charge.

A close second is the horrible thermals, and the consequent CPU and GPU throttling.

The inability of the touchpad to reliably come out of sleep is a distant third (at least inasmuch as it can be worked around in ways that the other two issues cannot).


  1. I've had usage of the touchscreen lead to hard lock-ups for the system, but I'm quite sure this is an operating system/driver issue, and not a hardware one; I can't be 100% sure though, because it's an issue which is neither easy to reproduce nor easy to debug. ↩

(How to) avoid division by zero (in C)

Leveraging boolean operators to avoid divisions by zero without conditional expressions.

Let's say we're collecting some data, and we want to compute an average of the values. Or we computed the absolute error, and we want the relative error. This requires the division of some number (e.g. the sum of the values, or the absolute error) by some other number (e.g. the number of values, the reference value).

Catastrophe arises when the number we want to divide by is 0: if the list of values we want to average is empty, for example, we would end up with an expression such as 0/0 (undefined).

Programmatically, we would like to avoid such corner cases with as little hassle as possible. The standard way to handle them is with conditional expressions: if the value we want to divide by is zero, do something special, otherwise do the division we're actually interested in.

This can be cumbersome.

In what follows, we'll assume that the special handling of the zero division case would be to return the numerator unchanged: we want r = a/b if b is non-zero, otherwise r = a will do. In (C) code, this could be written:

if (b != 0)
    r = a/b;
else
    r = a;

We can write this more succinctly using the ternary operator:

r = b != 0 ? a/b : a;

or, leveraging the fact that any non-zero value is “true”:

r = b ? a/b : a;

I'll leave it to the reader to decide whether this expression is more readable or not, but the fundamental issue remains that this kind of conditional handling is still not nice. Worse, if this is done in a loop (e.g. to convert a set of absolute errors into a set of relative errors, dividing each by the corresponding —potentially null!— reference value), it can even produce sub-optimal code on modern machines with vector capabilities: since the expression on the two sides is different, and there is no way to know (until the program is running) which elements will follow which path, the compiler will have to produce sub-optimal scalar code instead of potentially much faster vectorized code.

Ideally, we would want to have the same operation done on both sides of the conditional. This can, in fact, be achieved by remarking that a is the same as a/1. We can thus write:

r = a/(b ? b : 1);

The advantage of this expression is that, as the body of a loop, it leads to better vectorization opportunities, delegating the conditional to the construction of the divisor.

But we can do better! There's a nifty trick we can employ (at least in C), leveraging the fact that the boolean negation of any non-zero value is 0, and the boolean negation of 0 is 1. The trick is:

r = a/(b + !b);

Why does this work?

If b == 0, then !b == 1, and b + !b == 0 + 1 == 1.

If b != 0, then !b == 0, and b + !b == b + 0 == b.

The result of b + !b is thus exactly the same as b ? b : 1, without using conditionals.
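As a concrete example of the loop scenario mentioned earlier, here's a minimal sketch (function and array names are mine) that converts absolute errors into relative errors; the branchless guard keeps the loop body identical across iterations, which is what gives the compiler a chance to vectorize it:

/* divide each absolute error by its (potentially zero) reference value;
 * ref[i] + !ref[i] evaluates to ref[i] when non-zero, and to 1 otherwise */
void relative_errors(int n, const double *abs_err, const double *ref,
    double *rel_err)
{
    for (int i = 0; i < n; ++i)
        rel_err[i] = abs_err[i]/(ref[i] + !ref[i]);
}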

Addendum (OpenCL C and vector types)

The trick above doesn't work as-is if a, b are vector types, at least in OpenCL C, since in this case the specification requires that the component-wise negation of 0 is -1 rather than 1. So, for vector types, the trick becomes:

r = a/(b - !b);

to correct for the difference in sign.

Other programming languages

The trick extends trivially to any programming language that can seamlessly cast between numerical and logical values. For example, in MATLAB, Octave or Scilab one would use:

r = a./(b + ~b)

for the same purpose (notice the use of ./ rather than / to allow component-wise division between equi-dimensional vectors or matrices), and in Python:

r = a/(b + (not b))

Other languages may need explicit casting. For example, the expression in Mathematica would be:

r = a/(b + Boole[b == 0])

using the Boole function introduced in version 5.1, and in FORTRAN you would need something even uglier such as

r = a/(b + MERGE(1, 0, b == 0))

(and a recent enough version of the standard where MERGE is defined —I believe it was introduced with F90), which is just as ugly as the C version with the ternary operator.

A surprising practical gadget: the finger mouse

Introduction

A friend of mine has been doing a weekly reading on YouTube for a while now. Sometimes you can clearly see him holding a computer mouse in his hands, whose only purpose is to scroll the reading material.

Myself, I'm a big fan of webcomics, and find myself frequently reading material that is published online in long strip format, where each chapter or episode is a single continuous vertical strip. This format is geared towards “mobile” usage, designed to be viewed on a display in “portrait” orientation, but if you're willing to risk it on your laptop (and don't want to spend the money to get a convertible one that can be transformed into a tablet), you can simply flip the laptop on its side, reading it like a book. The worst downside I've found to this configuration is —perhaps surprisingly— the input mechanism.

The solution to my long-strip webcomic reading issues and to my friend's readings is the same: something that allows scrolling documents on the computer without the full encumbrance of a traditional mouse.

Enter the finger mouse

The finger mouse, or ring mouse, is an input device that is tied to a finger and typically operated with the other fingers (usually the thumb) of the same hand.

There are at least four forms of finger mice that I've seen, which chiefly differ by how motion is handled: the trackball, the “nub”, the gyro and the optical.

Trackball finger mice follow the same mechanism as traditional desktop trackballs, and thus the reverse of the old-style ball mice: you roll the ball with the thumb, and the motion of the ball is converted into planar motion (combinations of left/right and up/down).

Nub finger mice follow the same mechanism as the TrackPoint™ or pointing stick found on some laptop keyboards (most famously IBM/Lenovo): the nub is pushed around with the thumb, and again this converts to planar motions.

Gyroscopic mice use an internal gyroscope to convert hand motions into planar motions. This has the advantage of freeing up some real estate on the rest of the device for more buttons.

Finally, optical finger mice work exactly like the usual modern mice, with a laser and optical sensor, the only difference being that instead of holding them with your hands, the pointing device is tied to a finger.

The search (and the finding)

While researching finger mice options (as a present for my friend and obviously for me), I've been held back by two things: pricing and size.

Size was a particularly surprising issue: most of the finger mice options I've seen appear to be unwieldy, some even resembling dashboards that would require a full hand (other than the one wearing them) to operate, rather than a practical single-handed input device.

Price was no joke either: with the more modest options ranging between 25€ and 50€, and some breaking the 100€ barrier or even approaching 200€, one is led to ask: who is the intended target for these devices? Most definitely not amateurs like me or my friend, and I would be hard pressed to find a justification even at the professional level, except maybe for the lower-cost options if you spend your life doing presentations.

Ultimately, I did find a palatable option in this (knock-off?) device: it has everything I wanted (i.e. an easily accessible scrollwheel) and the price (around 10€ plus shipping) was low enough to cover the worst case scenario. And this is its review.

Upsides

First, the good news. I'm extremely favorably impressed by the device. It works, it does what I wanted it for, and it's in fact an exceptionally practical device. I mean, I'm not going to say it's good enough for gaming, but I did use it exactly for that too, in the end.

I'm not a pro gamer: most of the games I play are not particularly challenging, and I'm generally not a fan of stuff that requires quick reflexes and perfect timing. But I do play puzzle-platform games, and sometimes you do need pretty good control and timing for them. And I was able to achieve both with this device —definitely much more so than with my laptop's touchpad.

To wit, a couple of years ago I had abandoned The Swapper shortly after starting it, because I came across a puzzle with an obvious solution that I was unable to execute on my trackpad. Shortly after getting the new finger mouse, and using it to my enjoyment as no more than a scrollwheel for my weekly dose of long-strip webcomics, I decided to give it a go: let's see if we can finish that stupid puzzle; what's the worst that can happen?

In this case, the worst that happened was that I did manage to solve the puzzle, and many other puzzles after it, all while lying in bed with the laptop on my stomach, a hand on the keyboard (WASD) and the other, with the finger mouse, lying relaxed on the bed sheets. Until 2:30am.

So yes, it's accurate enough at least for casual gaming (I've also replayed Portal, and finally started Lugaru, which was unplayable on the touchpad) and what's more it works on surfaces where a standard mouse would have issues working, such as bed sheets and covers or the shirt or T-shirt you're wearing. Or the palmrests of your laptop, if you don't want to look too weird (but in that case you're not the kind of person that flips the laptop on its side to use it in portrait mode, so you have one less reason to enjoy this gadget).

The device runs on a single AAA battery. It has a physical switch to turn the power on and off, and from what I understand it also goes into low-power mode when not in use. And of course you can use rechargeable batteries with it without issues (it's what I'm doing).

And it works out of the box (at least on my machine, running Linux).

Downsides

The device isn't perfect.

It's wireless, which while practical may be an issue for security-conscious people (and possibly health fanatics too).

It does require a surface to be used as a mouse (though not if you only care about the scrollwheel, which is my case for the most part). This is not that big of an issue since, as mentioned, I've been able to use it even on surfaces where standard optical mice are notoriously problematic (there are, however, surfaces on which the finger mouse has issues too).

It can take a bit to get used to, and it feels weird. The most comfortable way to use it is to tie it to the outside of the middle finger, resting the index finger on top of it, and leaving the thumb to control the buttons and scrollwheel. It's not particularly heavy, but not exceptionally light either (I suspect a large part of the weight actually comes from the battery, so if you can find an extra-light battery, that might fix the issue for you). I got used to it and it doesn't annoy me in the least, but I've read reviews from people who find it too weird, so this is most definitely subjective.

It ties to the finger with a strap; this allows freedom to regulate the tightness, but it may be difficult to find the optimal one: too tight, and the diminished circulation can make your finger go numb; not tight enough, and the wiggling will chafe your skin.

It's designed to be used with the right hand. This isn't a big problem for me, since I've always used mice with my right hand even though I'm left-handed, but it might be an issue for other people. It can be used with the left hand, and the most practical way I've found for this is to tie it to the inside of the middle finger (so it's inside your hand, more similar to classic mice), but you'll need to flip the axis directions (both horizontal and vertical —and possibly the buttons too) unless you use it on your stomach.

Availability

The specific product I bought for myself is no longer available on the Amazon page, but several other similarly-priced variants are. The product I have identifies itself with USB ID 062A:4010, registered to MosArt for a wireless keyboard/mouse combo (even though in this case there's only a mouse), and I've seen the same product ID used in several cheap-brand mice and keyboard/mouse combos (Trust, RadioShack, etc). Products similar to mine, always from no-name brands and at similar prices (around 10€, sometimes less), can also be found on both Amazon and other e-commerce sites. I don't know how closely they match the product I've reviewed (aside from the branding), but given that my package flew in almost directly from the factory in China, I'm going out on a limb and guessing that for the most part they're all the same thing.

Ah, you want pictures too? There's a couple on my Twitter.

Por una subraya

Days of work lost because of an underscore

(I'm told guion bajo is the preferred name for the underscore sign _ in Castilian, but that would have made it harder to echo Por una cabeza. Then again, why the Spanish title? Because.)

(Also, this is going to be a very boring post, because it's mostly just a rant to let off some steam after a frustrating debug session.)


I'm getting into the bad habit of not trusting the compiler, especially when it comes to a specific compiler1. I'm not sure if there's a particular reason for that, other than —possibly— a particular dislike for its closed nature, or past unpleasant experiences in trying to make it work with the more recent versions of the host compiler(s).

Compilers have progressed enormously in recent years. I have a strong suspicion that this has been by and large the merit of the (re)surgence of the Clang/LLVM family, and the strong pressure it has put the GCC developers under —with consequent significant improvements on both sides.

However, compilers that need to somehow interact with these compilers (most famously the nvcc compiler developed by NVIDIA for CUDA) have a tendency to lag behind: you can't always use the latest version of GCC (or Clang, for that matter) with them, and they themselves do not provide many of the benefits that developers have come to expect from modern compilers, especially in the quality and detail of error and warning messages, or even in the nature of those same warnings and errors.


This rant is born out of a stressful and frustrating debugging session that lasted for a few days, and that could have easily been avoided with better tools. What made the bug particularly frustrating was that it seemed to trigger or disappear in the most incoherent of circumstances. Adding some conditional code (even code that would never run) or moving code around in what should have been meaning-preserving transformations would be enough to make it appear, or disappear again, until the program was recompiled.

The most frustrating part was that, when the code seemed to work, it would seem to work correctly (or at least give credible results). When it seemed to not work, it would simply produce invalid values from thin air.

The symptoms would be obvious to anyone with some experience in the field: reading from uninitialized memory —even if, for some magic reason, it seemed to work (when it worked) despite the massively parallel nature of the code and the hundreds of thousands of cycles it ran for.

The code in question is something like this:

struct A : B, C, D
{
    float4 relPos;
    float r;
    float mass;
    float f;
/* etc */
    A(params_t const& params, pdata_t const& pdata,
      const int index_, float4 const& relPos_, const float r_)
    :
        B(index_, params),
        C(index_, pdata, params),
        D(r, params),
        relPos(relPos_),
        r(r_),
        mass(relPos.w),
        f(func(r, params))
    {}
};

Can you spot what's wrong with the code?

Spoiler Alert!

Here's the correct version of the code:

struct A : B, C, D
{
    float4 relPos;
    float r;
    float mass;
    float f;
/* etc */
    A(params_t const& params, pdata_t const& pdata,
      const int index_, float4 const& relPos_, const float r_)
    :
        B(index_, params),
        C(index_, pdata, params),
        D(r_, params),
        relPos(relPos_),
        r(r_),
        mass(relPos.w),
        f(func(r, params))
    {}
};

The only difference, in case you're having trouble noticing, is that D is being initialized using r_ instead of r.

What's the difference? Which object we're talking about, and initialization order. r is the member of our structure, r_ is the parameter we're passing to the constructor to initialize it. After the structure initialization is complete, they will hold the same value, but until r gets initialized (with the value of r_), its content is undefined, and using it (instead of r_) leads to undefined behavior. And D gets initialized before r, because it's one of the parent structures of the structure we want to initialize. Note that this would happen even if we put the initialization of r before the initialization of D in the initializer list: initialization actually happens in the order the members (and parents) are declared, not in the order their initialization is expressed.
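To see the ordering in action, here's a minimal standalone C++ sketch (my own illustration, not code from the project in question); the base class is constructed first even though the initializer list mentions the member first:

#include <iostream>

struct Base {
    Base() { std::cout << "base constructed first\n"; }
};

struct Derived : Base {
    int member;
    // listing member before Base() here changes nothing (and most
    // compilers will warn about it): bases are always initialized
    // before members, in declaration order
    Derived() : member((std::cout << "member initialized\n", 0)), Base() {}
};

int main()
{
    Derived d;
    return 0;
}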

That single _ made me waste at least two days of work.

Now, this error is my fault —it's undoubtedly my fault, it's a clear example of PEBKAC. And yet, proper tooling would have caught it for me, and made it easier to debug.


  1. if you want to know, I'm talking about the nvcc compiler, i.e. the compiler that handles the single-source CUDA files for GPU programming. ↩

10 digits

The question

How many digits do you need, in base 10, to represent a given (binary) number?

A premise

The C++ standard defines a trait for numerical datatypes that describes “the number of base-10 digits that can be represented by a given type without change”: std::numeric_limits<T>::digits10.

What this means is that all numbers with at most that many digits in base 10 will be representable in the given type. For example, 8-bit integers can represent all numbers from 0 to 99, but not all numbers from 0 to 999, so their digits10 value will be 2.

For integer types, the value can be obtained by taking the number of bits (binary digits) used by the type, dividing by log2(10) (or multiplying by log10(2), which is the same thing), and taking the integer part of the result.

This works because with n bits you can represent 2^n values, and with d digits you can represent 10^d values, and the condition for digits10 is that d should be such that 10^d ≤ 2^n. By taking the logarithm on both sides we get d ≤ log10(2^n) = n·log10(2), and since d must be an integer, we get the formula d = ⌊n·log10(2)⌋.

(Technically, this is still a bit of a simplification, since the highest number representable with n bits is actually 2^n − 1, and that's still only for unsigned types; for signed ones things get more complicated, but that's beyond our scope here.)
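As a quick sanity check of these values, here's a small C++11 sketch (mine, not from the standard text):

#include <cstdint>
#include <limits>

// 255 covers all 2-digit numbers, 4294967295 all 9-digit ones,
// 18446744073709551615 all 19-digit ones
static_assert(std::numeric_limits<std::uint8_t>::digits10 == 2, "uint8_t");
static_assert(std::numeric_limits<std::uint32_t>::digits10 == 9, "uint32_t");
static_assert(std::numeric_limits<std::uint64_t>::digits10 == 19, "uint64_t");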

The answer

What we want is in some sense the complement of digits10, since we want to ensure that our number of (decimal) digits will be sufficient to represent all numbers of the binary type. Following the same line of reasoning as above, we want d such that 2^n ≤ 10^d, and thus, skipping a few passages, d = ⌈n·log10(2)⌉, at least assuming unsigned integer types.

We're looking for the simplest formula that gives us the given result. With C++, we could actually just use digits10 plus one, but we want something independent, for example because we want to use this with C (or any other language that doesn't have a digits10 equivalent).

The first thing we want to do is avoid the logarithm. We could compute the actual value, or at least a value with sufficient precision, but in fact we'll avoid doing that, and instead remember that 2^10 is pretty close to 10^3, which puts the logarithm in question in the 3/10 ballpark, an approximation that is good enough for the first several powers of 2^10.

With this knowledge, we can approximate d ≈ ⌈3·n/10⌉. In most programming languages integer division with positive operands returns the floor rather than the ceiling, but it can be turned into something that returns the ceiling by adding to the numerator one less than the denominator1. So:

d = ⌊(3·n + 9)/10⌋

is the formula we're looking for. In a language like C, where the size of types is given in bytes, that would become something like

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*3+9)/10)

where we're assuming 8 bits per byte (adapt as needed if you're on an insane architecture).
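As a usage sketch (my own, under the same 8-bit-byte assumption), the macro can size a buffer guaranteed to fit the decimal representation of any value of the given type:

#include <limits.h>
#include <stdio.h>

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*3+9)/10)

int main(void)
{
    /* digits, plus one char for a possible minus sign, plus the NUL */
    char buf[PRINT_SIZE(int) + 2];
    snprintf(buf, sizeof(buf), "%d", INT_MIN);
    puts(buf); /* -2147483648 where int is 32 bits */
    return 0;
}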

Limitations

The C expression provided above isn't universal. It is better than the even more aggressive approximation sizeof(type)*CHAR_BIT/3, which for example fails for 8-bit data types (giving 2 instead of 3) and overestimates the result for 64-bit data types (giving 21 instead of 20), but it still has its limits.

It works for most standard signed data types, because the number of base-10 digits needed to represent them is almost always the same as their unsigned equivalents, but for example it doesn't work for 64-bit data types (the signed ones need one less digit in this case).

Moreover, it actually starts breaking down for very large integers, because the 3/10 approximation carries a relative error of about 0.3%, which starts becoming significant at 256 bits or higher: the formula predicts 77 digits, but 78 are actually needed.

We can extend its range by taking more digits to approximate the logarithm. For example

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*301+999)/1000)

doesn't break down until 4096 bits, at which point it misses one digit again. On the other hand

#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*30103+99999)/100000)

can get us reasonably high (in fact, by a quick check it seems this formula should work correctly even for types with 2^(2^16) = 2^65536 bits, if not more). It also has a nice symmetry to it, even though I guess it would overflow on machines with smaller word sizes (but then again, you probably wouldn't need it there anyway).


  1. If a, b are non-negative integers with b > 0, then ⌊(a + b − 1)/b⌋ = ⌈a/b⌉: (1) if a is a multiple of b, then adding b − 1 doesn't reach the next multiple, and thus both sides equal a/b (which is an integer); (2) if a is not a multiple of b, adding b − 1 overtakes exactly one multiple of b. More formally, we can write a = k·b + c where k, c are non-negative integers and c < b (c = 0 if a is a multiple, and c > 0 otherwise). Define s = sign(c), i.e. s = 0 if c = 0 and s = 1 otherwise. Then ⌊(a + b − 1)/b⌋ = ⌊(k·b + c + b − 1)/b⌋ = (k + 1) + ⌊(c − 1)/b⌋ = k + 1 − (1 − s) = k + s, and ⌈a/b⌉ = ⌈(k·b + c)/b⌉ = k + ⌈c/b⌉ = k + s. ↩

Mixed DPI and the X Window System

I'm writing this article because I'm getting tired of repeating the same concepts every time someone makes misinformed statements about the (lack of) support for mixed-DPI configurations in X11. It is my hope that anybody looking for information on the subject may be directed here, to get the facts about the actual possibilities offered by the protocol, avoiding the biased misinformation available from other sources.

If you only care about “how to do it”, jump straight to The RANDR way, otherwise read along.

So, what are we talking about?

The X Window System

The X Window System (frequently shortened to X11 or even just X), is a system to create and manage graphical user interfaces. It handles both the creation and rendering of graphical elements inside specific subregions of the screen (windows), and the interaction with input devices (such as keyboards and mice).

It's built around a protocol by means of which programs (clients) tell another program (the server, that controls the actual display) what to put on the screen, and conversely by means of which the server can inform the client about all the necessary information concerning both the display and the input devices.

The protocol in question has evolved over time, and reached version 11 in 1987. While the core protocol hasn't introduced any backwards-incompatible changes in the last 30 years (hence the name X11 used to refer to the X Window System), its extensible design has allowed it to keep abreast of technological progress thanks to the introduction and standardization of a number of extensions, that have effectively become part of the subsequent revisions of the protocol (the last one being X11R7.7, released in 2012; the next, X11R7.8, following more a “rolling release” model).

DPI

Bitmapped visual surfaces (monitor displays, printed sheets of paper, images projected on a wall) have a certain resolution density, i.e. a certain number of dots or pixels per unit of length: dots per inch (DPI) or pixels per inch (PPI) is a common way to measure it. The reciprocal of the DPI is usually called “dot pitch”, and refers to the distance between adjacent dots (or pixels). This is usually measured in millimeters, so conversion between DPI and dot pitch is obtained with

DPI   = 25.4/pitch
pitch = 25.4/DPI

(there being 25.4 millimeters to the inch).
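As a quick numerical check, here's a trivial C sketch (mine) computing the density of an output from its resolution and physical size, using the two monitors described later in this article:

#include <stdio.h>

/* density of an output, given its extent in pixels and in millimeters
 * along the same axis */
static double dpi(double pixels, double mm)
{
    return pixels * 25.4 / mm;
}

int main(void)
{
    printf("laptop:   %.0f DPI\n", dpi(3200, 346)); /* about 235 */
    printf("external: %.0f DPI\n", dpi(1440, 408)); /* about 90 */
    return 0;
}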

When it comes to graphics, knowing the DPI of the output is essential to ensure consistent rendering (for example, a drawing program may have a “100% zoom” option where the user might expect a 10cm line to take 10cm on screen), but when it comes to graphical interface elements (text in messages and labels, sizes of buttons and other widgets) the information itself may not be sufficient: usage of the surface should ideally also be taken into consideration.

To this end, the concept of reference pixel was introduced in CSS, representing the pixel of an “ideal” display with a resolution of exactly 96 DPI (dot pitch of around 0.26mm) viewed from a distance of 28 inches (71cm). The reference pixel thus becomes the umpteenth unit of (typographical) length, with exactly 4 reference pixels every 3 typographical points.

Effectively, this allows the definition of a device pixel ratio, as the ratio of device pixels to reference pixels, taking into account the device resolution (DPI) and its assumed distance from the observer (for example, a typical wall-projected image has a much lower DPI than a typical monitor, but is also viewed from much further away, so that the device pixel ratio can be assumed to be the same).

Mixed DPI

A mixed-DPI configuration is a setup where the same display server controls multiple monitors, each with a different DPI.

For example, my current laptop has a built-in 15.6" display (physical dimensions in millimeters: 346×194) with a resolution of 3200×1800 pixels, and a pixel density of about 235 DPI —for all intents and purposes, this is a HiDPI monitor, with slightly higher density than Apple's Retina display brand. I frequently use it together with a 19" external monitor (physical dimensions in millimeters: 408×255) with a resolution of 1440×900 pixels and a pixel density of about 90 DPI —absolutely normal, maybe even somewhat on the lower side.

The massive difference in pixel density between the two monitors can lead to extremely inconsistent appearance of graphical user interfaces that do not take it into consideration: if they render assuming the standard (reference) DPI, elements will appear reasonably sized on the external monitor, but extremely small on the built-in monitor; conversely, if they double the pixel sizing of all interface elements, they will appear properly sized on the built-in monitor, but oversized on the external one.

Proper support for such configurations requires all graphical and textual elements to take up a number of pixels which depends on the monitor they are being drawn on. The question is: is this possible with X11?

And the answer is yes. But let's see how this happens in detail.

A brief history of X11 and its support for multiple monitors

The origins: the X Screen

An interesting aspect of X11 is that it was designed in a period when the quality and characteristics of bitmap displays (monitors) were much less consistent than they are today. The core protocol thus provides a significant amount of information for the monitors it controls: the resolution, the physical size, the allowed color depth(s), the available color palettes, etc.

A single server could make use of multiple monitors (referred to as “X Screen”s), and each of them could have wildly different characteristics (for example: one could be a high-resolution monochrome display, the other could be a lower-resolution color display). Due to the possible inconsistency between monitors, the classical support for multiple monitors in X did not allow windows to be moved from one X Screen to another. (How would the server render a window created to use a certain kind of visual on a different display that didn't support it?)

It should be noted that while the server itself didn't natively support moving windows across X Screens, clients could be aware of the availability of multiple displays, and they could allow (by their own means) the user to “send” a window to a different display (effectively destroying it, and recreating it with matching content, but taking into account the different characteristics of the other display).

A parenthetical: the client, the server and the toolkit

Multiple X Screen support being dependent on the client, rather than the server, is actually a common leitmotiv in X11: due to one of its founding principles (“mechanism, not policy”), a lot of X11 features are limited only by how much the clients are aware of them and can make use of them. So, something may be allowed by the protocol, and yet certain sets of applications will not make use of the functionality.

This is particularly relevant today, when very few applications actually communicate with the X server directly, preferring to rely on an intermediate toolkit library that handles all the nasty little details of communicating with the display server (and possibly even display servers of different nature, not just X11) according to the higher-level “wishes” of the application (“put a window with this size and this content somewhere on the screen”).

The upside of this is that when the toolkit gains support for a certain feature, all applications using it can rely (sometimes automatically) on it. The downside is that if the toolkit removes support for certain features or configurations, suddenly all applications using it stop supporting them too. We'll see some examples of this, specifically about DPI, in this article.

Towards a more modern multi-monitor support: the Xinerama extension

In 1998, an extension to the core X11 protocol was devised to integrate multiple displays seamlessly, making them appear as a single X Screen, and thus allowing windows to freely move between them.

This extension (Xinerama) had some requirements (most importantly, all displays had to support the same visuals), but for the most part they could be heterogeneous.

An important downside of the Xinerama extension is that while it provides information about the resolution (in pixels) and relative position (in pixels!) of the displays, it doesn't reveal any information about the physical characteristics of the displays.

This is an important difference with respect to the classic “separate X Screens” approach: the classic method allowed clients to compute the monitor DPI (as both the resolution and the physical size were provided), but this is not possible in Xinerama.

As a consequence, DPI-aware applications were actually irremediably broken on servers that only supported this extension, unless all the outputs had the same (or similar enough) DPI.

Modern multi-monitor in X11: the XRANDR extension

Xinerama had a number of limitations (the lack of physical information about the monitors being just one of many), and it was essentially superseded by the RANDR (Resize and Rotate) extension when the latter reached version 1.2 in 2007.

Of particular interest for our discussion, the RANDR extension took into consideration both the resolution and the physical size of the display even when it was originally proposed in 2001. And even today, having grown in scope and functionality, it provides all the necessary information for each connected, enabled display.

The RANDR caveat

One of the main aspects of the RANDR extension is that each display is essentially a “viewport” on a virtual framebuffer. This virtual framebuffer is the one reported as “X Screen” via the core protocol, even though it doesn't necessarily match any physical screen (not even when a single physical screen is available!).

This gives great flexibility in how monitors can be combined (including overlaps, cloning, etc); the hidden cost is that all the physical information that the core protocol reports about the virtual backend of its X Screen becomes essentially meaningless.

For this reason, when the RANDR extension is enabled, the core protocol will synthesize fictitious physical dimensions for its X Screen from the overall framebuffer size, assuming a “reference” pixel density of 96 DPI.

When using a single display covering the whole framebuffer, this leads to a discrepancy between the physical information provided by the core protocol, and the one reported by the RANDR extension. Luckily, the solution for this is trivial, as the RANDR extension allows changing the fictitious dimensions of the X Screen to any value (for example, by using commands such as xrandr --dpi eDP-1, to tell the X server to match the core protocol DPI information to that of the eDP-1 output).

Mixed DPI in X11

Ultimately, X11, as a display protocol, has almost always had support for mixed DPI configurations. With the possible exception of the short period between the introduction of Xinerama and the maturity of the RANDR extension, the server has always been able to provide its clients with all the necessary information to adapt their rendering, window by window, widget by widget, based on the physical characteristics of the outputs in use.

Whether or not this information is being used correctly by clients, however, is an entirely different matter.

The core way

If you like the old ways, you can manage your mixed DPI setup the classic way, by using separate X Screens for each monitor.

The only thing to be aware of is that if your server is recent enough (and supports the RANDR extension), then by default the core protocol will report a DPI of 96, as discussed here. This can be worked around by calling xrandr as appropriate during the server initialization.

Of course, whether or not applications will use the provided DPI information, X Screen by X Screen, is again entirely up to the application. For applications that do not query the X server for DPI information (e.g. all applications using GTK+3, due to this regression), the Xft.dpi resource can be set appropriately for each X Screen.

The RANDR way

On a modern X server with RANDR enabled and monitors with (very) different DPIs merged in a single framebuffer, well-written applications and toolkits can leverage the information provided by the RANDR extension to get the DPI information for each output, and use this to change the font and widget rendering depending on window location.

(This will still result in poor rendering when a window spans multiple monitors, but if you can live with a 2-inch bezel in the middle of your window, you can probably survive misrendering due to a poor choice of device pixel ratios.)

The good news is that all applications using the Qt toolkit can do this more or less automatically, provided they use a recent enough version (5.6 at least, 5.9 recommended). Correctly designed applications can request this behavior from the toolkit on their own (QApplication::setAttribute(Qt::AA_EnableHighDpiScaling);), but the interesting thing is that the user can ask for this to be enabled even for legacy applications, by setting the environment variable QT_AUTO_SCREEN_SCALE_FACTOR=1.

(The caveat is that the scaling factor for each monitor is determined from the ratio between the device pixel ratio of the monitor and the device pixel ratio of the primary monitor. So make sure that the DPI reported by the core protocol (which is used as base reference) matches the DPI of your primary monitor —or override the default DPI used by Qt applications by setting the QT_FONT_DPI environment variable appropriately.)
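Putting it together for legacy applications, the environment might be set up along these lines (a sketch; the QT_FONT_DPI value is only needed when the core protocol DPI doesn't match your primary monitor):

# enable per-monitor scaling for legacy Qt (>= 5.6) applications
export QT_AUTO_SCREEN_SCALE_FACTOR=1
# optionally pin the base DPI Qt assumes for the primary monitor
export QT_FONT_DPI=96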

The downside is that outside of Qt, not many applications and toolkits have this level of DPI-awareness, and the other major toolkit (GTK+) seems to have no intention of acquiring it.

A possible workaround

If you're stuck with poorly written toolkits and applications, RANDR still offers a clumsy workaround: you can level out the heterogeneity in DPI across monitors by pushing your lower-DPI displays to a higher virtual resolution than their native one, and then scaling this down. Combined with appropriate settings to change the DPI reported by the core protocol, or the appropriate Screen resources or other settings, this may lead to a more consistent experience.

For example, I could set my external 1440×900 monitor to “scale down” from a virtual 2880×1800 resolution (xrandr --output DP-1 --scale-from 2880x1800), which would bring its virtual DPI more on par with that of my HiDPI laptop monitor. The cost is a somewhat poorer image overall, due to the combined up/downscaling, but it's a workable workaround for poorly written applications.

(If you think this idea is a bit stupid, shed a tear for the future of the display servers: this same mechanism is essentially how Wayland compositors —Wayland being the purported future replacement for X— cope with mixed-DPI setups.)

Final words

Just remember, if you have a mixed DPI setup and it's not properly supported in X, this is not an X11 limitation: it's the toolkit's (or the application's) fault. Check what the server knows about your setup and ask yourself why your programs don't make use of that information.

If you're a developer, follow Qt's example and patch your toolkit or application of choice to properly support mixed DPI via RANDR. If you're a user, ask for this to be implemented, or consider switching to better applications with proper mixed DPI support.

The capability is there, let's make proper use of it.

A small update

There's a proof of concept patchset that introduces mixed-DPI support for GTK+ under X11. It doesn't implement all of the ideas I mentioned above (in particular, there's no Xft.dpi support to override the DPI reported by core), but it works reasonably well in pure GTK+ applications (more so than in applications that have their own toolkit abstraction layer, such as Firefox, Chromium, LibreOffice).

Cross-make selection

I was recently presented, as a mere onlooker, with the potential differences that exist in the syntax of a Makefile for anything non-trivial, when using different implementations of make.

(For the uninitiated, a Makefile is essentially a list of recipes that are automatically followed to build some targets from given dependencies, and are usually used to describe how to compile a program. Different implementations of make, the program that reads the Makefiles and runs the recipes, exist; and the issue is that for anything beyond the simplest of declarations and recipe structure, the syntax they support is different, and incompatible.)

Used as I was to using GNU make and its extensive set of functions and conditionals and predefined macros and rules, I rarely bothered looking into alternatives, except maybe for completely different build systems or meta-build-systems (the infamous GNU autotools, cmake, etc). However, being presented with the fact that even simple text transformations could not be done in the same way across the two major implementations of make (GNU and BSD) piqued my curiosity, and I set off to convert the rather simple (but still GNU-dependent) Makefile of my clinfo project to make it work at least in both GNU and BSD make.

Get the code for clinfo: gitweb / git / GitHub

Since clinfo is a rather simple program, its Makefile is very simple too:

  1. it defines the path under which the main source file can be found;
  2. it defines a list of header files, on which the main source file depends;
  3. it detects the operating system used for the compilation;
  4. it selects libraries to be passed to the linker to produce the final executable (LDLIBS), based on the operating system.

The last two points are necessary because:

  • under Linux, but not under any other operating system, the dl library is needed too;
  • under Darwin, linking to OpenCL is done using -framework OpenCL, whereas under any other operating system, this is achieved with a simpler -lOpenCL (provided the library is found in the path).

In all this, the GNU-specific things used in the Makefile were:

  1. the use of the wildcard function to find the header files;
  2. the use of the shell function to find the operating system;
  3. the use of the ifeq/else/endif conditionals to decide which flags to add to the LDLIBS.

Avoiding wildcard

In my case, the first GNUism is easily avoided by enumerating the header files explicitly: this has the downside that if a new header file is ever added to the project, I will have to remember to add it myself.

(An alternative approach would be to use some form of automatic dependency list generation, such as the -MM flag supported by most current compilers; however, this was deemed overkill for my case.)

(A third option, assuming a recent enough GNU make, is presented below.)

Avoiding shell

BSD make supports something similar to GNU make's shell function by means of the special != assignment operator. The good news is that GNU make has added support for the same assignment operator since version 4 (introduced in late 2013). This offers an alternative solution for wildcard as well: assigning the output of ls to a variable, using !=.

If you want to support versions of GNU make older than 4, though, you're out of luck: there is no trivial way to assign the output of a shell invocation to a Makefile variable that works on both GNU and BSD make (let alone when strict POSIX compliance is required).

If (and only if) the assignments can be done ‘before’ any other assignment is done, it is however possible to put them into a GNUmakefile (using GNU's syntax) and makefile (using BSD's syntax), and then have both of these include the shared part of the code. This works because GNU make will look for GNUmakefile first.

In my case, the only call to shell I had was a $(shell uname -s) to get the name of the operating system. The interesting thing in this case is that BSD make actually defines its own OS variable holding just what I was looking for.

My solution was therefore to add a GNUmakefile which defined OS using the shell invocation, and then include the same Makefile which is parsed directly by BSD make.
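The resulting layout is something like this (a sketch of the approach, not the verbatim clinfo files):

# GNUmakefile: read by GNU make in preference to Makefile
OS := $(shell uname -s)
include Makefile

The Makefile itself holds everything else: it is parsed directly by BSD make (which defines OS on its own) and included by GNU make via the GNUmakefile above.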

Conditional content for variables

Now comes the interesting part: we want the content of a variable (LDLIBS in our case) to be set based on the content of another variable (OS in our case).

There are actually two things that we want to do:

  1. (the simple one) add something to the content of LDLIBS only if OS has a specific value;
  2. (the difficult one) add something to the content of LDLIBS only if OS does not have a specific value.

Both of these would be rather trivial if we had conditional statements, but while both BSD and GNU make do have them, their syntax is completely incompatible. We therefore have to resort to a different approach, one that leverages features present in both implementations.

In this case, we're going to use the fact that when using a variable, you can use another variable to decide the name of the variable to use: whenever make comes across the syntax $(foo) (or ${foo}), it replaces it with the content of the foo variable. The interesting thing is that this holds even within another set of $() or ${}, so that if foo = bar and bar = quuz, then $(${foo}) expands to $(bar) and thus ultimately to quuz.

Add something to a variable only when another variable has a specific value

This possibility actually allows us to solve the ‘simple’ conditional problem, with something like:

LDLIBS_Darwin = -framework OpenCL
LDLIBS_Linux  = -ldl
LDLIBS += ${LDLIBS_$(OS)}

Now, if OS = Darwin, LDLIBS will get extended by appending the value of LDLIBS_Darwin; if OS = Linux, LDLIBS gets extended by appending the value of LDLIBS_Linux, and otherwise it gets extended by appending the value of LDLIBS_, which is not defined, and thus empty.

This allows us to achieve exactly what we want: add specific values to a variable only when another variable has a specific value.

Add something to a variable only when another variable does not have a specific value

The ‘variable content as part of the variable name’ trick cannot be employed as-is for the complementary action, which is adding something only when the content of the control variable is not some specific value (in our case, adding -lOpenCL when OS is not Darwin).

We could actually use the same trick if the Makefile syntax allowed something like a -= operator to ‘remove’ things from the content of a variable (interestingly, the vim scripting and configuration language does have such an operator). Since the operator is missing, though, we'll have to work around it, and to achieve this we will use the possibility (shared by both GNU and BSD make) to manipulate the content of variables during expansion.

Variable content manipulation is another field where the syntax accepted by the various implementations differs wildly, but there is a small subset which is actually supported by most of them (even beyond GNU and BSD): the suffix substitution operator.

The idea is that often you want to do something like enumerate all your source files in a variable sources = file1.c file2.c file3.c etc and then you want to have a variable with all the object files that need to be linked, that just happen to be the same, with the .c suffix replaced by .o: in both GNU and BSD make (and not just them), this can be achieved by doing objs = $(sources:.c=.o). The best part of this is that the strings to be replaced, and the replacement, can be taken from the expansion of a variable!

We can then combine all this knowledge into our ‘hack’: always include the value we want to selectively exclude, and then remove it by ‘suffix’ substitution, where the suffix to be replaced is defined by a variable-expanded variable name: a horrible, yet effective, hack:

LDLIBS = -lOpenCL
LDLIBS_not_Darwin = -lOpenCL
LDLIBS := ${LDLIBS:$(LDLIBS_not_${OS})=}

This works because when OS = Darwin, the substitution argument will be $(LDLIBS_not_Darwin) which in turn expands to -lOpenCL, so that in the end the value assigned to LDLIBS will be ${LDLIBS:-lOpenCL=}, which is LDLIBS with -lOpenCL replaced by the empty string. For all other values of OS, we'll have ${LDLIBS:=}, which just happens to be the same as ${LDLIBS}, and thus LDLIBS will not be changed1.

Cross-make selection

We can then combine both previous ideas:

LDLIBS = -lOpenCL

LDLIBS_Darwin = -framework OpenCL
LDLIBS_not_Darwin = -lOpenCL
LDLIBS_Linux  = -ldl

LDLIBS += ${LDLIBS_$(OS)}
LDLIBS := ${LDLIBS:$(LDLIBS_not_${OS})=}

And there we go: LDLIBS will be -framework OpenCL on Darwin, -lOpenCL -ldl on Linux, and -lOpenCL on any other platform, regardless of whether GNU or BSD make is being used.

Despite the somewhat hackish nature of this approach (especially for the ‘exclusion’ case), I actually like it, for two reasons.

The first is, obviously, portability. Not requiring a specific incarnation of make is at the very least an act of courtesy. Being able to do without writing two separate, mostly duplicate, Makefiles is even better.

But there's another reason why I like the approach: even though the variable-in-variable syntax isn't exactly the most pleasurable to read, the intermediate variable names end up having a nice, self-explanatory name that gives a nice logical structure to the whole thing.

That being said, working around this kind of portability issue can make a developer better appreciate the need for more portable build systems, despite the heavier onus in terms of dependencies. Of course, for smaller projects, deploying something as massive as autotools or cmake would still be ridiculous overkill: so to anyone who prefers leaner (if more fragile) options, I offer this set of solutions, in the hope that they'll help stimulate convergence.


  1. technically, we will replace the unexpanded value of LDLIBS with its expanded value; the implications of this are subtle, and a bit out of scope for this article. As long as this is kept as the 'last' change to LDLIBS, everything should be fine. ↩

Poi poi, mai mai

Note: this article makes use of MathML, the standard XML markup for math formulas. Sadly, this is not properly supported on some allegedly ‘modern’ and ‘feature-rich’ browsers. If the formulas don't make sense in your browser, consider reporting the issue to the respective developers and/or switching to a standard-compliant browser.

For the last three decades or so, non-integer numbers have been represented on computers following (predominantly) the floating-point standard known as IEEE-754.

The basic idea is that each number can be written in what is also known as engineering or scientific notation, such as 2.34567×10^89, where the 2.34567 part is known as the mantissa or significand, 10 is the base and 89 is the exponent. Of course, on computers 2 is more typically used as base, and the mantissa and exponent are written in binary.

Following the IEEE-754 standard, a floating-point number is encoded using the most significant bit as sign (with 0 indicating a positive number and 1 indicating a negative number), followed by some bits encoding the exponent (in biased representation), and the rest of the bits to encode the fractional part of the mantissa (the leading digit of the mantissa is assumed to be 1, except for denormals in which case it's assumed 0, and is thus always implicit).

The biased representation for the exponent is used for a number of reasons, but the one I care about here is that it allows “special cases”. Specifically, an encoded value of 0 is used to indicate the number 0 (when the mantissa is also set to 0) and denormals (which I will not discuss here). An exponent with all bits set to 1, on the other hand, is used to represent infinity (when the mantissa is set to 0) and the special values called “Not-a-Number” (or NaN for short) otherwise.

The ability of the IEEE-754 standard to describe such special values (infinities and NaN) is one of its most powerful features, although often not appreciated by programmers. Infinity is extremely useful to properly handle functions with special values (such as the trigonometric tangent, or even division of a non-zero value by zero), whereas NaNs are useful to indicate that somewhere an invalid operation was attempted (such as dividing zero by zero, or taking the square root of a negative number).
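Both special values are easy to produce; here's a minimal C sketch (assuming IEEE-754 semantics, as per Annex F of the C standard):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double zero = 0.0;
    double inf = 1.0/zero;        /* non-zero divided by zero: infinity */
    double not_a_number = zero/zero;      /* invalid operation: NaN */
    printf("%g %g\n", inf, not_a_number);              /* inf nan */
    printf("%d %d\n", isinf(inf), isnan(not_a_number)); /* 1 1 */
    return 0;
}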


Consider now the proverb “later means never”. The Italian proverb with the same meaning (that is, procrastination is often an excuse to not do things ever) is slightly different, and it takes a variety of forms («il poi è parente del mai», «poi è parente di mai», «poi poi è parente di mai mai») which basically translate to “later is a relative of never”.

What is interesting is that if we were to define “later” and “never” as “moments in time”, and assign numerical values to it, we could associate “later” with infinity (we are procrastinating, after all), while “never”, which cannot actually be a “moment in time” (it is never, after all) would be … not a number.

(Actually, it's also possible to consider “later” as being indefinite in time, and thus not a (specific) number, and “never” having an infinite value. Or to have both later and never be not numbers. But that's fine, it still works!)

So as it happens, both later and never can be represented in the IEEE-754 floating-point standard, and they share the special exponent that marks non-finite numbers.

Later, it would seem, is indeed a relative of never.

Warp shuffles, or why OpenCL should expose low-level interfaces

Since OpenCL 2.0, the OpenCL C device programming language includes a set of work-group parallel reduction and scan built-in functions. These functions allow developers to execute local reductions and scans for the most common operations (addition, minimum and maximum), and allow vendors to implement them very efficiently using hardware intrinsics that are not normally exposed in OpenCL C.

In this article I aim to challenge this choice of exposing such high-level functions but not the lower-level intrinsics on which their efficient implementation might rely: I will argue that it results in lower flexibility and less efficient OpenCL programs, and is ultimately detrimental to the quality of the standard itself.

While the arguments I will propose will be focused specifically on the parallel reduction and scans offered by OpenCL C since OpenCL 2.0, the fundamental idea applies in a much more general context: it is more important for a language or library to provide the building blocks on which to build certain high-level features than to expose the high-level features themselves (hiding the underlying building blocks).

For example, the same kind of argument would apply to a language or library that aimed at providing support for Interval Analysis (IA). A fundamental computational aspect which is required for proper IA support is directed rounding: just exposing directed rounding would be enough to allow efficient (custom) implementations of IA, and also allow other numerical feats (as discussed here); conversely, while it's possible to provide support for IA without exposing the underlying required directed rounding features, doing so results in an inefficient, inflexible standard1.

The case against high-level reduction operations

To clarify, I'm not actually against the presence of high-level reduction and scan functions in OpenCL. They are definitely a very practical and useful set of functions, with the potential of very efficient implementations by vendors —in fact, more efficient than any programmer may achieve, not just because they can be tuned (by the vendor) for the specific hardware, but also because they can in fact be implemented making use of hardware capabilities that are not exposed in the standard nor via extensions.

The problem is that the set of available functions is very limited (and must be so), and as soon as a developer needs a reduction or scan function that is even slightly different from the ones offered by the language, it suddenly becomes impossible for such a reduction or scan to be implemented with the same efficiency of the built-in ones, simply because the underlying hardware capabilities (necessary for the optimal implementation) are not available to the developer.

Thrust and Kahan summation

Interestingly enough, I've hit a similar issue while working on a different code base, one that makes use of CUDA rather than OpenCL, and for which we rely on the thrust library for the most common reduction operations.

The thrust library is a C++ template library that provides efficient CUDA implementations of a variety of common parallel programming paradigms, and is flexible enough to allow such paradigms to make use of user-defined operators, allowing for example reductions and scans with operators other than summation, minimum and maximum. Despite this flexibility, however, even the thrust library cannot move (easily) beyond stateless reduction operators, so that, for example, one cannot trivially implement a parallel reduction with Kahan summation using only the high-level features offered by thrust.

Of course, this is not a problem per se, since ultimately thrust just compiles to plain CUDA code, and it is possible to write such code by hand, thus achieving a Kahan summation parallel reduction, as efficiently as the developer's prowess allows. (And since CUDA exposes most if not all hardware intrinsics, such a hand-made implementation can in fact be as efficient as possible on any given CUDA-capable hardware.)
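As a sketch of what such hand-written code might look like (my own illustration, not code from thrust or from the code base mentioned above), here is a warp-level Kahan-compensated sum built on CUDA's __shfl_down_sync intrinsic:

// illustrative only: a production version would also need to handle
// partial warps and the inter-warp stage of the reduction
__device__ inline float warp_kahan_sum(float sum)
{
    float c = 0.0f; // running compensation for lost low-order bits
    for (int offset = warpSize/2; offset > 0; offset /= 2) {
        float other_sum = __shfl_down_sync(0xffffffffu, sum, offset);
        float other_c   = __shfl_down_sync(0xffffffffu, c,   offset);
        // Kahan step: fold in the other lane's compensated partial sum
        float y = (other_sum - other_c) - c;
        float t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    return sum; // meaningful in lane 0
}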

Local parallel reductions in OpenCL 2.0

The situation in OpenCL is sadly much worse, and not so much due to the lack of a high-level library such as thrust (to which end one may consider the Bolt library instead), but because the language itself is missing the fundamental building blocks to produce the most efficient reductions: and while it does offer built-ins for the most common operations, anything beyond that must be implemented by hand, and cannot be implemented as efficiently as the hardware allows.

One could be led to think that (at least for something like my specific use case) it would be “sufficient” to provide more built-ins for a wider range of reduction operations, but such an approach would be completely missing the point: there will always be variations of reductions that are not provided by the language, and such a variation will always be inefficient.

Implementor laziness

There is also another point to consider, and it has to do with the sad state of the OpenCL ecosystem. Developers that want to use OpenCL for their software, be it in academia, gaming, medicine or any industry, must face the reality of the quality of existing OpenCL implementations. And while for custom solutions one can focus on a specific vendor, and in fact choose the one with the best implementations, software vendors have to deal with the idiosyncrasies of all OpenCL implementations, and the best they can expect is for their customers to be up to date with the latest drivers.

What this implies in this context is that developers cannot, in fact, rely on high-level functions being implemented efficiently, nor can they sit idle waiting for the vendors to provide more efficient implementations: more often than not, developers will find themselves working around the limitations of this and that implementation, rewriting code that should be reduced to one liners in order to provide custom, faster implementations.

This is already the case for some functions such as the asynchronous work-group memory copies (from/to global/local memory), which are dramatically inefficient on some vendor implementations, so that developers are more likely to write their own loading functions instead, which generally end up being just as efficient as the built-ins on the platforms where such built-ins are properly implemented, and much faster on the lazy platforms.

Therefore, can we actually expect vendors to really implement the work-group reduction and scan operations as efficiently as their hardware allows? I doubt it. However, while for the memory copies an efficient workaround was offered by simple loads, such a workaround is impossible in OpenCL 2.0, since the building blocks of the efficient work-group reductions are missing.

Warp shuffles: the work-group reduction building block

Before version 2.0 of the standard, OpenCL offered only one way for work-items within a work-group to exchange information: local memory. The feature reflected the capability of GPUs when the standard was first proposed, and could be trivially emulated on other hardware by making use of global memory (generally resulting in a performance hit).

With version 2.0, OpenCL exposes a new set of functions that allow data exchange between work-items in a work-group, which doesn't (necessarily) depend on local memory: such functions are the work-group vote functions, and the work-group reduction and scan functions. These functions can be implemented via local memory, but most modern hardware can implement them using lower-level intrinsics that do not depend on local memory at all, or only depend on local memory in smaller amounts than would be needed by a hand-coded implementation.

On GPUs, work-groups are executed in what are called warps or wave-fronts, and most modern GPUs can in fact exchange data between work-items in the same warp using specific shuffle intrinsics (which have nothing to do with the OpenCL C shuffle function): these intrinsics allow work-items to access the private registers of other work-items in the same warp. While warps in the same work-group still have to communicate using local memory, a simple reduction algorithm can thus be implemented using warp shuffle instructions and only requiring one word of local memory per warp, rather than one per work-item, which can lead to better hardware utilization (e.g. by allowing more work-groups per compute unit thanks to the reduced use of local memory).
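For concreteness, here is roughly what such a reduction looks like, sketched (by me) in CUDA terms, since that's where the intrinsics are actually exposed; it assumes a one-dimensional work-group made of whole warps, and uses a single word of local (shared) memory per warp:

__device__ inline float warp_sum(float v)
{
    for (int offset = warpSize/2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

__device__ float block_sum(float v)
{
    __shared__ float partial[32];     // one word per warp, not per work-item
    int lane = threadIdx.x % warpSize;
    int warp = threadIdx.x / warpSize;

    v = warp_sum(v);                  // reduce within each warp
    if (lane == 0) partial[warp] = v; // one value per warp goes to local memory
    __syncthreads();

    // the first warp reduces the per-warp partial sums
    int nwarps = blockDim.x / warpSize;
    v = (threadIdx.x < nwarps) ? partial[lane] : 0.0f;
    if (warp == 0) v = warp_sum(v);
    return v;                         // meaningful in thread 0
}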

Warp shuffle instructions are available on NVIDIA GPUs with compute capability 3.0 or higher, as well as on AMD GPUs since Graphics Core Next. Additionally, vectorizing CPU platforms such as Intel's can trivially implement them in the form of vector component swizzling. Finally, all other hardware can still emulate them via local memory (which in turn might be inefficiently emulated via global memory, but still): and as inefficient as such an emulation might be, it would scarcely be worse than hand-coded use of local memory (which would still be a fall-back option available to developers).

In practice, this means that all OpenCL hardware can implement work-group shuffle instructions (some more efficiently than others), and parallel reductions of any kind could be implemented through work-group shuffles, achieving much better performance than standard local-memory reductions on hardware supporting work-group shuffles in hardware, while not being less efficient than local-memory reductions where shuffles would be emulated.

Conclusions

Finally, it should be obvious now that the choice of exposing work-group reduction and scan functions, but not work-group shuffle functions in OpenCL 2.0 results in a crippled standard:

  • it does not represent the actual capabilities of current massively parallel computational hardware, let alone the hardware we may expect in the future;
  • it effectively prevents efficient implementation of reductions and scans beyond the elementary ones (simple summation, minimum and maximum);
  • to top it all, we can scarcely expect such high-level functions to be implemented efficiently, making them effectively useless.

The obvious solution would be to provide work-group shuffle instructions at the language level. This could in fact be a core feature, since it can be supported on all hardware, just like local memory, and the device could be queried to determine whether the instructions are supported in hardware or emulated (pretty much like devices can be queried to determine whether local memory is physical or emulated).

Optionally, it would be nice to have some introspection to allow the developer to programmatically find the warp size (i.e. work-item concurrency granularity) used for the kernel2, and potentially improve on the use of the instructions by limiting the strides used in the shuffles.


  1. since IA intrinsically depends on directed rounding, even if support for IA was provided without explicitly exposing directed rounding, it would in fact still be possible to emulate directed rounding of scalar operations by operating on interval types and then discarding the unneeded parts of the computation; of course, this would be dramatically inefficient. ↩

  2. in practice, the existing CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE kernel property that can be programmatically queried corresponds already to the warp/wave-front size on GPUs, so there might be no need for another property if it could be guaranteed that this is the work-item dispatch granularity. ↩

Colorize your man

The terminal, as powerful as it might be, has a not undeserved fame of being boring. Boring white (or some other, fixed, color) on boring black (or some other, fixed, color) for everything. Yet displays nowadays are capable of showing millions of colors, and have been able to display at least four since the eighties. There's a resurgence of “colorization” options for the terminal, from the shell prompt to the multiplexers (screen, tmux), from the output of commands such as ls to the syntax highlighting options of editors and pagers. A lot of modern programs will even try to use colors in their output right from the start, making it easier to tell apart semantically different parts of it.

One of the last strongholds of the boring white-on-black (or conversely) terminal displays is man, the manual page reader. Man pages constitute the backbone of technical documentation in Unix-like systems, and range from the description of the syntax and behaviour of command-line programs to the details of system calls and programming interfaces of libraries, passing through a description of the syntax of configuration files, and whatever else one might feel like documenting for ease of access.

The problem is, man pages are boring. They all share the same structure, with sections that follow a common convention both in naming and in sequence (NAME, SYNOPSIS, DESCRIPTION, SEE ALSO, etc.), and they are all boringly black and white, with a sprinkle of bold and italics/underline.

It has to be said that bold and underline don't really “cut it” on the console: undoubtedly, things would stand out more if colors were used to highlight the relevant parts of the manual pages (headings, examples, code, etc.) rather than simply bold and underline or italics.

Thanks to the power and flexibility of the pagers used to actually visualize man pages, a number of people have come up with simple tricks that colorize pages by just reinterpreting bold/italics/underline commands as colors. In fact, there's a pager (most) that does this by default. Of course, most is otherwise inferior in many ways to the more common less pager, so there are solutions to do the same trick (color replacement) with less. Both solutions, as well as a number of other tricks based on the same principle, are pretty well documented in a number of places, and can be found summarized on the Arch wiki page on man pages.

I must say I'm not too big a fan of this approach: while it has the huge advantage of being very generic (in fact, maybe a little too generic), it has a hackish feeling, which led me to look for a cleaner, lower-level approach: making man itself (or rather the groff typesetter it uses) colorize its output.

Colorizing man pages with *roff

The approach I'm going to present will only work if man uses (a recent enough version of) groff that actually supports colors. Also, the approach is limited to specific markup. It can be extended, but doing so robustly is non-trivial.

We will essentially do three things:

  • tell groff to look for (additional) macros in specific directories;
  • override some typical man page markup to include colors;
  • tell groff (or rather grotty, the terminal post-processor of groff) to enable support for SGR escapes.

Extending the groff search path

By default, groff will look for macro packages in a lot of places, among which the user's home directory. Since cluttering home isn't nice, we will create a ~/groff directory and put our overrides in there, but we also need to tell groff to look there, which is done by setting the GROFF_TMAC_PATH environment variable. So I have the following lines in my ~/.profile:

GROFF_TMAC_PATH="$HOME/groff"
export GROFF_TMAC_PATH

(Remember to source ~/.profile if you want to test the benefits of your override in your live sessions.)

Overriding the man page markup

The groff macro package used to typeset man pages loads an optional man.local file that can be used to override definitions. For example, in Debian this is used to do some character substitutions based on whether UTF-8 is enabled or not, and it's found under /etc/groff. We will write our own man.local, and place it under ~/groff instead, to override the markup we want to colorize.

Sadly, most of the markup in man pages is presentational rather than semantic: things are explicitly typeset in bold/italic/regular, rather than as parameter/option/code/whatever. There are a few exceptions, most notably the .SH command to typeset section headers. So in this example we will only override .SH to set section headers to green, leaving the rest of the man pages as-is.

Instead of re-defining .SH from scratch, we will simply expand it by adding stuff around the original definition. This can be achieved with the following lines (put them in your ~/groff/man.local):

.rn SH SHorg
.de SH
. gcolor green
. SHorg \\$*
. gcolor
..

The code above renames .SH to .SHorg, and then defines a new .SH command that:

  1. sets the color to green;
  2. calls .SHorg (i.e. the original .SH), passing all the arguments over to it;
  3. resets the color to whatever it was before.

The exact same approach can be used to colorize the second-level section header macro, .SS; just repeat the same code with a general replacement of H to S, and tune the color to your liking.
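
For reference, this is what the resulting .SS override would look like (the color here is just an example, to be tuned as mentioned):

.rn SS SSorg
.de SS
. gcolor green
. SSorg \\$*
. gcolor
..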

Another piece of semantic markup that is rather easy to override, even though it's only rarely used in actual man pages (possibly because it's a GNU extension), is the .UR/.UE pair of commands to typeset URLs, and its counterpart .MT/.ME pair of commands to typeset email addresses. Both work by storing the URL or address in the string m1, so all we need to do is override its definition before it's actually used, which happens in the second element of each pair; for example, if we want to typeset both URLs and email addresses in cyan, we would use:

.rn ME MEorg
.de ME
. ds m1 \\m[cyan]\\*(m1\\m[]\"
. MEorg \\$*
..

.rn UE UEorg
.de UE
. ds m1 \\m[cyan]\\*(m1\\m[]\"
. UEorg \\$*
..

(Keep in mind that I'm not a groff expert, so there might be better ways to achieve these overrides.)

Enabling SGR escapes

The more recent versions of grotty (the groff post-processor for terminal output) use ANSI (SGR) escape codes for formatting, supporting colors as well as bold, italics and underline. On some distributions (Debian, for example), this is disabled by default, and must be enabled with some not-well-documented method (e.g. exporting specific environment variables).

Since we already have various overrides in our ~/groff/man.local, we can restore the default behavior (enabling SGR escapes by default, unless the environment variable GROFF_NO_SGR is set) with the lines:

.if '\V[GROFF_NO_SGR]'' \
.   output x X tty: sgr 1

Of course, you should also make sure the pager used by man supports SGR escape sequences, for example by using less as your pager (which is likely to be the default already, if available) and telling it to interpret SGR sequences (e.g. by setting the LESS environment variable to include the -R option).
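
For example, again in ~/.profile (this appends -R to whatever options LESS already holds):

LESS="$LESS -R"
export LESS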

Limitations and future work

That's it. Now section headers, emails and URLs will come out typeset in color, provided man pages are written using semantic markup.

It is also possible to override the non-semantic markup that is used everywhere else, such as all the macros that combine or alternate B, I and R to mark options, parameters, arguments and types. This would definitely make pages more colorful, but whether they would actually come out decently remains to be seen.

A much harder thing to achieve is the override of commands that explicitly set the font (e.g. *roff escape sequences such as \fB, often used inline in the code). But at this point the question becomes: is it worth the effort?

Wouldn't it be better to start working on cleaning up and extending the man macro package for groff to include (and use!) more semantic markup, with built-in colorization support?

Bonus track: italics

If your terminal emulator truly supports italics (honestly, a lot of modern terminals do, except possibly the non-graphical consoles), you can configure grotty to output instructions for italics instead of its usual behavior of replacing italics with underline. This is achieved by passing the -i option to grotty. Since grotty is rarely (if ever) called directly, one would usually pass the -P-i option to groff.

This can be achieved in man by editing your ~/.manpath file and adding the following two lines:

DEFINE  troff   groff -mandoc -P-i
DEFINE  nroff   groff -mandoc -P-i

And voilà, italicized rather than underlined italics.

R.I.P. Opera

Opera is dead. I decided to give it some time, see how things developed (my first article on the topic was from over two years ago, and the more recent one about the switch to Blink was from February last year), but it's quite obvious that the Opera browser some of us knew and loved is dead for good.

For me, the finishing blow was a comment from an Opera employee, in response to my complaints about the regressions in standard support when Opera switched from Presto to Blink as rendering engine:

You may have lost a handful of things, but on the other hand you have gained a lot of other things that are not in Presto. The things you have gained are likely more useful in more situations as well.

This was shortly afterwards followed by:

Whether you want to use Opera or not is entirely up to you. I merely pointed out that for the lost standards support, you are gaining a lot of other things (and those things are likely to be more useful in most cases).

Other things? What other things?

But it gets even better. I'm obviously not the only one complaining about the direction the new Opera has taken. One Patata Johnson comments:

There used to be a time when Opera fought for open standards and against Microsoft's monopol with it's IE. Am I the only one who us concerned about their new path? Today Google / Chrome became the new IE, using own standards and not carrying about open web standards that much.

The reply?

Opera is in a much better position to promote open standards with Blink than with Presto. It's kind of hard to influence the world when the engine is basically being ignored.

Really? How does being another skin over Blink help promote open standards? It helps promote Blink experimental features, lack of standard compliance, and buggy implementation of the standards it does support. That does as much to promote open standards as the Trident skins did during the 90s browser wars.

As small as Opera's market share was before the switch, its rendering engine was independent and precisely because of that it could be used to push the others into actually fixing their bugs and supporting the standards. It might have been ignored by the run-of-the-mill web developers, but it was actually useful in promoting standard compliance by being a benchmark against which other rendering engines were compared. The first thing that gets asked when someone reports a rendering issue is: how does it behave in the other rendering engines? If there are no other rendering engines, bugs in the dominant one become the de facto standard, against the open standard of the specification.

With the switch to Blink, Opera has even lost that role. As minor a voice as it might have been, it has now gone completely silent.

And let's be serious: the rendering engine it uses might not be ignored now (it's not their own, anyway), but I doubt that Opera has actually gained anything in terms of user base, and thus weight. If anything, I'm seeing quite a few former supporters switching away. Honestly, I suspect Opera's survival is much more in danger now than it was before the switch.

The truth is, the new Opera stands for nothing that the old Opera stood for: the old Opera stood for open standards, compliance, and a feature-rich, highly-customizable Internet suite. The new one is anything but that.

At the very least, for people who miss the other qualities that made Opera worthwhile (among which the complete, highly customizable user interface, and the quite complete Internet suite capabilities, including mail, news, RSS, IRC and BitTorrent support) there's now the open-source Otter browser coming along. It's still WebKit-based, so it won't really help break the development of a web monoculture, but it will at least offer a more reliable fallback to those loving the old Opera and looking for an alternative to switch to from the new one.

For my part, I will keep using the latest available Presto version of Opera for as long as possible. In the meantime, Firefox has shown to have the most complete support for current open standards, so it's likely to become my next browser of choice. I will miss Opera's UI, but maybe Otter will also support Gecko as a rendering engine, and I might be able to get the best of both worlds.

We'll see.

Amethyst: essential statistics in Ruby

Get the code for Amethyst, a Ruby library/script for essential statistics, on GitHub: amethyst

Amethyst is a small Ruby library/script to extract some essential statistics (mean, median, mode, midpoint and range, quartiles) from series of (numerical) data.

While it can be used as a library from other Ruby programs, possibly its most interesting use is as a command line filter: it can read the data series to be analyzed from its standard input (one datum per line), and it produces the relevant statistics on its standard output. Typical usage would be something like:

$ produce_some_data | amethyst

For example, statistics on the number of lines of the source files for one of my ongoing creative works at the time of writing:

$ wc -l oppure/ganka/ganka[0-9]* | head -n -1 | amethyst
# count: 43
# min: 48
# max: 274
# mid: 161
# range: 226

# mean: 102.3953488372093
# stddev: 42.266122304343874

# mode(s): 48 51 59 75 79 86 93 102 104

# median: 97
# quartiles: 79 97 110
# IQR: 31

When acting as a filter, Amethyst will check if its standard output has been redirected/piped to another program, in which case, by default, it will also produce commands that can be fed to gnuplot to produce a visual representation of the distribution of the dataset, including a histogram and a box plot:

$ produce_some_data | amethyst | gnuplot -p

Command line options such as --[no-]histogram and --[no-]boxplot can be used to override the default choices on what to plot (if anything), and options such as --dumb can be used to let gnuplot output a textual approximation of the plot(s) on the terminal itself.

Integer math and computers

One would assume that doing integer math with computers would be easy. After all, integer math is, in some sense, the “simplest” form of math: as Kronecker said:

Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk

God made the integers, everything else is the work of man

While in practice this is (almost) always the case, introducing the extent (and particularly the limitations) to which integer math is easy (or, in fact, ‘doable’) is the first necessary step to understanding, later in this series of articles, some of the limitations of fixed-point math.

We start from the basics: since we are assuming a binary computer, we know that n bits can represent 2^n distinct values. So an 8-bit byte can represent 256 distinct values, a 16-bit word can represent 65536 distinct values, a 32-bit word can represent 4,294,967,296, and a 64-bit word a whopping 18,446,744,073,709,551,616, over 18 quintillion. Of course the question now is: which ones?

Representation of unsigned integers

Let's consider a standard 8-bit byte. The most obvious and natural interpretation of a byte (i.e. 8 consecutive bits) is to read it as a (non-negative, or unsigned) integer, just like we would interpret a sequence of consecutive (decimal) digits. So binary 00000000 would be 0, binary 00000001 would be (decimal) 1, binary 00000010 would be (decimal) 2, binary 00000011 would be (decimal) 3, and so on, up to (binary) 11111111 which would be (decimal) 255. From 0 to 255 inclusive, that's exactly the 256 values that can be represented by a byte (read as an unsigned integer).

Unsigned integers can be trivially promoted to wider words (e.g. from 8-bit byte to 16-bit word, preserving the numerical value) by padding with zeroes.

This is so simple that it's practically boring. Why are we even going through this? Because things are not that simple once you move beyond unsigned integers. But before we do that, I would like to point out that things aren't that simple even if we're just sticking to non-negative integers. In terms of representation of the numbers, we're pretty cozy: n bits can represent all non-negative integers from 0 to 2^n-1, but what happens when you start doing actual math on them?

Modulo and saturation

Let's stick to just addition and multiplication at first, which are the simplest and best defined operations on integers. Of course, the trouble is that if you are adding or multiplying two numbers between 0 and 255, the result might be bigger than 255. For example, you might need to do 100 + 200, or 128*2, or even just 255+1, and the result is not representable in an 8-bit byte. In general, if you are operating on n-bits numbers, the result might not be representable in n bits.

So what does the computer do when this kind of overflow happens? Most programmers will now chime in and say: well duh, it wraps! If you're doing 255+1, you will just get 0 as a result. If you're doing 128*2, you'll just get 0. If you're doing 100+200 you'll just get 44.

While this answer is not wrong, it's not right either.

Yes, it's true that the most common central processing units we're used to nowadays use modular arithmetic, so that operations that would overflow n-bits words are simply computed modulo 2^n (which is easy to implement, since it just means discarding higher bits, optionally using some specific flag to denote that a carry got lost along the way).

However, this is not the only possibility. For example, specialized DSP (Digital Signal Processing) hardware normally operates with saturation arithmetic: overflowing values are clamped to the maximum representable value. 255+1 gives 255. 128*2 gives 255. 100+200 gives 255.

Programmers used to the standard modular arithmetic can find saturation arithmetic ‘odd’ or ‘irrational’ or ‘misbehaving’. In particular, in saturation arithmetic (algebraic) addition is not associative, and multiplication does not distribute over (algebraic) addition.

Sticking to our 8-bit case, for example, with saturation arithmetic (100 + 200) - 100 results in 255 - 100 = 155, while 100 + (200 - 100) results in 100 + 100 = 200, which is the correct result. Similarly, still with saturation arithmetic, (200*2) - (100*2) results in 255 - 200 = 55, while (200 - 100)*2 results in 100*2 = 200. By contrast, with modular arithmetic, both expressions in each case give the correct result.
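
To make the two behaviors concrete, here is a minimal C sketch (unsigned arithmetic in C is modular by definition, so saturation has to be emulated explicitly; the function names are just illustrative):

#include <stdint.h>

uint8_t add_mod(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);            /* 255+1 == 0, 100+200 == 44 */
}

uint8_t add_sat(uint8_t a, uint8_t b) {
    uint16_t s = (uint16_t)a + b;       /* compute exactly in a wider type */
    return s > 255 ? 255 : (uint8_t)s;  /* 255+1 == 255, 100+200 == 255 */
}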

So, when the final result is representable, modular arithmetic gives the correct result in the case of a static sequence of operations. However, when the final result is not representable, saturation arithmetic returns values that are closer to the correct one than modular arithmetic: 300 is clamped to 255, in contrast to the severely underestimated 44.

Being as close as possible to the correct results is an extremely important property not just for the final result, but also for intermediate results, particularly in the cases where the sequence of operations is not static, but depends on the magnitude of the values (for example, software implementations of low- or high-pass filters).

In these applications (of which DSP, be it audio, video or image processing, is probably the most important one) both modular and saturation arithmetic might give the wrong result, but the modular result will usually be significantly worse than that obtained by saturation. For example, modular arithmetic might miscompute a frequency of 300Hz as 44Hz instead of 255Hz, and with a threshold of 100Hz this would lead to attenuation of a signal that should have passed unchanged, or conversely. Amplifying an audio signal beyond the representable values could result in silence with modular arithmetic, but it will just produce the loudest possible sound with saturation.

We mentioned that promotion of unsigned values to wider data types is trivial. What about demotion? For example, knowing that your original values are stored as 8-bit bytes and that the final result has to be again stored as an 8-bit byte, a programmer might consider operating with 16-bit (or wider) words to (try and) prevent overflow during computations. However, when the final result has to be demoted again to an 8-bit byte, a choice has to be made, again: should we just discard the higher bits (which is what modular arithmetic does), or return the highest representable value when any higher bits are set (which is what saturation arithmetic does)? Again, this is a choice for which there is no “correct” answer, but only answers that depend on the application.
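
In C, for example, the two demotion policies could be sketched as follows (the truncating one is what a plain cast already does; the names are illustrative):

#include <stdint.h>

uint8_t demote_mod(uint16_t x) {
    return (uint8_t)x;                  /* discard the high bits: 300 -> 44 */
}

uint8_t demote_sat(uint16_t x) {
    return x > 255 ? 255 : (uint8_t)x;  /* clamp: 300 -> 255 */
}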

To conclude, the behavior that programmers used to standard modular arithmetic might find ‘wrong’ is actually preferable in some applications (which is why it has been supported in hardware in the multimedia and vector extensions (MMX and onwards) of the x86 architecture).

Thou shalt not overflow

Of course, the real problem in the examples presented in the previous section is that the data type used (e.g. 8-bit unsigned integers) was unable to represent intermediate or final results.

One of the most important things programmers should consider when doing math on the computer, maybe the most important one, is precisely the choice of the correct data type.

For integers, this means choosing a data type that can represent correctly not only the starting values and the final results, but also the intermediate values. If your data fits in 8 bits, then you want to use at least 16 bits. If it fits in 16 bits (but not 8), then you want to use at least 32, and so on.

Having a good understanding of the possible behaviors in case of overflow is extremely important to write robust code, but the main point is that you should not overflow.

Relative numbers: welcome to hell

In case you are still of the opinion that integer math is easy, don't worry. We still haven't gotten into the best part, which is how to deal with relative numbers, or, as the layman would call them, signed integers.

As we mentioned above, the ‘natural’ interpretation of n bits is to read them as natural, non-negative, unsigned integers, ranging from 0 to 2^n-1. However, let's be honest here, non-negative integers are pretty limiting. We would at least like to have the possibility to also specify negative numbers. And here the fun starts.

Although there is no official universal standard for the representation of relative numbers (signed integers) on computers, there is undoubtedly a dominating convention, which is the one programmers are nowadays used to: two's complement. However, this is just one of no fewer than four possible representations:

  • sign bit and mantissa;
  • ones' complement;
  • two's complement;
  • offset binary aka biased representation.

Symmetry, zeroes and self-negatives

One of the issues with the representation of signed integers in binary computers is that binary words can always represent an even number of values, but a symmetrical amount of positive and negative integers, plus the value 0, is odd. Hence, when choosing the representation, one has to choose between either:

  • having one (usually negative) non-zero number with no representable opposite, or
  • having two representations of the value zero (essentially, positive and negative zero).

Of the four signed number representations enumerated above, the sign bit and ones' complement representations have a signed zero, but each non-zero number has a representable opposite, while two's complement and bias only have one value for zero, but have at least one non-zero number that has no representable opposite. (Offset binary is actually very generic and can have significant asymmetries in the ranges of representable numbers.)

Having a negative zero

The biggest issue with having a negative zero is that it violates a commonly held assumption, which is that there is a bijective correspondence between representable numerical values and their representation, since both positive and negative 0 have the same numerical value (0) but have distinct bit patterns.

Where this presents the biggest issue is in the comparison of two words. When comparing words for equality, we are now posed a conundrum: should they be compared by their value, or should they be compared by their representation? If a = -0, would a satisfy a == 0? Would it satisfy a < 0? Would it satisfy both? The obvious answer would be that +0 and -0 should compare equal (and just that), but how do you tell them apart then? Is it even worth it being able to tell them apart?

And finally, is the symmetry worth the loss of a representable value? (2^n bit patterns, but two of them have the same value, so e.g. with 8-bit bytes we have 256 patterns representing only 255 values instead of the usual 256.)

Having non-symmetric opposites

On the other hand, if we want to keep the bijectivity between value and representation, we will lose the symmetry of negation. This means, in particular, that knowing that a number a satisfies a < 0 we cannot deduce that -a > 0, or conversely, depending on whether the value with no opposite is positive or negative.

Consider for example the case of the standard two's complement representation in the case of 8-bit bytes: the largest representable positive value is 127, while the largest (in magnitude) representable negative value is -128. When computing opposites, all values between -127 and 127 have their opposite (which is the one we would expect algebraically), but negating -128 gives (again) -128 which, while algebraically wrong, is at least consistent with modular arithmetic, where adding -128 and -128 actually gives 0.

A brief exposition of the representations

Let's now see the representations in some more detail.

Sign bit and mantissa representation

The conceptually simplest approach to represent signed integers, given a fixed number of digits, is to reserve one bit to indicate the sign, and leave the other n-1 bits to indicate the mantissa, i.e. the magnitude, i.e. the absolute value of the number. By convention, the sign bit is usually taken to be the most significant bit, and (again by convention) it is taken as 0 to indicate a positive number and 1 to indicate a negative number.

With this representation, two opposite values have the same representation except for the most significant bit. So, for example, assuming our usual 8-bit byte, 1 would be represented as 00000001, while -1 would be represented as 10000001.

In this representation, the highest positive value that can be represented with n bits is 2^{n-1} - 1, and the lowest (largest in magnitude) negative value that can be represented is its opposite. For example, with an 8-bit byte the largest positive integer is 127, i.e. 01111111, and the largest (in magnitude) negative integer is its opposite -127, i.e. 11111111.

As mentioned, one of the downsides of this representation is that it has both positive and negative zero, respectively represented by the 00000000 and 10000000 bit patterns.

While the sign bit and mantissa representation is conceptually obvious, its hardware implementation is more cumbersome than it might seem at first, since operations need to explicitly take the operands' signs into account. Similarly, sign extension (for example, promoting an 8-bit byte to a 16-bit word preserving the numerical value) needs to clear the sign bit in the smaller-size representation before replicating it as the sign bit of the larger-size representation.

Ones' complement representation

A more efficient approach is offered by ones' complement representation, where negation maps to ones' complement, i.e. bit-flipping: the opposite of any given number is obtained as the bitwise NOT operation of the representation of the original value. For example, with 8-bit bytes, the value 1 is as usual represented as 00000001, while -1 is represented as 11111110.

The range of representable numbers is the same as in the sign bit and mantissa representation, so that, for example, 8-bit bytes range from -127 (10000000) to 127 (01111111), and we have both positive zero (00000000) and negative zero (11111111).

(Algebraic) addition in modular arithmetic with this representation is trivial to implement in hardware, with the only caveat that carries and borrows ‘wrap around’ (the so-called end-around carry).

As in the sign-bit case, it is possible to tell if a number is positive or negative by looking at the most-significant bit, and 0 indicates a positive number, while 1 indicates a negative number (whose absolute value can then be obtained by flipping all the bits). Sign-extending a value can be done by simply propagating the sign bit of the smaller-size representation to all the additional bits in the larger-size representation.

Two's complement

While ones' complement representation is practical and relatively easy to implement in hardware, it is not the simplest, and it's afflicted by the infamous ‘negative zero’ issue.

Because of this, two's complement representation, which is simpler to implement and has no negative zero, has gained much wider adoption. It also has the benefit of ‘integrating’ rather well with the equally common modular arithmetic.

In two's complement representation, the opposite of an n-bit value is obtained by subtracting it from 2^n, or, equivalently, from flipping the bits and then adding 1, discarding any carries beyond the n-th bit. Using our usual 8-bit bytes as example, 1 will as usual be 00000001, while -1 will be 11111111.
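
The ‘flip the bits and add 1’ rule can be sketched in C on 8-bit values like this (the modular arithmetic discards the carry beyond the 8th bit automatically; the function name is illustrative):

#include <stdint.h>

uint8_t negate2c(uint8_t x) {
    return (uint8_t)(~x + 1);  /* negate2c(0x01) == 0xFF, i.e. -1   */
}                              /* negate2c(0x80) == 0x80: -128 again! */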

The largest positive representable number with n bits is still 2^{n-1}-1, but the largest (in magnitude) negative representable number is now -2^{n-1}, and it's represented by a high-bit set to 1 and all other bits set to 0. For example, with 8-bit bytes the largest positive number is 127, represented by 01111111, whose opposite -127 is represented by 10000001, while the largest (in magnitude) negative number is -128, represented by 10000000.

In two's complement representation, there is no negative zero and the only representation for 0 is given by all bits set to 0. However, as discussed earlier, this leads to a negative value whose opposite is the value itself, since the representation of the largest (in magnitude) negative representable number is invariant under two's complement.

As in the other two representations, the most significant bit can be checked to see if a number is positive or negative. As in the ones' complement case, sign extension is done trivially by propagating the sign bit of the smaller-size value to all other bits of the larger-size value.

Offset binary

Offset binary (or biased representation) is quite different from the other representations, but it has some very useful properties that have led to its adoption in a number of schemes (most notably the IEEE-754 standard for floating-point representation, where it's used to encode the exponent, and some DSP systems).

Before getting into the technical details of offset binary, we look at a possible motivation for its inception. The attentive reader will have noticed that all the previously mentioned representations of signed integers have one interesting property in common: they violate the natural ordering of the representations.

Since the most significant bit is taken as the sign bit, and negative numbers have a most significant bit set to one, natural ordering (by bit patterns) puts them after the positive numbers, whose most significant bit is set to 0. Additionally, in the sign bit and mantissa representation, the ordering of negative numbers is reversed with respect to the natural ordering of their representation. This means that when comparing numbers it is important to know if they are signed or unsigned (and if signed, which representation) to get the ordering right. The biased representation is one way (and probably the most straightforward way) to circumvent this.

The basic idea in biased representation or offset binary is to ‘shift’ the numerical value of all representations by a given amount (the bias or offset), so that the smallest natural representation (all bits 0) actually evaluates to the smallest representable number, and the largest natural representation (all bits 1) evaluates to the largest representable number.

The bias is the value that is added to the (representable) value to obtain the representation, and subtracted from the representation to obtain the represented value. The minimum representable number is then the opposite of the bias. Of course, the range of representable numbers doesn't change: if your data type can only represent 256 values, you can only choose which 256 values, as long as they are consecutive integers.

The bias in an offset binary representation can be chosen arbitrarily, but there is a ‘natural’ choice for n-bit words, which is 2^{n-1}: halfway through the natural representation. For example, with 8-bit bytes (256 values) the natural choice for the bias is 128, leading to a representable range of integers from -128 to 127, which looks distinctly similar to the one that can be expressed in two's complement representation.

In fact, the 2^{n-1} bias leads to a representation which is equivalent to the two's complement representation, except for a flipped sign bit, solving the famous signed versus unsigned comparison issue mentioned at the beginning of this subsection.

As an example, consider the usual 8-bit bytes with a bias of 128: then, the numerical values 1, 0 and -1 would be represented by the ‘natural’ representation of the values 129, 128 and 127 respectively, i.e. 10000001, 10000000 and 01111111: flipping the most significant bits, we get 00000001, 00000000 and 11111111 which are the two's complement representation of 1, 0 and -1.

Of course, the ‘natural’ bias is not the only option: it is possible to have arbitrary offsets, which makes offset binary extremely useful in applications where the range of possible values is strongly asymmetrical around zero, or where it is far from zero. Such arbitrary biases are rarely supported in hardware, though, so operations on offset binary usually require software implementations of even the most common operations, with a consequent performance hit. Still, assuming the hardware uses modular arithmetic, offset binary is at least trivial to implement for the basic operations.

One situation in which offset binary doesn't play particularly well is that of sign extension, which was trivial in the ones' and two's complement representations. The biggest issue in the case of offset binary is, obviously, that the offsets in the smaller and larger data types are likely going to be different, although usually not arbitrarily different (biases are often related to the size of the data type).

At least in the case of the ‘natural’ bias (in both the smaller and larger data types), sign extension can be implemented straightforwardly by going through the two's complement equivalent representation: flip the most significant bit of the smaller data type, propagate it to all the remaining bits of the larger data type, and then flip the most significant bit of the larger data type. (In other words: convert to two's complement, sign extend that, convert back to offset binary with the ‘natural’ bias.)
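
A C sketch of this round trip, assuming the ‘natural’ bias in both the 8-bit and the 16-bit type (the bias_extend name is made up, and the cast to int8_t assumes the ubiquitous two's complement platform):

#include <stdint.h>

uint16_t bias_extend(uint8_t x) {
    uint8_t  tc   = x ^ 0x80;                       /* biased -> two's complement */
    uint16_t wide = (uint16_t)(int16_t)(int8_t)tc;  /* sign-extend */
    return wide ^ 0x8000;                           /* two's complement -> biased */
}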

What does a bit pattern mean?

We're now nearing the end of our discussion on integer math on the computers. Before getting into the messy details of the first common non-integer operation (division), I would like to ask the following question: what do you get if you do 10100101 + 01111111?
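
As a hint of why the question has no single answer, here is a C sketch showing that the resulting bit pattern is one and the same, while the value it denotes depends entirely on the interpretation (the int8_t casts assume the ubiquitous two's complement platform):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t a = 0xA5;              /* 10100101 */
    uint8_t b = 0x7F;              /* 01111111 */
    uint8_t s = (uint8_t)(a + b);  /* modular add: 00100100 */

    printf("unsigned: %u + %u = %u\n",
        (unsigned)a, (unsigned)b, (unsigned)s);  /* 165 + 127 = 36 */
    printf("two's complement: %d + %d = %d\n",
        (int8_t)a, (int8_t)b, (int8_t)s);        /* -91 + 127 = 36 */
    return 0;
}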

Divide and despair

To conclude our exposition of the joys of integer math on the computers, we now discuss the beauty of integer division and the related modulus operation.

Since division of the integer e by the integer o only gives an integer (mathematically) if e is a multiple of o, the concept of ‘integer division’ has arisen in computer science as a way to obtain an integer d from e/o even when o does not divide e.

The simple case

Let's start by assuming that e is non-negative and o is (strictly) positive. In this case, integer division gives the largest integer d such that d*o ≤ e. In other words, the result of the division of e by o is truncated, or ‘rounded down’, however small the remainder might be: 3/5=0 and 5/3=1 with integer division, even though in the latter case we would likely have preferred a value of 2 (think of 2047/1024, for example).

The upside of this choice is that it's trivial to implement other forms of division (that round up, or to the nearest integer, for example) by simply adding appropriate correcting factors to the dividend. For example, round-up division is achieved by adding the divisor diminished by a unit to the dividend: the integer division (e + o - 1)/o will give you e/o, rounded up: (3+5-1)/5 = 7/5 = 1, and (5+3-1)/3 = 7/3 = 2.
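
As a one-line C sketch of the same trick (assuming e + o - 1 itself doesn't overflow the type):

unsigned div_up(unsigned e, unsigned o) {
    return (e + o - 1)/o;   /* div_up(3, 5) == 1, div_up(5, 3) == 2 */
}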

Division by zero

What happens when o is zero? Mathematically, division by zero is not defined (although in some contexts where infinity is considered a valid value, it may give infinity as a result, as long as the dividend is non-zero). In hardware, anything can happen.

There's hardware that flags the error. There's hardware that produces bogus results without any chance of knowing that a division by zero happened. There's hardware that produces consistent results (always zero, or the maximum representable value), flagging or not flagging the situation.

‘Luckily’, most programming languages treat a division by zero as an exception, which by default causes program termination. Of course, this means that writing robust code requires sprinkling it with conditionals to check that divisions will complete successfully.

Negative numbers

If the undefined division by zero may not be considered a big issue per se, the situation is much more interesting when either of the operands of the division is a negative number.

First of all, one would be led to think that at least the sign of the result would be well defined: negative if the operands have opposite sign, positive otherwise. But this is not the case for the widespread two's complement representation with modular arithmetic, where the division of two negative numbers can give a negative number: of course, we're talking about the corner case of the largest (in magnitude) negative number, which when divided by -1 returns itself, since its opposite is not representable.

But even when the sign is correct, the result of integer division is not uniquely determined: some implementations round down, so that -7/5 = -2, while others round towards zero, so that -7/5 = -1: both the choices are consistent with the positive integer division, but the results are obviously different, which can introduce subtle but annoying bugs when porting code across different languages or hardware.

Modulo

The modulo operation is perfectly well defined for positive integers, as the remainder of (integer) division: the quotient d and the remainder r of the (integer) division e/o are (non-negative) integers such that e = o*d + r and r < o.

Does the same hold true when either e or o are negative? It depends on the convention adopted by the language and/or hardware. While for negative integer division there are ‘only’ two standards, for the modulo operation there are three:

  • a result with the sign of the dividend;
  • a result with the sign of the divisor;
  • a result that is always non-negative.

In the first two cases, what this means is that, for example, -3 % 5 will have the opposite sign of 3 % -5; hence, if one of them satisfies the quotient/remainder equation (which depends on whether integer division rounds down or towards zero), the other obviously won't. In the third case, the equation is only satisfied if the division rounds down, but not if it rounds towards zero.

This could lead someone to think that the best choice would be a rounding-down division with an always non-negative modulo. Too bad that rounding-down division suffers from the problem that -(e/o) ≠ (-e)/o.
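
For example, an always non-negative modulo can be sketched in C (where, since C99, the native % takes the sign of the dividend; the function name is illustrative):

int mod_nonneg(int a, int b) {
    int r = a % b;             /* C99: r has the sign of a */
    return r < 0 ? r + b : r;  /* assumes b > 0: mod_nonneg(-3, 5) == 2 */
}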

Summary

Integer math on a computer is simple only as long as you never think about the corner cases, which you should if you want to write robust, reliable code. With integer math, this is the minimum of what you should be aware of:

  • the result of an overflowing operation depends on the hardware: most commonly it wraps around (modular arithmetic), but it may also be clamped (saturation arithmetic);
  • the best defense against either behavior is choosing a data type wide enough to hold all intermediate and final results;
  • signed integers can be represented in (at least) four different ways, trading a double zero for asymmetric opposites or vice versa;
  • division by zero, the rounding of integer division and the sign of the modulo for negative operands all depend on the language and/or the hardware.

Rounding modes in OpenCL

Introduction (history lost)

OpenCL 1.0 supported an OPENCL SELECT_ROUNDING_MODE pragma in device code, which allowed selection of the rounding mode to be used in a section of a kernel. The pragma was only available after enabling the cl_khr_select_fprounding_mode extension. Support for this extension and the related pragma(s) has been removed from subsequent versions of the standard, with the result that there is no way at all in the current OpenCL standard to have specific parts of a kernel use rounding modes different from the default, except in the explicit type conversion functions with the relevant _rt* suffix.

A consequence of this is that it is currently completely impossible to implement robust numerical code in OpenCL.

In what follows I will explore some typical use cases where directed rounding is a powerful, sometimes essential tool for numerical analysis and scientific computing. This will be followed by a short survey of existing hardware and software support for directed rounding. The article ends with a discussion about what must, and what should, be included in OpenCL to ensure it can be used as a robust scientific programming language.

Why directed rounding is important

Rationale #1: assessing numerical trustworthiness of code

In his paper How Futile are Mindless Assessments of Roundoff in Floating-Point Computation, professor William Kahan (who helped design the IEEE-754 floating-point standard) explains that, given multiple formulas that would compute the same quantity, the fastest way to determine which formulas are numerically trustworthy is to:

Rerun each formula separately on its same input but with different directed roundings; the first one to exhibit hypersensitivity to roundoff is the first to suspect.

Further along in the same paper, Kahan adds (emphasis mine):

The goal of error-analysis is not to find errors but to fix them. They have to be found first. The embarrassing longevity, over three decades, of inaccurate and/or ugly programs to compute a function so widely used as ∠(X, Y) says something bleak about the difficulty of floating-point error-analysis for experts and nonexperts: Without adequate aids like redirected roundings, diagnosis and cure are becoming practically impossible. Our failure to find errors long suspected or known to exist is too demoralizing. We may just give up.

Essential tools for the error-analysis of scientific computing code cannot be implemented in OpenCL 1.1 or later (at least up to 2.0, the latest published specification) due to the impossibility of specifying the rounding direction.

Rationale #2: enforcing numerical correctness

Directed rounding is an important tool to ensure that arguments to functions with limited domain are computed in such a way that the conditions are respected numerically when they would be analytically. To clarify, in this section I'm talking about correctly rounding the argument of a function, not its result.

When the argument to such a function is computed through an expression (particularly if such an expression is ill-conditioned) whose result is close to one of the limits of the domain, the lack of correct rounding can cause the argument to be evaluated just outside of the domain instead of just inside (which would be the analytically correct answer). This would cause the result of the function to be Not-a-Number instead of the correct(ly rounded) answer.

Common functions for which the requirements might fail to be satisfied numerically include:

sqrt

when the argument would be a small, non-negative number; to write numerically robust code one would want the argument to sqrt to be computed rounding towards plus infinity;

inverse trigonometric functions (asin, acos, etc)

when the argument would be close to, but not greater than, 1, or close to, but not less than, -1; again, to write numerically robust code one would want the argument to be computed rounding towards zero.

A discussion on the importance of correct rounding can again be found in Kahan's works, see e.g. Why we needed a floating-point standard.

Robust coding of analytically correct formulas is impossible to achieve in OpenCL 1.1 or later (at least up to 2.0, the latest published specification) due to the lack of support for directed rounding.

Rationale #3: Interval Analysis

A typical example of a numerical method that needs support for different rounding modes in different parts of the computation is Interval Analysis (IA). Similar arguments hold for other forms of self-verified computing as well.

Briefly, in IA every (scalar) quantity q is represented by an interval whose extrema are (representable) real numbers l, u such that ‘the true value’ of q is guaranteed to satisfy l ≤ q ≤ u.

Operations on two intervals A = [al, au] and B = [bl, bu] must be conducted in such a way that the resulting interval can preserve this guarantee, and this in turn means that the lower extremum must be computed in rtn (round towards negative infinity) mode, while the upper extremum must be computed in rtp (round towards positive infinity) mode.

For example, assuming add_rtn and add_rtp represent additions that round in the suffix direction, we have that C = A + B could be computed as:

cl = add_rtn(al, bl);
cu = add_rtp(au, bu);

In OpenCL 1.0, add_rtn and add_rtp could be defined as:

#pragma OPENCL SELECT_ROUNDING_MODE rtn
gentype add_rtn(gentype a, gentype b) {
 return a + b;
}
#pragma OPENCL SELECT_ROUNDING_MODE rtp
gentype add_rtp(gentype a, gentype b) {
 return a + b;
}
/* restore default */
#pragma OPENCL SELECT_ROUNDING_MODE rte

The same functions could be implemented in C99, in FORTRAN, in MATLAB or even in CUDA (see below). In OpenCL 1.1 and later, this is impossible to achieve, even on hardware that supports rounding mode selection.
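
For reference, a minimal C99 sketch of the same two functions, using the standard fenv.h interface (whether, and at what cost, a given compiler actually honors FENV_ACCESS is another story):

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

double add_rtn(double a, double b) {
    const int old = fegetround();
    fesetround(FE_DOWNWARD);      /* round towards negative infinity */
    double c = a + b;
    fesetround(old);              /* restore the previous mode */
    return c;
}

double add_rtp(double a, double b) {
    const int old = fegetround();
    fesetround(FE_UPWARD);        /* round towards positive infinity */
    double c = a + b;
    fesetround(old);
    return c;
}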

Applicative examples

From the rationales presented so far, one could deduce that directed rounding is essentially associated with the stability and robustness of numerical code. There are however other cases where directed rounding can be used, which are not explicitly associated with things such as roundoff errors and error bound estimation.

Rounding down for the neighbors list construction in particle methods

Consider for example an industrial application of mesh-less Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH).

In these numerical methods, the simulation domain is described by means of ‘particles’ free to move with respect to each other. The motion of these particles is typically determined by the interaction between the particle and its neighbors within a given influence sphere.

Checking for proximity between two particles is done by computing the length of the relative distance vector (differences of positions), and the same distance is often used in the actual computation of the influence between particles. As usual, to avoid bias, both the relative distance vector and its length should be computed with the default round-to-nearest-even rounding mode for normal operations.

To avoid searching for neighbors in the whole domain for every operation, implementations often keep a ‘neighbors list’ of each particle, constructed by checking the proximity of candidate particles once, and storing the indices of the particles that fall within the prescribed influence radius.

Due to the mesh-less nature of the method, neighborhoods may change at every time-step, requiring a rebuild of the neighbors list. To improve performance, this can be avoided by rebuilding the neighbors list at a lower frequency (e.g. every 10 time-steps), assuming (only in this phase) a larger influence radius, taking into account the maximum length that might be traveled by a particle in the given number of time-steps.

When such a strategy is adopted, neighbors need to be re-checked for actual proximity during normal operations, so that, for maximum efficiency, a delicate balance must be found between the reduced frequency and the increased number of potential neighbors caused by the enlarged influence radius.

One way to improve efficiency in this sense is to round towards zero the computation of the relative distance vector and its length during neighbors list construction: this maximizes the impact of the enlarged influence radius by including potential neighbors which are within one or two ULPs. This allows the use of very tight bounds on how much to enlarge the influence radius, without loss of correctness in the simulations.

Directed rounding support

Hardware support for directed rounding

x86-compatible CPUs have long supported setting the rounding mode via the appropriate flags in the control registers (either the x87 control word for the FPU, or the MXCSR control register for SSE). Similarly, on ARM CPUs with support for the NEON or VFP instruction sets, the rounding mode can be set with the appropriate flags in the FPSCR.

AMD GPUs also have support for rounding modes selection, with the granularity of an ALU clause. As documented in the corresponding reference manuals, the TeraScale 2 and TeraScale 3 architectures support setting the general rounding mode for ALU clauses via the SET_MODE instruction; Graphics Core Next (GCN) architectures can control the rounding mode by setting the appropriate bits in the MODE register via the S_SETREG instruction.

Additionally, the following hardware is capable of directed rounding at the instruction level:

CUDA-enabled NVIDIA GPUs

as documented in the CUDA C Programming Guide, Appendix D.2 (Intrinsic Functions), some intrinsic functions can be suffixed with one of _rn, _rz, _ru, _rd to explicitly set the rounding mode of the function;

CPUs with support for the AVX-512 instruction set

the EVEX prefix introduced with AVX-512 supports the rounding mode to be set explicitly for any given instruction, overriding the MXCSR control register, as documented in the Intel® Architecture Instruction Set Extensions Programming Reference, section 4.6.2: “Static Rounding Support in EVEX”.

Software support for directed rounding

At the software level, the processor's rounding mode can be accessed in C99 and C++11 by enabling the STDC FENV_ACCESS pragma and using fesetround() (and its counterpart fegetround()).

In MATLAB, the rounding mode can be selected by the system_dependent('setround', ·) command.

Some FORTRAN implementations also offer functions to get and set the current rounding mode (e.g. IBM's XL FORTRAN offers fpgets and fpsets).

CUDA C exposes the intrinsic functions of CUDA-enabled GPUs that support explicit rounding modes. So, for example, __add_ru(a, b) (resp. __add_rd(a, b)) can be used in CUDA C to obtain the sum of a and b rounded up (resp. down) without having to change the rounding mode of the whole GPU.

Even the GNU implementation of the text-processing language Awk has a method to set the rounding mode in floating-point operations, via the ROUNDMODE variable.

All in all, OpenCL (since 1.1 on) seems to be the only language/API to not support directed rounding.

What can be done for OpenCL

In their present state, OpenCL 1.1 to 2.0 are lagging behind C99, C++11, FORTRAN, MATLAB and CUDA (at the very least) by lacking support for directed rounding. This effectively prevents robust numerical code from being implemented and analyzed in OpenCL.

While I can understand that core support for directed rounding in OpenCL is a bit of a stretch, considering the wide range of hardware that support the specification, I believe that the standard should provide an official extension to (re)introduce support for it. This could be done by re-instating the cl_khr_select_fprounding_mode extension, or through a different extension with better semantics (for example, modelled around the C99/C++11 STDC FENV_ACCESS pragma).

This is the minimum requirement to bring OpenCL C on par with C and C++ as a language for scientific computing.

Ideally (potentially through a different extension), it would be nice to also have explicit support for instruction-level rounding mode selection, independent of the current rounding mode, with intrinsics similar to the ones that OpenCL already defines for the conversion functions. On supporting hardware, this would make it possible to implement even more efficient, yet still robust, numerical code needing different rounding modes for separate subexpressions.

Granularity

When it comes to the OpenCL programming model, it's important to specify the scope of application of state changes, of which the rounding mode is one. Given the use cases discussed above, we could say that the minimum requirement would be for OpenCL to support changing the rounding mode during kernel execution, with the change applying to the whole launch grid and the new mode known at (kernel) compile time.

So, it should be possible (when the appropriate extension is supported and enabled) to change rounding mode half-way through a kernel, along these lines:

kernel void some_kern(...) {
    /* kernels start in some default rounding mode,
     * e.g. round-to-nearest-even. We do some calculations
     * in this default mode:
     */
    do_something;

    /* now we change to some other mode. of course this
     * is just pseudo-syntax:
     */
    change_rounding_mode(towards_zero);

    /* and now we do more calculations, this time
     * with the new rounding mode enabled:
     */
    do_something_else;
}

The minimum supported granularity would thus be the whole launch grid, as long as the rounding mode can be changed dynamically during kernel execution, to (any) value known at compile time.

Of course, a finer granularity and a more relaxed (i.e. runtime) selection of the rounding mode would be interesting additional features. These may be made optional, and the hardware capability in this regard could be queried through appropriate device properties.

For example, considering the standard execution model for OpenCL, with work-groups mapped to compute units, it might make sense to support a granularity at the work-group level. This would be a nice addition, since it would allow e.g. to concurrently run the same code with different rounding modes (one per work-group), which would benefit applications geared towards the analysis of the stability of numerical code (as discussed in Rationale #1). But it's not strictly necessary.

Our lives are short

Some time ago someone on FriendFeed asked if anybody (else) would celebrate their kid's 1000th day. Among the negative answers, someone remarked that they'd rather celebrate the 1024th. And as a hardened computer geek, that was my first thought as well.

But why stop there? Or rather: if your baby is young, why start there?

So we started collecting the other power-of-two dates (power-of-two-versaries1?) for our baby, and after asking WolframAlpha about a few, I set out to write a Ruby script to do the computations for me, and the first question that arose was: what's the highest (integer) power of two that should be considered?

Since the script wasn't ready yet, I asked WolframAlpha again, to quickly discover that the best we can do comes significantly short of 2^16 days, which is almost 180 years (179 years, 5 months, and a few days, with the actual number of days depending on how many leap years are covered by the timespan)2.
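
For the record, the core of such a script is tiny; a sketch (the birth date is obviously a placeholder):

require 'date'

birth = Date.new(2010, 1, 1)      # placeholder birth date
(1..16).each do |k|
  puts "2^#{k}:\t#{birth + 2**k}" # Date + Integer counts in days
end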

Now, as it happens, 16 bits is the (minimum) width of the short data type in the C family of programming languages; in fact, on most platforms and systems, 16 bits is exactly the width of the short data type.

As it turns out, our lives are indeed (unsigned) short.


  1. if anybody has a better name for them, please tell. ↩

  2. most modern-day human adults can aspire to hitting three power-of-two-versaries at best: 2^13 (22 years, 5 months, and the usual bunch of days) as a young adult, 2^14 (44 years, 10 months and a week, day more day less) and finally 2^15 (89 years, 8 months and a half), for the lucky ones. ↩

Free versus Open: the Maps case

Foreword

(or skip this and jump straight to the actual content)

‘Free’ is an interesting word in English, particularly when pertaining to software and services. Although the original meaning of the word pertained to the concept of freedom, it has gained extensive use as a short form for “free of charge”, i.e. gratis, without requiring any (direct, obvious) payment.

Now, while context can usually help clarify which meaning is intended in modern usage, there are situations in which one of them is intended but the other is understood. For example, the sentence “that slave is now free” can mean that it has attained freedom, or that it is being given away (to another slave owner.)

A context where the ambiguity cannot be resolved automatically is that of software and services; in fact, the duplicity of the meaning has been plaguing the free software movement since its inception, which is why now the FLOSS acronym is often used: Free/Libre Open Source Software. The Libre is there to clarify what that Free is supposed to mean.

Of course, free (as in freedom) software and services tend to also be free of charge (and thus gratis), which is what makes them appealing to the general public, which is not particularly interested in the ideology behind the software freedom movement.

And specifically to highlight how important the difference between the two concepts is, I'm going to discuss two distinct approaches to making worldwide maps available to the public: one which is free (as in gratis), but essentially closed, and the other which is free (as in libre), and open.

An introduction

Recently, Google has started offering “tourism boards, non-profit, government agencies, universities, research organizations or other entities interested” the possibility to borrow Google's Street View Trekker to help extend Google Maps' Street View.

To clarify, Google is “offering” these subjects the opportunity to expend the subjects' human resources to expand Google's own database. A company whose business is data gathering is giving other entities the “opportunity” to contribute to their data gathering efforts —for free1.

In simpler words, this is a bank asking people to donate money to them, but spinning it as an opportunity.

Google Maps

An objection that I'm sure will be raised is that this is not really like a bank asking for monetary donations, because banks' services are not free (of charge). For example, they loan money at a cost (the interest rate). By contrast, Google's services (and particularly Google Maps and its Street View feature) are free (of charge).

The objection is deeply flawed by a misconception about what Google services are, a misconception driven by the failure to realize the difference between Google's consumers and Google's customers.

Google's consumer services (most publicly known Google services: Search, Mail, Maps, Picasa, now Plus) are as much a service as the cheese in a mouse trap is a meal. Their purpose is not to provide the service to the consumer, but to gather data about him or her.

Google is in the business of data gathering. Gathering data about people, gathering data about things people might be interested in. Selling this data to its customers (directly or indirectly) is how Google makes its money: the main income stream is advertisement, targeted advertisement that relies on your usage of Google's consumer services to see what you might be interested in. (And I'm not even getting into the NSA debacle because that would really steer us off topic.)

The key point is understanding who owns and controls the data, which in the case of Google Maps and Street View is Google. While the data is (currently) being made available back to Google's consumers free of charge, in post-processed form, that data remains solidly in the hands of Google, which may choose to use it (or not) as they see fit.

To the potential question “why would Google ever not make the data accessible?” (through its consumer services), the correct answer is, sadly, “why not?”

Google is in fact (in)famous for discontinuing services now and then, the most recent one being its Reader feed aggregator, the upcoming one being the personalized iGoogle page. But there is in fact one service that was discontinued so silently most people even failed to notice it got discontinued: wireless geolocation.

Google and wireless geolocation

Wireless geolocation is the computation of the location of a wireless device based on which wireless transmitters (of known location) it sees and the strength of the signal. This is typically used with cellphones, for example, based on which cell towers they see and the respective signal strength. It can be used with WiFi devices (laptops, smartphones, tablets, whatever) based on which wireless routers are visible —provided you know the position of the routers.
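In its simplest form, the position estimate is just a weighted centroid of the known transmitter positions, with heavier weights for stronger signals. A naive sketch in Ruby (purely illustrative —the data layout is made up, and production services use far more sophisticated models):

# sightings: array of { lat:, lon:, dbm: } hashes, one per visible
# transmitter of known position; dbm is the received signal strength
def wifi_locate(sightings)
  wsum = lat = lon = 0.0
  sightings.each do |s|
    w = 10.0 ** (s[:dbm] / 10.0)   # dBm to linear power: stronger signal, heavier weight
    lat  += s[:lat] * w
    lon  += s[:lon] * w
    wsum += w
  end
  [lat / wsum, lon / wsum]
end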

Now, as it happens, when Google was driving around in their Google Street View Cars snapping pictures for their Street View consumer service, they were also gathering information about wireless routers. The data thus gathered could be used by another Google consumer service, the geolocation API: you could ask Google “where am I if I see such and such routers with such and such signal strengths?” and Google would provide you with an approximate latitude and longitude.

(And let's skip the part where Google was collecting much more than the information needed for geolocation, something that got them in trouble in Germany, although it only led to a ridiculously low fine.)

The wireless geolocation service was available to the public, but that access has been discontinued since 2011, and access to a similar service is only available to businesses (essentially to allow Android to keep using it), with stricter controls on its access. So Google still has the data, it still uses it, but the services based on it are not available to the general public anymore. What would you think if something like this happened to data you were “offered” to contribute?

User contributions

In fact, Google's interest in external contributions is not limited to the recent offer to use a Trekker: Google Maps now has a number of options to allow users to contribute, ranging from changes and fixes to the map itself, to geotagged panoramic photos publicly shared on Picasa (which can be used for Street View).

I suspect that Google has learned from the experience of OpenStreetMap (which I will discuss later on) how powerful ‘crowdsourcing’ can be, while requiring far fewer resources on the company's side.

So you can contribute to make Google Maps better. The question is: should you? Or rather, would you? If you're willing to spend whatever small amount of time to contribute to global mapping, why would you do it for a company for which this is business?

Open Street Map

OpenStreetMap (Wikipedia article) was born in 2004 in the UK with the aim of providing a free (as in freedom) map of the world.

It's important to note right from the start the huge difference between how OSM is free versus how Google Maps is free: the latter provides a service that is available to consumers free of charge, the former provides mapping data which is not only available free of charge to anybody, but the use of which is also subject to very few restrictions (the actual license is the Open DataBase License, which, as explained here, essentially allows anyone to access, modify and make derivatives of the data, provided proper attribution is given and derivatives are shared under the same liberal terms).

So there are two distinct differences.

The first difference pertains to what is made available. Google Maps provides a front-end to Google's data: the visualization (and related services), available mostly through their website and smartphone applications. By contrast, OpenStreetMap provides the actual mapping data, although a slippy map to access it in human-usable form is also provided on the website.

The second difference pertains to the terms of use of what is made available. Although Google allows (in fact, encourages) embedding of the maps in other websites (free of charge within certain limits), the Terms of Service are otherwise pretty restrictive. By contrast, the license under which OSM data is made available is quite liberal, in that it only prevents misappropriation of the data, or the imposition of further restrictions on its use (I debate the paradox of restricting restrictions elsewhere, in Italian).

OpenStreetMap, as any other collaborative effort for a knowledge base, is such that it benefits from anybody's contribution, but in perfect reciprocity anybody can benefit from it. This is in contrast to situations (such as that of Google Maps) where there is one main entity with dominant interests and control on the data (Google benefits from user contributions, and Google again benefits from consumers using its services, and it can arbitrarily limit their use by third parties).

There are commercial interests in OpenStreetMap. While some are essentially unidirectional (Apple, for example, used OSM data in its photo application for the iPhone —at first without attribution, thereby actually violating the license), others try to build a two-way relationship.

For example, at Flickr they use OSM data for (some of) their maps, and they also introduced OSM-related machine tags that can be used to associate photos to the places they were taken at. Yahoo (the company that owns Flickr) and Microsoft allow usage of their satellite and aerial photos for ‘armchair mapping’ (more on this later). MapQuest (formerly, the mapping website) has an Open alternative that relies on OSM, and they have contributed to the open-source software that drives OpenStreetMap (the renderer, the geocoding and search engine, the online editor), and they have funded the improvement of the actual data.

In some nations, OSM data comes (partially) from government sources, either directly (government-sponsored contributions) or indirectly (through volunteer work from government data). In some ways, it's actually surprising that governments and local administrations are not more involved in the project.

Considering that OSM contribution is essentially voluntary, the amount of information that has been added is actually amazing. Of course, there are large inhomogeneities: places that are mapped in incredible detail, others where even the most basic information is missing; this site maps the density of information present throughout the world, showing this discrepancy in a spectacular fashion.

Why use OSM

Many (most, likely) end users are not particularly interested in the ideology of a project, nor in the medium- and long-term consequences of relying on particular vendors. For the most part, what end users are interested in is that a specific product delivers the information or service they seek in an appropriate manner.

In this sense, as long as the information they need is present and accessible, a user won't particularly care about using OpenStreetMap or Google Maps or any other particular service (TomTom, Garmin, Apple Maps, whatever): they will usually go with whatever they have available at hand, or with whatever their cultural context tends to favor.

On the other hand, there are a number of reasons, ranging from the ethical to the practical, why using an open, free (as in freedom) service such as OpenStreetMap should be preferred over opting in to proprietary solutions. (I will not discuss the ethical ones, since they may be considered subjective, or not equally meaningful to everybody.)

On the practical side, we obviously have a win of OSM over paid proprietary solutions: being open and free (as in freedom), the OSM data is available free of charge as well.

But OSM also wins —somewhat unexpectedly— over other free-of-charge services such as Google Maps, as I found out myself, in a recent discovery that brought me back to OpenStreetMap after my initial, somewhat depressing experience with it over four years ago: the Android application for Google Maps does not offer offline navigation.

Finding that such an otherwise sophisticated application was missing such a basic function was quite surprising. In my case (I own a Transformer Infinity without cellphone functionality) it also rendered Google Maps essentially useless: the application allows you to download map data for offline usage2, which is useful to see where you are even when you can't connect to the Internet, but the functionality to get directions from one place to another is not actually present in the application itself: it's delegated to Google's server.

I was amazed by the discovery, and I'm still wondering why that would be the case. I can understand that optimal routing may depend on some amount of real-time information, such as traffic conditions, that may only be available with an Internet connection, but why would the navigation features rely completely on the online service?3

Since the lack of offline navigation meant the Google Maps app on Android was useless for me, I started looking for alternatives, and this is how I found out about, and finally settled for, OsmAnd, an open source4, offline navigator for Android that uses open data (from OpenStreetMap, but also e.g. from Wikipedia).

The existence of applications such as OsmAnd is excellent to explain the importance of open data: when Google Maps does not offer a particular service, it is basically impossible for anybody else to offer it based on their data. By contrast, OpenStreetMap offers no services by itself (aside from basic map rendering), but gives other projects the opportunity —and this time we really mean opportunity, not in the ironic sense we used when discussing Google's outreach to get manpower for free— to provide all possible kinds of services on top of their data.

There are in fact a number of applications, both commercial and not, that provide services based on OSM data. They all benefit from the presence and quality of the data, and they often, in one way or another, give back to OSM. The relevance of OSM is not just in it being a free world mapping website. It's also in the healthy ecosystem which is growing around it.

More interestingly, OpenStreetMap sometimes wins over other (gratis or paid) services also in terms of quality and amount of mapped data. This happens whenever the local interest for good mapping is higher than the commercial interest of large external companies providing mapping services. Many small, possibly isolated communities (an example that was pointed out to me is that of Hella, in South Iceland) tend to be neglected by major vendors, as mapping them tends to be high-cost with very little or no return, while local mappers can do an excellent job driven by passion alone.

Why not use OSM

For the end user there are some equally obvious reasons why they should not, or cannot, use OpenStreetMap, the most important being, unsurprisingly, lack or low quality of the data.

Although the OSM situation has distinctly improved over time, it's quite evident that there are still huge areas where Google and other proprietary providers of mapping services have more detailed, higher quality data than OSM. Of course, in such areas OpenStreetMap cannot be considered a viable alternative to services such as Google Maps.

It should be noted however that the OSM data is not intrinsically ‘worse’ than the data available from proprietary sources such as Google. In fact, Google itself is well aware of the fact that the data they have is not perfect, which is why they have turned to asking users for help: the amount of manpower required to refine mapping data and keep it up-to-date is far from trivial, and this is precisely where large amounts of small contributions can give their best results.

(Of course, the point then is, who would you rather help refine and improve their data?)

Another important point to consider, as highlighted by the disclaimer on OSM's own website, is that their data should not be considered the be-all and end-all of worldwide mapping; there are use cases for which their data, as complete and detailed as it may be, should still not be used, as its reliability cannot be guaranteed, and it's in no way officially sanctioned. (Of course, similar disclaimers also apply to other map service providers, such as Google itself and MapQuest.)

Finally, there are types of data which OSM does not collect, because they are considered beyond the scope of the project: things such as Street View, or real-time information about public transport, or even the presence and distribution of wireless transmitters (for geolocation purposes). For these, OSM obviously can't be used, but this doesn't necessarily mean that Google is the only viable alternative. (More on this later.)

Why (and how to) contribute to OSM

There is a very simple, yet important reason to contribute to OpenStreetMap: the more people are involved, the more everyone benefits from the improvements in the amount and quality of the data, in sharp contrast to what happens when you donate your time and effort to a company that thereafter gains control of the data you provide. In other words, if you plan on spending time improving map data, it is advisable to do it for OpenStreetMap rather than for a proprietary provider such as Google.

Moreover, contributing nowadays is much simpler than it was in the past, both because of the much more extensive amount of data already available (yes, this makes contributing easier) and because the tools needed to actually provide new data or improve the existing data are more widely available and easier to use.

I first looked into OpenStreetMap around 2008 or 2009, at a time when the state of the database was still abysmal (in my whereabouts as in most of the rest of the world). Contributing also required nontrivial amounts of time and resources: it required a GPS device satisfying some specific conditions in terms of interoperability and functionality, and the use of tools that were anything but refined and easy to use. I gave up.

Things now are much different: if you are in the northern hemisphere (or at least one of the ‘western’ countries), chances are that most of your whereabouts have already been mapped to a high level of detail, so that your efforts can be more focused and integrated. Moreover, dedicated tools such as JOSM or even in-browser editors are available and (relatively) user-friendly (considering the task at hand). Finally, data is much easier to collect, with GPS receivers built in most common smartphones and numerous applications specifically designed to assist in mapping.

Indeed, while trying out the aforementioned OsmAnd to see how viable a navigation app it would be, I found a couple of places in my whereabouts where the data was not accurate (e.g. roundabouts not marked as such) or out of date (former crossings recently turned into roundabouts). This is what finally got me into OSM contribution, as fixing things turned out to be quite easy when starting from the data already present.

There are a number of ways to contribute to OpenStreetMap, with varying degrees of required technological prowess, time investment and relevance of the changes.

The simplest way to contribute to OSM, Notes, has been introduced quite recently; in contrast to other methods it doesn't even require an account on OSM, although having one (and logging in) is still recommended.

The purpose of Notes is to leave a marker reporting a problem with a specific location on the map, such as missing or wrong data (e.g. a one-way street not marked as such, or marked with the opposite direction). Notes are free-form contributions that are not an integral part of the actual map data. Rather, more experienced mappers can use Notes to make the necessary changes to the data, thereby ‘closing’ the Note (for example, fixing the one-way direction of the street).

Notes are a powerful feature since they allow even the less experienced users to contribute to OSM, although of course manual intervention is still needed so that the additional information can be merged with the rest of the data.

Any other contribution to OSM requires an account registered with the site, and the use of an editor to change or add to the actual map data. The website itself offers an online editor (two of them, actually), which can be practical for some quick changes; more sophisticated processing, on the other hand, is better done with external editors such as the aforementioned JOSM.

The simplest change that can be done to map data is the addition or correction of information about Points of Interest (POIs): bars and restaurants, hotels, stations, public toilets, newsstands, anything that can be of interest or useful to residents and tourists alike.

POIs are marked using tags, key-value combinations that describe both the kind of Point and any specific information that might be relevant. For example, amenity=restaurant is used to tag a restaurant, and additional tags may be used to specify the type of cooking available, or the opening hours of the business.

Tagging is almost free-form, in the sense that mappers are free to choose keys and values as they prefer, although a number of conventions are used throughout the map: such common coding is what allows software to identify places and present them to the end-user as appropriate. Most editors come with pre-configured tag sets, allowing less experienced users to mark POIs without detailed knowledge of the tag conventions.

In fact, tags are used everywhere around OSM, since the spatial data itself only comes in two forms: points, that mark individual locations, and ‘ways’, ordered collections of points that can mark anything from a road to a building, so that tags are essential to distinguish the many uses of these fundamental types5.
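To give an idea of what this looks like in the raw data, here's what a hypothetical restaurant node might look like in OSM's XML representation (id, coordinates and names are made up):

<node id="123456789" lat="37.5022" lon="15.0873">
  <tag k="amenity" v="restaurant"/>
  <tag k="name" v="Trattoria Esempio"/>
  <tag k="cuisine" v="regional"/>
  <tag k="opening_hours" v="Mo-Sa 12:00-15:00,19:00-23:00"/>
</node>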

Contributing to the insertion and improvement of POIs matters most in areas where most of the basic information (roads, mostly) has already been mapped.

In less fortunate places, where this information is missing, the best way to contribute is to roll up your sleeves and start mapping. This can be done in two ways.

The preferred way is to get ‘on the ground’ with some kind of GPS receiver (nowadays, most smartphones will do the job nicely) and some way to record your position over time, as you walk or drive around. The GPS tracks thus collected can then be imported into an OSM-capable editor, cleaned up, tagged appropriately and uploaded to OpenStreetMap.
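GPS tracks are typically exchanged in the GPX format, which most OSM editors can import directly; a minimal example (with invented coordinates) looks like this:

<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <trkseg>
      <trkpt lat="37.5021" lon="15.0870"><time>2013-07-01T10:00:00Z</time></trkpt>
      <trkpt lat="37.5024" lon="15.0875"><time>2013-07-01T10:00:10Z</time></trkpt>
    </trkseg>
  </trk>
</gpx>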

Lacking such a possibility, one can still resort to ‘armchair mapping’, tracing satellite or aerial imagery for which this kind of usage has been allowed (e.g. that by Yahoo and Microsoft). Of course, the information thus traced is more likely to be inaccurate, for example because of incorrect geolocation of the imagery, or because the imagery is simply out of date. Such an approach should therefore only be chosen as a last resort.

Who should contribute to OSM

The obvious answer to such a question would be ‘everybody’, although there are quite a number of possible objections.

For example, an interesting paradox about OSM is that the ones better suited to generate the data are not necessarily those that would actually benefit from it: locals have the best ground knowledge about their whereabouts, but exactly because of this they are also the least likely to need it from OSM.

This is where the reciprocity in the benefits of using OpenStreetMap comes into play: with everyone taking care of ‘their curb’, users benefit from each other's contributions.

Of course, there are some parties, such as local administrations and tourism boards, for which accurate mapping is beneficial per se; yet, there aren't many cases in which they are directly involved in the improvement of OSM data. While this may seem surprising, there are many possible explanations for this lack of involvement.

There are, of course, legal reasons: aside from possible licensing issues that the administrations would have to sort out (due to the liberal licensing of OSM data), there is also the risk that an involvement of the administrations could somehow be misrepresented as an official sanctioning of the actual data, a dangerous connotation for content which still maintains a high degree of volatility due to possible third party intervention6.

There is also the issue that knowledge of the existence of OpenStreetMap is not particularly widespread; as such, there is no strong motivation to get involved. (This, of course, is easily solved by spreading the word.) What's worse, even when the existence of OSM is known, the project is easily dismissed as an amateurish knock-off of more serious services such as Google Maps.

As an aside, the latter problem is not unique to OSM, and is shared by many open projects in their infancy7 —think e.g. how the perception of Linux has changed over the years. The problem is that this triggers a vicious circle: the less complete OpenStreetMap is, the less it's taken seriously; the less it's taken seriously, the less people contribute to it, making it harder to complete.

This is another reason why every contribution counts: the need to break out of the vicious circle, reach a critical mass such that people will consider it normal to look things up in OpenStreetMap (rather than on other, proprietary services) and eventually fix or augment it as appropriate.

The easiest way to start getting involved is with the addition of Points Of Interest that are personally ‘of interest’. You have a preference for Bitcoins? Help map commercial venues that accept them. You have kids? Help map baby-friendly restaurants and food courts. Are you passionate about Fair trade? Guess what you can help mapping. You get tired easily while walking around? Map the benches. Did you just book a few nights in a hotel which is missing from the map? Add it.

And most of all, spread the word. Get people involved.

Other open map-related services

OpenStreetMap has a rather specific objective, which excludes a number of map-related information. For example, OSM neither provides nor collects street-level imagery, and thus cannot replace Street View. It also doesn't provide or collect information about wireless transmitters, and thus cannot be used for wireless geolocation. Nor does it provide or collect real-time information about traffic or public transport, and thus cannot be used for adaptive routing.

As such, OSM cannot be considered an integral replacement for Google Maps (or other non-open mapping services), even when the actual ground map data is on par or even superior (yes, it happens). This is where other services —similarly open, and often integrated with OSM itself— can be of aid, although their current status and quality is often significantly inferior both to those of OpenStreetMap itself and (of course) to the proprietary solutions.

Wireless geolocation

For wireless geolocation, there are actually a number of different solutions available. The largest WiFi mapping project (WiGLE) provides data free of charge, but under a very restrictive license, and thus cannot be considered open by any standard, so we will skip over that.

OpenBMap, active since 2009, can be considered open by most standards: it provides client and server software under the GPL, and it provides the collected data (both raw and processed) under the same Open Database License as OpenStreetMap.

At the time of writing, the OpenBMap database is not very strong (fewer than 900K data points are present in the processed files that can be downloaded from the website). In itself, this is an issue that is easily remedied, since data gathering for wireless networks is trivial (compared to e.g. ground mapping) and can be fully automated: improving the database is therefore just a matter of spreading the word and having more people contribute.

However driven by the best intentions the project may be, contributions to it are held back by an overall amateurish presentation, both at the website level (the aesthetics and layout could use some refinement) and at the software level: albeit open source, its development is not managed as openly as it could be8, which makes collaboration harder.

A more recent project is OpenWLANmap. This project also provides open source software for wireless geolocation, but the openness of its database is more dubious. The license is not clearly indicated anywhere on the website (although the GNU Free Documentation License is distributed with the database downloads), and the downloadable database is only a subset of the data used by the website (about 1.6M of more than 5M data points), raising the question of what happens to the user-submitted data. In this sense, its openness could be challenged until these issues are resolved.

Yet another similar project is Geomena, started more or less at the same time as OpenBMap, but with a different licensing (Creative Commons instead of the ODBL). This project seems to be particularly focused on presenting an easy-to-use API both for querying and for contributing to the database. However, quite a few links are broken and the project doesn't seem to have moved much forward both in terms of application development and in terms of database growth (at the time of writing, just about 25K access points are claimed to be available, and the link to download the data is not functioning).

This fragmentation, with application bits and partial databases spread out across different projects, none of which manages to provide a complete, well-organized, functional solution, is probably the most detrimental situation we could have. Getting these projects together, sharing the database as well as the efforts to provide accessible data and applications, would be beneficial to all.

Street-level imagery

While gathering information about wireless networks can be trivially automated thanks to the widespread diffusion of smartphones and similar devices, the kind of street-level imagery that would be useful to provide an open alternative to Google Street View is quite laborious to take without specialized hardware. Photo stitching applications and camera software with automatic support for 360° photography can come in handy, but having to do this manually every few meters remains a daunting task.

Additionally, pictures taken may need to be cleaned up by masking out sensitive data such as faces, car license plates or whatever else might need masking depending on where the photo was taken.

These are probably —currently— the most significant obstacles to the creation of a competitive StreetView alternative. Despite them, a few projects that try to provide street-level imagery have been born more or less recently.

We have for example the most obviously-named OpenStreetView, with strong ties to OpenStreetMap and the aim to become a repository of open-licensed street-level imagery. Other projects, such as Geolocation.ws, use a different approach, acting as a mapping hub of open-licensed, geolocalized photos hosted by other services (Panoramio, Flickr, etc).

While the considerable lack of imagery and the difficulty in obtaining it are undoubtedly the biggest issues these projects face, the unrefined user interfaces and consequent reduced usefulness aren't exactly of assistance.

Real-time traffic and public transport

This is probably the data which is hardest (possibly impossible) to obtain without direct involvement of the interested parties. While routes, stops, and even expected timetables can be mapped and integrated into the standard OpenStreetMap database, real-time information such as actual bus departure times and route progress, or temporary issues such as strikes, abnormally high traffic conditions, or roadworks, are completely outside the scope of the OpenStreetMap data and impossible to maintain without a continuous stream of information coming from somewhere or someone.

For such volatile data, which can essentially only be provided by a central controller, openness is also in some ways less important. A more interesting subject in this area would be some form of common standard for accessing this data, in place of the plethora of proprietary, inhomogeneous APIs made available by the variety of transport systems throughout the world.

Still, it would be interesting if something was cooked up based on a principle similar to that used for Waze, the crowd-sourced crowd-avoidance navigation system recently acquired by Google9. In fact, it wouldn't be a bad idea if an open alternative to Waze was developed and distributed: enhancing it to include alternative transportation methods (on foot, by bus, by bicycle10) would have the potential of turning it into a viable tool, even surpassing Waze itself.

Heck, it would even be possible to use the last open-source version of the Waze source code as a starting point; of course, openness of the collected data would this time become a strong point in competing against the proprietary yet still crowd-sourced alternative, especially when combined with smart integration with OpenStreetMap and its flourishing ecosystem.

There's good potential there to set up the infrastructure for a powerful, open routing engine with real-time information providers. It just needs someone with the courage to undertake the task.

Some conclusions

There is little doubt, in my opinion, on the importance of open mapping data. The maturity reached by OpenStreetMap over the years is also an excellent example of how a well-focused, well-managed, open, collaborative project can achieve excellent results.

The power of crowd-sourcing is such that OSM has often reached, if not surpassed, the quality of proprietary mapping services, and this has become so evident that even these proprietary services are trying to co-opt their consumers into contributing to their closed, proprietary databases, disguising this racking up of free manpower as an opportunity for the volunteers to donate their time to somebody else's profit.

The biggest obstacle for OpenStreetMap, as with any collaborative project, is getting people involved. Improvements in technology have made participation much easier than it was at the project's inception, and the increasing amount and quality of base ground data also makes it much easier to get people interested, as the basic usability of the project is much higher (it's easier to get started by fixing small things here and there than by charting an unmapped area from scratch).

There is also other map-related data which is not collected by OpenStreetMap, since it is deemed outside its scope. While other projects have tried stepping in to cover those additional aims, their success so far has been considerably inferior, due sometimes to fragmentation and dispersal of efforts, sometimes to hard-to-overcome technical issues. It is to be hoped that in time these projects too will find a way to come together and break through just as OSM managed, disrupting our dependency on commercial vendors.


  1. You will notice an Italian tag for this article. Sorry, I couldn't miss the chance. ↩

  2. update (2013-07-10): apparently in the just-recently released version 7 of the app, offline maps feature has been ‘almost removed’: you can cache the area which is being shown with a completely non-obvious command (“OK maps”, who the fsck was the genius that came up with this), but the sophisticated management of offline maps that was present in earlier versions is gone. ↩

  3. this cannot be about computational power of the devices, since other applications offering offline routing for Android are available; this cannot be about the algorithm depending on some pre-computed optimal partial routes, since these could be downloaded together with the offline data. One possible explanation could be that offline routing is more likely to find less optimal routes due to lack of useful real-time information and limited computational power of the devices, and Google would rather offer an ‘all or nothing’ service: either you get the optimal routing computed on Google servers, or you don't get any routing at all —but this sounds stupid, since a simple warning that the route could not be optimal when offline would be sufficient. Another possible explanation is that offline routing prevents the kind of data gathering that Google is always so eager about; sure, Maps could still provide that information on the next chance to sync with Google, but that would make the data gathering obvious, whereas Google is always trying to be discreet about it. ↩

  4. while the application is open source, it is distributed for free only with reduced functionality, and only the paid version has all features built in, unless you compile the application yourself. ↩

  5. points and ways can additionally be collected in ‘relations’, which are used to represent complex information such as bus routes or collections of polygons that represent individual entities, but delving this deep into the more advanced features of OSM is off-topic for this article. ↩

  6. in other words, even if the data contributed by official sources could be considered official, against the OSM disclaimer, it could still be just as easily subject to editing by other users, and thus the metadata about who made changes and when would rise to a much higher importance than it has now; this could be solved with approaches such as ‘freezing’ this kind of official contribution, but that would require a change in the licensing terms, and go against the hallmark of OpenStreetMap, its openness. ↩

  7. and make no mistake, OpenStreetMap is still in its infancy, even though it has been running for almost 10 years now and is reasonably usable in large parts of the world. Mapping is a daunting task, and the amount of information that can still be collected is orders of magnitude higher than what has already been gathered. ↩

  8. the project seems to follow a strategy where the source code is released with (when not after) the public release of the applications, in contrast to the more open approach of keeping the entire development process public, to improve feedback and cooperation with external parties. ↩

  9. in fact, the acquisition of Waze by Google is another indication of how much Google values the opportunity (for them) to use crowd-sourced data gathering to improve the (online-only) routing services they offer to their consumers, thereby further locking in consumers into using their services. ↩

  10. of course, cyclists already have OpenCycleMap, the OpenStreetMap-based project dedicated to cycling routes. ↩

RaiTV, Silverlight and videos

Get the code for
UserJS/Greasemonkey script to use HTML5 instead of Silverlight on RaiTV:

gitweb
raitv.user.js
git
raitv.user.js

‘Mamma’ RAI, the public radio and television broadcaster in Italy, has entered the third millennium by making its programming available online on the RaiTV site, both as live streaming and as an archive.

When the RaiTV platform was built, the choice of distribution software fell on Silverlight, a product with which Microsoft aimed to supplant the pre-existing (and much better supported) Macromedia Flash.

The main characteristic of Silverlight is that it relies on the Microsoft .NET platform which, despite the alleged intentions, is not cross-platform: it is true that incomplete and imperfectly working implementations of .NET exist for Linux and Mac, but on both of these other platforms, what a coincidence, Silverlight (or its equivalents) do(es) not work well.

Without going into the possible motivations behind the choice of this platform, then, the net result is that the site is not (fully) usable without Windows. It's interesting to note that this lack of generality in the usability of the site is not a direct product (as in other cases) of the web monoculture I have found myself discussing on other occasions (the Silverlight plugin works in other browsers too, on Windows), but it is still tied to Microsoft's (less and less) dominant monopoly, outside the web as much as within it.

The fact remains that the official site of a public service is not (‘universally’) usable, and if this is always something to object to, it is particularly so when the famous RAI licence fee is well on its way to becoming a blanket tax, regardless not only of whether citizens want to watch RAI programming, but even of whether they can: for example, in a home like mine, with no TV sets and all computers running Linux, RAI broadcasts are simply inaccessible.

The mobile effect

That RaiTV's dependency on Silverlight is essentially spurious, at least as far as the archived videos are concerned, is already obvious just by looking at the code of the pages: the addresses of the videos, in various formats (WMV, MP4, H264), are in plain sight, often right in the head of the document.

If one wants, therefore, one can proceed ‘by hand’, opening the source of the document (which can be done in any browser), looking for the relevant videourl, and passing it to one's favourite program to finally watch it. Not exactly what you'd call an ‘accessible’ web.

Moreover, on the ‘parent’ site Rai.it, even a superficial study of the JavaScript code with which the pages react to certain requests (such as watching the latest TV news or listening to the latest radio news) immediately shows that the code in question does account for the possibility of Silverlight not being available, but only on mobile platforms, i.e. iOS (on Apple gadgets) or Android: if the browser claims to be on such a platform, the JavaScript code falls back to the audio and video tags introduced with HTML5 and available on these mobile platforms.

The question is therefore: why isn't this kind of access to the content made available on normal desktops as well? Why impose the use of Silverlight in these cases? The choice is questionable not only because of the closed nature of Microsoft's distribution platform, it is also senseless: if the content is there, why keep it hidden?

(Incidentally, one of the (alleged) benefits of Silverlight is the ability to adapt the video stream to the available bandwidth (lowering the quality to avoid stutters or interruptions in case of a “slow Internet”): paradoxically, this feature would be far more useful on mobile (where Silverlight generally cannot be used —I can't remember offhand whether Windows on mobile supports it) than on desktop, where one is generally connected through a (supposedly) working ADSL.)

It is also true that relying exclusively on modern HTML5 audio and video support would still cut out those who (for one reason or another) have an old browser that doesn't support these tags. The answer is simple, and comes from the possibility, offered by HTML5 multimedia support, of ‘falling back’ to other choices when the tags (or the formats!) are not supported by the browser.

The structure should therefore be the following: an audio or video tag with the appropriate multiple sources, so that each platform (desktop or mobile) can pick the most suitable one, with a fallback to the current situation: this would make the contents usable almost universally, with the added benefit of more uniform code, without the need to manually handle ‘special cases’ as is currently done through the JavaScript on the pages.
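Concretely, the markup could look something like the following sketch (the URLs are placeholders, and the object element stands in for the Silverlight embed currently used by the site):

<video controls>
  <!-- each platform picks the first source it can actually play -->
  <source src="http://example.com/video.mp4" type="video/mp4">
  <source src="http://example.com/video.wmv" type="video/x-ms-wmv">
  <!-- fallback for browsers without HTML5 video support:
       the current Silverlight object -->
  <object data="player.xap" type="application/x-silverlight-2">…</object>
</video>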

User JavaScript

Unless (and until) RAI has the good sense to apply the above suggestion, viewing the RAI sites on Linux (and Mac) requires manual intervention (hunting down the video URLs, to download or open them in external programs).

But we can do better, exploiting the possibility, offered by most modern browsers (Chrome and derivatives, Opera, Firefox and derivatives —the latter with the help of the GreaseMonkey extension), of letting user scripts manipulate the page.

This is the purpose of this user script, which acts (almost) exactly as suggested at the end of the previous section. Almost, because by personal choice the fallback to Silverlight is suppressed: the script thus takes it upon itself to replace the Silverlight object with the appropriate HTML5 multimedia tag, using the addresses advertised on the page itself.

With the current version of the script, not only the archived videos on RaiTV (at least in browsers that support the codecs used), but also some pages of the ‘parent’ site Rai.it should finally be usable without a hitch.

Happy viewing.

A horizontal layout

This is a pure CSS challenge: no JavaScript, no extra HTML

Webpages, mostly for historical reasons, are assumed to develop vertically: they have a natural horizontal extent that is limited by the viewport (screen, page) width, and an indefinite vertical extent realized through scrolling or pagination.

These assumptions on the natural development of the page are perfectly natural when dealing with standard, classic text layouts (for most modern scripts), where the lines develop horizontally, legibility is improved by limiting the line length and stacking of multiple, shorter lines is preferred to single long lines. In a context where it isn't easy to flow long texts into columns, the vertical development with constrained width is the optimal one.

However, these assumptions become instead an artificial constraint when columns with automatic text flow are possible, and this is exactly what this challenge is about: is it possible to achieve, purely with CSS, a layout that is vertically constrained to the viewport height, but can expand horizontally to fit its content?

The solution that doesn't work

Let's say, for example, that we have the following situation: there's a #page holding a #pageheader, #pagebody and #pagefooter; the #pagebody itself normally contains simply a #content div that holds the actual page content. We want the #pageheader to sit on top, the #pagefooter to sit at the bottom, and the #pagebody to sit in the middle, with a fixed height, and extending horizontally to arbitrary lengths.

In principle, this should be achievable with the following approach: set the #content to use columns with a fixed width (e.g., columns: 30em) and to have a fixed height (e.g., height: 100%), which lets the #content grow with as many columns as needed, without a vertical overflow: by not constraining the width of any of the #content parents, I would expect it to grow indefinitely. The CSS would look something like this:

html, body, #page {
    height: 100%;
    width: auto;
    max-width: none;
}

#content {
    width: auto;
    max-width: none;
    height: 100%;
    columns: 30em;
}

#pagebody {
    width: auto;
    max-width: none;
    height: 100%;
    padding-top: 4em;
    padding-bottom: 8em;
}

#pageheader {
    position: fixed;
    top: 0;
    width: 100%;
    height: 4em;
}

#pagefooter {
    position: fixed;
    bottom: 0;
    height: 8em;
}

Why (and how) doesn't this work? Because there is an implicit assumption that the content will always be limited by the viewport. By setting the html, body, #page and #pagebody heights to 100%, we do manage to tell the layout engine that we don't want the layout to extend arbitrarily in the vertical direction, but even by specifying width: auto and max-width: none, we cannot prevent the width from being limited by the viewport.

What happens with this solution is that the #content overflows its containers, instead of stretching their width arbitrarily (which is what would happen instead in the vertical direction): this is clearly visible when setting up ornaments (e.g. borders) for #page and/or #pagebody. While this results in a readable page, the aesthetics are seriously hindered.

The challenge proper

How do you ‘unclamp’ the HTML width from the viewport width, with pure CSS? How do you allow elements to stretch indefinitely in the horizontal direction?

A potential alternative

In fact, there is a potential alternative to the unclamped width, which relies on the experimental ‘paged’ media proposed by Opera: it is achievable, in the experimental builds of Opera supporting this feature, by setting #content { overflow-x: -o-paged-x } or #content { overflow-x: -o-paged-x-controls }.

In this case, the page would still be clamped to the viewport width (and in this sense this cannot be considered an actual solution to the challenge), but the content would be browsable with ease, and the aesthetics would not be spoiled by the mismanagement of the overflow.

An Opera Requiem? Part II

The news this time isn't as bad as the previous rumor. And this time it's news, not rumor: an official announcement has been made on the Opera Developer News that the Opera browser will switch to WebKit as its rendering engine and to V8 as its JavaScript engine.

Despite the nature of Opera as a minority browser (a very small userbase compared to other browsers, despite the ‘over 300 million users’ announced in the same article), this announcement has started making quite some noise on the web, with lots of people siding with or against the choice, for a variety of reasons.

Honestly, I think this is ‘bad news’, not so much for Opera itself, but for the (future, upcoming) ‘state of the web’ in general.

On its impact for Opera

Let's start from the beginning, and summarize a few things.

So far, Opera's rendering engine(s) (currently Presto) has been the fourth (third, before the surge of WebKit) major rendering engine, besides Trident (the one in Internet Explorer) and Gecko (the one in Mozilla Firefox). It was completely developed in-house, independently from the others, and it has long held the crown as the fastest, most lightweight and most standard-compliant renderer (I'll get back to this later).

Of course, Opera is not just its rendering engine: there are lots of user interface ideas that were pioneered in this browser (but made famous by other browsers), and the browser itself is actually a full-fledged Internet suite, including a mail and news client, an IRC client, a BitTorrent client, and so on and so forth. Amazingly, this has always been done while still keeping the program size and memory footprint surprisingly small.

With the announcement of the migration to WebKit as rendering engine, one of the things that used to make Opera unique (the speed, lightness and compliance of its rendering engine) disappears. This doesn't mean that the reason for Opera to exist suddenly ceases: in fact, its UI, keyboard shortcuts, development tools etc are the things that make me stick with Opera most of all.

Opera can still maintain (part of) its uniqueness, and even continue innovating, because of this. Not everybody agrees, of course, because the general impression is that by switching to WebKit Opera just becomes “another Chrome skin” (forgetting that WebKit was born from Konqueror's KHTML engine, and became WebKit after being cleaned up, refactored and re-released by Apple as Safari's rendering engine).

And in fact, its UI and tools will keep being for me a primary reason to stick to Opera.

While the argument “if Opera is just a skin over Chrome or Chromium, why would I choose to use that instead of the original?” is probably going to make Opera lose a bunch of its 300 million users, I feel that the attention is being kept on the wrong side of the coin, because it focuses specifically on the impact of the choice on Opera itself, which is not the major problem with the migration, as I will discuss later.

There are also a number of people that remark how the change is a smart move on Opera's part, because it means less development cost and higher website compatibility.

On its impact on the web

And this, which an astoundingly high number of people see as a good thing, is actually what is horribly bad about Opera's migration to WebKit. The “it's good” argument goes along the line that the reduction in the number of rendering engines is good for Opera itself (it breaks on fewer sites), and (more importantly) it's also good for all the web developers, who have to check their sites against one fewer rendering engine.

And this, which they see as good, is bad. Actually, it's worse than bad, it's horrible. It's the kind of lazy thinking that is fed by, and feeds, a vicious circle: monocultures (sorry, it's in Italian). It's the kind of lazy thinking that leads to swamping.

Unsurprisingly, this is exactly the same thing that IE developers warn about: now that they find themselves on the losing side of the (new) browser wars, they suddenly realize how standard compliance is a good thing, and how browser-specific (or engine-specific, in this case) development is bad. And they are right.

Monoculture, and the consequent swamping, is what happened when IE6 won the last browser war, and it is what will happen —again— now that WebKit is becoming the new de facto “standard implementation”. There are a number of websites that fail to work (at all or correctly) when not used with WebKit: this is currently due to their use of experimental, non standard features, but it's still problematic.

It's problematic because it violates one of the most important tenets of web development, which is graceful degradation. It's problematic because it encourages the use of a single, specific rendering engine. It's problematic because as this ‘convention’ spreads, other rendering engines will start to be completely ignored. It's problematic because getting out of monocultures is slow and painful, and falling into one should be prevented at all costs.

Some people think that the current situation cannot be compared with the IE6 horror because the new emerging monoculture is dominated by an open-source engine. And I have little doubt that an open-source monoculture is better than a closed-source one, but I also think that this barely affects the bad undersides of it being a monoculture.

Monocultural issues, take II

The biggest issue with monoculture is that it triggers a vicious circle of laziness.

This may seem paradoxical in the current context, because one of the reasons WebKit-specific websites choose WebKit is because it has lots of fancy, fuzzy, funny, interesting experimental features. People choose WebKit because it innovates, it introduces new things, it provides more power and flexibility! Which is true, as it was true about Trident when IE6 won the war.

But these experimental features don't always mature into properly-functioning, accepted standards. Building entire websites around them is dangerous, and wrong when no fallback is provided. (Unless the entire purpose of the website is to test those features, of course, but we're talking about production here.)
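Graceful degradation is usually not even hard to achieve: with CSS, for example, it's enough to declare a plain fallback before the experimental, prefixed forms, letting each engine pick the best form it understands. A made-up example:

.fancy {
  background: #468;                                     /* plain fallback */
  background: -webkit-linear-gradient(top, #468, #246); /* experimental, prefixed */
  background: linear-gradient(to bottom, #468, #246);   /* standard form */
}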

One of the complaints I've read about Opera is that the support of its Presto engine for the more recent CSS3 standard is lacking, as in it implements less of it than WebKit. This is quite probably true, but in my experience the compliance of the Presto engine to the parts of the standard that it actually implements is much better than any other rendering engine.

What is happening with WebKit is that hundreds of ‘new features’ are being poorly implemented, and their poor implementation is becoming the de facto standard. As people target WebKit, they only care about working around the WebKit bugs. This is bad for two reasons: one, it prevents other engines that don't have those bugs from presenting those pages correctly; two, it demotivates bugfixing in WebKit itself.

As I happen to use Opera as my primary browser, I tend to design for Presto first, and then check against WebKit (with Chromium) and Gecko (with Firefox) if the page still renders correctly. So far, whenever I've found unexpected rendering in either of the other browsers, it has been because those browsers violate the standard. In fact, checking the rendering of a webpage in other browser is still the best way to check where the bug lies. (Just an example of a bug in Chromium that I came across while working on this page.) Still now, when reporting bugs, the first question that is often posed is “how does this thing behave in other browsers?”

If all browsers use WebKit, there is suddenly no more motivation in fixing the bug, even when the standard claims a different behavior should be expected, and the different behavior is the sane one. If there is only one implementation, even if it's wrong and this creates a problem, it is quite likely that the bug will go unfixed, especially if fixing it will break sites that have adopted the wrong behavior as being the ‘good’ one.

The loss of Presto as a rendering engine is not something that should be greeted with relief. It's something that should be mourned with desperation, because it's a nail in the coffin of web development.

My hope, at this point, is that Opera will do something really helpful: fix all of the horribly messy non-compliant stuff that currently cripples WebKit. They have already started. Let's at least hope that this continues, and that their patches will make their way upstream; and even if they don't, let's at least hope that they will keep the fixes in their implementation.

(But if the switch is motivated (also) by the need to cut development resources —a hypothesis which is likely to be at least partially true— how much can we hope that this kind of contribution will continue?)

Transformer Infinity: an expensive toy

In late 2012 I decided it was finally time to gift myself a Transformer Infinity, a piece of hardware (or rather a class of hardware) I had had my eyes on for some time. After a couple of months of usage, I can finally start writing up my thoughts on its hardware and software.

This ‘review’ should be read keeping in mind that I'm not an ‘average user’: I'm not the ‘intended target’ for this class of devices, I'm a power user that prefers control to “kid-proof interfaces”, and I have some very specific needs for what I could consider the ultimate device. This is particularly important, and will be stressed again, in discussing software. And all of this will come together in the conclusions (there's also a TL;DR version for the lazy).

The hardware

One of the things, if not the thing, I like the most about the Transformer line is the brilliant idea of enhancing the practical form of the tablet with the possibility of converting it into a netbook: after all, one of the things that had been bothering me about the whole tablet concept was the impracticality of the on-screen keyboard, which steals screen estate to offer something on which typing for long periods is not exactly the most comfortable experience.

While it is possible to use external (typically Bluetooth) keyboards with other tablets, the simple yet brilliant idea in the Transformer is to have the keyboard as an integral (yet detachable, and separately bought) part of the tablet, additionally acting as a cover. The idea was rather poorly copied by the Microsoft Surface, which however misses some of the key points that make the Transformer so good, such as the fact that the TF keyboard is actually a full-fledged docking station, also offering an extra battery and additional connectors.

This feature was thus the major selling point of the TF while I was evaluating which tablet to get, once I had settled on getting one. The other factors that came into play were screen resolution and connectivity. The next competitor in line was Google's Nexus, which, while sporting a much better resolution than the TF700T, was missing any kind of external media support: the Transformer, in addition to a micro-SD slot on the tablet, also features a USB port on the docking station (yes, you can plug USB pen drives into it). Oh, and of course the fact that the 10" version of the Nexus was not actually available in my country also influenced the decision.

In the end I decided that the TF700T had a high enough resolution for my tastes. In fact, screen resolution is the reason why I finally got the Transformer Infinity rather than the PadFone, that other brilliant idea from Asus of having a smartphone that gets embedded in a tablet which is itself essentially like a Transformer (supporting the same keyboards/docking stations). If the tablet component of the PadFone didn't lag behind the actual Transformer line in terms of hardware, I would have definitely shelled out the extra euros to get it.

And while we're talking about screens: I'm among those who don't like the wide formats (16:9, 16:10) that are currently standard (if not the only option) for monitors; however, I do believe that these formats are a good idea, compared to the 4:3 ratio of auld times and modern iPads, when it comes to tablets.

Indeed, the most annoying part about widescreen monitors (for computers) is that a lot of the available screen estate is wasted for many common usages (essentially everything that revolves around text), and while they do come in handy with multiple text windows side by side or when forced to read long-ass lines (such as some wide tables and the like), they are not really a good alternative to just a bigger, higher-resolution (4:3) display, as the widescreen allows reading longer text lines at the expense of the number of text lines. And like it or not, much of our computer usage (even if it's just social networking) still revolves around text.

However, what is annoying in computer monitors is actually a bonus point on tablets: since these devices are often used in portrait mode, with the longer dimension being kept vertical, their widescreen format is actually a long screen format, keeping more text lines in view and requiring less page-flipping. And it's not only about text: when reading full-page comics, the widescreen (longscreen) format actually wastes less screen estate, typically, than the 4:3 format, at least in my experience (but then again it might depend on what format the comics you read are in).

Stuff I don't like

Although I'm overall pretty satisfied with the Transformer Infinity hardware, there are a few things I don't like.

The first issue I have is with the glossy display. I have an issue with glossy displays in general, not just on this tablet: I hate them. I find it astounding that, after years of efforts to make computer monitors anti-glare, non-reflective and overall less straining for the eyes of long-term users (when computers meant office space and work), the last 10 years have seen a fallback to displays that can only be decently used in optimal lighting conditions.

And no, no IPS (plus or nonplus) or other trick is going to solve the problem of a horrendously reflective surface. It helps, but it doesn't solve the problem. Even a colleague of mine, a die-hard Mac fan, finally had to acknowledge that the purportedly unreflective glossy display in the latest MacBook Pros is still more straining than the crappiest matte display in suboptimal lighting conditions, i.e. almost always.

But sadly, matte displays don't seem to be an option on tablets, as far as I can see (ebook readers do have them, though). So regardless of how much it bothers me, there seems to be no alternative to watching myself in the mirror when looking at darker content. Seriously, can some manufacturer come out and offer matte displays, please? I'm even willing to spend a couple of extra euros for that (not 50, but up to 10 I would accept, even knowing that the process costs just a few cents per display).

The second thing I don't like about the Transformer Infinity is the connector. When I unpacked the TF700T (which I got before the docking keyboard), I was seriously pissed. What the heck Asus, I spend my days mocking iPad users for their ass proprietary connector and you play this dirty trick on me? Not cool.

Then I realized that the connector is actually the same connector that ties the tablet to the docking keyboard, and that a standard USB port would not have made sense. While the external USB port of the dock, its SD card slot and the keyboard could possibly all have been made accessible to the tablet by presenting the dock as an unpowered USB hub, it would have been impossible to also allow charging the pad from the dock, which is in fact one of the most useful features the Transformer has.

Tightly related to the connector issue is the power issue: although the other end of the power cable for the Transformer is a standard USB connector, you can't typically charge it from a standard power source, except for slow trickle charging with the device off or at least in standby, due to the power draw. Asus' own “wall wart” (the wall plug/USB adapter), on the other hand, detects the presence of the Transformer and can feed it more current (15V, 2A if I'm not mistaken), thereby allowing faster charging and making it possible to charge the device even while in use (very useful for bedside use at the end of the day, when the device battery is likely to be close to exhaustion).

I honestly wouldn't mind if the industry came up with a common standard that worked around the current limitations of USB, at least for device charging: even the N900, Nokia's best although now obsolescent phone, is a little picky about its power source, refusing to charge from some low-cost ‘universal’ chargers (it does work correctly with the wall wart that shipped with an HTC smartphone we bought a couple of years ago, so it does work with non-Nokia chargers).

Finally, not really necessary, but a bonus point for the docking keyboard would have been an Ethernet port: Asus themselves have found a very smart way to keep the port ‘thin’ when not in use, a solution that they use in their latest ultrabooks (or whatever you want to call thin, 11" laptops), and one that I believe could be employed in the docking keyboard of the Transformer as well. Its absence is not really a negative point (how often are you going to need a network cable with a device as portable as a tablet?), but it would have been a nice addition for its use in netbook form.

The software

The Transformer Infinity is an Android tablet. Most of what I'm going to say here is therefore about Android in general, except for the few things that have been ‘enhanced’ or otherwise changed by Asus.

Android, for me, is interesting because it's probably the first (successful) example of large-scale ‘macroscopic’ deployment of the Linux kernel beyond the ‘classic’ server or workstation use (which has only recently trickled down into domestic use with Ubuntu and related distributions). (By macroscopic I am here referring to systems with which the user interacts frequently, thereby excluding embedded systems —think of the many Linux-based ADSL modem/routers.)

While Android shares a very important part of its core with ‘classic’ Linux distributions (and even there, not quite, since the Linux kernel in Android is heavily modified, and it is only recently that its changes have started trickling upstream into the main Linux source), the userspace part of Android, and specifically the middleware —the software layer between the Linux kernel and the actual user applications— is completely different.

Because of this, Android is actually the first system that suddenly motivates the FSF's insistence on having the classic Linux systems be called GNU/Linux rather than simply Linux. On the other hand, the userspace in classic Linux systems is not just GNU (and it's not like the X server, or desktop environments such as KDE, are insignificant components), so isn't GNU/Linux still a little arrogant?

But I digress. The fact that Android is not a classic Linux distribution, however, is an important point, especially for someone like me, for reasons that I'm going to explain in the following.

Android, much like iOS, is an operating system designed for devices whose main target use is (interactive) consumption rather than production. Sure, there are applications available for both systems that can exploit the device features in creative ways, but even these are focused on personal entertainment more than anything else.

It's not like the operating systems actively prevent more sophisticated and heavy-duty usages: it's just that they don't particularly encourage them, since the usually limited hardware of the devices they run on wouldn't make productivity particularly comfortable.

After all, even I, a tinkerer and power user, finally bought the tablet having comic book reading in mind as its primary use (although admittedly 700€ is a little too much for just that).

For this intended target, Android is exceptionally well-designed. Thanks also to the very tight integration with the wide range of ‘cloud’ services offered by Google, it provides a very functional environment right from the start; and since it's “all in the cloud”, you don't even have to worry about synchronization among devices. All very fine and dandy —as long as you don't mind having all your data in the hands of a single huge company whose main interest is advertising.

As if selling your soul to Google wasn't enough, Asus adds some of its own, by keeping track of your device with ridiculously extreme precision, even when geolocation services are disabled. I would recommend not carrying your Asus-branded Android device when committing crimes (I don't actually know if this “phoning home” thing is Asus-specific or general for Android), but the feature could come in handy if somebody stole it.

Aside from these creepy aspects, as I was saying, Android is actually quite nice. The software choice so far has also been rather satisfactory: aside from the games that I bought from the Humble Indie Bundles, the must-have Simon Tatham's Puzzles collection (which I have everywhere) and a few others available for free (many with ads), the most important piece of software I took care of installing was Perfect Viewer, which I obviously used mostly as a comic book reader.

In fact, Perfect Viewer is an excellent example to introduce what I really hate about Android: control. Perfect Viewer has a very useful feature, which is the ability to access files on remote machines. Why is this feature useful? Because Android doesn't provide it by default.

This, in my opinion, is a horrible failure on the part of the operating system: it should be its duty, after all, to provide a unified method to access remote files, transparently available to all applications. Why should every application reimplement this basic functionality? The result: you can't peruse your home-server-stored media collection from VLC on Android, but you can peruse your graphic novel collection, because one application went the extra mile to implement the feature.

The failure of Android to actually provide a built-in method to access remote directories is particularly grave considering that there is no practical reason why this shouldn't be available: the kernel (Linux) is very apt at mounting remote filesystems with a variety of protocols, and Android itself is already designed to expose mount points transparently to applications (easily seen when making use of the (micro-)SD card slots available on devices that provide them).
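
To give a rough idea of how little is missing at the kernel level, here's a minimal sketch (plain C, with a made-up server, share, mount point and credentials) of what mounting a CIFS share boils down to via the mount(2) syscall. It requires root privileges and a kernel with CIFS support, which is precisely the kind of control stock Android withholds:

    /* Minimal sketch: mounting a remote CIFS share with the mount(2)
     * syscall. Share name, mount point and credentials are made up. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        if (mount("//homeserver/media",   /* remote share (hypothetical) */
                  "/mnt/media",           /* local mount point */
                  "cifs", 0,
                  "user=guest,password=") < 0) {
            perror("mount");
            return 1;
        }
        puts("share mounted");
        return 0;
    }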

So not providing the possibility to mount remote shares is actually a design choice that requires disabling features Linux can provide. And I find it interesting that the web offers a plethora of tutorials to guide people through the gimmicks necessary to make the feature available (gimmicks that include ‘rooting’ your device to gain complete control of it). I find this interesting because it shows that it's not just ‘power users’ like me that need this feature (unsurprisingly, as media collections on home servers are common, and growing in popularity, and tablets are good for media consumption —if you have a way to actually access the stupid media).

One is led to wonder why this feature is not available in Android's stock builds. Sadly, the only reason I can think of is that this forces people to unnecessarily use online services (such as —oh right— the ones offered by Google) that provide selective, targeted, and pricier alternatives to the widespread home server approach.

But I see this as a single instance of a more general problem with Android, i.e. the lack of control from the user. There's a lot in Android happening ‘behind the scenes’, and much of it is something which is intentionally hidden from the user, and which the user is actively prevented from operating on.

While the devices where Android runs are general purpose devices (like all computers), the operating system is designed to only allow exposure to selected features in selected ways. And even though it's not as bad as the competition (for example, in contrast to iOS, enabling “out of band” installations, i.e. installation of applications not downloaded from “official” channels like the Google Play Store, is a simple option in the settings), it's still a strong contribution to the war against general purpose computing (a topic on which I have a lot of things to say but for which I haven't yet found the time to patiently write them down).

Compare this with Maemo, the stock operating system of the N900 (ah, sorry, that's in Italian only for the time being): a full-fledged Debian-based Linux distribution; while the device is still very easy to operate, the underlying power and flexibility of the (GNU and more) classic Linux userspace remains accessible for those that want it. Maemo showed pretty clearly that you don't need to sacrifice power and flexibility to offer ease of use —unless that's what you actually want to do. And the fact that you do want to do that is for me an extremely negative sign.

There are efforts to make Android more power-user friendly, even without requiring hacks or rooting, the most significant probably being applications such as Irssi ConnectBot and the Terminal IDE. However, there's only so much they can do to work around some intrinsic, intentional deficiencies in the operating system.

Some conclusions

Ultimately, I was actually hoping to be able to exploit the convertible nature of the Transformer Infinity to make the device supplant my current ‘bedtime computer’, a Samsung N150 netbook that has been faithfully serving us since we bought it when we married.

At the hardware level, the TF700T could quite easily do it: it has the same screen size, with higher resolution (although the Samsung display does have the benefit of being matte), the keyboard is similarly sized, the battery (especially when docked) lasts longer, the CPU is better, the GPU is better, the webcam is better (and there are two of them), the amount of RAM is the same. There are some things in which the Transformer falls behind, such as having a smaller hard-disk, or less connectors, but these are things that don't normally have a weight in the usage I have for my netbook.

Where the Transformer falls really behind, though, is in the software space. The netbook has Windows preinstalled (and I'm keeping it that way for those rare emergencies where an actual Windows installation might be needed), but I have a nifty USB pendrive with a Linux distribution on it, which I boot from to have exactly the system that I need: all of my tools are there, it's configured to behave the way I want, and so on and so forth. On the TF700T, I can't boot from the USB pendrive (I'd have to prepare a new one anyway, because of the different hardware architecture, but that wouldn't be difficult). In fact, I wouldn't even need to, if only Android weren't such a crippled Linux.

It's no surprise that there are efforts underway to have both the Android and the more classical Linux userspaces at your fingertips, such as this one, which gives you both an Android and a Debian system running on the same (Android) kernel. It's not perfect (for example, it still relies heavily on the Android subsystem for keyboard handling, and the X server must be accessed ‘remotely’), but it's a step in the right direction.

My ideal system? A Dalvik virtual machine running on something like Maemo or MeeGo. There actually was a company (Myriad Group) working on something like this (what they call “Alien Dalvik”): not open source, though, nor universally accessible (it's for OEMs, apparently). Pity.

TL;DR

I like (with a couple of caveats) the hardware of the Transformer Infinity (TF700T) and its capability of becoming a netbook by adding the mobile dock. I wish the software (Android) were friendlier to power users, though, to better exploit this.

GPGPU: what it is, what it isn't, what it can be

A little bit of history

My first contact with GPGPU happened during my first post-doc, although for reasons totally unrelated to work: an open-source videogame (an implementation of the Settlers of Catan boardgame) I was giving small contributions to happened to have a header file which was (legitimately) copied over from the GPGPU programming project.

This was 2006, so the stuff at the time was extremely preliminary and not directly supported by the hardware manufacturers, but it did open my eyes to the possibility of using graphic cards, whose main development was geared towards hard-core gaming, for other computational purposes, and particularly scientific ones.

So where does GPGPU come from?

The term GPU (Graphics Processing Unit) emerged in the mid-90s to describe graphics cards and other video hardware with enough computational power to take care of the heavy-duty task of rendering complex, animated three-dimensional scenes in real time.

Initially, although GPUs were computationally more gifted than their predecessors, whose most complex task was blitting (combining rectangular pixel blocks with binary operators such as AND, OR or XOR), their computational power was limited to a set of operations which is nowadays known as the “fixed-function pipeline”.

The barebone essentials you need to render a three-dimensional scene are: a way to describe the geometry of the objects, a way to describe the position of the light(s), and a way to describe the position of the observer. Light and observer positions are little more than points in three-dimensional space (for the observer you also need to know which way is ‘up’ and what his field of view is, but those are details we don't particularly care about now), and geometries can be described by simple two-dimensional figures immersed in three-dimensional space: triangles, squares. Of course, since simple colors will not get you far, you also want to paint the inside of these triangles and squares with some given pictures (e.g. something that resembles cobblestone), a process that is called ‘texturing’.

Once you have the geometry (vertices), lights and observer, rendering the scene is just a matter of doing some mathematical operations on them, such as interpolation between vertices to draw lines, or projections (i.e. matrix/vector products) from the three-dimensional space onto the two-dimensional visual plane of the observer. Of course, this has to be done for every single triangle in the scene (and you can have hundreds, thousands, hundreds of thousands or even millions of triangles in a scene), every time the scene is rendered (which should be at least as often as the screen refreshes, so at least some 50, nowadays 60, times per second).
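
Just to give an idea of how simple the per-vertex math is, here's a minimal sketch in C of a 4×4 projection matrix applied to a vertex in homogeneous coordinates (the names are mine, not from any specific API); this is the kind of operation a GPU repeats for every vertex, every frame:

    /* One step of the geometry pipeline: multiply a 4x4 matrix by a
     * vertex in homogeneous coordinates (x, y, z, w). */
    typedef struct { float x, y, z, w; } vec4;

    vec4 project(const float m[4][4], vec4 v)
    {
        vec4 r;
        r.x = m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3]*v.w;
        r.y = m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3]*v.w;
        r.z = m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3]*v.w;
        r.w = m[3][0]*v.x + m[3][1]*v.y + m[3][2]*v.z + m[3][3]*v.w;
        return r;
    }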

Fixed-function pipelines in GPUs are therefore optimized for very simple mathematical operations, repeated millions (nowadays even billions) of times per second. But as powerful as you can get, there are limits to where simple triangles and a naive lighting model can get you: and this is why, by the end of the XX century, hardware support for shaders started popping up on GPUs.

Shaders are programs that can compute sophisticated lighting effects (of which shadows are only a small part). Since the effects that may be achieved with shaders are so varied, they cannot be implemented within the classic fixed-function pipeline: dedicated computational hardware that could execute these programs (called kernels) had to be introduced.

And suddenly, video cards were not fixed-function devices anymore, but had become programmable, even though still with limitations and peculiar behavior: shader kernels are programs that get executed on each vertex of the geometry, or on each pixel of the scene, and only a limited number of computational features were initially available, since the hardware was still designed for the kind of manipulation that would be of interest for 3D rendering.

However, with all their limitations, GPUs now had a very interesting feature: you could tell them to do a specific set of operations on each element of a set (vertex, pixel). The essence of parallel programming, with hardware designed specifically for it. So why not abuse this capability to do things which have nothing to do with three-dimensional scene rendering?

This is where GPGPU started, with some impressive (for the time) results. Of course, it was anything but trivial: you had to fake a scene to be rendered, pass it to the card, ask it to render the scene while manipulating the scene data with some shader kernels, and then get the resulting rendered scene back and interpret it as the result of the computation. Possible, but clumsy, so a number of libraries and other development tools started appearing (such as Brook) to make the task easier.

As usage of GPUs for non-graphical tasks spread, hardware manufacturers started to realize that there was an opportunity in making things easier for developers, and the true power of GPGPU was made available.

The first ‘real’ GPGPU solutions started appearing between 2006 and 2007, when the two (remaining) major GPU manufacturers (ATi —shortly after acquired by AMD— and NVIDIA) realized that with minimal effort it was possible to expose the shader cores of the GPU and make them available beyond the simple scope of scene rendering.

Although buffers, texture engines and shader cores were now made accessible outside of the rendering pipeline, their functional behavior was not altered significantly, and this has a significant impact on their optimal usage patterns, as well as on some behavioral peculiarities that inevitably arise when using GPUs as computing devices.

The GPU is (not) my co-processor

Before the Pentium, Intel (and compatible) CPUs had very limited (floating point) math capabilities, since they were deemed unnecessary for the common market. If you were a scientist or other white-collar worker that needed fast floating-point computations, you could however shell out money for an FPU (Floating-Point Unit), an auxiliary processor specialized in floating-point operations; these units were marked with a numerical code which was the same as the CPU's, except for the final digit, a 7 instead of a 6: so you would have the 8087 next to an 8086, or a 387 next to a 386; and by ‘next’ I mean physically next to it, because the socket where the FPU had to be inserted was typically adjacent to the socket of the CPU.

The FPU as a co-processor started disappearing with the 486, which had two variants, the higher-end of which (the 486DX) had an FPU integrated into the actual CPU. With the introduction of the Pentium, the FPU became a permanent component of the CPU, and it started evolving (it had remained essentially unchanged since its inception) to support the famous extended ‘multimedia’ instruction sets (MMX, 3DNow!, the various SSE generations, up until the latest AVX extension) of subsequent CPUs. (And by the way, the fact that the FPU and MMX functionalities were implemented in the same piece of hardware had a horrible impact on performance when you used both at the same time. But that's a different topic.)

One of the tenets of GPGPU (marketing) is that the GPU can be used as a co-processor of the CPU. However, there are some very important differences between a co-processor like the FPU, and the GPU.

First of all, the FPU was physically attached to the same bus as the CPU, and FPU instructions were part of the CPU instruction set: the CPU had to detect the FPU instructions and either pass control to the FPU or decode the instructions itself and then pass the decoded instruction to the FPU. Secondly, even though the FPU has a stack of registers, it doesn't have its own RAM.

By contrast, the GPU is more like a co-computer: it has its own RAM, and its own instruction set of which the CPU is completely unaware. The GPU is not controlled by the CPU directly, as it happens with a co-processor, but rather the software driver instructs the CPU to send the GPU specific bits which the GPU will interpret as more or less abstract commands such as “load this program (kernel)”, “copy this data”, “execute this other program (kernel)”.

Since all communication has to go through the PCI bus, naively using the GPU as a co-processor is extremely inefficient: most of the time would be spent just exchanging commands and data; this, in fact, was one of the reasons why the old GPGPU approach based on the graphics stack ended up consistently underperforming with respect to the expected GPU speed.

The most efficient use of the GPU is therefore as an external machine, communication with which should be limited to the bare minimum: upload as much data as possible at once, load all the programs, issue the programs in sequence, and don't get any (intermediate) data back until it's actually needed on the CPU. It's not just a matter of offloading heavy computations to the GPU: it's about using a separate, complex device for what it was designed for.
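
To make the pattern concrete, here's a hedged sketch using NVIDIA's CUDA runtime API (the two kernels are trivial ones made up for the example): a single upload, a chain of kernels whose intermediate results never leave the device, and a single download at the end:

    /* Sketch of the ‘co-computer’ usage pattern: upload once, chain
     * kernels on the device, download once. Compile with nvcc. */
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    __global__ void scale(float *d, int n)
    { int i = blockIdx.x*blockDim.x + threadIdx.x; if (i < n) d[i] *= 2.0f; }

    __global__ void offset(float *d, int n)
    { int i = blockIdx.x*blockDim.x + threadIdx.x; if (i < n) d[i] += 1.0f; }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *h = (float *)malloc(bytes), *d;
        for (int i = 0; i < n; ++i) h[i] = (float)i;

        cudaMalloc(&d, bytes);
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  /* upload once */

        scale<<<(n + 255)/256, 256>>>(d, n);   /* intermediate results... */
        offset<<<(n + 255)/256, 256>>>(d, n);  /* ...stay on the device */

        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  /* download once */
        printf("h[1] = %f\n", h[1]);  /* 1*2 + 1 = 3 */
        cudaFree(d); free(h);
        return 0;
    }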

How much faster is a GPU?

When the GPGPU craze found its way into marketing (especially with NVIDIA's push for their new CUDA technology), GPUs were touted as cheap high-performance co-processors that would allow programs to reach speed-ups of two orders of magnitude (over a hundred times faster!), and a large collection of examples showcasing these incredible benefits started coming up. The orders of magnitude of speed-up even became almost the only topic of the first published ‘research’ papers on the subject.

Although such incredible speed-ups are quite possible when using GPUs, the reality is rather more complex, and a non-negligible part of these speed-ups is actually achievable even on standard CPUs. To understand in more detail what practical speed-ups can be expected, we have to look at the fundamental areas where GPUs perform (potentially) much better than CPUs (computational power and memory bandwidth), and the conditions under which this better performance can actually be achieved.

Faster memory (?)

Let us look at memory first. It's undeniably true that GPU memory is designed to have a much higher bandwidth than the RAM normally mounted on the computer motherboard (hereafter referred to as ‘CPU memory’): in 2007, when the GPGPU started being officially supported by hardware manufacturers, GPUs' memory had peak theoretical bandwidths ranging from 6.4 GB/s (on low-end GPUs using DDR2 chips for memory) to over 100 GB/s (on high-end cards using GDDR3 chips). By comparison, CPUs usually had DDR2 chips, whose performance ranges from 3.2 GB/s to 8.5 GB/s. Now (2012) GPUs can reach bandwidths of almost 200 GB/s with GDDR5 memory, whereas the best CPUs can hope for is less than 20 GB/s on DDR3.

Since the bandwidth is almost consistently an order of magnitude higher on GPUs than on CPUs, one should expect an order of magnitude in speed-up for a problem that is memory-bound (does a lot of memory access and very little computations), assuming it can get close to the theoretical bandwidth peak and assuming the data is already on the device.

We'll talk about the problem of getting data onto the device later on, but we can already mention a few things about reaching the peak bandwidth, without getting into too much detail.

The flip side of the higher bandwidth on GPUs is latency. While CPU to (uncached) RAM access latency is usually less than 100 ns, on GPUs it is 3 to 5 times higher; and the first GPUs had no cache to speak of (except for textures, but that's a different matter, since textures also have lower bandwidth). Of course, GPUs have specific methods to cover this high latency: after all, a GPU is optimized for moving large slabs of data around, as long as such data is organized ‘appropriately’, and memory accesses are designed accordingly.

Therefore, memory-bound GPU algorithms have to be designed in such a way that they make as much use as possible of these latency reduction techniques (coalescing on NVIDIA GPUs, fastpath usage on AMD GPUs), lest their performance drop from being 10 times faster than the CPU to being no more than 2 or 3 times faster. These remarks are particularly important for the implementation of scatter/gather or sorting algorithms.
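
A sketch of what this means in practice, with two CUDA kernels doing the same copy: in the first, consecutive threads touch consecutive addresses, which the memory controller can serve with a few wide transactions; the second uses a deliberately strided indexing of the kind found in bandwidth benchmarks, and on most hardware it runs several times slower:

    /* Coalesced copy: thread i reads and writes element i. */
    __global__ void copy_coalesced(float *dst, const float *src, int n)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n) dst[i] = src[i];
    }

    /* Strided copy: consecutive threads touch addresses ‘stride’
     * elements apart (benchmark-style; not a useful copy per se). */
    __global__ void copy_strided(float *dst, const float *src, int n, int stride)
    {
        long i = ((long)(blockIdx.x*blockDim.x + threadIdx.x) * stride) % n;
        dst[i] = src[i];
    }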

Faster computing (?)

Of course, where GPUs really shine is not in juggling data around, but in doing actual computations on it: gamer GPUs passed the (theoretical) teraFLOPS barrier in 2008 (Radeon HD 4850), when the best (desktop) CPU of the time fell short of some theoretical 60 gigaFLOPS, and the most common ones couldn't dream of achieving half as much.

But from 20 gigaFLOPS to 1 teraFLOPS there's only a factor of 50: so where do the claimed two orders of magnitude in speed-up come from? Unsurprisingly, the difference comes from a consistent underutilization of CPUs. We'll leave that aside for the moment, though, and focus instead on the impressive (theoretical) performance sported by GPUs.

The first thing that should be mentioned about GPUs is that they are not designed to be fast in the sense that CPUs are fast. For years, CPU performance was strongly tied to the frequency at which the processor operates, with a theoretical upper limit of one instruction per cycle, which would mean that a CPU running at 1 GHz couldn't do more than one (short-scale) billion operations per second. These days, the fastest desktop CPUs run at over 3 GHz, while the fastest GPUs have computing clocks of about 1 GHz, or even less.

However, GPUs are designed for massively parallel tasks, such as running a specific sequence of instructions on each element of a set of vertices or pixels, with each element being processed independently (or almost independently) from the other. The shaders in GPUs are made up by a large number of processing elements collected in multiprocessors, with each multiprocessor capable of executing the same single instruction (sequence) on a large number of elements at once.

In some sense, GPUs can be seen as a collection of SIMD (Single Instruction, Multiple Data) processors (typically 10 or more), each with a very wide vector width (typically 32 for NVIDIA, 64 for AMD); while modern CPUs are also SIMD-capable, with their MMX and SSE instructions, and can also sport multiple cores, they have fewer SIMD lanes (typically 4 or 8) and fewer cores (2 to 6) than GPUs.

The GPU programming tools and languages expose this massively parallel computing capability, and make it very easy to exploit. The simplest GPU programs consist of a kernel, i.e. a sequence of instructions to be executed (typically) on a single element of a set, which is handed to the GPU to be run on all the elements of a given set.
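
The canonical minimal example is SAXPY (y = a*x + y); here's a sketch in CUDA, where each GPU thread handles exactly one element of the set:

    /* SAXPY: each thread computes one element of y = a*x + y. */
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n)              /* guard the last, partial block */
            y[i] = a*x[i] + y[i];
    }

    /* launched on all n elements with something like:
     *   saxpy<<<(n + 255)/256, 256>>>(n, 2.0f, d_x, d_y); */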

By contrast, exploiting the vector capabilities of modern CPUs and their multiple cores requires complex programming techniques, special instructions which are barely more abstract than their hardware counterparts, and complex interactions with the operating system to distribute multiple threads across the cores.
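
For contrast, a sketch of the same SAXPY written with SSE intrinsics for the CPU: the programmer explicitly deals with 4-wide registers and the leftover elements, and this is still single-threaded (spreading it across cores would add a further layer of complexity):

    /* SAXPY with SSE intrinsics: 4 floats per instruction, plus an
     * explicit scalar tail for the leftover elements. */
    #include <xmmintrin.h>

    void saxpy_sse(int n, float a, const float *x, float *y)
    {
        __m128 va = _mm_set1_ps(a);
        int i;
        for (i = 0; i + 4 <= n; i += 4) {
            __m128 vx = _mm_loadu_ps(x + i);
            __m128 vy = _mm_loadu_ps(y + i);
            _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
        }
        for (; i < n; ++i)  /* scalar tail */
            y[i] = a*x[i] + y[i];
    }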

In other words, it's much easier to exploit the massively parallel nature of the GPUs than it is to exploit the available parallel computing capabilities of the CPUs. And this is where the two orders of magnitude in performance difference come from: CPUs are rarely used as more than single-core scalar processors.

Still, even when comparing well-designed CPU programs with well-designed GPU programs it's not surprising to see minutes of runtime be reduced to seconds. If you see less than that, you're probably doing something wrong, and if you're seeing much more than that, your CPU program is probably far from being optimal.

The question then becomes: how hard is it to write a well-designed GPU program as opposed to a well-designed CPU program? But this is a question I'll leave for later. For the time being, let's just leave it at: non-trivial.

Up and down (loads)

As previously mentioned, GPUs normally have their own memory, separate from the system memory (this is not true for integrated GPUs, but they deserve a separate paragraph). Therefore, using the GPU involves transferring data to it, and then retrieving the results when the kernel(s) have finished their operation.

The time spent uploading data to the GPU and downloading data from it is not necessarily insignificant: through a PCI express 2.0 ×16 link you can hope for an 8 GB/s transfer rate, but 5 or 6 GB/s are more likely; and this is pretty close to being the top of the line. When compared to the GPU or even the CPU memory bandwidth, this can be a very significant bottleneck.

This, combined with the relatively small amount of memory available on GPUs (less than a gigabyte when GPGPU started, slightly over a gigabyte four years later), poses an interesting paradox about the convenience of GPUs.

On the one hand, GPUs are most convenient when processing large amounts of data in parallel: this ensures, together with well-designed algorithms, that the GPU hardware is used full-scale for an adequate amount of time.

On the other hand, there's a limit to the amount of data you can load at once on the GPU: desktops today are commonly equipped with 4 gigabytes of RAM or more (dedicated workstations or servers can easily go into the tens of gigabytes), so they can typically hold larger amounts of data. The only way to process this on standard desktop GPUs is to do it in chunks, which means uploading the first chunk, processing it, downloading the result, uploading the new chunk, and so on.

Luckily enough, GPUs are typically equipped with asynchronous copy engines, so the situation is not as dreary as it would otherwise be. In many cases, especially with modern devices, it is possible to overlap computations and host/device data transfers so as to hide the overhead of the data exchange. This, in fact, is one of the many ways in which GPGPU programming can become complex when optimal performance is sought.
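
A hedged sketch of what this looks like with two CUDA streams; the kernel is a made-up placeholder, and the host buffers are assumed to be allocated with cudaMallocHost, since the asynchronous copies silently degrade to synchronous ones on ordinary pageable memory:

    /* Chunked processing with copy/compute overlap on two streams:
     * while one chunk is being transferred, the other is processed. */
    __global__ void process(float *d, int n)
    { int i = blockIdx.x*blockDim.x + threadIdx.x; if (i < n) d[i] *= 2.0f; }

    void run_chunked(float *h_in, float *h_out, float *d_buf[2],
                     int nchunks, int chunk)
    {
        size_t bytes = chunk * sizeof(float);
        cudaStream_t s[2];
        cudaStreamCreate(&s[0]);
        cudaStreamCreate(&s[1]);

        for (int c = 0; c < nchunks; ++c) {
            int b = c % 2;   /* in-stream ordering protects d_buf[b] */
            cudaMemcpyAsync(d_buf[b], h_in + (size_t)c*chunk, bytes,
                            cudaMemcpyHostToDevice, s[b]);
            process<<<(chunk + 255)/256, 256, 0, s[b]>>>(d_buf[b], chunk);
            cudaMemcpyAsync(h_out + (size_t)c*chunk, d_buf[b], bytes,
                            cudaMemcpyDeviceToHost, s[b]);
        }
        cudaDeviceSynchronize();
        cudaStreamDestroy(s[0]);
        cudaStreamDestroy(s[1]);
    }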

Is it still worth it?

Even if two orders of magnitude may not be achievable without extensive programming efforts to produce extremely optimized code, the single order of magnitude one can get for most trivially parallelizable problems is most often worth the time necessary to reimplement computationally-heavy code for GPGPU.

One of the most interesting features of the shared-memory parallel programming approach needed for GPGPU is that, when it can be employed, it's a much more future-proof way of coding than serial execution. The reason is clear: while serial execution can only improve by using faster processors (and there are physical upper limits, which are getting closer by the year, to how fast a scalar processor can go), parallel algorithms can get faster by ‘just’ adding more computational units. In theory, a perfectly parallel algorithm will take half the time to run on twice the cores, and while reality is less ideal, the performance gain is still quite perceptible.

The hidden benefits of GPGPU

In many ways, the most important contribution of GPGPU to the domain of computer science and software engineering has not been the actual performance benefits that a lot of fields have seen from the availability of cheap parallel computing platforms.

There's indeed a much longer-term benefit, one that will be reaped over the coming years, and it's precisely the shift from serial to parallel programming we just mentioned. Before GPGPU, parallel programming was left to the domain of expensive computational clusters and sophisticated programming techniques; GPGPU has shown that there are huge opportunities for shared-memory parallel programming even on the lower end of the market.

The reborn interest in parallel programming triggered by GPGPU is gradually leading to the development of an entirely new mentality both in terms of software development and hardware realities. Although it is still to be seen how it will ultimately pan out, there are significant signs that we're only starting to scratch the surface of technologies that can revolutionize computing to an extent that could only be compared with the effects of the commercialization of the Internet twenty years ago, and the introduction of the first microcomputers twenty years before that.

GPGPU is bleeding out of the GPU market, in an interesting combination of paradoxical feedbacks and returns. There are people that have implemented ray-tracing using GPGPU: the GPUs go back to their intended task, but through features that were designed to make them usable outside of their domain. At the same time, CPUs gain more powerful parallel computing features, and integrated CPU/GPU solutions bring the GPU more in line with the standard co-processor role marketing wanted to sell GPGPU for.

We are starting to see a convergence in technology. At this point, the only danger to the rich potential of this dawning era is the petty commercial interest of companies that would rather see the market collapse under fragmentation than prosper without their dominance.

Let us hope that this won't happen.

Linux and the desktop

Discussing this after the recent debate that involved big names such as Linus Torvalds and Miguel de Icaza may seem a little inappropriate, but I guess I'll have to count this against my usual laziness in writing things up when I think of them, instead of waiting for them to become a fashion.

Introduction

The reason why the topic has recently re-emerged (as it periodically does) has been a write-up by the afore-mentioned Miguel de Icaza, titled What killed Linux on the desktop.

According to the author of the rant, there are two main reasons. The first was the general disrespect for backwards compatibility, in the name of some mystical code purity or code elegance:

We deprecated APIs, because there was a better way. We removed functionality because "that approach is broken", for degrees of broken from "it is a security hole" all the way to "it does not conform to the new style we are using".

We replaced core subsystems in the operating system, with poor transitions paths. We introduced compatibility layers that were not really compatible, nor were they maintained.

paired with a dismissive attitude towards those for whom this interface breakage caused problems. The second problem has been, still according to de Icaza, incompatibility between distributions.

Miguel de Icaza then compares the Linux failure with the success of Apple and its operating system, which apparently did things “the way they should have been done” (my words), highlighting how Mac OSX is a UNIX system where things (audio, video, support for typical content formats) work.

Karma whoring

There's little doubt that “Linux on the desktop” is a hot topic prone to easy polarization and bandwagoning. By mentioning three of the most famous problematic areas in the most widely adopted Linux distribution(s) (audio support, video support and support for proprietary audio and video formats), de Icaza basically offered the best bait for “me-too”-ing around the Internet.

And unsurprisingly, this is precisely what happened: a lot of people came up with replies to the article (or references to it) containing little more than “yeah! exactly! these were exactly the problems I had with Linux!”.

An optimist could notice how many people have reacted this way, combine them with those that have reacted the opposite way (“BS, I never had a problem with Linux!”), and be happy about how large the pool of (desktop) Linux users and potential users is. On the other hand, the whole point of the article is to (try and) discuss the reasons why many of these are only potential users, why so many have been driven off Linux despite their attempts at switching over to it.

Linus, Linux and The CADT model

The first point of de Icaza's critique is nothing new. It's what Jamie Zawinski coined the term CADT, Cascade of Attention-Deficit Teenagers, for. However, the way in which de Icaza presents the issue has two significant problems.

One is his use of «we», a pronoun which is somehow supposed to refer to the entire Linux developer community; someone could see it as a diplomatic way of not coming out with the specific names and examples of developers and projects that break backwards compatibility every time (which would be ‘bad form’), while at the same time putting himself personally among the people that did so.

The other is how he tries to trace the dismissive attitude back to Linus Torvalds, who by position and charisma may be considered the one that «sets the tone for our community», assuming that Linus (and the kernel development community) feeling free to break the internal kernel interfaces even at minor releases somehow gives userspace developers the entitlement to do the same with external interfaces.

These two points have sparked a debate in which Linus himself (together with other important Linux personalities) intervened, a debate that has made the news. And the reason the debate sparked is that these two points are among the most critical issues indicating what's wrong with the article. Since in the debate I find myself in the opposite camp from Miguel de Icaza (and, as I found out later, mostly in Linus' camp), I'm going to discuss this in more detail, in a form that is more appropriate for an article than for the comments where I have been doing so thus far.

Kernel, middleware and user space

I'm going to start this explanation with a rough, inadequate but still essential description of the general structure of a modern operating system.

First of all, there's the kernel. The kernel is a piece of software that sits right on top of (and controls and receives signals from) the hardware. It abstracts the hardware from the rest of the operating system, and provides interfaces to allow other pieces of the operating system to interact with the hardware itself. Linux itself is properly only the kernel, which is why a lot of people (especially the GNU guys) insist on calling the whole system GNU/Linux instead; after all, even Android uses the Linux kernel: it's everything else that is different.

By applications one usually means the programs that are executed by the user: web browsers, email clients, photo manipulation programs, games, you name it. These user space applications, which are what users typically interact with, don't usually interact directly with the kernel: there's a rather thick layer of libraries and other programs that ease the communication between user space applications and the kernel. Allow me to call this layer ‘middleware’.

Example middleware in Linux and similar systems includes the first program launched by the kernel when it has finished loading (typically init), the C library (libc, in Linux often the one written by the GNU project) and the things that manage the graphical user interface, such as the X Window System (these days typically provided by the X.org server in Linux).

All the components of the various layers of the operating system must be able to communicate with each other. This happens through a set of interfaces, which are known as APIs (Application Programming Interfaces) and ABIs (Application Binary Interfaces), some of which are internal (for example, if a kernel module has to communicate with something else inside the kernel, it uses an internal kernel API) while others are external (for example, if the C library needs to communicate with the kernel, it does so using an external kernel API).

Interface stability and application development

Let's say that I'm writing a (user space) application: a photo manipulation program, an office suite, whatever. I'm going to develop it for a specific operating system, and it will be such a ‘killer app’ that everybody will switch to that operating system just for the sake of using my application.

My application will use the external interfaces from a number of middleware libraries and applications (for example, it may interface with the graphics system for visualization, and/or with the C library for file access). My application, on the other hand, does not care at all if the internal interfaces of the kernel, or of any middleware component, change. As long as the external interfaces are frozen, my application will run on any future version of the operating system.

A respectable operating system component never removes an interface: it adds new ones, it extends existing ones, but it never removes them. This allows old programs to run on newer versions of the operating system without problems. If the developers think of a better way to do things, they don't change the semantics of the current interface; rather, they add a new, similar interface (and maybe deprecate the old one). This is why Windows APIs have call names with suffixes such as Ex (for ‘extended’) and the like. This is why we still have the (unsafe) sprintf alongside the (safe) snprintf in the POSIX C library specification.
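
A minimal illustration in C (buffer size and string chosen arbitrarily):

    /* The old, unsafe interface stays; the safe one is added alongside. */
    #include <stdio.h>

    int main(void)
    {
        char buf[8];
        /* sprintf(buf, "%s", "far too long a string");  -- would overrun
         * buf: undefined behavior */
        snprintf(buf, sizeof buf, "%s", "far too long a string");
        puts(buf);  /* safely truncated: prints "far too" */
        return 0;
    }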

Let me take the opportunity to highlight two important things that come from this.

One: the stability of internal interfaces is more or less irrelevant as far as user space applications are concerned. On the other hand, stability of external interfaces is extremely important, to the point that it may be considered a necessary condition for the success of an operating system.

Two: it may be a little bit of a misnomer to talk about interface stability. It's perfectly fine to have interfaces grow by adding new methods. What's important is that no interface or method is removed. But we'll keep talking about stability, simply noting that growing interfaces are stable as long as they keep supporting ‘old style’ interactions.

Are Linux interfaces stable?

Miguel de Icaza's point is that one of the main reasons for the failure of Linux as a desktop operating system is that its interfaces are not stable. Since (as we mentioned briefly before) interface stability is a necessary condition for the success of an operating system, his reasoning may be correct (unstable interfaces imply unsuccessful operating system).

However, when we start looking at the stability of the interfaces in a Linux environment we see that de Icaza's rant is misguided at best and intellectually dishonest at worst.

The three core components of a Linux desktop are the kernel, the C library and the X Window System. And the external interfaces of each of these pieces of software are incredibly stable.

Linus Torvalds has always made a point of never breaking user space when changing the kernel. Although the internal kernel interfaces change at an incredible pace, the external interface is a prime example of backwards compatibility, sometimes to the point of stupidity. { Link to round table with Linus mentioning examples of interfaces that should never have been exposed or had issues, but were still kept because programs started relying on the broken behavior. }

A prime example of this interface stability is given by the much-critiqued sound support, which is an area where the Linux kernel has had some drastic changes over time. Sound support was initially implemented via the ironically-named Open Sound System, but this was —not much later— replaced by the completely different Advanced Linux Sound Architecture; yet OSS compatibility layers, interfaces and devices have been kept around ever since, to allow old applications using OSS to still run (and produce sound) on modern Linux versions.

This, by the way, explains why Linus was somewhat pissed off at de Icaza in the aforementioned debate: if developers in the Linux and open source worlds had to learn anything from Linus, it would have been to never break external interfaces.

Another good example of stability is given by the GNU C Library. Even though it has grown at an alarming pace, its interface has been essentially stable since the release of version 2, 15 years ago, and any application that links to libc6 has forward compatibility essentially guaranteed, modulo bugs (for example, the Flash player incorrectly used memcpy where it should have used memmove, and this resulted in audio problems in Flash movies when certain optimizations were made to the C library; this has since been fixed).
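
The distinction in miniature (the array content is made up): copying between overlapping regions is undefined with memcpy, well-defined with memmove:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char a[] = "abcdef";
        /* shift the string one position left: source and destination
         * overlap, so memmove is required */
        memmove(a, a + 1, strlen(a + 1) + 1);
        puts(a);  /* "bcdef" */
        /* memcpy(a, a + 1, ...) may or may not do the same, depending
         * on how the C library is optimized: exactly what bit Flash */
        return 0;
    }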

But the most amazing example of stability is the X Window System. This graphical user interface system is famous for having a client/server structure and for being network transparent: you can have ‘clients’ (applications) run on one computer and their user interface appear on another computer (where the X server is running). X clients and servers communicate using a protocol that is currently at version 11 (X11) and has been stable for 25 years.

The first release of the X11 protocol was in 1987, and an application that old would still play fine with an X11 server of today, even though, of course, it wouldn't be able to exploit any of the more advanced and sophisticated features that the servers and the protocol have been extended with. Heck, Linux didn't even exist 25 years ago, but X.org running on Linux would still be perfectly able to support an application written 25 years ago. How's that for stability?
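
As an illustration, here's a hedged sketch of a minimal X11 client that uses nothing beyond the core protocol as it stood in 1987: open the display, create and map a window, wait for a key press. It should compile (with -lX11) and run unchanged against a modern X.org server:

    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy) return 1;
        int scr = DefaultScreen(dpy);
        Window w = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                       0, 0, 200, 100, 1,
                                       BlackPixel(dpy, scr),
                                       WhitePixel(dpy, scr));
        XSelectInput(dpy, w, KeyPressMask);
        XMapWindow(dpy, w);
        for (;;) {              /* quit on any key press */
            XEvent ev;
            XNextEvent(dpy, &ev);
            if (ev.type == KeyPress) break;
        }
        XCloseDisplay(dpy);
        return 0;
    }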

If the three core components of a Linux desktop operating system have been so stable, why can Miguel de Icaza talk about “deprecating APIs”, “removing functionality” and “replacing core subsystems”, and still be right? The answer is that, of course, there have been some very high-profile cases where this has happened.

The prime example of such developer misbehavior is given by GNOME, a desktop environment, something that sits on top of the graphical subsystem of the operating system (X, in the Linux case) and provides a number of interfaces for applets and applications to present a uniform and consistent behavior and graphical appearance, and an integrated environment to operate in.

Applications can be written for a specific desktop environment (there are more than one available for Linux), and for this it's important for the desktop environment (DE, for short) to provide a stable interface. This has not been the case with GNOME. In fact, the mentioned CADT expression was invented specifically for the way GNOME was developed.

We can now start to see why Linus Torvalds was so pissed off at Miguel de Icaza in the mentioned debate: not only is the Linux kernel one of the primary examples of (external) interface stability, so that trying to trace CADT back to Linus is ridiculous, but GNOME, of which Miguel de Icaza himself has been a prominent figure for a long time, is the primary example of interface instability.

The «we» Miguel uses to refer to the open source and Linux community as a whole now suddenly sounds like an attempt to divert the blame for a misbehavior from the presenter of the argument himself to the entire community, a generalization that has no basis whatsoever, and that most of all cannot call on Linus as its exemplum.

Ubuntu the Breaker

Of course, the GNOME developer community is not the only one suffering from CADT, and in this Miguel is right. Another high-profile project that has had very low sensitivity to the problem of backwards compatibility in the name of “the new and the shiny” is Ubuntu.

This is particularly sad because Ubuntu started with excellent premises and promises: to become the Linux distribution for the ‘common user’, and hence the Linux distribution that could make Linux successful on the desktop. And for a few years it worked really hard, and with some success, in that direction.

But then something happened, and the purpose of Ubuntu stopped being to provide a solid desktop environment for the ‘common user’, and it started being the playground for trying exciting new stuff. However, the exciting new stuff was brought forward without solid transition paths from the ‘old and stable’, with limited if any backwards compatibility, and without any solidification process that would lead the exciting new stuff to actually be working before gaining widespread usage.

This, for example, is the way PulseAudio was brought in, breaking everybody's functioning audio systems, and plaguing Ubuntu (and hence Linux) with the infamous reputation of not having a working audio system (which it does have: ALSA). Similar things happened with other important subsystems, such as the alternatives (systemd and upstart) to the traditionally used System V init; then with the replacement of the GNOME desktop environment with the new Unity system; and finally with the ‘promise’ (or should we say threat) of an entirely new graphical stack, Wayland, to replace the ‘antiquated’ X Window System.

It's important to note that none of these components are essential to a Linux desktop system. But since they've been forced down the throat of every Ubuntu user, and since Ubuntu has already gained enough traction to be considered the Linux distribution, a lot of people project the abysmal instability of recent Ubuntu developments onto Linux itself. What promised to be the road for the success of Linux on the desktop became its worst enemy.

Common failures: getting inspiration on the wrong side of the pond

There's an interesting thing common to the people behind the two highest-profile failures in interface stability in the Linux world: their love for proprietary stuff.

Miguel de Icaza

Miguel de Icaza founded the GNOME project (about which we've said enough bad things for the moment), but also the Mono project, an attempt to create an open-source implementation of the .NET Framework.

His love for everything Microsoft has never been a mystery. Long before this recent rant, for example, he blamed Linus for not following Windows' example of a stable (internal) kernel ABI. At the time, this was not because it ‘set the wrong example’ for the rest of the community, but because it allegedly created actual problems for hardware manufacturers that didn't contribute open source drivers, thereby slowing down Linux adoption due to missing hardware support.

As you can see, the guy has a pet peeve about Linus and the instability of the kernel ABI. When history proved him wrong, with hardware nowadays gaining Linux support very quickly, often even before release, and most vendors contributing open source drivers (more on this later), he switched his rant to the risible claim that the instability of the kernel ABI set ‘a bad example’ for the rest of the community.

It's worse than that, in fact, since the stability of the Windows kernel ABI is little more than a myth. First of all, there are at least two different families of Windows kernels, the (in)famous Win9x series and the WinNT series. In the first family we have Windows 95, Windows 95 OSR2, Windows 98, Windows ME (that family is, luckily, dead). In the second family we have the old Windows NT releases, then Windows 2000 (NT 5.0), Windows XP (NT 5.1), Windows Vista (NT 6.0), Seven (NT 6.1). And not only are the kernel families totally incompatible with each other, there are incompatibilities even within the same series: I have pieces of hardware whose Windows 98 drivers don't work in any other Win9x kernel, earlier or later, and even within the NT series you can't just plop a driver for Windows 2000 into Windows 7 and hope it'll work without issues, especially if it's a graphics driver.

However, what Windows has done is provide a consistent user-space API (Win32) that essentially allows programs written for it to run on any Windows release supporting it, be it of the Win9x family or of the WinNT family.

(Well, except when they cannot, because newer releases sometimes created incompatibilities that broke older Win32 applications, hence the necessity for things such as the “Windows XP” emulation mode present in later Windows releases, an actual full Windows XP install within Windows, sort of like WINE in Linux —and let's not talk about how the new Metro interface in the upcoming Windows 8 is going to be a pain for everybody. We'll talk about these slips further down.)

But WINE and Mono will be discussed later on in more detail.

Mark Shuttleworth

Mark Shuttleworth is the man behind Canonical and ultimately Ubuntu. Rather than a Microsoft fan, he comes out more on the Apple side (which is where Miguel de Icaza seems to have directed his attention now, too). It's not difficult to look at the last couple of years of Ubuntu transformations and note how the user interface and application behavior have shifted away from a Windows-inspired model towards one that mimics the Mac OSX user experience.

This is rather sad (some would say ‘pathetic’), considering Linux desktops have had nothing to envy in Mac OSX desktops for a long time: in 2006 a Samba developer was prevented from presenting on his own computer, because it was graphically far better than what Mac OSX had to offer at the time.

But instead of pushing in that direction, bringing progressive enhancements to the existing, stable base, Ubuntu decided to stray from the usability path and shift towards some form of unstable ‘permanent revolution’ that only served to disgruntle existing users and reduce its appeal to potential new ones.

The number of Ubuntu derivatives that have started gaining ground simply by being more conservative about the (default) choice of software environment should be ringing all possible alarm bells, but apparently it's not enough to bring Canonical back on the right track.

The fascination with proprietary systems

So why are such prominent figures of the open source world so fascinated with proprietary operating systems and environments, be they Microsoft's or Apple's? That's a good question, but I can only give tentative answers to it.

One major point, I suspect, is their success. Windows has been a monopolistically dominant operating system for decades. Even if we only start counting from the release of Windows 95, that's still almost 20 years of dominance. And the only thing that has managed to make a visible dent in that dominance has been Apple's Mac OSX. There is little doubt that Apple's operating system has been considerably more successful than Linux in gaining ground as a desktop operating system.

While there's nothing wrong with admiring successful projects, there is something wrong in trying to emulate them by simply ‘doing as they do’: even more so when you completely fail at doing what they actually did to achieve success.

Windows' and Mac OSX's success has been driven (among other reasons which I'm not going to discuss for the moment) by a strong push towards consistency, both between different applications and between the applications and the surrounding operating system. It has never been about this or that specific aesthetic characteristic, or this or that specific behavior; it has been about the fact that all applications behaved in a certain way, had certain sets of common controls, and so on.

This is why both operating systems provide extensive guidelines describing how applications should look and behave, and why both provide interfaces to achieve such looks and behavior —interfaces that have not changed over time, even when they have been superseded or deprecated in favour of newer, more modern ones.

Doing the same in Linux would have meant defining clear guidelines for application behavior, providing interfaces to easily follow those guidelines, and then keeping those interfaces stable. Instead, what both GNOME (initially under Miguel de Icaza's guidance) and Ubuntu (under Mark Shuttleworth's guidance) did was mimic this or that (and actually worse: first this, then that) behavior or visual aspect of either of the two other operating systems, without any well-defined and stable guideline, and without stable and consistent interfaces: they tried to mimic the outcome without focusing on the inner mechanisms behind it.

In the meantime, every other open source project whose development hasn't been dazzled by the dominance of proprietary software has managed to chug along, slowly but steadily gaining market share whenever the proprietary alternatives slipped.

One important difference between dominant environments and underdogs is that dominants are allowed to slip: dominants can break the user experience, and still be ‘forgiven’ for it. Microsoft has done it in the past (Ribbon interface anyone? Vista anyone?), and seems bound to do it again (Metro interface anyone?): they can afford it, because they are still the dominant desktop system. Apple is more of an underdog, and more careful about changing things that can affect the user experience, but they still break things at times (not all applications written for the first Mac OSX release will run smoothly —or at all— on the latest one). But the underdogs trying to emulate either cannot afford such slips: if they're going to be as incompatible as the dominants, why shouldn't a user just stick with the dominant one, after all?

Linux and the desktop

And this leads to the final part of this article, beyond a simple critique of Miguel de Icaza's article. Two important questions arise here. Can Linux succeed on the desktop? And: does it actually matter?

Does it matter?

There has been a lot of talk, recently, about whether the desktop operating system concept itself is bound to soon fall into oblivion, as other electronic platforms (tablets and ‘smart’ phones) rise into common and widespread usage.

There is a reason why the so-called ‘mobile’ or ‘touch’ interfaces have been appearing everywhere: the already mentioned Metro interface in Windows 8 is a bold move in the direction of convergence between desktop, tablet and mobile interfaces; Mac OSX itself is getting more and more similar to iOS, the mobile operating system Apple uses on its iPods and iPads; even in the Linux world, the much-criticized Unity of the latest Ubuntu, and its Gnome Shell competitor, are efforts to build ‘touch-friendly’ user interfaces.

Unsurprisingly, the one that seems to be approaching this transition best is Apple; note that this is not because the Mac OSX and iOS user interfaces are inherently better, but simply because the change is happening gradually, without ‘interface shocks’. And there are open source projects moving in the same direction in the same gradual way, even though they don't particularly try to mimic the Apple interface.

The most significant example of an open source project that is handling the desktop/touch user interface convergence more smoothly is KDE, a desktop environment that in many ways has often tried (albeit sadly not always successfully) to be more attentive to user needs. (In fact, I'd love to rant about how I've always thought that KDE would have been a much superior choice to GNOME as the default desktop environment for Ubuntu, and about how history has proven me right, but that would probably sidetrack me from the main topic of discussion.)

If everything and everyone is dropping desktops right and left and switching to phones and tablets, does it really matter whether Linux can become ‘the perfect desktop operating system’ or not?

I believe it does, for two reasons, a trivial one and a more serious one.

The trivial reason is that Linux, in the sense of specifically the Linux kernel, has already succeeded in the mobile market, thanks to Android, which is built on a Linux kernel. I'm not going to get into the debate on which is better, superior and/or more successful between Android and iOS, because it's irrelevant to the topic at hand; but one thing can be said for sure: Android is successful enough to make Apple feel threatened, and to make them stoop to the most anticompetitive practices and underhanded assaults they can legally (and economically) afford in order to avert such a threat.

But there is a more serious reason why the success of an open Linux system is important: when the mobile and tablet crazes have passed, people will start realizing that there were a lot of useful things their desktops could do that their new systems cannot.

They'll notice that they can't just plug a TV into their iPad and watch a legally-downloaded movie, because the TV will not be ‘enabled’ for playback. They'll start noticing that the music they legally bought from online music stores will stop playing, or just disappear. They'll notice that their own personal photos and videos can't be safely preserved for posterity.

They will start noticing that the powerful capability of personal computers to flatten out the difference between producer and consumer has been destroyed by the locked-in systems they've been fascinated by.

The real difference between the information technology up to now and the one that is coming is not between desktop and mobile: it's between open and locked computing.

Up until now, this contrast has been seen as being about ‘access to the source’: ‘proprietary’ software versus open source software. But even the closed-source Windows operating systems allow the user to install any program they want and do whatever they want with their data; at worst, they allowed you to replace the operating system with a different one.

This is exactly what is changing with the mobile market: taking advantage of the perception that a tablet or smartphone is not a computer, vendors have built these systems to prevent users from installing arbitrary software and from doing what they please with their data. But the same kind of constraints are also being brought onto the desktop. This is where the Mac OSX application market comes from, and this is why Microsoft is doubling its efforts to make Windows 8 unreplaceable on hardware that wants to be certified: Secure Boot is a requirement for both mobile and desktop systems that want to claim support for Windows 8, and on the classical mobile architecture (ARM) it must be implemented in such a way that it cannot be disabled.

Why this difference between ARM and non-ARM? Because non-ARM for Windows means the typical Intel-compatible desktop system, and this is where the current Linux distributions have waged war against Secure Boot enforcement.

And this is specifically the reason why it's important for an open system to be readily available, up to date and user-accessible: it offers an alternative, and the mere presence of the alternative can put pressure on keeping the other platforms more open.

And this is why the possibility for Linux to succeed matters.

Can Linux succeed?

From a technical point of view, there are no significant barriers to widespread adoption of Linux as a desktop operating system. The chicken-and-egg problem that plagued it in the beginning in terms of hardware support (it doesn't have much support, so it doesn't get adopted; it's not adopted, so it doesn't get much support) has long been solved. Most hardware manufacturers acknowledge its presence, and be it by direct cooperation with kernel development, be it by providing hardware specifications, be it by providing closed, ‘proprietary’ drivers, they allow their devices to be used with Linux; and even though Linux is far from being the primary target for development, support for most hardware arrives shortly after, if not before, the actual hardware is made available.

There are exceptions, of course. NVIDIA, for example, is considered by Linus Torvalds the single worst company the kernel developers have ever dealt with, due to its enormous reluctance to cooperate with open source. The lack of support (in Linux) for the Nvidia Optimus dual-card system found in many modern laptops is a result of this attitude, but Linus' publicity stunt (“Fuck you, Nvidia!”) seems to have moved things in the right direction, and Nvidia is now cooperating with X.org and kernel developers to add Optimus support to Linux.

In terms of software, there are open source options available for the most common needs of desktop users: browsers, email clients, office suites. Most of these applications are in fact cross-platform, with versions also available for Windows and Mac OSX, and the number of people using them on those operating systems is steadily growing: for them, a potential transition from their current operating system to Linux will be all the smoother.

Some more or less widespread closed-source applications are also available: most notably, the Skype VoIP program (even though its recent acquisition by Microsoft has been considered by some a threat for its continuing existence in Linux) and the Opera web browser.

The WINE in Linux

There are, however, very few if any large commercial applications. A notable exception is WordPerfect, for which repeated attempts were made at a Linux version. Of the three attempts (version 6, version 8, and version 9), the last is a very interesting one: rather than a native Linux application, as was the case for the other two versions, Corel decided to port the entire WordPerfect Office suite to Linux by relying on WINE, an implementation of the Win32 API that aims at allowing Windows programs to run under Linux directly.

The choice of using WINE rather than rewriting the applications for Linux, although tactically sound (it made it possible to ship the product in a remarkably short time), was considered by many a poor one, perceived as a principal cause of the programs' bloat and instability. There are, however, two little-known aspects of this choice, one of which is of extreme importance for Linux.

First of all, WordPerfect Office for Linux was not just a set of Windows applications running under WINE in an emulated Windows environment: the applications were actually recompiled for Linux, linking them to Winelib, a library produced by the WINE project specifically to help port Windows applications to Linux. The difference is subtle but important: a Winelib application is not particularly ‘less native’ to Linux than an application written specifically to make use of the KDE or GNOME libraries. Of course, it will still look ‘alien’ due to its Windows-ish look, but no less alien than a KDE application on a GNOME desktop or vice versa, especially at the time (2000, 2001).

The other important but little-known aspect of Corel's effort is that it gave an enormous practical push to the WINE project. At the time the port was attempted, the WINE implementation of the Win32 API was too limited to support applications as sophisticated as those of the WordPerfect Office suite, and this led Corel to invest in and contribute to the development of WINE. The results of that sponsorship are quite evident when the status of WINE before and after the contribution is considered. And since Corel was trying its own hand at distributing Linux itself, with what was later spun off as Xandros, the improved WINE benefited them well beyond the mere ability to support the office suite.

In the Linux world, WINE is a rather controversial project, since its presence is seen as an obstacle to the development of native Linux applications (which in a sense it is). However, I find myself more in agreement with the WINE developers, seeing WINE as an opportunity for Linux on the desktop.

It's not hard to see why. Desktop users mostly don't care about the operating system; they could be running PotatoOS for all they care, as long as it allows them to do what they want to do, and what they see other people doing. What users care about are applications. And while it's undoubtedly true that for many common applications there are open source (and often cross-platform) alternatives that are as good as, when not better than, the proprietary ones, there are still important cases where people have to (or want to) use specific applications which are not available for Linux, and possibly never will be. This is where WINE comes in.

Of course, in some way WINE also encourages ‘laziness’ on the part of companies that don't want to put too much effort into porting their applications to Linux. Understandably, when Linux support is an afterthought it's much easier (and cheaper) to rely on WINE than to rewrite the program for Linux. And even when starting from scratch, it might be considered easier to write for Windows and then rely on WINE than to approach cross-platform development with some other toolkit, be it using Qt, whose licensing for commercial applications makes it a pricey option, be it using GTK, whose Windows support is debatable at best, be it using wxWidgets, one of the oldest cross-platform widget toolkits, or any less-tried option. In some sense, the existence of WINE turns Win32 into a cross-platform API, whose Windows implementation just happens to be vastly superior to that of the other platforms.

It's interesting to observe that when LIMBO was included in the cross-platform Humble Indie Bundle V, it caused a bit of an uproar because it wasn't really cross-platform, relying as it did on WINE. Interestingly, Bastion, which builds on top of the .NET Framework (and thus uses the aforementioned Mono on Linux), didn't cause the same reaction, despite being included in the same package. Yet, to a critical analysis, an application written for the .NET Framework is no more native to Linux than one written for the Win32 API.

If anything, the .NET Framework may be considered “not native” on any operating system; in reality, it turns out to be little more than a different API for Windows, whose theoretical cross-platform nature is only guaranteed by the existence of Mono. It's funny to think that Mono and its implementation of the .NET Framework are seen in a much better light than WINE and its implementation of the Win32 API, even though in all respects they are essentially the same.

The lack of commercial applications

In some way, what Miguel de Icaza's rant tried to address was specifically this problem of the missing commercial applications, on the basis that no applications implies no users, and therefore no success on the desktop market. While the instability of the interfaces of some high-profile environments and the multitude of more or less (in)compatible distributions are detrimental and discouraging for commercial developers, the actual motivations are much more varied.

There is obviously the software chicken and egg problem: Linux doesn't get widespread adoption due to the lack of applications, applications don't support Linux because it doesn't have widespread adoption.

Another important point is the perceived reluctance of Linux users to pay for software: since there are tons of applications available for free, why would Linux users think about buying anything? Windows and Mac OSX users, on the other hand, are used to paying for software, so a commercial application is more likely to be bought by a Windows or Mac user than by a Linux user; this further reduces the relevance of the potential Linux market for commercial interests.

This line of reasoning is quite debatable: the Humble Indie Bundle project periodically packages a number of small cross-platform games which users can buy by paying any amount of their choice, and the statistics consistently show that even though significantly more bundles are sold for Windows than for Linux, the average amount paid is distinctly higher on Linux than on Windows. In other words, while still paying what they want, Linux users are willing to pay more on average, which totally contradicts the perception about Linux users and their willingness to pay.

There's more: if piracy is really as rampant on Windows (starting from the operating system itself) as many software companies want us to believe, one should be led to think that Windows users are not particularly used to paying either: rather, they are used to not paying for what they should be paying for, in sharp contrast with Linux users, who would rather choose applications and operating systems they don't have to pay for in the first place. In a sense, choosing free software rather than pirated software becomes an indicator of user honesty, if anything. But still, the perception of Linux users as tightwads remains, and hinders the deployment of applications for the platform.

It's only at this point, in third place so to speak, that technical reasons, such as the instability of interfaces or the excessive choice in distributions and toolkits, become an obstacle. Should we target Linux through one of the existing cross-platform toolkits, or should we go for a distinct native application? Should this target a specific desktop environment? And which toolkit, which desktop environment should be selected?

However, the truth is that these choices are not really that important. For example, Skype simply opted for the Qt toolkit. Opera, on the other hand, after various attempts, decided to go straight for the least common denominator, interfacing directly with Xlib. And of course, for the least adventurous, there's the possibility of going with WINE, in which case just contributing to WINE to help it support your program might be a cheaper option than porting the program to Linux; this, for example, is the way Google decided to go for Picasa.

Finally, of course, there are applications that will never be ported to Linux, Microsoft Office being a primary example. And for those there is, sadly, no direct hope.

Pre-installations

There is one final issue with Linux on the desktop, and that is pre-installation. Most users won't go out of their way to replace the existing operating system with a different one because, as already mentioned, users don't usually care about the operating system.

This is the true key to the desktop for Linux: being the default operating system on machines as they are bought. However, none of the major desktop and laptop computer companies seem particularly interested in making such a bold move, or even in making an official commitment to full Linux support for their hardware.

One notable exception in this has been Asus, whose Eee PC series initially shipped with Linux as the only option for the operating system, even though later strong-arming from Microsoft led to it shipping both Linux and Windows machines (with the Windows ones having inferior technical specifications, to comply with Microsoft's request that Windows machines shouldn't cost more than Linux ones).

It's interesting, and a little sad, that there are vendors selling desktops, laptops and servers with Linux preloaded (see for example the Linux Preloaded website). The question is: why don't major vendors offer it as an option? And if they do, why don't they advertise the option more aggressively?

I doubt it's a matter of instability. It wouldn't be hard for them to limit official support to specific Linux distributions and/or versions that they have verified to work on their hardware: it would be no different from the way they offer Windows as an option. And this makes me suspect that there's something else behind it; is Microsoft back to their tactics of blackmailing vendors into not offering alternatives?

Get Rid Of CD Images

A recent message on the Ubuntu mailing list proposing to drop the “alternate CDs” for the upcoming release of Ubuntu (12.10) had me thinking: why are modern Linux distributions still packaged for install in the form of CD images?

There are obvious historical reasons for the current situation. CDs and DVDs are the most common physical media for distributing operating systems, and Linux is no exception. Before broadband Internet became as widespread as it is today, the only, or at least the cheapest, way to get Linux was to buy an issue of a Linux magazine, as they commonly offered one or more CDs with one or more Linux distributions to try out and/or install.

Today, “install CDs” are still the most common, or one of the most common, ways to do a clean Linux install, even though one doesn't usually get them from a Linux magazine issue, but rather by downloading a CD image (.iso) from the Internet and then either burning it onto an actual CD or putting it on a USB pendrive. In fact, I'd gather that using a pendrive is more common than actually burning a CD these days, and there's even a website dedicated to the fine art of using a pendrive to hold one or more install images for a number of different Linux distributions.

In contrast to Windows installation media, Linux install images, be they on CD or on pendrives, are useful beyond the plain purpose of installing the operating system: most of them are designed in such a way that they can also be used ‘live’, providing a fully-functional Linux environment that can be used, for example, as a rescue system (there are, in fact, images that are tuned for this), or just to try out the latest version of the operating system without actually installing anything (useful to test hardware compatibility or whatnot).

This is actually one more reason why pendrives are a superior option to CDs. Not only are they usually faster and, being rewritable, reusable when a new install image comes out (so you are not left with hundreds of old CDs lying around); since they are not read-only, they can also be used to store permanent data (and customizations) when the image is used as a ‘live’ system.

Finally, CDs (and CD images) have obvious, significant size constraints. This, for example, is one of the reasons why Debian, from which Ubuntu is derived, recently replaced GNOME with XFCE as the default desktop environment: the former was too large to fit on the (first) CD.

There is in fact one and only one reason I can think of why a CD would be preferable to a pendrive, and that's when you want to install Linux on a machine so old that its BIOS doesn't allow booting from USB disks (or on a machine that doesn't have USB ports at all). So why would a modern Linux distribution (that probably won't even run on such an old system) still design its install images around CDs?

I suspect that, aside from legacy (“we've always done it like this”: habit, toolchains, and general knowledge are all geared towards CDs), one of the most important reasons is simplicity: a user can download an .iso image, double-click on the downloaded file and, whatever their current operating system (Windows, Mac OS X, Linux), they will (probably) be offered the option to burn the image to CD. That is, of course, assuming the computer does have a CD or DVD burner, which a surprisingly high number of recent computers (esp. laptops) don't actually have.

By contrast, using a pendrive requires tools which are usually not as readily available on common operating systems (particularly on Windows), or available but non-trivial to use (for example, the disk image utility on Mac OS X). Compare, for example, the instructions for creating a USB install of Ubuntu with [the ones about CD installs]. Or consider the existence of websites such as the aforementioned PenDriveLinux, or LiLi (the latter being geared specifically towards setting up live systems on USB keys).

There is also the matter of price: blank CDs and DVDs are still much cheaper, per megabyte, than USB flash disks, with prices that hover around 10¢ per CD and 50¢ per DVD: roughly 0.015¢ per megabyte for a 700MB CD, and about 0.01¢ per megabyte for a 4.7GB DVD. By contrast, the cheapest 1GB USB flash drives go for 1€ apiece when bought in bulk, i.e. around 0.1¢ per megabyte: an order of magnitude more expensive, which matters for physical mass distribution.

The situation is a little paradoxical: on the one hand, moving to USB images would offer the chance to reduce the number of distinct images; on the other hand, the resulting downloads would be heavier, making physical distribution more convenient right when it becomes more expensive.

Consider this: a distribution like Ubuntu currently makes available something between 5 and 10 different installation images: there's the standard one, the alternate text-based one (which they want to get rid of), the server one and the ‘Cloud’ one, for each of the available architectures (Intel-compatible 32-bit and 64-bit at least, without counting the ones that are also available for ARM); and then there are the localized DVD images.

A CD image is between 200 and 300MB smaller than the smallest USB flash drive that can hold it (CDs hold about 700MB, USB keys come in either 512MB or 1GB sizes); since CD images often share most of the data (especially if they are for the same architecture), an image could be specifically prepared to fit a 1GB USB disk and still be equivalent to two or three different images, and maybe even more, for example offering the functionality of both the standard and alternative CDs, or the server and Cloud images; I wouldn't be surprised if all four of them could fit in 1GB, and I'm sure they can all fit in a 2GB image.

Of course, this would mean abandoning the CD as a medium for physical distribution, since such an image could only be placed on a USB disk or burned on DVD. It would also mean that downloading such an image would take more time than before (at the expense of those who would be satisfied with the features of a single disc, but in favour of those who found themselves needing two different images).

If I remember correctly, Windows 98 was the first PC operating system to require a CD drive, since it shipped in the form of a floppy plus bootable CD (in contrast to the ridiculously large number of floppies its predecessor came on). Modern Windows, as far as I know, comes in DVD format. I think it's time for modern Linux distributions to go beyond plain CDs as well, but in a smarter way than just “ok, we'll go to DVDs”.

I know that Linux can be made to run in the strangest of places, and it has often been used to revive older systems. But the question is: does the particular distribution you're still designing CD installation images for actually support such old systems? As an example, consider two tightly related, but still significantly different, Linux distributions: Debian and Ubuntu. It may make sense to have a CD installation image for Debian, but I really don't see why it would be useful to have one for Ubuntu. Why?

Ubuntu is designed to be easily installable by the most ignorant user, its installer is heavily built around graphical user interfaces, and the desktop environment it installs requires a decent video card: can a user without particular knowledge really install it (from CD) and use it on a system that doesn't have a DVD drive and doesn't support booting from USB disks? I doubt it.

On the other hand, Debian is a ‘geekier’ distribution, it has a less whizzy installer, and by default it installs a much lighter desktop environment, if you really want to install one. You're much more likely to have success in running it on older hardware ‘out of the box’, or with minimal care during installation. A CD image to install this on such a system is a sensible thing to have; even a floppy disk to allow installation on systems that have a CD drive but can't boot from it would be sensible for Debian. For Ubuntu? Not so much.

Now, Ubuntu development these days seems to be all about revolutionizing this or that part of the user experience or the system internals: Ubiquity, Unity, Wayland, you name it, often with debatable results. Why don't they think about revolutionizing the installation media instead? (Actually, I think I know: not enough visual impact on the end user.)

Printers, costs and content distribution

I've recently come across a rant about printers. The rant mixes in a number of complaints ranging from technology (e.g. paper jamming) to economics (e.g. the cost of ink cartridges). Since it's a rant, it should probably not be taken too seriously, but it does raise some interesting points that I'd like to discuss. Of course, I'm going to skip the obviously ridiculous complaints (such as the need to get out of bed), and go straight to the ones worth answering.

Let's start easy.

The first complaint is about all the stuff that has to be done before you can actually get to printing, such as having to plug the printer into the laptop and wait for startup times and handshakes and stuff like that.

This is actually an easy matter to solve, and one that has been addressed by people who replied to the rant: get a network printer. The printer we have at home is not networked, but we have a small server, and the printer is plugged in there and shared with all the computers on our small home network.

The paper jam complaint also doesn't carry much weight. As far as I know, even professional printing equipment jams from time to time, although of course miniaturized mechanisms such as the ones found on desktop printers are bound to jam more easily. The frequency of jams could probably be reduced by using better materials, but I doubt they could be eliminated altogether.

The biggest complaint is, obviously, about ink cartridges and their cost (which the author of the rant hyperbolically, but not by that much, states as being ten times that of diesel fuel). The author of the rant also remarks (correctly) that this is

how Epson, Lexmark, Canon, Brother et al. make money: They make shitty low-end printers the break easily so they need to be regularly replaced and make the ink cost ten times more than diesel fuel (a hyperbole that is close to accurate, btw) so they can have a steady flow of cash from those printers that do work.

Except that printer manufacturers don't make only shitty low-end printers; they also have mid-range and high-end products. So one is left to wonder: if the author of the rant is aware of this, why doesn't he invest a little more money in a better product that will give him fewer problems?

Getting a complex piece of hardware like a printer for less than 50€ (including the famous “initial cartridges”) and expecting it to last forever, with cheap consumables to go with it, is naive at best: the cheap printers are obviously sold at a loss, with money recouped by the sale of consumables. It's up to the buyer to be aware (and wary) of the mechanism, and choose accordingly, because there are alternatives; of course, they do require a higher initial investment.

The author of the rant goes on to compare the printer matter with video rental:

It’s bullshit the way Blockbuster was bullshit with late fees and poor customer service and high rental prices and yearly membership fees. Remember how Netflix and it’s similar services worldwide practically destroyed them? I really want some hipster engineer at Apple or Microsoft or anywhere to make a printer that Netflixes the fuck out of the consumer printing market.

The comparison, I'm afraid, is quite invalid. I'm not going to discuss the Blockbuster vs Netflix thing in detail, although I will mention that the Blockbuster model was not ‘bullshit’ (remember, you have to compare renting a DVD with buying a movie or going to the cinema; also, at least in my country Blockbuster had no membership fees, although the late fees were outrageous): it has been, however, obsoleted by the Netflix model.

The chief difference between the Blockbuster and Netflix models? Blockbuster was on demand, while Netflix is a subscription. But is a subscription model intrinsically superior to an on-demand one?

The answer is, I'm afraid, no. The convenience (for the user) of one model over the other depends entirely on the cost ratio of the two services compared to the amount of use they get out of them. If I only watch a movie once every two or three months, I'm much better off renting one-shot when needed and then forgetting about shelling out any money the rest of the time. (Of course, most people watch way more movies per month, hence a subscription is usually better.)

Now let's think about this for a moment: the way Netflix disrupted Blockbuster was by offering a subscription service that was, for a lot of people, more convenient than the on-demand service offered by Blockbuster.

The question is: would such a model really be applicable to printing? How would it even work? For 10 bucks a month you get a new (used) printer delivered to your door?

In fact, I really think that the current cheap printing business is the closest you can get to a subscription model, considering the enormous differences that exist between simple content distribution (which is what Blockbuster and Netflix do) and the production of complex devices such as printers.

In other words, the “Netflix revolution” has already happened in the printing business, except that it went in totally the opposite direction for the consumer, while still being extremely profitable for the provider (as most subscription models are).

So what the author of the rant should probably aim at is to break out of the subscription model and go back to something more on demand. Can this be achieved?

There are many ways to work in that direction.

The cheapest way is to make heavy use of the various DIY cartridge refill kits, or to resort to knock-off or ‘regenerated’ cartridges instead of buying the official ones. However, most people probably know from experience that print quality degrades in this case, which in my opinion shows that there is actually a reason why ink cartridges are expensive, regardless of how overcharged they are.

Another, possibly smarter but more expensive (especially in the short run) solution has already been mentioned: don't buy a 30€ printer every six months; buy a mid-range printer and save in the long run. You can get professional or semi-professional networked color laser multifunction printers (including fax/scan capability) for around 500€; if you're willing to sacrifice wireless and fax, even less than that. The toner cartridges aren't that much more expensive than the ink ones (esp. in terms of price per page), and they are much more durable (no more throwing away a cartridge because you didn't use the printer for a month and the ink dried up).

And finally, Print On Demand: you send them the file, they mail you the printed stuff. I've always been curious about this particular kind of service, and I can see it making a lot of sense for some very specific cases. But I doubt the domestic use cases the author based his rant on would fit in this.

High resolution

When I got my first laptop about 10 years ago, I wanted something that was top of the line and could last several years with a modicum of maintenance. I finally opted for a Dell Inspiron 8200 that I was still using in 2008, and whose total price, considering also the additional RAM, new hard disk, replacement batteries and replacement cooling fans I opted for or had to buy over the course of that long period, was in the whereabouts of 3k€, most of which was the initial price.

One of the most significant qualities of that laptop, and the one thing that I miss the most still today, was the display, whose price accounted for something like half of the money initially spent on the laptop. We're talking about a 15" matte (i.e. non-glossy) display at a 4:3 aspect ratio, with 1600×1200 resolution (slightly more than 133 dots per linear inch), 180 cd/m² brightness at maximum settings. In 2002.

When, six years later, I had to get a new one for work reasons (the GeForce 2 Go on that machine was absolutely not something you could use for scientific computing, whereas GPGPU was going to be the backbone of all of my future work), I was shocked to find out there was no way to get my hands on a display matching the quality of the one I was leaving behind.

This was not just a matter of price: there were simply no manufacturers selling laptops with a high-quality display like that of my old Dell Inspiron 8200. All available displays had lower resolutions (110, 120 dpi tops), and the only manufacturer that provided matte displays was Apple (which offered the option for an extra 50€ on the price tag).

I can tell you that dropping from the aptly named TrueLife™ Dell display to the crappy 1280×800 glossy display featured on my current laptop (an HP Pavilion dv5, if anybody is interested) was quite a shock. Even now, four years after the transition, I still miss my old display; and how could I not, when I see myself mirrored on top of whatever is on screen at the time, every time I'm not in perfect lighting conditions, i.e. most of the time?

Not that the desktop (or stand-alone) display market was doing any better: somehow, with the ‘standardization’ of 1920×1080 as the ‘full HD’ display resolution (something that possibly made sense in terms of digital TV or home-theater digital media like DVD or BluRay discs, but is purely meaningless otherwise), all monitor manufacturers seemed to have switched to thinking ‘ok, that's enough’.

The dearth of high-resolution, high-quality displays has always surprised me. Was there really no market for them (or not enough to justify their production), or was there some other reason behind it? What happened to make the return on investment for research into higher-quality, higher-density, bright, crisp displays too low to justify ‘daring’ moves in that direction? Did manufacturers switch to just competing in a downward spiral, going for cheaper components all around, trying to scrape every single possible cent of margin, without spending anything on research and development at all?

Luckily, things seem to be changing, and the spark has been the introduction of the Retina display in Apple products.

The first time I heard the rumors about a Retina display being available on the future MacBook Pro, I was so happy I even pondered, for the first time in my life, the opportunity of getting an Apple product, despite my profound hate and disdain for the company overall (a topic which would be off-topic here).

My second, saner thought, on the other hand, was a hope that Apple's initiative would lead to a reignition of the resolution war, or at least push the other manufacturers to offer high-resolution displays again. Apparently, I was right. Asus, for example, has announced a new product in the Transformer line (the tablets that can be easily converted into netbooks) with a 220 dpi resolution; Dell has announced that HD displays (1920×1080) on their 15.4" products will be available again.

How I wish Dell had thought about that four years ago: my current laptop would have been a Dell instead of an HP. Now, however, as the time for a new laptop approaches, I've grown more demanding.

How much can we hope for? It's interesting to note that the Apple products featuring the new higher-resolution displays actually have decreasing pixel densities, with the iPhone beating the iPad, which in turn beats the just-announced MacBook Pro. This should not be surprising: when the expected viewing distance between user and device is taken into consideration, the angular pixel density is probably the same across devices (see also the discussion about reference pixels in the W3C definition of CSS lengths).

However, as good as the Retina display is, it still doesn't match the actual resolution of a human retina. Will the resolution war heat up to the point where devices reach the physical limits of the human eye? Will we have laptop displays that match printed paper? (I don't ask for 600 dpi or higher, but I wouldn't mind 300 dpi.)

An Opera Requiem?

Opera has been my browser of choice for a long time. As far as I can recall, I started looking at it as my preferred browser with version 5, the ad-supported version, released around the turn of the millennium. It wasn't until two or three versions later, however, that I could stick to it as my primary browser, given the number of websites that had trouble with it (some due to bugs in Opera, but most often due to stupidity on the part of web designers and their general lack of respect for the universality of the web).

While a strong open-source supporter, I've always considered myself a pragmatist, and the choice of Opera as my preferred browser (together with other software, such as WordPerfect, which is off topic here) has always been a clear example of this attitude of mine: when a piece of closed-source proprietary software is hands-down superior to any open-source alternative, I'd rather use the closed-source software, especially when it's free (gratis).

One of the reasons why I've always preferred Opera is that it has been the pioneer, when not the inventor, of many of the web technologies and user interface choices that are nowadays widely available in more popular browsers. I've often found myself replying with “oh, they finally caught up with features Opera has had for years?” when people started boasting of “innovative” features like tabs being introduced in browsers such as Firefox.

Opera was the first browser to have decent CSS support, and except for brief moments in history, has always been leading in its compliance to the specification (to the point of resulting in ‘broken’ rendering when CSS was written to fit the errors in other, more common, implementations).

Opera has sported a Multiple Document Interface (a stripped-down version of which is the tabbed interface exposed by all major browsers today) since its earliest releases.

Opera has had proper support for SVG as an image format before any other browser (still today, some browsers have problems with it being used e.g. as a background image defined by CSS).

Opera has a much more sophisticated User JavaScript functionality than that offered by the GreaseMonkey extension in Firefox. (This is partly due to the fact that the same technology is used to work around issues in websites that autodetect Opera and send it broken HTML, CSS or JavaScript.)

In fact, Opera didn't have any support for extensions until very recently, since it managed to implement most of the features provided by extensions in other browsers in a single package, while still remaining relatively lightweight in terms of resource consumption and program size.

When the Mozilla suite was being re-engineered into separate components (Firefox the browser, Thunderbird for mail and news, etc) in the hopes of reducing bloat, Opera managed to squeeze all those components in a single application that was smaller and less memory-hungry than anything that ever came out of Mozilla.

When Firefox stopped supporting Windows 98, Opera could still run (even if just barely) on a fairly updated Windows 95 on quite old hardware. When Firefox started having trouble being built on 32-bit systems, Opera still shipped a ridiculously large amount of features in an incredibly small package. Feed discovery? Built-in. Navigation bar? Built-in. Content blocking? Built-in. Developer tools? Built-in.

Opera pioneered Widgets. Opera pioneered having a webserver built in the browser (Opera Unite). I could probably go on forever enumerating how Opera has constantly been one when not several steps ahead of the competition.

Additionally, Opera has long been available on mobile, in two versions (a full-fledged browser as well as a Mini version); and even the desktop version of the browser can render pages as if it were on mobile (an excellent feature for web developers). Opera is essentially the only alternative to the built-in browser on the N900 (or, more generally, for Maemo). Opera is also the browser of the Wii, and the browser present in most web-enabled television sets.

Despite its technical superiority and its wide availability, Opera has never seen significant growth in desktop user share, consistently floating in the whereabouts of 2% usage worldwide (which still amounts to a few hundred million, possibly more than half a billion, users).

I'm not going to debate the reasons for this lack of progress (aside from mentioning people's stupidity and/or laziness, and marketing), but I will highlight the fact that Opera users can be considered “atypical”. Although I'm sure some of them stick to the browser for the hipster feeling of using something which is oh so non-mainstream, I'd say most Opera aficionados are such specifically because of its technical quality and its general aim towards an open, standard web.

Although the overall percentage of Opera users has neither grown nor waned significantly in the last ten years or so, it's not hard to think of scenarios that would cause an en masse migration away from the browser (sadly, there aren't as many scenarios in which people would migrate to it).

One of these scenarios is Opera being bought by Facebook, a scenario that may become a reality if the rumor that has been circulating these past days has any merit. I've even been contacted about this rumor by at least two distinct friends of mine, people that know me well as an Opera aficionado and Facebook despiser.

One of them just pointed me to a website discussing the rumor in an email aptly titled “Cognitive dissonance in 3, 2, 1, …”. To me, the most interesting part of that article were the comments, with many current Opera fans remarking how that would be the moment they'd drop Opera to switch to some other browser.

These are exactly my own feelings, in direct contrast with what the other friend of mine claimed with a Nelson-like attitude (“ah-ha, you'll become a ‘facebooker’ even against your will”), pointing to another website discussing the same rumor. But that would be like saying that I'd become a Windows user because of the Nokia/Microsoft deal, while I've just ditched Nokia for good.

In fact, an Opera/Facebook deal would have a lot of similarities with the Nokia/Microsoft deal that sealed Nokia's failure in the smartphone market. For Facebook (resp. Microsoft), getting their hands on an excellent browser (resp. hardware manufacturer) like Opera (resp. Nokia) is an excellent strategic move to enter a market they would have (resp. have had) immense trouble penetrating otherwise; for Opera (resp. Nokia), on the other hand, and especially for its users, the acquisition would be (resp. has been) disastrous.

In many ways, Facebook on the web represents the opposite of what Opera has struggled for; where Opera has actively pursued an open, interoperable web based on common standards and free from vendor lock-in, Facebook has tried to become the de facto infrastructure of a ‘new web’, with websites depending on Facebook for logins and user comments, and where the very idea of a website starts losing importance, the “Facebook page” being deemed sufficient. This is essentially the server-side (‘cloud’) counterpart of what Microsoft did years ago with the introduction of ActiveX controls and other proprietary ‘web’ features in Internet Explorer.

I'm not going to spend too many words on how and why this is a bad thing (from the single point of failure to the total loss of control over content, its management and freedom of expression), but even just the rumor of Facebook taking over Opera, a browser that was actually going in the opposite direction by providing everybody with a personal web server to easily share content with other people from their own machine, is enough to send chills down my spine.

It's interesting to note that Opera has halted development of its Unite platform (as well as its Widgets platform) citing resource constraints (between Unite, Widgets, and the recently introduced Extensions, they did look a little as if they were biting off more than they could chew). And unless Opera extensions are developed to match the features formerly provided by the Unite and Widgets platforms, their end of life marks a very sad loss for the “more power to the user” strategy that has made Opera such an excellent choice for the cognoscenti.

There are forms of partnership possible between Facebook and Opera that could be designed to benefit both partners without turning out to be a complete loss for one to the benefit of the other, in pretty much the same way as Opera has built partnerships with many hardware vendors to ship its browser. But since those markets are exactly what Facebook is interested in, I doubt they would settle for anything less than buying out Opera altogether.

On a purely personal level, I'm going to wait and see this rumor through. If it does turn out to be true, I'll have no second thoughts in considering the Opera I like at its end of life, and I'll start looking into alternatives more in tune with my personal choices of server- and client-side platforms. (And what a step would that be, from actually considering applying for a job at Opera!)

Judging from the comments on the rumor on the Opera fora, I'm not the only Opera user thinking along those lines. One might wonder whether Opera would still be worth that much to Facebook after its market share suddenly drops to something around zero; but if the only thing Facebook is interested in is the market penetration granted by Opera pre-installations on mobile devices and ‘smart’ TV sets, how much would they even care about the desktop users that actually switched to that browser by choice?

{ This article will be updated when the rumor is definitely confirmed or refuted. }

The CSS filling menu, part 2

This is a pure CSS challenge: no javascript, no extra HTML.

Consider the container and menu from the first part of this challenge. The idea now is to find a solution that not only does what was required in the first part, but also allows wrapping when the natural width of the menu is wider than the width of the container (which, remember, is a wrapping container for other content: we don't want the menu to disturb its layout).

Of course, when the menu doesn't fit on a single line, we want each row to fill up the container nicely, and ideally we want the items to be distributed evenly across lines (for example, 4 and 4 rather than 7 and 1 or 6 and 2, or 3, 3 and 2 when even more lines are necessary).

Can this be achieved without extra HTML markup hinting at where the splits should happen?
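For what it's worth, here is a minimal sketch of how the CSS3 Flexible Box module (flexbox) might approach this; the syntax below is the modern one, which was still in flux and unevenly supported when this challenge was posed, and the .menu class name is mine, for illustration. Note that flexbox wraps greedily, so while each row does fill the container, the even 4-and-4 distribution asked for above is not guaranteed:

ul.menu {
    display: flex;       /* CSS3 Flexible Box layout */
    flex-wrap: wrap;     /* allow items to wrap onto new rows */
    list-style: none;
    margin: 0;
    padding: 0;
}
ul.menu > li {
    flex: 1 1 auto;      /* let each item stretch to fill its row */
    text-align: center;
}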

The CSS filling menu

This is a pure CSS challenge: no javascript, no extra HTML.

We have a container with a non-fixed width (for example, the CSS wrapping container from another challenge). Inside this container we have a menu (which can be done in the usual gracefully degrading unordered list), for which the number of items is not known (i.e. we don't want to change the CSS when a new item is added to the menu). The requirement is that the menu should be laid out horizontally, filling up the total width of the container, flexibly.

Extra credit

Additionally, we would like the menu entries to be uniformly spaced (we're talking here about actual space between them, not whitespace inside them, nor suitably large borders), with the same spacing occurring all around the menu.

A (suboptimal?) solution

There is, in fact, a solution for this challenge. The idea is to use tables behind the scene, without actually coding tables into the HTML. If the menu is coded with the standard gracefully degrading unordered list, the ul element is set to display: table; width: 100% and the items are set to display: table-cell.

Note that this solution should not necessarily be considered ‘quirky’: the purpose of getting rid of tables from the HTML was to ensure that tables were not being used for layout, but only to display tabular content. There is no reason to not use the display: table* options to obtain table-like layouts from structural markup!

Additionally, we can also get the extra credit by using something like border-collapse: separate; border-spacing: 1ex 0, which is almost perfect, except for the fact that it introduces an extra spacing (1ex in this case) left and right of the menu. This can be solved in CSS3 using the (currently mostly unsupported) calc() operator, by styling the ul with width: calc(100% + 2ex); margin-left: -1ex.

Of course, in this case, to prevent mispositioning of the menu in browsers that do not support calc(), the margin is better specified as margin-left: calc(-1ex), which has exactly the same effect but only kicks in if the calc()ed width is supported as well.
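Putting all the pieces described above together, the ruleset looks like this (the .menu class name is mine, for illustration; the 1ex spacing is of course arbitrary):

ul.menu {
    display: table;
    border-collapse: separate;
    border-spacing: 1ex 0;
    width: 100%;                /* fallback for browsers without calc() */
    width: calc(100% + 2ex);    /* extra credit: reabsorb the outer spacing */
    margin-left: calc(-1ex);    /* only kicks in when the calc() width does */
}
ul.menu li {
    display: table-cell;
}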

The challenge proper

While I'm pretty satisfied with this solution, it's not perfect:

  • there are people that cringe at the use of tables, even if just in CSS form; a solution without CSS tables would thus be better;
  • when the container is smaller than the overall natural length of the menu, the menu will overflow instead of wrapping.

Note that without using the CSS tables, the uniform spacing is quite easy to set up (something as trivial as margin: 0 1ex for the entries, for example), but having the entries adapt their size to make the ul fill its container is rather non-trivial.

I'll actually consider the way to make it wrap nicely a different challenge.

The CSS shrinkwrapping container

This is a pure
CSS challenge:
no JavaScript,
no extra HTML

We have a container (e.g. an ‘outer’ div) and inside this container we have N boxes with constrained width (i.e. width or max-width is specified). We want to lay out the boxes side by side, as many as fit inside the viewport, and we want the outer container to wrap these boxes as tightly as possible (considering, of course, all padding and margins). The container (and thereby the collection of boxes inside it) should be centered inside the viewport.

The problem here is that we want to lay out the inner boxes (almost) without taking the outer box into consideration, and then lay out the outer box as if it had {width: 100%; margin-left: auto; margin-right: auto}.

A suboptimal solution

There is, in fact, a suboptimal solution for this challenge. The idea is to fix the container width based on the width necessary to wrap the actual number of boxes that would fit at a given viewport width, and then let the boxes fill the container as appropriate (I prefer a display: inline-block, without floats, since this spaces out the boxes evenly).

For example, if we know that, considering padding and margins, the container would have to be 33em wide when holding only one box, that only one box would fit with a viewport smaller than 66em, and that only two boxes would fit with a viewport smaller than 98em, etc., we could use something like the following:

@media (max-width: 98em) {
    #content {
        width: 66em;
    }
}
@media (max-width: 66em) {
    #content {
        width: 33em;
    }
}
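
For completeness, a sketch of the rest of the layout these media queries rely on (selectors and dimensions are illustrative):

#content {
    text-align: center;     /* inline-blocks distribute like words in a line */
    margin-left: auto;
    margin-right: auto;     /* center the container at the width picked above */
}
#content .box {
    display: inline-block;  /* no floats: the boxes space out evenly */
    width: 30em;            /* constrained width; with margins and padding
                               this adds up to the 33em per box used above */
    margin: 1em;
    vertical-align: top;
    text-align: left;
}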

Now, the reason why this is a horrible solution: to work perfectly, it requires the following things to be known:

  • the number of boxes (one media query for each ‘additional box’ configuration is needed),
  • the width of each box (note that the solution works regardless of whether the boxes have the same or different widths, as long as each box width is known, in the order in which they are to be laid out).

The challenge proper

The question is: is it possible to achieve this effect without knowing the number of boxes and without writing an infinite (or even just a ‘reasonably high’) number of media queries? Even a solution for the case of fixed, equal-width boxes would be welcome.

The CSS challenges

In the beginning was HTML, and HTML mixed structure and presentation. And people saw that this was a bad thing, so an effort was made to separate content structure from layout and presentation.

This resulted in the deprecation of all HTML tags and tag attributes whose main or only purpose was to change the presentation of the text, and in the birth of Cascading Style Sheets (CSS) to collect the presentation and layout descriptions.

This was a very good thing. And in fact, CSS has succeeded fairly well in achieving the separation of content from styling: it is now possible, using only structural (‘semantic’) HTML and CSS, to achieve an impressive richness of colors, font styles and decorations.

However, while one of the purposes of CSS was to get rid of the use of ‘extra’ HTML (infamously, tables, but not just that) to control the layout, i.e. the positioning of elements on the page and with respect to each other, this has been an area where CSS has failed. Miserably.

So miserably, in fact, that sometimes it's not even sufficient to just add extra markup (container elements whose only purpose is to force some layout constraints): it might be necessary to resort to JavaScript just for the sake of obtaining the desired layout. And this, even before taking into consideration the various errors and deficiencies in the CSS implementations of most common layout engines.

I'm going to present here a number of challenges whose main purpose is to highlight limitations in the current CSS specifications: the things I'm going to ask for are going to be hard, if not impossible, to achieve regardless of the quality of the implementation, i.e. even on a layout engine that implemented everything in the current specification, and did it without any bugs whatsoever.

These challenges should be solved using only HTML and CSS, without any hint of JavaScript, and possibly without having to resort to non-structural markup in the HTML.

(Attentive people will notice that some of these challenges have a remarkably close affinity with some of the features of this wok. This is not by chance, of course: one of the purposes of this wok is to act as my personal HTML testing ground for sophisticated features.)

{ And here, I might add in the future some further considerations and remarks which would not be considered challenges. }

Manual vector graphics

Joys and sorrows of hand-coding SVG images

I have recently discovered the beauty of hand-editing SVG: a bit like writing the HTML of web pages by hand, but far more laborious and often much less gratifying, especially when, as in my case, there is no aesthetic sense backing up the technical skill.

Truth be told, writing these verbose markup formats by hand is extremely tedious, tiring even, and takes quite a toll on wrists and fingers. This should come as no surprise: they are formats intended more for production and consumption by machines than for direct editing by human beings. (Actually, for HTML the story is a bit more complex.)

Moreover, writing SVG by hand means doing graphics (SVG, after all, stands for scalable vector graphics) without seeing it. Accustomed as we are to a ‘point and click’ world, even for plain text1, how strange, if not absurd, must it seem to do graphics without a graphical (interface)?

Obviously, whether or not it is advisable to work without immediate graphical feedback depends heavily on the kind of graphics to be done (as well as, obviously, on individual attitude). For a quick extemporaneous sketch, a classic (vector) graphics program such as Inkscape is certainly the ideal tool; but there are some cases (which we'll discuss shortly) in which working ‘by hand’ is clearly superior.

Mind you, even when working by hand, feedback is necessary, to make sure you've written things right, to check the result and possibly improve it; when I work on an SVG, for example, I always keep the file open in a browser window too, reloading it whenever I finish an iteration of changes.

SVG can be thought of as an extremely sophisticated and complex language for the description of two-dimensional figures (figures described by segments, circle arcs and cubic Bézier curves), with rich stylistic options on how these (geometrically described) figures should appear (colors, arrows, fills).

Indeed, SVG is so complex that it is quite possible for the visual programs at our disposal to simply not support the entire expressive richness of the language; in such cases, the ability to edit the SVG by hand can prove precious (the aforementioned Inkscape, for example, which uses a bastardized version of SVG as its native format, also allows manual editing of the image's internal code).

The opposite case is that of an extremely simple drawing: why bother waiting the long minutes that graphics programs often take to start up, when a simple text editor will do?

The biggest advantage of manual coding over the use of a classic vector graphics program is the marked simplification of the file itself: even the simplest image, when saved by a graphics program, ends up buried under an immense and often unjustified ton of supplementary information that is inessential, but reproduces the control structures used internally by the program itself.

Thus, for example, I was able to obtain a vector version of the Grammar Nazi logo that takes up less than half the disk space of the one that inspired it, without losing anything in quality or information. If anything, my approach to the description of the stylized G turns out to be much more understandable, as it is drawn ‘flat’ and then rotated/scaled as appropriate.

This is in fact another advantage of manual writing over graphical drawing: the possibility of expressing, already at the coding level, the distinction between the design of the individual component and the geometric transformations needed for its integration with the rest of the drawing.
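
A minimal sketch of the idea (the path data here is made up, not the actual logo):

<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     viewBox="-100 -100 200 200">
  <defs>
    <!-- the component is designed 'flat', in its own convenient coordinates -->
    <path id="stylized-g" d="M 0,0 H 40 A 20,20 0 0 1 20,40"
          fill="none" stroke="black" stroke-width="4"/>
  </defs>
  <!-- the transformations that integrate it into the drawing are
       expressed separately from the design itself -->
  <use xlink:href="#stylized-g" transform="rotate(-30) scale(1.5)"/>
</svg>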

Although visual programs often offer this possibility too, the information is typically exploited on the spot to deform/reposition the components as required, but is not preserved when saving to file, and is therefore ‘lost’ after its application: not essential for the final version of a project, but quite inconvenient during the design period.

Writing by hand thus results in files that are not only more efficient (something that can have an impact for the average user, with shorter loading times or less work for the computer when rasterizing the image), but also more elegant: a somewhat ‘secret’ need (how many people routinely find themselves looking at the source code of a file, rather than at its result?) which for the average user generally has no impact (even though it can sometimes run counter to efficiency, requiring more computation at rasterization time).

Manual coding is obviously no panacea: besides being (for some, unjustifiably so) laborious, it cannot, for example, make up for the intrinsic limits of the format. SVG, for instance, lacks the ability to express the sizes and positions of components in relation to one another, except in very simple cases or by resorting to sophisticated tricks with groupings and sizings done through scale factors; moreover, numeric constants must be expressed in decimal form and therefore, for values such as π/3 or the golden ratio, approximated.

SVG, on the other hand, is not the only language for vector graphics: programs such as MetaPost and its progenitor MetaFont were born as programming languages for vector graphics, were written with particular care for the numerical aspects of the mathematics of vector graphics, and do not suffer from the aforementioned limits of SVG; then again, a direct comparison between MetaPost and SVG is highly inappropriate, both for their respective features and for the respective application domains they were intended for.

MetaFont sprang from the madly brilliant mind of Donald Ervin Knuth, with the purpose of allowing the mathematical generation of font families for print. The characters of a MetaFont are described by suitably parametrized and combined cubic Bézier curves, and this principle (turning the characters into images, and the rasterized output into vector output in PostScript format) would also become the fundamental component of MetaPost.

MetaFont and MetaPost denote both the programs themselves and the programming language (very similar for the two) that allows users to develop font families or vector images with mathematical, relational descriptions (descriptions such as draw a curve from the intersection of these two other curves to two thirds of the way along that other curve are allowed). A MetaPost file is like the source of any programming language, and must be compiled to produce one or more images.

By contrast, SVG was born as a description language for vector images, and is aimed (though not exclusively) at consumption on the web, hence including features such as the ability to describe simple animations, possibly controlled through user interaction.

On the other hand, SVG integrates rather well with JavaScript, the dominant programming language on the web, and thanks to this it can acquire a whole range of capabilities whose absence makes it in some cases inferior to MetaPost; then again, I personally find it very annoying to have to resort to an auxiliary programming language for the description of static images.

While in MetaPost this was a necessity tied to the intrinsic nature of the program (compensated by the immense flexibility offered by the possibility of expressing the salient traits of an image relationally), the need to use JavaScript in SVG to achieve certain static effects still weighs as a limitation of SVG itself.

One might suppose that, had I not had previous experience with the powerful flexibility of MetaPost, I would never have felt the limits of SVG as such. I doubt it: I would in any case have very quickly found it frustrating not to be able to use ‘exact’ numeric quantities, leaving to the computer the task of interpolating, and I would in any case have strongly felt the lack of a way to express as such the relations between different components of an image.

Rather, what I think could be an interesting compromise is something like Markdown (which allows writing HTML documents almost as if they were plain text), but for SVG. If MetaPost itself is deemed too complicated, one could start from something simpler, such as Eukleides (a package currently specialized in geometry).

Obviously, it is important that the SVGs produced by such programs be as minimalistic as possible, and thus somehow reflect, in the final product, that spirit of elegance, simplicity and efficiency that characterizes hand-coding as opposed to the use of a graphical interface. And like Markdown, it should allow the inclusion of ‘naked’ SVG code. I might just give writing it a try myself.


  1. a notion I find appalling: I find it tiresome just watching people take their hands off the keyboard to select text with the mouse, and then go click on a button for the particular function of interest (bold, italics, delete, copy, whatever). ↩

RCS fast export

Get the code for
RCS fast export:

gitweb
rcs-fast-export
git
rcs-fast-export
GitHub
rcs-fast-export

RCS is one of the oldest, if not the oldest, revision control systems (in fact, that's exactly what the name stands for). It may seem incredible, but there's still software around whose history is kept under RCS or a derivative thereof (even without counting CVS in this family).

Despite its age and its distinctive lack of many if not most of the features found in more modern revision control systems, RCS can still be considered a valid piece of software for simple maintenance requirements, such as single-file editing for a single user: even I, despite my strong passion for git, found myself learning RCS, no earlier than 2010, for such menial tasks.

In fact, the clumsiness of RCS usage when coming from a sophisticated version control system like git was exactly what prompted me to develop zit, the single-file wrapper for git. And so I found myself with the need to convert my (usually brief, single-file) RCS histories to git/zit.

I was not exactly surprised by the lack of a tool ready for the job: after all, how many people could have needed such a thing? Most large-scale projects had already migrated in time to some other system (even if just CVS) for which quite sophisticated tools to convert to git exist. So I set out to write the RCS/git conversion tool myself: I studied the RCS file format as well as the git fast-import protocol, and sketched in a relatively short time the first draft of rcs-fast-export, whose development can be followed from its git repository.
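
The fast-import side is a simple line-based stream; a minimal sketch of the kind of stream the script has to emit (the revision data is made up, and each data line gives the byte count of the content that follows), so that a conversion is essentially a pipe into git fast-import run inside a fresh repository:

blob
mark :1
data 13
hello, world

commit refs/heads/master
mark :2
committer A U Thor <author@example.com> 1287000000 +0200
data 21
import from file.c,v
M 100644 :1 file.c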

The first release of the software was quite simple, supporting only linear histories for single files (after all, that was exactly what I needed), but I nevertheless decided to publish my work; who knows, someone else on the internet might have some need for it.

In fact, what has been surprising to me so far is the number of people that have had a need for this small piece of software. Since its public release, I've been contacted by some five or six different people (among whom the most notable is maybe ESR) with suggestions/questions/patches, and as with all software developed on a need/use basis, the capabilities of the script have hence grown to accommodate the needs of these other people.

As it currently stands, it can handle files with branched histories as well as multi-file projects with a linear history. It does not, however, currently support multi-file histories with branching, which is, unsurprisingly, the “most requested” feature at present. I have actually been looking for a repository with these characteristics, to try and tackle the task, but it seems that finding one such repository is nigh impossible; after all, how many people still have RCS repositories around?

Finnish design

As I've already mentioned, the N900 is a great phone, whose user experience is however hindered by a few small functional defects. The strangest is perhaps the proximity sensor problem.

To prevent contact with the face, or some careless hand movement, from ending calls or launching unwanted programs during a call, the telephony program reads the proximity sensor and locks the phone when the sensor is covered.

The sensor works with infrared light: it is therefore paired with an infrared LED whose light, if reflected for example by the user's face, returns to the sensor instead of dispersing into the environment.

What happens if the environment contains another source of infrared light on the same frequencies as the sensor? The sensor ‘thinks’ it is obstructed even when it is free; the sensor must therefore be calibrated to tolerate ‘typical’ interference from external sources.

Sources such as the Sun.

Except that the amount of (infrared) sunlight available in places like, say, Sicily is not exactly the same as in places like, say, Finland. Guess where the sensor was calibrated? (Hint: what nationality is Nokia?)

(Really: to make the sensor work correctly, it is enough to partially obstruct it, so as to reduce the interference from sunlight.)

N900, or how Nokia chose suicide after having opened its own road to the future

A couple of years ago (when the iPhone was by then all the rage among the trendy crowd, and the concept of smartphone had expanded well beyond that of a simple Personal Digital Assistant with a built-in phone, creating a new market that went far beyond the aspiring manager with his Palm first and his BlackBerry later), Nokia, whose Symbian was the most widespread mobile operating system (covering a range of products that went from Dash cellphones to the more sophisticated communicators), released a product that, for the first time in my life, made me seriously consider the idea of getting a high-end smartphone.

The product in question was the N900, released at the end of 2009. The only thing that kept me from taking possession of one back then was the price, which hovered around 600€ (a price actually not unusual for the class of the device, but one that Apple, for example, hid behind ‘offers’ with never-ending contracts tied to this or that connectivity provider). You can imagine my envy when I discovered that a free course held at the university was handing out exactly this model (for free) to its (computer engineering) students.

A few months ago, taking advantage of the ‘birthday zone’ and of an offer found on the internet, I decided to get a second-hand one at a third of the price, and I finally got the chance to play with it to my heart's content. And in short I can say it has perhaps been the most satisfying (personal) purchase of the last few years.

The N900 positions itself in evident competition with the iPhone 3GS, released a few months earlier, with an interesting combination of pros and cons; a summary of the differences can be found on this page.

Actually, the only ‘con’ of the N900 compared to the Apple competitor is the touchscreen: resistive on the Nokia (which by itself may bother some people), but above all incapable of tracking multiple fingers, thus making the famous pinch gestures for zooming impossible. For the rest, the Nokia wins on practically everything except thickness (some 60% more, not all of it due to the sliding physical keyboard, which is one of the N900's strong points): the Nokia display has almost twice the resolution; the Nokia has both a rear camera (with flash and autofocus, and a higher resolution than the iPhone's) and a front one (low resolution, for video calls); the Nokia display can be used both with fingers and with the (included) stylus; the Nokia has an FM receiver and transmitter (although, who still uses old-style radios?); the Nokia has a physical keyboard (already mentioned); the Nokia has a MicroSD card reader; the Nokia has a replaceable battery; the Nokia has a standard video output, and the cable to connect it to TVs is included.

Finally, the Nokia has Linux: not a semi-proprietary virtual machine like Android's Dalvik (on a Linux kernel), not a proprietary variant (iOS) of the open-source BSD kernel as on Apple's phones, but an ad-hoc Linux distribution (Maemo) made up almost entirely of open source software.

Since I came into possession of this little toy I have used it for countless things: playing, reading books and comics, taking photos, shooting videos, administering my server, listening to music, reading and writing email, chatting and talking via Skype and Google Talk. In essence, the only thing I haven't used it for is making phone calls, and that mainly because I haven't yet found a SIM with good internet rates.

Many of the things I have used and still use the phone for required installing new programs; and although an OVI store is available, I was able to find everything I needed in the official repositories (after all, we're still talking about a complete, Debian-based Linux distribution). In essence, I didn't have to spend a cent beyond the cost of the phone itself.

It cannot be said that the N900 was perfect: in everyday use one can easily run into even very annoying problems, ranging from a not-quite-excellent antenna to some trouble with the proximity sensor. And yet, these were certainly not what decreed the profound failure of Nokia's attempt to enter the game of new-generation smartphones.

Nokia, rather, has effectively carried out a true suicide. The N900, which was nothing more than the first step into a market Nokia was already entering with considerable delay, has unfortunately become instead the apex of Nokia smartphones.

The strategy to follow after the release of the N900 should have focused on refining and improving the kind of platform already tried out with the N900 and its Maemo 5, so as to produce, in the shortest possible time, a successor that remedied the hardware and software limits of the first true Nokia smartphone.

On the hardware side there wasn't even much to do: adding multi-touch capability to the display, improving the internal antenna and solving the proximity sensor problems should have been the main goals. Over new hardware generations, more powerful processors, bigger batteries and perhaps a slimmer design would have made the N900's successors unbeatable.

But it is above all on the software side that Nokia fell into the most childish of errors: starting over from scratch, proposing a completely new platform, MeeGo, born in theory from the merger of Maemo with Intel's Moblin: in essence, a third alternative to the two existing systems, to be redesigned from the ground up, with the consequent, inevitable stretching of the release times of new products in a market that had no intention whatsoever of waiting for Nokia.

In mid-2010 Nokia thus found itself with interesting projects begun (but not finished) on its hands, the most important of which was the acquisition of Trolltech and consequently the control over Qt, the most important cross-platform toolkit in circulation, and potentially the meeting point between the never-born Maemo 6 and Symbian4. In the same period, the company churned out a few more timid attempts at smartphones based on the now moribund Symbian, and the only truly positive note of the year was the publication of a series of software updates that made the N900, within the limits imposed by the previous year's hardware, the excellent smartphone I now hold in my hands.

One would have to wait for 2011 to see the fruits of the new projects of 2010; and indeed it is only in June 2011 that Nokia would make available two new models (N950 and N9) and the new operating system (Harmattan, the bridge between Maemo and MeeGo) that could make it relevant again in the smartphone market.

Unfortunately, though, the latency introduced by the reinvention of Maemo as MeeGo, and the collapse in sales of the Symbian products, led to the decision of a management change at the corporate level, and in September 2010 Nokia's then CEO was replaced by Stephen Elop, former head of Microsoft's business division (essentially, the man responsible for the 2010 release of Microsoft Office), who decided to change course once more: to break through in the smartphone world, according to Elop, Nokia would have to lean on the most irrelevant of smartphone operating systems, Microsoft's Windows Phone (previously known as Windows Mobile).

Unfortunately for Elop, at the end of 2010 Nokia already had the N900's successors in the pipeline, and after the great anticipation that had built up over the year around these new models, preventing their release was impossible. The strategy adopted thus became that of making them irrelevant, thereby destroying any possibility of serious competition with Apple's new iPhone models, the main single adversary Nokia had to fight.

The N950 (the true successor of the N900) was therefore released only as a developer preview for the N9: it was not put on sale, but made available only to developers, through odd selection procedures; a deeply questionable choice, moreover, since the two models have rather different hardware (for example, the N950 has a physical keyboard, the N9 doesn't).

To complete the suicide, the N9 was made available only in certain markets (Finland, Hong Kong, Switzerland, India), and denied to the rest of the world. All of this was accompanied by a great fanfare advertising the release of the Lumia, the first Nokia phones with the new Microsoft operating system.

Despite these clumsy and desperate attempts to smother the alternative to Microsoft, the N900's heir is so sought after that various online retailers have made the N9 available even in markets (such as the Italian one) that Nokia had excluded. Even other models based on the now moribund Symbian keep outselling the Lumia.

Had Nokia's CEO not been a ‘Trojan horse’ sent by Microsoft, the path to follow would have been obvious. Unfortunately, instead, we find ourselves in a situation where the losers are both Nokia (which keeps losing market share at an incredible rate) and the users, who are left, in the end, without a worthy successor to the N900, that excellent combination of Nokia hardware (always superior to the competition) and quality software that could have decreed its success.

It makes one want to set up a company to build a clone of the N950 and of its keyboardless companion, to continue along the road that Nokia has chosen to abandon.

Asocial networks

A brief history of sociality on the Internet

Compared to other forms of mass communication, the Internet has always distinguished itself by its many-to-many nature: both in synchronous (IRC) and in asynchronous form (mailing lists, newsgroups, forums), the Internet has always offered everybody the possibility of reaching everybody. Until the late '90s, for most users this possibility was offered in contexts that had much of the public square and little of the individual: few could afford a fixed presence on the internet with personal sites.

What changed this was the birth of blogs (personal pages in diary form), the transition from their manual curation to the development of more or less automatic tools for their management, and finally the spread of platforms offering ‘anyone’ the possibility of (and in particular the web space necessary for) keeping one (LiveJournal, Blogger, the Italian Splinder or ilCannocchiale, and finally the currently famous WordPress).

Thus began a process of individualization of the Internet, in which the single user takes on (in his own eyes) an ever greater weight, and communities begin to disintegrate. Blogs were joined by sites where publishing and sharing one's own material (in forms that are not only, or not mainly, textual) is the central point: drawings (DeviantART), photos (Flickr, Zooomr), videos (YouTube, Vimeo).

For the individual it becomes ever easier to publish, but ever harder to find and to be found: the role once played mainly by the communities that aggregate(d) around well-defined virtual places (themed IRC channels, specific groups in the immense newsgroup hierarchy) is progressively replaced, in an exceedingly inefficient way, by one's network of acquaintances (real or virtual).

The apex of this process is the birth of the so-called social networks, sites whose backbone is no longer made up of the content, but of the members and of the ways in which they are linked to one another, thus inverting the relationship between users and content that dominates the previously mentioned services.

Social networks and other platforms

Social networks are not devoid of content-publishing capabilities; on the contrary, one of the strong points they leverage to attract users is the ease with which ‘everything’ (text, photos, videos) can be put online, and above all shared. The publishing possibilities they offer are often of markedly lower quality than those offered by dedicated platforms, but they are mostly good enough (and above all simple enough) for the preferred target audience of these services, with the added convenience of centralization.

The social network propagates by leveraging the social nature of the human animal, and the possibility to share is the main instrument of the new virtual sociality. Thus the social network spreads its presence beyond the boundaries of its own site, and becomes the main vehicle of diffusion for external content as well. Before the advent of social networks, it was important for a site to be well indexed by a good search engine; after their advent, it becomes important for a site to be shareable on the social networks.

But the main characteristic differentiating the social network from the other platforms is the inversion of the relationships between users, products, services and customers. While the other platforms offer services to their users, who are also their customers, in social networks the publicly offered services are just bait to attract users, whose networks of connections and whose shared content are the product to be sold to the customers (mainly, advertising agencies).

Yours truly on the social network

Even before the advent of social networks, the many steps in the evolution of the main forms of interaction on the Internet had left me rather lukewarm. Not being what one would call an early adopter, I arrived late on IRC (where I still linger), on newsgroups and mailing lists (where I remain, and only very moderately active, only in a few groups of very specific technical interest), on forums (which I have almost entirely stopped following). I was late in opening a blog, and my presence on social networks is practically nonexistent.

Each approach to a new form of interaction had very precise origins and motivations: even my first accesses to the internet (well before the arrival of broadband, when you connected at 56k, if you were lucky, blocking the use of the phone line) were motivated (I remember that my first connection, using one of those ‘try the internet for 15 days’ floppies, was made to look for a walkthrough for Myst, so we're talking about the early '90s).

Social networks, instead, still escape my interest: whether because of my asocial (pardon: ‘selective’) nature, or because of my total estrangement from (not without a certain disgust for) that vacuous enthusiasm for the number of friends and that almost obsessive-compulsive passion, so amplified by social networks, for the unrestrained sharing of every aspect of one's own life as well as of other people's, or because of that faddishness intrinsic even in the choice of the ‘environment’ (yesterday everybody on MySpace, today everybody on Facebook, tomorrow everybody who knows where), I have always felt extraneous to these forms of virtual sociality.

Truly exceptional

I won't deny, however, that some aspects of social networking can be useful, in certain contexts or in particular forms.

LinkedIn

In a work setting, for example, an individual's ‘social graph’, the people they know (and their opinion of them) and the people who know them (and their opinion of the individual), often carries no less weight than the individual's own qualifications (especially when work is scarce in both quality and quantity).

It is with this in mind that I signed up to LinkedIn, a work-centered social network, whose users are defined by their profession, by the company they work for and by those they have worked for, by the education they have received. A bare-bones profile and a careful selection of contacts constitute my ‘participation’ in the social network.

FriendFeed

The other aspect of social networks that can prove useful is centrality; but rather than embodied in a ‘single place’ where content is published, a centrality seen as a collection point for content published elsewhere.

Before social networks, the most convenient tool for following the content updates of the various blogs, photography sites and whatnot was the use of the famous feeds; if you followed the same person's blog, videos on YouTube, photos on FlickR and so on, you subscribed to each of the feeds separately. Here, at least, a centralization of the feeds of all the accounts scattered across the Internet would have been convenient.

And it is exactly on this that FriendFeed bet, and it was exactly this main characteristic of its, being an aggregator of content, content for which the choice of supporting platform remains in the hands of the users, that attracted me.

The FriendFeed user profile is essentially composed of the list of feeds that the social network will take care of checking and grouping, to then automatically spread them to the followers, that is, those who are ‘subscribed’ to the user's account. For those following people not signed up to FriendFeed, this social network also allows the creation of ‘imaginary friends’: again, nothing more than lists of external feeds collected to represent a single virtual person.

The key point in all this is that FriendFeed does not control the content, but helps a great deal in managing it. Then again, not all of its users see it that way, and quite a few use FriendFeed by writing messages and uploading photos directly onto the social network.

FriendFeed's evolutionary history was as intense as it was brief, and it ended with its acquisition, within less than two years, by Facebook, a move which, however favourable to FriendFeed's developers, has essentially suspended, indefinitely, any hope of development of its features.

Thus, for example, support for new services and platforms will hardly ever be added to FriendFeed; its privacy management will hardly be improved, limited as it is to the possibility of making one's account private, without, say, any integration with the ability to group friends into lists; its group management will hardly be improved, for example by adding the ability to publish certain services directly to specific groups.

While FriendFeed's approach to a social network remains, in my opinion, the most intelligent and correct one, it does not seem to convince those who instead see the traditional approach as the best way to exploit the ‘user resource’. Thus Google, in its umpteenth approach to the social network, after the failures of Wave and Buzz, opts for the ‘à la Facebook’ strategy with Google+: and this, in my opinion, is the great mistake of this giant of the Net, which on other fronts has had so many successes.

The only novelty that Google+ offers over Facebook is in fact the management of circles, and although it is surprising that it took the social networks this long for the distinction between very different groups of contacts to be so well integrated with the rest of the platform, this can never be the killer feature with which Google+ could become to Facebook what Facebook was to MySpace.

In my ideal world

{ How I would design a social network. }

Living without Windows(?)

Can it be done? It's a question that should come spontaneously to all those who continually find themselves dealing with viruses, malfunctions, unjustified slowdowns of their computer and so on.

Instead, unfortunately and unsurprisingly, it turns out that most people, mainly out of ignorance, often also out of habit, and certainly also because of that conformist laziness that makes us prefer facing the same problems faced by the majority of others over taking a few extra steps to have far fewer (but different) ones, prefer to follow the propaganda that sees in Windows the system ‘for everybody’, distinguishing it from Apple's alternative chic and from the radical-communist (and little known) Linux.

If one does face the question, though, the (natural as well as obvious) answer is “it depends”. It depends on the computer, on the kind of use one makes of it, and finally on the degree of interoperability required with others (and thus, ultimately, on the programs one intends to use).

Hardware

From the hardware point of view the problem, less and less frequent, lies in the operating system's ability to make use of it: while manufacturers, for obvious market reasons, have always provided installation discs with the Windows drivers for their hardware, the situation with Linux is not always as cheerful: it ranges from hardware with complete support, which in some cases even surpasses the Windows support in quality, to hardware for which you're lucky if you can even let the operating system know that the particular piece is present, passing through the whole possible range of variants.

The situation is actually less and less tragic, and by now it is quite rare to find hardware that is completely unsupported: at present, I believe fingerprint readers are roughly the only class of hardware that is almost totally unusable. More often it happens that a new model is not immediately supported at release (I had a negative experience in this sense with the Wacom Bamboo Pen&Touch graphics tablet, which I now however use without problems), or that some advanced functions cannot be configured with the immediacy of the ‘idiot-proof’ interfaces often found in Windows (same Wacom example, for the multi-touch functions).

Obviously, the level of support for computer components and peripherals depends heavily on the manufacturer's willingness to cooperate with the Linux world. Four levels are classically found:

  1. manufacturers that actively contribute to the support with open source drivers and tools (examples: Intel, HP),
  2. manufacturers that actively contribute to the support with proprietary drivers and tools (examples: ATI, NVIDIA, Broadcom),
  3. manufacturers that provide the specifications of their hardware, thus making it possible to write open source drivers and tools, but do not actively contribute code of any kind (examples: ACECAD, Wacom),
  4. manufacturers whose hardware is supported only thanks to the patient reverse-engineering work of people with no ties whatsoever to the manufacturer (examples: too many).

To be able to leave Windows behind, it is therefore advisable to become a bit more careful in one's choices, unless one has an interest in tinkering. Fortunately, it is harder and harder to find things that don't work “out of the box”, and even harder to find things that cannot be made to work with a moment of patience and a few quick internet searches.

Software

The most widespread use of computers nowadays is probably browsing the internet, closely followed by the use of an office suite, or at least of its word processor (read: Microsoft Office's Word). Then comes a bit of multimedia, perhaps listening to music and maybe watching the occasional film, with more savvy users interested in organizing their photos or videos (from the eleven-year-old daughter's dance recital to impure acts with one's partner).

The choice of application for each use is, again, mostly dictated by ignorance: there are quite a few people for whom “the blue e on the screen” is the internet (at best) or Facebook (the part of the internet they interface with 96% of the time, the remaining 4% being YouTube, which they might well reach from Facebook). Perhaps in this case it's not really appropriate to speak of ‘choice’.

Skipping the usual question of inertia (“this is what comes preinstalled on the computer, so this is what I use”), another determining factor is interoperability, meant specifically in reference to the need to exchange data with other people. While we have finally emerged from the web monoculture, and the sites that cannot be properly used without Internet Explorer are by now very few, for text documents and spreadsheets we still depend heavily on the formats established by Microsoft's office suite, even though full access to those documents requires the suite itself1.

This is somewhat paradoxical, because if interoperability were really the goal, one would turn to something more universally available, and thus to applications and formats not tied to a specific operating system. But again, inertia and the need for backwards compatibility with years of monoculture (and the associated legacy of documents in those formats) make the transition to more sensible solutions difficult.

What to use, and how

To ease the transition from Windows to another operating system, it is best to start using, within Windows itself, the same applications one would end up using ‘on the other side’. Before diving head first into the latest Ubuntu, for example, it is better to stay in the environment we are familiar with (Windows), abandoning however our Internet Explorer (for those still using it), our Microsoft Office, etc., to get acquainted with equivalent programs that are also available on the other platforms. This often means turning to open source software, but not always.

Use, then, a browser such as Opera, Firefox or Chrome to browse the internet. Use the aforementioned Opera, or Thunderbird, to manage your mail. Gimp may not be Photoshop, but it's a good starting point for photo retouching, and Inkscape can easily give Corel Draw, if not InDesign, a run for its money. As an office suite, LibreOffice (derived from the better-known OpenOffice.org) is a very valid alternative to Microsoft Office, special cases aside. DigiKam is excellent for managing one's photos (though perhaps not trivial to install on Windows; an alternative could be Picasa, usable in Linux through Wine), VLC is pretty much the universal media player, and so on.

After all, the applications are what we interface with most often, much more than the underlying operating system, which is mostly used to launch the applications themselves and possibly for a minimum of management (copying files, printing).

Once familiar with the new programs, the transition to the new operating system will be much lighter, thanks also to the enormous efforts made in recent years (mainly under the push of Ubuntu) to make Linux more accessible to the average luser2.

But I need …

There are cases in which one cannot do without a specific program, whether because no alternative is available on Linux, or because the existing alternatives are not good enough (for example, they don't read correctly the documents one is working on, or they lack essential functions).

The best solution, in that case, is offered by virtualization, a more efficient alternative, on recent machines, to dual boot. While with the latter approach the two operating systems are kept side by side on the same machine, choosing which one to use at each boot, and possibly being forced to reboot should one want to use the other even just temporarily, virtualization consists in assigning resources (memory, CPU, a piece of disk) to a virtual computer, emulated internally by the other.

This way, using Linux as the main operating system, one can ‘switch on’ the virtual machine, starting Windows in a window of its own that does not interfere with the rest of the computer, except in the forms imposed by the virtualization itself.

Virtual successes

I have personally and successfully experimented with this setup, which has come in handy on at least two occasions: the need to use Microsoft Office for the reporting of a project that had to follow a very precise template built in Excel, complete with macros and other functions for which the OpenOffice.org of the time did not provide sufficient compatibility; and, more recently, to recover the address book and messages from my not-quite-dead but not-exactly-working cellphone.

But my greatest success in this respect has been a dyed-in-the-wool engineer who has been using computers for work since the days when 64K was a luxury, and one had to wait for night-time to be able to use all 256 kilobytes of a machine normally segmented for timesharing. We are talking about a man who switched to Lotus 1-2-3 only when not a drop more could be squeezed out of VisiCalc, then stuck with QuattroPro under DOS until the compatibility problems outweighed the benefits of habit, and who reformatted his new computer to put Excel 95 back on Windows XP, so as to limit the changes from his previous machine to the bare minimum.

We are talking about a man who was persuaded to install Linux only after the third unrecoverable death of said Windows XP, and after my by-then total and definitive (as well as fairly pissed-off) refusal to offer him the slightest help with any kind of problem that might arise with his beloved configuration. (And honestly, I couldn't stand hearing every time about how Windows had crashed, how Excel refused to save, how this, how that.) A man who was persuaded only on condition that (1) I could find him, under Linux, something that could fully replace his only two big applications (Excel and AutoCAD) without losing any of the work done up to then, and (2) I would offer him help whenever he had problems with the new operating system.

Having already helped other people migrate, the second point was no problem at all: those who know me know well that I have never refused to help people having trouble with their computer; I have, however, recently come to the decision of categorically refusing to help anyone having trouble with Windows, with the specific goal of making it clear that the operating system in question is not at all more ‘user-friendly’.

The first point was a bigger problem: even the latest version of LibreOffice still has trouble with my father's terribly complicated Excel spreadsheets, and no CAD available for Linux can remotely stand the comparison with AutoCAD.

Virtualization was therefore the solution I proposed: a clean installation of Windows XP with only the programs in question, in a virtual machine managed from Linux; data saved on the Linux side, in a directory accessible as a network disk from the virtual machine; and a spare copy of the virtual machine, with which to overwrite the one in use should problems develop.
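
The hypervisor details don't matter much; assuming, for illustration, VirtualBox (the VM name and paths are made up), the shared directory and the spare copy boil down to something like:

# expose a Linux directory to the "WinXP" VM as a network share
VBoxManage sharedfolder add "WinXP" --name data \
    --hostpath /home/user/vmdata --automount

# keep a clean copy of the VM disk...
cp WinXP.vdi WinXP-pristine.vdi
# ...and restore it when the installation in use gets dirty
cp WinXP-pristine.vdi WinXP.vdi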

Fortunately, the hardware support for virtualization on his computer turned out to be sufficient for comfortable daily use. Unfortunately, the engineer in question cannot find the patience to learn the Linux equivalents of that infinity of small utilities he used to run under Windows, so the clean installation gets dirty again shortly after each restore, and Linux ends up being used almost exclusively for browsing the internet (through Firefox).

The situation is nevertheless quite satisfactory, the only blemish being the inability to use the complex Silverlight-based interface that the RAI sites offer for watching their broadcasts (in particular AnnoZero). Of Silverlight, a Microsoft technology, a partial Linux implementation exists in the form of Moonlight, but apparently the plugin only lets you watch the advertising, while the actual broadcast remains inaccessible.

Even leaving aside the fact that Microsoft itself is considering abandoning .NET and the associated Silverlight for the next version of Windows, the choice made by RAI (or by whoever acted on its behalf; who was RAI's multimedia web platform contracted out to? Telecom?) reeks from afar of the kind of choices that favoured, back in the distant '90s, the birth of that web monoculture whose damage I have already discussed.

Escaping the monocultures

The possibility of emerging from the Windows monoculture is now starting to show, with the spread of Macs on the trendy side (carrying the dangerous risk of developing a new monoculture to replace the existing one), and of Linux on the other. One can only hope that the share of each system reaches levels that restore the ecosystem to health without risking new degenerations. Until then, there will always be a bit of headwind to face, but nothing impossible.

In this, as I've already said, Ubuntu has certainly helped a lot. Lately, though, the new releases have been significantly less attractive than the previous ones: between a drift ‘à la Apple’ in style and function, and some choices that are too experimental for a platform that positions and proposes itself as ready for lusers, I warmly recommend staying put on 10.04, waiting with a bit of patience for the ‘toy-like’ attitude with which Shuttleworth has lately been managing Ubuntu to die down, or perhaps for the emergence of a new, somewhat less ‘courageous’ alternative.

The alternatives to Windows are there by now, and they are valid. But above all, fortunately, everybody is starting to notice, which signals that the biggest obstacle to any progress has been overcome: the change in attitude, in widespread mentality.


  1. it is true that viewers (for Windows) are available that do not require the whole suite; it is also true that most of the formats, thanks to considerable reverse-engineering efforts, are by now mostly accessible from other suites as well; but the problem of “complete compatibility” remains: although it is not guaranteed even between different versions of the MS suite itself, it is also true that on the most delicate points of formatting other suites can differ rather more noticeably. ↩

  2. not a typo, but rather the term commonly used for users with little familiarity with computing tools. ↩

Social APIs

While waiting to activate the interactive features of the wok (comments, freely editable pages, etc.), I have started working on a minor form of integration with the social networks. I took the opportunity to extend the work already done for my UserJS that shows FriendFeed comments on any page, creating a version of it specifically aimed at the wok. Its current features are:

  • it works on every page of the wok (including my local version), as well as on every permalink present in the page;
  • it looks up all the entries on FriendFeed and on Twitter that reference the page/permalink in question (the sketch below shows the lookup pattern);
  • it ignores my own entries, unless they in turn have comments or likes.
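
At the heart of it is the usual JSONP pattern for cross-domain queries; a minimal sketch (endpoints and names are illustrative, not the script's actual code):

function jsonp(url, cbName, handler) {
    // expose the handler under a unique global name for the API to call back
    window[cbName] = function (data) {
        handler(data);
        window[cbName] = undefined;
    };
    var script = document.createElement('script');
    script.src = url + '&callback=' + cbName;
    document.body.appendChild(script);
}

// one query per permalink, each with its own callback name, so that
// the origins of the results can be told apart (cf. footnote 2)
jsonp('http://search.twitter.com/search.json?q=' +
    encodeURIComponent('http://wok.example/some-permalink'),
    'twitter_cb_0',
    function (results) { /* merge the results into the page */ });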

The next step I would have liked to take was to also add references made via Buzz, and here I ran into the first serious snag: it is not possible to search for activities that reference a precise address; the best approximation available is to search for something by title, but this obviously makes the false positives grow industrially.

Although this is certainly not what is preventing Buzz from taking off as a social network, it is undoubtedly a limit that doesn't help it at all, especially when compared with the limits one stumbles upon when trying to use the FriendFeed1 or Twitter2 APIs.

The script is still in UserJS/GreaseMonkey format, downloadable from here while waiting to be integrated into the wok itself (repository). Testers welcome.

In the meantime I will also think about an intelligent way to integrate something else.

Update, 2011-02-08

By cribbing here and there (in particular from the social network links at the bottom of the articles on Metilparaben) I managed to make a bit of progress, discovering which Buzz and FaceBook APIs can be used at least to retrieve the number of comments/references/likes, if not their content. In the meantime I also discovered that Twitter's search only returns recent content, so after a while pages that I knew had references no longer reveal any.

Only FriendFeed remains my great champion of sociability. But I have always maintained that it was the social network “done right”.


  1. FriendFeed's search has been broken for quite a while now, both via the site and via the API, and I am lucky to only need the API in read-only mode without authentication, because according to those who have to do serious work with it, it is in really bad shape. ↩

  2. Twitter sanitizes the JSONP callback parameter in a way that makes it very cumbersome to call a multi-parameter function with preassigned values for the non-data parameters, something which is needed to load the data of each permalink while clearly telling their origins apart. ↩

Provveditorato agli Studi di Enna

Get the code for
UserJS/Greasemonkey fixer for the Provveditorato agli Studi di Enna:

gitweb
provvstudienna.user.js
git
provvstudienna.user.js

Il sito del Provveditorato agli Studi di Enna è uno di quei siti istituzionali che, in quanto tale, dovrebbe essere ad accesso universale: dovrebbe, in altre parole, essere (facilmente) consultabile da qualunque browser, testuale, aurale o con interfaccia grafica, per Windows, Linux, Mac OS, Wii, cellulare o quant'altro.

Invece, figlio com'è della monocultura web dell'inizio del millennio, è orribile e disfunzionale. In particolare, una delle sue funzioni più importanti (la presentazione delle “ultime novità”) non solo è esteticamente offensiva, ma per giunta funziona solo in Internet Explorer. In aggiunta, il menu laterale richiede (inutilmente) il pesantissimo Java ed è, nuovamente, disfunzionale: i link ai rispettivi contenuti funzionano (nuovamente) solo in Internet Explorer.

To see whether these limits could be remedied, I had to take a look at the code making up the page: nauseatingly offensive to any web developer, it is evidently the typical child of the “copy-paste quickly and badly, as long as it works in IE6” culture fed by the aforementioned and fortunately now bygone web monoculture.

Luckily, I was also able to fix at least the biggest of the page's problems: a script, usable both with Opera and with Firefox's GreaseMonkey extension, that finally makes the news visible.

The menu suffers from dysfunctions too, both because it requires Java and because its entries link to Windows-style paths, unusable outside the monoculture. The script remedies this as well, replacing the Java menu with a simple HTML menu with appropriate CSS styling and fixing the target addresses.
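
The gist of the replacement, sketched with hypothetical selectors and menu entries (the real script matches the actual structure of the site):

    var applet = document.querySelector('applet');      // the Java menu
    if (applet) {
      var entries = [                                   // illustrative entries
        { text: 'Circolari',    href: 'circolari\\index.htm' },
        { text: 'Modulistica',  href: 'modulistica\\index.htm' }
      ];
      var menu = document.createElement('ul');
      menu.className = 'fixed-menu';                    // styled via injected CSS
      entries.forEach(function (e) {
        var li = document.createElement('li');
        var a = document.createElement('a');
        // fix the Windows-style targets: backslashes become forward slashes
        a.href = e.href.replace(/\\/g, '/');
        a.textContent = e.text;
        li.appendChild(a);
        menu.appendChild(li);
      });
      applet.parentNode.replaceChild(menu, applet);     // swap applet for HTML menu
    }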

This is the best that can be done for now. In particular, I would have liked to perform the menu replacement before Java loads, but unfortunately that's not possible with a GreaseMonkey script (I could have done it had I limited myself to Opera's UserJS).

(The script's development can be followed in its git repository)

Monoculture on the web

The conclusion of the first browser war towards the end of the last millennium led to a solid monoculture: “everybody” had Internet Explorer 6 (IE6) on Windows (generally without even knowing exactly what it was, except vaguely “the internet” for the more sophisticated).

Microsoft's apparent, or at least temporary, victory quickly became a problem not only for the few aliens who didn't use IE1, but also for everyone homogenised into the dominant monoculture, thanks to the gigantic security holes offered by the Microsoft browser: the insecurity of Windows, whose main attack vectors were precisely the browser and its companion email program2, is a burden that Microsoft, and above all its users, still have to reckon with.

But the legacy of the turn-of-the-millennium monoculture doesn't end there. The possibility of “easily” creating web pages, if not entire sites, with ill-suited tools (such as Microsoft's own office suite), without much technical knowledge (the ability to use a search engine and copy-paste code often being enough), led to the spread of web pages of terrible technical quality and, above all, poorly usable on browsers other than the dominant one, generally only for aesthetic reasons, but too often for functional ones as well. (After all, why bother for that measly 10% of outsiders?)

Microsoft's scant interest in the web as a platform meant that the monoculture it dominated stalled the web's development, especially in terms of interactivity: for five long years (a very long time in computing), the potential offered by increasingly widespread broadband remained outside the web's own language (HTML), staying the almost uncontested domain of “supplementary” technologies, first Java, then Flash, available to more or less everyone, plus the infamous (and dangerous) ActiveX controls specific to IE6.

Meanwhile the few champions of that measly 10% of outsiders, rather than surrendering and throwing in the towel, worked first to close the short gap separating them from the features offered by Microsoft's infamous browser, and then to add new ones. From the ashes of the last millennium's defeated, a solid and ever more capable web was born, soon leaving web developers facing a choice: create content following the new standards, with their increasingly promising future, or stick to the possibilities offered by the currently dominant browser, whose position was ever more uncertain?

With the transition to the new underway, the stagnant monoculture increasingly revealed itself as the heavy ball and chain it had actually always been: technically inferior and limiting, a burden for developers, and a danger for users.

Creating universal web pages increasingly revealed the absurdity of its nature: write the code once for “everyone else”, then resort to painful and convoluted tricks for something that belonged to the last century, and not only chronologically. Even Microsoft itself, when the growing success of the alternatives3 finally forced it to resume developing Internet Explorer, found its main obstacle precisely in those users who, having once entrusted themselves to programs developed as ActiveX controls for IE6, found themselves unable to upgrade their browser without losing the use of those programs, often needed for work.

And if reluctance to upgrade is the most serious problem the company responsible for the monoculture has to fight, the legacy of the “lazy thinking” that accompanied it is instead paid for by those users who still find themselves battling sites which, by their very nature, should always have been universally accessible, but which, sadly and gravely, are not.

Examples:


  1. whether by choice, or because the use of platforms other than Windows, such as Linux or Mac OS, made it impossible ↩

  2. Outlook Express ↩

  3. above all Firefox, soon followed by Safari and more recently by Chrome (for some reason, the share of Opera users never changed much, despite its technical superiority) ↩

Why a Wok?

One starts a blog by giving the medium its etymological value: a web log, a diary on the net. One then discovers other uses, from sentimental venting to a miscellany of musings, from philosophy to literary criticism, from historical, political and sociological analyses to fiction.

One generally stops keeping a blog progressively, and for a variety of reasons. But above all, in one form or another, for lack of time: either because one has so many things to do that there's no time left to write about them; or because one doesn't have many things to do, and so devotes oneself to looking for them (and in any case the urge to write isn't exactly abundant then: what would one write about, anyway?); or because one simply loses interest in keeping this window open onto the world, preferring a simpler and more immediate means of communication, a Twitter or a Tumblr or an overused FriendFeed, but above all FaceBook; a means, above all, that doesn't make us feel, with every post, the pressure that can come from the idea of having readers, from the need, conscious or not, to offer them something of quality.

In my case, beyond perhaps a mix of the above, it was largely a feeling of the format's limitations. And while the limitations of my previous blog's platform were a strong incentive to look for alternatives, none of the ones I saw (from Splinder to LiveJournal, by way of the now ubiquitous WordPress) seemed to be “what I was looking for”.

What I was looking for

Many of my requirements for my blog's replacement are rooted in my very geeky nature as a mathematician and, above all, a programmer.

For instance, the need to work on content with my usual writing tools: Vim in a terminal, a restful black screen with white text, no frills (save for the occasional sophistication such as syntax highlighting) and no distractions.

For instance, the ability to publish simply and immediately, without even having to bother opening a browser.

For instance, the ability to keep track of the content's history: every editing stage, every revision.

Finally, from a more outward-facing point of view, something that offered more than the classic writer-reader interaction of a blog. More than once, in my past life as a blogger, I found myself with material on my hands that called for richer interaction, deeper exchanges, proposals or requests that could more easily be satisfied by “the others”.

From these needs the wok was born.

The Wok

I steal the term from the name of the traditional cooking pot of Chinese origin for at least two good reasons. The first, phonetic in nature, is the resemblance between the word “wok” and a suitable blend of “wiki” and “blog”. The second, functional instead, is tied to the great flexibility of the wok in the kitchen, where it can be used for cooking methods ranging from boiling to frying, by way of browning and steaming. As a content management platform, one can expect the same flexibility from a wok, and hence the ability to host (all) the (main) forms of (textual) expression of the web:

  • the (locally) individual outpouring, perhaps even a “lyrical” one, where contributions from other parties, even when welcome, remain outside the main body; i.e. the kind of content that substantially characterises a blog;
  • the multi-party debate, once the domain of mailing lists and newsgroups and now dominated by forums, where these survive;
  • content that is born, grows and gets polished thanks to the collaboration of several participants, whose natural habitat is a wiki;
  • plain, classic, static “web 1.0” pages.

The technical foundation of this wok is ikiwiki, a wiki compiler with a variety of possible uses (including forums and blogs), whose contents are plain files and which relies on existing revision control systems (including my favourite, git) to preserve their history. It's not hard to see why I chose it, even though customisation and fine-tuning (which will be documented here) are needed before this foundation can reach the true nature of the wok, in the form and manner that suit me best.

Current technical limits

The official ikiwiki distribution lacks the following capabilities for the wok to be technically satisfactory to me:

  • multiple categorisations: ikiwiki only supports tags by default, so any additional categorisations (columns etc.) have to be implemented with external plugins; things this can be useful for:
    • better management of collections; the new trail system helps make collections easier to manage, but tag namespaces would still help mark chapters more ‘discreetly’;
    • drafts and WIPs should be categories distinct from tags
  • an index of the current page in the sidebar (this can be solved with a bit of javascript, as I did for the wok's main page and as sketched after this list, though a javascript-free solution would obviously be preferable)
  • multiple function bars (left, right, top)
  • nested comments
  • a way to specify which pages get a given stylesheet (e.g. demauro.css for the pages tagged demauro, but also for all the pages that include them!)
  • specifying the language each article is written in
  • the ability to specify, for a set of pages, that links should be resolved by looking into specific subdirectories (for instance, Postapocalittica should look into its own glossary); a feature implemented as linkbase, but not yet merged into official ikiwiki.
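
The javascript mentioned above boils down to collecting the page's headings and mirroring them as a list of links in the sidebar; a minimal sketch, where '#sidebar' and the class names are hypothetical (ikiwiki templates may differ):

    var sidebar = document.getElementById('sidebar');
    if (sidebar) {
      var toc = document.createElement('ul');
      var heads = document.querySelectorAll('h2, h3');
      for (var i = 0; i < heads.length; i++) {
        var h = heads[i];
        if (!h.id)
          h.id = 'toc-' + i;           // make sure the heading can be targeted
        var li = document.createElement('li');
        var a = document.createElement('a');
        a.href = '#' + h.id;
        a.textContent = h.textContent;
        li.className = 'toc-' + h.tagName.toLowerCase();  // indent per level via CSS
        li.appendChild(a);
        toc.appendChild(li);
      }
      sidebar.appendChild(toc);
    }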

Other problems encountered:

  • footnotes don't work correctly on pages that collect several documents, since MultiMarkdown uses the same anchor for footnotes that have the same number but belong to different documents; the problem is easily worked around: MultiMarkdown uses the reference code given by the user, so it's enough to use reference codes that are unique across documents (e.g. [^wok-notes-1] rather than [^1]): not perfect, but functional;
  • autogenerated pages currently get added to the revision control system, on the main branch; a problem solved by the transient mechanism introduced in recent versions of ikiwiki;

MultiMarkdown and Ikiwiki

Those who use MultiMarkdown with IkiWiki normally rely on the Perl version (2.x). In its original form, this has a few problems:

  • it doesn't properly support HTML5, with problems typically showing up in the HTML produced by nested inlines: page A includes page B, which includes page C; the result is that page C's markup inside page A is littered with misplaced p tags;
  • it doesn't properly support footnotes with multiple references in the text: in such a situation, the footnotes get duplicated, always with the same identifier (but different numbers).

Both of these problems are fixed in the MultiMarkdown fork maintained by yours truly.

Ensnared invisible signifiers

A human being reading the calendar of the Letture di San Nicolò l'Arena would have no great difficulty identifying it as such. Until a few days ago, a program ‘reading’ that same page could not, on the other hand, have extracted its essential data (namely, the dates and topics of the events).

In these terms, what separates the machine from the human is not so much a different ratio of quality to quantity of information as its different form: the human mind is more at ease with verbal communication (spoken or written) composed in a natural language, which is notoriously hard to process automatically (to say nothing of visual information).

I dwell on the form rather than the quality of the information because a qualitative assessment of informal communication can only be contextual, and is intrinsically subjective (but are there any qualitative assessments that aren't?). For example: does the (potential, and sometimes deliberate) ambiguity of natural language increase or decrease the quality of the information conveyed?

A vision of the Net (and here we're talking about something that would certainly interest the Sposonovello, and perhaps Tommy David too, but certainly not, say, Yanez) as a universal medium for data, information and knowledge (in the words of its founding father Tim Berners-Lee) must therefore come to terms with the fact that human usability and machine usability have quite distinct requirements; and for a long time (and for obvious reasons) the human kind has had the higher priority, making life hard, for instance, for those search engines (automatic mechanisms for gathering and processing (indexing) information) that human beings themselves rely on to find the information they would like to consume.

If human beings must go through computers to find the information written by other human beings, and computers are not (easily) able to ‘understand’ that same information, there is clearly a problem. And it is just as clear that, while we wait for the technological singularity to yield an artificial intelligence (which, one hopes, won't degenerate into Skynet) capable of autonomously interpreting humanly consumable forms of information, those who produce the information (that is, human beings themselves) need to present it in a machine-consumable form. But if the end user is always another human being, then the humanly unwieldy forms offered by certain proposals for building the ‘semantic’ Net are clearly no less problematic than the current ones.

A promising solution in this direction is to hide the information for the machines inside the mountain of metainformation already present (for other reasons) in the pages offering content to human beings: thus were born microformats, which allow that same calendar to be machine-readable.
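
By way of illustration, this is the kind of markup hCalendar relies on: the machine-readable data rides along in the class attributes (and an abbr title) of elements that are already there for the human reader. A minimal example, with invented date and titles:

    <div class="vevent">
      <abbr class="dtstart" title="2011-02-15">15 February</abbr>:
      <span class="summary">Reading at the library</span>,
      <span class="location">San Nicolò l'Arena</span>
    </div>

A human sees an ordinary calendar entry; a parser sees an event with a start date, a summary and a location.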

And now I have the itch to microformat my blog, but the only thing of note I've managed so far is adding XFN tags (rel="friend met" and the like) to the blogroll. (I'll defer a disquisition on their usefulness to another occasion.)