Sparkling wok, episode 3

Moving away from sparklines


I'm cheating a bit with the title of this article, but I feel that it fits the general theme, even if it's about “moving away from sparklines”. No, I'm not removing the sparklines I introduced less than a year ago, and improved upon shortly after. The issue came from my desire to present a visualization of data for which sparklines could have been appropriate, but I felt that my expertise with them was not up to par to do it justice.

The data in question was prepared (and briefly discussed) in my recent article (in Italian) about what drives my choice of language for an article, a topic quite dear to me, as shown by the previous article (in English) on the same topic.

The plot in question is the following:

A diverging bar graph showing the number of articles per language for each year. — Languages used in the Wok over time (2004–2025)

The intent was to show visually that even though Italian (blue) has historically been (and still remains) my primary writing language, there is a growing trend in the number of articles written in English (red), which has at times even surpassed my first language, particularly in “slower” years.

So here's the big question: could a similar plot be represented through sparklines? The answer is “technically yes”: in fact, if you look at the script I use to generate the article distribution sparklines, the BEGIN block even allows for negative values (to be shown in the line under the main one). So one possibility would be to produce a sparkline for the blue language, one for the red language, and show them one on top of the other.

However, I feel that this violates the point of sparklines, which is to present data in a way that can fit inline with the text, per Tufte's original idea. (Never mind the fact that my article distribution sparklines also violate the point.)

Additionally, with the simplest linear conversion of values to bar height, that single value approaching 100 in 2012 would completely flatten out all other years: the Unicode characters I'm using for sparklines only offer 8 possible values (plus 0), so with a maximum of 97 we can expect anything below 18 to be flattened to 1, and so on. This is frustrating enough already in the commits sparkline, where the maximum is 77 and there are still several months that have comparable values; in this case it would be even more frustrating.
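To make the flattening concrete, here is a minimal Python sketch of that linear conversion. It is not the script that generates the Wok's sparklines: the function names, the rounding rule, and the choice to keep every nonzero value at least one block high are assumptions made for illustration, and the sample values other than the maximum of 97 are made up.

```python
# Index 0 is a blank for zero, followed by the 8 Unicode block heights.
BLOCKS = " ▁▂▃▄▅▆▇█"


def spark_level(value, maximum, levels=8):
    """Scale a non-negative value linearly to a level in 0..levels."""
    if maximum <= 0 or value <= 0:
        return 0
    level = int(value * levels / maximum + 0.5)  # round to nearest level
    # Assumption: any nonzero value stays visible as the lowest block.
    return max(1, min(levels, level))


def sparkline(values):
    """Render a list of non-negative values as a string of block characters."""
    maximum = max(values, default=0)
    return "".join(BLOCKS[spark_level(v, maximum)] for v in values)


# Made-up values around the real maximum of 97.
print(sparkline([0, 5, 18, 19, 97]))
```

Under these assumptions, 5 and 18 land on the same lowest block, while 19 is the first value to reach the second one.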

I guess things could improve by plotting more granular information (per year-month rather than just year, just like the commit sparklines), but ultimately I decided to go with an actual plot —which opened a whole new can of worms.

There's apparently a dearth of tools that allow you to produce a nice, clean, SVG “divergent bar graph plot” from some simple tabulated data. So I had to roll my own. On the upside, it was actually quite easy to customize for my purposes.

The curious can find here the shell script to extract the information, here an auxiliary awk script used to collect the statistics, and finally here another awk script that converts the statistics into an SVG. I guess this last script may be useful for others too, although it will need some heavy customization to be made more general.
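For readers who just want the gist of the last step, the following is a rough Python sketch of turning tabulated counts into an SVG diverging bar graph. It is not a port of the awk script linked above: the function name, the parameters, the sizes, and the decision to draw the blue series above the baseline and the red one below it are all assumptions, and the sample rows are made up.

```python
def diverging_bars_svg(rows, bar_width=12, gap=4, unit=2):
    """Build an SVG diverging bar graph as a string.

    rows is a list of (label, up, down) tuples: `up` counts are drawn above
    the baseline in blue, `down` counts below it in red (an assumed layout).
    """
    peak_up = max((up for _, up, _ in rows), default=0)
    peak_down = max((down for _, _, down in rows), default=0)
    baseline = peak_up * unit  # y coordinate of the zero line
    width = gap + len(rows) * (bar_width + gap)
    height = (peak_up + peak_down) * unit
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}" width="{width}" height="{height}">'
    ]
    for i, (label, up, down) in enumerate(rows):
        x = gap + i * (bar_width + gap)
        if up > 0:  # bar growing upwards from the baseline
            parts.append(
                f'<rect x="{x}" y="{baseline - up * unit}" width="{bar_width}" '
                f'height="{up * unit}" fill="blue"><title>{label}: {up}</title></rect>'
            )
        if down > 0:  # bar growing downwards from the baseline
            parts.append(
                f'<rect x="{x}" y="{baseline}" width="{bar_width}" '
                f'height="{down * unit}" fill="red"><title>{label}: {down}</title></rect>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up sample rows: (year, upper count, lower count)
svg = diverging_bars_svg([(2004, 10, 0), (2005, 3, 5)])
print(svg)
```

Since SVG's y axis points down, the upper bars are positioned by subtracting their height from the baseline, while the lower bars simply start at it.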

Of course, once the statistics have been collected, there are other ways to present the same data. For example, rather than looking at the individual language totals, we may be interested in the percentages: what fraction of all the articles published each year are in one language rather than the other? (This is the script to generate the next plot.)

Stacked bar graph showing the percentage of articles per language for each year. — Percentage of posts by language in the Wok over time (2004–2025)

This plot is more amenable to sparklining, at least if we agree to ignore that single page in Latin and the anomalies posed by the empty years. By putting in the sparkline only the native language statistics, it would look something like this: █ ████▅█▇▇▄▇ ▆▆▄▃▅▆▇.
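A sparkline of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not the code that produced the sparkline above: the function names are hypothetical, empty years are rendered as a space, years with posts always get at least the lowest block, and the sample counts are made up.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # the 8 Unicode block heights, lowest to highest


def share(native, total):
    """Percentage of native-language posts, or None for a year with no posts."""
    return None if total == 0 else 100.0 * native / total


def share_sparkline(rows):
    """rows is a list of (native_count, total_count) pairs, one per year."""
    out = []
    for native, total in rows:
        pct = share(native, total)
        if pct is None:
            out.append(" ")  # empty year: leave a gap
            continue
        level = int(pct * 8 / 100 + 0.5)  # round to the nearest of 8 levels
        level = max(1, min(8, level))     # assumption: posted years stay visible
        out.append(BLOCKS[level - 1])
    return "".join(out)


# Made-up counts: all native, empty year, half native, no native posts.
print(share_sparkline([(3, 3), (0, 0), (1, 2), (0, 4)]))
```

Keeping empty years as a gap rather than as a zero avoids mistaking “nothing published” for “nothing published in the native language”, at the cost of breaking up the line.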

This sparkline does give an “idea at a glance” about the distribution of the native language posts (which is what sparklines are for), although the gaps are grating, and the low vertical resolution allowed by the blocks kills off a lot of detail. In this sense, again, my sparklines end up violating some of Tufte's directives. Specifically, going by his notebook on the topic, the key takeaway is that sparklines should maximize data, minimize design. And at least in the vertical direction, the coarsening caused by the choice of using Unicode blocks is simply too much.

It looks like, if I really want to lean into the sparkline thing in a way that wouldn't appal Tufte, I'll have to look into the generation of inline SVGs, although I'll make sure to stay away from the JavaScript-based solutions which are linked to from his aforementioned notes.
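As a rough idea of what a script-free inline SVG sparkline could look like, here is a small Python sketch that emits static markup. Everything in it is an assumption made for illustration (the function name, the bar width, the 16-unit height, sizing the image to `1em` so it sits in a line of text, using `None` for missing data); it is not an existing tool, nor what the Wok currently does.

```python
def inline_svg_sparkline(values, bar_width=2, height=16):
    """Return a static inline SVG bar sparkline as a string.

    values may contain None for missing data, which leaves a gap.
    Bar heights are continuous, not quantized to 8 levels.
    """
    peak = max((v for v in values if v is not None), default=0)
    width = bar_width * len(values)
    rects = []
    for i, v in enumerate(values):
        if v is None or v <= 0 or peak <= 0:
            continue  # nothing to draw for gaps and zeros
        h = height * v / peak
        rects.append(
            f'<rect x="{i * bar_width}" y="{height - h:.2f}" '
            f'width="{bar_width}" height="{h:.2f}"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}" '
        f'style="height:1em;vertical-align:baseline" fill="currentColor" role="img">'
        + "".join(rects)
        + "</svg>"
    )


# Made-up values with one missing entry.
out = inline_svg_sparkline([1, None, 4])
print(out)
```

Because the heights are computed as plain numbers, the vertical resolution is limited only by the rendering, and `fill="currentColor"` lets the bars follow the color of the surrounding text.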