The unbearable lightness of text, part 2

OK, maybe sometimes I _do_ play favorites, but that's not really the reason why 7SEEDS got special treatment.

I have previously discussed how I try to keep the Wok “lightweight” by design, and how a big part of this is achieved thanks to a distinct lack of images and other so-called media files (audio/video in particular) —at least until recently.

Things first changed with my first comic, but not enough to convince me that it was worth adding screenshots to my game review. In fact, before today not even my review of 7SEEDS had images, despite my unbounded appreciation for the manga, bordering on the obsessive. (To be fair, the review itself focuses on everything but the art of the manga in question, so the absence of even a small sample wasn't that out of place.)

If you visit the 7SEEDS review now, though, you'll notice that it does feature images in its updated form. Yet curiously, these are not in the section where art is discussed (and yes, they are samples taken from the manga, so it might make sense to discuss them there). Even curiouser, those images were not added to the Wok for the review, but for a largely unrelated article (in Italian) derived from a recent Mastodon thread by yours truly.

In fact, those samples were originally intended only for the Mastodon thread: as I mentioned in the previous part of this series, I am considerably more liberal in adding images to my toots, although I still have a preference for plain text there too. When the time came to collect the thread and massage it into a more presentable article, I had to decide what to do about those images (aside from regenerating them from the originals, given the questionable way in which Mastodon processes images).

My approach to this was to recreate the images from the sources I had at hand, and evaluate their inclusion based on size, plus some additional considerations that I'll present below.

As it turns out, the original scanlation that I got the panels from had the images in PNG format. This allowed me to crop the panels I was interested in from the comfort of whatever image editor I had at hand (otherwise, I would have had to use something like jpegtran's lossless crop for JPEG, with a … less optimal UI, and a possible impact on the position and size of the crop region, since the crop has to follow DCT block boundaries for the operation not to introduce additional compression artifacts).
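To give an idea of how a lossless JPEG crop can shift the requested region, here is a minimal sketch in Python. It assumes the behaviour documented for jpegtran's `-crop` switch (the upper-left corner is moved up and/or left to the nearest block boundary, while the lower-right corner stays put) and a 16×16 block, which is typical of 4:2:0 chroma subsampling; other subsampling modes use different block sizes. The function name `align_crop` is made up for this illustration.

```python
def align_crop(x, y, width, height, mcu_w=16, mcu_h=16):
    """Snap the top-left corner of a crop region down to the block grid.

    The bottom-right corner is left where it was, so the region grows
    by however far the top-left corner had to move.
    mcu_w/mcu_h are assumptions: 16x16 is common for 4:2:0 JPEGs,
    8x8 for 4:4:4 or grayscale ones.
    """
    aligned_x = (x // mcu_w) * mcu_w
    aligned_y = (y // mcu_h) * mcu_h
    return (aligned_x, aligned_y,
            width + (x - aligned_x),
            height + (y - aligned_y))


# A 300x200 crop requested at (100, 50) ends up slightly larger and shifted.
print(align_crop(100, 50, 300, 200))  # (96, 48, 304, 202)
```

With a PNG source none of this applies: any pixel rectangle can be cut out without touching the remaining data.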

After appropriate crushing to minimize the PNG file size, and an additional pass through optipng to strip unnecessary chunks introduced by the image editor, I was left with a 234K file for the larger crop, and a 37K one for the smaller one. (For comparison, a reminder that my first image is 134K in PNG format, and 40K in JPEG XL.)

Fun fact: these images have, apparently, only 17 distinct colors. This is frustrating: had it been one single color fewer, the color palette could have been reduced from being indexed in 8 bits to being indexed in only 4 bits, packing two pixels per byte in uncompressed form, with likely a compression advantage.
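The 16-versus-17 threshold can be made concrete with a small sketch. It relies on the fact that indexed PNGs only allow bit depths of 1, 2, 4 and 8; the helper names (`palette_bit_depth`, `packed_row_bytes`) and the 1000-pixel row width are made up for the example.

```python
def palette_bit_depth(num_colors):
    """Smallest bit depth allowed for an indexed PNG with this many colors."""
    if num_colors < 1:
        raise ValueError("a palette needs at least one color")
    for depth in (1, 2, 4, 8):  # the only depths valid for indexed PNGs
        if num_colors <= 2 ** depth:
            return depth
    raise ValueError("indexed PNGs hold at most 256 colors")


def packed_row_bytes(width, depth):
    """Bytes per row of uncompressed pixel data (filter byte not counted)."""
    return (width * depth + 7) // 8


for colors in (16, 17):
    depth = palette_bit_depth(colors)
    print(colors, "colors:", depth, "bits per pixel,",
          packed_row_bytes(1000, depth), "bytes per 1000-pixel row")
```

With 16 colors a 1000-pixel row packs into 500 bytes; the seventeenth color doubles that to 1000 bytes before compression even starts.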

By the way, since these images use palettes, there's a possibility for improving compression by reordering the colors in the palette, although I'm not aware of any common-use free software program that actually does that. (If anybody does know of one, I wouldn't mind a heads up.)

Of course this would be more interesting to see in action on the larger crop, rather than the smaller one, given the respective file sizes, and of course there are a lot of other approaches that I could have taken, including more aggressive cropping (e.g. by including only Takashi's response on the second image, or Kiichi's question on the first one), or even overcoming my “loss aversion” and going with the removal of a single grey tone.

In fact, even adoption of a lossy image format could have made sense, since who knows how many manipulations the image has gone through before becoming the “source” for me, and even before that, considering the fact that the dithering that realises some of the halftones is more likely to be a “print and scan” artifact than the artist's intention. But —what's the best way to put it?— I am probably even less willing to take on any responsibility for the degradation of digital data than I am to consider uploading larger files to my website.

To make things even worse, the JXL versions of these images, while smaller than the PNGs, are not exceptionally so: about a 30% gain, compared to the massive 70% reduction for my parody comic. (To be completely fair, I didn't know about optipng when I did the latter, but now that I've tried it, it would only give me an extra 1%, so I won't actually update it: not worth the extra space in the repository.) Given my interest in promoting the new image file format, adding these images meant adding nearly 460K to the repository: 10% of the entire source tree!
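As a back-of-the-envelope check of that total, here is a small sketch. The PNG sizes (234K and 37K) are the ones quoted earlier; the JXL sizes are not, so they are estimated here from the "about 30%" gain, which is an assumption of this example rather than a measured figure. The function name `estimate_added_size` is likewise invented for the illustration.

```python
def estimate_added_size(png_sizes_k, jxl_gain):
    """Return (png_total, jxl_total, combined) in K, rounded to whole K.

    jxl_gain is the fraction saved by JXL relative to PNG; the 0.30
    used below is an estimate, not a measured value.
    """
    png_total = sum(png_sizes_k)
    jxl_total = png_total * (1 - jxl_gain)
    return png_total, round(jxl_total), round(png_total + jxl_total)


png_total, jxl_total, combined = estimate_added_size([234, 37], jxl_gain=0.30)
print("PNG:", png_total, "K; JXL (estimated):", jxl_total, "K; both:", combined, "K")

# If both formats together are about 10% of the source tree,
# the tree itself is roughly ten times that size.
print("implied source tree:", round(combined / 0.10), "K")
```

The estimate lands within a kilobyte or so of the 460K figure, and puts the whole source tree in the neighbourhood of 4.5M.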

The biggest argument in favor of allowing these images in was (unsurprisingly) the fact that they could be (re)used in my review of 7SEEDS, and to get an idea of how good an argument that is, consider that the review itself, in source form, weighs in at 72K, 10K more than the PNG and JXL versions of the second image together. Add in the 25K for the source of the article the images were intended for, and the size of the images, while still large, becomes less impressive.

(And I'm not even counting the size of this article, which I might not have written if I hadn't decided to add the new images. And yes, when making the decision I did consider that I would be writing it, even if I had no idea how big it would turn out to be: nearly an additional 15K, as things stand now.)

Interestingly, even at 234K the largest image in PNG format is not the largest file in the repository. This also counts as a point in favor, even if the only text file still larger than an image is a 50,000-word incomplete draft of a long-form fictitious prose narrative (“aspiring novel”, maybe?). And yes, it's in the repository, but no, I'm not ready to share a link to the draft just yet.

The second-largest text file, which now sits (in size) between the PNG and JXL versions of the images that are the topic of this article, is 232K (barely 2K less than the PNG monster). It is also an incomplete draft of a long-form fictitious prose narrative, currently standing at over 40,000 words, and it could easily pass the infamous PNG giant if I ever get back to work on it. And this is just Part #1 of a series I have planned, spanning at least four parts plus at least a prologue.

Now, I'm not going to deny that my passion for 7SEEDS might have biased my decision to include the images, and it's quite possible that every excuse I've come up with is just an attempt at rationalization.

On the one hand, if the passion for the manga were the prime mover, the review would have already given me an excuse to add the images a couple of years ago, especially considering that the long incomplete drafts that compete with the images in size had already been in the repository for years when I started drafting the review. On the other hand, maybe two years ago I wasn't yet ready to consider including images in the repository, and the necessity to self-host my parody comic had to manifest, if nothing else to allow me to reconsider my opinion on hosting raster images at a more personal level.

On a third hand, there's also the fact that for the review I would have picked different images, more representative of Tamura Yumi's art, which would have meant looking at larger hosting requirements, not unlike the ones for the Teslagrad screenshots[1], as the focus would have been on full-page, detailed scenes rather than just a couple of scattered panels.

And on the fourth hand, to be completely fair, those huge text files I happen to have in my repository are … a bit unusual.

To understand why, the simplest thing to do is to compare them with other incomplete long-form fictitious prose I've dabbled with over the years here on the Wok: although none of these works is complete, they have all been “serialized” in the fashion characteristic of web novels, with each chapter in its own file, and a typical length between 4K and 7K characters per chapter. The longest of these works builds up to over 280K, which would take second place in file size even counting the new images if it hadn't been scattered across 54 files.

So, the main reason why I have such massive text files in the repository is that … I haven't split them into chunks yet. Had I worked on these long-form fictitious prose narratives the same way I did with the others, they would have also been split into chunks (chapters), and from a quick check the average length of those chunks would have been more or less in the same range as the ones linked above. I would still have some “rather large” text files (the 7SEEDS review being probably the most prominent example), but nothing that could really compete with the larger image files.

It's even weirder when one considers that the reason why those[2] particular works of fiction have been kept in a single file (and have not been “published” yet, not even partially) isn't a change of mind regarding how to approach writing and sharing the content, but rather arguably silly “layout issues”: each for its own reason, they have special requirements in terms of the presentation of parts of the text, and I got stymied thinking of how to obtain this both at the markup level on the source side, and on the CSS side for the final output.

(This is actually one of the many reasons why I've been thinking of moving away from my current production setup, as I've hinted previously here and there, and one of the reasons why I'd like to move to a static site builder based on AsciiDoctor, which might make it easier to tackle these aspects.)

When I say I didn't change my mind about how to approach writing, I do mean it, even in the face of the OBTF/BATF movement that has been doing the rounds of the internet in the last decade for everything from to-do lists to note-taking, almost in protest against the enshittification of many useful tools. And I say this as a huge fan of text in general (even for things that aren't), but also as a strong believer in “the right tool for the right job”. And for writing, as I've discussed already, I have a preference for releasing smaller chunks at a time.

There are obvious downsides to the serial release, particularly if the whole story hasn't been finalized yet: you may get to a point where you need to revise some of the things you've written, to tighten things up or plug some continuity holes or because you may want to change the order in which things happen or are presented to the reader, and while it is relatively easy on the web (especially in a case such as mine, where one has full control of the medium), one wouldn't generally say that it is preferable.

(I don't follow many webnovels, but I do read a lot of webcomics, and not just page-episodic ones, and while I have come across the sporadic warning about past pages having been updated, I'm really amazed by how rarely this has happened. I know there's a lot of planning ahead, but still.)

So it's natural to accumulate several releases' worth of updates as unreleased drafts, and only beyond a certain point of maturity start to release them issue by issue while more drafting is done in advance. Without going further off-topic, the question is then: when it comes to a long-form fictitious prose narrative, is it better to have said unreleased drafts as a single file, or would it be better to have them already roughly partitioned into separate chapters? Or maybe something in-between?

I'll leave the discussion for another article more focused on this particular aspect (particularly in reference to my personal approach to writing). Here, I'll just remark that the single file approach has a peculiar side effect: as the individual parts that are mature enough start getting released, and thus moved from the “big file” to individual chapter files, the current “primacy of text” will decline, even if the total amount of text won't actually decrease, simply by virtue of the single files being dismembered. And I'm not sure I feel particularly happy about that. But it's a bridge we'll cross when we come to it.

[1] OK, not exactly: I'm not sure yet which pages I'd use for the review. The scanlation I've read is from high-quality scans, so the images are high-resolution, but only a third of them are in PNG format and the rest are JPEGs, so the typical file size seems to hover between 300K and 400K.
[2] And a few others, which however have only grown to more modest sizes, in the few tens of thousands of characters.