10 digits
The question
How many digits do you need, in base 10, to represent a given (binary) number?
A premise
The C++ standard defines a trait for numeric datatypes that
describes “the number of base-10 digits that can be represented by a
given type without change”: std::numeric_limits&lt;T&gt;::digits10.
What this means is that all numbers with at most that many digits in
base 10 will be representable in the given type. For example, 8-bit
integers can represent all numbers from 0 to 99, but not all numbers from
0 to 999, so their digits10 value will be 2.
For integer types, the value can be obtained by taking the number of bits (binary digits) used by the type, dividing by $\log_2 10$ (or multiplying by $\log_{10} 2$, which is the same thing), and taking the integer part of the result.
This works because with $b$ bits you can represent $2^b$
values, and with $d$ digits you can represent $10^d$
values, and the condition for digits10 is that $d$ should be such that
$10^d \le 2^b$. By taking the base-10 logarithm on both sides we get $d \le b \log_{10} 2$, and since $d$ must be an integer, we get the
formula $d = \lfloor b \log_{10} 2 \rfloor$.
(Technically, this is still a bit of a simplification, since actually the highest representable number with $b$ bits is $2^b - 1$, and that's still only for unsigned types; for signed ones things get more complicated, but that's beyond our scope here.)
The answer
What we want is in some sense the complement of digits10, since we want
to ensure that our number of (decimal) digits will be sufficient to represent all
numbers of the binary type. Following the same line of reasoning above,
we want $d$ such that $10^d \ge 2^b$, and thus, skipping a few passages,
$d = \lceil b \log_{10} 2 \rceil$, at least assuming unsigned integer
types.
We're looking for the simplest formula that gives us the desired result.
With C++, we could actually just use digits10 plus one, but we want
something independent, for example because we want to use this with C (or
any other language that doesn't have a digits10 equivalent).
The first thing we want to do is avoid the logarithm. We could compute its actual value, or at least a value with sufficient precision, but in fact we'll avoid doing that, and instead remember that $2^{10} = 1024$ is pretty close to $10^3$, which puts the logarithm in question in the ballpark of $3/10$, an approximation that is good enough for the first several powers of $2$.
With this knowledge, we can approximate $d = \lceil b \log_{10} 2 \rceil \approx \lceil 3b/10 \rceil$. In most programming languages integer division with positive operands returns the floor rather than the ceiling, but it can be turned into something that returns the ceiling by adding to the numerator one less than the denominator¹. So:

$d = \lfloor (3b + 9)/10 \rfloor$

is the formula we're looking for. In a language like C, where the size of types is given in bytes, that would become something like
#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*3+9)/10)
where we're assuming 8 bits per byte (adapt as needed if you're on an insane architecture).
Limitations
The C expression provided above isn't universal. It is better
than the even more aggressive approximation sizeof(type)*CHAR_BIT/3,
which for example fails for 8-bit types (gives 2 instead of 3) and
overestimates the result for 64-bit data types (gives 21 instead of 20),
but it's not universal.
It works for most standard signed data types, because the number of base-10 digits needed to represent them is almost always the same as for their unsigned equivalents, but for example it doesn't work for 64-bit data types (the signed ones need one less digit in this case).
Moreover, it actually starts breaking down for very large integers: the approximation $3/10$ slightly underestimates $\log_{10} 2 \approx 0.30103$, and the accumulated error starts becoming significant at 256 bits or higher: the formula predicts 77 digits, but 78 are actually needed.
We can improve on this by taking more digits in the approximation of the logarithm. For example
#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*301+999)/1000)
doesn't break down until 4096 bits, at which point it misses one digit again. On the other hand
#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*30103+99999)/100000)
can get us reasonably high (in fact, by a quick check it seems this formula should work correctly even for types with hundreds of thousands of bits, if not more). It also has a nice symmetry to it, even though I guess it could overflow on machines with smaller word sizes (but then again, you probably wouldn't need it there anyway).
¹ If $a, b$ are non-negative integers with $b > 0$, then $\lceil a/b \rceil = \lfloor (a + b - 1)/b \rfloor$: (1) if $a$ is a multiple of $b$, then adding $b - 1$ doesn't go to the next multiple, and thus on both sides we have $a/b$ (which is an integer) and (2) if $a$ is not a multiple of $b$, adding $b - 1$ will overtake exactly one multiple of $b$. More formally, we can write $a = qb + r$ where $q, r$ are non-negative integers, and $0 \le r < b$ ($r = 0$ if $a$ is a multiple, and $r > 0$ otherwise). Define $s = \lceil r/b \rceil$, i.e. $s = 0$ if $r = 0$ and $s = 1$ otherwise. Then $\lceil a/b \rceil = q + s$ and $\lfloor (a + b - 1)/b \rfloor = q + \lfloor (r + b - 1)/b \rfloor = q + s$. ↩