## The question

How many digits do you need, in base 10, to represent a given (binary) number?

## A premise

The C++ standard defines a *trait* for numerical datatypes that
describes “the number of base-10 digits that can be represented by a
given type without change”: `std::numeric_limits::digits10`

.

What this means is that all numbers with at most that many digits in
base 10 will be representable in the given type. For example, 8-bit
integers can represent all numbers from 0 to 99, but not all numbers from
0 to 999, so their `digit10`

value will be 2.

For integer types, the value can be obtained by taking the number of bits (binary digits) used by the type, dividing by ${log}_{2}\left(10\right)$ (or multiplying by ${log}_{10}\left(2\right)$, which is the same thing), and taking the integer part of the results.

This works because with $n$ bits you can represent ${2}^{n}$
values, and with $d$ digits you can represent ${10}^{d}$
values, and the condition for `digit10`

is that $d$ should be such that
${10}^{d}\le {2}^{n}$. By taking the logarithm on both sides we get $d\le {log}_{10}\left({2}^{n}\right)=n{log}_{10}\left(2\right)$, and since $d$ must be an integer, we get the
formula $d=\lfloor n{log}_{10}\left(2\right)\rfloor $.

(Technically, this is still a bit of a simplification, since actually the highest representable number with $n$ bits is ${2}^{n}-1$, and that's still only for unsigned types; for signed one things get more complicated, but that's beyond our scope here.)

## The answer

What we want is in some sense the complement of `digit10`

, since we want
to ensure that our number of (decimal) digits will be sufficient to represent all
numbers of the *binary* type. Following the same line of reasoning above,
we want $d$ such that ${2}^{n}\le {10}^{d}$, and thus, skipping a few passages,
$d=\lceil n{log}_{10}\left(2\right)\rceil $, at least assuming unsigned integer
types.

We're looking for the simplest formula that gives us the given result.
With C++, we could actually just use `digits10`

plus one, but we want
something independent, for example because we want to use this with C (or
any other language that doesn't have a `digits10`

equivalent).

The first thing we want to do is avoid the logarithm. We could compute the actual value, or at least a value with sufficient precision, but in fact we'll avoid doing that, and instead remember that ${2}^{10}$ is pretty close to ${10}^{3}$, which puts the logarithm in question in the $\frac{3}{10}$ ballpark, an approximation that is good enough for the first several powers of 2^10.

With this knowledge, we can approximate $d\cong \lceil n\cdot \frac{3}{10}\rceil $.
In most programming languages integer division with positive operands
returns the *floor* rather than the ceiling, but it can be turned into
something that returns the ceiling by adding to the numerator one less
than the denominator1. So:

$$d=\lfloor \frac{n\cdot 3+9}{10}\rfloor $$

is the formula we're looking for. In a language like C, where the size of types is given in bytes, that would be come something like

`#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*3+9)/10)`

where we're assuming 8 bits per byte (adapt as needed if you're on an insane architecture).

## Limitations

The C expression provided above isn't universal. It is better
than the even more aggressive approximation `sizeof(type)*CHAR_BIT/3`

,
which for example fails for 8-bit bytes (gives 2 instead of 3) and
overestimates the result for 64-bit data types (gives 21 instead of 20),
but it's not universal.

It works for *most* standard signed data types, because the number of
base-10 digits needed to represent them is *almost always* the same as
their unsigned equivalents, but for example it doesn't work for 64-bit
data types (the signed ones need *one less digit* in this case).

Moreover, it actually starts breaking down for very large integers, because the $\frac{3}{10}$ approximation commits an error of about 10% which starts becoming significant at 256 bits or higher: the formula predicts 77 digits, but 78 are actually needed.

We can expand this by taking more digits to approximate the logarith. For example

`#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*301+999)/1000)`

doesn't break down until 4096 bits, at which point it misses one digit again. On the other hand

`#define PRINT_SIZE(type) ((sizeof(type)*CHAR_BIT*30103+99999)/100000)`

can get us *reasonably* high (in fact, by a quick check it seems this
formula should work correctly even for types with ${2}^{{2}^{16}}={2}^{65536}$
bits, if not more). It also has a nice symmetry to it, even though I
guess it would overflow on machines with smaller word sizes (but then
again, you probably wouldn't need it there anyway).

If $a,b$ are non-negative integers with $b>0,then\$\lfloor \frac{a+b-1}{b}\rfloor =\lceil \frac{a}{b}\rceil $: (1) if $a$ is a multiple of $b$, then adding $b-1$ doesn't go to the next multiple, and thus on both sides we have $\frac{a}{b}$ (which is an integer) and (2) if $a$ is

*not*a multiple of $b$, adding $b-1$ will overtake*exactly*one multiple of $b$. More formally, we can write $a=kb+c$ where $k,c$ are non-negative integers, and $c<b$ ($c=0$ if $a$ is a multiple, and $c>0$ otherwise). Define $s=sign\left(c\right)$, i.e. $s=0$ if $c=0$ and $s=1$ otherwise. Then $\lfloor \frac{a+b-1}{b}\rfloor =\lfloor \frac{kb+c+b-1}{b}\rfloor =\lfloor (k+1)+\frac{c-1}{b}\rfloor =k+1-(1-s)=k+s$ and $\lceil \frac{a}{b}\rceil =\lceil \frac{kb+c}{b}\rceil =\lceil k+\frac{c}{b}\rceil =k+s$. ↩