Printing floating points
Berni44
someone at somemail.com
Tue Jan 26 16:25:54 UTC 2021
Being the author of the PR mentioned here several times, I feel
somewhat obliged to write something too.
On Saturday, 23 January 2021 at 00:13:16 UTC, deadalnix wrote:
> Related paper: https://dl.acm.org/doi/pdf/10.1145/3360595
That's great. I'll have a deeper look into it, whether there is
something, that can be used for our implementation of printf.
From first glance: It overcomes some of the speed problems I
encountered and where I currently am trying to find something
better. The drawback of RYU printf is the large amount of memory
needed for the tables involved. The paper states 104KB for
doubles (can be compressed on the expense of slowing everthing
down). For reals it will be even more. We should make sure, that
we do not blow up the executables more than needed. We will
probably eventually have to decide between speed and code size -
or we'll have to find some sort of a compromise.
On Monday, 25 January 2021 at 04:27:49 UTC, Walter Bright wrote:
> One way is to randomly generate floating point bit patterns and
> compare the output with that of print formatting on C.
The last commit of my PR adds a test, that is doing exactly this.
I would be happy, if people could run that test and report
differences. (There will be some, because the snprintf
implementations are known to contain bugs - see my comments above
the test.)
To do so, check out the PR, uncomment the line "version =
printFloatTest;" and run the unittests of std.format. The test
lasts for 30 minutes (you can configure this by changing the
value of "duration"). After that it prints some summary. (Feel
free to change the test to make it meet your needs.)
On Monday, 25 January 2021 at 09:17:25 UTC, Ola Fosheim Grøstad
wrote:
> You only need to test the boundary cases, so one can do this,
> no problem.
I fear, with this approach you'll have a hard time: There are so
many boundary cases, it's easy to miss some. - You need to test
all combinations of five flags (-, #, +, space, 0), width,
precision, rounding mode, different algorithms depending on the
size of the exponent, nans, infs, subnormals, zero,
float/double/real(s) and probably more.
On Monday, 25 January 2021 at 21:40:01 UTC, Bruce Carneal wrote:
> If I'm wrong, if it really is as you say "no problem", then
> the RYU author sure wasted a lot of time on the proof in his
> PLDI 2018 paper.
There is a subtle, but important difference between proving
correctness for RYU and printf-implementations: For RYU, the
theorem to prove is simple to state: Find one shortest
representation of a number, that works in round robin.
But what would be the equivalent theorem for printf? Produce the
same result, that snprintf produces? But which implementation of
snprintf? And what exactly does snprintf produce? Can you write
that down? And wouldn't it make more sense to have an
implementation that fixes the known bugs in snprintf instead of
copying them? (For example, I read about an implementation where
snprintf("%.2f",double.inf) produces "#J" instead of the expected
"inf". Not only adding "#", but also the "I" of inf has been
rounded up to "J"... *lol*, or printing different results for
identical numbers, just because one is float and the other is
double.)
I'm not completely sure, I have not found the time to read the
paper about RYU printf carefully, so I might be wrong, but from
first glance I think they only prove, that the algorithm produces
the correct digits. It ignores all the flags and stuff which make
up about 90% of the PR.
The heart of the PR (which could be replaced by RYU printf) is
this (version for numbers, that have no fractional part - see
below for fractions):
> int msu = 0;
> while (msu < count - 1 || mybig[$-1] != 0)
> {
> ulong mod = 0;
> foreach (i;msu .. count)
> {
> mybig[i] |= mod << 60; // (a)
> mod = mybig[i] % 10; // (b)
> mybig[i] /= 10; // (c)
> }
> if (mybig[msu] == 0)
> ++msu;
>
> buffer[--left] = cast(byte) ('0'+mod);
> }
It's basically: Divide by 10 and print the reminder until you
reach zero (printing from right to left). The algorithm is a
little bit more complicated, because the numbers involved are up
to 1000 digits long and cannot be saved anymore in a single ulong.
Going a little bit more into details here: The number which
should be print is given in mybig[] as an integer (similar to the
implementation of BigInt). The outer loop produces one decimal
digit every time it is executed - the digit is saved in buffer
(last line).
The inner loop is a division by 10 for big numbers. It is quite
similar to a long division, like we learned it in school (here
division by 3):
736 : 3 = 245 reminder 1.
6
-
13
12
--
16
15
--
1
First, we divide (7 / 3 = 2 reminder 1) and add another digit to
the reminder (1 => 13) and so on. The same is done in the
algorithm: (a) adds another digit, (b) calculates the reminder
and (c) divides by 10. (If you worry about the reordering of the
steps: This has the benefit, that the loop is without branch and
therefore faster; in the first step a zero is added, which
doesn't harm.). Finally, the last reminder is the next digit.
(And the if below the inner loop is for speed reasons.)
As for fractional digits: Here a similar algorithm is used, but
this time the number is multiplied each round by 10 and the
overflow leads to the next digit.
Don't know, if this will comfort you or not, but: This algorithm
*is* already part of Phobos since 7th of Februar 2020 and hence
in the stable version since v2.091. It's already used for
printing floats with %e. The PR is about adding it for %f too.
More information about the Digitalmars-d
mailing list