
Re: [Mingw-users] msvcrt printf bug




Hello List (and Alexandru)!

I see that I'm coming late to the party, but I would like to chime in.

On Sat, Jan 14, 2017 at 9:02 AM,  <tei@xxxxxxxxx> wrote:
>
> Hello,
> I encountered a loss of precision printing a float with C printf.
> ...

There are a couple of common misconceptions running through
this thread that I would like to correct.

Short story:  Floating-point numbers are well-defined, precise,
accurate, exact mathematical objects.  Floating-point arithmetic
(when held to the standard of the "laws of arithmetic") is not
always exact.

Unfortunately, when you conflate these two issues and conclude
that floating-point numbers are these mysterious, fuzzy, squishy
things that can never under any circumstances be exact, you create
the kind of floating-point FUD that runs through this thread, this
email list, and the internet in general.

(This kind of FUD shows up in a number of forms.  You'll see,
for example:  "Never test for equality between two floating-point
numbers."  "Never test a floating-point number for equality with
zero."  "Floating-point numbers are fuzzy and inexact, so equality
of floating-point numbers is meaningless."  When you see this on
the internet (and you will), don't believe it!  Now, it's true that
when you have two floating-point numbers that are the results
of two independent chains of floating-point calculations it
generally won't make sense to test for exact equality, but
there are many cases where it does.)
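
To make that last point concrete, here are a couple of examples of my
own (not from the thread) where exact comparison of floating-point
values is perfectly sound, assuming IEEE 754 doubles:

    #include <stdio.h>

    int main(void)
    {
        /* 0.5, 0.25, and 0.75 are all exactly representable, and IEEE 754
           addition is correctly rounded, so a sum whose exact value is
           representable is computed exactly. */
        double a = 0.5 + 0.25;
        printf("%d\n", a == 0.75);   /* prints 1 */

        /* Testing against zero before dividing is a meaningful, legitimate
           exact comparison -- hardly "meaningless." */
        double denom = 0.0;
        if (denom != 0.0)
            printf("%f\n", 1.0 / denom);
        else
            printf("denominator is exactly zero\n");
        return 0;
    }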

Worse yet (as is typical with FUD), when people call out this
FUD (on the internet, not just this list), they get attacked
with scorn and ad hominem arguments.  So, to Alexandru Tei (the
original poster) and Emanuel Falkenauer, stand your ground, you
are right!  (And don't let the personal attacks get you down.)

Let me start with an analogy:

Consider the number 157/50.  It is a well-defined, precise, accurate,
exact mathematical object.  It's a rational number, which makes it
also a real number (but it's not, for example, an integer).  There
is nothing fuzzy or imprecise or inaccurate about it.

It has as its decimal representation 3.14 (and is precisely equal to
3.14).  Now if we use it as an approximation to the real number pi,
we find that it is an inexact value for pi -- it is only an approximation.

But the fact that 3.14 is not exactly equal to pi doesn't make 3.14
somehow squishy or inaccurate.  3.14 is exactly equal to 314 divided
by 100 and is exactly equal to the average of 3.13 and 3.15.  It's
just not exactly equal to pi (among many other things).

Now it is true that the vast majority of floating-point numbers
running around in our computers are the results of performing
floating-point arithmetic, and the large majority of these numbers
are inexact approximations to the "correct" values (where by correct
I mean the real-number results that would be obtained by performing
real arithmetic on the floating-point operands).  And anybody
performing substantive numerical calculations on a computer needs
to understand this, and should be tutored in it if they don't.
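
For readers newer to this, the canonical illustration -- my example,
not Alexandru's -- is the following, assuming IEEE 754 doubles and a
correctly rounding printf():

    #include <stdio.h>

    int main(void)
    {
        /* Each of x, y, and z is an exact, well-defined double; none of
           them equals the decimal literal written in the source, and the
           sum of the first two is not the third. */
        double x = 0.1, y = 0.2, z = 0.3;
        printf("%d\n", x + y == z);    /* prints 0 */
        printf("%.17g\n", x + y);      /* 0.30000000000000004 */
        printf("%.17g\n", z);          /* 0.29999999999999999 */
        return 0;
    }

The "surprise" here is entirely about the rounding of the decimal
literals and of the addition, not about any fuzziness in the numbers
themselves.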

(By the way, Alexandru asked nothing about floating-point calculations
in his original post, and everything he has said in this thread indicates
that he does understand how floating-point calculations work, so I have no
reason to think that he needs to be tutored in the fact that floating-point
arithmetic can be inexact.)

Alexandru asked about printing out floating-point numbers.  People
have called this the "decimal" or "ASCII" representation of a
floating-point number.  I will stick to calling it the "decimal
representation," by which I will mean a decimal fraction, of
potentially arbitrary precision, that approximates a given
floating-point number.

In keeping with my point that floating-point numbers are well-defined,
precise mathematical values, it is the case that every floating-point
number is exactly equal to a single, specific decimal fraction.

Alexandru complains that msvcrt doesn't use as its decimal representation
of a floating-point number the decimal fraction that is exactly equal to it.
This is a perfectly legitimate complaint.
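
For concreteness, here is a small example of my own.  The double nearest
0.1 is exactly

    0.1000000000000000055511151231257827021181583404541015625

and a printf() that emits the exact decimal representation (glibc's, by
all accounts in this thread) will show that tail when asked for enough
digits, whereas msvcrt reportedly pads with zeros once it runs out of
digits it is willing to compute:

    #include <stdio.h>

    int main(void)
    {
        /* With an exact conversion, every one of these 55 digits is
           mathematically meaningful. */
        printf("%.55f\n", 0.1);
        return 0;
    }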

Now an implementation can use as its decimal representation of floating-point
numbers whatever it wants -- it's a quality-of-implementation issue.
The implementation could always print out floating-point numbers with
two significant decimal digits.  Or it could use ten significant digits,
and add three to the last significant digit just for fun.  But there is
a preferred, canonically distinguished decimal representation for floating-point
numbers -- use the unique decimal fraction (or ASCII string or whatever you
want to call it) that is exactly equal to the floating-point number.  "Exactly
equal" -- what could get more canonical than that?

In fairness, I don't consider this to be a particularly important
quality-of-implementation issue.  I do prefer that my implementations
use the canonically distinguished decimal representation, but I don't care
enough to have retrofitted mingw to do so (or to set the _XOPEN_SOURCE
flag).  But it isn't hard to get this right (i.e., to use the canonically
distinguished representation), and apparently both glibc and Embarcadero/Borland
have done so.

I would argue that the glibc/Borland implementation is clearly better in
this regard than that of msvcrt, and that there is no basis on which one
could argue that the msvcrt implementation is better.  (Again, in fairness,
Microsoft probably felt that it was a better use of a crack engineer's
time to more smoothly animate a tool-tip fade-out than to implement the
better decimal representation, and from a business perspective, they
were probably right.  But it's not that hard, so they could have done
both.)

On Mon, Jan 16, 2017 at 3:17 PM, Keith Marshall <km@xxxxxxxxxxxxxxxxx> wrote:
> On 16/01/17 16:51, Earnie wrote:
>> ...
> Regardless, it is a bug to emit more significant digits than the
> underlying data format is capable of representing ... a bug by
> which both glibc and our implementation are, sadly, afflicted;
> that the OP attempts to attribute any significance whatsoever to
> those superfluous digits is indicative of an all too common gap
> in knowledge ... garbage is garbage, whatever form it may take.

Here Keith claims that the glibc/Borland implementation is actually
a bug.  This is the kind of FUD we need to defend against.

I create a floating-point number however I choose, perhaps by
twiddling bits.  (And perhaps not by performing a floating-point
operation.)  The number that I have created is perfectly well-defined
and precise, and it is not a bug to be able to print out the decimal
representation to which it is exactly equal.  An implementation that
lets me print out the exactly correct decimal representation is better
than an implementation that does not.
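
A minimal sketch of what I mean, with the bit pattern chosen by hand (it
happens to be the double nearest 0.1, but any finite pattern would do);
the %a conversion and the long precision assume a C99-conforming printf,
such as glibc's or MinGW's own ANSI stdio:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        /* Build a double by twiddling bits -- no floating-point operation,
           and hence no rounding, is involved. */
        uint64_t bits = UINT64_C(0x3FB999999999999A);
        double d;
        memcpy(&d, &bits, sizeof d);

        /* The value is perfectly well defined; with an implementation that
           emits the exact decimal representation, every digit printed below
           is meaningful, not garbage. */
        printf("%a\n", d);       /* hex-float (C99): exact by construction */
        printf("%.60f\n", d);    /* exact decimal tail, e.g. with glibc */
        return 0;
    }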

The floating-point number that I created may well have a well-defined,
precise -- and even useful -- mathematical meaning, Keith's assertion
that "garbage is garbage," notwithstanding.

To reiterate this point ...

On Wed, Jan 18, 2017 at 4:39 PM, Keith Marshall <khall@xxxxxxxxxxx> wrote:
> ...
> On 18/01/17 10:00, tei@xxxxxxxxx wrote:
>> Emanuel, thank you very much for stepping in. I am extremely happy
>> that you found my code useful.
>
> Great that he finds it useful; depressing that neither of you cares
> in the slightest about accuracy; rather, you are both chasing the
> grail of "consistent inaccuracy".

Representing a floating-point number with the decimal fraction
(or ASCII string or whatever you want to call it) that is exactly,
mathematically equal to that floating-point number is, quite
simply, accuracy, rather than inaccuracy.

Granted, there are times when it may not be important or useful
to do so, but there are times when it is.

> ...
>> I will use cygwin when I need a more accurate printf.
> ...
> Yes, I deliberately said "consistently inaccurate"; see, cygwin's
> printf() is ABSOLUTELY NOT more accurate than MinGW's, (or even
> Microsoft's, probably, for that matter).  You keep stating these
> (sadly all too widely accepted) myths:

Alexandru is right here, and is stating truths.  The printf() that
emits the decimal fraction that is exactly, mathematically equal
to the floating-point number being printed is in a very legitimate
and substantive sense more accurate than the one that does not.

>> Every valid floating point representation that is not NaN or inf
>> corresponds to an exact, non recurring fraction representation in
>> decimal.
>
> In the general case, this is utter and absolute nonsense!

On the contrary, Alexandru is completely correct here.

(Note:  "utter and absolute nonsense"  <--  FUD alert!)

Alexandru's statement is sensible, relevant, and mathematically
completely correct.
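
To spell out the reasoning (my sketch of the argument, not Alexandru's):
every finite binary floating-point number is an integer times a power of
two, say m * 2^(-k) with k >= 0 (or simply an integer, when the exponent
is non-negative).  Since 2^(-k) = 5^k / 10^k, we have

    m * 2^(-k)  =  (m * 5^k) / 10^k,

an integer over a power of ten, i.e., a terminating, non-recurring
decimal fraction.  For example, the float nearest 0.1 is
13421773 * 2^(-27), which is exactly 0.100000001490116119384765625.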

>> There is no reason why printf shouldn't print that exact
>> representation when needed, as the glibc printf does.

Absolutely correct, Alexandru.  If I or Alexandru or Emanuel wants
the exact representation it's a plus that the implementation provides
it for us.

>
> Pragmatically, there is every reason.  For a binary representation
> with N binary digits of precision, the equivalent REPRESENTABLE
> decimal precision is limited to a MAXIMUM of N * log10(2) decimal
> digits;

Again, you conflate the inaccuracy of some floating-point calculations
with individual floating-point numbers themselves.  Individual
floating-point numbers have mathematically well-defined, precise,
accurate values (and some of us want to print those values out).
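
To put numbers on this (my figures, for IEEE 754 binary64): 53 * log10(2)
is about 15.95, and 17 significant decimal digits are indeed always enough
to identify a double uniquely.  But that is a statement about how many
digits are needed to distinguish one double from its neighbors; it is not
a cap on the length of a double's exact decimal expansion, which for some
values runs to hundreds of significant digits -- every one of them exact.
The N * log10(2) bound limits the information content, not the accuracy
of the longer, exactly-equal representation.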

Let me repeat this point in the context of a comment of Peter's:

On Thu, Jan 19, 2017 at 6:19 AM, Peter Rockett <sprocket@xxxxxxxxxxxxx> wrote:
> On 19/01/17 08:21, tei@xxxxxxxxx wrote:
> ...
> I suspect the OP's conceptual problem lies in viewing every float in
> splendid isolation rather than as part of a computational system.

On the contrary, the conceptual problem underlying the FUD in this
thread is conflation of the properties of the overall computational
system with the individual floating-point numbers, and attributing
to the individual floating-point numbers, which are well-defined and
exact, the inexactness of some floating-point operations.

Floating-point numbers make perfect sense and are perfectly well-defined
"in splendid isolation" and to assume that all floating-point numbers of
legitimate interest are the results of inexact floating-point computations
is simply wrong.

> ...
> Or another take: If you plot possible floating point representations on a
> real number line, you will have gaps between the points. The OP is trying
> print out numbers that fall in the gaps!

When I read Alexandru's original post, it appears to me that he is trying
to print out individual, specific floating-point numbers.  That's his use
case.  I see nothing to suggest that he is trying to print out values in
the gaps.  (Alexandru clearly knows that floating-point numbers are discrete
points on the real number line with gaps between them.)
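
The gaps are real, of course, but each representable point is an exact
value.  A small illustration of my own, assuming IEEE 754 doubles:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* nextafter() steps from one representable double to its neighbor;
           nothing in the gap between them is representable, but both
           endpoints are exact, well-defined numbers. */
        double x = 1.0;
        double y = nextafter(x, 2.0);    /* smallest double above 1.0 */
        printf("%.17g\n", x);            /* 1 */
        printf("%.17g\n", y);            /* 1.0000000000000002 */
        printf("%.17g\n", y - x);        /* 2.2204460492503131e-16 */
        return 0;
    }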

On Sun, Jan 15, 2017 at 10:08 PM, KHMan <cainherb@xxxxxxxx> wrote:
> On 1/16/2017 8:56 AM, John Brown wrote:
>> ...
> I do not think there are canonical conversion algorithms that must
> always be upheld, so I did not have an expectation that glibc must
> be canonical.

There is a canonically distinguished conversion algorithm -- it's the
one that produces the decimal representation that is mathematically
equal to the floating-point number.  To repeat myself: Mathematical
equality, what's more canonical than that?

But, of course, this algorithm does not need to be upheld.  I am quite
sure that it is not required by either the C or C++ standard, and I
am pretty sure that IEEE 754 is silent on this matter.  (I also don't
think that this issue is that important.  But it is legitimate, and
an implementation that does uphold this conversion algorithm is a
better implementation.)
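
For the curious, the first step of such a conversion is easy to sketch
(my sketch, not a description of any particular library's internals):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* Recover the integer significand m and exponent E with x == m * 2^E.
           When E < 0, x is exactly (m * 5^(-E)) / 10^(-E), so producing the
           exact decimal representation is "just" big-integer arithmetic from
           here -- which is, as I understand it, essentially what glibc does. */
        double x = 0.1;
        int e;
        double f = frexp(x, &e);                 /* x == f * 2^e, 0.5 <= f < 1 */
        long long m = (long long)ldexp(f, 53);   /* 53-bit integer significand */
        printf("x == %lld * 2^%d\n", m, e - 53);
        return 0;
    }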

> The glibc result is one data point, msvcrt is also one data point.
> He claims to have his own float to string, but knowing digits of
> precision limitations and the platform difference, why is he so
> strident in knocking msvcrt? Curious. I won't score that, so we
> are left with two data points running what are probably
> non-identical algorithms.

But it's more than just data points.  We have a canonical representation.
From this thread, we have three data points -- msvcrt, glibc, and
Borland (plus Alexandru's roll-your-own) -- and (apparently, as I
haven't tested them myself) glibc and Borland (and Alexandru's)
produce the canonical representation, while msvcrt doesn't, so
msvcrt is not canonical and is also the odd man out.

> ...
> For that expectation we pretty much need everyone to be using the same
> conversion algorithm.

Yes, and we probably won't have everyone using the same conversion
algorithm (msvcrt being the case in point).  Well, that's what standards
are for, and some things don't get standardized.  But if everyone were to use
the same algorithm (for example, the canonical decimal representation),
then this whole thread would be much simpler, and life would be easier
for Emanuel.

There are some side comments I would like to make:

Two distinct, but related issues have been discussed.  The first is
whether printf() should print out the exact decimal representation
of a floating-point number (Alexandru), and the second is whether
different implementations should print out the same representation
(Emanuel).  Both are desirable goals, and if you get the first (for
all implementations), you get the second.

My preference, of course, would be to have all implementations print
out the exact representation (when asked to).  But you could, say,
have printf() print out the (specific-floating-point-number-dependent)
minimum number of digits for which you get "round-trip consistency"
(i.e., floating-point number --> printf() --> ASCII --> scanf() -->
back to the same floating-point number).  That would be reasonable,
and would solve Emanuel's problem.  (Or you could print out the minimum
number of "consistency" digits, and swap the last two digits just for fun.
That would be less reasonable, but would also solve Emanuel's problem.)
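
A quick sketch of that round-trip idea (mine, with made-up buffer sizes),
assuming a printf()/strtod() pair that rounds correctly, as C and IEEE 754
recommend -- 17 significant digits always suffice for a double, even if
they are not the minimal count:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        double x = 1.0 / 3.0;
        char buf[64];

        /* Print with 17 significant digits, read it back, and check that
           we recover the identical double. */
        sprintf(buf, "%.17g", x);
        double y = strtod(buf, NULL);
        printf("%s  round-trips: %d\n", buf, x == y);   /* expect 1 */
        return 0;
    }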

My point is that a canonically distinguished representation exists,
so, even if you only care about the second issue, it's easier to
get various implementations to hew to that canonical representation,
rather than to some well-defined, but semi-arbitrary representation
that I (or someone else) might make up.

In retort to Keith's claim that such a standard across implementations
(even my "swap the last two digits" standard) would be chasing "consistent
inaccuracy" (of course, the canonical representation would be "consistent
accuracy"), Emanuel has presented a perfectly logical and valid use case
for this.  Sure, there are other ways Emanuel could achieve his goal of
cross-checking the output of different builds, but this is a good one.

More importantly, all of us (including Emanuel) agree that he has no
right to expect the floating-point results of his different builds to
be exactly the same.  (Nothing in the C or C++ standards, nor in
IEEE 754, requires this.)  However, he enjoys the happy accident that
the floating-point results do agree, so it's unfortunate that printf()
from mingw/msvcrt and Borland print out different values, and that he
therefore has to go through additional work to cross-check his results.

That he can use Emanuel's code or set the _XOPEN_SOURCE flag to
resolve this issue is a good thing, but the fact that he has to take this
extra step is a minor negative.

Last, and quite tangential, Kein-Hong took a dig at Intel and the 8087:

On Fri, Jan 20, 2017 at 7:54 PM, KHMan <herbnine@xxxxxxxx> wrote:
> On 1/21/2017 6:18 AM, mingw@xxxxxxxxxxxxx wrote:
> ...
> AFAIK it is only with 8087 registers -- just about the only
> company who did this was Intel. Didn't really worked out,

Actually, it worked out quite well for some important use cases,
and kudos to Intel for doing this.

Often in numerical analysis you perform linear algebra on large
systems.  Often with large systems round-off error accumulates
excessively, and when the systems are ill-conditioned, the round-off
error is further amplified.

Often, as part of the linear-algebra algorithm, you compute a sum
of products, that is, the inner product of two vectors.  It turns
out (and, if you're a numerical analyst, you can prove, given conditions
on the linear systems involved), that you do not need to perform the
entire calculation with higher precision to get dramatically more
accurate results -- you can get most of the benefit just using higher
precision to compute the inner products.  The inner-product computation
(a chain of multiply-accumulates) fits trivially in the 8087 floating-point
registers (without register spill), and the use of the 8087's 80-bit
extended-precision just on these inner-product computations yields
dramatically more accurate results in many cases.
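
A minimal sketch of the technique (mine, not lifted from any particular
library), assuming long double maps to the x87 80-bit format, as it does
with GCC on x86, MinGW included:

    #include <stddef.h>

    /* Accumulate the inner product in extended precision and round to
       double only once, at the end.  The partial sums carry 64 significand
       bits instead of 53, which is where most of the accuracy benefit
       described above comes from. */
    double dot(const double *a, const double *b, size_t n)
    {
        long double acc = 0.0L;
        for (size_t i = 0; i < n; ++i)
            acc += (long double)a[i] * b[i];
        return (double)acc;
    }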

There are various engineering considerations for not using extended-precision
registers (some legitimate, some less so), but Intel, for whatever reason,
decided that they wanted to do floating-point right, they hired Kahan to
help them, and we're all the better for it.

Look, I understand the frustration felt by many commenters here, particularly
that expressed by Keith and Kein-Hong.  Stack Overflow and forums and bulletin
boards and chat rooms (and even classrooms, where they use things like, you
know, "blackboards" and "chalk") are filled with naive confusion about
floating-point arithmetic, with questions of the sort "I did x = y / z, and
w = x * z, and y and w don't test equal.  How can this be?  Mercy!  It must
be a compiler bug!"  And now you have to tutor another generation of novice
programmers in how floating-point arithmetic works.  It gets old.

But it's counter-productive to tutor them with misinformation.  Floating-point
numbers are what they are and are mathematically perfectly well defined.
Floating-point arithmetic is inexact when understood as an approximation
to real-number arithmetic.  Let's tutor the next generation in what actually
happens: Two perfectly well-defined floating-point operands go into a
floating-point operation, and out comes a perfectly well-defined (if you're
using a well-defined standard such as IEEE 754) floating-point result that,
in general, is not equal to the real-number result you would have obtained
if you had used real-number arithmetic on the floating-point operands.

But, to repeat Emanuel's comment, "it's not as if a FPU had a Schrödinger cat
embedded!"  The floating-point operation doesn't sometimes give you one
(inherently fuzzy) floating-point result and sometimes another (inherently
fuzzy) result.  It gives you a single, consistent, perfectly well-defined
(if you're using a well-defined standard) result that makes perfectly good
mathematical sense.  Furthermore, it's also not as if the data bus connecting
memory to the FPU has an embedded Schrödinger cat, or that these (squishy,
fuzzy, inexact) floating-point numbers get fuzzed up somehow by cat hair as
they travel around inside our computers.

The way floating-point numbers -- and floating-point arithmetic -- really work
is completely precise and mathematically well defined, even if it's subtle
and complex -- and different from real-number arithmetic.  And how it really
works -- not FUD -- is what we need to help the next generation of people
doing substantive numerical calculations learn.


Happy Floating-Point Hacking!


K. Frank
