[Issue 17861] UTF Decode fails with exception

d-bugmail at puremagic.com
Tue Oct 3 22:34:56 UTC 2017


https://issues.dlang.org/show_bug.cgi?id=17861

--- Comment #12 from Jonathan M Davis <issues.dlang at jmdavisProg.com> ---
(In reply to Etienne from comment #11)
> If the current idea is to not fix the bugs due to possible breakage, why
> have a bug tracker for druntime in the first place?

The current behavior is not a bug. The code is functioning exactly as designed.
That design is arguably a bad design, and many of us would like to change it,
but changing it would break existing code, so it is unlikely that it will be
changed. There simply isn't a good deprecation path that would allow us to go
from one behavior to the other - certainly no one has come up with one thus
far.

> Also, what's the point of having unit tests if you can't rely on them?

What unit test are you referring to? Nothing about the current behavior of
foreach and decoding code points should make it so that unit tests are
unreliable. foreach is completely consistent in what it does. It's just that
it's designed to do something that we wouldn't design it to do if we were doing
things from scratch.

You weren't previously aware that foreach threw when decoding invalid UTF. Now,
you are, and you can write your code accordingly. The information about foreach
throwing when decoding invalid UTF should be in the spec, but I don't know if
it is or not. The spec doesn't always have the information that it should, but
this is how foreach was designed and has worked ever since it was made so that
it could decode code points. And it's the intended behavior until such time as
we can figure out how to move to using the replacement character without
breaking code in the process, which unfortunately, may very well be never.
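
As a minimal sketch of what "writing your code accordingly" could look like:
the snippet below contrasts foreach's built-in decoding, which throws on
invalid UTF-8, with std.utf.byDchar, which substitutes the replacement
character instead. The helper name foreachThrows and the sample input are
made up for illustration, and the exact exception type is an assumption
(believed to be core.exception.UnicodeException), so the code catches the
base Exception class.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.utf : byDchar, replacementDchar;

// Hypothetical helper: reports whether foreach's built-in decoding
// to dchar throws for the given input.
bool foreachThrows(const(char)[] s)
{
    try
    {
        foreach (dchar c; s) {}   // decodes code points; throws on invalid UTF-8
        return false;
    }
    catch (Exception e)           // assumed: core.exception.UnicodeException
    {
        return true;
    }
}

void main()
{
    // 'a', a lone 0xFF byte (never valid in UTF-8), then 'b'.
    char[] bad = ['a', cast(char) 0xFF, 'b'];

    writeln("foreach throws: ", foreachThrows(bad));

    // byDchar decodes lazily and yields U+FFFD for invalid sequences
    // rather than throwing.
    writeln("has U+FFFD:     ", bad.byDchar.canFind(replacementDchar));
}
```

Code that must never throw on untrusted input can iterate with byDchar (or
validate the input up front with std.utf.validate), while code that wants to
reject bad input can keep the plain foreach and handle the exception.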

Right now, the best option we have for making the change would be to simply
make it and warn in the changelog that that's what we're doing. Anyone reading
the changelog would then have the opportunity to scour their code to see if
they needed to change it as a result. The breakage would be silent and easy to
miss, even if in many cases it wouldn't matter. As such, that solution has thus
far been deemed unacceptable.

So, if you know of a way to make it so that foreach can be changed to use the
replacement character without silently breaking code, then great. We'd love to
hear it. As it stands, this is one of those design decisions that we regret in
retrospect but seem to be stuck with.
