Strange implicit conversion integers on concatenation

Jonathan M Davis newsgroup.d at jmdavisprog.com
Tue Nov 6 00:37:22 UTC 2018


On Monday, November 5, 2018 4:14:18 PM MST H. S. Teoh via Digitalmars-d 
wrote:
> On Mon, Nov 05, 2018 at 05:43:19PM -0500, Steven Schveighoffer via
> Digitalmars-d wrote: [...]
>
> > It's not just ints to chars, but chars to wchars or dchars, and wchars
> > to dchars.
> >
> > Basically a character type should not convert from any other type.
> > period.  Because it's not "just a number" in a different format.
>
> +1.  I recall having this conversation before.  Was this ever filed as a
> bug?  I couldn't find it this morning when I tried to look.
>
> > Do we need a DIP? Probably. but we have changed these types of things
> > in the past from what I remember (I seem to recall we had at one point
> > implicit truncation for adding 2 smaller numbers together). It is
> > possible to still fix.
>
> [...]
>
> If it's possible to fix, I'd like to see it fixed.  So far, I don't
> recall hearing anyone strongly oppose such a change; all objections
> appear to be only coming from the fear of breaking existing code.
>
> Some things to consider:
>
> - What this implies for the "if C code is compilable as D, it must have
>   the same semantics" philosophy that Walter appears to be strongly
>   insistent on.  Basically, anything that depends on C's conflation of
>   char and (u)byte must either give an error, or give the correct
>   semantics.

I'm pretty sure that the change would just result in more errors, so I don't
think that it would cause problems on this front.
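
To put that in concrete terms, here's a rough sketch of the kind of code that
compiles silently today but would simply fail to compile under the restriction
(going from memory of what dmd currently accepts, so take the details with a
grain of salt):

void main()
{
    import std.stdio : writeln;

    // 65 is an int literal, but because it fits in char's range, it
    // implicitly converts to the char 'A' when concatenated.
    string s = "value: " ~ 65;
    writeln(s); // prints "value: A", not "value: 65"

    // Likewise, an int literal that fits in 8 bits is accepted as a char.
    char c = 10;
    writeln(cast(int) c); // 10, i.e. '\n'
}

char c = 10; is also valid C with the same meaning, so turning it into an
error (rather than silently changing what it does) is consistent with that
rule.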

> - The possibility of automatically fixing code broken by the change
>   (possibly partial, leaving corner cases as errors to be handled by the
>   user -- the idea being to eliminate the rote stuff and only require
>   user intervention in the tricky cases).  This may be a good and simple
>   use-case for building a tool that could do something like that.  This
>   isn't the first time potential code breakage threatens an otherwise
>   beneficial language change, where having an automatic source upgrade
>   tool could alleviate many of the concerns.

An automatic tool would be nice, but I don't know that focusing on one would
be helpful, since it would make it look like the amount of breakage was
large, which would make the change seem less acceptable. Regardless, the
breakage couldn't be immediate. It would have to start with some sort of
deprecation warning - possibly similar to whatever was done with the integer
promotion changes a few releases back, though I never fully understood what
happened there.

> - Once we start making a clear distinction between char types and
>   non-char types, will char types still obey C-like int promotion rules,
>   or should we consider discarding old baggage that's no longer so
>   applicable to modern D?  For example, I envision that this DIP would
>   make int + char or char + int illegal, but what should the result of
>   char + char or char + wchar be?  I'm tempted to propose outright
>   banning char arithmetic without casting, but for some applications
>   this might be too onerous.  If we continue follow C rules, char + char
>   would implicitly promote to dchar, which arguably could be annoying.

Well, as I understand it, char + char -> int is related to how the CPU
works, and having it become char + char -> char would be a problem from that
perspective. Having char + char -> dchar would also go against the whole
idea that char is a UTF-8 code unit, because adding two chars together isn't
necessarily going to get you a valid dchar. In reality though, I would
expect reasonable code to be adding ints to chars, because you're going to
get stuff like x - 48 (i.e. x - '0') to convert ASCII digits to integers.
And honestly, adding two chars together doesn't even make sense. What does
'A' + 'Q' even mean? It's nonsense. Ultimately, I think that it would be too
large a change to disallow it (and _maybe_ someone out there has some weird
use case where it sort of makes sense), but I don't see how it makes any
sense to actually do it.

So, having two chars added together continue to result in an int makes the
most sense to me, as does allowing an int and a char to be added (which is
the operation that code is actually going to be doing). Code can then cast
back to char (which is what it already has to do now anyway). That allows
code to continue to function as it has (thus reducing how disruptive the
changes are), but if we eliminate the implicit conversions, we eliminate the
common bugs. I think that you'll get _far_ stronger opposition to trying to
change the arithmetic operations than to changing the implicit conversions,
and the gains there are also far less obvious.
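
For reference, this is roughly what that arithmetic looks like today (again
from memory, so the exact details may be slightly off):

void main()
{
    import std.stdio : writeln;

    char digit = '7';
    auto n = digit - '0';           // char arithmetic promotes to int,
    static assert(is(typeof(n) == int)); // just like in C
    writeln(n);                     // 7

    char a = 'A';
    char q = 'Q';
    auto sum = a + q;               // compiles today, but 0x41 + 0x51 = 146
                                    // isn't a meaningful character
    static assert(is(typeof(sum) == int));

    char next = cast(char)(a + 1);  // getting back to char already requires
    writeln(next);                  // an explicit cast: 'B'
}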

So basically, I wouldn't advise mucking around with the arithmetic
operations. I'd suggest simply making it so that implicit conversions
between character types and any other type (unless explicitly defined by
something like alias this) are disallowed.
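
So something like this would still work, because the conversion is opted
into explicitly via alias this, whereas the built-in conversions between
char and the integer types would not (the type name here is made up purely
for illustration):

struct Ascii
{
    char value;
    alias value this; // Ascii converts to char because the author said so
}

void takesChar(char c) {}

void main()
{
    auto a = Ascii('x');
    takesChar(a);      // fine: the conversion is defined by alias this
    // takesChar(120); // under the proposal, an error instead of silently
                       // passing 'x'
}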

Given that you already have to cast to get arithmetic results back into
char, I'm pretty sure that almost all of the code that would have to be
changed would be code that was either broken or a code smell, which would
probably make it a lot easier to convince Walter to make the change.
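
Just to illustrate the kind of code I'd expect to get flagged, something
like this compiles today, and it's almost never what was actually intended
(a minimal sketch, not taken from any real codebase):

void main()
{
    import std.stdio : writeln;

    char c = 0xFF;  // accepted because 0xFF fits in 8 bits, even though
                    // it's never a valid UTF-8 code unit on its own
    dchar d = c;    // char -> dchar treats it as "just a number", so d
                    // becomes U+00FF rather than anything related to the
                    // UTF-8 data the char came from
    writeln(d);     // ÿ
}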

- Jonathan M Davis




