[Issue 4483] foreach over string or wstring, where element type not specified, does not support unicode

d-bugmail at puremagic.com d-bugmail at puremagic.com
Fri Jan 17 01:30:10 PST 2014


https://d.puremagic.com/issues/show_bug.cgi?id=4483


Lionello Lunesu <lio+bugzilla at lunesu.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lio+bugzilla at lunesu.com
            Summary|Make foreach over string or |foreach over string or
                   |wstring where element type  |wstring, where element type
                   |not specified a warning     |not specified, does not
                   |                            |support unicode


--- Comment #2 from Lionello Lunesu <lio+bugzilla at lunesu.com> 2014-01-17 01:30:01 PST ---
I took the liberty of removing the suggested solution from the title, since I
think there are several possible fixes here:

1. Issue a warning (original suggestion)
2. Issue an error, always require a value type (breaking change)
3. Infer the value type as "dchar" in all cases (breaking change)
4. Throw an exception at runtime when a code point that does not fit in a
single char (or wchar) is encountered (breaking change)

I think this issue is serious enough to warrant a breaking change. I taught a D
workshop, in China, and everybody expected foreach to "just work", and
rightfully so.

foreach(c; "你好") {}

This should just work! And it's hard to explain to people why it doesn't
without getting into Unicode encoding issues, which no user wants to care
about.
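
To make the surprise concrete, here is a minimal sketch of the current
behaviour, assuming a compiler that infers the element type from the array
(char for a string). The function names countInferred and countDecoded are
mine, for illustration only.

```d
// Counts loop iterations when the element type is left to inference.
size_t countInferred(string s)
{
    size_t n;
    foreach (c; s)          // c is inferred as a single UTF-8 code unit
    {
        static assert(typeof(c).sizeof == 1);
        ++n;
    }
    return n;
}

// Counts loop iterations when dchar is requested explicitly.
size_t countDecoded(string s)
{
    size_t n;
    foreach (dchar c; s)    // foreach decodes UTF-8 into code points
        ++n;
    return n;
}

void main()
{
    // "你好" is two code points, each encoded as three UTF-8 code units.
    assert(countInferred("你好") == 6);
    assert(countDecoded("你好") == 2);
}
```

The only difference between the two loops is the explicit dchar, which is
exactly the detail a newcomer has no reason to know about.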

I'm going to argue for fix 3: infer dchar, and accept the breaking change that
comes with it.

The breaking change is compile time only, and limited to foreach over char[] or
wchar[], with a non-ref, inferred value type, and where the scope cares about
the value type being char or wchar. 

That last part is important: In all of druntime and phobos there were only 2
places where that was the case. All others, including all tests(!), compiled
(and ran) successfully without changes. The two places were fixed by adding the
appropriate type, in both cases "char". A nice side effect of this change is
that it makes it immediately obvious that the foreach does NOT handle the full
Unicode character set. It's self-documenting, in a way.
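
A sketch of the kind of code that would break, and of the one-word fix. No
released compiler infers dchar here, so the sketch simulates fix 3 by writing
dchar explicitly; the enum names are hypothetical, and the loop body is just
one example of a scope that "cares" about getting a char.

```d
// With a dchar element (what fix 3 would infer), assigning the element to a
// char is rejected at compile time: dchar does not implicitly narrow to char.
enum compilesWithDchar = __traits(compiles, {
    foreach (dchar c; "abc") { char x = c; }
});

// Spelling out the element type as char restores the old behaviour and
// documents that the loop works on code units, not code points.
enum compilesWithChar = __traits(compiles, {
    foreach (char c; "abc") { char x = c; }
});

static assert(!compilesWithDchar);
static assert(compilesWithChar);

void main() {}
```

A loop body that never depends on the element being a char (for example one
that only compares or prints it) would compile unchanged in both cases.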

Note that we might still choose a runtime exception. It's hardly useful to get
a char with value 0xE8 out of a char[], since on its own it is only a fragment
of a multi-byte sequence. But throwing a sudden exception is a breaking change
that might be too risky to take on.
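
For comparison, a rough sketch of what the runtime check of fix 4 could
amount to, written as an ordinary library function rather than compiler
behaviour. The helper visitUnits and its error message are assumptions of
mine, not an existing druntime or phobos API.

```d
import std.exception : assertThrown;

// Hypothetical helper: visits a string char by char, but refuses to hand
// out code units that are part of a multi-byte UTF-8 sequence.
void visitUnits(const(char)[] s, scope void delegate(char) visit)
{
    foreach (char c; s)
    {
        if (c >= 0x80)  // lead or continuation byte of a multi-byte sequence
            throw new Exception("code point does not fit in a single char");
        visit(c);
    }
}

void main()
{
    size_t n;
    visitUnits("abc", (char c) { ++n; });   // pure ASCII: works as before
    assert(n == 3);

    // Non-ASCII input fails at runtime instead of yielding fragments.
    assertThrown(visitUnits("你好", (char c) {}));
}
```

This shows the risk mentioned above: code that has run fine on ASCII input
for years would start throwing the first time it sees other text.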
