Proposal for fixing dchar ranges

monarch_dodra monarchdodra at gmail.com
Wed Mar 12 01:47:56 PDT 2014


On Tuesday, 11 March 2014 at 18:02:26 UTC, Steven Schveighoffer 
wrote:
> No, where we are today is that in some cases, the language 
> treats a char[] as an array of char, in other cases, it treats 
> a char[] as a bi-directional dchar range.
>
> -Steve

I want to mention something I've had trouble with recently, that 
I haven't seen mentioned yet, but is related:

The ambiguity of the "lone char".

By that I mean: when a function accepts 'char' as an argument, it 
is (IMO) very hard to know whether it is actually accepting:
1. An ASCII char in the 0 .. 128 range?
2. A code unit?
3. (heaven forbid) a code point in the 0 .. 256 range packed into 
a char?

Currently (fortunately? unfortunately?) the choice taken in 
our algorithms is 3, which is actually the 'safest' solution.

So if you write:
find("cassé", cast(char)'é');

It *will* correctly find the 'é', but it *won't* search for it 
among the individual code units.
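
To make the difference concrete, here is a minimal sketch that runs the same needle through both interpretations. The helper names foundAsCodePoint and foundAsCodeUnit are hypothetical, made up for this illustration; the only library calls assumed are std.algorithm's find and std.string's representation.

```d
import std.algorithm : find;
import std.string : representation;

// Hypothetical helper: search the decoded code points, which is what
// find does when it is handed a string directly.
bool foundAsCodePoint(string haystack, char needle)
{
    return find(haystack, needle).length > 0;
}

// Hypothetical helper: search the raw UTF-8 code units, no decoding.
bool foundAsCodeUnit(string haystack, char needle)
{
    return find(haystack.representation, cast(ubyte) needle).length > 0;
}

void main()
{
    import std.stdio : writeln;

    // Same value as cast(char)'é': the code point U+00E9 packed into a char.
    immutable char packed = cast(char) 0xE9;

    // "cass\u00E9" is "cassé"; in UTF-8 the 'é' is the two bytes C3 A9.
    writeln(foundAsCodePoint("cass\u00E9", packed));
    writeln(foundAsCodeUnit("cass\u00E9", packed));
}
```

The code-unit search cannot match, because the byte 0xE9 never appears in the UTF-8 encoding of the string; only interpretation 3 gives the packed char a chance of matching.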

--------

Another, more pernicious case is that of output ranges. "put" is 
supposed to know how to convert any string/char width into any 
other string/char width.

Again, things become funky if you tell "put" to place a string 
into a sink that accepts a char.

Is the sink actually telling you to feed it code units? Or ASCII?
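
One way to see what such a sink ends up receiving is to record every char it is handed. This is only a sketch: CharSink and feed are hypothetical names invented for the illustration, and the output shows what "put" happens to do, not what the sink's author meant to accept.

```d
import std.range : put;

// Hypothetical sink: its only interface is put(char).
struct CharSink
{
    char[] received;
    void put(char c) { received ~= c; }
}

// Hypothetical helper: feed a string of any width to a CharSink and
// return whatever the sink was handed, one char at a time.
char[] feed(S)(S text)
{
    CharSink sink;
    put(sink, text);
    return sink.received;
}

void main()
{
    import std.stdio : writeln;

    // A UTF-32 string ("cassé") pushed into a char-only sink.
    auto got = feed("cass\u00E9"d);
    writeln(got.length);
    writeln(cast(ubyte[]) got);
}
```

If the printed length is larger than the number of characters in the input, the sink was fed UTF-8 code units, which a sink written with ASCII in mind would not expect.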

