Creeping Bloat in Phobos

Sun Sep 28 03:58:10 PDT 2014

On Sun, 28 Sep 2014 10:04:21 +0000,
"Marc Schütz" <schuetzm at gmx.net> wrote:

> On Saturday, 27 September 2014 at 23:33:14 UTC, H. S. Teoh via Digitalmars-d wrote:
> > On Sat, Sep 27, 2014 at 11:00:16PM +0000, bearophile via Digitalmars-d wrote:
> >> H. S. Teoh:
> >> 
> >> >If we can get Andrei on board, I'm all for killing off autodecoding.
> >>
> >> Killing auto-decoding for std.algorithm functions will break most of
> >> my D2 code... perhaps we can do that in a D3 language.
> > [...]
> >
> > Well, obviously it's not going to be done in a careless, drastic way!
> >
> > There will be a proper migration path and deprecation cycle. We already
> > have byCodeUnit and byCodePoint, and the first step is probably to
> > migrate towards requiring usage of one or the other for iterating over
> > strings, and only once all code is using them, we will get rid of
> > autodecoding (the job now being done by byCodePoint). Then, the final
> > step would be to allow the direct use of strings in iteration constructs
> > again, but this time without autodecoding by default. Of course,
> > .byCodePoint will still be available for code that needs to use it.
> 
> The final step would almost inevitably lead to Unicode incorrectness,
> which was the reason why autodecoding was introduced in the first place.
> Just require byCodePoint/byCodeUnit, always. It might be a bit
> inconvenient, but that's a consequence of the fact that we're dealing
> with Unicode strings.

And I would go so far as to say that you have to make an informed
decision between code unit, code point and grapheme. They are
all useful. Graphemes are the most generally useful: they hide
away normalization and allow cutting by "user-perceived
character".
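
To make the three levels concrete, here is a minimal sketch, assuming
a Phobos that provides std.utf.byCodeUnit and std.uni.byGrapheme. The
helper names (codeUnits, codePoints, graphemes) are made up for
illustration, and codePoints relies on the very autodecoding being
discussed, so it would need an explicit decoding range if that
default ever went away.

```d
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical helpers, named here only for illustration.

// Number of UTF-8 code units (bytes) in the string.
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; }

// Number of code points; walking a string autodecodes it to dchar.
size_t codePoints(string s) { return s.walkLength; }

// Number of grapheme clusters ("user-perceived characters").
size_t graphemes(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // "noël" written as 'e' followed by a combining diaeresis (U+0308).
    string s = "noe\u0308l";

    writefln("code units:  %s", codeUnits(s));   // prints 6
    writefln("code points: %s", codePoints(s));  // prints 5
    writefln("graphemes:   %s", graphemes(s));   // prints 4
}
```

Cutting this string after the fourth code unit would split the
combining mark in half, and cutting after the third code point would
separate it from its 'e'. Only the grapheme view keeps the "ë"
together.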

-- 
Marco