kill the commas! (phobos code cleanup)

Marco Leise via Digitalmars-d digitalmars-d at puremagic.com
Sun Sep 7 04:51:23 PDT 2014


On Sunday, 7 September 2014 at 10:29:41 UTC, ketmar via 
Digitalmars-d wrote:
> but there is no need in extra work actually. using ASCII
> and English for program UI will work in any encoding.

I'm not so convinced that many people would be happy with
a reduction of their alphabet to ASCII, some for aesthetic
and some for political reasons. Cyrillic, Arabic or Japanese
just wouldn't look right anymore. But I figure your system is
100% English anyway and you have no use for NLS? :D

> index nth symbol! ucs-4 (aka dchar/dstring) is ok though.

Now you mentally map UCS-4 onto your 1-byte encoding, see it
as the same thing, just 4 times larger, and think that
C-style indexing solves all use cases.
But it doesn't. While Latin script places letters in a sequence
which you could cut off anywhere, Korean uses blocks containing
multiple consonants and vowels. When truncating text you would
be interested in the whole block, or "grapheme", not in a
single vowel/consonant.
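A minimal D sketch of the difference, assuming Phobos' `std.uni.byGrapheme` and `std.utf.byDchar`. The helpers `truncateCodePoints` and `truncateGraphemes` are hypothetical names for illustration. The Korean word is spelled with conjoining jamo (U+1100 block) so that each syllable block is a sequence of several code points:

```d
import std.conv : text;
import std.range : take, walkLength;
import std.uni : byCodePoint, byGrapheme;
import std.utf : byDchar;

/// Hypothetical helper: keep the first `n` code points,
/// i.e. the "C-style indexing" view of UCS-4 (dchar/dstring).
string truncateCodePoints(string s, size_t n)
{
    return s.byDchar.take(n).text;
}

/// Hypothetical helper: keep the first `n` graphemes
/// (user-perceived characters), never splitting a block.
string truncateGraphemes(string s, size_t n)
{
    return s.byGrapheme.take(n).byCodePoint.text;
}

void main()
{
    import std.stdio : writeln;

    // "한글" written with conjoining jamo: ᄒ ᅡ ᆫ + ᄀ ᅳ ᆯ
    string hangul = "\u1112\u1161\u11AB\u1100\u1173\u11AF";
    writeln(hangul.byDchar.walkLength);    // 6 code points
    writeln(hangul.byGrapheme.walkLength); // 2 graphemes

    // Cutting after one code point leaves a lone consonant ᄒ;
    // cutting after one grapheme keeps the whole block 한.
    writeln(truncateCodePoints(hangul, 1) == "\u1112");            // true
    writeln(truncateGraphemes(hangul, 1) == "\u1112\u1161\u11AB"); // true
}
```

The same issue shows up in Latin text with combining marks: for "noe\u0308l" (noël with a combining diaeresis), cutting after 3 code points drops the diaeresis, while cutting after 3 graphemes keeps "noë" intact.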

On Sun, 07 Sep 2014 10:45:22 +0000,
"Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang at gmail.com> wrote:

> [...]
> 
> I think the D approach to strings is unpleasant. You should not 
> have slices of strings, only slices of ubyte arrays.

Rust does that, at least for OS paths.
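Ola's objection to string slices can be illustrated in D itself: `string` is `immutable(char)[]`, so slices index UTF-8 code units, and nothing in the type stops a slice from ending in the middle of a multibyte sequence. A minimal sketch using `std.string.representation` (the raw `immutable(ubyte)[]` view of the same memory) and `std.utf.validate`; `isWellFormedUtf8` is a hypothetical helper name:

```d
import std.string : representation;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `s` holds well-formed UTF-8.
bool isWellFormedUtf8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed or truncated sequences
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    import std.stdio : writeln;

    string name = "Grøstad";                      // 'ø' (U+00F8) takes 2 bytes in UTF-8
    immutable(ubyte)[] raw = name.representation; // same memory, viewed as bytes

    writeln(name.length);                    // 8: counts code units, not letters
    writeln(raw.length);                     // 8
    writeln(isWellFormedUtf8(name[0 .. 4])); // true: "Grø"
    writeln(isWellFormedUtf8(name[0 .. 3])); // false: the slice cuts 'ø' in half
}
```

The slice `name[0 .. 3]` still has type `string` even though it no longer holds valid UTF-8, which is the mismatch a byte-array-only slicing model would make explicit.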

> If you want real speedups for streams of symbols you have to move 
> into the landscape of huffman-encoding, tries, dedicated 
> data structures…
> 
> Having uniform string support in libraries (i.e. only supporting 
> utf-8) is a clear advantage IMO, that will allow for APIs that 
> are SSE backed and performant.

-- 
Marco


