Creeping Bloat in Phobos

Uranuz via Digitalmars-d digitalmars-d at puremagic.com
Sun Sep 28 05:06:16 PDT 2014


On Sunday, 28 September 2014 at 00:13:59 UTC, Andrei Alexandrescu 
wrote:
> On 9/27/14, 3:40 PM, H. S. Teoh via Digitalmars-d wrote:
>> If we can get Andrei on board, I'm all for killing off 
>> autodecoding.
>
> That's rather vague; it's unclear what would replace it. -- 
> Andrei

I believe that removing autodecoding will make things even 
worse. As far as I understand, if we remove it from the front() 
function that operates on narrow strings, then front() will 
return just a single byte of a character. I believe that 
processing a narrow string by `user perceived chars` (graphemes) 
is the more common use case. Operating on the individual bytes of 
a multibyte character is an uncommon task, and you can do that 
via direct indexing of the char[] array. I believe that the 
number of bytes in a *user perceived char* is an internal detail 
of the UTF-8 encoding, and it should not have to be considered in 
common tasks such as parsing, searching, and replacing text. If 
you need the byte representation of a string, you should cast it 
to ubyte[] and work with it using the same range functions 
without autodecoding.
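
To make the distinction concrete, here is a minimal sketch of my 
own (assuming a current Phobos; the sample string is just an 
illustration). Strictly speaking, front() on a narrow string 
decodes a whole code point (dchar); full graphemes would need 
something like std.uni.byGrapheme. Indexing and .length still see 
the individual UTF-8 code units.

```d
import std.range : front, walkLength;
import std.string : representation;

void main()
{
    // Six Cyrillic letters; each takes two bytes in UTF-8.
    string s = "привет";

    // Range primitives autodecode: front yields a whole code point.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == 'п');

    // Direct indexing and .length operate on UTF-8 code units.
    static assert(is(typeof(s[0]) == immutable(char)));
    assert(s.length == 12);

    // Walking the string as a range counts decoded code points.
    assert(s.walkLength == 6);

    // A byte view of the same data; ranges over it do not autodecode.
    immutable(ubyte)[] bytes = s.representation;
    assert(bytes.length == 12);
    assert(bytes.front == 0xD0); // first byte of 'п' (U+043F)
}
```

Here std.string.representation plays the role of the cast to 
ubyte[] mentioned above, without an explicit cast.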

The main problem I see is that a programmer inexperienced in D 
can be confused about whether he is operating on bytes or on 
graphemes. This is especially likely when he migrates from C# or 
Python, where a string is not considered an array of its bytes. A 
*char* in D is not a character; it is a part of a character, not 
the entire character. That is the main inconsistency.

A possible solution is to include a class or struct 
implementation of string and hide the internal implementation of 
narrow strings from those users who don't need to operate on 
single bytes of UTF-8 characters. I believe it's the best way to 
kill all the rabbits)) We could provide this class String with a 
method returning ubyte[] (the better way) or char[] that exposes 
the internal implementation for those who need it.
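
A rough sketch of what I mean, under my own assumptions: the 
String struct and its graphemes() and bytes() methods are 
hypothetical names, not anything that exists in Phobos. It only 
relies on std.uni.byGrapheme and std.string.representation.

```d
import std.range : walkLength;
import std.string : representation;
import std.uni : byGrapheme;

// Hypothetical wrapper type; not part of Phobos.
struct String
{
    private string data; // UTF-8 storage, hidden from ordinary users

    this(string s) { data = s; }

    // Default view: iterate by user-perceived characters.
    auto graphemes() { return data.byGrapheme; }

    // Escape hatch: expose the raw UTF-8 bytes for those who need them.
    immutable(ubyte)[] bytes() { return data.representation; }
}

void main()
{
    // 'e' followed by a combining acute accent (U+0301).
    auto s = String("e\u0301");

    assert(s.graphemes.walkLength == 1); // one user-perceived character
    assert(s.bytes.length == 3);         // stored as three UTF-8 bytes
}
```

Whether the byte view should be ubyte[] or char[] is the open 
design choice; the sketch picks ubyte[] so that range functions 
over it never decode.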

A question: can you list some languages that represent UTF-8 
narrow strings as an array of single bytes?

