D-ish way to work with strings?

Robert M. Münch robert.muench at saphirion.com
Fri Dec 27 12:23:57 UTC 2019


On 2019-12-23 15:05:20 +0000, H. S. Teoh said:

> On Sun, Dec 22, 2019 at 06:27:03PM +0100, Robert M. Münch via 
> Digitalmars-d-learn wrote:
>> Want to add I'm talking about unicode strings.
>> 
>> Wouldn't it make sense to handle everything as UTF-32 so that
>> iteration is simple because code-point = code-unit?
>> 
>> And later on, convert to UTF-16 or UTF-8 on demand?
> [...]
> 
> Be careful that code point != "character" the way most people understand
> the word "character".

I know. My point was that in UTF-8, code points (which are not the same 
as characters) have different sizes, and you need to take that into 
account if you want to iterate by code point.
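
To illustrate the size difference, here is a minimal sketch. The helper 
name codePointWidths is made up for this example; std.utf.stride and 
std.range.walkLength are the Phobos functions it relies on. It only 
shows that a UTF-8 string has more code units than code points, while 
in UTF-32 the two counts are equal.

```d
import std.algorithm.comparison : equal;
import std.range : walkLength;
import std.utf : stride;

/// Hypothetical helper: byte width of each code point in a UTF-8 string.
size_t[] codePointWidths(string s)
{
    size_t[] widths;
    for (size_t i = 0; i < s.length; )
    {
        // stride() gives the number of code units used by the
        // code point that starts at index i.
        immutable n = stride(s, i);
        widths ~= n;
        i += n;
    }
    return widths;
}

void main()
{
    string s = "Gr\u00FC\u00DFe";    // "Grüße" as UTF-8
    assert(s.length == 7);           // code units (bytes)
    assert(s.walkLength == 5);       // code points
    assert(equal(codePointWidths(s), [1, 1, 2, 2, 1]));

    dstring d = "Gr\u00FC\u00DFe"d;  // same text as UTF-32
    assert(d.length == 5);           // code units == code points
}
```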

> The word you're looking for is "grapheme". Which, unfortunately, is 
> rather complex and very slow to handle in
> Unicode. See std.uni.byGrapheme.

Yes, that's when we come to "characters". And a "grapheme" can consist 
of several code points. Is grapheme handling just slow in D or in 
general? If it's the latter, well, then that's just how it is.
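
As a small sketch of the three levels (code units, code points, 
graphemes), assuming std.uni.byGrapheme as mentioned above; the helper 
name graphemeCount is made up for the example:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: number of grapheme clusters in a UTF-8 string.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // "noël" written as 'e' followed by U+0308 COMBINING DIAERESIS
    string s = "noe\u0308l";
    assert(s.length == 6);          // UTF-8 code units
    assert(s.walkLength == 5);      // code points
    assert(graphemeCount(s) == 4);  // user-perceived characters
}
```

Note that storing this text as UTF-32 would still give 5 elements, not 
4, so the grapheme step is needed either way.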

> Usually you want to just stick with UTF-8 (usually) or UTF-16 (for
> Windows and Java interop). UTF-32 wastes a lot of space, and *still*
> doesn't give you what you think you want, and Grapheme[] is just dog
> slow because of the amount of decoding/recoding needed to manipulate it.

I need to handle graphemes when things are going to be rendered and edited.

> What are you planning to do with your strings?

Pretty simple: have user-editable content that is rendered using 
different fonts supporting Unicode.

So, all editing functions: insert, replace, and delete at any position 
in the string, supporting all Unicode characters.
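
A rough sketch of what such operations could look like on a plain UTF-8 
string, addressing positions by grapheme index. All function names here 
are made up for illustration; the only library call is 
std.uni.graphemeStride. It does no normalization and allocates a new 
string on every edit, so it is a starting point rather than a real 
editor buffer.

```d
import std.uni : graphemeStride;

/// Hypothetical helper: code-unit offset at which the n-th grapheme starts.
/// Returns s.length if the string has fewer than n graphemes.
size_t graphemeOffset(string s, size_t n)
{
    size_t i = 0;
    while (n > 0 && i < s.length)
    {
        i += graphemeStride(s, i); // skip one whole grapheme cluster
        --n;
    }
    return i;
}

/// Insert `what` before the n-th grapheme (appends if n is past the end).
string insertAt(string s, size_t n, string what)
{
    immutable at = graphemeOffset(s, n);
    return s[0 .. at] ~ what ~ s[at .. $];
}

/// Delete the n-th grapheme; returns s unchanged if n is out of range.
string removeGrapheme(string s, size_t n)
{
    immutable start = graphemeOffset(s, n);
    if (start >= s.length)
        return s;
    immutable end = start + graphemeStride(s, start);
    return s[0 .. start] ~ s[end .. $];
}

/// Replace the n-th grapheme with `what`.
string replaceGrapheme(string s, size_t n, string what)
{
    return insertAt(removeGrapheme(s, n), n, what);
}

void main()
{
    string s = "noe\u0308l"; // "noël" with a combining diaeresis
    assert(removeGrapheme(s, 2) == "nol");
    assert(insertAt("nol", 2, "e\u0308") == s);
    assert(replaceGrapheme(s, 2, "e") == "noel");
}
```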

Best regards.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster


