D-ish way to work with strings?
Robert M. Münch
robert.muench at saphirion.com
Fri Dec 27 12:23:57 UTC 2019
On 2019-12-23 15:05:20 +0000, H. S. Teoh said:
> On Sun, Dec 22, 2019 at 06:27:03PM +0100, Robert M. Münch via
> Digitalmars-d-learn wrote:
>> Want to add I'm talking about unicode strings.
>> Wouldn't it make sense to handle everything as UTF-32 so that
>> iteration is simple because code-point = code-unit?
>> And later on, convert to UTF-16 or UTF-8 on demand?
> Be careful that code point != "character" the way most people understand
> the word "character".
I know. My point was that in UTF-8, code-points (which are not the same
as characters) take a varying number of code-units. You need to take
that into account if you want to iterate by code-point.
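
To make the size difference concrete, here is a minimal sketch using
Phobos. The helper name utf8Sizes is made up for illustration; the
calls it relies on (std.utf.codeLength, std.utf.byDchar, std.conv.to)
are standard.

```d
import std.conv : to;
import std.range : walkLength;
import std.utf : byDchar, codeLength;

/// Hypothetical helper: UTF-8 code units needed for each code point of s.
size_t[] utf8Sizes(string s)
{
    size_t[] sizes;
    foreach (dchar c; s)          // foreach over dchar decodes UTF-8 on the fly
    {
        size_t n = codeLength!char(c);
        sizes ~= n;
    }
    return sizes;
}

void main()
{
    // 'a', U+00E4, U+20AC, U+1F600: 1, 2, 3 and 4 UTF-8 code units
    string s = "a\u00E4\u20AC\U0001F600";
    size_t[] expected = [1, 2, 3, 4];

    assert(s.length == 10);             // .length counts code units, not code points
    assert(s.byDchar.walkLength == 4);  // four code points after decoding
    assert(utf8Sizes(s) == expected);

    dstring d = s.to!dstring;           // UTF-32: one code unit per code point
    assert(d.length == 4);
}
```

So with a dstring, indexing and .length work per code-point, at the cost
of four bytes for every code-point.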
> The word you're looking for is "grapheme". Which, unfortunately, is
> rather complex and very slow to handle in
> Unicode. See std.uni.byGrapheme.
Yes, that's when we come to "characters". And a "grapheme" can consist
of several code-points. Is grapheme handling just slow in D or in
general? If it's the latter, well, then that's just how it is.
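
A small sketch of the code-point vs. grapheme difference, assuming
std.uni.byGrapheme as mentioned above. The helper name graphemeCount
is made up for illustration.

```d
import std.conv : to;
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: number of grapheme clusters in s.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT is displayed as one character.
    string s = "e\u0301";

    assert(s.length == 3);             // UTF-8 code units
    assert(s.to!dstring.length == 2);  // code points, even in UTF-32
    assert(graphemeCount(s) == 1);     // one grapheme
}
```

So UTF-32 alone does not give one array element per displayed character.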
> Usually you want to just stick with UTF-8 (usually) or UTF-16 (for
> Windows and Java interop). UTF-32 wastes a lot of space, and *still*
> doesn't give you what you think you want, and Grapheme is just dog
> slow because of the amount of decoding/recoding needed to manipulate it.
I need to handle graphemes when things are going to be rendered and edited.
> What are you planning to do with your strings?
Pretty simple: Have user-editable content that is rendered using
different fonts supporting Unicode.
So, all editing functions (insert, replace, delete) have to work at any
location in the string and support all Unicode characters.
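
Roughly what I have in mind, as a sketch only: edit positions are given
as grapheme indices and mapped to code-unit offsets in a UTF-8 string
with std.uni.graphemeStride. The names graphemeOffset, insertAt and
removeAt are made up for illustration and are not part of Phobos.

```d
import std.uni : graphemeStride;

/// Hypothetical helper: code-unit offset of the n-th grapheme in s
/// (s.length if s has fewer than n graphemes).
size_t graphemeOffset(string s, size_t n)
{
    size_t off = 0;
    while (n > 0 && off < s.length)
    {
        off += graphemeStride(s, off);  // length of this grapheme in code units
        --n;
    }
    return off;
}

/// Insert text in front of the n-th grapheme.
string insertAt(string s, size_t n, string text)
{
    immutable off = graphemeOffset(s, n);
    return s[0 .. off] ~ text ~ s[off .. $];
}

/// Delete the n-th grapheme; out-of-range indices leave s unchanged.
string removeAt(string s, size_t n)
{
    immutable off = graphemeOffset(s, n);
    if (off >= s.length)
        return s;
    return s[0 .. off] ~ s[off + graphemeStride(s, off) .. $];
}

void main()
{
    string s = "e\u0301x";  // two graphemes, four code units
    assert(insertAt(s, 1, "y") == "e\u0301yx");
    assert(removeAt(s, 0) == "x");
    assert(removeAt(s, 1) == "e\u0301");
}
```

Replace would be removeAt followed by insertAt. Every edit walks the
string from the start, and inserting a combining mark can merge with
the grapheme in front of it, so this is only a starting point.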
Robert M. Münch
smarter | better | faster