fixedstring: a @safe, @nogc string type

WebFreak001 d.forum at webfreak.org
Tue Jan 11 12:22:36 UTC 2022


On Tuesday, 11 January 2022 at 11:16:13 UTC, Moth wrote:
> On Tuesday, 11 January 2022 at 03:20:22 UTC, Salih Dincer wrote:
>> [snip]
>
> glad to hear you're finding it useful! =]
>
> hm, i'm not sure how i would go about fixing that double 
> character issue. i know there's currently some weirdness with 
> wchars / dchars equality that needs to be fixed [shouldn't be 
> too much trouble, just need to set aside the time for it], but 
> i think being able to tell how many chars there are in a glyph 
> requires unicode awareness? i'll look into it.
>
> [...]

You can relatively easily find out how many bytes a string takes 
up with `std.utf`. You can also iterate by code points (`std.utf`) 
or by graphemes (`std.uni`) if you want to translate some kind of 
character index to a byte position.
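
As a rough sketch of that index translation (the helper name 
`byteOffsetOfCodePoint` is made up for this example and is not part 
of Phobos or fixedstring), `std.utf.stride` can be used to walk a 
UTF-8 string one code point at a time:

```d
import std.utf : count, stride;

/// Hypothetical helper: byte offset of the n-th code point in a UTF-8
/// string, or s.length if the string has fewer than n code points.
size_t byteOffsetOfCodePoint(const(char)[] s, size_t n) @safe
{
    size_t i = 0;
    foreach (_; 0 .. n)
    {
        if (i >= s.length)
            break;
        i += stride(s, i); // code units used by the sequence starting at i
    }
    return i;
}

void main() @safe
{
    string s = "a\u00FCb";   // 'a', U+00FC (2 bytes in UTF-8), 'b'
    assert(s.length == 4);   // .length counts code units (bytes here)
    assert(count(s) == 3);   // std.utf.count counts code points
    assert(byteOffsetOfCodePoint(s, 2) == 3); // 'b' starts at byte 3
}
```

Note that `stride` throws a `UTFException` on invalid UTF-8, so a 
@nogc / nothrow type would probably need its own handling for that.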

HOWEVER, it's not clear what a character is. For the cases posted 
here it's no problem, but in languages that combine glyphs to 
form new glyphs it's no longer clear what counts as a character. 
There are graphemes (grapheme clusters), which are probably the 
closest to what everybody would think a character is, but IIRC 
there are edge cases a programmer wouldn't expect, like appending 
a code point without increasing the character count of the string 
because it merges with the last grapheme. Additionally, graphemes 
have a performance cost compared to simpler things like code 
points, which fit 98% of use-cases with strings. A code point in 
D maps 1:1 to a dchar and takes up to 2 wchars or up to 4 chars. 
You can use `std.utf` to compute the byte length of a code point 
in a given string.
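
To make the difference concrete, here is a small sketch using 
`std.utf.byDchar`, `std.utf.codeLength` and `std.uni.byGrapheme` 
(the example strings are my own). It shows the edge case where 
appending a code point does not change the grapheme count:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byDchar, codeLength;

void main()
{
    string s = "e";
    assert(s.byGrapheme.walkLength == 1);

    s ~= "\u0301"; // append U+0301 COMBINING ACUTE ACCENT
    assert(s.length == 3);                // UTF-8 code units: 1 + 2
    assert(s.byDchar.walkLength == 2);    // two code points
    assert(s.byGrapheme.walkLength == 1); // still one grapheme

    // Code units needed to encode a single code point:
    assert(codeLength!char('\u20AC') == 3);      // U+20AC in UTF-8
    assert(codeLength!wchar('\U0001F600') == 2); // surrogate pair in UTF-16
    assert(codeLength!char('\U0001F600') == 4);  // 4 bytes in UTF-8
}
```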

I would rather suggest you support FixedString with types other 
than `char` (wchar, dchar; heck, users could even use any 
arbitrary type and use this as an array type). For languages that 
commonly use more than 1 byte per code point, or for interop with 
Win32 Unicode APIs, JavaScript strings, C# strings, UTF-16 files 
in general, etc., programmers might then opt to use FixedString 
with wchar.

With D's templates that should be quite easy to do: add a 
template parameter to the struct, like 
`struct FixedString(size_t maxSize, CharT = char)`, and replace 
all usage of char in your code with `CharT`.
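
A minimal sketch of what I mean, NOT the actual fixedstring code: 
only the struct signature comes from the suggestion above, while 
the field names, the constructor and `opSlice` are assumptions made 
up for illustration. Since the struct is a template, attributes 
like @safe and @nogc are inferred for its member functions.

```d
// Hypothetical sketch; the real fixedstring internals may look different.
struct FixedString(size_t maxSize, CharT = char)
{
    private CharT[maxSize] data; // fixed-capacity storage, no GC allocation
    private size_t len;          // code units currently in use

    this(const(CharT)[] s)
    {
        assert(s.length <= maxSize, "input does not fit into FixedString");
        data[0 .. s.length] = s[];
        len = s.length;
    }

    /// Number of code units (not code points or graphemes).
    size_t length() const { return len; }

    /// View of the used part of the buffer.
    const(CharT)[] opSlice() const return { return data[0 .. len]; }
}

void main()
{
    auto a = FixedString!16("h\u00E9llo");            // UTF-8
    auto w = FixedString!(16, wchar)("h\u00E9llo"w);  // UTF-16
    assert(a.length == 6); // U+00E9 takes 2 chars
    assert(w.length == 5); // U+00E9 takes 1 wchar
    assert(w[] == "h\u00E9llo"w);
}
```

The default `CharT = char` means existing code written against 
`FixedString!16` would keep working unchanged.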

