fixedstring: a @safe, @nogc string type

Moth postmaster at gmail.com
Wed Jan 12 19:50:51 UTC 2022


On Tuesday, 11 January 2022 at 12:22:36 UTC, WebFreak001 wrote:
> [snip]
>
> you can relatively easily find out how many bytes a string 
> takes up with `std.utf`. You can also iterate by code points or 
> graphemes there if you want to translate some kind of character 
> index to byte position.
>
> HOWEVER it's not clear what a character is. Sure for the posted 
> cases here it's no problem but when it comes to languages based 
> on combining glyphs together to form new glyphs it's no longer 
> clear what is a character. There are Graphemes (grapheme 
> clusters) which are probably the closest to what everybody 
> would think a character is, but IIRC there are edge cases 
> that a programmer wouldn't expect, like adding a character not 
> increasing the count of characters of the string because it 
> merges with the last Grapheme. Additionally there is a 
> performance impact on using Graphemes over simpler things like 
> codepoints which fit 98% of use-cases with strings. Codepoints 
> in D are mapped 1:1 using dchar, take up to 2 wchars or up to 4 
> chars. You can use `std.utf` to compute byte lengths for a 
> codepoint given a string.
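For reference, here is a minimal sketch of the `std.utf` measurements described above, applied to the `"áéíóú"` string from this thread. The selection of functions (`std.utf.count`, `codeLength`, `toUTFindex` and `decode`, plus `std.uni.byGrapheme` with `std.range.walkLength`) is illustrative, not something prescribed in the thread. It assumes a reasonably recent DMD/Phobos, and it writes the letters as `\u` escapes so that each one is a single precomposed code point.

```d
// Sketch: measuring a string in code units, code points and graphemes with Phobos.
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : codeLength, count, decode, toUTFindex;

void main()
{
    // "áéíóú" spelled with escapes so each letter is one precomposed code point
    string s = "\u00E1\u00E9\u00ED\u00F3\u00FA";

    assert(s.length == 10);               // .length counts UTF-8 code units (bytes)
    assert(count(s) == 5);                // std.utf.count counts code points
    assert(s.byGrapheme.walkLength == 5); // std.uni.byGrapheme yields grapheme clusters

    // Code units needed to encode a single code point
    assert(codeLength!char('\u00E1') == 2);
    assert(codeLength!wchar('\u00E1') == 1);
    assert(codeLength!char('\U0001F600') == 4);  // outside the BMP: 4 chars...
    assert(codeLength!wchar('\U0001F600') == 2); // ...or a surrogate pair of 2 wchars

    // Translate a code point index (the third letter) into a byte offset
    size_t i = toUTFindex(s, 2);
    assert(i == 4);
    assert(decode(s, i) == '\u00ED'); // decode also advances i past that code point
    assert(i == 6);

    // A combining mark adds a code point but does not start a new grapheme
    string e = "e\u0301"; // 'e' + COMBINING ACUTE ACCENT
    assert(count(e) == 2);
    assert(e.byGrapheme.walkLength == 1);

    writeln("all checks passed");
}
```

Note that `.length` on a D `string` counts UTF-8 code units, so even this five-letter accented string needs 10 `char`s of storage.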

aha, i think i might have miscommunicated here - i was talking 
about an error i thought i was having where a fixedstring holding 
`"áéíóú"` didn't compare equal to the same string literal, but as 
it turned out i was misreading the error message [i had been 
trying to assign a literal longer than the fixedstring's capacity]. 
to tell the truth, unicode awareness is... not something i really 
want to mess with right now, lol. it would be nice to have the 
option at some point in the future, though.

> I would rather suggest you support FixedString with types other 
> than `char`. (wchar, dchar, heck users could even use any 
> arbitrary type and use this as array class) For languages that 
> commonly use more than 1 byte per codepoint or for interop with 
> Win32 unicode APIs, JavaScript strings, C# strings, UTF16 files 
> in general, etc. programmers might opt to use FixedString with 
> wchar then.
>
> With D's templates that should be quite easy to do (add a 
> template parameter to the struct like `struct 
> FixedString(size_t maxSize, CharT = char)` and replace all 
> usage of char in your code with `CharT` in this case)
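Below is a hypothetical sketch of the suggested `struct FixedString(size_t maxSize, CharT = char)` signature. It assumes capacity is counted in code units of `CharT`. It is not the fixedstring package's actual code, since the v1.1.0 changes aren't reproduced here. The `set` and `length` members and the `isSomeChar` constraint are illustrative choices, and the constraint deliberately leaves out the "any arbitrary type" idea so the struct keeps string semantics.

```d
import std.traits : isSomeChar;

// Hypothetical sketch, not the fixedstring package's actual implementation.
struct FixedString(size_t maxSize, CharT = char)
if (isSomeChar!CharT)
{
    private CharT[maxSize] data; // storage, counted in code units of CharT
    private size_t len;          // number of code units currently in use

    this(const(CharT)[] source) @safe @nogc nothrow pure
    {
        set(source);
    }

    // Replace the contents; asserts if source needs more than maxSize code units.
    void set(const(CharT)[] source) @safe @nogc nothrow pure
    {
        assert(source.length <= maxSize, "string too long for this FixedString");
        data[0 .. source.length] = source[];
        len = source.length;
    }

    size_t length() const @safe @nogc nothrow pure
    {
        return len;
    }

    // Compare the stored code units against an ordinary slice of the same type.
    bool opEquals(const(CharT)[] rhs) const @safe @nogc nothrow pure
    {
        return data[0 .. len] == rhs;
    }
}

void main() @safe @nogc nothrow
{
    // As char, each accented letter takes 2 code units: 10 in total.
    auto narrow = FixedString!16("\u00E1\u00E9\u00ED\u00F3\u00FA");
    assert(narrow.length == 10);
    assert(narrow == "\u00E1\u00E9\u00ED\u00F3\u00FA");

    // As wchar, each letter fits in one code unit, so a capacity of 8 is enough.
    auto wide = FixedString!(8, wchar)("\u00E1\u00E9\u00ED\u00F3\u00FA"w);
    assert(wide.length == 5);
    assert(wide == "\u00E1\u00E9\u00ED\u00F3\u00FA"w);
}
```

Because capacity is measured in code units, the same five letters fit in a `FixedString!(8, wchar)` but would not fit in a `FixedString!8` of `char`, which needs 10.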


[i've pushed an update to the repo for this!](https://github.com/Moth-Tolias/fixedstring/releases/tag/v1.1.0) =] it was a bit more complicated than a simple replace-all, but not too hard.


More information about the Digitalmars-d-announce mailing list