Some questions about strings
Denis
noreply at noserver.lan
Mon Jun 22 03:43:58 UTC 2020
On Monday, 22 June 2020 at 03:24:37 UTC, Adam D. Ruppe wrote:
> On Monday, 22 June 2020 at 03:17:54 UTC, Denis wrote:
>> - First, is there any difference between string, wstring and
>> dstring?
>
> Yes, they encode the same content differently in the bytes. If
> you cast it to ubyte[] and print that out you can see the
> difference.
>
>> - Are the characters of a string stored in memory by their
>> Unicode codepoint(s), as opposed to some other encoding?
>
> no, they are encoded in utf-8, 16, or 32 for string, wstring,
> and dstring respectively.
>
>> - Can a series of codepoints, appropriately padded to the
>> required width, and terminated by a null character, be
>> directly assigned to a string WITHOUT GOING THROUGH A DECODING
>> / ENCODING TRANSLATION?
>
> no, they must be encoded. Unicode code points are an abstract
> concept that must be encoded somehow to exist in memory
> (similar to the idea of a number).
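Adam's suggestion to cast to ubyte[] can be sketched in D roughly as follows. This is a minimal illustration. It assumes the source file itself is saved as UTF-8, and the sample text "héllo" is chosen only because é takes two bytes in UTF-8 but one code unit in UTF-16 and UTF-32. The byte dumps of the wstring and dstring also depend on the machine's endianness.

```d
import std.stdio : writeln;

void main()
{
    // The same text stored as UTF-8, UTF-16 and UTF-32.
    string  s = "héllo";   // immutable(char)[]  : UTF-8 code units
    wstring w = "héllo"w;  // immutable(wchar)[] : UTF-16 code units
    dstring d = "héllo"d;  // immutable(dchar)[] : UTF-32 code units

    // .length counts code units, not characters: é is 2 bytes in UTF-8.
    writeln(s.length, " ", w.length, " ", d.length); // prints: 6 5 5

    // Reinterpreting each array as raw bytes shows the different encodings.
    writeln(cast(const(ubyte)[]) s);
    writeln(cast(const(ubyte)[]) w);
    writeln(cast(const(ubyte)[]) d);
}
```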
OK, then that actually simplifies what's needed, because I won't
need to decode the UTF-8, only validate it.
My code reads a UTF-8 encoded file into a buffer and validates
the UTF-8 encoding byte by byte, along with some additional
checks. If I simply return the UTF-8 encoded buffer as a string,
there won't be another decoding/encoding done -- correct?
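A minimal sketch of that approach, assuming the file has already been read into a ubyte[] buffer. The helper name bytesToString is hypothetical. Turning the buffer into a string is a plain cast, so the same bytes are reinterpreted in place rather than transcoded. std.utf.validate only walks the bytes to check them and produces no new string. If the existing byte-by-byte check already rejects malformed UTF-8, the validate call is redundant and the cast alone is enough. For the simple case, Phobos' std.file.readText reads a file and validates it with std.utf.validate in one step.

```d
import std.exception : assumeUnique;
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: treat a freshly read byte buffer as UTF-8 text.
/// Throws UTFException if the bytes are not well-formed UTF-8.
string bytesToString(ubyte[] buf)
{
    // Reinterpret the same memory as UTF-8 code units; nothing is copied.
    auto text = cast(char[]) buf;
    // Checks every sequence and throws on malformed input; returns nothing.
    validate(text);
    // Marks the data immutable without copying. This is only sound if
    // the caller does not modify buf afterwards.
    return assumeUnique(text);
}

void main()
{
    ubyte[] good = [0x68, 0xC3, 0xA9]; // "hé" encoded as UTF-8
    string s = bytesToString(good);    // good is not modified afterwards
    writeln(s); // prints: hé

    ubyte[] bad = [0x68, 0xC3]; // truncated two-byte sequence
    try
    {
        bytesToString(bad);
    }
    catch (UTFException)
    {
        writeln("invalid UTF-8");
    }
}
```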