Some questions about strings

Denis noreply at noserver.lan
Mon Jun 22 03:43:58 UTC 2020


On Monday, 22 June 2020 at 03:24:37 UTC, Adam D. Ruppe wrote:
> On Monday, 22 June 2020 at 03:17:54 UTC, Denis wrote:
>> - First, is there any difference between string, wstring and 
>> dstring?
>
> Yes, they encode the same content differently in the bytes. If 
> you cast it to ubyte[] and print that out you can see the 
> difference.
>
>> - Are the characters of a string stored in memory by their 
>> Unicode codepoint(s), as opposed to some other encoding?
>
> No, they are encoded in UTF-8, UTF-16, or UTF-32 for string, 
> wstring, and dstring respectively.
>
>> - Can a series of codepoints, appropriately padded to the 
>> required width, and terminated by a null character, be 
>> directly assigned to a string WITHOUT GOING THROUGH A DECODING 
>> / ENCODING TRANSLATION?
>
> no, they must be encoded. Unicode code points are an abstract 
> concept that must be encoded somehow to exist in memory 
> (similar to the idea of a number).
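
To make the quoted answers concrete, here is a minimal sketch (not from 
the original reply) that stores the single code point U+00E9 in each of 
the three string types and inspects the code units. It assumes 
std.string.representation as the way to view the underlying units; the 
variable names are only illustrative.

```d
import std.stdio : writeln;
import std.string : representation;

void main()
{
    // The same code point, U+00E9, in the three built-in string types.
    string  s = "\u00E9";   // UTF-8
    wstring w = "\u00E9"w;  // UTF-16
    dstring d = "\u00E9"d;  // UTF-32

    // .length counts code units, not code points.
    assert(s.length == 2);
    assert(w.length == 1);
    assert(d.length == 1);

    // representation() views the code units as unsigned integers.
    assert(s.representation == [0xC3, 0xA9]); // two UTF-8 bytes
    assert(w.representation == [0x00E9]);     // one 16-bit unit
    assert(d.representation == [0x000000E9]); // one 32-bit unit

    // Casting to bytes, as suggested in the quote, shows the size in memory.
    assert((cast(const(ubyte)[]) s).length == 2);
    assert((cast(const(ubyte)[]) w).length == 2);
    assert((cast(const(ubyte)[]) d).length == 4);

    writeln(s.representation);
}
```

The byte order seen after casting a wstring or dstring to ubyte[] depends 
on the machine's endianness, which is why the sketch only checks lengths 
there.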

OK, then that actually simplifies what's needed, because I won't 
need to decode the UTF-8, only validate it.

My code reads a UTF-8 encoded file into a buffer and checks the 
encoding byte by byte, along with some additional validation. If I 
simply return the buffer as a UTF-8 encoded string, no further 
decoding or encoding will take place -- correct?
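
A minimal sketch of what the question above describes, under some 
assumptions: the buffer is an immutable(ubyte)[], and std.utf.validate 
stands in for the custom byte-by-byte check. The helper name 
asValidatedString is hypothetical. The cast only reinterprets the bytes, 
so the returned string refers to the same memory as the buffer.

```d
import std.exception : assertThrown;
import std.utf : validate, UTFException;

/// Hypothetical helper: check that `buf` holds well-formed UTF-8 and
/// return it as a string without copying or re-encoding.
string asValidatedString(immutable(ubyte)[] buf)
{
    auto s = cast(string) buf; // reinterpret the same bytes; no copy
    validate(s);               // throws UTFException on malformed input
    return s;
}

void main()
{
    // "hi" followed by U+00E9 encoded as two UTF-8 bytes.
    immutable(ubyte)[] good = [0x68, 0x69, 0xC3, 0xA9];
    string s = asValidatedString(good);

    assert(s.length == good.length);                  // still 4 code units
    assert(s.ptr is cast(immutable(char)*) good.ptr); // same memory
    assert(s == "hi\u00E9");

    // A truncated two-byte sequence is rejected.
    immutable(ubyte)[] bad = [0x68, 0xC3];
    assertThrown!UTFException(asValidatedString(bad));
}
```

validate decodes internally in order to check the input, but it does not 
build a new string. Decoding happens later only where code asks for it, 
for example when iterating with foreach over dchar. For the plain 
read-and-validate case, std.file.readText is documented as doing both 
steps.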

