How to print unicode characters (no library)?

max haughton maxhaton at gmail.com
Sun Dec 26 23:45:25 UTC 2021


On Sunday, 26 December 2021 at 21:22:42 UTC, Adam Ruppe wrote:
> On Sunday, 26 December 2021 at 20:50:39 UTC, rempas wrote:
>> [...]
>
> write just transfers a sequence of bytes. It doesn't know nor 
> care what they represent - that's for the receiving end to 
> figure out.
>
>> [...]
>
> You are mistaken. There are several exceptions: UTF-16 code 
> units can come in surrogate pairs, and even UTF-32 has multiple 
> "characters" that combine into one thing on screen.
>
> I prefer to think of a string as a little virtual machine that 
> can be run to produce output rather than actually being 
> "characters". Even with plain ascii, consider the backspace 
> "character" - it is more an instruction to go back than it is a 
> thing that is displayed on its own.
>
>> [...]
>
> This is because the *receiving program* treats them as UTF-8 
> and runs them accordingly. Not all terminals will necessarily do 
> this, and programs you pipe to can do it very differently.
>
>> [...]
>
> The [w|d|]string.length property returns the number of elements 
> in there, which is bytes for string, 16-bit elements for 
> wstring (so bytes / 2), or 32-bit elements for dstring (so 
> bytes / 4).
>
> This is not necessarily related to the number of characters 
> displayed.
>
>> [...]
>
> Yes, it just passes bytes through. It doesn't know they are 
> supposed to be characters...

I think that mental model is actually pretty good. A more precise 
one may exist, but the virtual machine concept does tell the new 
programmer to expect dragons, or at least that the days of plain 
ASCII are long gone (if they ever existed: see backspace, as you 
say).
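
To make the `.length` point concrete, here is a minimal sketch. It 
assumes Phobos is acceptable for illustration (the original question 
was about doing without a library), and `graphemeCount` is a 
hypothetical helper name made up for this example. It shows the same 
text reporting different lengths per encoding, none of which has to 
match what ends up on screen.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helper (not from the thread): counts user-perceived
// characters (grapheme clusters) instead of code units.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points
    // that a terminal would normally draw as a single "é".
    string  s = "e\u0301";
    wstring w = "e\u0301"w;
    dstring d = "e\u0301"d;

    writeln(s.length);         // 3: UTF-8 bytes (1 for 'e', 2 for U+0301)
    writeln(w.length);         // 2: UTF-16 code units
    writeln(d.length);         // 2: UTF-32 code units, still not 1
    writeln(graphemeCount(s)); // 1: what the reader sees

    // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
    // needs a surrogate pair for it.
    wstring face = "\U0001F600"w;
    writeln(face.length);      // 2: one code point, two 16-bit units
}
```

Even the grapheme count only says how the text is segmented; what a 
given terminal actually draws is still up to the receiving end.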

