How to print unicode characters (no library)?
Adam Ruppe
destructionator at gmail.com
Sun Dec 26 21:22:42 UTC 2021
On Sunday, 26 December 2021 at 20:50:39 UTC, rempas wrote:
> I want to do this without using any library by using the
> "write" system call directly with 64-bit Linux.
write just transfers a sequence of bytes. It neither knows nor
cares what they represent - that's for the receiving end to
figure out.
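A minimal sketch in D of what that looks like. It assumes the
druntime binding core.sys.posix.unistd.write, which only declares
the libc wrapper around the system call (it is not a raw syscall
instruction). writeBytes is a made-up helper name for
illustration.

```d
import core.sys.posix.unistd : write;

// Hypothetical helper: hands the raw bytes of any array to
// file descriptor 1 (stdout). Returns the number of bytes the
// kernel accepted, or -1 on error.
ptrdiff_t writeBytes(const(void)[] data)
{
    return write(1, data.ptr, data.length);
}

void main()
{
    string  utf8  = "h\u00e9\n";  // 4 bytes: the accented e takes two
    wstring utf16 = "h\u00e9\n"w; // 6 bytes: three 16-bit units

    writeBytes(utf8);  // a UTF-8 terminal shows the accented e
    writeBytes(utf16); // same text, different bytes; a UTF-8
                       // terminal will not decode these as intended
}
```

Both calls go through the same code path. write never looks at
the encoding, so whether the output is readable depends entirely
on what the receiving end expects.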
> know (and tell me if I'm mistaken), UTF-16 and UTF-32 have
> fixed size lengths for their characters.
You are mistaken. There are several exceptions: UTF-16 encodes
anything above U+FFFF as a pair of 16-bit units (a surrogate
pair), and even UTF-32 has multiple "characters" that combine
into one thing on screen.
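To make both exceptions concrete, here is a small sketch in D.
The two sample strings (U+1F600, and 'e' followed by the
combining acute accent U+0301) are picked for illustration and
are not from the original question.

```d
// U+1F600 lies above U+FFFF, so UTF-16 needs a surrogate pair:
// two 16-bit code units for one code point.
immutable wstring face = "\U0001F600"w;

// 'e' plus U+0301 (combining acute accent) is two code points
// even in UTF-32, yet it is typically drawn as one accented e.
immutable dstring accented = "e\u0301"d;

void main()
{
    assert(face.length == 2);
    assert(face[0] == 0xD83D && face[1] == 0xDE00);

    assert(accented.length == 2);
    assert(accented[0] == 'e' && accented[1] == 0x0301);
}
```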
I prefer to think of a string as a little virtual machine that
can be run to produce output rather than actually being
"characters". Even with plain ascii, consider the backspace
"character" - it is more an instruction to go back than it is a
thing that is displayed on its own.
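The "virtual machine" view can be sketched as a toy interpreter
in D. This is a deliberately simplified model, assuming ASCII
input, a single line, and a terminal where backspace only moves
the cursor left; render is a made-up name, and real terminals
handle far more instructions than this.

```d
// Toy model: "runs" a string against a one-line screen buffer.
string render(string input)
{
    char[] screen;
    size_t cursor = 0;

    foreach (char c; input)
    {
        if (c == '\b')
        {
            // Backspace is an instruction: move left, draw nothing.
            if (cursor > 0)
                cursor--;
        }
        else
        {
            // Anything else is drawn at the cursor, overwriting
            // whatever was already there.
            if (cursor < screen.length)
                screen[cursor] = c;
            else
                screen ~= c;
            cursor++;
        }
    }
    return screen.idup;
}

void main()
{
    assert(render("abc\bd") == "abd"); // 6 bytes in, 3 cells shown
    assert(render("abc\b") == "abc");  // cursor moved, nothing erased
}
```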
> Now the UTF-8 string will report 11 characters and print them
> normally.
This is because the *receiving program* treats the bytes as UTF-8
and runs them accordingly. Not all terminals will necessarily do
this, and programs you pipe to can do it very differently.
> Now what about the other two? I was expecting UTF-16 to report
> 16 characters and UTF-32 to report 32 characters.
The .length property of string, wstring, and dstring returns the
number of elements in the array, which is bytes for string, 16-bit
elements for wstring (so bytes / 2), or 32-bit elements for
dstring (so bytes / 4).
This is not necessarily related to the number of characters
displayed.
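A short sketch in D of how the three lengths relate to the byte
counts. The sample text (hello with an accented e) and the
byteLength helper are made up for illustration.

```d
// Hypothetical helper: size of an array's contents in bytes.
size_t byteLength(T)(T[] a)
{
    return a.length * T.sizeof;
}

void main()
{
    string  s = "h\u00e9llo";  // UTF-8
    wstring w = "h\u00e9llo"w; // UTF-16
    dstring d = "h\u00e9llo"d; // UTF-32

    // .length counts array elements (code units), not characters.
    assert(s.length == 6); // the accented e takes two bytes
    assert(w.length == 5);
    assert(d.length == 5);

    // Byte counts differ again: element count times element size.
    assert(byteLength(s) == 6);
    assert(byteLength(w) == 10);
    assert(byteLength(d) == 20);
}
```

All three hold the same five visible characters, yet they report
6, 5, and 5 elements and occupy 6, 10, and 20 bytes.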
> Isn't the "write" system call just writing a sequence of
> characters without caring which they are?
Yes, it just passes bytes through. It doesn't know they are
supposed to be characters...