How to print unicode characters (no library)?

Tue Dec 28 06:46:57 UTC 2021

On 27.12.21 15:23, Adam D Ruppe wrote:
> Let's look at:
> 
> "Hello 😂\n";
[...]
> Finally, there's "string", which is utf-8, meaning each element is 8 
> bits, but again, there is a buffer you need to build up to get the code 
> points you feed into that VM.
[...]
> H, e, l, l, o, <space>, <next point is combined by these bits PLUS THREE 
> MORE elements>, <this is a work-in-progress element and needs two more>, 
> <this is a work-in-progress element and needs one more>, <this is the 
> final work-in-progress element>, <new line>
[...]
> Notice how each element here told you how many elements are left. This 
> is encoded into the bit pattern and is part of why it took 4 elements 
> instead of just three; there's some error-checking redundancy in there. 
> This is a nice part of the design allowing you to validate a utf-8 
> stream more reliably and even recover if you jumped somewhere in the 
> middle of a multi-byte sequence.

It's actually just the first byte that tells you how many bytes are in the 
sequence. The continuation bytes all share the same 10xxxxxx bit pattern, 
so they carry no redundant information about the length.
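To make that concrete, here is a minimal sketch (in Python, chosen only for brevity; utf8_seq_len is a hypothetical helper, not a library function) that reads the sequence length from a single byte. On the four bytes of 😂 (U+1F602), only the first byte yields a count, and the three continuation bytes look alike apart from their payload bits. U+1F602 needs four bytes because it lies above U+FFFF, the largest value the 4 + 6 + 6 = 16 payload bits of a three-byte sequence can hold. A four-byte sequence carries 3 + 6 + 6 + 6 = 21.

```python
def utf8_seq_len(lead: int) -> int:
    """Sequence length claimed by a UTF-8 lead byte, or 0 if `lead`
    cannot start a sequence (a 10xxxxxx continuation byte, or 0xF8..0xFF).

    Sketch only: it does not reject the never-valid leads 0xC0, 0xC1
    and 0xF5..0xF7, which a full validator must also refuse.
    """
    if lead < 0x80:
        return 1                 # 0xxxxxxx: ASCII
    if (lead & 0xE0) == 0xC0:
        return 2                 # 110xxxxx
    if (lead & 0xF0) == 0xE0:
        return 3                 # 1110xxxx
    if (lead & 0xF8) == 0xF0:
        return 4                 # 11110xxx
    return 0                     # 10xxxxxx (continuation) or 11111xxx


emoji = "😂".encode("utf-8")     # b'\xf0\x9f\x98\x82'
print([f"{b:08b}" for b in emoji])       # ['11110000', '10011111', '10011000', '10000010']
print([utf8_seq_len(b) for b in emoji])  # [4, 0, 0, 0]
```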

To recover from the middle of a sequence, you just skip the orphaned 
continuation bytes (10xxxxxx) one at a time until you reach a byte that 
starts a new sequence.
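The recovery step can be sketched the same way (again Python for brevity; resync and is_continuation are hypothetical names, not library functions). Starting from an arbitrary index, it skips bytes of the form 10xxxxxx until it reaches one that begins a sequence.

```python
def is_continuation(b: int) -> bool:
    return (b & 0xC0) == 0x80    # top two bits are 10: 10xxxxxx


def resync(buf: bytes, pos: int) -> int:
    """Index of the first byte at or after `pos` that is not a
    continuation byte, i.e. where the next sequence starts (or len(buf))."""
    while pos < len(buf) and is_continuation(buf[pos]):
        pos += 1
    return pos


data = "Hello 😂\n".encode("utf-8")
# The emoji occupies indices 6..9; start at index 7, mid-sequence.
start = resync(data, 7)
print(start, data[start:])       # 10 b'\n'
```

The partial character is simply dropped here; a real decoder would usually report it, for example by emitting U+FFFD REPLACEMENT CHARACTER for the skipped bytes.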