extended characterset output

anonymous anon at ymous.org
Fri Apr 8 09:51:23 UTC 2022


On Friday, 8 April 2022 at 08:36:33 UTC, Ali Çehreli wrote:
> On 4/7/22 23:13, anonymous wrote:
> > What's the proper way to output all characters in the extended
> > character set?
>
> It is not easy to answer because there are a number of concepts 
> here that may make it trivial or complicated.
>
> The configuration of the output device matters. Is it set to 
> Windows-1252 or are you using Unicode strings in Python?

I'm running Ubuntu and my default language is en_US.UTF-8.

> >
> > ```d
> > void main()
> > {
> >      foreach(char c; 0 .. 256)
>
> 'char' is wrong there because 'char' has a very special meaning 
> in D: A UTF-8 code unit. Not a full Unicode character in many 
> cases, especially in the "extended" set.
>
> I think your problem will be solved simply by replacing 'char' 
> with 'dchar' there:
>
>   foreach (dchar c; ...

I tried that. It didn't work.

> However, isControl() below won't work because isControl() only 
> knows about the ASCII table. It would miss the unprintable 
> characters above 127.
>
> >      {
> >         write(isControl(c) ? '.' : c);
> >      }
> > }
> > ```

Oh okay, that may have been the reason.

> This works:
>
> import std.stdio;
>
> bool isPrintableLatin1(dchar value) {
>   if (value < 32) {
>     return false;
>   }
>
>   if (value > 126 && value < 161) {
>     return false;
>   }
>
>   return true;
> }
>
> void main() {
>   foreach (dchar c; 0 .. 256) {
>     write(isPrintableLatin1(c) ? c : '.');
>   }
> }

Nope... running this code, I get a bunch of digits as the output. 
The dots don't even show up. Maybe I'm drunk or lacking sleep.

Weird. I had a strange feeling that this problem stemmed from 
the compiler I'm using (GDC), so I installed DMD. Would you 
believe everything worked fine afterwards? That includes the 
original version where I used isControl and 'dchar' instead of 
'char'. I wonder why that is?
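
For reference, here is a minimal sketch of a variant that keeps every 
value in the conditional an explicit 'dchar' and uses std.uni.isControl, 
which (unlike std.ascii.isControl) also treats U+0080 to U+009F as 
control characters. The helper name latin1Table is made up for 
illustration, and this has not been tried against the GDC version in 
question, so it may or may not change what that compiler prints.

```d
import std.stdio : writeln;
import std.uni : isControl; // Unicode-aware: also flags U+0080 .. U+009F

// Hypothetical helper (not from the thread): builds the table as a UTF-8 string.
string latin1Table()
{
    enum dchar dot = '.'; // typed as dchar so both branches below agree
    string result;
    foreach (uint n; 0 .. 256)
    {
        // Treat the loop counter as a code point, not as a number.
        immutable dchar c = cast(dchar) n;
        immutable dchar shown = isControl(c) ? dot : c;
        result ~= shown; // appending a dchar to a string encodes it as UTF-8
    }
    return result;
}

void main()
{
    writeln(latin1Table());
}
```

Building the string first and writing it once means the output no longer 
depends on which overload of write() is picked for the conditional 
expression.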

Thanks Ali.

