how to localize console and GUI apps in Windows

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Dec 29 18:13:04 UTC 2017


On Fri, Dec 29, 2017 at 10:35:53AM +0000, Andrei via Digitalmars-d-learn wrote:
> On Thursday, 28 December 2017 at 18:45:39 UTC, H. S. Teoh wrote:
> > On Thu, Dec 28, 2017 at 05:56:32PM +0000, Andrei via Digitalmars-d-learn
> > wrote:
> > ...
> > The string / wstring / dstring types in D are intended to be Unicode
> > strings.  If you need to use other encodings, you really should be
> > using ubyte[] or const(ubyte)[] or immutable(ubyte)[], instead of
> > string.
> 
> Thank you Teoh for advice and good example! I was looking towards
> writing something like that if no decision exists. Still this way of
> deliberate translations seems to be not the best. It supposes explicit
> workaround for every ahchoo in Russian and steady converting ubyte[]
> to string and back around. No formatting gems, no simple and elegant
> I/O statements or string/char comparisons. This may be endurable if
> you write an application where Russian is only one of rare options,
> and what if your whole environment is totally Russian?

You mean if your environment uses a non-UTF encoding?  If your
environment uses UTF, there is no problem.  I have code with strings in
Russian (and other languages) embedded, and it's no problem because
everything is in Unicode, all input and all output.

But I understand that in Windows you may not have this luxury. So you
have to deal with codepages and what-not.

Converting back and forth is not a big problem, and it actually also
solves the problem of string comparisons, because std.uni provides
utilities for comparing strings (case-insensitive comparison,
normalization, etc.). But these only work for Unicode, so you have to
convert to Unicode internally anyway.  Also, for static strings, it's
not hard to make the codepage mapping functions CTFE-able, so you can
actually write string literals in a codepage and have the compiler
automatically convert them to UTF-8.
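
To make the CTFE idea concrete, here is a rough sketch. It assumes
CP866 (the usual Russian console codepage on Windows), and the function
name fromCP866 is made up for this example. Only ASCII and the Cyrillic
letters are mapped; a real converter would need a full 128-entry table
for the box-drawing and other characters.

```d
import std.utf : encode;

/// Sketch: decode CP866 bytes into a UTF-8 string. Only ASCII and the
/// Cyrillic letters are mapped; any other byte becomes U+FFFD.
string fromCP866(const(ubyte)[] data)
{
    string result;
    foreach (b; data)
    {
        dchar c;
        if (b < 0x80)
            c = cast(dchar) b;                      // ASCII passes through
        else if (b <= 0xAF)
            c = cast(dchar)(0x0410 + (b - 0x80));   // А..Я, а..п
        else if (b >= 0xE0 && b <= 0xEF)
            c = cast(dchar)(0x0440 + (b - 0xE0));   // р..я
        else if (b == 0xF0)
            c = '\u0401';                           // Ё
        else if (b == 0xF1)
            c = '\u0451';                           // ё
        else
            c = '\uFFFD';                           // unmapped in this sketch

        char[4] buf;
        immutable len = encode(buf, c);             // UTF-8 encode one code point
        result ~= buf[0 .. len];
    }
    return result;
}

// Evaluated by the compiler (CTFE) because the result initializes an enum.
enum string greeting = fromCP866([0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2]);
static assert(greeting == "Привет");
```

The same function works at runtime on bytes read from the console, so
there is only one mapping to maintain.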

The other approach, if you don't like the idea of converting codepages
all the time, is to explicitly work in ubyte[] for all strings. Or,
preferably, create your own string type with ubyte[] representation
underneath, and implement your own comparison functions, etc., then use
this type for all strings. Better yet, contribute this to code.dlang.org
so that others who have the same problem can reuse your code instead of
needing to write their own.
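
A minimal sketch of what such a type might look like, assuming a single
fixed codepage per program. The name CpString and its layout are purely
illustrative, not an existing library. Note that byte-wise ordering only
matches alphabetical order where the codepage happens to store letters
in that order (in CP866, for instance, Ё sorts after я).

```d
import std.algorithm.comparison : cmp;

/// Hypothetical wrapper around text stored in one fixed codepage.
struct CpString
{
    immutable(ubyte)[] bytes;   // raw, undecoded codepage bytes

    /// Byte-wise ordering. Equality (==) uses the default member-wise
    /// comparison, which compares the array contents.
    int opCmp(const CpString rhs) const
    {
        return cmp(bytes, rhs.bytes);
    }

    /// Number of characters; equals the byte count in a single-byte codepage.
    size_t length() const
    {
        return bytes.length;
    }
}

unittest
{
    auto a = CpString([0x80, 0x81]);    // "АБ" in CP866
    auto b = CpString([0x80, 0x82]);    // "АВ" in CP866
    assert(a < b);
    assert(a == CpString([0x80, 0x81]));
    assert(a.length == 2);
}
```

Formatting and I/O would still need conversion functions at the
boundaries, but comparisons and slicing can stay in the codepage.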

[...]
> p.s. I’ve found that I may set “Consolas” font for a console window
> and then you can output properly localized UTF8 strings without any
> special code in D script managing code pages. Still this does not
> decide localized input problem: any localized input throws an
> exception “std.utf.UTFException...  Invalid UTF-8 sequence”.

Is the exception thrown in readln() or in writeln()? If it's in
writeln(), it shouldn't be a big deal: you just have to pass the data
returned by readln() through fromKOI8 (or the equivalent function for
whatever codepage you're using) before printing it.

If the problem is in readln(), then you probably need to read the input
in binary (i.e., as ubyte[]) and convert it manually. Unfortunately,
there's no other way around this if you're forced to use codepages. The
ideal situation is if you can just use Unicode throughout your
environment. But of course, sometimes you have no choice.
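
For the binary-input route, something along these lines might do. It is
only a sketch: readRawLine is a name invented here, and it reads one
byte at a time through File.rawRead, which is simple rather than fast.

```d
import std.stdio : File;

/// Sketch: read one line as raw bytes, with no UTF-8 decoding.
/// The line terminator ("\n" or "\r\n") is stripped. An empty result
/// means either an empty line or end of input.
ubyte[] readRawLine(File input)
{
    ubyte[] line;
    ubyte[1] one;
    // rawRead returns the slice it actually filled; empty at end of input.
    while (input.rawRead(one[]).length == 1)
    {
        if (one[0] == '\n')
            break;
        line ~= one[0];
    }
    if (line.length > 0 && line[$ - 1] == '\r')
        line = line[0 .. $ - 1];
    return line;
}
```

You would then call readRawLine(stdin) and hand the bytes to your
codepage conversion function to get a proper UTF-8 string before doing
anything else with it.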


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.

