std.locale
Michel Fortin
michel.fortin at michelf.com
Mon Mar 2 19:13:55 PST 2009
On 2009-03-02 10:02:10 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> said:
> Michel Fortin wrote:
>> I think there are three aspects to localization. One is date and number
>> formatting. Another is offering a facility for translating all the
>> messages an application can give. And the last one is the configuration
>> part, where you know which format to use.
>
> Sounds like a good start.
>
>> The only problem I've seen addressed by you right now is the
>> configuration part; I believe it's the wrong end to start with.
>>
>> We should start by defining how to perform the tasks I enumerated
>> above: translating date and number formats, selecting strings for a
>> given language. After that we can figure out how to pass the proper
>> default configuration around. And then you're done.
>>
>> For date and number formatting, I like very much the NSDateFormatter
>> and NSNumberFormatter approach in Cocoa for instance: you have a base
>> class to format dates, another for numbers; you can easily create your
>> own subclass if you want, and there's a way to get the default
>> formatter instance.
>
> Well I was thinking of passing the buck around. Instead of std.locale
> defining a hierarchy for formatting numbers and dates, it provides a
> means for user code to plant a routine in the locale object that knows
> how to format numbers and dates. Of course, with time default localized
> routine implementations will show up (hopefully contributed to by
> people), but the basic mechanism is simple - there exists a locale
> table that allows you to store a delegate in it.
Looks somewhat like what I proposed. But the point I was trying to make
is that you don't need to group all of these into one big object called
a "locale".
Instead of seeing a locale as a central object for localizing every
kind of data, I'm suggesting that we have different kinds of formatters
capable of localizing different kinds of data. Each formatter would
have its own definition of a locale that suits its needs. All you need
is a standardized naming scheme for locales compatible between
formatters, but that we have.
Note that while I've proposed that formatters be classes, I have no
problem in them being structs which could be accepted in template
functions.
What's good about a class, or a struct, is that it can group a bunch
of related functions. For instance, you could have a number formatter
help you display the right string, read a formatted string, and
validate a formatted string. And you could configure the formatter for
a fixed number of decimals, specific rounding behaviour, negative
format, etc.
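A minimal sketch of such a formatter as a struct, to make the idea concrete. Every name here (NumberFormatter, toText, parse, isValid, priceLabel) is hypothetical and not part of Phobos; it only illustrates grouping display, reading, and validation behind a few configuration fields, and passing the struct to a template function.

```d
import std.array : replace;
import std.conv : ConvException, to;
import std.format : format;

/// Hypothetical formatter; none of these names come from Phobos.
struct NumberFormatter
{
    string decimalSeparator = ".";      // "," for many European locales
    int decimals = 2;                   // fixed number of decimals
    bool parenthesizeNegatives = false; // accounting-style negative format

    /// Display: render a value using the configured options.
    string toText(double value)
    {
        immutable negative = value < 0;
        string text = format("%.*f", decimals, negative ? -value : value)
            .replace(".", decimalSeparator);
        if (!negative)
            return text;
        return parenthesizeNegatives ? "(" ~ text ~ ")" : "-" ~ text;
    }

    /// Read: parse text written with the configured separator.
    /// Only the plain "-1,5" negative form is handled in this sketch.
    double parse(string text)
    {
        return to!double(text.replace(decimalSeparator, "."));
    }

    /// Validate: true if parse() would succeed.
    bool isValid(string text)
    {
        try { parse(text); return true; }
        catch (ConvException) { return false; }
    }
}

/// Any type with a toText(double) method is accepted here, class or struct.
string priceLabel(F)(F formatter, double amount)
{
    return "Total: " ~ formatter.toText(amount);
}
```

For instance, NumberFormatter(",", 1, true) would display, read, and validate numbers with a comma separator and one decimal, while priceLabel works with any formatter type exposing toText.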
>> This is extensible, because if you wanted to go further, you could add
>> formatter classes for various units (length, mass...), or anything else.
>
> This I want to avoid, at least for the time being. I want to define a
> table that can contain strings, integers, delegates, and other
> sub-tables. This is it. The path to extensibility will not be Phobos
> defining new classes to format various things. This could go on
> forever. Phobos will use the table consistently, and users who do want
> to format various things will simply plant their delegates in the table.
Well, when I said "you", I really meant anyone, and not necessarily
inside Phobos. That was just to point out that the design is
extensible. Sorry, that was confusing.
>> Translating strings is a little harder because 1) strings are
>> application-defined, 2) strings are often not available in the user's
>> preferred language, adding the need for a fallback mechanism, and 3)
>> different applications will want to store those strings in different
>> ways. Perhaps we could define a base class for getting translated
>> strings, then allow the program to use whatever subclass it wants.
>
> There's no need for classes and subclasses. It's all data. Why should
> we replace data with code? Data is easier.
>
> Consider some code in phobos that must throw an exception:
>
> throw Exception("File `%s' not found, system error is %s.",
> filename, errnomsg);
>
> The localized version will look like this:
>
> auto format = "File `%s' not found, system error is %s.";
> auto localFormat = currentLocale ? currentLocale.peek(format) : null;
> if (!localFormat) localFormat = format;
> throw Exception(localFormat, filename, errnomsg);
>
> What happens is that the default format string _is_ the key for looking
> up the localized strings. If there's no value for that string, the
> default format string is in vigor. Note that on the default path,
> currentLocale is null so there is hardly any inefficiency.
Firstly, while you and I both agree that it's good that the key for
searching a localized string be a readable message, not everyone does.
It often doesn't work well when you want to translate small words
having an overloaded meaning in English for instance.
Secondly, always falling back to English (or the developer's locale)
when the currentLocale is not available isn't flexible enough. On Mac
OS X for instance, you can select a number of languages for
applications to use in order of preference. When the first isn't
available, it looks for the second (skipping some details).
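The ordered fallback could be sketched roughly as follows. The Table alias, the lookup function, and the in-memory layout are assumptions made for illustration only; this is not how Mac OS X implements it, just the general idea of walking a preference list before giving up and returning the key.

```d
/// Hypothetical layout: one message table per language code.
alias Table = string[string];

/// Walk the user's languages in order of preference and return the first
/// translation found; fall back to the key itself as a last resort.
string lookup(Table[string] tables, string[] preferred, string key)
{
    foreach (lang; preferred)
    {
        if (auto table = lang in tables)
            if (auto text = key in *table)
                return *text;
    }
    return key;
}
```

With a preference list of ["fr", "de"], a string missing from the French table is taken from the German one before the developer's original text is used.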
Thirdly, I hope you don't expect everyone to write the above each time.
We should provide a nice function to do the localization, say
"localize"? This function should really be an overridable delegate.
    auto format = "File `%s' not found, system error is %s.";
    throw Exception(localize(format), filename, errnomsg);
Fourthly, various libraries are likely to provide their own translation
tables (perhaps even in various formats). Unless you merge them all
(risking some clashes), you may want a second argument for specifying
the translation table to use.
    auto format = "File `%s' not found, system error is %s.";
    throw Exception(localize(format, PHOBOS), filename, errnomsg);
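One possible shape for such a function, as a sketch only. The Domain enum (standing in for the PHOBOS argument), the Localizer alias, the module-level localizer variable, and fileNotFound are all hypothetical names; nothing like this exists in Phobos. The point is that localize is a thin wrapper around a delegate the application can replace, and that the table is selected by a second argument.

```d
import std.format : format;

/// Hypothetical identifiers for translation tables.
enum Domain { application, phobos }

/// Signature of the overridable lookup routine.
alias Localizer = string delegate(string key, Domain domain);

/// Thread-local hook; null means "no localization installed".
Localizer localizer;

/// Returns the translated string, or the key itself when no hook is set.
string localize(string key, Domain domain = Domain.application)
{
    return localizer is null ? key : localizer(key, domain);
}

/// Builds the exception from the example with a localized message.
Exception fileNotFound(string filename, string errnomsg)
{
    auto fmt = "File `%s' not found, system error is %s.";
    return new Exception(format(localize(fmt, Domain.phobos), filename, errnomsg));
}
```

An application would install its own lookup by assigning a delegate to localizer once at startup; code that never does so pays only for a null check.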
Finally, no current library addresses this, but it'd be great if there
was a way to correctly manage plurals in all languages. Perhaps making a
word parameterizable depending on a number...
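A rough sketch of what parameterizing a message on a number might look like. PluralMessage, PluralRule, and the rule functions are hypothetical names; the rules follow the commonly cited conventions for English (singular only for 1) and Polish (three forms), and each language supplies its own list of forms plus a function that picks one.

```d
import std.format : format;

/// Maps a count to an index into the list of forms for one language.
alias PluralRule = size_t function(long n);

size_t englishRule(long n) { return n == 1 ? 0 : 1; }

size_t polishRule(long n)
{
    if (n == 1)
        return 0;
    immutable lastDigit = n % 10, lastTwo = n % 100;
    if (lastDigit >= 2 && lastDigit <= 4 && (lastTwo < 10 || lastTwo >= 20))
        return 1;
    return 2;
}

/// Hypothetical message whose wording depends on a number.
struct PluralMessage
{
    string[] forms;   // one format string per plural form
    PluralRule rule;  // selects the form for a given count

    string select(long n)
    {
        return format(forms[rule(n)], n);
    }
}
```

A Polish table entry would then carry three forms, for example PluralMessage(["%s plik", "%s pliki", "%s plików"], &polishRule), while an English one carries two.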
>> Notice how I'm not using the word "locale" to talk about these things.
>> "Locale" is a concept too abstract to be able to do something good with
>> it. Since you could only define it using Algebraic type and a loosely
>> defined tree of strings, that seems to confirm my view. Call the module
>> std.locale if you want, but keep in mind that the most important task
>> at hand is facilitating localization, not defining what constitutes a
>> locale, that can wait.
>
> How should I call it?
My point was that there shouldn't be a class/struct/thing representing
a locale. Having a collection of formatters, each knowing where to get
its locale information (when given a locale name), would work better
in my opinion.
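To illustrate, here is a sketch with two independent formatters that share nothing but the locale naming scheme. SimpleDateFormatter, SimpleNumberFormatter, and forLocale are hypothetical names, and the hard-coded switch statements stand in for whatever data source each formatter would really consult.

```d
import std.array : replace;
import std.format : format;

/// Knows only what it needs about a locale: the day/month order.
struct SimpleDateFormatter
{
    bool dayFirst;

    static SimpleDateFormatter forLocale(string name)
    {
        switch (name)
        {
            case "fr_FR": return SimpleDateFormatter(true);
            default:      return SimpleDateFormatter(false); // en_US style
        }
    }

    string toText(int year, int month, int day)
    {
        return dayFirst ? format("%02d/%02d/%04d", day, month, year)
                        : format("%02d/%02d/%04d", month, day, year);
    }
}

/// Knows only what it needs about a locale: the decimal separator.
struct SimpleNumberFormatter
{
    string separator;

    static SimpleNumberFormatter forLocale(string name)
    {
        switch (name)
        {
            case "fr_FR": return SimpleNumberFormatter(",");
            default:      return SimpleNumberFormatter(".");
        }
    }

    string toText(double value)
    {
        return format("%.2f", value).replace(".", separator);
    }
}
```

Neither struct refers to the other or to a shared locale object; the string "fr_FR" is the only thing they have in common.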
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/