Rename std.ctype to std.ascii?
Jonathan M Davis
jmdavisProg at gmx.com
Tue Jun 14 03:10:57 PDT 2011
On 2011-06-14 02:51, David Nadlinger wrote:
> On 6/14/11 11:20 AM, Jonathan M Davis wrote:
> > On 2011-06-14 01:51, David Nadlinger wrote:
> >> But the functions in <ctype.h> do. And there can be some
> >> locale-dependent problems even if you use only ASCII, the most prominent
> >> being the different handling of »i« in the Turkish locale:
> >> http://www.i18nguy.com/unicode/turkish-i18n.html
> >>
> >> This is probably another reason why it shouldn't be called std.ctype…
> >>
> > From the looks of it, that affects extended ASCII but not ASCII (since
> > the Turkish uppercase I isn't even in ASCII). It's definitely a great
> > link though. Thanks!
>
> Oh, I was probably a bit unclear – what I meant is that it affects you
> also if you use only ASCII input, since toupper('i') == 221 when your
> locale is tr_TR.ISO-8859-9.

Yes, but the result is extended ASCII, so it doesn't affect anything which
only deals with pure ASCII. ctype.h deals with extended ASCII, so locales
actually affect what it's doing. std.ctype only deals in pure ASCII, so it
wouldn't do anything which would result in a non-ASCII character, and so
locales shouldn't matter at all. However, if you _do_ want to bring locales
into it, then a locale like tr_TR.ISO-8859-9 is not going to be able to
operate purely in ASCII, since the uppercase value of i is 221, which is
extended ASCII.

So, yes, I understood. It's just that as far as I can tell, locales don't
matter if you're completely restricting yourself to ASCII like std.ctype does.
And std.ctype is not going to try to deal with locales at this point (and
likely not ever). I think that that is far better left to Unicode. The Turkish
locale is a great example of why you _want_ to be dealing with Unicode when
dealing with locales. std.ctype is for when you're specifically restricting
yourself to ASCII (which sometimes can be very useful - e.g. with formatting
strings or regex strings where all of the special characters are ASCII; using
Unicode functions would just make them slower for no benefit and would risk
changing behavior based on locale if you brought locales into it). If you're
not restricting yourself to ASCII, then std.uni is the way to go.
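
To make the distinction concrete, here is a minimal sketch in C (since
ctype.h is where the locale-dependent behavior lives). It is not std.ctype's
actual code. The names ascii_toupper, is_ascii, locale_upper_i and ascii_demo
are hypothetical and made up for illustration. The sketch assumes an
ASCII-compatible execution character set.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper (not a libc or Phobos function): uppercases only the
   ASCII letters a-z and returns every other value unchanged. It never
   consults the locale. */
int ascii_toupper(int c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Hypothetical helper: true only for values in the 7-bit ASCII range. */
int is_ascii(int c)
{
    return c >= 0 && c <= 0x7F;
}

/* Hypothetical helper: returns what ctype.h's toupper('i') yields under the
   named locale, or -1 if that locale is not installed. The previous
   LC_CTYPE locale is restored before returning. */
int locale_upper_i(const char *name)
{
    char saved[128] = "C";
    const char *current = setlocale(LC_CTYPE, NULL);
    int result;

    /* Copy the current locale name, since later setlocale calls may
       overwrite the returned string. */
    if (current != NULL && strlen(current) < sizeof saved)
        strcpy(saved, current);

    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;

    result = toupper((unsigned char)'i');
    setlocale(LC_CTYPE, saved);
    return result;
}

/* ASCII input always gives ASCII output; non-ASCII values pass through. */
void ascii_demo(void)
{
    assert(ascii_toupper('i') == 'I');
    assert(is_ascii(ascii_toupper('i')));
    assert(ascii_toupper(221) == 221);
}
```

If the Turkish locale is installed, locale_upper_i("tr_TR.ISO-8859-9") should
give the 221 quoted above, while ascii_toupper('i') is 'I' no matter what the
locale is set to. On a system without that locale, locale_upper_i just
returns -1.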
- Jonathan M Davis