Rename std.ctype to std.ascii?

Jonathan M Davis jmdavisProg at gmx.com
Tue Jun 14 12:41:33 PDT 2011


On 2011-06-14 11:53, Jouko Koski wrote:
> "Jonathan M Davis" <jmdavisProg at gmx.com> wrote:
> > So, yes I understood. It's just that as far as I can tell, locales don't
> > matter if you're completely restricting yourself to ASCII like std.ctype
> > does.
> 
> I would not consider it being good idea to include this kind of ascii-only
> utilities in the standard-ish library. It might be best to rename the
> module to std.ascii_for_insular_yankees_others_keep_away so that nobody
> would use it by accident. This way the name would also remind us about the
> historical terms which were used quarter of a century ago when ascii-only
> <ctype.h> utilities were first suggested to the international C
> standardization committee.

For some classes of operations, it makes perfect sense to check for ASCII
characters only. For others, it's just people not worrying about
internationalization like they should be. For instance, format strings don't
care about Unicode as far as their format specifiers go. %a, %d, etc. are all
pure ASCII, so worrying about Unicode with them just wouldn't make sense. In
most cases, isDigit working on the Arabic numerals 0 through 9 is _exactly_
what people want and need. But if you were to try to make it more
Unicode-friendly, would Greek or Chinese numerals count as digits? Maybe,
maybe not. It gets much more complicated. In some cases, all you care about
with isUpper or toUpper is ASCII. In others, you want it to deal with Unicode
(and probably locales as well) properly.

std.ctype/std.ascii deals with ASCII for those situations where you really do
only care about ASCII. Its functions accept Unicode characters, but every
function which returns a bool returns false for non-ASCII characters, and the
case conversion functions never try to change their case. std.uni actually
deals with Unicode and worries about things like whether a Unicode character
is uppercase or not.
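
As a rough sketch of that difference, assuming a Phobos where the module is
already named std.ascii and where std.uni provides isUpper and toUpper (the
renamed imports such as asciiIsUpper are just aliases made up for this
example), something like the following should hold:

```d
import std.ascii : asciiIsUpper = isUpper, asciiToUpper = toUpper, isDigit;
import std.uni : uniIsUpper = isUpper, uniToUpper = toUpper;

void main()
{
    // ASCII input: both modules give the same answers.
    assert(asciiIsUpper('A'));
    assert(uniIsUpper('A'));
    assert(asciiToUpper('a') == 'A');
    assert(uniToUpper('a') == 'A');

    // Non-ASCII input: U+00C9 is E with acute (uppercase),
    // U+00E9 is its lowercase counterpart.
    assert(!asciiIsUpper('\u00C9'));              // std.ascii: always false
    assert(asciiToUpper('\u00E9') == '\u00E9');   // std.ascii: case untouched
    assert(uniIsUpper('\u00C9'));                 // std.uni: knows it is uppercase
    assert(uniToUpper('\u00E9') == '\u00C9');     // std.uni: converts the case

    // std.ascii's isDigit only accepts '0' through '9'.
    assert(isDigit('7'));
    assert(!isDigit('\u0663'));                   // ARABIC-INDIC DIGIT THREE
}
```

The escapes are used instead of literal characters only so that the example
doesn't depend on the encoding of the source file.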

They're for two different use cases. Most of Phobos should be dealing with
Unicode (e.g. pretty much everything in std.string should be using the std.uni
functions rather than the std.ascii functions when a function exists in
both), but there are cases where Unicode doesn't matter, and you might as
well have the efficiency of dealing with just ASCII available. Ultimately,
it's up to the programmer to do the right thing.

- Jonathan M Davis

