Naming conventions for functions in similar modules

Wed Jun 22 14:25:50 PDT 2011

On 2011-06-22 14:06, Jonathan M Davis wrote:
> On 2011-06-22 13:30, Lars T. Kyllingstad wrote:
> > On Wed, 22 Jun 2011 09:53:39 -0700, Walter Bright wrote:
> > > On 6/22/2011 4:47 AM, Lars T. Kyllingstad wrote:
> > >> One problem: std.uni only contains functions for dealing with upper/
> > >> lower case and for checking whether something is an alpha character.
> > >> If you want the other functions, such as isDigit(), isPunctuation(),
> > >> etc. you still have to import std.ascii. And once you have imported
> > >> both std.uni and std.ascii, you are forced to disambiguate every time
> > >> you call a function which exists in both.
> > > 
> > > True, but I don't see much of an improvement of:
> > > toAsciiLower()
> > > 
> > > over:
> > > std.ascii.tolower()
> > > 
> > > at least as far as typing goes.
> > 
> > I agree with that. My point was that maybe std.uni should also have
> > functions such as isDigit(), isPunctuation() etc. I suppose we want to
> > encourage the use of std.uni over std.ascii in most cases, since D is
> > supposed to handle Unicode out of the box.
> 
> Oh, std.uni will likely end up with functions for that eventually, but it
> might be a while. In addition to someone actually taking the time to figure
> out the correct way to implement those functions, there's also the question
> of locales. Eventually, we're going to have to figure out how/if we want
> them to figure into std.uni, since that could have a big effect on various
> functions (including those which are already in there).
> 
> So, I don't think that there's much question that std.uni should be
> expanded, but it's a lot more work to write the unicode versions than it
> is to write the ASCII versions, and someone has to take the time to do it.
> If no one else does it, I'll probably get around to it eventually, but it
> could be quite a while before that. Time and manpower are really the
> limiting factors here, not a lack of desire to have excellent unicode
> handling. And then, of course, there's the question of what to do about
> graphemes...
> 
> Having someone who's actually both knowledgeable about unicode and good with
> D take on such issues would be a big boon for us.

I would point out, though, that even if std.uni had unicode versions of 
every function that std.ascii has, that wouldn't really solve the name clash 
problem in the general case. Sure, you could choose to always use 
std.uni and never use std.ascii, but there will be other modules with 
identical function names where you will _want_ to intermix them - e.g. 
std.algorithm and std.parallel_algorithm. So, if those functions are truly 
interchangeable, then you're going to have to deal with name clashes on a 
regular basis when you deal with them. So, ultimately, the issue of whether 
std.uni has an isDigit or not is a bit of a side issue. Whenever you're 
dealing with two modules which share a lot of function names, name clashes are 
more or less inevitable. The only way to avoid them is to not use the modules 
together, and that's not necessarily going to be an option. It may be with 
std.ascii and std.uni, but it's unlikely to be with std.algorithm and 
std.parallel_algorithm, and who knows whether it will be with any future such 
module pairs that we come up with.
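
As a rough illustration of the clash and the usual ways around it, here is a 
minimal sketch. It assumes a current Phobos in which both std.ascii and 
std.uni declare isAlpha and toLower; the helper names (asciiOnlyAlpha, 
unicodeAlpha, lowerAscii, lowerUni) and the local names ascii and toUniLower 
are made up for the example.

```d
import std.ascii;
import std.uni;
// A renamed import gives a short prefix for one of the modules.
import ascii = std.ascii;
// A selective, renamed import gives a single function a clash-free name.
import std.uni : toUniLower = toLower;

// With both modules imported, a bare isAlpha(c) is rejected as ambiguous,
// so each call has to say which module it means.
bool asciiOnlyAlpha(dchar c) { return std.ascii.isAlpha(c); }
bool unicodeAlpha(dchar c) { return std.uni.isAlpha(c); }

// The same two functions reached through the renamed imports.
dchar lowerAscii(dchar c) { return ascii.toLower(c); }
dchar lowerUni(dchar c) { return toUniLower(c); }

void main()
{
    // 'a' is a letter according to both modules.
    assert(asciiOnlyAlpha('a') && unicodeAlpha('a'));

    // U+00E9 (e with acute) is a letter in Unicode but not in ASCII.
    assert(!asciiOnlyAlpha('\u00E9') && unicodeAlpha('\u00E9'));

    // U+00C9 is left alone by the ASCII version and lowercased to U+00E9
    // by the Unicode version.
    assert(lowerAscii('\u00C9') == '\u00C9');
    assert(lowerUni('\u00C9') == '\u00E9');
}
```

None of these forms removes the clash. They only move the disambiguation 
from each call site to the import list, which is the trade-off under 
discussion.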

Not to mention, I'm not sure how useful a unicode isDigit really would be, 
much as it should exist. That's one case where I would think that you would 
virtually always want ASCII/European digits and not whatever stray symbols 
happen to qualify as digits in unicode. And that opens up a whole can of worms 
with regard to what you'd do with something like std.conv.to and 
std.conv.parse. Should they just care about std.ascii.isDigit, or should they 
use std.uni.isDigit? I expect that the odds are that very few people - if any 
- would really care about unicode digits in such conversions (certainly the 
common case would be that they don't), and depending on how efficient 
std.uni.isDigit manages to be, the extra cost of using it instead of 
std.ascii.isDigit could be painful in code which deals heavily with strings. 
So, we're going to have plenty of questions to resolve with regard to that 
sort of thing somewhere down the line.
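
To make the digit question concrete, here is a small sketch. It assumes a 
current Phobos, where the closest thing to the std.uni.isDigit discussed 
above is std.uni.isNumber (that mapping is an assumption, not a name from 
this thread). The helper names isAsciiDigit and isUnicodeNumber are 
hypothetical.

```d
import std.conv : to, ConvException;
import std.exception : assertThrown;
static import std.ascii;
static import std.uni;

// True only for '0' through '9'.
bool isAsciiDigit(dchar c) { return std.ascii.isDigit(c); }

// True for any code point that Unicode classifies as a number.
// std.uni.isNumber stands in here for the hypothetical std.uni.isDigit.
bool isUnicodeNumber(dchar c) { return std.uni.isNumber(c); }

void main()
{
    enum dchar arabicThree = '\u0663'; // ARABIC-INDIC DIGIT THREE

    // An ASCII digit satisfies both predicates.
    assert(isAsciiDigit('3') && isUnicodeNumber('3'));

    // A non-ASCII digit satisfies only the Unicode one.
    assert(!isAsciiDigit(arabicThree) && isUnicodeNumber(arabicThree));

    // std.conv.to accepts ASCII digits...
    assert(to!int("3") == 3);

    // ...and, in the Phobos versions assumed here, rejects other digits.
    assertThrown!ConvException(to!int("\u0663"));
}
```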

In any case, while having to mix std.ascii and std.uni at the moment may be 
annoying, it's not a problem which is going to be unique to them, so adding 
unicode versions of the functions which are in std.ascii doesn't solve the 
general issue being discussed here. It just reduces its impact in this one 
instance.

- Jonathan M Davis