ranges of characters and the overabundance of traits that go with them

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Mon Mar 13 12:08:12 PDT 2017


Ugh.  What a horrible mess!

I think, instead of wading through the specifics and losing sight of the
forest for the myriad trees, we should take a step back and consider the
big picture.

1) First of all, the user-facing API should be as simple as possible,
and things like which overload gets which type are, apropos Walter's
recent post, implementation details that ought not to be exposed to the
user directly.  So instead of:

	auto func(R)(R r)
		if (isInputRange!R && ...) { ...}
	auto func(C)(C[] s)
		if (isSomeChar!C) { ... }
	auto func(R)(R r)
		if (isWhoKnowsWhat!R && !isNotWhoKnowsWhatElse!R) { ... }

we really should consolidate the whole overload set into a single
template function:

	auto func(R)(R r)
		if (/* user-readable constraints */)
	{
		static if (isWhatever!R)
			return implA(r);
		else static if (isWhateverElse!R)
			return implB(r);
		else ...
		else static assert(0, niceErrorMessageHere);
	}

implA, implB, etc., can then be useful for deprecating stuff that's
otherwise hard to deprecate, like no longer accepting implicitly
convertible enums, alias this, or whatever.
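
To make that concrete, here's a minimal sketch of the pattern, under some
assumptions: countSpaces, countSpacesArray and countSpacesRange are
hypothetical names invented for illustration (nothing like them exists in
Phobos), and the constraint shown is just one plausible "user-readable"
choice.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.traits : isSomeChar;

/// Hypothetical public API: one template, one readable constraint.
size_t countSpaces(R)(R r)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    static if (is(R : C[], C))
        return countSpacesArray(r);   // arrays: scan code units directly
    else
        return countSpacesRange(r);   // everything else: generic path
}

// Private helper (hypothetical). No decoding is needed here, because
// ' ' is ASCII and never occurs inside a multi-unit UTF sequence.
private size_t countSpacesArray(C)(C[] s)
{
    size_t n;
    foreach (c; s)
        if (c == ' ') ++n;
    return n;
}

// Private helper (hypothetical) for non-array ranges of characters.
private size_t countSpacesRange(R)(R r)
{
    size_t n;
    for (; !r.empty; r.popFront())
        if (r.front == ' ') ++n;
    return n;
}
```

The user only ever sees countSpaces and its constraint; which helper gets
picked can change later without touching the public signature.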

2) The /* user-readable constraints */  ought to be one of a small
number of self-describing templates that tell the user exactly what the
*intent* of the function is (note, *intent*, as in, implementation
limitations should not be a factor).  For example, is the function
operating on r in a range-like way? Or is it primarily treating r as a
more-or-less opaque blob that has string-like characteristics (e.g.,
pass it as a filename to the OS)?  I think there are only a very small
number of such cases, and Phobos' user-facing API really should limit
itself to only these cases and nothing more.

The current mess of isSomeString, isConvertibleToString, isNarrowString,
etc., really should be internal to Phobos, and should not be exposed to
the user at all. At the most, I'd say just ONE template that works for
all strings and string-like ranges (and whatever else we wish to
include) ought to be exposed to the user.  What should be done within
Phobos itself (in private std.* space) is another matter -- we do need
to handle the dirty details like auto-decoding, narrow strings, etc.,
here. But the point is that this should be *internal* to Phobos, not
exposed to the user like dirty laundry.


3) The purpose of a particular template function, concerning which
Jonathan wrote:

> In general, when you templatize a function involving strings, you're
> doing one of
> 
> - create a generic function that works on any range (which just so
>   happens to include strings)
> - create a function that specifically operates on strings (but is
>   templated so that it can operate on different constness or operate
>   on different character types)
> - create a function that specifically operates on a range of
>   characters, which happens to include strings
> - create a function that operates on strings or ranges of characters
>   or operates on types that implicitly convert to a string type
> - templatize a function that used to take string (or took any dynamic
>   array of characters) and make it take any range of characters

I think the first case (generic function that works on any range) is the
simplest to handle. It just accepts anything that's a range of the level
of functionality (input, forward, bidi, etc.) required by the function,
and that's good enough.  Everything else, like whether something is a
narrow string, enum, alias this, or whatever we decide to support / not
support, should be handled as implementation details. The user shouldn't
have to care about this. Inside the user-facing function, we can either
use static ifs or overloads of private implementation functions to do
whatever needs to be done to handle all those different cases.

For the rest of the cases, my question is, is there any reason to *not*
accept any arbitrary range of characters? Sure, the current
implementation may not be able to handle all the different combinations,
but again, that's implementation details.  In fact, I'd even say that as
far as is possible, we should try to generalize functions into the
previous category (works on any range).  If that's not possible, e.g.,
the function relies on the fact that the elements must be some kind of
characters, then I'd say it should, if at all possible, accept *any*
range of characters regardless of the specifics (it's a string / wstring
/ dstring, it's a user-defined range over char/wchar/dchar, etc.).

If there's any function that *cannot* be generalized into something that
accepts any arbitrary range of characters, I'd like to know about it.
I'm expecting that if there are any, they should be in the minority, so
they shouldn't need named template constraints dedicated to their
particular use case -- we should just write out explicit sig constraints
(in combination with the "standardized" constraints on ranges, etc.).

So basically, this leaves us really with just two general categories:

- Generic range function: use isInputRange, isForwardRange, etc., in the
  sig constraints.

- Function that needs to operate on some kind of characters: either have
  a general constraint, say isRangeOfChar (tentative name), or just use
  is{Input,Forward,...}Range!R along with is(ElementType!R : dchar) or
  isSomeChar!(ElementType!R).

Anything that doesn't fall into these 2 categories shouldn't have
dedicated sig constraints (i.e., named public templates like
isSomeString), IMO, but should just explicitly list its constraints.
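
For the second category, the general constraint could be as small as the
sketch below. isRangeOfChar is only the tentative name used above, and this
particular definition is my assumption of what it would look like, not
something that exists in Phobos today. Note that with auto-decoding,
ElementType!string is dchar, so narrow strings pass the check too.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;

// Tentative name; this definition is an assumption, not an existing trait.
enum bool isRangeOfChar(R) = isInputRange!R && isSomeChar!(ElementType!R);

static assert( isRangeOfChar!string);     // narrow string (auto-decoded)
static assert( isRangeOfChar!(wchar[]));  // mutable array of wchar
static assert( isRangeOfChar!dstring);
static assert(!isRangeOfChar!(int[]));    // a range, but not of characters
static assert(!isRangeOfChar!int);        // not a range at all
```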

I'm aware that there are currently functions that for whatever reason
require the argument to be some kind of array of characters.  I question
whether the argument being an array is a *necessary* requirement. Maybe
there are a few functions that wouldn't make sense otherwise, I'm not
sure, but IMO most of them ought to be generalizable to one of the above
two generic categories (i.e., accept any range).  If the current
implementation only works with actual arrays, that's an implementation
detail; it should be handled thus:

	auto myFunc(R)(R r)
		if (isRangeOfChar!R)
	{
		static if (is(R : C[], C))
		{
			/* current implementation */
		}
		else static assert(0, "Not implemented yet");
	}

I.e., what the user sees should be the most generic API that makes sense
relative to this function.  For cases that the implementation can't
(yet) handle, a helpful error message is given, and the docs should also
indicate the present limitations.
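
Here's a rough, compilable sketch of what that could look like for one
concrete function. Everything named here is hypothetical: codeUnitCount is
invented for illustration, and isRangeOfChar is defined locally under the
assumption that it means "input range whose element type is some character
type".

```d
import std.range : only;
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;

// Assumed definition of the tentatively named constraint.
enum bool isRangeOfChar(R) = isInputRange!R && isSomeChar!(ElementType!R);

/// Hypothetical function: number of code units in r.
size_t codeUnitCount(R)(R r)
    if (isRangeOfChar!R)
{
    static if (is(R : C[], C))
        return r.length;    // the "current implementation": arrays only
    else
        static assert(0, "codeUnitCount: " ~ R.stringof ~
            " is not supported yet; only arrays of characters are implemented");
}

// Arrays compile; other ranges of characters pass the constraint but
// are rejected at compile time with the message above.
static assert( __traits(compiles, codeUnitCount("abc")));
static assert(!__traits(compiles, codeUnitCount(only('a', 'b'))));
```

The point of doing it this way is that lifting the limitation later is a
purely internal change: the signature and constraint the user sees stay
exactly the same.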


4) Implicit conversions:  this is a tricky one, esp. once you factor in
alias this.  Am I right to assume that this is mostly coming from
functions that used to take strings explicitly, but later were
templatized, thus losing some of the original implicit conversions?  If
so, I'm inclined to do this:

	auto myFunc(R)(R r)
		/* N.B. no sig constraints */
	{
		static if (isConvertibleTo!(R,x))
			...
		... /* handle dirty laundry cases internally here */
	}

then document in the ddocs exactly what is expected of the incoming
type. I.e., accept everything, then sort out the different cases
internally as an implementation detail.
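
As a sketch of how that might look in practice (Path and isRooted are
hypothetical names made up for this example, and the particular order of
the static if branches is just one possible choice):

```d
import std.range.primitives : ElementType, empty, front, isInputRange;
import std.traits : isSomeChar;

// Hypothetical wrapper type that converts implicitly to string.
struct Path
{
    string value;
    alias value this;
}

/// Hypothetical function: does r start with '/'?
bool isRooted(R)(R r)
    /* N.B. no sig constraints */
{
    static if (is(R : const(char)[]))
    {
        // string, char[], enums with a string base type, alias this, ...
        const(char)[] s = r;
        return s.length != 0 && s[0] == '/';
    }
    else static if (isInputRange!R && isSomeChar!(ElementType!R))
    {
        // any other range of characters (wstring, dstring, user-defined)
        return !r.empty && r.front == '/';
    }
    else
        static assert(0, "isRooted: cannot treat " ~ R.stringof ~
            " as a string or a range of characters");
}
```

With this, isRooted("/usr"), isRooted(Path("/tmp")) and isRooted("rel"w) all
compile, and the caller never needs to know which branch handled them.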


5) What to do about the whole mess that is isSomeString,
isAutodecodableString, isImplicitlyConvertibleToString, etc.: I'm
tempted to say most of these variants ought to be private to Phobos, and
users should not even see them.  In fact, I'd even go as far as saying
Phobos internal code should, as much as possible, use explicit
constraints (i.e., spell out is(ElementType!R : dchar) rather than hide
it behind yet another similar-but-subtly-different name like
isSomeString).  Phobos maintainers ought to be able to work with complex
explicit constraints, one would hope.  But none of this should be
visible to the user.


--T

