UFCS in generic libraries, silent hijacking, and compile errors.

Sun Mar 11 15:24:31 UTC 2018

On Sunday, March 11, 2018 08:39:54 aliak via Digitalmars-d-learn wrote:
> On Saturday, 10 March 2018 at 23:00:07 UTC, Jonathan M Davis
> > issue in practice. That doesn't mean that it's never a problem,
> > but from what I've seen, it's very rarely a problem, and it's
> > easy to work around if you run into a particular case where it
> > is a problem.
>
> Ya, it's easy to work around but the caveat there is you need to
> realize it's happening first, and add to that that it's "rarely a
> problem" and well ... now it seems scary enough for this to be
> mentioned somewhere I'd say.

You're talking about a situation where you call a free function via UFCS,
and the type has a member function whose name and parameters match closely
enough that the member function gets called instead. That _can_ happen, but
in most cases, there's going to be a mismatch in the parameters, so if the
type defines a member function with the same name as the free function,
you'll get a compiler error rather than a silent change in behavior. I don't
think that I have ever seen the silent case happen or ever seen anyone
complain about it. The only
case I recall along those lines was someone who was trying to use a free
function that they'd decided to call front instead of something else, and it
had parameters beyond just the input range, so that programmer got
compilation errors when they tried to use it in their range-based functions.
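To make the two outcomes concrete, here is a minimal sketch. The type Celsius
and the functions named describe are hypothetical and exist only for
illustration. It shows the silent case (the member function is called even
though a matching free function exists) and the far more common case where the
parameters differ and the UFCS call simply fails to compile.

```d
// Hypothetical type and functions, for illustration only.
struct Celsius
{
    double degrees;

    // Member function: with c.describe(), lookup finds this first.
    string describe() const { return "member"; }
}

// Free function with the same name and a matching parameter list.
string describe(const Celsius c) { return "free"; }

// Free function with the same name but an extra parameter.
string describe(const Celsius c, int precision) { return "free with precision"; }

void main()
{
    auto c = Celsius(21.5);

    // The silent case: the member function wins over the free function.
    assert(c.describe() == "member");

    // Normal call syntax still reaches the free function.
    assert(describe(c) == "free");

    // The common case: no member overload takes an int, and because a
    // member named describe exists, the free functions are not considered.
    static assert(!__traits(compiles, c.describe(2)));

    // Again, normal call syntax works.
    assert(describe(c, 2) == "free with precision");
}
```

So, in generic code where you really do need the free function, calling it
without UFCS sidesteps the question entirely.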

I think that this is really a theoretical concern and not a practical one.
Certainly, it's really only going to potentially be an issue in library code
that gets used by a ton of folks with completely unexpected types. If it's
in your own code, you're usually well aware of what types are going to be
used with a generic function, and proper testing would catch the rare case
where there would be a problem. If you're really worried about it, then just
don't use UFCS, but for better or worse, it seems to be the case that the
vast majority of D programmers use UFCS all the time and don't run into
problems like this.

> > The one case that I am aware of where best practice is to avoid
> > UFCS is with put for output ranges, but that has nothing to do
> > with your concerns here. Rather, it has to do with the fact
> > that std.range.primitives.put has a lot of overloads for
> > handling various arguments (particularly when handling ranges
> > of characters), and almost no one implements their output
> > ranges with all of those overloads. So, if you use put with
> > UFCS, you tend to run into problems if you do anything other
> > than put a single element of the exact type at a time, whereas
> > the free function handles more cases (even if they ultimately
> > end up calling that member function with a single argument of
> > the exact type). We probably shouldn't have had the free
> > function and the member function share the same name.
>
> Oh, can you share a short example here maybe? Not sure I followed
> completely
>
> Is it basically:
>
> // if r is output range
>
> r.put(a, b) // don't do this?
>
> put(r, a, b) // but do this?
>
> (Cause compilation error)

Essentially yes, though you're passing too many arguments to put. There are
cases where put(output, foo) will compile while output.put(foo) will not. In
particular, std.range.primitives.put will accept both individual elements to
be written to the output range and ranges of elements to be written, whereas
typically, an output range will be written to only accept an element at a
time. It's even more extreme with output ranges of characters, because the
free function put will accept different string types and convert them, and
even if the programmer who designed the output range added various overloads
to put for completeness, it's enough extra work to deal with all of the
various character types that they probably didn't. And put also works with
stuff like delegates (most frequently used with a toString that accepts an
output range), which don't have member functions. So, if you write your
generic code to use the member function put, it's only going to work with
user-defined types that define the particular overload(s) of put that you're
using in your function, whereas if you use the free function, you have more
variety in the types of output ranges that your code works with, and you
have more ways that you can call put (e.g. passing a range of elements
instead of a single element).
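As a rough sketch of the difference, consider the following. IntSink is a
hypothetical output range that only knows how to accept a single int; the only
library symbols used are put and isOutputRange from std.range.primitives.

```d
import std.range.primitives : isOutputRange, put;

// Hypothetical output range that accepts exactly one int at a time.
struct IntSink
{
    int[] data;
    void put(int value) { data ~= value; }
}

void main()
{
    IntSink sink;
    static assert(isOutputRange!(IntSink, int));

    sink.put(1);            // member call: a single element of the exact type
    put(sink, 2);           // free function: ends up calling the member
    put(sink, [3, 4, 5]);   // free function: also accepts a range of elements

    // The member function has no overload that takes an array, so the
    // UFCS-style call does not compile.
    static assert(!__traits(compiles, sink.put([3, 4, 5])));

    assert(sink.data == [1, 2, 3, 4, 5]);

    // A delegate has no member functions at all, but the free function
    // still treats it as an output range.
    int[] seen;
    auto dg = (int x) { seen ~= x; };
    put(dg, 7);
    assert(seen == [7]);
}
```

The same generic code written with sink.put(...) would work only with the
first of those calls.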

> How about if it's not part of the overload set, but is looked up
> if the function does not exist in the overload set. What would
> the problems there be?
>
> Basically I don't see a reason why we wouldn't want the following
> to work:
>
> struct S { void f() {} }
> void f(S s, int i) {}
> S().f(3); // error

So, are you complaining that it's an error, or you want it to be an error?
As it stands, it's an error, because as far as the compiler is concerned,
you tried to call a member function with an argument that it doesn't accept.
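Here is a slightly expanded version of your example as a sketch of the current
behavior. I've made the free function return a value so that there is
something to check, and the alias name scaled is made up for illustration.
Note that the alias has to be at module scope, because UFCS does not consider
local symbols.

```d
struct S
{
    int calls;
    void f() { ++calls; }
}

// Free function with the same name as the member but different parameters.
int f(S s, int i) { return i * 2; }

// Hypothetical module-level alias that gives the free function a name
// which does not collide with any member of S.
alias scaled = f;

void main()
{
    S s;

    s.f();                  // calls the member function
    assert(s.calls == 1);

    // S has a member named f, so only the member is considered, and it
    // does not accept an int.
    static assert(!__traits(compiles, s.f(3)));

    // Normal call syntax reaches the free function.
    assert(f(s, 3) == 6);

    // S has no member named scaled, so UFCS finds the alias.
    assert(s.scaled(3) == 6);
}
```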

If you want that code to work, then the compiler would have to add the free
function to the overload set while somehow leaving out the overloads that
match the member function, which isn't how D deals with overloading at this
point. But
if it did, then you have problems as soon as the type adds another member
function overload. Also, if you have a free function that matches the name
of a member function but where their parameters don't match, wouldn't they
be unrelated functions? At that point, if you wrote code that accidentally
matched the free function instead of the member function, you end up with
code hijacking. Just because you made a mistake when typing the code, you
called entirely the wrong function, and it's very hard to see, because the
function names match. Hopefully, testing will catch it (and there's a decent
chance that it will), but essentially, the member function has been hijacked
by the free function.

D's overload rules were written with a strong bias towards preventing
function hijacking. To an extent, that's impossible once UFCS comes into
play, and Walter went with the choice that hijacked the least and was the
simplest to deal with. Basically, once UFCS comes into play, you have these
options:

1. Put all of the functions in the overload set.
2. The member function wins.
3. The free function wins.
4. Have a pseudo-overload set where when there's a conflict between a member
   function and a free function, the member function wins, but free
   functions that don't match can be called as well.
5. Have a pseudo-overload set where when there's a conflict between a member
   function and a free function, the free function wins, but member
   functions that don't match can be called as well.

If it's ever the case that the free function wins, then you can't call the
member function if the free function is available, which definitely causes
problems, so #3 and #5 are out. If all of the functions are in the overload
set, then you're in basically the same boat, because you can't call the
member function if there's a conflict. It's just that calling the free
function results in a compilation error as well unless you use an alias, the
full import path, or some other trick to get at it. So, #1 is out.

That leaves #2 and #4. And as I said, aside from the fact that #4 doesn't
fit with how D does overloads in general, you run the risk of the free
function hijacking the member function whenever there's a mistake, and you
have problems whenever the member functions are altered, making it so that
which function gets called can change silently. So, that leaves #2, which is
what we have.

Basically, D's overload rules are designed to favor compilation errors over
the risk of calling the wrong function, and while its import system provides
ways to differentiate between free functions, it really doesn't provide a
way to differentiate between a member function and a free function except
via whether you use UFCS or not. And when those facts are taken into
account, it makes the most sense for member functions to just win whenever
a free function and a member function have the same name. It also has the
bonus that it reduces compilation times, because if a free function could
ever trump a member function or was in any fashion included in its overload
set, then the compiler would have to check all of the available functions
when UFCS is used instead of looking at the member functions and then only
looking at free functions if there was no member function with that name.

- Jonathan M Davis