UFCS in generic libraries, silent hijacking, and compile errors.

aliak something at something.com
Tue Mar 13 23:42:49 UTC 2018


On Sunday, 11 March 2018 at 15:24:31 UTC, Jonathan M Davis wrote:
> On Sunday, March 11, 2018 08:39:54 aliak via 
> Digitalmars-d-learn wrote:
>> On Saturday, 10 March 2018 at 23:00:07 UTC, Jonathan M Davis
>> > issue in practice. That doesn't mean that it's never a 
>> > problem, but from what I've seen, it's very rarely a 
>> > problem, and it's easy to work around if you run into a 
>> > particular case where it is a problem.
>>
>> Ya, it's easy to work around but the caveat there is you need 
>> to realize it's happening first, and add to that that it's 
>> "rarely a problem" and well ... now it seems scary enough for 
>> this to be mentioned somewhere I'd say.
>
> You're talking about a situation where you used a function 
> whose parameters match that of a member function exactly enough 
> that a member function gets called instead of a free function. 
> That _can_ happen, but in most cases, there's going to be a 
> mismatch, and you'll get a compiler error if the type defines a 
> member function that matches the free function. I don't think 
> that I have ever seen that happen or ever seen anyone complain 
> about it. The only case I recall along those lines was someone 
> who was trying to use a free function that they'd decided to 
> call front instead of something else, and it had parameters 
> beyond just the input range, so that programmer got compilation 
> errors when they tried to use it in their range-based functions.

Not saying it's common, just something to be aware of that is 
non-obvious (well, it was not obvious to me at least when I started 
getting into D). It's _probably_ not going to be a problem, but 
if it ever is, then it's going to be very hard to detect. 
And sure, the solution is to just not use ufcs to be certain, but 
ufcs is pretty damn appealing, which is probably why I didn't 
realize this at the beginning. As generic code bases grow, the 
chance of this happening is certainly not 0 though.
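
A minimal sketch of the kind of silent switch I mean. The names describe, A, B and g are made up for illustration; the point is only that g compiles for both types and quietly calls different functions:

```d
// Free function that the generic code below intends to call via ufcs.
string describe(T)(T t) { return "free"; }

struct A {}

// B happens to define a member with the same name and a matching signature.
struct B { string describe() { return "member"; } }

// Generic code written with ufcs in mind.
string g(T)(T t) { return t.describe(); }

void main() {
    assert(g(A()) == "free");   // no member, so ufcs finds the free function
    assert(g(B()) == "member"); // the member wins, with no error or warning
}
```

Writing describe(t) instead of t.describe() inside g would call the free function for both types.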

>
> Essentially yes, though you're passing too many arguments to 
> put. There are cases where put(output, foo) will compile while 
> output.put(foo) will not. In particular, 
> std.range.primitives.put will accept both individual elements 
> to be written to the output range and ranges of elements to be 
> written, whereas typically, an output range will be written to 
> only accept an element at a time. It's even more extreme with 
> output ranges of characters, because the free function put will 
> accept different string types and convert them, and even if the 
> programmer who designed the output range added various 
> overloads to put for completeness, it's enough extra work to 
> deal with all of the various character types that they probably 
> didn't. And put also works with stuff like delegates (most 
> frequently used with a toString that accepts an output range), 
> which don't have member functions. So, if you write your 
> generic code to use the member function put, it's only going to 
> work with user-defined types that define the particular 
> overload(s) of put that you're using in your function, whereas 
> if you use the free function, you have more variety in the 
> types of output ranges that your code works with, and you have 
> more ways that you can call put (e.g. passing a range of 
> elements instead of a single element).

Ooh ouch, well that's certainly good to know about.
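
To make that concrete for myself, here is a small sketch of the difference, assuming a hypothetical output range IntSink that only defines put for a single int:

```d
import std.range.primitives : put;

// Hypothetical output range that only accepts one element at a time.
struct IntSink {
    int[] data;
    void put(int x) { data ~= x; }
}

void main() {
    IntSink sink;

    put(sink, 1);         // single element: ends up calling sink.put(1)
    put(sink, [2, 3, 4]); // range of elements: the free function puts each one

    // The member call with a range does not compile, since IntSink
    // has no put overload taking int[].
    static assert(!__traits(compiles, sink.put([2, 3, 4])));

    assert(sink.data == [1, 2, 3, 4]);
}
```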

>>
>> Basically I don't see a reason why we wouldn't want the 
>> following to work:
>>
>> struct S { void f() {} }
>> void f(S s, int i) {}
>> S().f(3); // error
>
> So, are you complaining that it's an error, or you want it to 
> be an error? As it stands, it's an error, because as far as the 
> compiler is concerned, you tried to call a member function with 
> an argument that it doesn't accept.

Complaining that it is an error :) well, not complaining, more 
trying to understand why really. And I appreciate you taking the 
time to explain. There're a lot of points in there so here we 
go...
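
For reference, a small sketch of the current behavior and two ways around it. I changed the free function to return its int argument so the calls can be checked, and callF is a made-up alias name:

```d
struct S { void f() {} }

// Free function with the same name as the member but different parameters.
int f(S s, int i) { return i; }

// Module-level alias under a name that S does not define as a member.
alias callF = f;

void main() {
    // The member f is found first and does not accept an int,
    // so the ufcs call is rejected instead of falling back to the free f.
    static assert(!__traits(compiles, S().f(3)));

    assert(f(S(), 3) == 3);     // plain call syntax reaches the free function
    assert(S().callF(3) == 3);  // ufcs works through the alias
}
```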

>
> If you want that code to work, then it would have to add the 
> free function to the overload set while somehow leaving out the 
> overloads that match the member function, which isn't how D 
> deals with overloading at this point.

Yeah, I'd say that's an implementation detail, but the main idea 
would be to treat an overload set that completely fails as an 
undefined function so that ufcs would kick in. Your problems with 
put would also go away then and implementing an output range 
would be less of a hassle.

> But if it did, then you have problems as soon as the type adds 
> another member function overload.

I'm not sure I see how. The member function would win out. This 
is the situation now anyway, with the added (IMO) disadvantage of 
ufcs being unusable then.

> Also, if you have a free function that matches the name of a 
> member function but where their parameters don't match, 
> wouldn't they be unrelated functions?

Well, maybe. The free function takes T as the first parameter so 
it's certainly related to the type. I suppose they are unrelated 
in the same way that:

struct S { void f() {} }
void g(S s) {}

g and f are unrelated.

> At that point, if you wrote code that accidentally matched the 
> free function instead of the member function, you end up with 
> code hijacking.

I'm not sure if code hijacking is the correct term here. This is 
a programmer error. It's exactly the same as if you have f(int) 
and f(long) and you call f(3) expecting to call f(long). Or if 
you have f(int, int) and f(int) and you accidentally type f(1) 
instead of f(1, 1).
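
A tiny sketch of that overloading case. The string return values are only there so the chosen overload can be checked:

```d
string f(int x)  { return "int"; }
string f(long x) { return "long"; }

void main() {
    assert(f(3) == "int");   // the literal 3 is an int, so f(int) is the exact match
    assert(f(3L) == "long"); // an explicit long literal is needed to reach f(long)
}
```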

> Just because you made a mistake when typing the code, you 
> called entirely the wrong function, and it's very hard to see, 
> because the function names match. Hopefully, testing will catch 
> it (and there's a decent chance that it will), but essentially, 
> the member function has been hijacked by the free function.

The exact same arguments can be made against function overloading 
here. This is as much a hijack as calling the wrong overload.

>
> D's overload rules were written with a strong bias towards 
> preventing function hijacking. To an extent, that's impossible 
> once UFCS comes into play, and Walter went with the choice that 
> hijacked the least and was the simplest to deal with.

Ya, I can understand it's a hard problem. So as it stands now, a 
member function can hijack an intended ufcs call of a free 
function. The case you've mentioned above though I'm not sure 
qualifies as hijacking. In the above case where a programmer 
accidentally types a name wrong, or parameters wrong, they've 
made a mistake. They wanted to call function f but they typed it 
wrong so they're calling function g. In this other case where a 
member function hijacks a ufcs call, the programmer intended to 
call f, typed f, but is somehow calling g.

> Basically, once UFCS comes into play, you have these options:
>
> 1. Put all of the functions in the overload set.
> 2. The member function wins.
> 3. The free function wins.
> 4. Have a pseudo-overload set where when there's a conflict between a
>    member function and a free function, the member function wins, but
>    free functions that don't match can be called as well.
> 5. Have a pseudo-overload set where when there's a conflict between a
>    member function and a free function, the free function wins, but
>    member functions that don't match can be called as well.
>
> If it's ever the case that the free function wins, then you 
> can't call the member function if the free function is 
> available, which definitely causes problems, so #3 and #5 are 
> out.  If all of the functions are in the overload set, then 
> you're in basically the same boat, because you can't call the 
> member function if there's a conflict. It's just that the free 
> function results in a compilation error as well without using 
> an alias or the full import path or some other trick to get at 
> the free function. So, #1 is out.

3, 5 and 1, yes, all out, completely agree here.

>
> That leaves #2 and #4. And as I said, aside from the fact that 
> #4 doesn't fit with how D does overloads in general, you run 
> the risk of the free function hijacking the member function 
> whenever there's a mistake, and you have problems whenever the 
> member functions are altered, making it so that which function 
> gets called can change silently.

I understand that #4 does not fit with how D currently does 
overloads in general. And you make a good point of getting a 
silent ufcs call if you alter a member function after the fact 
though. That would certainly be unwanted.

Hmm... ok touche on that part. I think I may agree with the 
current D implementation just because of that last point of yours 
now. I'm not entirely sure yet, need to think about it.

Now I'm thinking that if you really want to write a utility 
function that acts on generic code, but you also want to allow 
specialization by a type, then this (not sure it works, not 
tested):

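import std.traits : hasMember;

// Specialization by the type: forward to T's own util when it accepts a U.
void util(T, U)(T t, U u) if (hasMember!(T, "util") && 
is(typeof(t.util(u)))) {
     t.util(u);
}

void util(T)(T t, int a) {} // int case
void util(T)(T t, string a) {} // string case
void util(T, U)(T t, U u) {
   // generic case, probably needs constraints I can't think of though
   // (likely at least the negation of the constraint above, so that the
   // two (T, U) overloads don't clash when the member exists).
}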

And then later:

void g(T)(T t) {
   util(t, 3);
}

Now you get all your cases handled and no compilation error if T 
implements one of the cases of util but not the others (I wonder 
if free function put does this?)
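
A variant of the same idea that I would expect to be easier to get right is a single template that dispatches with static if, so the member version is always tried first. This is only a sketch: Plain, Custom and the returned strings are made up for illustration, and it is not meant to describe how put is implemented.

```d
import std.traits : hasMember;

struct Plain {}

// Hypothetical type that specializes only the int case.
struct Custom {
    string util(int a) { return "member"; }
}

string util(T, U)(T t, U u) {
    // Prefer the type's own util when it exists and accepts a U.
    static if (hasMember!(T, "util") && is(typeof(t.util(u))))
        return t.util(u);
    else static if (is(U == int))
        return "int";     // int case
    else static if (is(U == string))
        return "string";  // string case
    else
        return "generic"; // generic case
}

string g(T)(T t) { return util(t, 3); }

void main() {
    assert(g(Plain()) == "int");
    assert(g(Custom()) == "member");
    // Custom has no util(string), so the free string case is used.
    assert(util(Custom(), "x") == "string");
    assert(util(Plain(), 1.5) == "generic");
}
```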

> So, that leaves #2, which is what we have.
>
> Basically, D's overload rules are designed to favor compilation 
> errors over the risk of calling the wrong function, and while 
> its import system provides ways to differentiate between free 
> functions, it really doesn't provide a way to differentiate 
> between a member function and a free function except via 
> whether you use UFCS or not. And when those facts are taken 
> into account, it makes the most sense for member functions to 
> just win whenever a free function and a member function have 
> the same name. It also has the bonus that it reduces 
> compilation times, because if a free function could ever trump 
> a member function or was in any fashion included in its 
> overload set, then the compiler would have to check all of the 
> available functions when UFCS is used instead of looking at the 
> member functions and then only looking at free functions if 
> there was no member function with that name.
>
> - Jonathan M Davis

I'm not giving you the compilation times bonus point :p Yes I do 
agree it saves time but I doubt this would be an issue that would 
stop implementation if the things above were not an issue.

Cheers, thanks again for taking the time.
- Ali



