Object.toString, toHash, opCmp, opEquals

Timon Gehr timon.gehr at gmx.ch
Fri Apr 26 13:27:41 UTC 2024


On 4/26/24 05:13, Walter Bright wrote:
> On 4/25/2024 6:32 PM, Timon Gehr wrote:
>> A range is useless unless it is mutable. The range interface is 
>> inherently mutable. To iterate a range, you have to call `popFront()` 
>> on it. There is no way to have a `const popFront()`.
> 
> I agree there's no reason to have a const popFront(). But opEquals() is 
> inherently non-mutable.

That does not mean it can be D `const`. This is one of the two reasons I 
mentioned why "const correctness" is such a damaging concept for a D 
programmer.

Here, you are again conflating the logical with the physical semantics. 
It's a bit like saying "`opEquals` changes the state of the stack, hence 
obviously it cannot be `const`!", just one abstraction level higher.

Ideally, `opEquals` implements an equivalence relation. It is fine if it 
changes the representatives in the process, as long as it properly 
encapsulates the internal state such that whenever two values compare 
equal, the observable semantics of the two representatives is the same.

> Let's posit a mutating opEquals() and:
> 
> ```
> o.opEquals(o);
> ```
> 
> and the opEquals() mutated which one, or both, or what would happen if 
> it did?
> ...

"If you stop brushing your teeth, you might get cavities! There is hence 
no reason not to record video evidence!"

Anyway, here is a simple contrived example of a mutating `opEquals` that 
is not a logical problem:

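```d
struct int31{
     // The value lives in the upper 31 bits; bit 0 is internal scratch state.
     private int payload;
     bool opEquals(int31 rhs){
         payload^=1; // mutates only the scratch bit, never observable via the interface
         return (payload>>1)==(rhs.payload>>1);
     }
     int31 opBinary(string op:"+")(int31 rhs){
         return int31(((payload>>1)+(rhs.payload>>1))<<1);
     }
}
```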

If you want something that is actually useful, you will have to look 
into splay trees or something like that. Or e.g., maybe you have a ring 
buffer or something that compacts itself on iteration. As I said, 
amortized data structures. For those, requiring a const opEquals may be 
outright incorrect, or it can introduce a performance regression, because 
the query can no longer do the reorganization that pays for the 
amortized bound.
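To make the amortized case concrete, here is a minimal sketch of a hypothetical `LazyBag` (not a Phobos type): a multiset of ints that defers sorting until it is compared. `opEquals` is logically a read-only query, yet it reorganizes both operands, so it cannot be `const`. It also answers the `o.opEquals(o)` question: normalizing the same bag twice is simply a no-op.

```d
import std.algorithm.sorting : sort;

// Hypothetical multiset of ints that defers sorting until it is queried.
struct LazyBag
{
    private int[] items;
    private bool sorted = true;

    // Copies would share the array, and an in-place sort through one copy
    // could scramble the other, so copying is disabled in this sketch.
    @disable this(this);

    void insert(int x)
    {
        items ~= x;
        sorted = false; // defer the O(n log n) work until a query needs it
    }

    private void normalize()
    {
        if (!sorted)
        {
            items.sort();
            sorted = true;
        }
    }

    // Logically a read-only query, but it reorganizes the representation of
    // both operands, so it cannot be marked `const`.
    bool opEquals(ref LazyBag rhs)
    {
        normalize();
        rhs.normalize();
        return items == rhs.items;
    }
}

void main()
{
    LazyBag a, b;
    a.insert(3); a.insert(1);
    b.insert(1); b.insert(3);
    assert(a == b); // same multiset, different insertion order
    assert(a == a); // self-comparison: the second normalize is a no-op
    b.insert(2);
    assert(a != b);
}
```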

> 
>>> The utility is being able to write borrow-checker style code, so you 
>>> can avoid things like double frees.
>>> ...
>>
>> `@live` does not enable this.
> 
> ```
> auto p = q;
> free(p);
> free(q);
> ```
> ...

Well, I can just not use `malloc` and `free`. Anyway, to me this is not 
"borrow-checker style" code. This is C-style `@system` code.

>> Anyway, you are trying to impose nonsensical restrictions on 
>> garbage-collected code. I have yet to run into a double-free using GC 
>> allocation and I doubt `@live` would help me avoid that if it were a 
>> thing.
> 
> D doesn't distinguish between gc pointers and non-gc pointers. It has 
> been proposed, but I have very extensive experience with multiple 
> pointer types and it is a cure worse than the disease.
> ...

I understand that there exist bad solutions to basically any problem. 
This very thread provides ample evidence of that fact. We have `scope` 
and non-`scope` pointers and the world has not ended yet.

> 
>>> As I recall, it was you that pointed out that reference counting can 
>>> never be safe if two mutable pointers to the same ref counted object 
>>> (one to the object, the other to its interior) were passed to a 
>>> function. (Freeing the first can leave the second interior pointer 
>>> pointing to a deleted object.) The entire ref counting scheme 
>>> capsized because of this.
>> I provided the counterexample, but the unsound generalization is yours.
> 
> All it takes is one counterexample to capsize it.
> ...

Sure, I was just objecting to the characterization that I claimed a 
"Rust-style" mutation-restricting solution is the only possible one.

> 
>> (Technically, there would be ways to type check that code without 
>> banning mutation outright.)
> 
> Neither Andrei nor I nor anyone else working on it could figure out a 
> solution (other than disallowing all pointers to payload).

This is not true; it seems they just did not explain it to you. You 
could have some sort of more precise type-state system that only 
disallows operations that may deallocate the payload. This is the kind 
of thing that Rust initially explored. Anyway, I am not even saying that 
this is necessarily better, I just don't like technically wrong words 
being put into my mouth. ;)

> The borrow checker does solve it, though.
> ...

It does not, because it does not actually get aliasing under control. It 
adds checks that are incomplete in some programs, and unnecessary in 
other programs.

> 
>>> Why would anyone need toHash(), toString(), opEquals() or opCmp() to 
>>> mutate their data? Wouldn't that be quite surprising behavior?
>>>
>>
>> As I keep pointing out, there is a difference between mutating 
>> abstract data and concrete memory locations. For instance, data types 
>> with amortized guarantees usually have to reorganize the internal data 
>> representation on each query. (Think e.g. splay trees.)
>>
>> Anyway, let's for the sake of argument assume that I want to write 
>> functions that leave memory in exactly the state they encountered it 
>> in. Const will _still_ unduly restrict me because it is not 
>> fine-grained enough.
>>
>> ```d
>> import std.stdio, std.range, std.conv;
>>
>> struct S{
>>      auto r=iota(1,2);
>>      string toString()const{ return text(r); }
> 
> I agree that mutates the argument passed to toString(). That would 
> consume the range. Calling toString() again would return an empty string.
> ...

No, this is not true. `text` does not accept its argument by `ref`. The 
range stays intact. This is similar to how in:

```
int[] a = [1,2,3];
writeln(a);
```

The array `a` is not empty after printing.
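The same behaviour can be checked with a range instead of an array. A minimal sketch, assuming only Phobos's `std.conv.text` and `std.range.iota`: `text` receives its argument by value, so for a value-type range such as the one `iota` returns, the caller's copy is not consumed. (A range with reference semantics would be a different story, but that is not what the quoted `iota` example uses.)

```d
import std.conv : text;
import std.range : iota;

void main()
{
    auto r = iota(1, 4);
    string first = text(r);  // `text` iterates its own copy of the range
    string second = text(r); // so the caller's `r` is still at its start
    assert(first == second);
    assert(!r.empty && r.front == 1);
}
```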

> 
>> Sometimes there is not even a safe workaround to get a mutable version 
>> of a range, because of transitive `const`. A range can have 
>> indirections in its implementation.
>> This is just one example establishing that `const` is not expressive 
>> enough to say _ONLY_ "this will not mutate anything". It also spells: 
>> "This code can be a huge pain in the ass at any point in the future 
>> for dumb, incidental reasons."
>>
>> I really do not want to deal with this. I'd much rather fork Phobos so 
>> it uses non-const alternatives to toHash and toString.
> 
> I suppose it wouldn't help if I suggest:
> 
> ```
> writeln(text(r));
> ```
> ...

No, it does not. I do not see how this would help.

> I only proposed the const toString() for Object.toString(), not for 
> struct, where indeed you are free to have struct toString() do anything 
> you want.
> ...

I happen to be already using classes. Forking Phobos is less effort than 
moving to structs. Or I could just switch to OpenD I guess.

> Class and struct are fundamentally different in that class is a 
> universal hierarchy with a common root, and hence we must define what 
> that common root is. Struct, on the other hand, is rootless, and hence 
> the user can define it however he pleases.
> 
> I agree with you that Object shouldn't have had any members, and Andrei 
> and I did discuss that, but since it had members, we couldn't really 
> take them away. Note that COM classes also have a common root with one 
> member QueryInterface().
> ...

I am amazed that you want to break most D code by imposing attributes on 
common root functions, but removing functions from the common root is a 
bridge too far even though the fix is usually simply to remove `override`.
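As an illustration of the kind of code at stake, here is a hypothetical class `Memo` that memoizes its `toString`. It compiles today because `Object.toString` is declared without `const`. If the root signature became `string toString() const`, this override would have to be `const` as well (an override cannot drop `const` from `this`), and the assignments to `cache` and `computed` would be rejected.

```d
// Compiles today because Object.toString is declared without `const`.
class Memo
{
    private string cache;
    private size_t computed; // how many times the expensive path ran

    override string toString()
    {
        if (cache is null)
        {
            ++computed;
            cache = "expensive result"; // stands in for a costly computation
        }
        return cache;
    }
}

void main()
{
    auto m = new Memo;
    assert(m.toString() == "expensive result");
    assert(m.toString() == "expensive result");
    assert(m.computed == 1); // the second call reused the cached string
}
```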

> 
>> If you expect people to prove properties to an incomplete type system 
>> via annotations and to accept unnecessary restrictions, they have to 
>> get some value out of it. You also would not go: "Starting from 
>> tomorrow, you have to prove to me that you brush your teeth every day. 
>> I want video evidence." And then, when I refuse, you can't say: "Why 
>> would you not brush your teeth?" This is what this is.
>>
>> I caution you to now not miss the forest for the trees and engage in a 
>> "tooth-brushing related" argument (e.g., proposing a different range 
>> design or something like that). This is an inherent issue. Even if you 
>> make the type system more expressive, the annotation overhead is still 
>> real, and often uneconomical.
>>
>> I am perfectly fine with having some restricted system like Rust for 
>> people who want to do safe manual memory management. This would even 
>> be useful to me. But this has to be opt-in, based on data structures, 
>> and interoperate as seamlessly as possible with the full language.
> 
> 
> I think I see your point of view. Mine is a little different.

My point of view is D-focused, yours often enough seems to be C-focused. 
There are only so many insights about D's design that can be extracted 
from issues with C's design. Actual experience with D is increasingly 
important.

You will notice that all of the experience you mention in this thread is 
with systems that do not work well. I have considerable experience with 
D, and the only memory-management related issue that I care about is use 
after free. Yet `@live` does not solve this problem for me. (I am aware 
that you can write a snippet of code that is rejected by @live for use 
after free. Personally I care about code that is accepted and hence is 
guaranteed not to have use after free.)

> I have considerable experience with C. When I see:
> 
> ```
> int foo(T* p);
> ```
> 
> Is p an array? is foo() going to mutate what it points to? Is foo() 
> going to free() it?

I agree with this point of view, this is not what I am objecting to. 
This is a "tooth-brushing related" argument.

Anyway, this is the C point of view. OTOH, in @safe D, `p` cannot be 
`free`d. It may e.g. be a GC pointer.
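A small sketch of this point, assuming nothing beyond druntime's `core.stdc.stdlib.free`, which is a `@system` function: an `@safe` function can happily take a GC pointer, but an `@safe` body that calls `free` does not compile. `foo` is a hypothetical stand-in for the function in the quoted signature.

```d
import core.stdc.stdlib : free;

// Hypothetical stand-in for `foo`: accepts any int*, e.g. one from the GC.
int foo(int* p) @safe
{
    return *p;
}

// `free` is @system, so an @safe function literal that calls it is rejected.
static assert(!__traits(compiles, (int* p) @safe { free(p); }));

void main() @safe
{
    int* gcPtr = new int(5); // GC allocation; never passed to free
    assert(foo(gcPtr) == 5);
}
```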

If you want to allow an `@safe foo` to free its argument, you will have 
to encode in the type of that argument that it is a malloc'd pointer. 
There is just no way around that unless you say "in this language, every 
non-scope pointer comes from malloc". That would be a bad outcome.

The best way to do such an encoding is to have a struct wrapper around 
the pointer, have proper move semantics and a borrow checker that works 
well, and soundly, with data abstraction. In this case, the borrow 
checker actually makes a difference in `@safe` code. Otherwise it does not.
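A minimal sketch of the wrapper half of this, assuming a hypothetical `MallocBox` (not Phobos's `std.typecons.Unique` or any other existing type) restricted to plain data: copying is disabled, ownership moves with `std.algorithm.mutation.move`, and only the destructor calls `free`. It deliberately hands out values rather than references; lending out references safely is exactly the part that would need a borrow checker that works with this kind of abstraction.

```d
import core.stdc.stdlib : malloc, free;
import std.algorithm.mutation : move;
import std.traits : hasIndirections;

// Hypothetical move-only owner of one malloc'd value. The type itself records
// that the memory came from malloc, so only its destructor ever frees it.
struct MallocBox(T) if (__traits(isPOD, T) && !hasIndirections!T)
{
    private T* ptr;

    @disable this(this); // no implicit copies: ownership can only be moved

    static MallocBox make(T value) @trusted
    {
        auto p = cast(T*) malloc(T.sizeof);
        assert(p !is null, "out of memory");
        *p = value; // fine for plain data, which the constraint requires
        MallocBox box;
        box.ptr = p;
        return box;
    }

    ~this() @trusted
    {
        free(ptr); // free(null) is a no-op, so moved-from boxes are fine
        ptr = null;
    }

    bool empty() const @safe { return ptr is null; }

    T value() const @safe
    {
        assert(ptr !is null);
        return *ptr;
    }
}

void consume(MallocBox!int box) @safe
{
    // `box` is destroyed (and its memory freed) when this function returns.
}

void main()
{
    auto a = MallocBox!int.make(42);
    assert(a.value == 42);
    auto b = move(a);  // ownership transferred; `a` is reset to .init
    assert(a.empty && b.value == 42);
    consume(move(b));  // freed exactly once, inside `consume`
    assert(b.empty);
}
```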

> How would I know without reading the implementation? 
> (The documentation is always incomplete, wrong, or missing.) Annotations 
> give me confidence that I understand what it does. const/ref/scope here 
> answer my questions, and the compiler backs it up.
> ...

Your considerable experience with C contradicts your extensive 
experience with "multiple pointer types" and D's actual, existing 
DIP1000 and `const` design. I implore you to refine your position, 
otherwise it is simply internally inconsistent and hence allows you to 
dismiss any argument. This is very frustrating for an interlocutor.

Anyway, I agree that `const` and `scope` can be very useful in cases 
where they work. They are just not a panacea.

> 
>  > One thing I absolutely agree on with Robert is that it should always be
>  > _possible_ to write simple @safe D code without any advanced type system
>  > shenanigans. I think any design that strays from that principle is bad.
>  > This proposed change absolutely torpedoes that.
> 
> I agree with Robert, too. I asked him to prepare a list of his proposals 
> so I can see what can be done.
> ...

One concrete thing that can be done is to change course here. If you 
want to do a breaking change, do one that causes less pain and does not 
make D code more complicated by default.

> P.S. const class Objects are more or less unusable with the non-const 
> toString, toHash, opCmp and opEquals.
> ...

`const` class Objects are more or less unusable full stop. You can't 
even have a tail-const class reference.
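For illustration, a sketch with a hypothetical `Node` class: a `const` class reference cannot be rebound, because `const` covers the reference as well as the object. Phobos's `std.typecons.Rebindable` is the usual library workaround that emulates a tail-const reference.

```d
import std.typecons : Rebindable;

class Node
{
    int value;
    this(int v) { value = v; }
}

void main()
{
    const Node a = new Node(1);
    const Node b = new Node(2);

    const(Node) c = a;
    assert(c.value == 1);
    // c = b; // rejected: `const` covers the reference itself, not just the object

    auto r = Rebindable!(const Node)(a); // library emulation of tail-const
    r = b;                               // rebinding the reference is allowed...
    assert(r.value == 2);
    // r.value = 3; // ...but the object is still const, so this is rejected
}
```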

Yet `const` class Objects are exactly what this proposal is trying to 
impose on unsuspecting D programmers. It just does not work.

> P.P.S. all of D's annotations are subtractive. This means you can write 
> code without annotations and it'll work.

That's great, but it will sometimes not interoperate with code that has 
annotations, as in this case. Hence if you start imposing annotations on 
code, you lose this property. This would be a significant loss for the 
approachability of D, particularly as a first language. Furthermore, it 
is also a slap in the face to experienced D developers who have come to 
understand the limitations and proper applications of D's annotations.

> But safe, probably not.
> ...

I do not understand. Do you agree with Robert or not?

A big strength of D is that you can start out prototyping stuff with the 
GC without unnecessary annotation overhead and then often it will be 
good enough. If it is not, you can then explore different memory 
management options, surgically for the parts of the program state where 
that actually makes a difference. Only at this point is it then okay to 
expect people to annotate things if they want checked safety.

> P.P.P.S. I almost never write a multiple free bug these days. But that 
> doesn't translate to "don't need double free protection", as I spent 
> many years making that mistake and tracking them down. I even wrote my 
> own malloc/free debugger to help. Eventually, I simply internalized what 
> not to do. But that isn't a transferable skill. I can't even explain 
> what I do.
> ...

As I said many times, if you want `@live` to be a linter to avoid manual 
memory management bugs in `@system/@trusted` functions that avoid proper 
data abstraction with constructors and destructors, that is fine. But 
you cannot hold this position and at the same time turn around and claim 
it does anything for `@safe` reference counting. It just does not. A 
more careful approach is needed.

> Anyhow, thanks for the food for thought!
> 

My pleasure! Here is some more: Why did you not propose to add `pure` to 
the signatures? How about `@nogc`? `nothrow`? `@safe`? Why is `toHash` 
`@trusted nothrow`, but not other functions?


More information about the Digitalmars-d mailing list