Algorithms should be free from rich types

Mon Jul 3 19:27:45 UTC 2023

On Mon, Jul 03, 2023 at 02:30:14PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 7/3/23 2:05 PM, H. S. Teoh wrote:
[...]
> > I think we all agree that the mechanics of this won't (and
> > shouldn't) change. But I think the OP was arguing at a higher level
> > of abstraction.  It isn't so much about whether private should be
> > overridable or not, or even whether some piece of data in an object
> > should be private or not; the question IMO is whether the library
> > could have been designed in such a way that there's no *need* for
> > private data in the first place. Or at least, the need for such is
> > minimized.
> > 
> > A library with tons of private state and only a rudimentary public
> > API is generally more likely to have situations where the user will
> > be left wishing that there were a couple more knobs to turn that can
> > be used to customize the library's behaviour.
> 
> But that's the thing, there are parts that *simply must be private*.
> No matter how you cut it, it has to have some level of privacy,
> because otherwise, you can't enforce semantic invariants with the
> type.
> 
> Should array length (not the property, but the actual data field) be
> public?  What about the pointer? Of course not. Yet, you still might
> want to access those things for some reason. That doesn't mean it's
> worth a change to public just for that one reason.

We're actually agreeing with each other, y'know. :-D

As I said, the *ideal* is that you wouldn't have private state, or that
the private state would be minimal.  In practice, of course, certain
things *should* be private, and that's not a problem. The problems the
OP described arise either when private is used carelessly, so that
things are private that really need not be, or when the API is poorly
designed, so that parts of the library that ought to be reusable
aren't, just because of some arbitrary decision made by the author.

I've never heard people complaining about how the array length data
field is private, for example.  That's because it being private does not
hinder the user from doing whatever he wants to do with the array (short
of breaking the implementation and doing something involving UB, of
course).  That's an example of proper usage of private.

An example of where private hinders what a user might wish to do is an
algorithm used internally by the library, that for whatever reason is
private and unusable outside of the library code, even though the
algorithm itself is general and can be applied outside of the scope of
the library.  Often in such cases there are immediate pragmatic reasons
for it -- the implementation of the algorithm is bound to internal
implementation details of other library code, for example. So you can't
actually make it public without also making lots of things public that
probably shouldn't be.  But at a higher level, one has to ask: why is
that algorithm implemented that way in the first place?  It could have
been implemented generically, with the library using just a specialized
instance of it to solve whatever it needs to solve, and the algorithm
itself available for user code to use.  *That's* the proper design.
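
To make that concrete, here's a rough sketch in D of the shape I have in
mind. Everything in it is made up for illustration: `mergeSpans`,
`FreeBlock`, `coalesce` and `Meeting` are hypothetical names, not
anything from Phobos or any real library. The assumption is a library
that needs to coalesce adjacent free blocks internally; the merging
logic itself knows nothing about blocks, so it can be public and generic
while the block type stays private.

```d
import std.algorithm.comparison : max;
import std.range.primitives : ElementType, isInputRange;

/// Generic and public: merges overlapping or touching spans from a range
/// that is already sorted by start. `start` and `end` extract the bounds
/// of an element; `make` builds a merged element from two bounds.
auto mergeSpans(alias start, alias end, alias make, R)(R sorted)
if (isInputRange!R)
{
    ElementType!R[] result;
    foreach (item; sorted)
    {
        if (result.length && start(item) <= end(result[$ - 1]))
            // Overlaps (or touches) the previous span: extend it.
            result[$ - 1] = make(start(result[$ - 1]),
                                 max(end(result[$ - 1]), end(item)));
        else
            result ~= item;
    }
    return result;
}

// Hypothetical internal type. In a real library this would be private to
// its module; it sits in the same file here only so the sketch compiles
// on its own.
struct FreeBlock { size_t offset, length; }

// The library's own use is just one specialized instance of the algorithm.
FreeBlock[] coalesce(FreeBlock[] sortedByOffset)
{
    return mergeSpans!(b => b.offset,
                       b => b.offset + b.length,
                       (s, e) => FreeBlock(s, e - s))(sortedByOffset);
}

// Hypothetical user-side type that has nothing to do with the library.
struct Meeting { int from, to; }

void main()
{
    auto blocks = coalesce([FreeBlock(0, 4), FreeBlock(4, 4), FreeBlock(16, 8)]);
    assert(blocks == [FreeBlock(0, 8), FreeBlock(16, 8)]);

    // User code reuses the same algorithm without touching FreeBlock.
    auto busy = mergeSpans!(m => m.from, m => m.to, (s, e) => Meeting(s, e))
                          ([Meeting(9, 11), Meeting(10, 12), Meeting(14, 15)]);
    assert(busy == [Meeting(9, 12), Meeting(14, 15)]);
}
```

The point isn't this particular algorithm; it's that `FreeBlock` can
stay as private as the author likes without taking the merging logic
down with it.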

But alas, all too often this is not done, and you end up with five
different implementations of the same algorithm, each with different
quirks (and often, different subsets of bugs).  All of them are locked
up behind `private`, or require as argument some tangential private
structure that isn't constructible except via a long-winded, circuitous
route that probably doesn't do what the user actually wants, even though
the algorithm itself doesn't actually depend on that structure.
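
The "tangential structure" case tends to look something like the sketch
below. Again, this is purely illustrative: `Packet` and `checksum` are
hypothetical, standing in for a type that user code can't readily
construct. I've used Adler-32 as the stand-in algorithm only because
it's short; the version that takes a plain range of bytes is usable by
anybody, and the `Packet` overload becomes a one-line wrapper.

```d
import std.range.primitives : ElementType, isInputRange;
import std.string : representation;

/// Generic: Adler-32 over any input range of bytes. It needs nothing
/// beyond the bytes themselves.
uint adler32(R)(R bytes)
if (isInputRange!R && is(ElementType!R : ubyte))
{
    uint a = 1, b = 0;
    foreach (ubyte x; bytes)
    {
        a = (a + x) % 65_521;
        b = (b + a) % 65_521;
    }
    return (b << 16) | a;
}

// Hypothetical library type; imagine user code can only obtain one from
// a live connection.
struct Packet { private immutable(ubyte)[] payload; }

// The rich-type entry point is just a thin wrapper over the generic one.
uint checksum(Packet p) { return adler32(p.payload); }

void main()
{
    // User code can checksum arbitrary bytes without ever building a Packet.
    // `representation` gives the raw bytes of the string (no autodecoding).
    assert(adler32("Wikipedia".representation) == 0x11E60398);
}
```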

Ultimately these details are just the incidental symptoms. The
underlying root cause is a poor design that doesn't correctly decouple
orthogonal functionality into reusable pieces.

--T