Algorithms should be free from rich types

H. S. Teoh hsteoh at qfbox.info
Wed Jul 5 18:15:57 UTC 2023


On Mon, Jul 03, 2023 at 10:14:38PM -0400, Steven Schveighoffer via Digitalmars-d wrote:
> On 7/3/23 3:27 PM, H. S. Teoh wrote:
[...]
> > As I said, the *ideal* is that you wouldn't have private state, or
> > that the private state would be minimal.  In practice, of course,
> > certain things *should* be private, and that's not a problem. The
> > problems the OP described arise when either private is used
> > carelessly, causing things to be private that really need not be, or
> > the API is poorly designed, so that parts of the library that ought
> > to be reusable aren't just because of some arbitrary decision made
> > by the author.
> 
> 
> If you carelessly label your fields as public, then realizing later
> they should have been private is costly, maybe impossible.

Depends.  D is flexible enough that public fields can be replaced with
access functions, and almost all downstream code doesn't have to change
to adapt to it.  I've done it a lot in my own code, where some field,
say mydata, was previously public but now needs to be private. No
problem: just rename it to _mydata, and create access functions
mydata() and mydata(typeof(_mydata)) to maintain compatibility with old
code.  Unless downstream code does something like take the address of
the old field, this change will be transparent: a recompile will make it
all work as before without requiring further changes.
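
For concreteness, here's a minimal sketch of that refactoring.  The
struct name Config and the int field type are made up for illustration;
only the mydata/_mydata naming comes from the description above.

```d
struct Config
{
    // Previously: `int mydata;` (a public field).
    private int _mydata;

    // Getter: downstream reads such as `c.mydata` keep compiling.
    @property typeof(_mydata) mydata() const
    {
        return _mydata;
    }

    // Setter: downstream writes such as `c.mydata = 5` now call this.
    @property void mydata(typeof(_mydata) value)
    {
        _mydata = value;
    }
}

unittest
{
    Config c;
    c.mydata = 42;            // same syntax as assigning to the old field
    assert(c.mydata == 42);   // same syntax as reading the old field
}
```

Code that did `&c.mydata` on the old field is the kind of thing that
would still need manual attention.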


> If you carelessly label your fields as private, while it might upset
> some people, making them public later is easy.

The point is that it then bottlenecks on the author. If the author is
not responsive for whatever reason (busy, abandoned the project, etc.),
downstream users are stuck up the creek without a paddle.


> So if you are going to "not care" about public/private, technically
> the less risky choice is to make everything private, and worry about
> it later if it becomes an issue. So in that sense I disagree with the
> OP point.

OK, I guess we differ on this point.  Given the choice between having to
wait for a potentially MIA author to fix an issue and having the ability
to go under the hood to manually work around the issue, I choose the
latter.


> That being said, I've done a lot of libs where I just don't care and
> leave everything public. It's mostly because I don't expect widespread
> usage, and I also don't mind breaking people's code (I don't think any
> of my projects that I started are past 1.0 yet). But something like
> Phobos shouldn't be so careless. We really should continue to make
> careless things private unless there is a good reason to make them
> public.

I guess this has to be judged on a case-by-case basis.


> > I've never heard people complaining about how the array length data
> > field is private, for example.  That's because it being private does
> > not hinder the user from doing whatever he wants to do with the
> > array (short of breaking the implementation and doing something
> > involving UB, of course).  That's an example of proper usage of
> > private.
> 
> It's an obvious example that we all can agree on. If we agree there
> are clearly cases where private is important, then we start working
> our way back to where the line should be drawn.

My personal criterion is, if something can be designed without private
(and without opening up holes that may allow user code to break stuff),
prefer that design.  Barring that, prefer the design that has the least
amount of private possible for it to work without opening up loopholes
for breakage.

In general, I don't quite agree with e.g. Java's approach of making
everything private by default and having only member functions mediate
access to private state.  My approach is to prefer POD types that hold
public data that anybody can safely mutate, and public functions that
operate on said POD types, rather than the closed-box approach advocated
by OO.

There's a time and place for the closed-box approach, of course. But in
my book, that's the less preferred option that you'd fall back on only
if you couldn't do it another way.  And even when you can't avoid the
closed-box approach, my preference is to minimize the degree of
closedness as much as possible.
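
To illustrate the kind of design I mean, here's a rough sketch.  The
names Vec2, norm and scaled are hypothetical, chosen only for this
example: a plain struct with public fields, plus free functions that
operate on it.

```d
import std.math : sqrt;

// Plain data: every combination of field values is a valid Vec2, so
// there is no invariant for user code to break by mutating it directly.
struct Vec2
{
    double x = 0;
    double y = 0;
}

// Free functions instead of member functions guarding private state.
double norm(Vec2 v)
{
    return sqrt(v.x * v.x + v.y * v.y);
}

Vec2 scaled(Vec2 v, double k)
{
    return Vec2(v.x * k, v.y * k);
}

unittest
{
    auto v = Vec2(3, 4);
    v.x = 6;                          // direct mutation is safe here
    v.y = 8;
    assert(v.norm == 10);             // UFCS call of a free function
    assert(v.scaled(0.5).norm == 5);
}
```

Thanks to UFCS the call sites read the same as method calls would, and
user code can add its own functions over Vec2 without needing anything
from the library author.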


> > An example of where private hinders what a user might wish to do is
> > an algorithm used internally by the library, that for whatever
> > reason is private and unusable outside of the library code, even
> > though the algorithm itself is general and can be applied outside of
> > the scope of the library.  Often in such cases there are immediate
> > pragmatic reasons for it -- the implementation of the algorithm is
> > bound to internal implementation details of other library code, for
> > example. So you can't actually make it public without also making
> > lots of things public that probably shouldn't be.  But at a higher
> > level, one asks the question, why is that algorithm implemented in
> > that way in the first place?  It could have been implemented
> > generically, and the library could have used just a specialized
> > instance of it to solve whatever it is it needs to solve, but the
> > algorithm itself should be available for user code to use.  *That's*
> > the proper design.
> 
> I agree that some things shouldn't be private. But what's the answer?
> When it should be public, just change it to public!

It's not always so simple, though.  The algorithm might have been
implemented in a way that depends on private types and internal
assumptions that may break in unforeseen ways if you use it without
realizing what the assumptions are.  Forcibly changing it to public may
require you to make other stuff public that shouldn't be.  Or it may be
written in a way that's tightly coupled to other internal library code,
such that you can't call it separately.

This gets particularly frustrating when the core of the algorithm itself
does *not* depend on these things, but the upstream author wrote it that
way because "it's private, so nobody cares if this code is dirty and
badly designed". Being able to hide bad code behind private encourages
this kind of one-off hack that avoids having to think about proper
code decomposition.


> An actual example of this in Phobos is the absence of a binary search
> algorithm. It's there, in SortedRange. But that implementation is
> private basically for no good reason (it can be trivially extracted
> into its own function). And SortedRange in itself is a schizophrenic
> meld of overbearing restrictions and puzzling allowances.

Yeah, that binary search function really ought to be public.

I think by now, experience has more than proven that SortedRange was a
mistake.  It was an attempt to encode the sortedness of a range in the
type system such that Phobos would be able to take advantage of this to
provide performance improvements, but D's type system simply isn't
powerful enough to express what's needed for this without unnecessary
limitations and the weird quirks you see in the current implementation
of SortedRange.

It was an interesting and ambitious experiment, but I think it has run
its course and the conclusion is that it doesn't work in the current
language. Or at least isn't pulling its own weight given its current
limitations.  Perhaps it's time to send it to the scrap yard.


> The only reason I haven't made a PR for it is I just made a copy in my
> own code and have moved on. But it would probably be pretty trivial to
> expose.
[...]

IMO, we should just get rid of SortedRange and make the binary search
algo a public function.
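
Something along these lines is what I have in mind.  This is only a
sketch, not the actual code inside SortedRange: lowerBoundIndex and
binaryContains are made-up names, and it assumes the range is already
sorted according to `<`.

```d
import std.range.primitives : hasLength, isRandomAccessRange;

/// Index of the first element of the sorted range `r` that is not less
/// than `value`, or `r.length` if there is no such element.
size_t lowerBoundIndex(R, T)(R r, T value)
if (isRandomAccessRange!R && hasLength!R)
{
    size_t lo = 0;
    size_t hi = r.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
        if (r[mid] < value)
            lo = mid + 1;   // answer lies strictly to the right of mid
        else
            hi = mid;       // r[mid] is a candidate; keep it in range
    }
    return lo;
}

/// True if `value` occurs in the sorted range `r`.
bool binaryContains(R, T)(R r, T value)
if (isRandomAccessRange!R && hasLength!R)
{
    immutable i = lowerBoundIndex(r, value);
    return i < r.length && !(value < r[i]);
}

unittest
{
    int[] a = [1, 3, 5, 7, 9];
    assert(lowerBoundIndex(a, 5) == 2);
    assert(lowerBoundIndex(a, 6) == 3);
    assert(binaryContains(a, 7));
    assert(!binaryContains(a, 4));
}
```

Today the way to get at this is to go through the wrapper type, e.g.
assumeSorted(a).contains(7).  A free function like the above works on
any random-access range with a length, with no wrapper involved.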

Or even if we don't get rid of SortedRange (breakage of existing code
and all that), I don't see why the binary search function shouldn't be
publicly available. This is exactly the kind of abuse of `private` I was
talking about: the function is clearly there and ready to use, but the
author for various reasons decided that no, you're not allowed to just
call the function, you have to jump through this here set of hoops to
prove your worthiness first.


T

-- 
My father told me I wasn't at all afraid of hard work. I could lie down right next to it and go to sleep. -- Walter Bright

