Algorithms should be free from rich types

Fri Jun 30 16:33:31 UTC 2023

On Fri, Jun 30, 2023 at 02:41:00PM +0000, bachmeier via Digitalmars-d wrote:
> On Friday, 30 June 2023 at 11:07:33 UTC, Atila Neves wrote:
> 
> > API design is indeed hard. Which makes it all the more imperative to
> > not accidentally design one with implementation details that users
> > downstream start depending on. That is: API design needs to be a
> > conscious opt-in decision and not "I guess I didn't think about the
> > consequences of leaving the door to my flat open all the time and
> > now there are people camping in my living room".
> 
> Private is more like locking everyone else's doors for their own
> safety. In the cases that it keeps an intruder out, it was helpful to
> them. When grandma had to sleep on the sidewalk, not so much. Many
> times library authors have prevented me from doing my work because of
> arbitrarily preventing access to implementation details. I should have
> the option to override those decisions. If something blows up, or if
> my code gets broken in the future, it's my fault, because I was the
> one that made that decision.

The thing is, both of the above are true.

Private does have its uses: to hide implementation details from
unrelated parts of the code so that, especially in a large project with
many contributors, you don't end up with accidental dependencies between
parts of the code that really shouldn't depend on each other. Hairball
dependencies among unrelated modules are a major source of
unmaintainability in large projects, and preventing them goes a long way
toward reducing long-term maintenance costs.

The other side to this, however, is that deciding what should be private
and what shouldn't is a hard problem, and most people either can't
figure it out, or can't be bothered to put in the effort to get it
right, so they slap private on everything, making it hard to reuse their
code outside of the narrow confines of how they initially envisioned it.
So you end up with an API that covers the most common use cases but not
others, which causes a lot of frustration when downstream code wants to
do something the API doesn't allow, and its author has to resort to
copy-pasta or breaking private. (See: API design is hard.)

Most people design APIs around how they envision the module would be (or
ought to be) used, at a relatively high level of abstraction, without
regard to the core algorithms that would be used to implement it. Call
this a "use-centric API".  Contrary to popular belief, this is
actually a mistake.  It frequently leads to the situation where a useful
algorithm that might benefit other parts of the code gets locked behind
the private implementation of the module, because it doesn't directly
map to the external API. This in turn promotes code duplication: if my
module also needs some variant of the same algorithm, I have to
copy-n-paste it or re-implement it from scratch in my own module --
usually also behind `private`, so the next person that comes along will
need to do it again. It actually *reduces* code reuse. It also fosters
the desire to break private: I realize that the algorithm is already
implemented, so I wish I could break private in order to avoid rewriting
it myself.

A better approach is an algorithm-centric API design: in the course of
implementing a module (or library), identify the core algorithms that
solve the main problems that the module/library is trying to solve, and
design the API around exposing these algorithms to user code.  Then on top
of that, add some syntactic sugar that maps them to the module's
high-level usage (the use-centric API). There may still be private parts
(internal details of the algorithms that the user really doesn't need to
know), but these are confined to things that outside code truly doesn't
need to know, not a blanket default that may unintentionally exclude
certain unusual, but valid, use cases.
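
To make that concrete, here is a minimal sketch, written in Python only
for brevity (the idea doesn't depend on the language). The names
`sliding_windows`, `moving_average` and `moving_max` are made up for
illustration and don't come from any real library. The core algorithm is
the public API, and the use-centric call is a thin wrapper over it:

```python
from collections import deque
from itertools import islice


def sliding_windows(items, size):
    """Core algorithm, deliberately public: yield each run of `size`
    consecutive items as a tuple."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    window = deque(islice(it, size), maxlen=size)
    if len(window) == size:
        yield tuple(window)
    for item in it:
        window.append(item)  # maxlen drops the oldest item automatically
        yield tuple(window)


def moving_average(values, size):
    """Syntactic sugar (the use-centric API): the common case, built
    entirely out of the public core algorithm."""
    return [sum(w) / size for w in sliding_windows(values, size)]


# A use the module author never planned for still works, because the
# algorithm itself was never locked away:
def moving_max(values, size):
    return [max(w) for w in sliding_windows(values, size)]
```

Had `sliding_windows` been a private helper behind `moving_average`,
whoever needed `moving_max` would have had to copy it or rewrite it.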

There is an important philosophical difference between these two
approaches. The first approach tends towards the philosophy of "you have
problem X, no problem, hand it over to us (the library), we'll perform
the magic to solve it, and we'll give you back the result Y". The method
of solution is opaque and hidden from user code. IOW, the hood is welded
shut; your only recourse in case of problems is to take it back to the
dealer (the library author). The second approach has the philosophy "you
have problem X, we (the library) will give you tools A, B, C, that you
can use to solve problem X. In addition, we provide you special combo D
(syntactic sugar functions) that will solve X the usual way without you
having to figure out how to combine A, B, and C in the right way." The
hood is open and you may fiddle with the things inside if you know what
you're doing. But most of the time you won't need to -- the syntactic
sugar functions handle the most common use cases for you.
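
A rough sketch of the second philosophy, again in Python and with
entirely hypothetical names (`tokenize`, `count_items`, `top_n`,
`most_common_words`, `most_common_pairs`): three small tools A, B, C,
plus a combo D that wires them together for the usual case.

```python
import re
from collections import Counter


def tokenize(text):
    """Tool A: split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def count_items(items):
    """Tool B: count occurrences of any hashable items."""
    return Counter(items)


def top_n(counts, n):
    """Tool C: the n most frequent items, most frequent first."""
    return [item for item, _ in counts.most_common(n)]


def most_common_words(text, n):
    """Combo D: solves the usual problem without the caller having to
    know how A, B and C fit together."""
    return top_n(count_items(tokenize(text)), n)


# The hood is open: a caller who needs word pairs instead of single
# words recombines the same tools rather than re-implementing them.
def most_common_pairs(text, n):
    words = tokenize(text)
    return top_n(count_items(zip(words, words[1:])), n)
```

Most callers would only ever touch `most_common_words`; the point is
that `most_common_pairs` can be written downstream without asking the
library author for anything.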

The first approach empowers the library writer, the second approach
empowers the user.  My argument is that the second approach is superior.
No abstraction is perfect (otherwise it wouldn't be an abstraction!);
there will always be cases where you need to go under the hood and do
something the library author didn't envision initially. Give the user
the tools to do so without breaking encapsulation, instead of forcing
them to come back to you for help.

T

-- 
Claiming that your operating system is the best in the world because more people use it is like saying McDonalds makes the best food in the world. -- Carl B. Constantine