Algorithms should be free from rich types

Wed Jun 28 00:48:19 UTC 2023

On Tue, Jun 27, 2023 at 02:53:59PM -0700, Ali Çehreli via Digitalmars-d wrote:
[...]
> First, an aside: You may remember my minor complaint about 'private'
> during a DConf presentation years ago. Today, I feel even stronger
> that disallowing access to parts of software "just because" of good
> design is a mistake. I've seen multiple examples of this in
> professional life where a developer uses 'private' only because it is
> "of course" better to do so. (The Turkish word "işgüzar" and the
> German word "verschlimmbessern" describe the situation pretty well for
> me but the English language lacks such a word.)

I can't resist me a Walter quote here:

	I've been around long enough to have seen an endless parade of
	magic new techniques du jour, most of which purport to remove
	the necessity of thought about your programming problem.  In the
	end they wind up contributing one or two pieces to the
	collective wisdom, and fade away in the rearview mirror. --
	Walter Bright

When you start doing something with the code because that's what
everybody else does, or because it's what everyone else says is "the
Right Thing(tm)", then it's just cargo-culting, which inevitably leads
to problems down the road.

> To give an example from D's ecosystem, the D runtime's garbage
> collector statistics object used to be 'private'. (I think there is an
> interface for it now.) What an inconvenience it was to copy/paste that
> type's definition from the runtime to user code, get the compiled
> symbol of the object from the library, and pointer cast it to be able
> to access the members! A 'static assert' attempts to protect the
> project from changes to that type...

Thing is, things like these usually come from temporary hacks in the
code that the original coder didn't want to set in stone, but that end
up staying put because of inertia and becoming de facto set in stone.

> The idea of 'private' should be to just give the developer freedom to
> change the implementation in the future. It should not impede use
> cases that people come up with. That can be achieved practically with
> an underscore: Make everything 'public' and name your implementation
> details with an underscore.  People who need them will surely know
> they are implementation details that can change in the future but they
> will be happy: They will get things done.

IOW, empower the user instead of straitjacketing them. My favorite
programming modus operandi. Along the same lines as my philosophy of
"everything should be a library, main() is just a convenient (thin)
interface to access the library API".

[...]
> The main topic here is about the harm caused by rich types surrounding
> algorithms. Let's say I am interested in using an open source
> algorithm that works with a memory area. (Not related to D.) We all
> know that a memory area can be described by a fat pointer like D's
> slices. So, that is what the algorithm should take.
>
> Unfortunately, the poor little algorithm is not free to be used: It is
> written to work with a custom type of that library; let's call it
> MySlice, which is produced by MyMemoryMappedFile, which is produced by
> MyFile, which is initialized only by types like MyFilePath. (I may
> have gotten the relationships wrong there.)

That's a sign of poorly-factored code. The logically-separate parts of
the code are not properly separated out, causing them to be dependent on
each other where they technically should not be.  Doing this right is
actually a lot harder than it looks; it often requires significant
amounts of refactoring after your initial implementation, because until
you write the thing out in code, it isn't always clear which parts are
actually dependent and which parts can be separated.

Idioms like pipeline programming with ranges help to identify
independent pieces of the logic, and abstractions like the range API
help you actually separate out the pieces in a clean way. Without a
unifying common API like ranges, it's pretty tough to write code in
composable pieces that can be freely mixed-and-matched with each other.

	https://wiki.dlang.org/Component_programming_with_ranges

Well, obviously you already know about this article, but one of my
motivations for writing that article was precisely what you describe
above.

> But my data is already in a memory area that I own! How can I call
> that algorithm? Should I write it to a file first and then use those
> rich types to access the algorithm? That should not be necessary...
> 
> Of course I understand the benefits of all those types but the core
> algorithm should be as free as possible. So, this is simply wrong. I
> think us, software developers, have been on the wrong path. Our task
> should primarily be about getting things done first.

Over the years, I've been dreaming about the ideal situation where
there would be libraries of algorithms that are not tied to a specific
implementation (i.e., bound to concrete types and parameter values), but
are written in a form that encapsulates only their core logic.  You'd then
pull in the algorithm by specifying which concrete type(s) to bind its
various parts to, and it'd Just Work(tm).  That's the way things should
have been from the beginning.
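
To sketch what I mean, here is roughly how such an algorithm could look
in C++ (since that's what the library in question is written in). This
is only an illustration under my own assumptions: the choice of a
Fletcher-16 checksum, the names `fletcher16` and `fletcher16_bytes`, and
the usage function are all made up for the example and don't come from
any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical core algorithm: Fletcher-16 checksum over any sequence
// of byte-like values. It only needs "something I can iterate over";
// it knows nothing about files, mappings, or paths.
template <typename InputIt>
std::uint16_t fletcher16(InputIt first, InputIt last) {
    std::uint32_t sum1 = 0, sum2 = 0;
    for (; first != last; ++first) {
        sum1 = (sum1 + static_cast<std::uint8_t>(*first)) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return static_cast<std::uint16_t>((sum2 << 8) | sum1);
}

// Convenience wrapper for the "fat pointer" case: pointer + length,
// i.e. a memory area the caller already owns.
inline std::uint16_t fletcher16_bytes(const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    return fletcher16(p, p + len);
}

// The caller binds the algorithm to whatever concrete type is at hand.
inline void usage_example() {
    std::string s = "abcde";
    std::vector<unsigned char> v(s.begin(), s.end());

    assert(fletcher16(s.begin(), s.end()) == 0xC8F0);      // std::string
    assert(fletcher16(v.begin(), v.end()) == 0xC8F0);      // std::vector
    assert(fletcher16_bytes(s.data(), s.size()) == 0xC8F0); // raw memory
}
```

The point is that the core loop is written once, and whether the bytes
live in a string, a vector, or a memory-mapped file is the caller's
business, not the algorithm's.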

But the situation today is far from that ideal: you have libraries that
solve some particular programming problem X, but to use the library's
solution you also need to use Y, Z, and W that the author of that
library happened to choose. For instance, the FreeType library
implements rasterization algorithms, but you can't access those
algorithms directly. You have to use the library API, which abstracts
away file handling, memory management, image type, etc. In order to
cater to different user needs, an entire complicated API is invented to
allow the user to specify certain parameters the authors deem tweakable,
while an elaborate scheme is designed to hide the rest of the
information away. You can't effectively use the rasterization algorithm
without also using all of these other peripheral types; and when you
need to interface FreeType with another library that uses other,
different concrete types, you end up having to write lots of shunt code
whose sole purpose is to bridge between incompatible types that actually
do equivalent things.
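
For illustration, the kind of shunt code I mean tends to look something
like the following. Both "libraries" and all the names here (`LibABuffer`,
`LibBSpan`, `to_lib_a`, `to_lib_b`) are hypothetical, invented purely
for this sketch; they aren't FreeType's types or anyone else's.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: how library A describes a block of bytes.
struct LibABuffer {
    const unsigned char* data;
    std::size_t size;
};

// Hypothetical: how library B describes the very same thing.
struct LibBSpan {
    const unsigned char* begin;
    const unsigned char* end;
};

// Shunt code: carries no logic of its own, it only re-labels the same
// bytes so that one library's output can be fed to the other.
inline LibBSpan to_lib_b(const LibABuffer& a) {
    return LibBSpan{a.data, a.data + a.size};
}

inline LibABuffer to_lib_a(const LibBSpan& b) {
    return LibABuffer{b.begin, static_cast<std::size_t>(b.end - b.begin)};
}

// Round trip: nothing is copied or computed, only repackaged.
inline void round_trip_example() {
    static const unsigned char bytes[] = {1, 2, 3, 4};
    LibABuffer a{bytes, sizeof bytes};
    LibABuffer back = to_lib_a(to_lib_b(a));
    assert(back.data == a.data && back.size == a.size);
}
```

Multiply that by every pair of libraries in a project and you get a lot
of code that does no actual work.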

> I could work with those types if they had virtual interfaces. But no.
> They are un-subtypable C++ 'class'es.
> 
> I think it could also work if the algorithm was templatized; but
> again, no...
[...]

In cases like this, I often get really tempted to copy-n-paste the code
and templatize it myself. :-D  Of course, that's rarely practical, so
the next best thing is to use D's compile-time
introspection capabilities to autogenerate boilerplate shunt code to
work around API infelicities in the target library, and export a nicer
API on the D side. :-D  Not always possible, of course, like in your
case, where you'd have to either copy-n-paste code and do un-@safe
casts, or live with infelicities like writing stuff to a file and
opening it via the official API.

(I had to do something similar once in my day job, interfacing with a
grossly over-engineered C++ framework that nobody fully understood nor
wanted anything to do with if they could help it -- I ended up having to
write a hack where a single function call involved 7 layers of
abstraction, one of which involved writing a struct to a temporary file
on one side of an RPC call and having the other side (a daemon process)
read from the file and cast it back to the struct.  The result was the
stuff of nightmares that, to everyone's great relief, was phased out a
couple of releases later. We relished every moment of typing `\rm -rf`
on that entire old codebase after its replacement became fully
functional.)

T

-- 
2+2=4. 2*2=4. 2^2=4. Therefore, +, *, and ^ are the same operation.