Are we getting better at designing programming languages?
JS
js.mdnq at gmail.com
Sat Jul 27 08:07:22 PDT 2013
On Friday, 26 July 2013 at 23:19:45 UTC, H. S. Teoh wrote:
> On Fri, Jul 26, 2013 at 03:02:32PM +0200, JS wrote:
>> I think the next step in languages is multi-level
>> abstraction.
>> Right now we have the base level core programming and the
>> preprocessing/template/generic level above that. There is no
>> reason why languages can't/shouldn't keep going. The ability to
>> control and help the compiler do its job better is the next
>> frontier.
>>
>> Analogous to how C++ allowed for abstraction of data, and
>> templates allow for abstraction of functionality, we then need
>> to abstract "templates" (or rather metaprogramming).
>
> There is much value to be had for working with the minimum
> possible
> subset of features that can achieve what you want with a
> minimum of
> hassle. The problem with going too far with abstraction is that
> you
> start living in an imaginary idealistic dreamworld that has
> nothing to
> do with how the hardware actually implements the stuff. You
> start
> writing some idealistic code and then wonder why it doesn't
> work, or why
> performance is so poor. As Knuth once said:
>
> By understanding a machine-oriented language, the programmer
> will tend to use a much more efficient method; it is much
> closer
> to reality. -- D. Knuth
>
> People who are more than casually interested in computers
> should
> have at least some idea of what the underlying hardware is
> like.
> Otherwise the programs they write will be pretty weird. -- D.
> Knuth
>
> If I ever had the chance to teach programming, my first course
> would be
> assembly language programming, followed by C, then by other
> languages,
> starting with a functional language (Lisp, Haskell, or
> Concurrent
> Clean), then other imperative languages, like Java. (Then
> finally I'll
> teach them D and tell them to forget about the others. :-P)
>
> Will I expect my students to write large applications in
> assembly
> language or C? Nope. But will I require them to pass the final
> exam in
> assembly language? YOU BETCHA. I had the benefit of learning
> assembly
> while I was still a teenager, and I can't tell you how much that
> experience has shaped my programming skills. Even though I
> haven't
> written a single line of assembly for at least 10 years,
> understanding
> what the machine ultimately runs gave me deep insights into why
> certain
> things are done in a certain way, and how to take advantage of
> that. It
> helps you build *useful* abstractions that map well to the
> underlying
> machine code, which therefore gives you good performance and
> overall
> behaviour while providing ease of use.
>
I used to program in assembly and loved it. The problem was that
one could not write large programs. Not because it is impossible
in assembly, but because there were few, if any, ways to abstract.
Large programs MUST be able to be broken into manageable pieces.
OOP is what allows the large programs of our times... Whether it
is done in assembly or not is irrelevant.
E.g., it is not impossible to think of an assembly-like
language (low level) that has many high-level concepts (classes,
templates, etc.) and a compiler that has many safety
features (type checking, code analysis, bounds checking, etc.)...
But what you end up with then is probably something similar to D
with every function's body written in asm.
> By contrast, my encounters with people who grew up with Java or
> Pascal
> consistently showed that most of them haven't the slightest
> clue how the
> machine even works, and as a result, just like Knuth said, they
> tend to
> have some pretty weird ideas about how to write their programs.
> They
> tend to build messy abstractions or idealistic abstractions
> that don't
> map well to the underlying hardware, and as a result, their
> programs are
> often bloated, needlessly complex, and run poorly.
>
>
> [...]
>> For example, why are there built in types?
>
> You need to learn assembly language to understand the answer to
> that
> one. ;-)
>
I spent several years in assembly when I was young... But you
need to go a step further. Electronics deals only with 1's and
0's... not nibbles, bytes, words, dwords, qwords, etc. These
groupings only help people, not computers.
Sure, we generally get the best performance from types whose
sizes are multiples of the bus width, but that is beside the
point. Many CPUs do have some notion of the basic types, but only
because those types are fixed in hardware and the hardware can't
predict the types you will want to create. For the most part this
doesn't matter, because all complex types are built from
fundamental types.
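As a rough illustration of a complex type being built from fundamental ones, here is a minimal C++ sketch of a 24-bit unsigned integer, a width most hardware has no native type for. The name UInt24 and its interface are made up for this example; it is one possible way to do it, not a claim about how any compiler or library handles such types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 24-bit unsigned integer, assembled from three bytes.
// The hardware knows nothing about it; the grouping exists for the programmer.
class UInt24 {
public:
    // Keep only the low 24 bits of the given value.
    constexpr UInt24(std::uint32_t v = 0)
        : bytes_{static_cast<std::uint8_t>(v & 0xFF),
                 static_cast<std::uint8_t>((v >> 8) & 0xFF),
                 static_cast<std::uint8_t>((v >> 16) & 0xFF)} {}

    // Reassemble the value from its bytes (least significant byte first).
    constexpr std::uint32_t value() const {
        return static_cast<std::uint32_t>(bytes_[0]) |
               (static_cast<std::uint32_t>(bytes_[1]) << 8) |
               (static_cast<std::uint32_t>(bytes_[2]) << 16);
    }

    // Addition wraps modulo 2^24, as a built-in unsigned type of that
    // width would.
    friend constexpr UInt24 operator+(UInt24 a, UInt24 b) {
        return UInt24(a.value() + b.value());
    }

    friend constexpr bool operator==(UInt24 a, UInt24 b) {
        return a.value() == b.value();
    }

private:
    std::uint8_t bytes_[3];
};

// Small self-check showing the type used like an ordinary integer.
inline void uint24_self_check() {
    UInt24 max(0xFFFFFF);
    assert(max + UInt24(1) == UInt24(0));          // wraps around
    assert((UInt24(5) + UInt24(7)).value() == 12); // ordinary addition
}
```

The compiler here only ever sees bytes and a 32-bit integer; everything that makes UInt24 a "type" is layered on top in software.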
BUT we are talking about compilers, not CPUs. Compilers are
software and can be written to "know" future types (by
modifying/informing the compiler).
Everything that can be done in any HLL can be done with 1's and
0's in a hex editor... in fact, it must be so or you end up with
a program that can't run.
This alone shows that an HLL exists only for abstraction, to make
life easier (essentially to mimic human thinking as well as it can).
The problem that I've always run across is that all compilers are
limited in some way that makes life harder, not easier.
Sometimes this is bugs, other times it is the lack of a simple
feature that would make things much easier to deal with.
D goes a long way in the feature set but seems to have a lot more
bugs than normal and has a few downsides. D is what got me back
into programming. I went into C# for a while and really like the
language (I find it very cohesive and well thought out) but
unfortunately do not want to be restricted to .NET (a great
library, and well put together too).
>> There is no inherent reason this is so except this allows
>> compilers to
>> achieve certain performance results...
>
> Nah... performance isn't the *only* reason. But like I said,
> you need to
> understand the foundations (i.e., assembly language) before you
> can
> understand why, to use a physical analogy, you can't just
> freely move
> load-bearing walls around.
Again, at the lowest level CPUs work on bits, nothing else. Most
CPUs are themselves abstracted for performance reasons, but that
doesn't change the fact.
By bits I do not mean a 1-bit computer, but simply a computer that
works on a bit stream with no fixed-size "word". Think of a
Turing machine.
>
>> but having a higher level of abstraction of meta programming
>> should
>> allow us to bridge the internals of the compiler more
>> effectively.
>
> Andrei mentions several times in TDPL that we programmers don't
> like
> artificial distinctions between built-in types and user-defined
> types,
> and I agree with that sentiment. Fortunately, things like alias
> this and
> opCast allow us to define user-defined types that, for all
> practical
> purposes, behave as though they were built-in types. This is a
> good
> thing, and we should push it to the logical conclusion: to allow
> user-defined types to be optimized in analogous ways to
> built-in types.
> That is something I've always found lacking in the languages I
> know, and
> something I'd love to explore, but given that we're trying to
> stabilize
> D2 right now, it isn't gonna happen in the near future.
>
> Maybe if we ever get to D3...
>
I think those are cool features, and it's the kinda stuff that
draws me to the language. Stuff that makes life easier rather
than harder. But all these concepts are simply "configuring" the
compiler to do things that traditionally they didn't do.
Alias this tells the compiler to treat a class as a specific
type, or to replace its usage with a function. Where did this
come from? C/C++ doesn't have it! Why? Because it either wasn't
thought of or wasn't thought of as useful. Those kinds of things
hold compilers back. If someone will use it, then it is useful. I
understand that in reality there are limitations, but when someone
makes the decision "No one will need this", everyone ultimately
suffers. It's a very egocentric decision that assumes that the
person knows everything. Luckily, Walter seems to have had the
foresight to avoid making such decisions.
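To make the alias this point concrete: C++ has no direct counterpart, but an implicit conversion operator gives a rough, partial approximation of "treat this class as that type". The sketch below is only an illustration under that assumption. The names Meters and twice are made up, and a conversion operator does not forward member access the way alias this does in D.

```cpp
#include <cassert>

// Hypothetical wrapper type: a length in meters that should be usable
// wherever a plain double is expected.
struct Meters {
    double value;

    // Implicit conversion: lets the compiler substitute `value` whenever a
    // double is required. This is only a partial stand-in for what
    // `alias value this;` asks of a D compiler.
    operator double() const { return value; }
};

// An ordinary function that knows nothing about Meters.
inline double twice(double x) { return 2.0 * x; }

// Small self-check showing the wrapper used as if it were a double.
inline void meters_self_check() {
    Meters m{1.5};
    assert(twice(m) == 3.0); // Meters passed where a double is expected
    assert(m + 0.5 == 2.0);  // built-in arithmetic applies after conversion
}
```

In both cases the programmer is "configuring" the compiler: the type itself says how it may be substituted, instead of every call site spelling it out.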
> Nevertheless, having said all that, if you truly want to make
> the
> machine dance, you gotta sing to its tune. In the old days, the
> saying
> was that premature optimization is the root of all evils. These
> days,
> I'd like to say, premature *generalization* is the root of all
> evils.
>
Sure. Ultimately someone designed the CPU in a certain way, and
for you to take advantage of all its potential you have to work
within the limitations/constraints they set up (which, sometimes,
are not fully known). Also, from a practical standpoint it is
useless to create a program in a language that can never be run,
though it is not theoretically useless.
It ultimately depends on the goals... I imagine when someone
wants to create the best they can, then sometimes it's very easy
to go overboard... sometimes the tools are simply not available
to reach the goal. But such attitudes are what push the
boundaries and generally pay off in the long run... without them
we would at the most still be using punch cards.
> I've seen software that suffered from premature
> generalization... It was
> a system that was essentially intended to be a nice interface
> to a
> database, with some periodic background monitoring functions.
> The
> person(s) who designed it decided to build this awesome generic
> framework with all sorts of fancy features. For example, users
> don't
> have to understand what SQL stands for, yet they can formulate
> complex
> queries by means of nicely-abstracted OO interfaces. Hey, OO is
> all the
> rage these days, so what can be better than to wrap SQL in OO
> in such a
> way that the user wouldn't even know it's SQL underneath? I
> mean, what
> if we wanted to switch to, oh, Berkeley DB one of these days?!
> But
> abstracting a database isn't good enough. There's also this
> incredible
> generic framework that handles timers and events, such that you
> don't
> have to understand what an event loop is and you can write
> event-driven
> code, just like that. Oh, and to run all of these complicated
> fancy
> features, we have to put it inside its own standalone daemon,
> so that if
> it crashes, we can use another super-powerful generic framework
> to
> handle crashes and automatically restart so that the user
> doesn't even
> have to know the database engine is crashing underneath him;
> the daemon
> will pick up the query and continue running it after it
> restarts! Isn't
> that cool? But of course, since it runs as a separate daemon,
> we have to
> use IPC to interface it with user code. It all makes total
> sense!
>
>
> ...
>
> After about 3 years worth of this, the system has become a giant
> behemoth, awesome (and awful) to behold, slumbering onwards in
> unyielding persistence, soaking up all RAM everywhere it can
> find any,
> and peaking at 99% CPU when you're not looking (gotta keep
> those savvy
> customers who know how to use 'top' happy, y'know?). The old OO
> abstraction layers for the database are mostly no longer used,
> nowadays
> we're just writing straight SQL anyway, but some core code
> still uses
> them, so we daren't delete them just yet. The resource
> acquisition code
> has mutated under the CPU's electromagnetic radiation, and has
> acquired
> 5 or 6 different ways of acquiring mutex locks, each written by
> different people who couldn't figure out how to use the previous
> person's code. None of these locks could be used simultaneously
> with
> each other, for they interact in mysterious, and often
> disastrous, ways.
> Adding more features to the daemon is a road through a
> minefield filled
> with the remains of less savvy C++ veterans.
>
> Then one day, I was called upon to implement something that
> required
> making an IPC call to this dying but stubbornly still-surviving
> daemon.
> Problem #1: the calling code was part of a C library that, due
> to the
> bloatedness of the superdooper generic framework, is completely
> isolated
> from it. Problem #2: as a result, I was not allowed to link the
> C++ IPC
> wrapper library to it, because that would pull in 8000+ IPC
> wrapper
> functions from that horrific auto-generated header file, which
> in turn
> requires linking in all the C++-based framework libraries,
> which in turn
> pulls in yet more subsidiary supporting libraries, which if you
> add it
> all up, adds about 600MB to the C library size. Which is Not
> Acceptable(tm). So what to do? Well, first write a separate
> library to
> handle interfacing with the 1 or 2 IPC calls that I can't do
> without, to
> keep the nasty ugly hacks in one place. Next, in this library,
> since we
> can't talk to the C++ part directly, write out function
> arguments using
> fwrite() into a temporary file, then fork() and exec() a C++
> wrapper
> executable that *can* link with the C++ IPC code. This wrapper
> executable then reads the temporary file and unpacks the
> function
> arguments, then hands them over to the IPC code that repacks
> them in the
> different format understood by the daemon, then sends it off.
> Inside the
> daemon, write more code to recognize this special request,
> unpack its
> arguments once again, then do some setup work (y'know, acquire
> those
> nasty mutexes, create some OO abstraction objects, the works),
> then
> actually call the real function that does the work. But we're
> not done;
> that function must return some results, so after carefully
> cleaning up
> after ourselves (making sure that the "RAII" objects are
> destructed in
> the right order to prevent nasty problems like deadlocks or
> double-free()'s), we repackage the function's return value and
> send it
> back over the IPC link. On the other end, the IPC library
> decodes that
> and returns it to the wrapper executable, which now must
> fwrite() it
> into another temporary file, and then exit with a specific exit
> code so
> that the C library that fork-and-exec'd it will know to look
> for the
> function results in the temporary file, so that it can read
> them back
> in, unpack them, then return to the original caller. This nasty
> piece of
> work was done EVERY SINGLE TIME AN IPC FUNCTION WAS CALLED.
>
> What's that you say? Performance is poor? Well, that's just
> because you
> need to upgrade to our new, latest-release, shiny hardware!
> We'll double
> the amount of RAM and the speed of the CPU -- we'll throw in an
> extra
> core or two, too -- and you'll be up and running in no time!
> Meanwhile,
> back in the R&D department (nicely insulated from customer
> support), I
> say to myself, gee I wonder why performance is so bad...
>
> After years of continual horrendous problems, nasty deadlock
> bugs,
> hair-pulling sessions, bugfixes that introduced yet more bugs
> because
> the whole thing has become a tower of cards, the PTBs finally
> were
> convinced that we needed to do something about it. Long story
> short, we
> trashed the ENTIRE C++ generic framework, and went back to using
> straight int's and char's and good ole single-threaded C code,
> with no
> IPCs or mutex RAII objects or 5-layer DB abstractions -- the
> result was
> a system at the most 20% of the size of the original, ran 5
> times
> faster, and was more flexible in handling DB queries than the
> previous
> system ever could.
>
> These days, whenever I hear the phrase "generic framework",
> esp. if it
> has "OO" in it, I roll my eyes and go home and happily work on
> my D code
> that deals directly with int's and char's. :)
>
> That's not to say that abstractions are worthless. On the
> contrary,
> having the *right* abstractions can be extremely powerful --
> things like
> D ranges, for example, literally revolutionized the way I write
> iterative code. The *wrong* abstractions, OTOH... let's just
> say it's on
> the path toward that proverbial minefield littered with the
> remains of
> less-than-savvy programmer-wannabes. What constitutes a *good*
> abstraction, though, while easy to define in general terms, is
> rather
> elusive in practice. It takes a lot of skill and experience to
> be able
> to come up with useful abstractions. Unfortunately, it's all
> too easy to
> come up with idealistic abstractions that actually detract,
> rather than
> add -- and people reinvent them all the time. The good thing is
> that
> usually other people will fail to see any value in them ('cos
> there is
> none) so they get quickly forgotten, like they should be. The
> bad thing
> is that they keep coming back through people who don't know
> better.
>
I agree, it is very hard, and I think that is why the compiler
must make such things easier. I think the problem is more often
that compilers get in the way, or make the abstractions difficult
to implement, and the resulting hacks and workarounds cause
problems down the road, rather than the other way around. While it
is true that it is difficult to abstract things and take into
account unforeseen events, a properly abstracted system should be
general enough to deal with them... when it's not,
then I'm sure it is much more difficult to rectify than a
concrete implementation.
Abstraction is difficult, requires a good memory, and the
intelligence to deal with the complexity involved... but the
rewards are well worth it. We, as a civilization, can't get
better at it without working at it. You can't expect everyone to
get it right all the time.
Most people are idiots, simple as that. You can't expect most
people to comprehend complex systems, or even have the desire to
do so if they are capable. Most people want to blindly apply a
familiar pattern that has worked before, because they don't know
any better. This is not necessarily bad except when the pattern
isn't the right one.
Even the really intelligent people who are capable of dealing with
the complexity can only do so for so long. A human brain, no
matter how good, can't deal with exponential factors... at some
point it will become too much to handle.
I'm one of those believers that at some point you have to scrap
the broken way and start afresh, having learned from your mistakes,
to make something better. This is what you guys did when you
implemented the simpler system... what you learned was that simple
was better.
One of the great things, though, is that breaking complexity into
simpler parts always arrives at a set of simple enough pieces
that can be dealt with. The problem is that someone still has to
understand all the complexity, more or less.
> Again I come back to Knuth's insightful quote -- to truly build
> a useful
> abstraction, you have to understand what it translates to. You
> have to
> understand how the machine works, and how all lower layers of
> abstraction work, before you can build something new *and*
> useful.
> That's why I said that in order to make the machine dance, you
> must sing
> its tune. You can't just pull an abstraction out of thin air
> and expect
> that it will all somehow work out in the end. Before we master
> the new
> abstractions introduced by D -- like ranges -- we're not really
> in the
> position to discover better abstractions that improve upon them.
>
I think we agree; that is basically what I was getting at above.
>> I don't see anything like this happening so depending on your
>> scale, I
>> don't think we are getting better, but just chasing our
>> tails... how
>> many more languages do we need that just change the syntax of
>> C++? Why
>> do people think syntax matters? Semantics is what is important
>> but
>> there seems to be little focus on it. Of course, we must
>> express
>> semantics through syntax so for practical purposes it matters
>> to some
>> degree.... But not nearly as much as the number of programming
>> languages suggests.
>
> Actually, I would say that in terms of semantics, it all
> ultimately maps
> to Turing machines anyway, so ultimately it's all the same. You
> can
> write anything in assembly language. (Or, to quote Larry Wall's
> tongue-in-cheek way of putting it: you can write assembly code
> in any
> language. :-P) That's already been known and done.
>
Yes! So the only difference is what the language itself has to
offer to make abstraction easier... if we didn't want
abstraction we would just write in 0's and 1's... or, if we had
the memory and intelligence, that would be the easiest way (instead
of waiting for multiple alias this's ;))
> What matters is, what kind of abstractions can we build on top
> of it,
> that allows us maximum expressivity and usability? The seeming
> simplicity of Turing machines (or assembly language) belies the
> astounding computational power hidden behind the apparent
> simplicity.
> The goal of programming language design is to discover ways of
> spanning
> this range of computational power in a way that's easy to
> understand,
> easy to use, and efficient to run in hardware. Syntax is part
> of the
> "easy to use" equation, a small part, but no less important
> (ever tried
> writing code in lambda calculus?).
>
> The harder part is the balancing act between expressiveness and
> implementability (i.e. efficiency, or more generally, how to
> make your
> computation run in a reasonable amount of time with a
> reasonable amount
> of space -- a program that can solve all your problems is
> useless if it
> will take 10 billion years to produce the answer; so is a
> program that
> requires more memory than you can afford to buy). That's where
> the
> abstractions come in -- what kind of abstractions will you
> have, and how
> well do they map to the underlying machine? It's all too easy
> to think
> in terms of the former, and neglect the latter -- you end up
> with
> something that works perfectly in theory but requires
> unreasonable
> amounts of time/memory in practice or is just a plain mess when
> mapped
> to actual hardware.
>
>
One of the most useful aspects of programming, and what makes it
so powerful, is the ability to leverage what others have done.
Unfortunately, what happens is that people write the same old
stuff other people have already written. This problem has gotten
better over the last few years, with the internet making it
easier, but it is still an issue.
Just think of all the man-hours wasted due to people writing the
same code, debugging code because of bugs, or giving up because
they didn't have the code they needed (but it existed). Think
how much further along we'd be if things had been optimal from
the get-go.
I don't mind people making mistakes in the theoretical vs
practical tug of war... because I think the theoretical is what
pushes boundaries and the practical is what strengthens them.
Much of recent mathematics is purely theoretical, which in turn
has fueled practical things... mathematics started out purely
practical. Maybe this ebb and flow is a natural thing in life.
But I believe that just because something is practical doesn't
mean it is best. For example, suppose you are able to design a
programming language that somehow is provably better than all
other languages combined. The problem is, it requires a new CPU
design that is difficult and expensive to create. Should the
language not be used because it is not practical? Of course not.
Luckily, all of humanity does not have to stop while such things
are being built.
We need people pushing the boundaries to keep us from spinning
our wheels and we need people keeping the wheels turning.
More information about the Digitalmars-d
mailing list