Found on proggit: Krug, a new experimental programming language, compiler written in D

IntegratedDimensions IntegratedDimensions at gmail.com
Tue May 1 17:13:13 UTC 2018


On Friday, 27 April 2018 at 00:03:34 UTC, H. S. Teoh wrote:
> On Thu, Apr 26, 2018 at 07:14:17PM -0400, Nick Sabalausky 
> (Abscissa) via Digitalmars-d wrote:
>> On 04/26/2018 06:47 PM, H. S. Teoh wrote:
>> > 
>> > If "less is more" were universally true, we'd be programming 
>> > in BF instead of D.  :-O  (Since, after all, it's 
>> > Turing-complete, which is all anybody really needs. :-P)
>> > 
>> 
>> Yea. Speaking of which, I wish more CS students were taught 
>> the inherent limitations of "Turing-complete" vs (for 
>> example) "Big-O". There's faaaar too many people being taught 
>> "Turing-complete means it can do anything" which, of course, 
>> is complete and total bunk in more (important) ways than one.
>
> Actually, Turing-complete *does* mean it can do anything... 
> well, anything that can be done by a machine, that is.  There 
> are inherently unsolvable problems that no amount of 
> Turing-completeness will help you with.
>
> The problem, however, lies in how *practical* a particular form 
> of Turing-completeness is.  You wouldn't want to write a GUI 
> app with lambda calculus, for example, even though in theory 
> you *could*. (It would probably take several lifetimes and an 
> eternity of therapy afterwards, but hey, we're talking theory 
> here. :-P)  Just like you *could* write basically anything in 
> machine language, but it's simply not practical in this day and 
> age.
>
>
> And actually, speaking of Big-O, one thing that bugs me all the 
> time is that the constant factor in front of the Big-O term is 
> rarely considered.  When it comes to performance, the constant 
> factor *does* matter.  You can have O(log n) for your 
> algorithm, but if the constant in front is 1e+12, then my 
> O(n^2) algorithm with a small constant in front will still beat 
> yours by a mile for small to medium sized use cases.  The O(log 
> n) won't mean squat unless the size of the problem you're 
> solving is commensurate with the constant factor in the front. 
> You can sort 5 integers with an on-disk B-tree and rest assured 
> that you have a "superior" algorithm, but my in-memory bubble 
> sort will still beat yours any day.  The size of the problem 
> matters.
>
> Not to mention constant-time setup costs, which are even more 
> frequently disregarded when it comes to algorithm analysis.  You 
> may have an O(n^1.9) algorithm that's supposedly superior to my 
> O(n^2) algorithm, but if it takes 2 days to set up the data 
> structures required to run your O(n^1.9) algorithm, then my 
> O(n^2) algorithm is still superior (until the problem size 
> becomes large enough that it will take more than 2 days to 
> compute).  And if your O(n^1.9) algorithm has a setup time of 10 
> years, then it might as well be O(2^n) for all I care; it's 
> pretty much useless in practice, theoretical superiority be 
> damned.
>
> And that's not even beginning to consider practical factors 
> like the hardware you're running on, and why the 
> theoretically-superior O(1) hash is in practice inferior to 
> supposedly inferior algorithms that nevertheless run faster 
> because they are cache-coherent, whereas hashing essentially 
> throws caching out the window.  Thankfully, recently there's 
> been a slew of papers on cache-aware and cache-oblivious 
> algorithms that reflect reality more closely than ivory-tower 
> Big-O analyses that disregard reality.
>
>
>> I see the same thing in other areas of CS, too, like parser 
>> theory. The formal CS material makes it sound as if LR parsing 
>> is more or less every bit as powerful as LL (and they often 
>> straight-up say so in no uncertain terms), but then they all 
>> gloss over the fact that: That's ONLY true for "detecting 
>> whether an input does or doesn't match the grammar", which is 
>> probably the single most UNIMPORTANT characteristic to 
>> consider when ACTUALLY PARSING.  Outside of the worthless 
>> "does X input satisfy Y grammar: yes or no" bubble, LL-family 
>> is vastly more powerful than LR-family, but you'd never know 
>> it going by CS texts (and certainly not from those 
>> legendary-yet-overrated Dragon texts).
>
> Well, LR parsing is useful for writing compilers that tell you 
> "congratulations, you have successfully written a program 
> without syntax errors!".  What's that?  Where's the executable?
>  Sorry, I don't know what that word means.  And what?  Which 
> line did the syntax error occur in?  Who knows!  That's your 
> problem, my job is just to approve or reject the program in its 
> entirety! :-P
>
> (And don't get me started on computability theory courses where 
> the sole purpose is to explore the structure of the hierarchy 
> of unsolvable problems.  I mean, OK, it's kinda useful to know 
> when something is unsolvable (i.e., when not to waste your time 
> trying to do something that's impossible), but seriously, what 
> is even the point of the tons of research that has gone into 
> discerning entire *hierarchies* of unsolvability?!  I recall 
> taking a course where the entire term consisted of proving 
> things about the length of proofs. No, we weren't actually 
> *writing* proofs. We were proving things *about* proofs, and I 
> might add, the majority of the "proofs" we worked with were of 
> infinite length.  I'm sure that it will prove (ha!) to be 
> extremely important in the next iPhone release, which will also 
> cure world hunger and solve world peace, but I'm a bit fuzzy on 
> the details, so don't quote me on that.  :-P)
>
>
> T

The point of Big O is to capture the most dominant rate of growth 
(asymptotic behavior). In mathematics, one only cares about n as 
it approaches infinity, so any constant factor is eventually 
dwarfed. Strictly speaking, in the theory it is "incorrect" to 
carry any extraneous terms at all.
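
For reference, that is exactly what the usual definition does: 
any constant in front gets absorbed into the witness constant c, 
so it never shows up in the O term itself:

  f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \text{ such that }
      |f(n)| \le c\,|g(n)| \text{ for all } n \ge n_0

  \text{e.g. } 10^{12} n^2 + 5n \in O(n^2), \text{ witnessed by }
      c = 10^{12} + 5,\ n_0 = 1.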

In CS, it is used to approximate running time for large n so that 
different algorithms can be compared in the "long run". Since 
computers cannot process infinite n, n is always finite and 
generally relatively small (e.g., less than 10^100, which is tiny 
compared to infinity).
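
To make the finite-n point concrete, here is a small D sketch 
using the hypothetical constants from the example upthread (1e12 
in front of the log term, 5 in front of the quadratic); it just 
prints which cost model is cheaper at each size:

import std.math : log2;
import std.stdio : writefln;

void main()
{
    // Hypothetical cost models from the example upthread:
    // "fast" algorithm ~ 1e12 * log2(n), "slow" algorithm ~ 5 * n^2.
    double fast(double n) { return 1e12 * log2(n); }
    double slow(double n) { return 5.0 * n * n; }

    double n = 10;
    foreach (exp; 1 .. 9)
    {
        writefln("n = 10^%s: log-n cost = %.3g, n^2 cost = %.3g -> %s wins",
                 exp, fast(n), slow(n),
                 fast(n) < slow(n) ? "O(log n)" : "O(n^2)");
        n *= 10;
    }
}

With those constants the "inferior" O(n^2) model wins for every n 
up to somewhere between 10^6 and 10^7; only past that does the 
asymptotic winner take over.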

So the failure you are pointing out is really a failure of 
application. In some cases the constant factor matters and in 
other cases it doesn't. Since the comparison depends on n, and we 
cannot use the simplification that n is infinite, it matters what 
n actually is. This is why it is also important to know which 
algorithms do better for small n: if n stays small during program 
use, one might otherwise be using the wrong algorithm.
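
As a minimal sketch of picking the algorithm by n (the names and 
the 32-element cutoff here are purely illustrative; one would 
tune the threshold by measurement), fall back to a small-constant 
O(n^2) insertion sort below a size threshold and only use the 
asymptotically better algorithm above it:

import std.algorithm.sorting : sort;

// Hypothetical cutoff; in practice one would tune this by measurement.
enum smallCutoff = 32;

// Plain O(n^2) insertion sort: tiny constant factor, great for small n.
void insertionSort(int[] a)
{
    foreach (i; 1 .. a.length)
    {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key)
        {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}

// Pick the algorithm by n: asymptotics only win once n is big enough.
void hybridSort(int[] a)
{
    if (a.length <= smallCutoff)
        insertionSort(a);   // small n: the constant factor dominates
    else
        a.sort();           // large n: O(n log n) pays off
}

Most production sorts do something along these lines for short 
subranges, precisely because the constant factor dominates there.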

But once one goes down this road, one has to count not ideal 
cycles but real cycles, and include all the other factors 
involved. Big O was never meant to give real-world estimates, 
because it lives in a mathematical domain. It may or may not work 
well depending on how carefully it is used, sorta like 
statistics. Generally, though, it is such a great simplification 
tool that it works for many processes that are well behaved.

Ideally, one would count the exact number of cycles used by an 
algorithm and normalize them to some standard cycle that could be 
compared across different architectures. Machines can do the 
accounting easily. Even then, plenty of anomalous behavior would 
still show up, but it would be more accurate, although possibly 
not much more informative, than Big O.
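
As a rough sketch of letting the machine do the accounting, here 
is wall-clock measurement with std.datetime.stopwatch in D (true 
cycle counts would need architecture-specific counters such as 
rdtsc, so take this as the portable approximation):

import std.algorithm.sorting : sort;
import std.array : array;
import std.datetime.stopwatch : benchmark;
import std.range : iota, retro;
import std.stdio : writeln;

void main()
{
    // A deliberately unfavourable input: 10_000 integers in reverse order.
    auto data = iota(0, 10_000).retro.array;

    void runLibrarySort()
    {
        auto copy = data.dup;
        copy.sort();                        // O(n log n) library sort
    }

    void runBubbleSort()
    {
        auto copy = data.dup;               // O(n^2) bubble sort
        bool swapped = true;
        while (swapped)
        {
            swapped = false;
            foreach (i; 1 .. copy.length)
                if (copy[i - 1] > copy[i])
                {
                    auto tmp = copy[i - 1];
                    copy[i - 1] = copy[i];
                    copy[i] = tmp;
                    swapped = true;
                }
        }
    }

    // Let the machine do the accounting: run each candidate 3 times.
    auto results = benchmark!(runLibrarySort, runBubbleSort)(3);
    writeln("library sort, 3 runs: ", results[0]);
    writeln("bubble sort,  3 runs: ", results[1]);
}

It measures time rather than a normalized cycle count, so the 
numbers still depend on the machine, but it makes constant 
factors and setup costs visible in a way the bare O term never 
will.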




