Inherent code performance advantages of D over C?

Sat Dec 14 09:39:51 PST 2013

On Fri, Dec 13, 2013 at 06:01:10PM -0800, Walter Bright wrote:
> On 12/13/2013 6:52 AM, Dicebot wrote:
[...]
> >>3. unicode
> >
> >Number one on my list of advantages. Does not apply to plenty of
> >projects though.
> 
> It applies more than you might think. Most D apps will be inherently
> unicode correct. Very, very few C programs are unless the programmer
> went to some effort to make it so. And, take a look at all those
> miserable UNICODE macros in windows.h.

Yeah, in C, you have to proactively write code to be unicode-compatible.
And I daresay a very large percentage of C code is *not*, and it's a
major effort to make it compatible.

> >>4. wchar_t that is actually usable
> >
> >Same as (3)
> 
> Almost nobody uses wchar_t in C code because it is unusable. Windows
> uses it for the "W" api functions, but surrogate pairs are broken in
> just about every C program, because C has no idea what a surrogate
> pair is. Furthermore, you're just fscked if you try to port wchar_t
> code from Windows to Linux, you're looking at line-by-line rewrite of
> all of that code.

I tried writing wchar_t code before. I tried making it portable by
following only the wchar_t functions described in official C standards.
I discovered that the C standards are incomplete w.r.t. wchar_t: there
are many unspecified and underspecified areas, such as, to take a major
example, the non-commitment to providing some means to ensure a Unicode
locale. In every official doc that I can find, it depends on setting
locale strings, the interpretation of which is "implementation-
dependent" (i.e., you're on your own). There isn't even a way to
reliably check whether you're currently in a Unicode locale (y'know,
when you give up on trying to set the locale string yourself and
(questionably) rely on the user to do it, but your code assumes UTF-8
and you need a way to detect an incompatible locale setting). And the
semantics of many wchar_t functions are vague and underspecified
("depends on locale setting"), and some key functions are missing, or
wrapped behind very inconvenient APIs (*ahem*mbstowcs*wcstombs*cough*).

Long story short, let's just say that even writing wchar_t code from
scratch is a royal pain in the neck, *and* there's no guarantee the end
product will actually work correctly. Unless you reinvent the wheel,
disregard wchar_t, and rewrite your own UTF-8 implementation. Don't even
speak of converting an existing C program to wchar_t.
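
For a sense of what "rewrite your own UTF-8 implementation" involves, here
is a minimal sketch in C of a single-code-point decoder that needs neither
wchar_t nor the locale. The name utf8_decode and its signature are
hypothetical, made up for illustration; this is not code from any
particular project, and a real implementation would also want encoding,
iteration, and error-recovery helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from the bytes s[0..len).
 * Returns the number of bytes consumed (1 to 4), or 0 if the input is
 * empty, truncated, or not well-formed UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (len == 0)
        return 0;

    unsigned char b0 = s[0];
    size_t need;   /* number of continuation bytes expected */
    uint32_t cp;   /* code point being assembled */
    uint32_t min;  /* smallest value allowed for this sequence length */

    if (b0 < 0x80) {
        *out = b0;                 /* plain ASCII */
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {
        need = 1; cp = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
        need = 2; cp = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
        need = 3; cp = b0 & 0x07; min = 0x10000;
    } else {
        return 0;                  /* stray continuation or invalid lead byte */
    }

    if (len < need + 1)
        return 0;                  /* truncated sequence */

    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;              /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min)
        return 0;                  /* overlong encoding */
    if (cp > 0x10FFFF)
        return 0;                  /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                  /* surrogates are not valid in UTF-8 */

    *out = cp;
    return need + 1;
}
```

Because the function looks only at the bytes it is handed, its result does
not change with setlocale(), which is the property the wchar_t functions
lack.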

I was so scarred from the experience that when I saw that D supported
unicode natively, I was totally sold.

[...]
> >>6. no global errno being set by the math library functions
> >This has made me smile :) It shows how different applications we have
> >in mind speaking about "C domain".
> 
> Do you mean people don't do math in C apps?

Weird. Most of my personal projects (originally C/C++, now D) are
math-related. :)

[...]
> >>12. forward referencing (no need to declare everything twice)
> >Not an issue. C programmers are not tired from typing.
> 
> C programs tend to be written "bottom up" to avoid forward
> references. This is not convenient.

I still do that even in D programs, because DMD's handling of forward
references is, shall we say, quirky? It works most of the time, but
sometimes you get odd errors because certain symbol resolution
algorithms used by dmd will produce unexpected results if you don't
declare certain symbols beforehand. So it's not completely order-free,
but also not completely order-dependent, but something nebulous in
between. Me, I play it safe and just write things the C way, so that I
never run into these kinds of issues.

[...]
> >>20. no global locale madness
> >(no idea what this means)
> 
> strtod's behavior (for example) is dependent on the current locale.
> The fact that you didn't know this is indicative of its problems.
> This is not the only locale-dependent behavior. C has a number of
> issues with global state.

Yeah, like errno, one of the ugliest hacks to be made an official
standard.

And the entire wchar_t train-wreck, every bit of which is officially
declared "locale-dependent", meaning the functions change their behaviour
depending on the locale string you set, and of course the locale strings
themselves are "implementation-dependent", so there's basically zero
commitment to make portable code possible at all. Sure, to make your
program truly portable you do have to invest some effort into it, but
given the amount of ugliness you have to endure to work with wchar_t in
the first place, you might as well just reinvent your own UTF
implementation from scratch (the API would be cleaner, for one thing).

And that's not even scratching the surface of functions like strtod,
which Walter mentioned: almost everyone *assumes* they work a certain
way, but they may produce unexpected results once you insert a
setlocale() call into your program. Action-at-a-distance FTW.
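
To make the strtod point concrete, here is a minimal sketch in C. The
helper parse_under_locale is hypothetical, written only for this
illustration: it switches LC_NUMERIC to a named locale, calls strtod,
restores the previous setting, and reports how many characters strtod
consumed. Which locale names exist is system-specific, so the helper
returns -1 when setlocale() rejects the name.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse `text` with strtod while LC_NUMERIC is set
 * to `locale_name`, then restore the previous LC_NUMERIC setting.
 * Returns the number of characters strtod consumed, or -1 if the locale
 * could not be selected. */
int parse_under_locale(const char *locale_name, const char *text,
                       double *value)
{
    assert(locale_name != NULL && text != NULL && value != NULL);

    /* setlocale(category, NULL) queries the current locale. The returned
     * string may be overwritten by later setlocale calls, so copy it. */
    const char *current = setlocale(LC_NUMERIC, NULL);
    char saved[256];
    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return -1;              /* locale not available on this system */

    char *end;
    *value = strtod(text, &end);

    setlocale(LC_NUMERIC, saved);   /* undo the global change */
    return (int)(end - text);
}
```

Under the "C" locale, parsing "3,14" stops at the comma, so one character
is consumed and the value is 3.0. On a system where a locale with a comma
decimal separator is installed (for example "de_DE.UTF-8" on many glibc
systems, though that name is an assumption and may not exist everywhere),
the same call would typically consume the whole string, while "3.14" would
stop at the period instead. The caller's string did not change; only the
global state did.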

T

-- 
Do not reason with the unreasonable; you lose by definition.