Top 5

Sat Oct 11 11:58:44 PDT 2008

Benji Smith wrote:
> Sascha Katzner wrote:
>> Sergey Gromov wrote:
>>> This is the whole point.  The benchmark is valid because it performs
>>> the same *task*, and the task is somewhat close to real world.  It
>>> measures *time*, which is universal.  The compared languages use
>>> different approaches and techniques to achieve the goal; that's why the
>>> benchmark is useful.  It helps justify the usefulness of these
>>> languages for a particular class of tasks.
>>
>> My point was that it is *not* the same task both programs perform. The
>> D version has to do a lot more because it accounts for multi-byte
>> codepoints in UTF-8, but the Java version doesn't account for surrogate
>> pairs. I bet if you simply scanned byte-wise through the D UTF-8 array for
>> whitespace without converting it to UTF-32, it would perform even
>> better, but that wouldn't be a fair comparison either. ;-)
>>
>> It's like removing all runtime security checks and
>> exception code from a program and benchmarking it against the original
>> version... it simply doesn't make much sense. ;-)
> 
> And my whole point was that Java's design decision to always use 
> two-byte characters is a superior choice, since performance is not an 
> issue, and since having a single character type makes the programmer's 
> life a helluva lot simpler.
> 
> The D design makes things pointlessly complex, and now you want brownie 
> points for dealing with that pointless complexity?
> 

Many C libraries work on char arrays instead of wchar_t arrays.

This could be tackled by defaulting string literals to wstring, but
either way, the string/wstring division cannot be removed.

I agree that wstring is easier to work with if you expect non-English text.
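Even a UTF-16 string type such as Java's String (or D's wstring) does not give one code unit per character: characters outside the Basic Multilingual Plane are stored as a surrogate pair, which is the case Sascha notes the Java benchmark ignores. The following is a minimal Java sketch with hypothetical class and method names, not code from either benchmark. It contrasts counting chars with counting code points, and shows that a surrogate-aware whitespace scan has to decode pairs, much as the D version decodes UTF-8.

```java
// Hypothetical class; a minimal illustration, not either benchmark's code.
public class Utf16Demo {
    // Counts UTF-16 code units: what String.length() and charAt() see.
    static int countChars(String s) {
        return s.length();
    }

    // Counts Unicode code points: each surrogate pair counts once.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Counts whitespace-separated words, testing whole code points so the
    // check is surrogate-aware rather than looking at lone chars.
    static int countWords(String s) {
        int words = 0;
        boolean inWord = false;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // decodes a surrogate pair if present
            if (Character.isWhitespace(cp)) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                words++;
            }
            i += Character.charCount(cp);       // advance 1 or 2 chars
        }
        return words;
    }

    public static void main(String[] args) {
        // U+1D11E (musical symbol G clef) lies outside the BMP, so Java
        // stores it as two chars.
        String s = "clef \uD834\uDD1E";
        System.out.println(countChars(s));      // 7
        System.out.println(countCodePoints(s)); // 6
        System.out.println(countWords(s));      // 2
    }
}
```

Java 8 and later also provide String.codePoints(), which yields the same code points as the explicit loop.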

> And, btw, you *can't* scan bytewise through a D string to find space 
> characters, because the value '32' can occur as the 
> least-significant-byte in a multi-byte non-whitespace character. Any 
> code that iterates bytewise through a char[] array is fundamentally broken.
> 
> But D's strings *look* like they can be iterated byte-by-byte, because 
> they're arrays. And all other kinds of arrays in D can be iterated that 
> way. You can't retrieve a long value from an int array, because it 
> doesn't make sense. And it doesn't make sense to foreach through a 
> collection of dchars in a char[] array.
> 
> The purpose of this benchmark is not to show Java's speed advantage 
> (because my primary concern with string processing is not speed). The 
> purpose was to show that the speed justifications for D's wonky design 
> are not valid.
> 
> D strings are a trainwreck not because of a few milliseconds of 
> execution time. They're a trainwreck because they break the rules of the 
> language.
> 
> --benji
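Sascha's byte-wise scan can be sketched in Java as well. The sketch assumes the input is valid UTF-8 and that only ASCII whitespace (space, tab, newline, carriage return) separates words; the class and method names are hypothetical. In UTF-8 every byte of a multi-byte sequence has its high bit set (lead bytes 0xC2 to 0xF4, continuation bytes 0x80 to 0xBF), so the byte 32 can only encode a real space. The value 32 can appear as one byte of a UTF-16 code unit, for example in U+0120, but not inside a UTF-8 multi-byte sequence.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes valid UTF-8 input and that only ASCII
// whitespace separates words.
public class Utf8ByteScan {
    static boolean isAsciiSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // Counts words by scanning raw bytes without decoding. This is safe for
    // ASCII delimiters because every byte of a multi-byte UTF-8 sequence is
    // >= 0x80 and so can never equal an ASCII whitespace byte.
    static int countWords(byte[] utf8) {
        int words = 0;
        boolean inWord = false;
        for (byte b : utf8) {
            if (isAsciiSpace(b)) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // U+0120 (LATIN CAPITAL LETTER G WITH DOT ABOVE) is 0xC4 0xA0 in
        // UTF-8; in UTF-16LE its low byte would be 0x20, but not here.
        byte[] text = "caf\u00e9 \u0120 na\u00efve".getBytes(StandardCharsets.UTF_8);
        System.out.println(countWords(text)); // 3
    }
}
```

What the byte-wise scan gives up is non-ASCII whitespace such as U+3000 IDEOGRAPHIC SPACE, which would require decoding; that is why it would not be a like-for-like comparison with a decoding version.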