Top 5
KennyTM~
kennytm at gmail.com
Sat Oct 11 11:58:44 PDT 2008
Benji Smith wrote:
> Sascha Katzner wrote:
>> Sergey Gromov wrote:
>>> This is the whole point. The benchmark is valid because it performs
>>> the same *task*, and the task is somewhat close to real world. It
>>> measures *time*, which is universal. The compared languages use
>>> different approaches and techniques to achieve the goal, that's why
>>> benchmark is useful. It allows to justify usefulness of these
>>> languages for a particular class of tasks.
>>
>> My point was, that it is *not* the same task both programs perform. The
>> D version has to do a lot more because it accounts for multi-byte
>> codepoints in UTF8, but the Java version doesn't account for surrogate
>> pairs. I bet if you simply scan byte-wise through the D UTF8 array for
>> whitespaces without converting them to UTF32 it would perform even
>> better, but that wouldn't be a fair comparison either. ;-)
>>
>> It's like if you would remove all runtime security checks and
>> exception code from a program and benchmark it against the original
>> version... it simply doesn't make much sense. ;-)
>
> And my whole point was that Java's design decision to always use
> two-byte characters is a superior choice, since performance is not an
> issue, and since having a single character type makes the programmer's
> life a helluva lot simpler.
>
> The D design makes things pointlessly complex, and now you want brownie
> points for dealing with that pointless complexity?
>
Many C libraries work on char arrays instead of wchar_t arrays.
This could be addressed by defaulting string literals to wstring, but
either way the string/wstring division cannot be eliminated.
I agree that wstring is easier to work with if you expect non-English text.
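
That said, 16-bit code units do not make every character a single
unit, which is the surrogate-pair issue Sascha raised. A minimal Java
sketch of the difference follows. The class name SurrogateDemo and its
helper methods are made up for illustration; only String.length() and
String.codePointCount() are real library calls.

```java
public class SurrogateDemo {
    // Number of 16-bit char units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting each surrogate pair once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', then U+1D538 (outside the BMP, so stored as a surrogate
        // pair), then 'b'.
        String s = "a\uD835\uDD38b";
        System.out.println(codeUnits(s) + " code units, "
                + codePoints(s) + " code points");
    }
}
```

The two counts differ as soon as the text leaves the Basic Multilingual
Plane, so code that indexes by char can still land in the middle of a
character.
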
> And, btw, you *can't* scan bytewise through a D string to find space
> characters, because the value '32' can occur as the
> least-significant-byte in a multi-byte non-whitespace character. Any
> code that iterates bytewise through a char[] array is fundamentally broken.
>
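As far as the UTF-8 encoding itself goes, every byte of a multi-byte
sequence has its high bit set, so the byte value 32 should only ever be
an actual ASCII space. A small Java sketch to check this follows. The
class name Utf8SpaceScan and its methods are hypothetical, and it only
looks at U+0020, not at other Unicode whitespace.

```java
import java.nio.charset.StandardCharsets;

public class Utf8SpaceScan {
    // Count bytes equal to 0x20 in the UTF-8 encoding, without decoding.
    static int spacesByByteScan(String s) {
        int n = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (b == 0x20) {
                n++;
            }
        }
        return n;
    }

    // Count U+0020 code points after full decoding, for comparison.
    static int spacesByCodePoint(String s) {
        return (int) s.codePoints().filter(cp -> cp == ' ').count();
    }

    public static void main(String[] args) {
        // U+0120 and U+4E20 both have 0x20 as the low byte of their code
        // point, but their UTF-8 encodings contain no 0x20 byte.
        String s = "d\u00E9j\u00E0 vu \u0120 \u4E20";
        System.out.println(spacesByByteScan(s) + " by byte scan, "
                + spacesByCodePoint(s) + " by code point");
    }
}
```

Both counts should agree for any input. The same does not hold for
UTF-16 or UTF-32 data viewed as raw bytes, where 0x20 can show up as
half of a larger code unit.
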
> But D's strings *look* like they can be iterated byte-by-byte, because
> they're arrays. And all other kinds of arrays in D can be iterated that
> way. You can't retrieve a long value from an int array, because it
> doesn't make sense. And it doesn't make sense to foreach through a
> collection of dchars in a char[] array.
>
> The purpose of this benchmark is not to show Java's speed advantage
> (because my primary concern with string processing is not speed). The
> purpose was to show that the speed justifications for D's wonky design
> are not valid.
>
> D strings are a trainwreck not because of a few milliseconds of
> execution time. They're a trainwreck because they break the rules of the
> language.
>
> --benji