Top 5

Benji Smith dlanguage at benjismith.net
Sat Oct 11 11:29:51 PDT 2008


Sascha Katzner wrote:
> Benji Smith wrote:
>> Actually, when it comes to string processing, D is decidedly *not* a 
>> "performance language".
>>
>> Compared to...say...Java (which gets a bum rap around here for being 
>> slow), D is nothing special when it comes to string processing speed.
>>
>> I've attached a couple of benchmarks, implemented in both Java and D 
>> (the "shakespeare.txt" file I'm benchmarking against is from the 
>> Gutenberg project. It's about 5 MB, and you can grab it from here: 
>> http://www.gutenberg.org/dirs/etext94/shaks12.txt )
>>
>> In some of those benchmarks, D is slightly faster. In some of them, 
>> Java is a lot faster. Overall, on my machine, the D code runs in about 
>> 12.5 seconds, and the Java code runs in about 2.5 seconds.
>>
>> Keep in mind, all Java characters are two bytes wide. And you can't 
>> access a character directly. You have to retrieve it from the String 
>> object, using the charAt() method. And splitting a string creates a 
>> new object for every fragment.
>>
>> I admire the goal in D to be a performance language, but it drives me 
>> crazy when people use performance as justification for an inferior 
>> design, when other languages that use the superior design also 
>> accomplish superior performance.
> 
> I think your benchmark is not very meaningful. Without going into 
> implementation details of Tango (because I don't use Tango) here are 
> some notes:
> 
> - The D version uses UTF8 strings whereas the Java version uses 
> "wanna-be" UTF16 (Java has a lot of problems with surrogates). This 
> means you are comparing apples with pears (D has to *parse* a UTF8 
> string and Java simply uses a wchar array without proper surrogate 
> handling in *many* cases).
> 
> - At least in runCharIterateTest() you additionally convert the D UTF8 
> string into a UTF32 string; in the Java version you did not do this.
> 
> - The StringBuilder in the Java version is *much* faster because it 
> doesn't have to allocate a new memory block in each step. You can use a 
> similar class in D too, without the need of a special string class/object.
> 
> ...
> 
> LLAP,
> Sascha

Nonsense!

The benchmark is valid because I use the best string processing tools 
that each language provides. If D had anything like a StringBuilder, I 
would use it. If D had any way of iterating over the characters in a 
string without converting them to UTF-32, I'd use that too.
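
For readers without the attachments, here is a rough sketch of the two Java idioms being discussed: iterating with charAt() and accumulating output in a StringBuilder. This is not the attached benchmark code; the class name StringIdioms and the methods countLetters and joinWords are hypothetical names chosen for illustration only.

```java
public class StringIdioms {
    // Walk the string one UTF-16 code unit at a time via charAt().
    // Note: charAt() returns code units, not full code points, so
    // characters outside the BMP show up as two surrogate values.
    static int countLetters(String text) {
        int letters = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetter(text.charAt(i))) {
                letters++;
            }
        }
        return letters;
    }

    // Split on single spaces (each fragment is a new String object),
    // then rebuild the text in one growing buffer instead of creating
    // a fresh String on every concatenation.
    static String joinWords(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (String word : text.split(" ")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "To be, or not to be";
        System.out.println(countLetters(line));
        System.out.println(joinWords(line));
    }
}
```

The StringBuilder is pre-sized to the input length, so in this sketch the buffer should not need to grow while appending.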

People argue that D string processing uses these funky idioms for 
performance reasons, and that using a more elegant design, with objects 
and polymorphism, would be hopelessly slow. I'm just showing that those 
idioms don't actually provide the performance that people claim.

--benji



More information about the Digitalmars-d mailing list