Top 5

Sergey Gromov snake.scaly at gmail.com
Sat Oct 11 05:49:34 PDT 2008


Sat, 11 Oct 2008 12:16:43 +0200,
Sascha Katzner wrote:
> Benji Smith wrote:
> > Actually, when it comes to string processing, D is decidedly *not* a 
> > "performance language".
> > 
> > Compared to...say...Java (which gets a bum rap around here for being 
> > slow), D is nothing special when it comes to string processing speed.
> > 
> > I've attached a couple of benchmarks, implemented in both Java and D 
> > (the "shakespeare.txt" file I'm benchmarking against is from the 
> > Gutenberg project. It's about 5 MB, and you can grab it from here: 
> > http://www.gutenberg.org/dirs/etext94/shaks12.txt )
> > 
> > In some of those benchmarks, D is slightly faster. In some of them, Java 
> > is a lot faster. Overall, on my machine, the D code runs in about 12.5 
> > seconds, and the Java code runs in about 2.5 seconds.
> > 
> > Keep in mind, all java characters are two-bytes wide. And you can't 
> > access a character directly. You have to retrieve it from the String 
> > object, using the charAt() method. And splitting a string creates a new 
> > object for every fragment.
> > 
> > I admire the goal in D to be a performance language, but it drives me 
> > crazy when people use performance as justification for an inferior 
> > design, when other languages that use the superior design also 
> > accomplish superior performance.
> 
> I think your benchmark is not very meaningful. Without going into 
> implementation details of Tango (because I don't use Tango) here are 
> some notes:
> 
> - The D version uses UTF8 strings whereas the Java version uses 
> "wanna-be" UTF16 (Java has a lot of problems with surrogates). This 
> means you are comparing apples with pears (D has to *parse* a UTF8 
> string and Java simply uses a wchar array without proper surrogate 
> handling in *many* cases).

This is the whole point.  The benchmark is valid because it performs the 
same *task*, and the task is somewhat close to real world.  It measures 
*time*, which is universal.  The compared languages use different 
approaches and techniques to achieve the goal; that's why the benchmark 
is useful.  It lets you judge how useful these languages are for a 
particular class of tasks.
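
To make the encoding difference from the quoted note concrete, here is a small Java sketch. It is only an illustration and is not taken from Benji's benchmark; the class and method names are made up for the example. It walks the same string once per UTF-16 code unit (what a plain charAt() loop sees) and once per code point, and also reports how many bytes a UTF-8 representation would need.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the benchmark discussed in the thread.
class SurrogateDemo {
    // Counts UTF-16 code units: a charAt() loop visits each surrogate half separately.
    static int countCodeUnits(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            s.charAt(i);
            n++;
        }
        return n;
    }

    // Counts code points: a surrogate pair is read as one character.
    static int countCodePoints(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // advances by 1 or 2 code units
            n++;
        }
        return n;
    }

    // Number of bytes the same text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'a', U+1D11E (outside the Basic Multilingual Plane), 'b'
        String s = "a\uD834\uDD1Eb";
        System.out.println("code units:  " + countCodeUnits(s));
        System.out.println("code points: " + countCodePoints(s));
        System.out.println("UTF-8 bytes: " + utf8Length(s));
    }
}
```

The charAt() loop does less work per step, but it only matches the number of real characters when the text contains no surrogate pairs.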

> - At least in runCharIterateTest() you additionally convert the D UTF8 
> string into a UTF32 string; in the Java version you did not do this.

Same as above.  If they were using the same approach there wouldn't be 
much to benchmark.  Why don't you mention, for instance, that Java runs 
on a virtual machine?

> - The StringBuilder in the Java version is *much* faster because it 
> doesn't have to allocate a new memory block in each step. You can use a 
> similar class in D too, without the need of a special string class/object.

I agree here.  Both the word-splitting tango.text.Util.split and 
runConcatenateTest use the default array appending, which is currently 
dead slow.  Benji, to actually compare the speed of string operations, 
you'd better use one of the array builders discussed in this group.
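
The builder point can be sketched on the Java side of the comparison. This is a minimal illustration under stated assumptions, not the benchmark code and not one of the D array builders mentioned above; the class name, method names, and word count are hypothetical. It builds the same string by repeated concatenation and by appending to a StringBuilder, then prints rough timings.

```java
import java.util.Arrays;

// Hypothetical demo class; the sizes are arbitrary and chosen only for illustration.
class ConcatDemo {
    // Repeated +=: each step creates a new String and copies the old contents.
    static String concatNaive(String[] words) {
        String out = "";
        for (String w : words) {
            out += w;
        }
        return out;
    }

    // StringBuilder: appends into a growable buffer, copying only when it must grow.
    static String concatBuilder(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = new String[20000];
        Arrays.fill(words, "word ");

        long t0 = System.nanoTime();
        String a = concatNaive(words);
        long t1 = System.nanoTime();
        String b = concatBuilder(words);
        long t2 = System.nanoTime();

        // Timings are rough and vary by machine and JVM warm-up.
        System.out.println("naive:   " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("builder: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same result: " + a.equals(b));
    }
}
```

Both methods produce identical output, so any timing gap comes from how the buffer grows rather than from the string operation being measured.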


