Top 5
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sat Oct 11 07:05:26 PDT 2008
Sergey Gromov wrote:
> Sat, 11 Oct 2008 12:16:43 +0200,
> Sascha Katzner wrote:
>> Benji Smith wrote:
>>> Actually, when it comes to string processing, D is decidedly *not* a
>>> "performance language".
>>>
>>> Compared to...say...Java (which gets a bum rap around here for being
>>> slow), D is nothing special when it comes to string processing speed.
>>>
>>> I've attached a couple of benchmarks, implemented in both Java and D
>>> (the "shakespeare.txt" file I'm benchmarking against is from the
>>> Gutenberg project. It's about 5 MB, and you can grab it from here:
>>> http://www.gutenberg.org/dirs/etext94/shaks12.txt )
>>>
>>> In some of those benchmarks, D is slightly faster. In some of them, Java
>>> is a lot faster. Overall, on my machine, the D code runs in about 12.5
>>> seconds, and the Java code runs in about 2.5 seconds.
>>>
>>> Keep in mind, all Java characters are two bytes wide. And you can't
>>> access a character directly. You have to retrieve it from the String
>>> object, using the charAt() method. And splitting a string creates a new
>>> object for every fragment.
>>>
>>> I admire the goal in D to be a performance language, but it drives me
>>> crazy when people use performance as justification for an inferior
>>> design, when other languages that use the superior design also
>>> accomplish superior performance.
>> I think your benchmark is not very meaningful. Without going into
>> implementation details of Tango (because I don't use Tango) here are
>> some notes:
>>
>> - The D version uses UTF8 strings whereas the Java version uses
>> "wanna-be" UTF16 (Java has a lot of problems with surrogates). This
>> means you are comparing apples with pears (D has to *parse* a UTF8
>> string and Java simply uses a wchar array without proper surrogate
>> handling in *many* cases).
>
> This is the whole point. The benchmark is valid because it performs the
> same *task*, and the task is somewhat close to real world. It measures
> *time*, which is universal. The compared languages use different
> approaches and techniques to achieve the goal, which is why the
> benchmark is useful. It lets you judge how suitable these languages
> are for a particular class of tasks.
>
>> - At least in runCharIterateTest() you additionally convert the D UTF8
>> string into a UTF32 string; the Java version does not do this.
>
> Same as above. If they were using the same approach there wouldn't be
> much to benchmark. Why don't you mention, for instance, that Java is a
> virtual machine?
>
>> - The StringBuilder in the Java version is *much* faster because it
>> doesn't have to allocate a new memory block in each step. You can use a
>> similar class in D too, without the need of a special string class/object.
>
> I agree here. Both tango.text.Util.split and runConcatenateTest use
> default array appending, which is currently dead slow. Benji, to
> actually compare the speed of string operations you'd better use one
> of the array builders discussed in this group.

If anyone wants to try it, I'm pasting the draft version of Appender
from std.array below.

Andrei

struct Appender(A : T[], T)
{
    private T[] * pArray;
    private size_t _capacity;

    this(T[] * p)
    {
        pArray = p;
        if (!pArray) pArray = (new typeof(*pArray)[1]).ptr;
        _capacity = .capacity(pArray.ptr) / T.sizeof;
    }

    T[] data()
    {
        return pArray ? *pArray : null;
    }

    size_t capacity() const { return _capacity; }

    void write(T item)
    {
        if (!pArray) pArray = (new typeof(*pArray)[1]).ptr;
        if (pArray.length < _capacity)
        {
            // Should do in-place construction here
            pArray.ptr[pArray.length] = item;
            *pArray = pArray.ptr[0 .. pArray.length + 1];
        }
        else
        {
            // Time to reallocate, do it and cache capacity
            *pArray ~= item;
            _capacity = .capacity(pArray.ptr) / T.sizeof;
        }
    }

    static if (is(const(T) : T))
    {
        alias const(T) AcceptedElementType;
    }
    else
    {
        alias T AcceptedElementType;
    }

    void write(AcceptedElementType[] items)
    {
        for (; !items.empty(); items.next()) {
            write(items.head());
        }
    }

    static if (is(const(T) == const(char))) {
        void write(in wchar wc) { assert(false); }
        void write(in wchar[] wcs)
        {
            encode!(T)(wcs, *this);
        }
        void write(in dchar dc) { assert(false); }
        void write(in dchar[] dcs)
        {
            encode!(T)(dcs, *this);
        }
    }

    void clear()
    {
        if (!pArray) return;
        pArray.length = 0;
        _capacity = .capacity(pArray.ptr) / T.sizeof;
    }
}

auto appender(T)(T[] * t)
{
    Appender!(T[]) r = Appender!(T[])(t);
    return r;
}
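
For anyone reading this with a current compiler: the draft above depends on
library internals from the time (the global capacity() call and the
head()/next() range primitives), so it probably won't build as pasted. The
same idea can be sketched against the std.array.appender found in present-day
Phobos, which uses put() where the draft uses write(). The helper names
repeatWithAppender and repeatWithConcat are made up for this illustration,
and it checks equivalence only; it measures nothing.

```d
import std.array : appender;
import std.stdio : writeln;

// Hypothetical helper: builds `piece` repeated `count` times through an
// appender, which caches capacity instead of querying it on every append.
string repeatWithAppender(string piece, size_t count)
{
    auto buf = appender!string();
    buf.reserve(piece.length * count); // optional up-front allocation
    foreach (i; 0 .. count)
        buf.put(piece);
    return buf.data;
}

// Hypothetical helper: same result through the built-in ~= operator,
// the approach the quoted benchmark relies on.
string repeatWithConcat(string piece, size_t count)
{
    string result;
    foreach (i; 0 .. count)
        result ~= piece;
    return result;
}

void main()
{
    auto a = repeatWithAppender("ab", 3);
    auto b = repeatWithConcat("ab", 3);
    assert(a == b);
    writeln(a); // prints "ababab"
}
```

To compare speed, one would time both helpers on a large count; the results
depend on the compiler and runtime version, so none are claimed here.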