stdio performance in tango, stdlib, and perl

Andrei Alexandrescu (See Website For Email) SeeWebsiteForEmail at erdani.org
Thu Mar 22 11:32:32 PDT 2007


kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>
>>> torhu wrote:
>>>
>>>> torhu wrote:
>>>> <snip>
>>>>
>>>>> Fastest first:
>>>>>
>>>>> tango.io.Console, no flushing (Andrei's): ca 1.5s
>>>>>
>>>>> C, reusing buffer, gcc & msvc71: ca 3s
>>>>>
>>>>> James' C++, gcc: 3.5s
>>>>>
>>>>> Phobos std.cstream, reused buffer: 11s
>>>>>
>>>>> C w/malloc and free each line, msvc71: 23s
>>>>>
>>>>> Andrei's C++, gcc: 27s
>>>>>
>>>>> C w/malloc and free each line, gcc: 37s
>>>>>
>>>>> Andrei's C++, msvc71: 50s
>>>>>
>>>>> James' C++,  msvc: 51s
>>>>
>>>>
>>>>
>>>> I've run some of the tests with more accurate timing. Andrei's Tango 
>>>> code uses 0.9 seconds, with no flushing, and 1.6 seconds with 
>>>> flushing.  I also tried cat itself, from the gnuwin32 project.  cat 
>>>> clocks in at 1.3 seconds.
>>>
>>>
>>>
>>> Just for jollies, a briefly optimized tango.io was tried also: it 
>>> came in at around 0.7 seconds. On a tripled file-size (3 million 
>>> lines), that version is around 23% faster than bog-standard tango.io
>>
>>
>> That's great news!
>>
>>> Thanks for giving it a whirl, torhu :)
>>>
>>>
>>> p.s. perhaps Andrei should be using tango for processing those vast 
>>> files he has?
>>
>>
>> Is it compatible with C's stdio? IOW, would this sequence work?
>>
>> readln(line);
>> int c = getchar();
>>
>> Is 'c' the first character on the next line?
> 
> 
> Nope. Tango is for D, not C. In order to make an arguably better library, 
> one often has to step away from the norm. Both yourself and Walter have 
> been saying "it needs to be fast and simple", and that's exactly what 
> Tango is showing: for those who care deeply about such things, tango.io 
> is shown to be around four times faster than the fastest C 
> implementation tried (for Andrei's test under Win32), and a notable 
> fourteen or fifteen times faster than the shipping phobos equivalent.

That's not what my tests show on Linux, where Perl and readln beat Tango 
by a large margin.

> If "interaction" between D & C on a shared, global file-handle becomes 
> some kind of issue due to buffering (and only if) we'll cross that 
> bridge at that point in time. I'm sure there's a number of solutions 
> that don't involve restricting D to using a lowest common denominator 
> approach. There's lots of smart people here who would be willing to help 
> resolve that if necessary.

Exactly. What I argue for is not adding _gratuitous_ incompatibility. 
I'm seeing that using read instead of getline on Linux does not add any 
speed. Then why not use getline and be done with it? Everybody would be 
happy.
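
The compatibility property asked about earlier (a line read followed by 
getchar) can be sketched in standard C. This is a minimal illustration 
under stated assumptions, not Tango's or Phobos's actual code: fgets 
stands in for getline/readln, fread stands in for any reader that pulls 
data ahead into a buffer of its own, and the names next_char_shared, 
next_char_private, and make_two_line_file are hypothetical.

```c
#include <stdio.h>

/* Shared-buffer case: the line read and the character read both go
   through the same FILE*, so they see one consistent position. */
int next_char_shared(FILE *f)
{
    char line[256];
    if (fgets(line, sizeof line, f) == NULL)
        return EOF;
    return fgetc(f);            /* first character of the next line */
}

/* Private-buffer case (hypothetical reader): a large block is pulled
   out of the stream into a buffer stdio knows nothing about. Only the
   first line would be handed to the caller; the rest stays in 'block'. */
int next_char_private(FILE *f)
{
    char block[4096];
    size_t n = fread(block, 1, sizeof block, f);
    if (n == 0)
        return EOF;
    /* The read-ahead data is invisible to stdio, so this call resumes
       after the whole block, not after the first line. */
    return fgetc(f);
}

/* Hypothetical helper: a temporary file holding two short lines.
   Returns NULL if the temporary file cannot be created. */
FILE *make_two_line_file(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return NULL;
    fputs("first\nsecond\n", f);
    rewind(f);
    return f;
}
```

With the two-line file above, the shared version returns 's' (the start 
of "second"), while the private version returns EOF because the whole 
file was already consumed into the private block.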

> The tango.io package is intended to be clean, extensible, simple, and a 
> whole lot more coherent than certain others. We feel it meets those 
> goals, and it happens to be quite efficient at the same time. Seems a 
> bit like sour-grapes to start looking for "issues" with that intent, 
> particularly when compared to an implementation that proclaims "It peeks 
> under the hood of C's stdio implementation, meaning it's customized for 
> Digital Mars' stdio, and gcc's stdio" ?

I'm not sure I understand this. For what it's worth, there are no sour 
grapes in the mix. I *wanted* to switch to Tango to save me future aggravation.

> Tango is not meant to be a phobos clone; it doesn't make the same claims 
> as phobos and it doesn't follow the same rules as phobos. If you need 
> phobos rules, then use phobos. If you don't like tango.io speed, 
> extensibility and simplicity, without all the special cases of C IO, 
> then use phobos. If you want both then, at some point, we'll consider 
> figuring out how to make your C-oriented corner-cases work with tango.io

They aren't C-oriented. They are stream-oriented. It just so happens 
that the OS opens some streams and serves them to you in FILE* format. I 
have programs that read standard input and write to standard output. 
They are extremely easy to combine, parallelize, and run on a cluster. 
After switching from Perl to D for performance considerations, I was in 
a position of a net loss. Then I've been to hell and back figuring out 
what the problem was and fixing it. Then I thought, hmmm, maybe I could have 
avoided all that by switching to Tango. So I tried Tango and it was 
again a net loss. Perl's I/O beats Tango's Cin.
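
For reference, the kind of stdin-to-stdout program being discussed can 
be sketched as a line-copy loop over FILE* streams with one reused 
buffer. This is only an illustration in standard C, not the benchmark 
code from this thread; the function name copy_lines and the 4096-byte 
buffer size are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Copies 'in' to 'out' line by line through one reused buffer.
   Returns the number of newline-terminated lines copied. */
long copy_lines(FILE *in, FILE *out)
{
    char buf[4096];
    long lines = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        fwrite(buf, 1, len, out);
        /* A line longer than the buffer arrives in several chunks;
           only the chunk ending in '\n' completes it. */
        if (len > 0 && buf[len - 1] == '\n')
            lines++;
    }
    return lines;
}
```

Calling copy_lines(stdin, stdout) turns this into a filter that can be 
chained with other programs through pipes, and because it goes through 
FILE* it can be mixed with other stdio calls on the same streams.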

> Walter wrote: "One of my goals with D is to fix that - the 
> straightforward, untuned code should get you most of the possible speed."
> 
> Andrei wrote: "Just make the clear and simple code fastest. One thing I 
> like about D is that it clearly strives to achieve best performance for 
> simply-written code."
> 
> That sentiment is very much what Tango itself is about.
> 
> You began this thread by titling it "stdio and Tango IO performance" and 
> noting the following: "has anyone verified that Tango's I/O performance 
> is up to snuff? I see it imposes the dynamic-polymorphic approach, and 
> unless there was some serious performance work going on, it's possible 
> it's even slower than stdio. "
> 
> Given the results shown above, I hope we can put that to rest at this time.

Of course you can, it's your library. You look at the results that 
please you most, I look at the results of my concrete application. I 
simply can't afford a 50%+ loss in I/O throughput, so I need to stay 
with Phobos. Why, I don't understand.


Andrei


