stdio performance silliness

kris foo at bar.com
Thu Mar 22 12:00:02 PDT 2007


Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
> 
>> Andrei Alexandrescu (See Website For Email) wrote:
>>
>>> kris wrote:
>>>
>>>> torhu wrote:
>>>>
>>>>> torhu wrote:
>>>>> <snip>
>>>>>
>>>>>> Fastest first:
>>>>>>
>>>>>> tango.io.Console, no flushing (Andrei's): ca 1.5s
>>>>>>
>>>>>> C, reusing buffer, gcc & msvc71: ca 3s
>>>>>>
>>>>>> James' C++, gcc: 3.5s
>>>>>>
>>>>>> Phobos std.cstream, reused buffer: 11s
>>>>>>
>>>>>> C w/malloc and free each line, msvc71: 23s
>>>>>>
>>>>>> Andrei's C++, gcc: 27s
>>>>>>
>>>>>> C w/malloc and free each line, gcc: 37s
>>>>>>
>>>>>> Andrei's C++, msvc71: 50s
>>>>>>
>>>>>> James' C++, msvc: 51s
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I've run some of the tests with more accurate timing. Andrei's 
>>>>> Tango code uses 0.9 seconds, with no flushing, and 1.6 seconds with 
>>>>> flushing.  I also tried cat itself, from the gnuwin32 project.  cat 
>>>>> clocks in at 1.3 seconds.
>>>>
>>>>
>>>>
>>>>
>>>> Just for jollies, a briefly optimized tango.io was tried also: it 
>>>> came in at around 0.7 seconds. On a tripled file-size (3 million 
>>>> lines), that version is around 23% faster than bog-standard tango.io
>>>
>>>
>>>
>>> That's great news!
>>>
>>>> Thanks for giving it a whirl, torhu :)
>>>>
>>>>
>>>> p.s. perhaps Andrei should be using tango for processing those vast 
>>>> files he has?
>>>
>>>
>>>
>>> Is it compatible with C's stdio? IOW, would this sequence work?
>>>
>>> readln(line);
>>> int c = getchar();
>>>
>>> Is 'c' the first character on the next line?
>>
>>
>>
>> Nope. Tango is for D, not C. In order to make an arguably better 
>> library, one often has to step away from the norm. Both yourself and 
>> Walter have been saying "it needs to be fast and simple", and that's 
>> exactly what Tango is showing: for those who care deeply about such 
>> things, tango.io is shown to be around four times faster than the 
>> fastest C implementation tried (for Andrei's test under Win32), and a 
>> notable fourteen or fifteen times faster than the shipping phobos 
>> equivalent.
> 
> 
> That's not what my tests show on Linux, where Perl and readln beat Tango 
> by a large margin.
> 
>> If "interaction" between D & C on a shared, global file-handle becomes 
>> some kind of issue due to buffering (and only if) we'll cross that 
>> bridge at that point in time. I'm sure there's a number of solutions 
>> that don't involve restricting D to using a lowest common denominator 
>> approach. There's lots of smart people here who would be willing to 
>> help resolve that if necessary.
> 
> 
> Exactly. What I argue for is not adding _gratuitous_ incompatibility. 
> I'm seeing that using read instead of getline on Linux does not add any 
> speed. Then why not use getline and be done with it? Everybody would be 
> happy.
> 
>> The tango.io package is intended to be clean, extensible, simple, and 
>> a whole lot more coherent than certain others. We feel it meets those 
>> goals, and it happens to be quite efficient at the same time. Seems a 
>> bit like sour-grapes to start looking for "issues" with that intent, 
>> particularly when compared to an implementation that proclaims "It 
>> peeks under the hood of C's stdio implementation, meaning it's 
>> customized for Digital Mars' stdio, and gcc's stdio" ?
> 
> 
> I'm not sure I understand this. For what it's worth, there's no sour grapes 
> in the mix. I *wanted* to switch to Tango to save me future aggravation.
> 
>> Tango is not meant to be a phobos clone; it doesn't make the same 
>> claims as phobos and it doesn't follow the same rules as phobos. If 
>> you need phobos rules, then use phobos. If you don't like tango.io 
>> speed, extensibility and simplicity, without all the special cases of 
>> C IO, then use phobos. If you want both then, at some point, we'll 
>> consider figuring out how to make your C-oriented corner-cases work 
>> with tango.io
> 
> 
> They aren't C-oriented. They are stream-oriented. It just so happens 
> that the OS opens some streams and serves them to you in FILE* format. I 
> have programs that read standard input and write to standard output. 
> They are extremely easy to combine, parallelize, and run on a cluster. 
> After switching from Perl to D for performance considerations, I was in 
> a position of a net loss. Then I've been to hell and back figuring what 
> the problem was and fixing it. Then I thought, hmmm, maybe I could have 
> avoided all that by switching to Tango. So I tried Tango and it was 
> again a net loss. Perl's I/O beats Tango's Cin.
> 
>> Walter wrote: "One of my goals with D is to fix that - the 
>> straightforward, untuned code should get you most of the possible speed."
>>
>> Andrei wrote: "Just make the clear and simple code fastest. One thing 
>> I like about D is that it clearly strives to achieve best performance 
>> for simply-written code."
>>
>> That sentiment is very much what Tango itself is about.
>>
>> You began this thread by titling it "stdio and Tango IO performance" 
>> and noting the following: "has anyone verified that Tango's I/O 
>> performance is up to snuff? I see it imposes the dynamic-polymorphic 
>> approach, and unless there was some serious performance work going on, 
>> it's possible it's even slower than stdio. "
>>
>> Given the results shown above, I hope we can put that to rest at this 
>> time.
> 
> 
> Of course you can, it's your library. You look at the results that 
> please you most, I look at the results of my concrete application. I 
> simply can't afford a 50%+ loss in I/O throughput, so I need to stay 
> with Phobos. Why, I don't understand.

Oh, come now. Yesterday Tango was the "fastest" on your machine, and 
today it is not. And you're now claiming a 50% loss in throughput?

I put it to you that you're not being very forthcoming in allowing for 
changes in tango.io to address this anomaly in your timings. Yesterday I 
pointed out where to make the change so that you could try tango without 
the automatic chomp; you didn't bother to do that. There is a change in 
SVN implementing your request, but you're not bothering to try that either.

Instead, you appear to be using empty rhetoric and exaggeration to pit 
one library against another. That's hardly being helpful, Andrei.

Tango has been shown to be very efficient on Win32, and there's no 
reason to assert that it can't be so on linux. We've seen that flush() 
is a no-no for linux, and that it has some impact on Win32 also. That 
can be rectified, as Walter kindly pointed out. If you're serious about 
giving Tango a shot, then give it some time for the different platform 
specifics to be addressed. Is that really too much to ask? Of a beta 
release?
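
For anyone following the readln/getchar question quoted above, here is a
minimal C sketch of the buffering issue under discussion. It is not code
from Tango or Phobos. The helper names (two_line_file,
next_char_after_getline, next_char_after_private_read) are made up for
illustration, and it assumes a POSIX system with getline() and read().
The first function reads a line through stdio and then asks stdio for
the next character. The second fills a private buffer straight from the
descriptor, roughly what a library with its own buffering might do, and
then asks stdio for the next character.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() and fileno() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: a temporary file holding two lines, rewound. */
static FILE *two_line_file(void)
{
    FILE *f = tmpfile();
    assert(f != NULL);
    fputs("first\nsecond\n", f);
    rewind(f);               /* flushes the write and seeks to offset 0 */
    return f;
}

/* Read one line through stdio, then ask stdio for the next character. */
int next_char_after_getline(void)
{
    FILE *f = two_line_file();
    char *line = NULL;
    size_t cap = 0;
    ssize_t n = getline(&line, &cap, f);
    assert(n == 6);          /* "first\n" */
    int c = getc(f);         /* same FILE buffer, so this is 's' */
    free(line);
    fclose(f);
    return c;
}

/* Fill a private buffer straight from the descriptor, then ask stdio. */
int next_char_after_private_read(void)
{
    FILE *f = two_line_file();
    char buf[4096];
    ssize_t n = read(fileno(f), buf, sizeof buf);
    assert(n == 13);         /* whole file is now in buf, not in stdio */
    int c = getc(f);         /* stdio finds nothing left to read: EOF */
    fclose(f);
    return c;
}
```

Calling next_char_after_getline() yields 's', the first character of the
second line, while next_char_after_private_read() yields EOF because the
private buffer already swallowed the rest of the file. Whether that
matters in practice depends on whether anything else in the process
reads the same stream through stdio.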


