Streaming library

s50 shinji.igarashi at gmail.com
Wed Oct 13 17:04:57 PDT 2010


I think st_blksize is often used to determine the buffering size for I/O.
Calling I/O syscalls many times causes inefficiency, and buffered,
lookahead reads are essentially a gamble: reading st_blksize bytes in a
single call is usually faster than reading the same amount of data split
across several syscalls. But what if, in the first place, you want to
read less than st_blksize bytes? And what if you then want to seek far
ahead in the file from there? I think there is no buffering strategy
that works well in all use cases, so I believe the library should accept
hints from client code.

2010/10/14 Denis Koroskin <2korden at gmail.com>:
> On Thu, 14 Oct 2010 03:06:30 +0400, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 10/13/10 17:23 CDT, Denis Koroskin wrote:
>>>
>>> On Thu, 14 Oct 2010 02:01:24 +0400, Andrei Alexandrescu
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>>> On 10/13/10 16:05 CDT, Denis Koroskin wrote:
>>>>>
>>>>> On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
>>>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>>
>>>> To substantiate my brief answer:
>>>>
>>>>> This library code needs to be put somewhere. I just believe it belongs
>>>>> to line-reader, not a generic stream. By putting line reading into a
>>>>> stream interface, you want make it more efficient.
>>>>
>>>> I assume you meant "won't" instead of "want". So here you're saying
>>>> that line-oriented I/O does not belong in the interface because it
>>>> won't make things more efficient.
>>>>
>>>> But then your line reading code is extremely efficient by using the
>>>> interface you yourself embraced.
>>>>
>>>
> >>> By adding readln() to the stream interface you will only move that code
> >>> from ByLine to the Stream implementation. The code would still be the same.
>>>
> >>> How can you make it any more efficient? I've read the fgets() source
> >>> code that comes with the Microsoft CRT, and it does exactly the same as
> >>> what I did (i.e. fill a buffer, read byte-by-byte, copy to the output
> >>> string). It also puts restrictions on line size while I didn't (that's
> >>> why I had to use an Appender). I also did a line copy (dup) so that I
> >>> could return an immutable string.
>>>
> >>> You see, it's not the Stream interface that makes that code less
> >>> efficient; it's the additional functionality over the C API it provides.
>>
> >> GNU offers two specialized routines:
> >> http://www.gnu.org/s/libc/manual/html_node/Line-Input.html. It is many times
> >> more efficient than anything that can be done in client code using the stdio
> >> API. I'm thinking along those lines.
>>
>
> I can easily implement a similar interface on top of chunked reads:
> ubyte[] readLine(ubyte[] lineBuffer); or bool readLine(ref ubyte[]
> lineBuffer);
>
> I've quickly looked through an implementation, too, and it's still filling a
> buffer first, then copying characters byte-by-byte to the output string
> (reallocating when needed) until a delimiter is found.
> It is exactly as efficient as an external implementation: it does the same
> amount of copying and memory allocation. "Many times more efficient" is
> just an overestimation.
>
> BTW, did you see my message about std.concurrency?
>
>>>> Andrei
>>>>
>>>> P.S. I think I figured the issue with your fonts: the header
>>>> Content-type contains "charset=KOI8-R". That charset propagates
>>>> through all responses. Does anyone know how I can ignore it?
>>>
>>> I've changed that to utf-8. Did it help?
>>
>> Yes, looking great. Thanks!
>>
>>
>> Andrei
>


More information about the Digitalmars-d mailing list