string-ish range/stream from curl ubyte[] chunks?

Steven Schveighoffer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri May 16 14:35:04 PDT 2014


On Fri, 16 May 2014 16:57:41 -0400, Vlad <b100dian at gmail.com> wrote:

> Hello D programmers,
>
> I am toying with writing my own HTML parser as a pet project, and I  
> strive to have a range API for the tokenizer and the parser output  
> itself.
>
> However it occurs to me that in real-life browsers the advantage of this  
> type of 'streaming' parsing would be given by also having the string  
> that plays as input to the tokenizer treated as a 'stream'/'range'.
>
> While D's *string classes do play as ranges, what I want to write is a  
> 'ChunkDecoder' range that would take curl 'byChunk' output and make it  
> consumable by the tokenizer.
>
> Now, the problem: string itself has ElementType!string == dchar.  
> Consuming a string a dchar at a time looks like a wasteful operation if  
> e.g. your string is UTF-8 or UTF-16.
>
> So, naturally, I would like to use indexOf() - instead of countUntil() -  
> and opSlice (without opDollar?) on my ChunkDecoder (forward) range.
>
> Q: Is anything like this already in use somewhere in the standard  
> library or a project you know?

Dmitry Olshansky and I are working on a stream API that looks like a  
range. I am way behind on getting it to work, but I have something that  
compiles.

The goal is to (optionally) replace the underlying mechanism for  
std.stdio, and to replace std.stream.

> Q2: Or do you have any pointers for what the smallest API would be for a  
> string-like range class?

I think Dmitry has a pretty good API. I hope to post my prototype soon. I  
hate to say "wait for it", because I have been very lousy at getting  
things finished lately, but I want to have something to show before the  
conference.

The code I have will support all encodings, and provide a range API that  
works with dchar-like ranges. The idea is to be able to make code that  
works with both arrays and streams seamlessly.
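
To illustrate the "arrays and streams" idea with what Phobos offers today,  
here is a minimal sketch. It is not the prototype mentioned above. The  
name decodeChunks is hypothetical, and the sketch assumes the chunks are  
UTF-8 and arrive as a range of ubyte[] (an array of arrays here, or  
something like the byChunk output from the question). It flattens the  
chunks and decodes lazily, so a multi-byte sequence split across two  
chunks is still decoded as one character:

```d
import std.algorithm.iteration : joiner, map;
import std.utf : byDchar;

// Hypothetical helper: turns a range of UTF-8 byte chunks into a lazy
// input range of dchar.
auto decodeChunks(R)(R chunks)
{
    return chunks
        .joiner                    // flatten the chunks into one range of ubyte
        .map!(b => cast(char) b)   // treat each byte as a UTF-8 code unit
        .byDchar;                  // decode code units to dchar on demand
}
```

The result is only an input range of dchar, so it offers no slicing or  
indexOf-style searching; it shows the decoding step alone, not a full  
string-like API.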

> And bonus:
> Q3: any uses of such a string-ish range in other standard library  
> methods that you can think of and could be contributed to? e.g. suppose  
> this doesn't exist and I / we come up with a proposal of minimal API to  
> consume a string from left to right.

I would hate for you to duplicate effort; please hold off until we have  
something workable. Then we can discuss the API.

Dmitry's message is here:  
http://forum.dlang.org/post/l9q66g$2he3$1@digitalmars.com

My updates have not been posted to GitHub yet, because I don't want to  
post half-baked code. Stay tuned.

-Steve

