std.d.lexer requirements

Jonathan M Davis jmdavisProg at gmx.com
Thu Aug 2 01:38:22 PDT 2012


On Thursday, August 02, 2012 01:14:30 Walter Bright wrote:
> On 8/2/2012 12:43 AM, Jonathan M Davis wrote:
> > It is for ranges in general. In the general case, a range of UTF-8 or
> > UTF-16 makes no sense whatsoever. Having range-based functions which
> > understand the encodings and optimize accordingly can be very beneficial
> > (which happens with strings but can't happen with general ranges without
> > the concept of a variably-length encoded range like we have with forward
> > range or random access range), but to actually have a range of UTF-8 or
> > UTF-16 just wouldn't work. Range-based functions operate on elements, and
> > doing stuff like filter or map or reduce on code units doesn't make any
> > sense at all.
> 
> Yes, it can work.

How? If you operate on a range of code units, then you're operating on 
individual code units, which almost never makes sense. There are plenty of 
cases where a function which understands the encoding can avoid some of the 
costs associated with decoding and whatnot, but since range-based functions 
operate on their elements, if the elements are code units, a range-based 
function will operate on individual code units with _no_ understanding of the 
encoding at all. Ranges have no concept of encoding.

Do you really think that it makes sense for a function like map or filter to 
operate on individual code units? Because that's what would end up happening 
with a range of code units. Your average range-based function only makes 
sense with _characters_, not code units. Functions which can operate on ranges 
of code units without screwing up the encoding are a rarity.
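
A minimal sketch of the problem, written in Python rather than D purely for 
illustration (the sample string and the predicates are made up for this 
example, not taken from the thread). The same generic filter is applied once 
to a sequence of characters and once to a sequence of UTF-8 code units:

```python
# Python's built-in filter() stands in for a generic range-based function:
# it looks at one element at a time and knows nothing about encodings.

text = "n\u00e9"                 # "né"; 'é' is U+00E9
units = text.encode("utf-8")     # code units: 0x6E, 0xC3, 0xA9

# Elements are characters: dropping 'n' leaves a valid string.
kept_chars = "".join(filter(lambda ch: ch != "n", text))

# Elements are code units: dropping the unit 0xC3 removes only the lead
# byte of 'é' and strands its continuation byte 0xA9.
kept_units = bytes(filter(lambda u: u != 0xC3, units))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(kept_chars)
print(kept_units)
print(is_valid_utf8(kept_units))
```

The filter itself does nothing wrong in either case. It simply has no way of 
knowing that 0xC3 and 0xA9 belong together, so the result of the second call 
is no longer valid UTF-8.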

Unless a range-based function special-cases a range type which is 
variable-length encoded (e.g. string), it just isn't going to deal with the 
encoding properly. It operates on either the code units or the decoded 
values, depending on what its element type is.

I concur that operating on strings as code units is better from the standpoint 
of efficiency, but that just doesn't work in a generic function unless the 
function has a special case, and a special case by definition _isn't_ generic.
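
As a rough sketch of what such a special case looks like (again in Python 
rather than D, and with `count_chars` being a hypothetical helper invented 
for this example, not a library function): the fast path avoids decoding, but 
only because the function explicitly recognizes the encoded type.

```python
def count_chars(seq) -> int:
    """Count characters in seq (hypothetical helper for illustration)."""
    if isinstance(seq, (bytes, bytearray)):
        # Special case: treat the bytes as UTF-8 code units (assumed valid)
        # and count every unit that is not a continuation byte (10xxxxxx).
        # No decoding is needed, but this branch exists only because the
        # function knows about the encoding.
        return sum(1 for u in seq if (u & 0xC0) != 0x80)
    # Generic path: one element is one item, whatever the elements are.
    return sum(1 for _ in seq)


text = "na\u00efve caf\u00e9"        # "naïve café", 10 characters
units = text.encode("utf-8")         # 12 code units

print(count_chars(text))             # elements are characters
print(count_chars(units))            # special-cased code units
print(count_chars(list(units)))      # plain sequence of code units, no special case
```

Handing the same code units over as a plain list bypasses the special case, 
and the generic path then counts code units instead of characters.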

- Jonathan M Davis

