Best way to make Until! into string

div0 div0 at users.sourceforge.net
Tue Jun 22 13:40:34 PDT 2010


On 22/06/2010 19:26, Jonathan M Davis wrote:
> div0 wrote:
>
>> On 22/06/2010 07:29, Jonathan M Davis wrote:
>>> Okay. If you call until like so
>>>
>>> str.until('\"')
>>>
>>> you get an Until!(pred,string,char). I want to turn that into a string.
>>> array() doesn't seem to do the trick right now. It used to work, but now
>>> it gives me
>>>
>>> main.d(47): Error: template std.array.array(Range) if (isForwardRange!
>>> (Range)) does not match any function template declaration
>>> main.d(47): Error: template std.array.array(Range) if (isForwardRange!
>>> (Range)) cannot deduce template function from argument types !()(Until!
>>> (pred,string,char))
>>>
>>> to!string just converts it into a string with the Until! stuff being
>>> included in the string rather than giving me the actual result, so that
>>> doesn't work.
>>>
>>> So, what is the correct and preferred way to convert the result of Until!
>>> to a string when you were searching on a string in the first place? The
>>> std.algorithm functions are definitely nice, but they have a tendency to
>>> return hard-to-use types.
>>>
>>> - Jonathan M Davis
>>
>> Could be wrong, but strings aren't (conceptually) arrays any more.
>
> As I understand it, they're definitely arrays. It's just that, because
> they're arrays of char (well, immutable(char)) but are read as Unicode code
> points, an element of the array isn't necessarily a full character, and code
> that needs to read code points has to treat them as a range of code points
> rather than an array of char. So, whether you treat them as an array depends
> a bit on what you're doing with them. As long as you're not actually trying
> to interpret them as code points, however, they're the same as any other
> array.

I think we're talking about the same thing but with slightly different 
terminology.

As far as I understand it, for a string (I'm specifically talking about
immutable(char)[]) each byte is not necessarily a 'code point' or a character.

It may take multiple bytes to encode a 'code point' and it can take
multiple code points to encode a character. (As far as I know there's no
fixed limit of 2: a base character can be followed by several combining
characters.)

So it's not safe in general to randomly access the bytes in a UTF-8 string.
If you take a random byte out of a string, you might be getting only one
byte out of a multi-byte encoded 'code point'.
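
To make the byte versus code point distinction concrete, here is a small sketch. The helper names (byteCount, codePointCount, isContinuationByte) are made up for illustration, and it assumes a Phobos recent enough that strings iterate as ranges of dchar:

```d
import std.range : walkLength;
import std.stdio : writeln;

/// Number of UTF-8 code units (bytes) in s.
size_t byteCount(string s) { return s.length; }

/// Number of code points in s, found by walking the string as a range of dchar.
size_t codePointCount(string s) { return s.walkLength; }

/// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuationByte(char b) { return (b & 0xC0) == 0x80; }

void main()
{
    string s = "h\u00E9llo"; // U+00E9 (e with acute) takes two bytes in UTF-8

    writeln(byteCount(s));             // 6
    writeln(codePointCount(s));        // 5
    writeln(isContinuationByte(s[2])); // true: s[2] is the second byte of U+00E9
}
```

So s[2] compiles and runs fine, it just hands you half of a code point.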

(I have the vague recollection that each byte of a multi-byte encoded
'code point' has its high bit set, so it can never be mistaken for a
single-byte ASCII 'code point' and you can't accidentally go using one)

Which is bad, and therefore you should never conceptually treat a string
as an array. (To me, array implies random access and that that's what you
are going to be doing.)

When you have an encoded 'code point' in a UTF-8 string, I believe you
have to start mucking about with bit shifting to decode it.
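
Roughly what that bit shifting looks like, as a sketch only. decodeByHand is a hypothetical helper, it assumes well-formed UTF-8 and does no validation at all; std.utf.decode is the library routine that does this properly, and main just checks the two agree:

```d
import std.utf : decode;

/// Decodes the code point starting at s[index] by hand and advances index.
/// Sketch only: assumes well-formed UTF-8 and performs no validation.
dchar decodeByHand(string s, ref size_t index)
{
    immutable uint first = s[index++];
    if (first < 0x80)
        return cast(dchar) first; // 0xxxxxxx: plain ASCII, one byte

    // Number of continuation bytes, read off the lead byte's high bits.
    immutable uint extra = (first & 0xE0) == 0xC0 ? 1  // 110xxxxx
                         : (first & 0xF0) == 0xE0 ? 2  // 1110xxxx
                         : 3;                          // 11110xxx

    // Keep only the payload bits of the lead byte (5, 4 or 3 of them).
    uint cp = first & (0x3F >> extra);

    // Each continuation byte (10xxxxxx) contributes 6 payload bits.
    foreach (i; 0 .. extra)
        cp = (cp << 6) | (s[index++] & 0x3F);

    return cast(dchar) cp;
}

void main()
{
    string s = "\u20AC"; // euro sign, three bytes in UTF-8: E2 82 AC
    size_t i = 0, j = 0;

    assert(decodeByHand(s, i) == decode(s, j)); // both yield U+20AC
    assert(i == 3 && j == 3);                   // both consumed three bytes
}
```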

Of course strings are arrays, but I think this is just an implementation 
detail and might be/probably should be changed in future.
Andrei was quite emphatic in one of his posts that strings would from 
now on be bidirectional ranges.

If you decode a string to dstring then you have a list of code points.
I'm not clear on whether randomly accessing a dstring is a good idea or 
bad idea though.
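
A quick sketch of what indexing a dstring gives you. nthCodePoint is a made-up helper, not anything from Phobos. Each index is a whole code point, but a combining mark still counts as its own code point, so an index is not guaranteed to be a whole visible character:

```d
import std.conv : to;

/// Returns the n-th code point of s by first transcoding to UTF-32.
dchar nthCodePoint(string s, size_t n)
{
    dstring d = s.to!dstring; // one dchar per code point
    return d[n];              // constant-time index, always a whole code point
}

void main()
{
    assert(nthCodePoint("h\u00E9llo", 1) == '\u00E9');
    assert(nthCodePoint("h\u00E9llo", 2) == 'l');

    // Caveat: 'e' followed by U+0301 COMBINING ACUTE ACCENT displays as one
    // character but is still two code points.
    dstring combined = "e\u0301"d;
    assert(combined.length == 2);
}
```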

If any of that seems wrong please let me know. I'm a bit hazy on the 
finer points of utf myself and I don't want to carry on making invalid 
assumptions about it.

>
>>
>> They are bidirectional ranges which is why the array call doesn't work.
>> Though how you actually get a string back I don't know.
>>
>
> I wasn't clear enough. I was basically doing this:
>
> to!string(array(str.until('\"')));
>
> As I understand it, that forces the Until type into an array of whatever
> type (probably char[]) and then to!string would convert it to
> immutable(char)[]. It's the cleanest way that I found (well, actually, the
> only way I think) to convert the result of until() to string in spite of the
> fact that it was called with a string in the first place. It's one of the
> prices of flexibility, I guess.
>
> - Jonathan M Davis

Sry, means nothing to me. I'm still using dmd 2.028. :(
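
For anyone on a newer compiler, the conversion Jonathan describes can be sketched like this. takeUntil is a hypothetical wrapper, and it assumes a Phobos where array() accepts the range that until() returns (the error quoted above suggests that was not the case for every 2010 release):

```d
import std.algorithm : until;
import std.array : array;
import std.conv : to;

/// Returns the part of s before the first occurrence of stop.
string takeUntil(string s, dchar stop)
{
    // until() yields a lazy range of dchar; array() collects it into a
    // dchar[], and to!string re-encodes that as UTF-8.
    return to!string(array(s.until(stop)));
}

void main()
{
    assert(takeUntil(`name" rest`, '"') == "name");
    assert(takeUntil("no quote here", '"') == "no quote here");
}
```

Note that this allocates twice (once for the dchar[], once for the string), which is part of the price of flexibility mentioned above.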

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk

