dstring - was Re: next version of DWT?

Sun May 13 12:44:22 PDT 2007

I forgot to reply to this; comments embedded...

On Sun, 06 May 2007 05:36:23 -0400, Marcin Kuszczak <aarti at interia.pl>  
wrote:

> Chris Miller wrote:
>> Is this limitation really a problem for some of your code? I thought it
>> was still big enough, with room to spare. I don't recall ever having a
>> string of over 1_073_741_824 characters. Also, for a 64-bit program, the
>> limit is raised considerably, to thousands of terabytes (and
>> string.MAX_LENGTH will reflect this automatically).
>
>
> I did not run into this problem, mainly because I didn't use it. And I
> did not use it because your work still doesn't meet my criteria for
> something that I would use in my development:
> 1. Because there will be *for sure* people who will hit this limit. If
> something can happen, it will happen. And then you are in trouble,
> because you can no longer easily interchange between your string class
> and D character arrays, and you are back at the starting point... In
> fact, when assigning *any* char[] variable to your string, I should
> first check whether it will fit into it...

Well, I just wasn't sure. I'm still wondering what others think about this  
limitation. I wrote dstring mainly to see how it would go.

A billion characters seems plenty to me; and this is just for 32-bit  
binaries. I could be wrong. I also figured those who need incredibly large  
strings will probably want to write special-purpose string handling code  
anyway, and it would seem odd that they would pass such large strings to  
functions that don't expect them to be so large (e.g. std.string.replace  
on a 1.5 gig string? yikes).

You don't need to check whether it fits; dstring does that check for you
and throws an exception if the array is too long.
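
Roughly, the check amounts to the sketch below. This is not dstring's
actual source: checkedLength and fits are made-up names, and MAX_LENGTH
just reuses the 32-bit figure from earlier in the thread (the real limit
may differ slightly).

```d
// Assumed limit for a 32-bit build, taken from the figure quoted above.
enum size_t MAX_LENGTH = 1_073_741_824;

// Hypothetical helper: validate a length before storing it.
size_t checkedLength(size_t n)
{
    if (n > MAX_LENGTH)
        throw new Exception("string length exceeds MAX_LENGTH");
    return n;
}

// What a caller sees: no manual check up front, only an exception
// to handle in the (rare) oversized case.
bool fits(size_t n)
{
    try
    {
        checkedLength(n);
        return true;
    }
    catch (Exception e)
    {
        return false;
    }
}
```

So assigning a normal-sized char[] needs no extra code, and an oversized
one fails loudly instead of being silently truncated.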

> 2. I want string to do more than plain character arrays, and I
> shouldn't have to accept something that is better in some areas but
> worse in others. Higher abstraction usually has drawbacks - it needs
> more processing power and/or more memory. But I accept that, as I need
> the higher abstraction...
> 3. Allocating one additional byte in your struct probably will not be a
> big deal for anyone... And it shouldn't break anything, should it?

An 8-byte, nicely aligned struct vs. 9 bytes? Or maybe 12 bytes with
padding? It was designed to be as easy to pass to functions and pack into
other structures as char[] is. Adding to it would kill these benefits,
especially the ability to be returned in registers.
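
To put numbers on that, here is a small sketch of the size trade-off on a
32-bit target. The struct and field names are hypothetical, not dstring's
real layout, and uint stands in for a 4-byte pointer so the sizes come out
the same when compiled as 64-bit.

```d
// Pointer-sized slot plus one 32-bit word: same footprint as char[]
// on a 32-bit target.
struct Compact
{
    uint ptr;             // stand-in for a 4-byte pointer
    uint lengthAndFlags;  // length and any flag bits share one word
}

// One extra byte with default alignment: padded up to 12 bytes.
struct Padded
{
    uint ptr;
    uint length;
    ubyte flags;
}

// One extra byte with packing forced: 9 bytes, but fields in arrays of
// these structs become misaligned.
align(1) struct Packed
{
align(1):
    uint ptr;
    uint length;
    ubyte flags;
}

static assert(Compact.sizeof == 8);
static assert(Padded.sizeof == 12);
static assert(Packed.sizeof == 9);
```

Either way the extra byte costs more than one byte in practice: 50% more
space with padding, or misaligned fields without it.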

> 4. There is still a problem with memory consumption when adding a
> dchar to a string containing char[]. Four times the memory consumption
> of the original char[] is too much for me. I think that your string
> struct should by default optimize for lower memory consumption, and
> have static fields (methods) to set the policy for speed.

Adding a dchar doesn't automatically do that; the string only widens if
the character can't fit into a single char or wchar. Getting all the way
to dchar requires characters outside the BMP, which are quite rare.
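
As a sketch of that rule (not dstring's actual code; Width and widthFor
are made-up names, and treating "fits into a single char" as "is ASCII"
is an assumption):

```d
// Bytes per character that the storage would need.
enum Width : ubyte { narrow = 1, wide = 2, full = 4 }

// Narrowest width that holds this one character in a single unit.
Width widthFor(dchar c)
{
    if (c <= 0x7F)   return Width.narrow; // ASCII: a single char
    if (c <= 0xFFFF) return Width.wide;   // inside the BMP: a single wchar
    return Width.full;                    // outside the BMP: needs dchar
}

// Width for a whole string: the widest requirement among its characters.
Width widthFor(const(dchar)[] text)
{
    Width w = Width.narrow;
    foreach (c; text)
    {
        immutable need = widthFor(c);
        if (need > w)
            w = need;
    }
    return w;
}
```

With this rule, appending an accented Latin letter to an ASCII string
doubles the storage rather than quadrupling it, and only a character
beyond U+FFFF forces the 4-byte form.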

I believe Python is going to be using "dchar" for any Unicode strings
beyond ASCII. I think dstring's approach at least saves more memory than
that.

> 5. It's not standard (not included in Phobos nor in Tango)

Agreed; I don't even use dstring at the moment.