dstring - was Re: next version of DWT?

Marcin Kuszczak aarti at interia.pl
Sun May 6 02:36:23 PDT 2007


Chris Miller wrote:
> Is this limitation really a problem for some of your code? I thought it
> was still big enough, with room to spare. I don't recall ever having a
> string of over 1_073_741_824 characters. Also, for a 64-bit program, the
> limit is raised considerably, to thousands of terabytes (and
> string.MAX_LENGTH will reflect this automatically).


I have not hit this problem, mainly because I haven't used it. And I haven't
used it because your work still doesn't meet my criteria for something
I would use in my development:
1. Because there will be *for sure* people who will hit this limit. If
something can happen, it will happen. And then you are in trouble, because
you can no longer easily interchange between your string class and D
character arrays, and you are back at the starting point... In fact, when
assigning *any* char[] variable to your string, I should first check whether
it will fit into it...
2. I want string to do more than plain character arrays, and I shouldn't
have to accept something that is better in some areas but worse in others.
Higher abstraction usually has drawbacks - it needs more processing power
and/or more memory. But I accept that, as I need the higher abstraction...
3. Allocating one additional byte in your struct probably will not be a big
deal for anyone... And it shouldn't break anything, should it?
4. There is still a problem with memory consumption when adding a dchar to
a string containing char[]. Memory consumption 4 times bigger than the
original char[] is too much for me. I think that your string struct should
by default optimize for lower memory consumption, and have static fields
(methods) to set the policy for speed.
5. It's not standard (included in neither Phobos nor Tango)
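
To illustrate the 4x figure from point 4, here is a minimal sketch, assuming
a current D compiler. It is not how dstring stores its data; the helper name
bytesAsUtf32 is hypothetical and only compares the raw size of a char[] with
the size of the same text widened to dchar[].

```d
import std.stdio : writeln;

/// Hypothetical helper: bytes needed once every code point of `s`
/// is stored as a 4-byte dchar.
size_t bytesAsUtf32(const(char)[] s)
{
    dchar[] wide;
    foreach (dchar c; s)   // foreach decodes UTF-8 into code points
        wide ~= c;
    return wide.length * dchar.sizeof;
}

void main()
{
    const(char)[] text = "dstring";             // 7 ASCII letters
    writeln(text.length * char.sizeof, " bytes as char[]");
    writeln(bytesAsUtf32(text), " bytes as dchar[]");
}
```

For pure ASCII text the ratio is exactly 4; text with multi-byte UTF-8
sequences grows by a smaller factor.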


While answering your question I decided to quickly test the two currently
available string classes for the D language: dstring (a) and the Tango
String class (b). Below is a quick comparison (it doesn't pretend to be
exhaustive or very accurate):

1. Assigning:
a. yes  // string s         // string s1
b. yes  // auto s         // auto s1

2. Assigning again literals:
a. no // s
b. no // s

3. Assigning variables
a. yes // s
b. no  // s

4. Reading (D language shortcoming):
a. no // char[] d
b. no // char[] d

5. Concatenating
a. yes // s~=s1;
b. no  // s!=s1;

6. Proper slicing of Utf8 sequence by letters not bytes
a. yes
b. ???

7. Speed
a. 147 ms//
b.       //I couldn't get StopWatch to work, but it seems that it was longer :-)
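
To make point 6 concrete, here is a minimal sketch, assuming a current D
compiler, of slicing UTF-8 text by code points rather than by bytes. The
function name sliceLetters is hypothetical and is not part of dstring or
Tango; "letters" here means code points, not full grapheme clusters.

```d
/// Hypothetical helper: returns code points [from, to) of a UTF-8 string.
const(char)[] sliceLetters(const(char)[] s, size_t from, size_t to)
{
    size_t begin = s.length;   // byte offset of code point `from`
    size_t end = s.length;     // byte offset of code point `to`
    size_t n = 0;              // index of the current code point

    // With a dchar loop variable, `i` is the byte offset at which
    // each decoded code point starts.
    foreach (size_t i, dchar c; s)
    {
        if (n == from) begin = i;
        if (n == to) { end = i; break; }
        ++n;
    }
    if (begin > end) begin = end;   // empty result when from >= to
    return s[begin .. end];
}

void main()
{
    // Every letter after "za" takes two bytes in UTF-8.
    assert(sliceLetters("zażółć", 2, 4) == "żó");
    assert(sliceLetters("zażółć", 0, 2) == "za");
    assert(sliceLetters("zażółć", 4, 100) == "łć");
}
```

A plain byte slice such as s[2 .. 3] on the same text would cut a two-byte
sequence in half and leave invalid UTF-8.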

A proper comparison would require more tests. These are just a few off the
top of my head... Test programs attached.

Personally I think that your implementation is more low-level, but having
said that, I think it is much more useful in the general case. The Tango
implementation looks like an advanced class for word processors :-) I was
disappointed to see this kind of string in it... Coming from the C++ world,
I would be happy to see a string behaving similarly to the STL string. Such
an implementation should allow easy conversion from it to character arrays
(also implicit - something like the opImplicitCast that Andrei suggested on
the D newsgroup) and also from character arrays to string (your
implementation covers this case almost fully).

I think that there is a place for both: templated methods taking arrays of
characters when speed is necessary, and a string struct/class for cases
where speed is not such a big concern. I am building a library which will
not demand speed, and adding 3 versions of every function would be a lot of
work. Leaving only char[] versions would not be good either... In such cases
a string struct/class should be taken into account...
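
As a rough sketch of the "templated methods taking arrays of characters"
option, assuming a current D compiler and assuming that the 3 versions mean
char[], wchar[] and dchar[]: one template can stand in for all three
overloads. The function name countLetters is hypothetical.

```d
/// Hypothetical example: counts code points in any of the three
/// character array types with a single templated function.
size_t countLetters(C)(const(C)[] text)
    if (is(C == char) || is(C == wchar) || is(C == dchar))
{
    size_t n = 0;
    foreach (dchar c; text)   // decodes UTF-8 / UTF-16 / UTF-32 alike
        ++n;
    return n;
}

void main()
{
    assert(countLetters("zażółć") == 6);    // char[]  (UTF-8)
    assert(countLetters("zażółć"w) == 6);   // wchar[] (UTF-16)
    assert(countLetters("zażółć"d) == 6);   // dchar[] (UTF-32)
}
```

This avoids writing each function three times, but every caller still has to
work with raw arrays, which is the trade-off against a string struct/class
described above.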


PS. Sorry for the bullet-point style of my e-mail. English is not my native
language, and it is easier for me to write like this. Additionally, I like
this style :-) But maybe someone will think that it's too simple a way of
talking...

-- 
Regards
Marcin Kuszczak (Aarti_pl)
-------------------------------------
Ask me why I believe in Jesus - http://zapytaj.dlajezusa.pl (en/pl)
Doost (port of few Boost libraries) - http://www.dsource.org/projects/doost/
-------------------------------------

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_dstring.d
Url: http://lists.puremagic.com/pipermail/digitalmars-d-dwt/attachments/20070506/e79925e2/attachment.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_tangostring.d
Url: http://lists.puremagic.com/pipermail/digitalmars-d-dwt/attachments/20070506/e79925e2/attachment-0001.txt 

