string types: const(char)[] and cstring
renoX
renosky at free.fr
Sun May 27 05:39:57 PDT 2007
Marcin Kuszczak wrote:
> Chris Miller wrote:
>
>> Actually, while we're making changes to strings, why not bring in something
>> similar to my dstring module, where slicing and indexing never result in
>> an invalid UTF sequence? http://www.dprogramming.com/dstring.php - the
>> code may not be ideal, but it's the concept I'm referring to.
>
> Yup. That's my opinion also...
>
> To me, the advantages of such a string are quite obvious:
> 1. Easy slicing and indexing of UTF-8 sequences (without corrupting the
> sequence, as mentioned above)
> 2. A common denominator for char[], wchar[] and dchar[]
> 3. For classes that don't need speed, it simplifies the API (only one
> version of each function instead of 3)
> 4. With some additional support from the language (cast operators to
> different types and opImplicitCast), it could be fully interchangeable with
> every method taking char[], wchar[] or dchar[].
>
> Having another three names for strings is not very appealing to me. We
> would have nine official string types available in D:
> char[], wchar[], dchar[], string, cwstring, cdstring, Tango String!(char),
> Tango String!(wchar), Tango String!(dchar)
>
> To write a nice, fully functional library you have to write three versions
> of every function that takes a string (I know, templates make it a little
> easier). I don't think I'm wrong in saying that, in reality, people just
> write one version for char[] because it is convenient (see: SWT ported from
> Java). As a result, wchar and dchar are treated as second-class citizens in
> D. Additionally, when people design their programs around char[], they
> mostly don't think about the issues with slicing a UTF-8 sequence stored in
> a char[] (warning! assumption!), so the default way of writing programs is
> *NOT SAFE*. When you write code and don't care about bare-metal speed, this
> additional work is just tedious...
>
> Having one string type that hides the differences between char[], wchar[]
> and dchar[] would solve the problem nicely. Adding constness would also be
> easy. And you would use only one name, string, for everything.
>
> I would be happy to hear other opinions from people on the NG. Maybe I am
> wrong about the above arguments, so perhaps someone can give
> counterarguments... I think this is a very important issue, as it seems
> that most developers around the world are non-native English speakers...
>
> PS. See also the thread on the DWT NG.
I agree with you. I don't think string should be an alias for char[],
whether const or not, but rather a class with a char[], wchar[] or dchar[]
representation under the hood and safe slicing by default.
The difficulty is providing enough flexibility to manage the internal
representation correctly: for example, it should be possible to say "use
UTF-8" even when the text contains multibyte characters (a size
optimization at some CPU cost).
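A minimal sketch of the idea, written in Python for brevity rather than D. The class name `Utf8String` and its methods are hypothetical: they are not Chris Miller's dstring API or any Phobos/Tango type. It keeps the compact UTF-8 bytes and maps code point indices to byte offsets by scanning for lead bytes, so every index and slice lands on a character boundary. The price is a linear scan per lookup, i.e. the size-for-CPU trade-off described above.

```python
class Utf8String:
    """Hypothetical string type (illustration only): stores UTF-8 bytes but
    indexes and slices by code point, so no slice can split a multibyte
    sequence."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")

    @staticmethod
    def _is_lead(byte: int) -> bool:
        # UTF-8 continuation bytes look like 10xxxxxx; every other byte
        # starts a new code point.
        return byte & 0xC0 != 0x80

    def _byte_offset(self, index: int) -> int:
        # Linear scan to the index-th code point: the compact UTF-8 storage
        # is paid for with O(n) indexing.
        count = 0
        for pos, byte in enumerate(self._data):
            if self._is_lead(byte):
                if count == index:
                    return pos
                count += 1
        return len(self._data)  # index == len(self): one past the end

    def __len__(self) -> int:
        # Length in code points, not bytes.
        return sum(1 for byte in self._data if self._is_lead(byte))

    def byte_length(self) -> int:
        return len(self._data)

    def __getitem__(self, key):
        n = len(self)
        if isinstance(key, slice):
            start, stop, step = key.indices(n)
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            stop = max(start, stop)  # reversed bounds give an empty slice
            chunk = self._data[self._byte_offset(start):self._byte_offset(stop)]
            # Both ends fall on code point boundaries, so decoding cannot fail.
            return Utf8String(chunk.decode("utf-8"))
        if key < 0:
            key += n
        if not 0 <= key < n:
            raise IndexError("code point index out of range")
        chunk = self._data[self._byte_offset(key):self._byte_offset(key + 1)]
        return chunk.decode("utf-8")

    def __str__(self) -> str:
        return self._data.decode("utf-8")


s = Utf8String("café crème")
print(len(s), s.byte_length())  # 10 12
print(s[3])                     # é
print(str(s[2:6]))              # fé c
```

By contrast, slicing the raw bytes with `"café".encode("utf-8")[:4]` yields `b'caf\xc3'`, half of the two-byte `é`, which no longer decodes as UTF-8.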
renoX
More information about the Digitalmars-d-announce mailing list