C strings - byte, ubyte or char? (Discussion from Bugzilla)

Thu Oct 4 09:17:59 PDT 2007

"Matti Niemenmaa" <see_signature at for.real.address> wrote in message 
news:fe2t70$2eka$1 at digitalmars.com...
<snip>
> Good idea. But note that I'm not talking only about C string-processing
> functions: in general, any functions which process strings without
> regard to their encoding should use ubytes.
>
> Just about all of std.string are such, for instance.

Looks like I'll have to investigate....

> The Tango situation is better, since tango.text.Util is already
> templated for char/wchar/dchar: ubyte would need to be added to the mix.
<snip>
> One problem with toStringz is efficiency. Its current implementation
> performs a string concatenation every time. If you know the string is
> zero-terminated and ASCII (or you just want it to be handled as
> encoding-agnostic), you should just be able to pass it through.

I had no idea that the implementation had changed.
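
To make the trade-off in the quoted paragraph concrete, here is a rough
sketch in C of the two strategies: always copying and appending a
terminator, versus passing the pointer through when the caller already
knows the data is zero-terminated. The names slice_t, to_stringz_copy and
to_stringz_assume_terminated are hypothetical, invented for illustration;
this is not the Phobos implementation of toStringz, only a model of the
idea under the assumption that a string is a pointer plus a length.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-carrying string slice:
   a pointer plus a length, with no guarantee of a trailing NUL. */
typedef struct {
    const char *ptr;
    size_t len;
} slice_t;

/* Always copies: allocates len + 1 bytes and appends the terminator.
   The caller owns the result and must free() it.
   Returns NULL if the allocation fails. */
char *to_stringz_copy(slice_t s)
{
    char *out = malloc(s.len + 1);
    if (out == NULL)
        return NULL;
    if (s.len > 0)
        memcpy(out, s.ptr, s.len);
    out[s.len] = '\0';
    return out;
}

/* Pass-through: correct only if the caller guarantees that
   s.ptr[s.len] is readable and already '\0'. Nothing is checked,
   allocated or copied here; the burden is entirely on the caller. */
const char *to_stringz_assume_terminated(slice_t s)
{
    return s.ptr;
}
```

The second function is the "just pass it through" case: it costs nothing,
but it is only safe when the terminator is known to be there, which is the
kind of promise an explicit cast or call would be documenting.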

> But on second thought, having the cast (or a call to toStringz) be
> necessary might be better. If you want UTF-8 to be handled as
> encoding-agnostic, a necessary cast may be a good idea, as it implies
> you know what you're doing.

Why should I care that a function is encoding-agnostic if I know what 
encoding my text is in?  That sounds to me like suggesting that I should 
have to cast class instances explicitly to Object to prove I know that the 
function can use objects of any class.

Stewart.

-- 
My e-mail address is valid but not my primary mailbox.  Please keep replies 
on the 'group where everybody may benefit.