String implementations

Sun Jan 20 00:04:01 PST 2008

On 1/20/08, Jarrod <qwerty at ytre.wq> wrote:
> > Moreover, working with user-editable config files - I would have thought
> > that a job for a text editor, not a programming language. I'm confused.
>
> Indeed, you are a tad confused.

Yep. I said so! :-)

> I'm allowing the user to edit config
> files

How? Through a GUI? With a program written in D? With their text
editor of choice?

If the latter, then you cannot be sure of the encoding, and that's
hardly D's fault!

> so that my GUI application can read it in on startup and use it to
> populate a dialog display as well as fill out numerous options involving
> how it deals with a web interface. Because I don't know what the user is
> going to input I have to do a fair amount of converting.

Right, but converting from one encoding to another is the job of
specialised classes. Detecting whether a text file is in ISO-8859-1,
or Windows-1252, or MacRoman, or whatever, is not a trivial task. If
your application were going to do that, you'd have to provide the
implementation. (Or possibly Tango or some other third-party library
already provides such converters; I don't know.) In any case, it's
not a common enough task to warrant built-in language support.

But I still don't see what this has got to do with whether or not a[n]
should identify the (n+1)th character rather than the (n+1)th code
unit.
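
To make the distinction concrete, here is a minimal sketch, again
assuming current Phobos. The helper nthChar is hypothetical, not a
library function. It shows that a[n] yields a code unit, and that
reaching the (n+1)th character means decoding from the start of the
string, so it costs O(n) rather than O(1).

```d
import std.utf : decode;

// Hypothetical helper: returns the (n+1)th code point of a UTF-8 string.
// It has to decode from the beginning, so it is O(n), not O(1).
dchar nthChar(string s, size_t n)
{
    size_t i = 0;                 // position measured in code units
    foreach (_; 0 .. n)
        decode(s, i);             // advances i past one whole code point
    return decode(s, i);
}

void main()
{
    string a = "héllo";           // 'é' occupies two UTF-8 code units
    assert(a.length == 6);        // length counts code units, not characters
    assert(a[1] == 0xC3);         // a[1] is only the first code unit of 'é'
    assert(nthChar(a, 1) == 'é'); // the second character
    assert(nthChar(a, 2) == 'l'); // the third character, although a[2] is 0xA9
}
```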

> Yes, this is indeed the main motivation behind this entire rant.

Cool. So what is the real-world use case that requires sequences of
UTF-8 code units to be addressable by character index by default?