String implementations

Kris foo at bar.com
Sun Jan 20 10:28:24 PST 2008


Jarrod: you might find something useful in the way the Tango Text class 
operates? It attempts to make the common operations independent of indexing, 
in order to avoid some of these code-unit/code-point problems.
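
To illustrate what "independent of indexing" can look like, here is a rough 
sketch. It does not use Tango's Text class; it assumes a current D2 compiler 
with Phobos, and truncate is a hypothetical helper named only for this 
example. The idea is to let the library translate a character count into a 
code unit offset instead of indexing the char array directly.

```d
import std.utf : count, toUTFindex;

// Hypothetical helper: keep at most maxChars code points of s.
string truncate(string s, size_t maxChars)
{
    // count() walks the string and returns the number of code points.
    if (count(s) <= maxChars)
        return s;
    // toUTFindex() maps a code point index to a code unit (byte) index,
    // so the slice below never cuts a multibyte sequence in half.
    return s[0 .. toUTFindex(s, maxChars)];
}

unittest
{
    string s = "na\u00EFve caf\u00E9";   // two non-ASCII letters
    assert(s.length == 12);              // UTF-8 code units
    assert(count(s) == 10);              // code points
    assert(truncate(s, 4) == "na\u00EFv");
    assert(truncate(s, 50) == s);

    // foreach with a dchar loop variable decodes as it goes,
    // so no index arithmetic is needed to visit each code point.
    size_t seen;
    foreach (dchar c; s)
        ++seen;
    assert(seen == 10);
}
```

Building with -unittest -main runs the checks. Note this counts code points, 
not what a user would necessarily see as one letter (combining marks are 
counted separately).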


"Jarrod" <qwerty at ytre.wq> wrote in message 
news:fmv7s7$76h$2 at digitalmars.com...
> On Sun, 20 Jan 2008 08:04:01 +0000, Janice Caron wrote:
>
>>> I'm allowing the user to edit config
>>> files
>>
>> How? With a GUI interface? With a program written in D? With their
>> favorite text editor of choice?
>>
>> If the latter, then you cannot be sure of the encoding, and that's
>> hardly D's fault!
>
> It is the latter.
>
>
>> Right, but converting from one encoding to another is the job of
>> specialised classes. Detecting whether a text file is in ISO-8859-1, or
>> Windows-1252, or MAC-ROMAN, or whatever, is not a trivial task. If your
>> application were going to do that, you'd have to provide the
>> implementation. (Or possibly Tango or some other third party library
>> already provides such converters - I don't know). In any case, it's not
>> a common enough task to warrant built-in language support.
>>
>> But I still don't see what this has got to do with whether or not a[n]
>> should identify the (n+1)th character rather than the (n+1)th code unit.
>
> Because this issue isn't really to do with the input file itself, it's to
> do with the potential input characters given in the file. As far as I can
> tell (I'm using a C library to parse the input) it should be ASCII or
> UTF-8 encoding.
> Anything else would probably cause the C lexer to screw up.
>
>
>> Cool. So what is the real world use case that necessitates that
>> sequences of UTF-8 code units must be addressable by character index as
>> the default?
>
> The most important one right now is splicing. I'm allowing both user-
> defined and program-defined macros in the input data. They can be
> anywhere within a string, so I need to splice them out and replace them
> with their correct counterparts. I hear the std lib provided with D is
> unreliable so I'm unwilling to use it. Plus even if it is fixed up I'd
> hate to limit string manipulation to regular expressions.
> I also wish to cut off input at a certain letter count for spacing issues
> in both the GUI and dealing with the webscript.
> I'll have to be converting certain characters to their URI equivalent
> too, that will probably take more splicing as well.
>
> The other thing I'm using is single-letter replacement. Simple stuff like
> capitalising letters and replacing spaces with underscores.
>
> I can think of a lot of other situations that would benefit from proper
> multibyte support too, for instance practically any application that
> takes frequent user input could benefit. A text editor is a very good
> example. Any coders who don't natively deal with Latin text would
> probably benefit greatly too ( think of the poor Japanese coders :< ).
> I've seen a lot of programs that print a specified number of characters
> before wrapping around or trailing off, too. The humble gnome console is
> a good example of that. Very handy to have character indexing in this
> case.
> String tokenizing and plain old character counting are two operations I
> can think of that could probably be done easier too.
>
>
> In the end I think I'm just tired of having to jump through hoops when it
> comes to string manipulation. I want to be able to say 'this is a
> character, I don't care what it is. Store it, change it, splice it, print
> it.' But instead it seems if I don't care what the character type is, it
> might not fit. Then I have to allocate then store it, find and change it,
> locate then splice it, convert then print it.
> Small annoyances build up over time and I'm pretty sure I'm not insured
> for blood vessels bursting in my eye.
>
> I live in the hope that one day in the future I'll see something magical
> happen, and I'll be able to type char chr = '?'; and chr will be a proper
> utf-8 character that I can print, insert into an array, and change.
> What a beautiful day that will be.
>
> Welp, I think I'm done ranting for now. Back to screwing around with
> strings. Or more accurately, procrastinating about screwing around with
> strings. 
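
For the macro splicing and single-letter replacement cases described above, 
a minimal sketch follows. It assumes a current D2 compiler with Phobos (which 
the quoted message says it would rather avoid), and expand and shout are 
hypothetical names invented for this example. Neither function touches a 
numeric index: the splice is a substring replacement, and the per-letter 
pass decodes each code point with foreach.

```d
import std.array : replace;
import std.uni : toUpper;

// Hypothetical helper: splice out every occurrence of a macro
// and put its value in its place.
string expand(string text, string macroName, string value)
{
    return text.replace(macroName, value);
}

// Hypothetical helper: upper-case each code point and turn
// spaces into underscores.
string shout(string text)
{
    string result;
    foreach (dchar c; text)
    {
        dchar mapped = (c == ' ') ? '_' : toUpper(c);
        result ~= mapped;   // appending a dchar re-encodes it as UTF-8
    }
    return result;
}

unittest
{
    auto s = expand("Hello, $(WHO)!", "$(WHO)", "Zo\u00EB");
    assert(s == "Hello, Zo\u00EB!");
    assert(shout("caf\u00E9 au lait") == "CAF\u00C9_AU_LAIT");
}
```

Building with -unittest -main runs the checks. Matching whole substrings 
works on UTF-8 without decoding because a valid sequence cannot begin in the 
middle of another one.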




