Higher level built-in strings

dsimcha dsimcha at yahoo.com
Mon Jul 19 13:19:58 PDT 2010


== Quote from Walter Bright (newshound2 at digitalmars.com)'s article
> bearophile wrote:
> > This odd post comes from reading the nice part about strings of chapter 4 of
> > TDPL. In the last few years I have seen changes in how D strings are meant
> > and managed, changes that make them less and less like arrays (random-access
> > sequences of mutable code units) and more and more what they are at high
> > level (immutable bidirectional sequences of code points).
> Strings in D are deliberately meant to be arrays, not special things. Other
> languages make them special because they have insufficiently powerful arrays.
> As for indexing by code point, I also believe this is a mistake. It is proposed
> often, but overlooks:
> 1. most string operations, such as copying and searching, even regular
> expressions, work just fine using regular indices.
> 2. doing the operations in (1) using code points and having to continually
> decode the strings would result in disastrously slow code.
> 3. the user can always layer a code point interface over the strings, but going
> the other way is not so practical.

4.  Sometimes one can make valid assumptions about the contents of a string.  For
example, in an internal utility app that will never be internationalized, you may
get away with assuming every character is a single ASCII byte.  If you know your
input is confined to the Basic Multilingual Plane (for example, because it has
been pre-sanitized), you can use wstrings and assume every code point is exactly
one 2-byte code unit.
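
To make point 4 concrete, here is a rough sketch of checking such an assumption
once and then indexing by code unit afterwards.  The helper names isAllAscii and
isAllBmp are made up for this example; they are not Phobos functions.

```d
/// Hypothetical helper: true if every UTF-8 code unit is ASCII,
/// in which case s[i] is always a whole character.
bool isAllAscii(const(char)[] s)
{
    foreach (char c; s)          // iterating as char does no decoding
        if (c >= 0x80) return false;
    return true;
}

/// Hypothetical helper: true if the UTF-16 string contains no surrogates,
/// i.e. every code point is in the BMP and occupies one wchar.
bool isAllBmp(const(wchar)[] s)
{
    foreach (wchar c; s)
        if (c >= 0xD800 && c <= 0xDFFF) return false;
    return true;
}

void main()
{
    string id = "user_42";
    assert(isAllAscii(id));
    assert(id[5] == '4');            // plain array indexing is safe here

    wstring w = "na\u00EFve"w;       // U+00EF is in the BMP
    assert(isAllBmp(w));
    assert(w.length == 5);
    assert(w[2] == '\u00EF');

    string utf8 = "na\u00EFve";      // same text as UTF-8
    assert(!isAllAscii(utf8));
    assert(utf8.length == 6);        // U+00EF takes two UTF-8 code units

    wstring emoji = "\U0001F600"w;   // outside the BMP: a surrogate pair
    assert(!isAllBmp(emoji));
    assert(emoji.length == 2);
}
```

The check costs one pass over the data; after that, everything is ordinary array
code with no decoding.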

5.  For dchar strings, a code unit equals a code point.  Should the interface for
dchar strings be completely different from that for char and wchar strings?
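
As an illustration of point 5, here is a sketch of a code point accessor layered
over all three string types.  nthCodePoint is a hypothetical name used only for
this example.  For dchar strings it is just an index; for char and wchar strings
it has to decode from the start.

```d
import std.traits : isSomeString;

/// Hypothetical helper: returns the n-th code point of any string type.
dchar nthCodePoint(S)(S s, size_t n)
if (isSomeString!S)
{
    static if (is(S : const(dchar)[]))
    {
        return s[n];                 // code unit == code point: O(1)
    }
    else
    {
        foreach (dchar c; s)         // foreach with dchar decodes UTF-8/UTF-16
        {
            if (n-- == 0) return c;
        }
        assert(0, "code point index out of range");
    }
}

void main()
{
    string  s8  = "\u00E9t\u00E9";
    wstring s16 = "\u00E9t\u00E9"w;
    dstring s32 = "\u00E9t\u00E9"d;

    // Lengths are in code units and differ by encoding.
    assert(s8.length == 5);
    assert(s16.length == 3);
    assert(s32.length == 3);

    // The layered accessor gives the same answer for all three.
    assert(nthCodePoint(s8, 2) == '\u00E9');
    assert(nthCodePoint(s16, 2) == '\u00E9');
    assert(nthCodePoint(s32, 2) == '\u00E9');

    // Only for dstring does plain indexing agree with it in general.
    assert(s32[2] == nthCodePoint(s32, 2));
}
```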

