String implementations
Jarrod
qwerty at ytre.wq
Wed Jan 16 19:32:22 PST 2008
On Wed, 16 Jan 2008 10:27:53 -0500, Steven Schveighoffer wrote:
>
> The algorithmic penalty would then be O(n) for an indexed lookup
> instead of O(1).
I understand this, but the compiler could probably optimize it for most
situations. Most string access is sequential, so positions could be
cached as the string is traversed, and string literals that are never
modified and contain only single-byte chars could be optimized into
normal indexing. Furthermore, modern processors are incredibly good at
sequential iteration; I know from personal experience that they can
scan massive chunks of memory in mere milliseconds (hashing entire
executables in memory to detect changes is a common example of this).
Scanning over a string shouldn't be noticeable at all. I believe the
author of the article that bearophile linked agrees with me on this
point, in his discussion of the charAt implementation.
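To make the trade-off concrete, here is a minimal sketch of the O(n)
lookup being discussed. It's Python rather than D (the byte-level logic
is the same either way), and `codepoint_at` is a hypothetical helper,
not anything from Phobos: it walks the UTF-8 leading bytes sequentially
until it reaches the i-th code point.

```python
def codepoint_at(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-8 `data` by sequential decoding.

    This is the O(n) indexed lookup under discussion: every earlier
    code point must be stepped over before the target is found.
    """
    count = 0
    pos = 0
    while pos < len(data):
        b = data[pos]
        # The leading byte encodes the sequence length:
        # 0xxxxxxx -> 1 byte, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
        if b < 0x80:
            length = 1
        elif b >> 5 == 0b110:
            length = 2
        elif b >> 4 == 0b1110:
            length = 3
        else:
            length = 4
        if count == i:
            return data[pos:pos + length].decode("utf-8")
        count += 1
        pos += length
    raise IndexError(i)
```

A caching scheme like the one suggested above would simply remember
(`count`, `pos`) from the previous call so that a subsequent sequential
access resumes the scan instead of restarting from byte 0.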
> I think the correct method in this case is to convert to utf32 first,
> then index. Then at least you only take the O(n) penalty once.
Well, converting to dchar[] means a full iteration over the entire
string to decode the code units. Then the program has to allocate
space, copy the chars over, and widen each one to 32 bits. Is it really
all that much more efficient? And why should the programmer have to
worry about the conversion anyway? Good languages avoid putting
cognitive load on the programmer.
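The trade-off being debated can be sketched in a few lines (again in
Python for illustration; the sample string and the `utf32_at` helper
are my own, not from the thread):

```python
text = "naïve café".encode("utf-8")  # sample UTF-8 input

# One O(n) pass decodes every code unit and allocates a new,
# fixed-width buffer -- the up-front cost of the dchar[] conversion.
utf32 = text.decode("utf-8").encode("utf-32-le")

# ...after which any code point is an O(1) fixed-width lookup.
def utf32_at(buf: bytes, i: int) -> str:
    return buf[4 * i:4 * i + 4].decode("utf-32-le")
```

So the conversion pays the O(n) scan once, then every subsequent index
is constant time; whether that wins depends on how many random lookups
follow the single conversion.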
> Or why not just use dchar[] instead of char[] to begin with?
Yes, you could just use dchar[] all the time, but how many people do
that? It's very space-inefficient, which is the whole reason UTF-8
exists. If dchar[] were meant to be the common case, Walter probably
would have made it the default string type.
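The space argument is easy to quantify: for mostly-ASCII text (the
common case for source code and identifiers), UTF-32 is a flat 4x
blow-up over UTF-8. A quick illustration, with a sample string of my
own choosing:

```python
ascii_text = "hello, world"  # typical mostly-ASCII text

# UTF-8 stores each ASCII char in one byte; UTF-32 always uses four.
utf8_size = len(ascii_text.encode("utf-8"))      # one byte per char
utf32_size = len(ascii_text.encode("utf-32-le")) # four bytes per char
```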
Eh, I guess this is just one of those annoying little 'loose ends' I see
when I look at D.
More information about the Digitalmars-d mailing list