An Invariant WTH?

Daniel Keep daniel.keep.lists at gmail.com
Sun Nov 4 20:18:49 PST 2007



Chad J wrote:
> <ramble thats somewhat OT>
> I frequently find myself in this situation where I want to iterate over
> a string, and at the same time be able to index the string an arbitrary
> number of elements ahead or behind the current one.  Either that, or I
> want to iterate through 2 strings in parallel, using the same index to
> do operations on both.  I forget why these things happen, but they do,
> and foreach is just not adequate for decoding the UTF stuff.  By the
> very act of iteration, it does define a way to index UTF8 strings, and I
> wish that definition could be applied to random access.  My other
> alternative is to use dchars, which just doesn't work that well with
> phobos, considering everything in phobos uses char[].  I haven't tried
> writing a dchar[] using program with tango yet.
> 
> I wish I could be more thorough about this though.  I believe it
> deserves more thought on my part.  It's also starting to be less
> relevant to the whole const discussion, so I'd rather just leave this at
> "string the keyword can probably be better allocated".
> </ramble thats somewhat OT>

To me, that suggests that the standard library is simply incomplete :P

The problem with random access in a UTF-8 or UTF-16 string is that,
AFAIK, there's no way to do it efficiently.  Code points vary in width,
so to find the one at a given index you have to decode everything in
the string before it.  The way the dstring structure works is by
switching to UTF-16 or UTF-32 as soon as multi-unit code points show up.

I suppose in the end it really depends on what you're doing with your
strings, and how frequently you do it.
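
To make the cost concrete, here is a minimal sketch of finding the Nth
code point in a UTF-8 string by walking from the start.  It assumes a
current D2 Phobos, where std.utf.stride gives the byte width of the
sequence starting at an index.  The helper name codePointToByteIndex is
made up for illustration.

```d
import std.stdio : writeln;
import std.utf : stride;

// Hypothetical helper: byte offset of the n-th code point in s.
// Runs in O(n), because every earlier sequence has to be stepped over.
size_t codePointToByteIndex(const(char)[] s, size_t n)
{
    size_t i = 0;
    while (n > 0 && i < s.length)
    {
        i += stride(s, i); // width in bytes of the sequence starting at i
        --n;
    }
    return i; // equals s.length if s has n or fewer code points
}

void main()
{
    string s = "a\u00E9b"; // 'a', e-acute (2 bytes in UTF-8), 'b'
    writeln(codePointToByteIndex(s, 2)); // byte offset of 'b'
}
```

Doing this once is cheap.  Doing it inside a loop over the same string
is where the quadratic behaviour comes from.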

> My point is that const seemed to be somewhat of an experimental feature
> in D.  If the experiment fails, we need to remove it!

Agreed.  The trick will be agreeing on what counts as failure :P

>>> D2.0 just got closures, and I still get the feeling that I don't like
>>> const.
>>
>> <JohnCleese> I'm sorry, but... this is irrelevant.
>> <MichaelPalin> Leaping from tree to tree down the mighty rivers of
>> British Columbia...
>>
> 
> The problem is that D1.0 gets the shaft for new features that it would
> have otherwise benefited from.

True.  That said, since I have a fairly big, complex project written in
D 1.0, I'm really very happy with how the whole feature-freeze has
turned out.  It means I don't have to go fixing things that broke when I
updated the compiler to squash a bug (which actually happened a few times.)

> The pass-by-ref vs. pass-by-val seems juicy, but I don't think I
> understand entirely.  How does constness solve this?  Aren't strings
> ALWAYS referenced to, or are you referring to the reference itself being
> a value or something referred to?  A simple code snippet demonstrating
> this would be great (you can refer me to this if it's been done).

The funny thing is that an invariant array *works* like it's
pass-by-value, basically by virtue of the fact that no one else can
change it.

As an example, you can't do this:

int a = 1;
int b = a;
b += 5;
assert( a == 6 ); // Nope

But you *can* do this:

char[] a = "foo".dup; // mutable copy; the literal itself must not be written to
char[] b = a;
b[] = "bar";
assert( a == "bar" ); // Yup

While invariant strings act like 'int' does:

string a = "foo";
string b = a;
//b[] = "bar"; // Doesn't work
b = "bar";
assert( a == "bar" ); // Nope

A practical use of this: in DOM, you can do this:

char[] name = "tag-name".dup; // a mutable buffer owned by the caller
new Element(name);

The problem is that I don't know who "owns" the string that gets passed
in.  Currently, for the sake of efficiency, I don't dup strings passed
in.  However, this means that if anyone even accidentally changes the
value pointed to by 'name', it can screw up the DOM tree.

If I use 'string', on the other hand, I can pass them around willy-nilly
as values.  It's the same as passing around an int and not having to
worry about the value of the int changing (unless you're using Java in
which case the value of an int *can* apparently change.)  You can't
redefine, say, 0 to mean 8.  Yes, they're still really references, but
they work like values.

So the ctor would probably become:

this(char[] tagName)
{
    this.tagName = tagName.idup; // Need an immutable copy
}

this(string tagName)
{
    this.tagName = tagName; // No copy needed
}

Incidentally, this also protects me from having people (ie: myself when
I'm not paying attention) go element.tagName[0] = 'x' or other silly
crap like that.
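
Put together, a rough sketch of how that might behave.  This is not the
real DOM code: the Element class here is a stripped-down stand-in, and
it assumes a D2 compiler where string is an invariant char array and
.idup is available.

```d
// Stand-in for the real class; only the tag name is modelled.
class Element
{
    string tagName; // the characters behind this can never change

    this(char[] tagName)
    {
        this.tagName = tagName.idup; // caller may still mutate its buffer
    }

    this(string tagName)
    {
        this.tagName = tagName; // safe to share, no copy needed
    }
}

void main()
{
    char[] name = "tag-name".dup;
    auto e = new Element(name);

    name[0] = 'X';                   // the caller scribbles on its own buffer
    assert(e.tagName == "tag-name"); // the element is unaffected

    // Writing through the element's string is rejected at compile time.
    static assert(!__traits(compiles, e.tagName[0] = 'x'));
}
```

The copy only happens on the char[] path, so callers who already hold a
string pay nothing for the safety.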

>>> It's all cost-benefit.  I'm seeing a lot of cost with little or dubious
>>> benefit.  So why should I be convinced that this is the right thing to
>>> do?  Why should I be willing to write D2.0 code?
>>
>> If you're happy with D 1.0, then stick with that (no, this isn't a "if u
>> dont leik it, get lost!" comment.)  If you want to use the other
>> features of D 2.0, start writing in it and point out where the const
>> system fails.  The example you gave at the start of this post is, in
>> fact, a prime case for *having* a const system, since you didn't even
>> realise you were making a mistake!
>>
> 
> Maybe I am stubborn, but that doesn't tell me that D needs a const
> system.  It tells me that the way string literals are handled is broken
> and platform dependent.

It's more like a bug in Windows :P  For instance, this cropped up in
#d.tango a while ago:

    char[] tmp = "000";
    Integer.format(tmp, cast(long) mode, Integer.Style.Octal);

What no one realised was that this piece of code actually *redefines*
the meaning of "000" every time it's run.  Here's a more obvious example:

    char[] tmp1 = "000";
    char[] tmp2 = "000";
    tmp1[1] = '!';

    writefln("tmp1: %s, tmp2: %s", tmp1, tmp2);

This outputs "tmp1: 0!0, tmp2: 0!0".  If you're unlucky, this will
manifest itself in your code as *really weird* errors that are
impossible to track down.
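
For comparison, a sketch of the same test with each variable given its
own copy via .dup.  It assumes a D2 compiler with invariant string
literals, where the original assignment is no longer accepted at all.

```d
import std.stdio : writefln;

void main()
{
    // .dup gives each variable its own mutable copy of the literal.
    char[] tmp1 = "000".dup;
    char[] tmp2 = "000".dup;
    tmp1[1] = '!';

    writefln("tmp1: %s, tmp2: %s", tmp1, tmp2); // tmp1: 0!0, tmp2: 000

    // With invariant literals, binding one to char[] does not compile.
    static assert(!__traits(compiles, { char[] bad = "000"; }));
}
```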

>> Me?  I'm going to wait until Walter unveils the changes to the const
>> system.  Arguing about something that's already been declared obsolete
>> is somewhat pointless :P
>>
>>     -- Daniel
> 
> While I am certainly venting a bit about this, I do want to resolve this
> somehow.  By "this" I don't mean constness, I mean my personal issues
> with constness.  Since const doesn't seem to be going away but rather
> just changing a bit, I am left with one other option: to make me, an
> angry const-hater, into someone who is not an angry const-hater.

I don't know if there's much I can say to convince you.  I think this is
something that, until you *need* it, just seems like a pain-in-the-arse
imposition.  *I* used to think that languages with const were just
trying to get in my way, until I started running into situations where
it actually made my code simpler.

	-- Daniel

