Why Strings as Classes?

Thu Aug 28 23:03:23 PDT 2008

"Nick Sabalausky" wrote
> Ok, so you want foo() to be able to tell if the collection has fast or 
> slow indexing. What are you suggesting that foo() does when the collection 
> does have slow indexing?

No, I don't want to be able to tell.  I don't want to HAVE to be able to 
tell.  In my ideal world, the collection does not implement opIndex unless 
it is fast, so there is no issue.  i.e. you cannot call foo with a linked 
list.

I'm really tired of this argument, you understand my point of view, I 
understand yours.  To you, the syntax sugar is more important than the 
complexity guarantees.  To me, what the syntax intuitively means should be 
what it does.  So I'll develop my collections library and you develop yours, 
fair enough?  I don't think either of us is right or wrong in the strict 
sense of the terms.

To be fair, I'll answer your other points as you took the time to write 
them.  And then I'm done.  I can't really be any clearer as to what I 
believe is the best design.

> 1. Should it fail to compile because foo's implementation uses [] and the 
> slow-indexing collection doesn't implement []?

No, foo will always compile because opIndex should always be fast, and then 
I can specify the complexity of foo without worry.

Using an O(n) lookup operation should be more painful because it requires 
more time.  It makes users use it less.

> 2. Should foo revert to an alternate branch of code that doesn't use []?
>
> This behavior can be implemented via interfaces like I described. The 
> benefit of that is that [] can still serve as the shorthand it's intended 
> for (see below) and you never need to introduce the inconsistency of "Gee, 
> how do I get the Nth element of a collection?" "Well, on some collections 
> it's getNth(), and on other collections it's []."

I believe that you shouldn't really ever be calling getNth on a link-list, 
and if you are, it should be a red flag, like a cast.

Furthermore [] isn't always equivalent to getNth, see below.

>>> As for the risk that could create of accidentially sending a linked list 
>>> to a "search" (ie, a "search for an element which contains data X") that 
>>> uses [] internally instead of iterators (but then, why wouldn't it just 
>>> use iterators anyway?): I'll agree that in a case like this there should 
>>> be some mechanism for automatic choosing of an algorithm, but that 
>>> mechanism should be at a separate level of abstraction. There would be a 
>>> function "search" that, through either RTTI or template constraints or 
>>> something else, says "does collection 'c' implement 
>>> ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the 
>>> collection have attribute ForewardDirectionIndexingComplexity that is 
>>> set equal to Complexity.Constant?", and based on that passes control to 
>>> either IndexingSearch or IteratorSearch.
>>
>> To me, this is a bad design.  It's my opinion, but one that is shared 
>> among many people.  You can do stuff this way, but it is not intuitive. 
>> I'd much rather reserve opIndex to only quick lookups, and avoid the 
>> possibility of accidentally using it incorrectly.
>>
>
> Preventing a collection from ever being used in a function that would 
> typically perform poorly on that collection just smacks of premature 
> optimization. How do you, as the collection author, know that the 
> collection will never be used in a way such that *occasional* use in 
> certain specific sub-optimal a manner might actually be necessary and/or 
> acceptable?

It's not premature optimization, it's not offering a feature that has little 
or no use.  It's like any contract for any object, you only want to define 
the interface for which your object is designed.  A linked list should not 
have an opIndex because it's not designed to be indexed.

If I designed a new car with which you could steer each front wheel 
independently, would that make you buy it?  It's another feature that the 
car has that other cars don't.  Who cares if it's useful, its another 
*feature*!  Sometimes a good design is not that a feature is included but 
that a feature is *not* included.

> If you omit [] then you've burnt the bridge (so to speak) and your only 
> recourse is to add a standardized "getNth()" to every single collection 
> which clutters the interface, hinders integration with third-party 
> collections and algorithms, and is likely to still suffer from idiots who 
> think that "get Nth element" is always better than O(n) (see below).

I'd reserve getNth for linked lists only, if I implemented it at all.  It is 
a useless feature.  The only common feature for all containers should be 
iteration, because 'iterate next element' is always an O(1) operation 
(amortized in the case of trees).

>> In general, I'd say if you are using lists and frequently looking up the 
>> nth value in the list, you have chosen the wrong container for the job.
>>
>
> If you're frequently looking up random elements in a list, then yes, 
> you're probably using the wrong container. But that's beside the point. 
> Even if you only do it once: If you have a collection with a natural 
> order, and you want to get the nth element, you should be able to use the 
> standard "get element at index X" notation, [].

I respectfully disagree.  For the reasons I've stated above.

> I don't care how many people go around using [] and thinking they're 
> guaranteed to get a cheap computation from it. In a language that supports 
> overloading of [], the [] means "get the element at key/index X". 
> Especially in a language like D where using [] on an associative array can 
> trigger an unbounded allocation and GC run. Using [] in D (and various 
> other languages) can be expensive, period, even in the standard lib (assoc 
> array). So looking at a [] and thinking "guaranteed cheap", is incorrect, 
> period. If most people think 2+2=5, you're not going to redesign 
> arithmetic to work around that mistaken assumption.

Your assumption is that 'get the Nth element' is the only expectation for 
opIndex interface.  My assumption is that opIndex implies 'get an element 
efficiently' is an important part of the interface.  We obviously disagree, 
and as I said above, neither of us is right or wrong, strictly speaking. 
It's a matter of what is intuitive to you.

Part of the problems I see with many bad designs is the author thinks they 
see a fit for an interface, but it's not quite there.  They are so excited 
about fitting into an interface that they forget the importance of leaving 
out elements of the interface that don't make sense.  To me this is one of 
them.  An interface is a fit IMO if it fits exactly.  If you have to do 
things like implement functions that throw exceptions because they don't 
belong, or break the contract that the interface specifies, then either the 
interface is too specific, or you are not implementing the correct 
interface.

>>> If you've got a linked list, and you want to get element N, are you 
>>> *really* going to go reaching for a function named "search"? How often 
>>> do you really see a generic function named "search" or "find" that takes 
>>> a numeric index as a the "to be found" parameter instead of something to 
>>> be matched against the element's value? I would argue that that would be 
>>> confusing for most people. Like I said in a different post farther down, 
>>> the implementation of a "getAtIndex()" is obviously going to work like a 
>>> search, but from "outside the box", what you're asking for is not the 
>>> same.
>>
>> If you are indexing into a tree, it is considered a binary search, if you 
>> are indexing into a hash, it is a search at some point to deal with 
>> collisions.  People don't think about indexing as being a search, but in 
>> reality it is.  A really fast search.
>>
>
> It's implemented as a search, but I'd argue that the input/output 
> specifications are different. And yes, I suppose that does put it into a 
> bit of a grey area. But I wouldn't go so far as to say that, to the 
> caller, it's the same thing, because there are differences. If you want 
> get an element based on it's position in the collection, you call one 
> function. If you want to get an element based on it's content instead of 
> it's position, that's another function. If you want to get the position of 
> an element based on it's content or it's identity, that's one or two more 
> functions (depending, of course, if the element is a value type or 
> reference type, respectively).

I disagree.  I view the numeric index of an ordered container as a 'key' 
into the container.  A keyed container has the ability to look up elements 
quickly with the key.

Take a quick look at dcollections' ArrayList.  It implements the Keyed 
interface, with uint as the key.  I have no key for LinkList, because I 
don't see a useful key.

>> And I don't think search would be the name of the member function, it 
>> should be something like 'getNth', which returns a cursor that points to 
>> the element.
>>
>
> Right, and outside of pure C, [] is the shorthand for and the standardized 
> name for "getNth". If someone automatically assumes [] to be a simple 
> lookup, chances are they're going to make the same assumption about 
> anything named along the lines of "getNth". After all, that's what [] 
> does, it gets the Nth.

I view [] as "getByIndex", index being a value that offers quick access to 
elements.  There is no implied 'get the nth element'.  Look at an 
associative array.  If I had a string[string] array, what would you expect 
to get if you passed an integer as the index?

So good luck with your linked-list-should-be-indexed battle.  I shall not be 
posting on this again.

-Steve