Why Strings as Classes?

Thu Aug 28 15:37:47 PDT 2008

"Steven Schveighoffer" <schveiguy at yahoo.com> wrote in message 
news:g96vmq$3cc$1 at digitalmars.com...
> "Nick Sabalausky" wrote
>> "Steven Schveighoffer" wrote in message 
>> news:g9650h$cp9$1 at digitalmars.com...
>>> Perhaps it is a mistake to assume it, but it is a common mistake. And 
>>> the expectation is intuitive.
>>
>> Why in the world would any halfway competent programmer ever look at a 
>> *linked list* and assume that the linked lists's [] (if implemented) is 
>> O(1)?
>
> You are writing this function:
>
> void foo(IOrderedContainer cont)
> {
>    ....
> }
>
> IOrderedContainer implements opIndex(uint).  The problem is that you can't 
> tell whether the object itself is a list or not, so you are powerless to 
> make the decision as to whether the container has fast indexing.  In that 
> case, your only choice (if speed is an issue) is to not use opIndex.
>

Ok, so you want foo() to be able to tell if the collection has fast or slow 
indexing. What are you suggesting that foo() does when the collection does 
have slow indexing?

1. Should it fail to compile because foo's implementation uses [] and the 
slow-indexing collection doesn't implement []?

Well then how does foo know that it's the most important, most frequent 
thing being done on the collection? Suppose foo is something that needs to 
access elements in a somewhat random order, ie the kind of thing that lists 
are poorly suited for. Further suppose that collection C is some set of data 
that *usually* just gets insertions and deletions at nodes that the code 
already has direct references to. Further suppose that foo does *need* to be 
run on the collection, *but* very infrequently. So, should I *really* be 
forced to make C a collection that trades good insertion/deletion complexity 
for good indexing complexity, just because the occasionally-run foo() 
doesn't like it? And what if I want to run benchmarks to test what 
collection works best, in real-world use, for C? Should foo's intolerance of 
slow-indexing collections really be able force me to exclude testing of such 
collections?

2. Should foo revert to an alternate branch of code that doesn't use []?

This behavior can be implemented via interfaces like I described. The 
benefit of that is that [] can still serve as the shorthand it's intended 
for (see below) and you never need to introduce the inconsistency of "Gee, 
how do I get the Nth element of a collection?" "Well, on some collections 
it's getNth(), and on other collections it's []."

3. Something else?

>> As for the risk that could create of accidentially sending a linked list 
>> to a "search" (ie, a "search for an element which contains data X") that 
>> uses [] internally instead of iterators (but then, why wouldn't it just 
>> use iterators anyway?): I'll agree that in a case like this there should 
>> be some mechanism for automatic choosing of an algorithm, but that 
>> mechanism should be at a separate level of abstraction. There would be a 
>> function "search" that, through either RTTI or template constraints or 
>> something else, says "does collection 'c' implement 
>> ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the 
>> collection have attribute ForewardDirectionIndexingComplexity that is set 
>> equal to Complexity.Constant?", and based on that passes control to 
>> either IndexingSearch or IteratorSearch.
>
> To me, this is a bad design.  It's my opinion, but one that is shared 
> among many people.  You can do stuff this way, but it is not intuitive. 
> I'd much rather reserve opIndex to only quick lookups, and avoid the 
> possibility of accidentally using it incorrectly.
>

Preventing a collection from ever being used in a function that would 
typically perform poorly on that collection just smacks of premature 
optimization. How do you, as the collection author, know that the collection 
will never be used in a way such that *occasional* use in certain specific 
sub-optimal a manner might actually be necessary and/or acceptable?

If you omit [] then you've burnt the bridge (so to speak) and your only 
recourse is to add a standardized "getNth()" to every single collection 
which clutters the interface, hinders integration with third-party 
collections and algorithms, and is likely to still suffer from idiots who 
think that "get Nth element" is always better than O(n) (see below).

> In general, I'd say if you are using lists and frequently looking up the 
> nth value in the list, you have chosen the wrong container for the job.
>

If you're frequently looking up random elements in a list, then yes, you're 
probably using the wrong container. But that's beside the point. Even if you 
only do it once: If you have a collection with a natural order, and you want 
to get the nth element, you should be able to use the standard "get element 
at index X" notation, [].

I don't care how many people go around using [] and thinking they're 
guaranteed to get a cheap computation from it. In a language that supports 
overloading of [], the [] means "get the element at key/index X". Especially 
in a language like D where using [] on an associative array can trigger an 
unbounded allocation and GC run. Using [] in D (and various other languages) 
can be expensive, period, even in the standard lib (assoc array). So looking 
at a [] and thinking "guaranteed cheap", is incorrect, period. If most 
people think 2+2=5, you're not going to redesign arithmetic to work around 
that mistaken assumption.

>>
>> When people look at '+' they typically think "integer/float addition". 
>> Why would, for example, the risk of mistaking an O(n) "big int" addition 
>> for an O(1) integer/float addition be any worse than the risk of 
>> mistaking an O(n) linked list "get element at index" for an O(1) array 
>> "get element at index"?
>
> What good are integers that can't be added?  In this case, it is not 
> possible to have quick addition, no matter how you implement your 
> arbitrary-precision integer.  I think the time penalty is understood and 
> accepted.  With opIndex, the time penalty is not expected.  Like it or 
> not, this is how many users look at it.
>
>>
>> If you've got a linked list, and you want to get element N, are you 
>> *really* going to go reaching for a function named "search"? How often do 
>> you really see a generic function named "search" or "find" that takes a 
>> numeric index as a the "to be found" parameter instead of something to be 
>> matched against the element's value? I would argue that that would be 
>> confusing for most people. Like I said in a different post farther down, 
>> the implementation of a "getAtIndex()" is obviously going to work like a 
>> search, but from "outside the box", what you're asking for is not the 
>> same.
>
> If you are indexing into a tree, it is considered a binary search, if you 
> are indexing into a hash, it is a search at some point to deal with 
> collisions.  People don't think about indexing as being a search, but in 
> reality it is.  A really fast search.
>

It's implemented as a search, but I'd argue that the input/output 
specifications are different. And yes, I suppose that does put it into a bit 
of a grey area. But I wouldn't go so far as to say that, to the caller, it's 
the same thing, because there are differences. If you want get an element 
based on it's position in the collection, you call one function. If you want 
to get an element based on it's content instead of it's position, that's 
another function. If you want to get the position of an element based on 
it's content or it's identity, that's one or two more functions (depending, 
of course, if the element is a value type or reference type, respectively).

> And I don't think search would be the name of the member function, it 
> should be something like 'getNth', which returns a cursor that points to 
> the element.
>

Right, and outside of pure C, [] is the shorthand for and the standardized 
name for "getNth". If someone automatically assumes [] to be a simple 
lookup, chances are they're going to make the same assumption about anything 
named along the lines of "getNth". After all, that's what [] does, it gets 
the Nth.