Why Strings as Classes?

Steven Schveighoffer schveiguy at yahoo.com
Thu Aug 28 12:53:00 PDT 2008


"Nick Sabalausky" wrote
> "Steven Schveighoffer" wrote in message 
> news:g9650h$cp9$1 at digitalmars.com...
>> "Nick Sabalausky" wrote
>>> "Steven Schveighoffer" wrote in message 
>>> news:g95b89$1jii$1 at digitalmars.com...
>>>>
>>>> When someone sees an index operator the first thought is that it is a 
>>>> quick lookup.
>>>
>>> This seems to be a big part of the disagreement. Personally, I think 
>>> it's insane to look at [] and just assume it's a cheap lookup. That's 
>>> was true in pure C where, as someone else said, it was nothing more than 
>>> a shorthand for a specific pointer arithmentic operation, but carrying 
>>> that idea over to more modern languages (especially one that also uses 
>>> [] for associative arrays) is a big mistake.
>>
>> Perhaps it is a mistake to assume it, but it is a common mistake. And the 
>> expectation is intuitive.
>
> Why in the world would any halfway competent programmer ever look at a 
> *linked list* and assume that the linked lists's [] (if implemented) is 
> O(1)?

You are writing this function:

void foo(IOrderedContainer cont)
{
    ....
}

IOrderedContainer implements opIndex(uint).  The problem is that you can't 
tell whether the object itself is a list or not, so you are powerless to 
make the decision as to whether the container has fast indexing.  In that 
case, your only choice (if speed is an issue) is to not use opIndex.

> As for the risk that could create of accidentially sending a linked list 
> to a "search" (ie, a "search for an element which contains data X") that 
> uses [] internally instead of iterators (but then, why wouldn't it just 
> use iterators anyway?): I'll agree that in a case like this there should 
> be some mechanism for automatic choosing of an algorithm, but that 
> mechanism should be at a separate level of abstraction. There would be a 
> function "search" that, through either RTTI or template constraints or 
> something else, says "does collection 'c' implement 
> ConstantTimeForewardDirectionIndexing?" or better yet IMO "does the 
> collection have attribute ForewardDirectionIndexingComplexity that is set 
> equal to Complexity.Constant?", and based on that passes control to either 
> IndexingSearch or IteratorSearch.

To me, this is a bad design.  It's my opinion, but one that is shared among 
many people.  You can do stuff this way, but it is not intuitive.  I'd much 
rather reserve opIndex to only quick lookups, and avoid the possibility of 
accidentally using it incorrectly.

In general, I'd say if you are using lists and frequently looking up the nth 
value in the list, you have chosen the wrong container for the job.

>>  You don't see people making light switches that look like outlets, even 
>> though it's possible.  You might perhaps make a library where opIndex is 
>> a linear search in your list, but I would expect that people would not 
>> use that indexing feature correctly.  Just as if I plug my lamp into the 
>> light switch that looks like an outlet, I'd expect it to get power, and 
>> be confused when it doesn't.  Except the opIndex mistake is more subtle 
>> because I *do* get what I actually want, but I just am not realizing the 
>> cost of it.
>>
>>> The '+' operator means "add". Addition is typically O(1). But vectors 
>>> can be added, and that's an O(n) operation. Should opAdd never be used 
>>> for vectors?
>>
>> As long as it's addition, I have no problem with O(n) operation (and it 
>> depends on your view of n).  Indexing implies speed, look at all other 
>> cases where indexing is used.  For addition to be proportional to the 
>> size of the element is natural and expected.
>>
>
> When people look at '+' they typically think "integer/float addition". Why 
> would, for example, the risk of mistaking an O(n) "big int" addition for 
> an O(1) integer/float addition be any worse than the risk of mistaking an 
> O(n) linked list "get element at index" for an O(1) array "get element at 
> index"?

What good are integers that can't be added?  In this case, it is not 
possible to have quick addition, no matter how you implement your 
arbitrary-precision integer.  I think the time penalty is understood and 
accepted.  With opIndex, the time penalty is not expected.  Like it or not, 
this is how many users look at it.

>>>> You can force yourself to think differently, but the reality is that 
>>>> most people think that because of the universal usage of square 
>>>> brackets (except for VB, and I feel pity for anyone who needs to use 
>>>> VB) to mean 'lookup by key', and usually this is only useful on objects 
>>>> where the lookup is quick ( < O(n) ).  Although there is no 
>>>> requirement, nor enforcement, the 'quick' contract is expected by the 
>>>> user, no matter how much docs you throw at them.
>>>>
>>>> Look, for instance, at Tango's now-deprecated LinkMap, which uses a 
>>>> linked-list of key/value pairs (copied from Doug Lea's implementation). 
>>>> Nobody in their right mind would use link map because lookups are O(n), 
>>>> and it's just as easy to use a TreeMap or HashMap.  Would you ever use 
>>>> it?
>>>>
>>>>> If you *really* need that sort of guarantee (and I can imagine it may 
>>>>> be useful in some cases), then the implementation of the guarantee 
>>>>> does *not* belong in the realm of "implements vs doesn't-implement a 
>>>>> particular operator overload". Doing so is an abuse of operator 
>>>>> overloading, since operator overloading is there for defining 
>>>>> syntactic sugar, not for acting as a makeshift contract.
>>>>
>>>> I don't think anybody is asking for a guarantee from the compiler or 
>>>> any specific tool.  I think what we are saying is that violating the 
>>>> 'opIndex is fast' notion is bad design because you end up with users 
>>>> thinking they are doing something that's quick.  You end up with people 
>>>> posting benchmarks on your containers saying 'why does python beat the 
>>>> pants off your list implementation?'.  You can say 'hey, it's not meant 
>>>> to be used that way', but then why can the user use it that way?  A 
>>>> better design is to nudge the user into using a correct container for 
>>>> the job by only supporting operations that make sense on the 
>>>> collections.
>>>>
>>>> And as far as operator semantic meaning, D's operators are purposely 
>>>> named after what they are supposed to do.  Notice that the operator for 
>>>> + is opAdd, not opPlus.  This is because opAdd is supposed to mean you 
>>>> are performing an addition operation.  Assigning a different semantic 
>>>> meaning is not disallowed by the compiler, but is considered bad 
>>>> design. opIndex is supposed to be an index function, not a linear 
>>>> search.  It's not called opSearch for a reason.  Sure you can redefine 
>>>> it however you want semantically, but it's considered bad design. 
>>>> That's all we're saying.
>>>>
>>>
>>> Nobody is suggesting using [] to invoke a search (Although we have 
>>> talked about using [] *in the search function's implementation*). Search 
>>> means you want to get the position of a given element, or in other 
>>> words, "element" -> search -> "key/index". What we're talking about is 
>>> the reverse: getting the element at a given position, ie, "key/index" -> 
>>> [] -> "element". It doesn't matter if it's an array, a linked list, or a 
>>> super-duper-collection-from-2092: that's still indexing, not searching.
>>
>> It is a search, you are searching for the element whose key matches. 
>> When the key can be used to lookup the element efficiently, we call that 
>> indexing.  Indexing is a more constrained form of searching.
>>
>
> If you've got a linked list, and you want to get element N, are you 
> *really* going to go reaching for a function named "search"? How often do 
> you really see a generic function named "search" or "find" that takes a 
> numeric index as a the "to be found" parameter instead of something to be 
> matched against the element's value? I would argue that that would be 
> confusing for most people. Like I said in a different post farther down, 
> the implementation of a "getAtIndex()" is obviously going to work like a 
> search, but from "outside the box", what you're asking for is not the 
> same.

If you are indexing into a tree, it is considered a binary search, if you 
are indexing into a hash, it is a search at some point to deal with 
collisions.  People don't think about indexing as being a search, but in 
reality it is.  A really fast search.

And I don't think search would be the name of the member function, it should 
be something like 'getNth', which returns a cursor that points to the 
element.

-Steve 





More information about the Digitalmars-d mailing list