Why Strings as Classes?

Christopher Wright dhasenan at gmail.com
Fri Aug 29 17:34:26 PDT 2008


Don wrote:
> Steven Schveighoffer wrote:
>> "Nick Sabalausky" wrote
>>> "Don" <nospam at nospam.com.au> wrote in message 
>>> news:g95td3$2tu0$1 at digitalmars.com...
>>>> Nick Sabalausky wrote:
>>>>> "Don" <nospam at nospam.com.au> wrote in message 
>>>>> news:g95ks5$2aon$1 at digitalmars.com...
>>>>>> Nick Sabalausky wrote:
>>>>>>> "Dee Girl" <deegirl at noreply.com> wrote in message 
>>>>>>> news:g94j7a$2875$1 at digitalmars.com...
>>>>>>>> I appreciate your viewpoint. Please allow me to explain. That
>>>>>>>> viewpoint is in opposition to the STL's. In the STL, each
>>>>>>>> algorithm defines what kind of iterator it operates with, and
>>>>>>>> what iterator complexity it requires.
>>>>>>>>
>>>>>>>> I agree that other designs can be made. But the STL has that
>>>>>>>> design, and in my opinion it is a large part of what makes the
>>>>>>>> STL so successful.
>>>>>>>>
>>>>>>>> I disagree that an algorithm that knows the complexity of its
>>>>>>>> iterator is concrete. I think exactly the contrary. It may be
>>>>>>>> worth reading Josuttis's book about the STL. STL algorithms are
>>>>>>>> the most generic I have ever found in any language. I hope
>>>>>>>> std.algorithm in D will be better. But right now std.algorithm
>>>>>>>> works only with arrays.
>>>>>>>>
>>>>>>>>> If an algorithm uses [] and doesn't know the
>>>>>>>>> complexity of the []...good! It shouldn't know, and it
>>>>>>>>> shouldn't care. It's
>>>>>>>>> the code that sends the collection to the algorithm that knows
>>>>>>>>> and cares.
>>>>>>>> I think this is a mistake. The algorithm should know. Otherwise
>>>>>>>> "linear find" is not "linear find"! It is "quadratic find". If
>>>>>>>> you want to define something called linear find, then you must
>>>>>>>> know the iterator complexity.
>>>>>>>>
>>>>>>> If a generic algorithm describes itself as "linear find" then I 
>>>>>>> know damn well that it's referring to the behavior of *just* the 
>>>>>>> function itself, and is not a statement that the function 
>>>>>>> *combined* with the behavior of the collection and/or a custom 
>>>>>>> comparison is always going to be O(n).
>>>>>>>
>>>>>>> A question about STL: If I create a collection that, internally, 
>>>>>>> is like a linked list, but starts each indexing operation from 
>>>>>>> the position of the last indexing operation (so that a "find 
>>>>>>> first" would run in O(n) instead of O(n*n)), is it possible to 
>>>>>>> send that collection to STL's generic "linear find first"? I 
>>>>>>> would argue that it should somehow be possible *even* if the 
>>>>>>> STL's generic "linear find first" guarantees a *total* 
>>>>>>> performance of O(n) (Since, in this case, it would still be O(n) 
>>>>>>> anyway). Because otherwise, the STL wouldn't be very extendable, 
>>>>>>> which would be a bad thing for a library of "generic" algorithms.
>>>>>> Yes, it will work.
>>>>>>
>>>>>>> Another STL question: Is it possible to use STL to do a "linear 
>>>>>>> find" using a custom comparison? If so, is it possible to make 
>>>>>>> STL's "linear find" function use a comparison that just happens 
>>>>>>> to be O(n)? If so, doesn't that violate the linear-time 
>>>>>>> guarantee, too? If not, how does it know that the custom 
>>>>>>> comparison is O(n) instead of O(1) or O(log n)?
>>>>>> This will work too.
>>>>>>
>>>>>> IF you follow the conventions THEN the STL gives you the guarantees.
>>>>> I'm not sure that's really a "guarantee" per se, but that's 
>>>>> splitting hairs.
>>>>>
>>>>> In any case, it sounds like we're all arguing more or less the same 
>>>>> point:
>>>>>
>>>>> Setting aside the issue of "should opIndex be used and when?", 
>>>>> suppose I have the following collection interface and find function 
>>>>> (not guaranteed to compile):
>>>>>
>>>>> interface ICollection(T)
>>>>> {
>>>>>     T getElement(int index);
>>>>>     int getSize();
>>>>> }
>>>>>
>>>>> int find(T)(ICollection!(T) c, T elem)
>>>>> {
>>>>>     for (int i = 0; i < c.getSize(); i++)
>>>>>     {
>>>>>         if (c.getElement(i) == elem)
>>>>>             return i;
>>>>>     }
>>>>>     return -1; // not found
>>>>> }
>>>>>
>>>>> It sounds like STL's approach is to do something roughly like that 
>>>>> and say:
>>>>>
>>>>> "find()'s parameter 'c' should be an ICollection for which 
>>>>> getElement() is O(1), in which case find() is guaranteed to be O(n)"
>>>>>
>>>>> What I've been advocating is, again, doing something like the code 
>>>>> above and saying:
>>>>>
>>>>> "find()'s complexity is dependent on the complexity of the 
>>>>> ICollection's getElement(). If getElement()'s complexity is O(m), 
>>>>> then find()'s complexity is guaranteed to be O(m * n). Of course, 
>>>>> this means that the only way to get ideal complexity from find() is 
>>>>> to use an ICollection for which getElement() is O(1)".
>>>>>
>>>>> But, you see, those two statements are effectively equivalent.
>>>> They are. But...
>>>> if you don't adhere to the conventions, your code gets really hard 
>>>> to reason about.
>>>>
>>>> "This class has an opIndex which is in O(n). Is that OK?" Well, that 
>>>> depends on what it's being used for. So you have to look at all of 
>>>> the places where it is used.
>>>>
>>>> It's much simpler to use the convention that opIndex _must_ be fast; 
>>>> this way the performance requirements for containers and algorithms 
>>>> are completely decoupled from each other. It's about good design.
>>>>
>>> Taking a slight detour, let me ask you this... Which of the following 
>>> strategies do you consider to be better:
>>>
>>> //-- A --
>>> value = 0;
>>> for(int i=1; i<=10; i++)
>>> {
>>>    value += i*2;
>>> }
>>>
>>> //-- B --
>>> value = sum(map(1..10, {n * 2}));
>>>
>>> Both strategies compute the sum of the first 10 multiples of 2.
>>>
>>> Strategy A makes the low-level implementation details very clear, but 
>>> IMO, it comes at the expense of high-level clarity. This is because 
>>> the code intermixes the high-level "what I want to accomplish" with 
>>> the low-level details.
>>>
>>> Strategy B much more closely resembles the high-level desired result, 
>>> and thus makes the high-level intent more clear. But this comes at 
>>> the cost of hiding the low-level details behind a layer of abstraction.
>>>
>>> I may very well be wrong on this, but from what you've said it sounds 
>>> like you (as well as the other people who prefer [] to never be O(n)) 
>>> are the type of coder who would prefer "Strategy A". In that case, I 
>>> can completely understand your viewpoint on opIndex, even though I 
>>> don't agree with it (I'm a "Strategy B" kind of person).
>>>
>>> Of course, if I'm wrong on that assumption, then we're back to square 
>>> one ;)
>>
>> For me at least, you are wrong :)  In fact, I view it the other way: 
>> you shouldn't have to care about the underlying implementation, as 
>> long as the runtime is well defined.  If you tell me strategy B may or 
>> may not take up to O(n^2) to compute, then you bet your ass I'm not 
>> going to even touch option B, 'cause I can always get O(n) time with 
>> option A :)  Your solution FORCES me to care about the details, it's 
>> not so much that I want to care about them.
> 
> I agree.  It's about _which_ details do you want to abstract away. I 
> don't care about the internals. But I _do_ care about the complexity of 
> them.

We all agree about this. What we disagree about is how to find out about 
the complexity of an operation -- by whether it overloads an operator or 
by some metadata.

In terms of code, the difference is:
/* Operator overloading */
void foo(T)(T collection)
{
	static if (is (typeof (collection[0]))) { ... }
}

/* Metadata */
void foo(T)(ICollection!(T) collection)
{
	if ((cast(FastIndexedCollection)collection) !is null) { ... }
}


You need a metadata solution whichever approach you choose: the operator 
check happens at compile time, so without metadata you cannot 
differentiate implementations at runtime.


