Why Strings as Classes?

Michiel Helvensteijn nomail at please.com
Wed Aug 27 08:44:18 PDT 2008


Dee Girl wrote:

>> What if you maintain a linked list of small arrays? Say each node in the
>> list contains around log(n) of the elements in the entire list. Wouldn't
>> that bring random access down to O(log n)? Of course, this would also
>> bring insertions up to O(log n).
>> 
>> And what if you save an index/pointer pair after each access. Then with
>> each new access request, you can choose from three locations to start
>> walking:
>> * The start of the list.
>> * The end of the list.
>> * The last access-point of the list.
>> 
>> In a lot of practical cases a new access is close to the last access. Of
>> course, the general case would still be O(n).
> 
> Michiel-san, this is new data structure very different from list! If I
> want list I never use this structure. It is like joking. Because when you
> write this you agree that list is not vector.

Yes, the first 'trick' makes it a different data structure. The second does
not. Would you still be opposed to using opIndex if its time-complexity were
O(log n)?

>> That's simple. a[i] looks much nicer than a.nth(i).
> 
> It is not nicer. It is more deceiving (correct spell?). If you look at
> code it looks like array code.
> 
> foreach (i; 0 .. a.length)
> {
>     a[i] += 1;
> }
> 
> For array works nice. But for list it is terrible! Many operations for
> incrementing only small list.

With that second trick the loop would have the same complexity for lists.

But putting that aside for the moment: are you saying you would allow
yourself to be deceived by a syntactic detail? Mentally attaching O(1) to
the *subscripting operator* is simply a legacy from C, where it is
syntactic sugar for pointer arithmetic.

>> By the way, I suspect that if opIndex is available only on arrays and
>> nth() is available on all sequence types, algorithm writers will forget
>> about opIndex and use nth(), to make their algorithm more widely
>> compatible. And I wouldn't blame them, though I guess you would.
> 
> I do not agree with this. I am sorry! I think nobody should write find()
> that uses nth().

Of course not. find() should be written with an iterator, which has optimal
complexity for both data-structures. My point is that an algorithm should
be generic first and foremost; then, where possible, you use the operations
that have the lowest complexity across all targeted data-structures.

>> >> * A list has faster insertions and growth.
>> > 
>> > o(1) insertion if u have the insertion point.  both list and vector
>> > have o(1) growth.
>> 
>> Yeah, but dynamic arrays have to re-allocate once in a while. Lists
>> don't.
> 
> Lists allocate memory each insert. Array allocate memory some time. With
> doubling cost of allocation+copy converge to zero.

Lists allocate memory for bare nodes, but never have to copy their elements.
Arrays have to move their whole contents to a larger memory location each
time they are outgrown. For more complex data-types that means potentially
very expensive copies.

>> But if time-complexity is not taken into consideration, the
>> sequential-access interface and the random-access interface are
>> equivalent, no?
> 
> I think it is mistake to not taken into consideration time complexity. For
> basic data structures specially. I do not think you can call them
> equivalent.

I did say 'if'. You have to agree that if you disregard complexity issues
(for the sake of argument), the two ARE equivalent.

>> I do think it's a good idea for algorithms to support the interface with
>> the weakest constraints (sequential-access). As long as they specify
>> their time-complexity in terms of the complexities of the interface
>> operations, not in absolute terms.
>> 
>> Then, when a programmer writes 'somealgorithm(someinterfaceobject)', a
>> hypothetical analysis tool could tell him the time-complexity of the
>> resulting operation. The programmer might even assert a worst-case
>> complexity at that point and the compiler could bail out if it doesn't
>> match.
> 
> The specification I think is with types. If that works tool is the
> compiler.

But don't you understand that if this tool did exist, and the language had a
standard notation for time/space-complexity, I could simply write:

sequence<T> s;
/* fill sequence */
sort(s);

And the compiler (in cooperation with this 'tool') could automatically find
the most effective combination of data-structure and algorithm. The code
would be more readable and efficient.

-- 
Michiel




More information about the Digitalmars-d mailing list