Why Strings as Classes?

Nick Sabalausky a at a.a
Tue Aug 26 21:19:09 PDT 2008


"superdan" <super at dan.org> wrote in message 
news:g92drj$h0u$1 at digitalmars.com...
> Nick Sabalausky Wrote:
>
>> "superdan" <super at dan.org> wrote in message
>> news:g921nb$2qqq$1 at digitalmars.com...
>> > Nick Sabalausky Wrote:
>> >
>> >> "Denis Koroskin" <2korden at gmail.com> wrote in message
>> >> news:op.ugie28dxo7cclz at proton.creatstudio.intranet...
>> >> > On Tue, 26 Aug 2008 23:29:33 +0400, superdan <super at dan.org> wrote:
>> >> > [snip]
>> >> >>> The D spec certainly doesn't make any guarantees about
>> >> >>> the time/memory complexity of opIndex; it's up to the implementing
>> >> >>> class
>> >> >>> to do so.
>> >> >>
>> >> >> it don't indeed. it should. that's a problem with the spec.
>> >> >>
>> >> >
>> >> > I agree. You can't rely on function invokation, i.e. the following
>> >> > might
>> >> > be slow as death:
>> >> >
>> >> > auto n = collection.at(i);
>> >> > auto len = collection.length();
>> >> >
>> >> > but index operations and properties getters should be real-time and
>> >> > have
>> >> > O(1) complexity by design.
>> >> >
>> >> > auto n = collection[i];
>> >> > auto len = collection.length;
>> >>
>> >> I disagree. That strategy strikes me as a very clear example of 
>> >> breaking
>> >> encapsulation by having implementation details dictate certain aspects 
>> >> of
>> >> the API. At the very least, that will make the API overly rigid,
>> >> hindering
>> >> future changes that could otherwise have been non-breaking,
>> >> behind-the-scenes changes.
>> >
>> > take this:
>> >
>> > auto c = new Customer;
>> > c.loadCustomerInfo("123-12-1234");
>> >
>> > that's all cool. there's no performance guarantee other than some best
>> > effort kinda thing `we won't sabotage things'. if you come with faster 
>> > or
>> > slower methods to load customers, no problem. coz noone assumes any.
>> >
>> > now take sort. sort says. my input is a range that supports indexing 
>> > and
>> > swapping independent of the range size. if you don't have that just let 
>> > me
>> > know and i'll use a totally different method. just don't pretend.
>> >
>>
>> Choosing a sort method is a separate task from the actual sorting.
>
> thot you were the one big on abstraction and encapsulation and all those 
> good things. as a user i want to sort stuff. i let the library choose 
> what's best for the collection at hand.
>
> sort(stuff)
> {
>    1) figure out best algo for stuff
>    2) have at it
> }
>
> i don't want to make that decision outside. for example i like one sort 
> routine. not quicksort, heapsort, quicksort_with_median_of_5, or god 
> forbid bubblesort.
>

I never said that shouldn't be available. In fact, I said it should be 
there; it just shouldn't be forced.

>> Any sort
>> that expects to be able to index will still work correctly if given a
>> collection that has O(n) or worse indexing, just as it will still work
>> correctly if given an O(n) or worse comparison delegate.
>
> i disagree but now that christopher gave me a black eye guess i have to 
> shut up.
>

That's not a matter of agreeing or disagreeing; it's a verifiable fact. Grab 
a working sort function that operates on collection classes that implement 
indexing-syntax and a length property, feed it an unsorted linked list that 
has opIndex overloaded to return the nth node and a proper length property, 
and when it returns, the list will be sorted.

Or are you maybe talking about a sort function that's parameterized 
specifically to take an "array" instead of "a collection that implements 
opIndex and a length property"? Because that might make a difference 
depending on the language (not sure about D offhand).
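That claim can be demonstrated directly. Below is a C++ sketch (IndexedList 
and generic_sort are hypothetical names invented for illustration): a sort 
written purely against indexing syntax and a length property, fed a linked 
structure whose operator[] walks from the head on every access. It still 
produces a sorted result; the O(n) indexing only changes the running time.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <utility>

// Hypothetical wrapper: a singly linked list that exposes indexing syntax.
// operator[] walks from the head on every access, so it is O(n), not O(1).
struct IndexedList {
    std::forward_list<int> nodes;
    std::size_t len;

    int& operator[](std::size_t i) {
        auto it = nodes.begin();
        std::advance(it, i);   // the linear walk that makes indexing O(n)
        return *it;
    }
    std::size_t length() const { return len; }
};

// A generic exchange sort written only against indexing + length.
// It is correct for any container providing those operations; an O(n)
// operator[] merely degrades its running time, not its result.
template <class C>
void generic_sort(C& c) {
    for (std::size_t i = 0; i < c.length(); ++i)
        for (std::size_t j = i + 1; j < c.length(); ++j)
            if (c[j] < c[i])
                std::swap(c[i], c[j]);
}
```

Feeding it an unsorted IndexedList and calling generic_sort leaves the 
elements in ascending order, exactly as argued above.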

>> It's up to the
>> caller of the sort function to know that an O(n log n) sort (for 
>> instance)
>> is only O(n log n) if the indexing and comparison are both O(1). And then
>> it's up to them to decide if they still want to send it a linked list or 
>> a
>> complex comparison. The sort shouldn't crap out at compile-time (or 
>> runtime)
>> just because some novice might not know that doing a generalized bubble 
>> sort
>> on a linked list scales poorly.
>
> it should coz there's an obvious good choice. there's no good tradeoff in 
> that. there never will be a case to call the bad sort on the bad range.
>

No matter what type of collection you're using, the "best sort" is still 
going to vary depending on factors like the number of elements to be sorted, 
whether duplicates might exist, how close the collection is to either 
perfectly sorted, perfectly backwards or totally random, how likely it is to 
be random/sorted/backwards at any given time, etc. And then there can be 
different variations of the same basic algorithm that can be better or worse 
for certain scenarios. And then there's the question of how, if at all, an 
algorithm-choosing sort handles user-created collections.
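This is, incidentally, how practical sort routines already behave: they are 
hybrids that pick a strategy from cheap runtime properties such as input 
size. A rough C++ sketch of the idea (hybrid_sort is a hypothetical name, 
and the cutoff of 16 is an arbitrary illustrative choice):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Plain insertion sort: poor asymptotics, excellent constant factors.
template <class It>
void insertion_sort(It first, It last) {
    for (It i = first; i != last; ++i)
        for (It j = i; j != first && *j < *(j - 1); --j)
            std::iter_swap(j, j - 1);
}

// Hypothetical hybrid: small inputs go to insertion sort, everything else
// to std::sort, the same shape introsort-style implementations use.
template <class It>
void hybrid_sort(It first, It last) {
    if (std::distance(first, last) < 16)
        insertion_sort(first, last);
    else
        std::sort(first, last);
}
```

The caller never picks an algorithm; the routine inspects the input and 
decides, which is the point being argued about here.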

>> If you want automatic choosing of an appropriate sort algorithm (which is
>> certainly a good thing to have), then that can be done at a separate,
>> optional, level of abstraction using function overloading, template
>> specialization, RTTI, etc. That way you're not imposing arbitrary
>> restrictions on anyone.
>
> i think u missed a lil point i was making.
>
> All collections: implement nth()
> Indexable collections: implement opIndex
>
> is all. there is no restriction. just use nth.
>

If you want opIndex to be reserved for highly scalable indexing, then I can 
see how that would lead to what you describe here. But I'm in the camp that 
feels opIndex means "indexing", not "cheap/scalable indexing", in which case 
it becomes unnecessary to also expose the separate "nth()" function.
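For what it's worth, the "separate, optional level of abstraction" mentioned 
earlier is essentially the route the STL took: algorithms are overloaded on 
iterator-category tags, so an appropriate strategy is chosen automatically 
without forbidding anything to the caller. A minimal C++ sketch (smart_sort 
is a hypothetical name):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Fast path: random-access iterators give O(1) indexing, so std::sort
// can run in place.
template <class It>
void smart_sort_impl(It first, It last, std::random_access_iterator_tag) {
    std::sort(first, last);
}

// Fallback for ranges without cheap indexing: copy to a vector, sort,
// copy back. Correct, just with extra O(n) space.
template <class It>
void smart_sort_impl(It first, It last, std::forward_iterator_tag) {
    std::vector<typename std::iterator_traits<It>::value_type> tmp(first, last);
    std::sort(tmp.begin(), tmp.end());
    std::copy(tmp.begin(), tmp.end(), first);
}

// Dispatch on the iterator category; the caller never chooses.
template <class It>
void smart_sort(It first, It last) {
    typename std::iterator_traits<It>::iterator_category tag;
    smart_sort_impl(first, last, tag);
}
```

With a std::vector the random-access overload sorts in place; with a 
std::list the fallback fires instead; both callers just write 
smart_sort(begin, end).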

>> > with that indexing and swapping complexity ain't implementation detail.
>> > they're part of the spec. guess stepanov's main contribution was to
>> > clarify that.
>> >
>>
>> When I called indexing an implementation detail, I was referring to the
>> collection itself. The method of indexing *is* an implementation detail 
>> of
>> the collection.
>
> not when it gets composed in higher level ops.
>

It's not? If a collection's indexing isn't implemented by the collection's 
own class (or the equivalent functions in non-OO), then where is it 
implemented? Don't tell me it's the sort function, because I know that I'm 
not calling a sort function every time I say "collection[i]". The method of 
indexing is implemented by the collection class; therefore, it's an 
implementation detail of that class, not of the functions that call it. 
Claiming otherwise is like saying that all of the inner workings of printf() 
are implementation details of main().

>> It should not be considered an implementation detail of the
>> sort algorithm since it's encapsulated in the collection and thus hidden
>> away from the sort algorithm. If you make a rule that collections with 
>> cheap
>> indexing are indexed via opIndex and collections with expensive indexing 
>> are
>> indexed via a function, then you've just defined the API in terms of the
>> collection's implementation (and introduced an unnecessary inconsistency
>> into the API).
>
> no. there is consistency. nth() is consistent across - o(n) or better 
> indexing.
>
> i think u have it wrong when u think of "cheap" as if it were "fewer 
> machine instructions". no. it's about asymptotic complexity and that does 
> matter.
>

Since we're talking about algorithmic complexity, I figured "cheap" and 
"expensive" would be understood as being intended in the same sense. So yes, 
I'm well aware of that.

>> If a sort function is desired that only accepts collections with O(1)
>> indexing, then that can be accomplished at a higher level of abstraction
>> (using function overloading, RTTI, etc.) without getting in the way when
>> such a guarantee is not needed.
>
> exactly. nth() and opIndex() fit the bill. what's there not to love?
>

The "nth()" function deviates from standard indexing syntax. I consider 
"collection[i]" to mean "indexing", not "low-complexity indexing".

>> >> For realtime code, I can see the benefit to what you're saying. 
>> >> Although
>> >> in
>> >> many cases only part of a program needs to be realtime, and for the 
>> >> rest
>> >> of
>> >> the program's code I'd hate to have to sacrifice the encapsulation
>> >> benefits.
>> >
>> > realtime has nothin' to do with it.
>>
>> For code that needs to run in realtime, I agree with Denis Koroskin that 
>> it
>> could be helpful to be able to look at a piece of code and have some sort 
>> of
>> guarantee that there is no behind-the-scenes overloading going on that is
>> any more complex than the operators' default behaviors. But for code that
>> doesn't need to finish within a maximum amount of time, that becomes less
>> important and the encapsulation/syntactic-consistency gained from the use 
>> of
>> such things becomes a more worthy pursuit. That's what I was saying about
>> realtime.
>
> i disagree but am in a rush now. guess i can't convince u.
>
>> > encapsulation ain't broken by making complexity part of the reqs. any 
>> > more
>> > than any req ain't breakin' encapsulation. if it looks like a problem 
>> > then
>> > encapsulation was misdesigned and needs change.
>> >
>> > case in point. all containers should provide 'nth' say is it's o(n) or
>> > better. then there's a subclass of container that is indexed_container.
>> > that provides opIndex and says it's o(log n) or better. it also 
>> > provides
>> > 'nth' by just forwarding to opIndex. faster than o(n) ain't a problem.
>> >
>> > but forcing a list to blurt something for opIndex - that's just bad
>> > design.
>>
>> I agree that not all collections should implement an opIndex. Anything
>> without a natural sequence or mapping should lack opIndex (such as a tree 
>> or
>> graph). But forcing the user of a collection that *does* have a natural
>> sequence (like a linked list) to use function-call-syntax instead of
>> standard indexing-syntax just because the collection is implemented in a 
>> way
>> that causes indexing to be less scalable than other collections - that's 
>> bad
>> design.
>
> no. it's great design. because it's not lyin'. you want o(1) indexing you 
> say a[n]. you are ok with o(n) indexing you say a.nth(n). this is how 
> generic code works, with consistent notation. not with lyin'.
>

Number of different ways to index a collection:
One way: Consistent
Two ways: Not consistent

>> The way I see it, "group[n]" means "Get the nth element of group". Not 
>> "Get
>> the element at location group.ptr + (n * sizeof(group_base_type)) or
>> something else that's just as scalable."
>
> no need to get that low. just say o(1) and understand o(1) has nothing to 
> do with the count of assembler ops.
>

Here:

"group.ptr + (n * sizeof(group_base_type))..."

One multiplication, one addition, one memory read, no loops, no recursion: 
O(1)

"...or something else that's just as scalable"

Something that's just as scalable as O(1) must be O(1). So yes, that's what 
I said.

>> In plain C, those are one and the
>> same. But when you start talking about generic collections, encapsulation
>> and interface versus implementation, they are very different: the former 
>> is
>> interface, the latter is implementation.
>
> so now would u say stl has a poor design? because it's all about stuff 
> that you consider badly designed.

I abandoned C++ back when STL was still fairly new, so I can't honestly say. 
I seem to remember C++ having some trouble with certain newer language 
concepts; it might be that STL is the best that can be reasonably done given 
the drawbacks of C++. Or there might be room for improvement. 
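For reference, the STL's answer to this exact question is to refuse rather 
than pretend: std::sort requires random-access iterators and will not 
compile on a std::list, which instead supplies its own member sort(). A 
small C++ illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

std::vector<int> sort_a_vector(std::vector<int> v) {
    std::sort(v.begin(), v.end());      // fine: vector iterators are random access
    return v;
}

std::list<int> sort_a_list(std::list<int> l) {
    // std::sort(l.begin(), l.end());   // would not compile: list iterators
    //                                  // are only bidirectional
    l.sort();                           // the container supplies its own O(n log n) sort
    return l;
}
```

So indexing-like access with the wrong complexity is rejected at compile 
time, and the container offers an alternative under a different name, much 
like the opIndex/nth split being debated here.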





More information about the Digitalmars-d mailing list