We need better documentation for functions with ranges and templates
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Tue Dec 15 06:08:18 PST 2015
On 12/15/15 9:03 AM, rumbu wrote:
> On Tuesday, 15 December 2015 at 12:28:02 UTC, ZombineDev wrote:
>> On Tuesday, 15 December 2015 at 11:26:04 UTC, rumbu wrote:
>>>
>>> Looking at the .net source code, the Count extension method is also
>>> doing a best effort "count" by querying the ICollection interface.
>>
>> Yes, I had looked at the source code before writing this, so I knew
>> exactly how it worked. In short: terrible, because it relies only on
>> OOP. But that's not the point. Why should anyone need to look at the
>> source code to see what this function does? I thought that was what
>> the docs were for.
>>
>>>
>>> public static int Count<TSource>(this IEnumerable<TSource> [...]
>>>
>>> The Remarks section clearly states the same thing:
>>>
>>> "If the type of source implements ICollection<T>, that implementation
>>> is used to obtain the count of elements. Otherwise, this method
>>> determines the count."
>>>
>>>
>>> And personally, I found the MS remark more compact and more user
>>> friendly than:
>>> [...]
>>
>> If you look at the table at the beginning of the page
>> (https://dlang.org/phobos/std_range_primitives.html) you can clearly
>> see a nice concise description of the function. Even if you don't know
>> complexity theory there's the word "Compute" which should give you an
>> idea that the function performs some non-trivial amount of work. Unlike:
>>
>>> Returns the number of elements in a sequence.
>>
>> That implies that it only returns a number, almost like an ordinary
>> getter property. I shudder to think that, had C# gotten extension
>> properties back then, it might have been implemented as one.
>>
>>> Not everybody is licensed in computational complexity theory to
>>> understand what O(n) means.
>>
>> LOL. Personally, I would never want to use any software written by a
>> programmer who can't tell the difference.
>>
>> Well, OK, let's consider a novice programmer who hasn't yet studied
>> complexity theory.
>>
>> Option A: They look at the documentation and see some strange O(n)
>> thing that they don't know. They look it up on Google and find the
>> wonderful world of complexity theory. They become more educated and
>> are grateful to the people who wrote the documentation for describing
>> the function's requirements more accurately. That way they can easily
>> decide how using such a function would impact the performance of
>> their system.
>>
>> Option B: They look at the documentation and see some strange O(n)
>> thing that they don't know. They decide that it's extremely inhumane
>> for the docs to expect such significant knowledge from the reader, and
>> they quit. Such novices, who do not want to learn, are better off
>> choosing a different profession than inflicting their poorly written
>> software on the world.
>
> We are talking about better documentation, not about C# vs. D
> performance; we already know the winner. Since C# is an OOP-only
> language, there is only one way to do reflection: using OOP
> (deliberately ignoring the fact that NGen will reduce this call to a
> simple memory read in the case of arrays).
>
> Your claim:
>
>> the docs don't even bother to mention that it is almost always O(n),
>> because none of the Enumerable extension methods preserve the
>> underlying ICollection interface
>
> was false, and you don't need to look at the source code to find out; the
> Remarks section is self-explanatory:
>
> "If the type of source implements ICollection<T>, that implementation is
> used to obtain the count of elements. Otherwise, this method determines
> the count."
>
> This is *good* documentation:
> - "Count" is a better name than "walkLength"; every other programming
> language uses names similar to count, cnt, length, len.
> - You don't need to understand computer science terms to find out what a
> function does.
> - If you care about more than just the number of elements, there is a
> performance hint in the Remarks section.
> - Links are provided to concepts: even the return type (int) has a link.
> - It clearly states what happens if the range is null.
> - It clearly states what happens if the range contains more than
> int.max elements.
>
> By contrast, the D documentation tells me that it may perform O(n)
> evaluations, but introduces a bunch of unlinked concepts:
> - isInputRange
> - isInfiniteRange
> - hasLength
> - empty
> - popFront
>
> The D docs give no indication of what happens if the range is undefined.
> In fact, the behavior is inconsistent:
> - it returns 0 for null arrays;
> - it throws an AccessViolation for null ranges (or probably segfaults on
> Linux).
>
> There is no indication of what happens if the range contains more than
> size_t.max elements:
> - integer overflow.
This is a great collection of clear points to improve. Thanks!
Andrei
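The "best effort" strategy both libraries describe (use an O(1) length when the source can report one, otherwise walk the whole sequence in O(n)) can be sketched in a few lines of Python. This is only an illustrative analogue, not the .NET or Phobos code: the name best_effort_count is hypothetical, collections.abc.Sized stands in for ICollection<T> / hasLength, and the explicit None check is one assumed way of making the "undefined range" case part of the contract instead of leaving it to chance.

```python
from collections.abc import Iterable, Sized


def best_effort_count(source: Iterable) -> int:
    """Hypothetical helper: count the elements of `source`.

    If `source` can report its own length (it implements Sized, the
    analogue assumed here for ICollection<T> / hasLength), that answer
    is used directly. Otherwise the source is iterated once, which costs
    O(n) and consumes one-shot iterators such as generators. An infinite
    iterator would make this loop forever.
    """
    if source is None:
        # Document the "undefined range" case explicitly.
        raise TypeError("source must not be None")
    if isinstance(source, Sized):
        # Fast path: len() is O(1) for built-in containers.
        return len(source)
    # Slow path: walk the whole sequence, one element at a time.
    count = 0
    for _ in source:
        count += 1
    return count


print(best_effort_count([1, 2, 3]))                # 3, via len()
print(best_effort_count(x for x in range(5)))     # 5, via a full walk
```

This is the trade-off the thread is arguing about: for a list the call is effectively free, while for a generator it costs a full pass and also consumes the generator, so a second call on the same generator returns 0. Python cannot detect an infinite iterator up front, unlike walkLength, whose documented constraints exclude infinite ranges.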