We need better documentation for functions with ranges and templates

ZombineDev via Digitalmars-d digitalmars-d at puremagic.com
Tue Dec 15 07:55:31 PST 2015


On Tuesday, 15 December 2015 at 14:03:50 UTC, rumbu wrote:
>
> We are talking about a better documentation, not about the C# 
> vs D performance, we already know the winner. Since C# is an 
> OOP-only language, there is only one way to do reflection - 
> using OOP, (voluntarily ignoring the fact that NGen will reduce 
> this call to a simple memory read in case of arrays).
>
> Your affirmation:
>
>> the docs don't even bother to mention that it is almost always 
>> O(n), because none of the Enumerable extension methods preserve 
>> the underlying ICollection interface
>
> was false and you don't need to look to the source code to find 
> out, the Remarks section is self-explanatory:
>
> "If the type of source implements ICollection<T>, that 
> implementation is used to obtain the count of elements. 
> Otherwise, this method determines the count."

Sorry, I don't know how to make this any clearer:

NONE OF THE System.Linq.Enumerable EXTENSION METHODS PRESERVE THE 
STRUCTURE (THE INTERFACES THEY IMPLEMENT) OF THE SEQUENCE THEY 
ARE OPERATING ON. DON'T BELIEVE THAT NGEN WILL AUTOMAGICALLY MAKE 
YOUR CODE FASTER. IT WILL NOT. UNICORNS DO NOT EXIST. AT LEAST 
THEY DO NOT IN .NET OR JAVA. DO NOT BLINDLY BELIEVE. TEST!

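A minimal Python sketch of the structure-loss point, using Python's built-in map as a stand-in for Select (an analogy, not .NET code): a list answers len() in O(1), but the lazy map object wrapping it has no len(), so anything that wants a count has to walk it. SizedMap is a hypothetical wrapper showing that a lazy map can forward the length of its source, roughly what Phobos' map does when the underlying range has a length.

```python
class SizedMap:
    """Hypothetical lazy map that forwards len() from its source,
    roughly like Phobos' map does when the source range has a length."""

    def __init__(self, func, source):
        self._func = func
        self._source = source

    def __len__(self):
        # O(1): ask the underlying sequence, nothing is evaluated.
        return len(self._source)

    def __iter__(self):
        # Still lazy: func runs only as elements are consumed.
        return (self._func(x) for x in self._source)


data = list(range(1000))

print(len(data))                               # 1000, O(1)
print(len(SizedMap(lambda x: x * 2, data)))    # 1000, O(1), no walking

try:
    len(map(lambda x: x * 2, data))            # built-in map drops the length
except TypeError as err:
    print("map has no length:", err)

# Without a length, the only way to count is to walk every element: O(n).
print(sum(1 for _ in map(lambda x: x * 2, data)))   # 1000
```
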
See for yourself: https://ideone.com/L5FatQ

This is what I got on my machine:
00:00:00.0011011 for N = 1 -> 1, List<int>.Select(..).Count()
00:00:00.0000017 for N = 2 -> 2, List<int>.Select(..).Count()
00:00:00.0000009 for N = 4 -> 4, List<int>.Select(..).Count()
00:00:00.0000011 for N = 8 -> 8, List<int>.Select(..).Count()
00:00:00.0000012 for N = 16 -> 16, List<int>.Select(..).Count()
00:00:00.0000026 for N = 32 -> 32, List<int>.Select(..).Count()
00:00:00.0000018 for N = 64 -> 64, List<int>.Select(..).Count()
00:00:00.0000032 for N = 128 -> 128, List<int>.Select(..).Count()
00:00:00.0000059 for N = 256 -> 256, List<int>.Select(..).Count()
00:00:00.0000098 for N = 512 -> 512, List<int>.Select(..).Count()
00:00:00.0000190 for N = 1024 -> 1024, List<int>.Select(..).Count()
00:00:00.0000369 for N = 2048 -> 2048, List<int>.Select(..).Count()
00:00:00.0000750 for N = 4096 -> 4096, List<int>.Select(..).Count()
00:00:00.0002185 for N = 8192 -> 8192, List<int>.Select(..).Count()
00:00:00.0003551 for N = 16384 -> 16384, List<int>.Select(..).Count()
00:00:00.0005826 for N = 32768 -> 32768, List<int>.Select(..).Count()
00:00:00.0015252 for N = 65536 -> 65536, List<int>.Select(..).Count()
00:00:00.0024139 for N = 131072 -> 131072, List<int>.Select(..).Count()
00:00:00.0049246 for N = 262144 -> 262144, List<int>.Select(..).Count()
00:00:00.0096537 for N = 524288 -> 524288, List<int>.Select(..).Count()
00:00:00.0194600 for N = 1048576 -> 1048576, List<int>.Select(..).Count()
00:00:00.0422573 for N = 2097152 -> 2097152, List<int>.Select(..).Count()
00:00:00.0749799 for N = 4194304 -> 4194304, List<int>.Select(..).Count()
00:00:00.1511740 for N = 8388608 -> 8388608, List<int>.Select(..).Count()
00:00:00.3004764 for N = 16777216 -> 16777216, List<int>.Select(..).Count()
00:00:00.6018954 for N = 33554432 -> 33554432, List<int>.Select(..).Count()
00:00:01.2064069 for N = 67108864 -> 67108864, List<int>.Select(..).Count()
00:00:02.6716092 for N = 134217728 -> 134217728, List<int>.Select(..).Count()
00:00:05.1524452 for N = 268435456 -> 268435456, List<int>.Select(..).Count()
00:00:09.6481144 for N = 536870912 -> 536870912, List<int>.Select(..).Count()
00:00:00.0005440 for N = 1 -> 1, Array<int>.Select(..).Count()
00:00:00.0000010 for N = 2 -> 2, Array<int>.Select(..).Count()
00:00:00.0000008 for N = 4 -> 4, Array<int>.Select(..).Count()
00:00:00.0000009 for N = 8 -> 8, Array<int>.Select(..).Count()
00:00:00.0000013 for N = 16 -> 16, Array<int>.Select(..).Count()
00:00:00.0000015 for N = 32 -> 32, Array<int>.Select(..).Count()
00:00:00.0000020 for N = 64 -> 64, Array<int>.Select(..).Count()
00:00:00.0000035 for N = 128 -> 128, Array<int>.Select(..).Count()
00:00:00.0000061 for N = 256 -> 256, Array<int>.Select(..).Count()
00:00:00.0000107 for N = 512 -> 512, Array<int>.Select(..).Count()
00:00:00.0000209 for N = 1024 -> 1024, Array<int>.Select(..).Count()
00:00:00.0000424 for N = 2048 -> 2048, Array<int>.Select(..).Count()
00:00:00.0000822 for N = 4096 -> 4096, Array<int>.Select(..).Count()
00:00:00.0001633 for N = 8192 -> 8192, Array<int>.Select(..).Count()
00:00:00.0003263 for N = 16384 -> 16384, Array<int>.Select(..).Count()
00:00:00.0006503 for N = 32768 -> 32768, Array<int>.Select(..).Count()
00:00:00.0013024 for N = 65536 -> 65536, Array<int>.Select(..).Count()
00:00:00.0026130 for N = 131072 -> 131072, Array<int>.Select(..).Count()
00:00:00.0052041 for N = 262144 -> 262144, Array<int>.Select(..).Count()
00:00:00.0103705 for N = 524288 -> 524288, Array<int>.Select(..).Count()
00:00:00.0207945 for N = 1048576 -> 1048576, Array<int>.Select(..).Count()
00:00:00.0418217 for N = 2097152 -> 2097152, Array<int>.Select(..).Count()
00:00:00.0829522 for N = 4194304 -> 4194304, Array<int>.Select(..).Count()
00:00:00.1658241 for N = 8388608 -> 8388608, Array<int>.Select(..).Count()
00:00:00.3304377 for N = 16777216 -> 16777216, Array<int>.Select(..).Count()
00:00:00.6636190 for N = 33554432 -> 33554432, Array<int>.Select(..).Count()
00:00:01.3255121 for N = 67108864 -> 67108864, Array<int>.Select(..).Count()
00:00:02.6572163 for N = 134217728 -> 134217728, Array<int>.Select(..).Count()
00:00:05.2761961 for N = 268435456 -> 268435456, Array<int>.Select(..).Count()
00:00:10.7154543 for N = 536870912 -> 536870912, Array<int>.Select(..).Count()

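To put numbers on "O(n)", here is a small Python calculation over a subset of the log lines above (N >= 4194304, values copied verbatim). It divides each elapsed time by N; a roughly constant cost per element is what linear scaling looks like. The small-N rows are left out because fixed overhead dominates there (the N = 1 rows are presumably JIT warm-up).

```python
# (N, elapsed) pairs copied from the log above, N >= 4194304 only.
LIST_TIMINGS = [
    (4194304, "00:00:00.0749799"),
    (8388608, "00:00:00.1511740"),
    (16777216, "00:00:00.3004764"),
    (33554432, "00:00:00.6018954"),
    (67108864, "00:00:01.2064069"),
    (134217728, "00:00:02.6716092"),
    (268435456, "00:00:05.1524452"),
    (536870912, "00:00:09.6481144"),
]
ARRAY_TIMINGS = [
    (4194304, "00:00:00.0829522"),
    (8388608, "00:00:00.1658241"),
    (16777216, "00:00:00.3304377"),
    (33554432, "00:00:00.6636190"),
    (67108864, "00:00:01.3255121"),
    (134217728, "00:00:02.6572163"),
    (268435456, "00:00:05.2761961"),
    (536870912, "00:00:10.7154543"),
]


def to_seconds(stamp):
    """Parse a TimeSpan-style 'hh:mm:ss.fffffff' string into seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def ns_per_element(timings):
    """Elapsed time divided by N, in nanoseconds per element."""
    return [to_seconds(stamp) / n * 1e9 for n, stamp in timings]


for name, timings in [("List<int>", LIST_TIMINGS), ("Array<int>", ARRAY_TIMINGS)]:
    costs = ns_per_element(timings)
    print(name, " ".join(f"{c:.1f}" for c in costs))
```

For these rows it comes out at roughly 18 to 20 ns per element for both List<int> and Array<int>, i.e. Count() walks every element in both cases.
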
> This is a *good* documentation:
> - "Count" is a better name than "walkLength"; every other 
> programming language will use concepts similar to count, cnt, 
> length, len.

"Length" is right there in walkLength. I don't see this as an 
obstacle for anybody who wants to use this function.
The "walk" in walkLength tells you about the performance cost, 
which to me is critical information.

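For comparison, a Python model of what walkLength is documented to do (this is not Phobos code; Python's len() stands in for hasLength/length and the iterator protocol for empty/popFront): if the range knows its length, that is returned in O(1) and upTo is not consulted; otherwise it walks the range, and the optional upTo bound stops the walk early.

```python
def walk_length(source, up_to=None):
    """Model of walkLength's documented behavior (not Phobos code).

    - If the source has a length (stand-in for hasLength!Range), return it
      in O(1); like walkLength, up_to is not consulted in that case.
    - Otherwise advance one element at a time (stand-in for empty/popFront),
      stopping after up_to steps if up_to is given. O(n) in the walked length.
    """
    try:
        return len(source)
    except TypeError:
        pass  # no length: fall through and walk

    count = 0
    iterator = iter(source)
    while up_to is None or count < up_to:
        try:
            next(iterator)
        except StopIteration:
            break
        count += 1
    return count


print(walk_length([10, 20, 30]))                          # 3, O(1)
print(walk_length(x * x for x in range(1000)))            # 1000, walks all
print(walk_length((x for x in range(10**9)), up_to=5))    # 5, stops early
```

The up_to bound covers the "does it have at least k elements?" case: that check costs k steps instead of a full walk.
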
> - You don't need to understand computer science terms to find 
> out what a function does;

A major disadvantage, if you ask me :D

BTW, .NET's own Array.Length uses the same naming:
https://msdn.microsoft.com/en-us/library/system.array.length(v=vs.110).aspx

> - If you are really interested about more than finding out the 
> number of elements, there is a performance hint in the Remarks 
> section.

No, the Remarks section is so vague that it is no hint at all:
"... Otherwise, this method determines the count"

Determines it how? I really dislike this style of documentation, 
written just for the sake of documentation.

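Spelled out, "determines the count" amounts to something like the rough Python model below (not .NET code). The len() check stands in for the ICollection<T> test, and max_count is a hypothetical parameter added only so the overflow path can be exercised without enumerating two billion elements: read the size if the source has one, otherwise enumerate everything and fail once the count no longer fits in an Int32 (the case the MSDN page covers with its OverflowException).

```python
INT32_MAX = 2**31 - 1  # System.Int32.MaxValue


def linq_count(source, max_count=INT32_MAX):
    """Rough model of the two paths in the Count() Remarks (not .NET code).

    max_count is a hypothetical knob so the overflow path can be tested
    without enumerating two billion elements.
    """
    # Path 1: the source already knows its size (stand-in for ICollection<T>),
    # so the count is read directly in O(1).
    if hasattr(source, "__len__"):
        return len(source)

    # Path 2: "this method determines the count", i.e. it enumerates every
    # element, incrementing a counter that must still fit in an Int32.
    count = 0
    for _ in source:
        count += 1
        if count > max_count:
            raise OverflowError("count does not fit in an Int32")
    return count


print(linq_count([1, 2, 3]))                 # 3, read directly
print(linq_count(x for x in range(1000)))    # 1000, after 1000 steps
```
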
> - Links are provided to concepts: even the return type (int) 
> has a link.
> - It clearly states what's happening if the range is not defined
> - It clearly states what's happening if the range contains more 
> than int.max elements
>
> On the contrary, the D documentation, introduces a bunch of 
> non-linked concepts, but it tells me that it's possible to 
> perform O(n) evaluations:
> - isInputRange
> - isInfiniteRange
> - hasLength
> - empty
> - popFront

I agree that Phobos needs to do a lot better at introducing 
concepts such as ranges.

> There is no indication what happens if the range is undefined 
> in D docs. In fact, inconsistent behavior:
> - it will return 0 in case of null arrays;

This is logical and doesn't need to be documented.

> - it will throw AccessViolation for null ranges (or probably 
> segfault on Linux);

Again, this is expected, as it is the result of a programmer error. 
Would you document, for every single method, that calling it on a 
null object will segfault? What null.Length does is not documented 
for System.Array either, and it doesn't need to be.


> There is no indication what happens if the range contains more 
> than size_t.max elements:
> - integer overflow;

This is practically impossible on 64-bit, and on 32-bit it's unlikely 
that someone will use walkLength on files that large. It's called 
*walk*Length for a reason.

This is more of an issue in .NET, because the length / count is 
fixed to System.Int32 regardless of the actual amount of 
addressable memory.

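A back-of-the-envelope check of these limits in Python, assuming a walk costs about as much per element as the List<int> run in the log above (about 18 ns; a different machine, language or range type will differ, so treat the results as order-of-magnitude only). At that rate, overflowing an Int32 count takes roughly 39 seconds of walking, a 32-bit size_t roughly 77 seconds, and a 64-bit size_t on the order of 10,500 years.

```python
# Per-element cost of Count() from the last List<int> line of the log above.
NS_PER_ELEMENT = 9.6481144 / 536870912 * 1e9   # roughly 18 ns

INT32_MAX = 2**31 - 1       # System.Int32.MaxValue (Count() result type)
SIZE_T_MAX_32 = 2**32 - 1   # size_t.max on a 32-bit target
SIZE_T_MAX_64 = 2**64 - 1   # size_t.max on a 64-bit target

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def walk_seconds(n):
    """Estimated time to walk n elements at NS_PER_ELEMENT each."""
    return n * NS_PER_ELEMENT / 1e9


for label, limit in [("Int32.MaxValue", INT32_MAX),
                     ("32-bit size_t.max", SIZE_T_MAX_32),
                     ("64-bit size_t.max", SIZE_T_MAX_64)]:
    secs = walk_seconds(limit)
    print(f"{label:>18}: {secs:,.0f} s = {secs / SECONDS_PER_YEAR:,.4f} years")
```
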
> Like someone said: D has genius programmers, but worst 
> marketers.

As opposed to LINQ, which had super good marketing and a very bad 
implementation. I read on the blog of one of the guys who worked on 
Parallel LINQ that there were some cases where you could get almost 
a 4X improvement with the parallel version on a 4-CPU system, and 
then get 100X with a single-threaded C implementation :D

