Am I reading this wrong, or is std.getopt *really* this stupid?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sat Mar 24 12:27:48 UTC 2018


On 03/24/2018 01:55 AM, Chris Katko wrote:
> On Saturday, 24 March 2018 at 03:04:41 UTC, Andrei Alexandrescu wrote:
>> On 3/23/18 7:29 PM, H. S. Teoh wrote:
>>> Well, looking at the implementation of std.getopt turned up the
>>> disturbing fact that the program's argument list is actually scanned
>>> *multiple times*, one for each possible option(!).  Besides the bogonity
>>> that whether or not searchPaths will be set prior to finding -l depends
>>> on the order of arguments passed to getopt(), this also represents an
>>> O(n*m) complexity in scanning program arguments, where n = number of
>>> arguments and m = number of possible options.
>>>
>>> And this is not to mention the fact that getoptImpl is *recursive
>>> template*.  Why, oh why?
>>>
>>> Am I the only one who thinks the current implementation of getopt() is
>>> really stupid??  Can somebody please talk some sense into me, or point
>>> out something really obvious that I'm missing?
>>
>> Affirmative. The implementation is quadratic (including a removal of 
>> the option from the string). This is intentional, i.e. understood and 
>> acknowledged while I was working on it. Given that the function is 
>> only called once per run and with a number of arguments at most in the 
>> dozens, by the time its complexity becomes an issue the function is 
>> long beyond its charter.
>>
>> This isn't the only instance of quadratic algorithms in Phobos. 
>> Quicksort uses an insertion sort - a quadratic algorithm - for 25 
>> elements or fewer. That algorithm may do 600 comparisons in the worst 
>> case, and it's potentially that many for each group of 25 elements in 
>> a large array.
>>
>> Spending time on improving the speed of getopt is unrecommended. Such 
>> work would add no value.
>>
>>
>> Andrei
> 
> Is there a possibility of improving this function?

Most likely it can be improved in terms of features.

>   - While quadratic, for low N, quadratic isn't a big deal. So at what 
> point does quadratic for this function become "a problem"?

At a point where a realistic benchmarks shows a need. Without a 
motivating measurement, making getopt faster would be a waste of time.

I mentioned another function: sort. For that, YES, the are ways of 
improving it. In fact, right after posting my message, I couldn't sleep 
thinking of a number of ways to improve the short sort part. We have 
precise benchmarks measuring the number of comparisons and swaps 
performed by our implementation of sort. Improving its performance lifts 
a lot of boats - many high-level algorithms use sort as an essential 
building block. There's a world of difference in impact of the speed of 
sort vs. speed of getopt.

Here are a few ideas for improving the small array sort part (the last 
mile of sort):

* Currently we switch to short sort when the number of elements to sort 
is smaller than max(32, 1024 / Elem.sizeof). Probably a better choice 
can be found through experimentation.

* Insertion sort does linear search in the already-sorted portion. 
Probably galloping search would fare better.

* Insertion sort starts from the end and grows the sorted portion down. 
Starting from the middle of the array and growing left and right 
simultaneously would slash the number of comparisons and swaps in half.

* There could be other algorithms better for short arrays, such as 
specialized versions of heapsort or smoothsort.

>   - If it is a problem, what's stopping someone from improving it?

Hopefully very little.

> Last question though, is there any kind of list of features, and minor 
> features and fixes that can or need to be done? Perhaps it already 
> exists, but it seems like it'd be great to have a wiki of contribution 
> sites (like this function) that someone could just browse and go "Hey, I 
> know how to do X, maybe I'll take a crack at it." That way, devs who 
> don't have time to improve something "low on the list" could still 
> outsource it in a clear list instead of people who just happen to see it 
> on the forum at the right place right time.

Seb gave a great answer - thanks!


Andrei


More information about the Digitalmars-d mailing list