code review: splitIds from DConf '22 day 3: saving a sort and "getting performance"

Fri Aug 5 13:45:41 UTC 2022

On Thursday, 4 August 2022 at 13:18:40 UTC, kdevel wrote:
> At DConf '22 day 3 Robert Schadek presented at around 07:22:00 
> in the YT video the function `splitIds`. Given an HTML page 
> from bugzilla containing a list of issues `splitIds` aims at 
> extracting all bug-ids referenced within a specific url context:
>
> ```
> long [] splitIds (string page)
> {
>    enum re = ctRegex!(`"show_bug.cgi\?id=[0-9]+"`);
>    auto m = page.matchAll (re);
>
>    return m
>       .filter!(it => it.length > 0)                 // what is this?
>       .map!(it => it.front)                         // whole match, it[0]
>       .map!(it => it.find!(isNumber))               // searches first number
>       .map!(it => it.until!(it => !it.isNumber ())) // last number
>       .filter!(it => !it.empty)                     // again an empty check??? why?
>       .map!(it => it.to!long ())
>       .uniq                                         // .sort is missing. IMHO saving at the wrong things?
>       .array;
> }
> ```
>
> `m` contains all matches. It is a "list of lists" as one would 
> say in Perl. The "inner lists" contains as first element 
> ("`front`") the string which matches the whole pattern. So my 
> first question is:
>
> What is the purpose of the first filter call? Since the element 
> of `m` is a match it cannot have a length of 0.
> [...]

I think the first one is there to avoid calling `front()` on an 
empty range, although given the regex that should never 
happen.
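
To illustrate that point, here is a minimal sketch, assuming current Phobos `std.regex` behaviour, where `matchAll` only yields successful matches, so each element always carries the whole match as its `front`. The helper name `wholeMatches` is made up for this example, and I escaped the dot in the pattern, which the original does not do.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, regex;

/// Hypothetical helper: returns the whole-match string of every hit,
/// without any emptiness guard in front of `front`.
string[] wholeMatches(string page)
{
    auto re = regex(`"show_bug\.cgi\?id=[0-9]+"`);

    // Each `hit` is a Captures range. matchAll only produces it for a
    // successful match, so `hit.front` (the whole match) is always present.
    return page.matchAll(re).map!(hit => hit.front).array;
}

unittest
{
    auto page = `<a href="show_bug.cgi?id=42">a</a> <a href="show_bug.cgi?id=7">b</a>`;
    assert(wholeMatches(page) == [`"show_bug.cgi?id=42"`, `"show_bug.cgi?id=7"`]);

    // No match means an empty outer range, so `front` is never called at all.
    assert(wholeMatches("no links here").length == 0);
}
```

If that holds, the `.filter!(it => it.length > 0)` step is only a defensive no-op: a page without matches gives an empty outer range, not empty inner ones.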

BTW I haven't watched the video, but I suppose this is related to 
the migration of bugzilla to GH issues. I wonder why 
https://bugzilla.readthedocs.io/en/5.0/api/index.html#apis is not 
used instead.