code review: splitIds from DConf '22 day 3: saving a sort and "getting performance"
user1234
user1234 at 12.de
Fri Aug 5 13:45:41 UTC 2022
On Thursday, 4 August 2022 at 13:18:40 UTC, kdevel wrote:
> At DConf '22 day 3 Robert Schadek presented at around 07:22:00
> in the YT video the function `splitIds`. Given an HTML page
> from bugzilla containing a list of issues `splitIds` aims at
> extracting all bug-ids referenced within a specific url context:
>
> ```
> long [] splitIds (string page)
> {
>    enum re = ctRegex!(`"show_bug.cgi\?id=[0-9]+"`);
>    auto m = page.matchAll (re);
>
>    return m
>       .filter!(it => it.length > 0)                 // what is this?
>       .map!(it => it.front)                         // whole match, it[0]
>       .map!(it => it.find!(isNumber))               // searches first number
>       .map!(it => it.until!(it => !it.isNumber ())) // last number
>       .filter!(it => !it.empty)                     // again an empty check??? why?
>       .map!(it => it.to!long ())
>       .uniq                                         // .sort is missing. IMHO saving at the wrong things?
>       .array;
> }
> ```
>
> `m` contains all matches. It is a "list of lists" as one would
> say in Perl. The "inner lists" contain as first element
> ("`front`") the string which matches the whole pattern. So my
> first question is:
>
> What is the purpose of the first filter call? Since the element
> of `m` is a match it cannot have a length of 0.
> [...]
I think the first one is there to prevent calling `front()` on an
empty range, except that according to the regex that should not
happen.
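
A quick way to check that assumption is a small sketch like the one below. It uses a runtime `regex` instead of `ctRegex`, and the helper name `everyMatchHasWholeCapture` and the sample page are made up for illustration; they are not from the talk. It only checks that each element yielded by `matchAll` has a non-zero `length`, which is what would make the first `filter` redundant.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Hypothetical helper: returns true if every element produced by matchAll
// reports at least one capture (the whole match).
bool everyMatchHasWholeCapture(string page)
{
    auto re = regex(`"show_bug.cgi\?id=[0-9]+"`);
    foreach (m; page.matchAll(re))
    {
        // m is a Captures range; m.front (same as m[0]) is the whole match.
        if (m.length == 0)
            return false;
    }
    return true;
}

void main()
{
    // Made-up sample input, not taken from a real bugzilla page.
    string page = `<a href="show_bug.cgi?id=123">a</a> <a href="show_bug.cgi?id=456">b</a>`;
    writeln(everyMatchHasWholeCapture(page));
}
```

If that holds for the real page as well, `it.front` can never be called on an empty range here.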
BTW I haven't watched the video, but I suppose this is related to
the migration of bugzilla to GitHub issues. I wonder why the API at
https://bugzilla.readthedocs.io/en/5.0/api/index.html#apis is not
used instead.