std.regexp vs std.regex [Re: RegExp.find() now crippled]

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue Nov 16 11:14:44 PST 2010


On 11/16/10 10:16 AM, Steve Teale wrote:
> Andrei Alexandrescu Wrote:
>
>>
>> I am sorry for the inadvertent change, it wasn't meant to change
>> semantics of existing code. I'm not sure whether one of my unrelated
>> 64-bit changes messed things up. You may want to file a bug report.
>>
>> There are a number of good reasons for which I was compelled to split
>> std.regex from std.regexp. I'm sure you or others would have found them
>> just as compelling if you saw things the same way.
>>
>> Phobos 1 has experimented in std.string and std.regexp with juxtaposing
>> APIs of various languages (PHP, Ruby, Python). The reasoning was that
>> people familiar with either of those languages could feel right at home
>> by using APIs with similar nomenclatures and semantics. The result was
>> some strange bedfellows in std.string such as "column" or "capwords" and
>> an outright mess in std.regexp. The interface of std.regexp is without a
>> doubt the worst I've ever seen, by a long shot. I have never been able
>> to use it without poring through the documentation _several times_ and
>> without confirming to myself via a small test case that I'm doing the
>> right thing.
>>
>> The simplest problem is this: std.regexp uses the words "exec", "find",
>> "match", "search", and "test" - all to mean regular expression matching.
>> There is absolutely no logic to how meanings are ascribed to words, and
>> there is absolutely no recourse other than rote memorization of various
>> arbitrary decisions.
>>
>> The resulting FrankenAPI is likely unfamiliar to anyone except those
>> who've actually spent time learning it, in spite of it trying to be
>> familiar to everyone.
>>
>> So I spawned std.regex in an attempt to sanitize the API (I made minor,
>> if any, changes to the engine; I am in fact having significant trouble
>> maintaining it). The advantages of std.regex are:
>>
>> * No more class definition. Nobody is supposed to inherit RegExp anyway
>> so it's useless to brand the object as a class.
>>
>> * Engine is separated from matches, which means that engines can be
>> memoized for efficiency. Currently regex() only memoizes the last engine.
>>
>> * The new engine works with any character size.
>>
>> * Simpler API: create a regex, call match() against that regex and a
>> string, look at the resulting RegexMatch object.
>>
>> If this all annoys you more than the old API, I will need to disagree.
>> If you have suggestions on how std.regex can be improved, I'm all ears.
>>
>>
>> Andrei
>
> Andrei,
>
> Maybe it is time that the structure of the standard library became
> more generalized. At the moment we have std... and core...
>
> Perhaps we need another branch in the hierarchy, like ranges... Then
> there could be a std.range module that was the gateway into ranges...
> The library could then expand in an orderly fashion, with a wider
> range of users becoming responsible for the maintenance of
> different branches against changes in the language, not against
> changes in fashion.
>
> Then you could have ranges.regex, that suits you, and the people who
> were happy with the status quo, could continue to use std.regexp,
> which should continue to behave like it did in DMD2.029 or whatever
> it was when I wrote my 'legacy' code.

I think that's not a good design. Ranges are a cross-cutting
abstraction. One wouldn't put all code using exceptions under
std.exceptions or all code using floating point under std.floating_point.
Rather, ranges, exceptions, or floating point should be used wherever it
makes sense to use them.

> The current system, where modules of the library can get arbitrarily
> deprecated and at some point removed because they are unfashionable,
> is very unfriendly.

I agree we need to have a rather long deprecation schedule. Fashion
has, however, little to do with the rationale for deprecation. You may
want to tune in to the Phobos developers' mailing list for more details.

> I recognize that you are young, hyper-intelligent, and motivated
> toward fame.

I have enumerated a list of technical reasons for which std.regexp is 
inadequate, followed by a list of improvements brought about by 
std.regex. Ranges are nowhere on that list, nor is being fashionable. 
It's all good old design stuff that I'm sure you have down better than 
me: make an API small and simple, separate concerns (engine/matches), 
use the right tool for the job (struct not class), generalize within 
reason (character width).
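
As a rough illustration of the small API described above (create a regex,
call match() against it and a string, inspect the result), here is a minimal
sketch. It assumes a recent Phobos release of std.regex; the helper name
firstNumber and the sample strings are made up for illustration and are not
part of the library.

```d
import std.regex : regex, match;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): returns the first run of
// digits in `text`, or an empty string if there is none.
string firstNumber(string text)
{
    // The compiled engine is a plain value, kept apart from any match state.
    auto digits = regex(`[0-9]+`);
    // match() pairs the engine with an input and returns a separate
    // match object that can be inspected.
    auto m = match(text, digits);
    return m.empty ? "" : m.hit;
}

void main()
{
    writeln(firstNumber("order 66 shipped"));

    // The same call shape with a wider character type (wstring).
    auto wm = match("year 2010"w, regex(`[0-9]+`w));
    assert(!wm.empty && wm.hit == "2010"w);
}
```

The engine and the match object are separate values here, so the same engine
could be reused against many inputs.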

It would have been great to have a discussion along those lines. Instead, I
see you chose to ignore all technical arguments and go with a
presupposition, no matter how presumptuous and stereotypical.

> But there are other users, like me, who are older, but not senile,
> and have more conservative attitudes, including the desire to use
> code they wrote in the past at some point in the future.

Backward compatibility is indeed important, and again we need to have a
long deprecation schedule. At the same time, I think there are many
more users in D's future than in its past, and I cannot inflict
std.regexp on them.


Andrei

