Investigation: downsides of being generic and correct
Jonathan M Davis
jmdavisProg at gmx.com
Fri May 17 02:41:21 PDT 2013
On Friday, May 17, 2013 11:15:24 Dicebot wrote:
> On Thursday, 16 May 2013 at 19:15:57 UTC, Jonathan M Davis wrote:
> > 1. In general, if you want to operate on ASCII, and you want
> > your code to be
> > fast, use immutable(ubyte)[], not immutable(char)[]. Obviously,
> > that's not
> > going to work in this case, because the function is in
> > std.string, but maybe
> > that's a reason for some std.string functions to have ubyte
> > overloads which
> > are ASCII-specific.
>
> I was thinking exactly about that. The only thing I want advice
> on is whether it is better to add those overloads to std.string
> or to a separate module, from the point of self-documentation.
I'm not sure. My first inclination would be to simply put them as overloads in
the same module, but that probably merits some discussion. And while I think
that having ubyte overloads for strings for ASCII is something that we should
at least explore, it probably merits some discussion as well, as we haven't
really done a lot with handling ASCII outside of std.ascii at this point
(which currently only operates on characters, not strings). My first
inclination is to handle ASCII where necessary by accepting arrays of ubytes,
but others here may have other ideas about that (which may or may not be
better).
As a side note, we might want to consider having a function
called assumeASCII which casts from string to immutable(ubyte)[] (similar to
assumeUnique). I think that that might have been suggested before, but even if
it has, we've never actually added it.
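To make the idea concrete, here is a rough sketch of what such a function could look like. The name assumeASCII comes from the suggestion above, but the body is only an assumption on my part and nothing like it exists in Phobos. It leans on std.string.representation, which already does the no-copy reinterpretation, and adds a debug-time check of the caller's promise:

```d
import std.string : representation;

/// Hypothetical helper, not part of Phobos: views a string as ASCII bytes
/// without copying. The caller promises that the data really is ASCII.
immutable(ubyte)[] assumeASCII(string s) pure nothrow @nogc @safe
{
    // In non-release builds, verify the promise: every code unit is below 0x80.
    foreach (char c; s)
        assert(c < 0x80, "assumeASCII: non-ASCII code unit");
    // representation() reinterprets the string as immutable(ubyte)[].
    return s.representation;
}

unittest
{
    auto bytes = assumeASCII("hello");
    assert(bytes.length == 5);
    assert(bytes[0] == 'h');
}
```

Like assumeUnique, this would be a statement of intent more than a conversion: the check disappears in release builds, so the cost is just the reinterpretation.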
> > 2. We actually discussed removing all of the pattern stuff
> > completely and
> > replacing it with regexes.
>
> Is it kind of pre-approved? I am willing to add this to my TODO
> list together with needed benchmarks, but had some doubts that
> std.string depending on std.regex will be tolerated.
AFAIK, there would be no problem with doing so. Maybe Dmitry would have
something to say about it, since he's the regex guru, but IIRC, the last time
it was discussed, it was pretty clear that we wanted those functions to be
using std.regex instead of patterns. So, if you did the work and did it at the
appropriate quality level, I expect that it would be merged in. And we might
or might not deprecate the pattern functions at that point (that was
originally my intention and is why I never fixed their names, but we're not
deprecating much now, so I don't know if we'll want to in this case).
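For illustration only, this is roughly how a pattern-style operation might be expressed with std.regex. The function name removeDigits is hypothetical, and the sketch uses replaceAll as it exists in current Phobos, so it says nothing about how the real replacements would be designed:

```d
import std.regex : regex, replaceAll;

/// Hypothetical example: strips ASCII digits from a string.
string removeDigits(string s)
{
    // The character class [0-9] plays the role that a pattern string
    // such as "0-9" plays in the pattern-based functions.
    return replaceAll(s, regex("[0-9]"), "");
}

unittest
{
    assert(removeDigits("a1b22c") == "abc");
    assert(removeDigits("no digits") == "no digits");
}
```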
> I understand that. What I tried to bring attention to is how big
> difference it may be for someone who just picks random functions
> and writes some simple code. It is very tempting to just say
> "Phobos (D) sucks" and not get into details. In other words I
> consider it more of an informational/marketing issue than a
> technical one.
We need to do more to optimize Phobos, but given our stance of correctness by
default, we're kind of stuck with string functions taking a performance hit in
a number of common cases simply due to the necessary decoding of code points.
We can do better at making them fast, and reduce problems like this, but
ultimately, if you want fast ASCII-only operations, you almost certainly need
to operate on something like ubyte[] rather than string, and that requires
educating people. It's one of the costs of trying to be both correct and
performant.
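A small sketch of the difference, assuming current Phobos behavior: range algorithms see a string as a range of dchar, while the same data viewed through std.string.representation is a range of plain bytes. The two helper names below are hypothetical and exist only for the comparison:

```d
import std.algorithm.searching : count;
import std.range.primitives : ElementType;
import std.string : representation;

// Range algorithms treat a string as a range of decoded code points...
static assert(is(ElementType!string == dchar));
// ...but treat an array of ubytes as plain bytes.
static assert(is(ElementType!(immutable(ubyte)[]) == immutable(ubyte)));

/// Hypothetical helper: counts spaces over the string's dchar elements.
size_t countSpacesDecoding(string s)
{
    return s.count(' ');
}

/// Hypothetical helper: counts spaces byte by byte, with no decoding.
/// Only meaningful when the needle is an ASCII character.
size_t countSpacesBytes(string s)
{
    return s.representation.count(cast(ubyte) ' ');
}

unittest
{
    assert(countSpacesDecoding("a b c") == 2);
    assert(countSpacesBytes("a b c") == 2);
}
```

Both give the same answer for ASCII input; the point is only that the second one never has to look at code points, which is the trade-off people need to be told about.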
- Jonathan M Davis
More information about the Digitalmars-d
mailing list