"foreach(i, dchar c; s)" vs "decode"

Sun Nov 25 23:43:29 PST 2012

On Sunday, 25 November 2012 at 21:51:42 UTC, Jonathan M Davis 
wrote:
> On Sunday, November 25, 2012 22:37:24 monarch_dodra wrote:
>> I got these results on 2.061 alpha release, with phobos in
>> release and both -inline and without inline.
>
> You should also be testing with -O if you're benchmarking, but I still would
> have thought that the compiler would be faster. Apparently not. I believe that
> definite work has been put into improving the decode, stride, popFront, etc. in
> Phobos over the past year or two, so they've definitely been improving. I
> suspect that whatever the compiler is doing hasn't been touched in ages, and I
> have no idea what improvements could or couldn't be done. It _is_ the sort of
> thing that I'd kind of expect to be sitting somewhere in druntime though. If
> it is, maybe foreach and Phobos' implementations can be made to share in some
> way. I don't know (though IMHO speed should be more important here than
> reducing code duplication).
>
> The speed of foreach's decoding definitely matters, but in the code that I've
> really been trying to make fast, I don't generally use it, because it's often
> the case that some portion of what I'm doing can be made faster by skipping
> decoding for some portion of the characters (like explicitly handling the code
> units for paraSep and lineSep in code that cares about the end of lines).
> Making string processing fast should definitely be one of our performance
> priorities though IMHO given how big an impact that can have on many programs
> and how unfriendly ranges generally are to efficient string processing.
>
> - Jonathan M Davis
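
The idea of skipping decoding for some characters can be made concrete with the line separators mentioned in the quote. This is a minimal sketch, not Phobos code; `countLineBreaks` and `countLineBreaksDecoding` are hypothetical names. It relies only on the fact that U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) encode in UTF-8 as E2 80 A8 and E2 80 A9, and that 0xE2 can only ever appear as a lead byte, so scanning raw code units one at a time never produces a misaligned match.

```d
/// Hypothetical helper: counts '\n', U+2028 and U+2029 by looking at UTF-8
/// code units directly instead of decoding each character to a dchar.
size_t countLineBreaks(const(char)[] s)
{
    size_t count = 0;
    for (size_t i = 0; i < s.length; ++i)
    {
        if (s[i] == '\n')
        {
            ++count;
        }
        // U+2028 is E2 80 A8 and U+2029 is E2 80 A9 in UTF-8. Because 0xE2
        // is never a continuation byte, a match here is always aligned.
        else if (s[i] == 0xE2 && i + 2 < s.length
                 && s[i + 1] == 0x80
                 && (s[i + 2] == 0xA8 || s[i + 2] == 0xA9))
        {
            ++count;
            i += 2; // skip the two continuation bytes
        }
    }
    return count;
}

/// Reference version for comparison: lets foreach decode every character.
size_t countLineBreaksDecoding(const(char)[] s)
{
    size_t count = 0;
    foreach (dchar c; s)
        if (c == '\n' || c == '\u2028' || c == '\u2029')
            ++count;
    return count;
}

unittest
{
    // 'é' (C3 A9) and '€' (E2 82 AC) share bytes with the separators
    // but must not be counted.
    string text = "one\ntwo\u2028three\u2029four \u00e9\u20ac";
    assert(countLineBreaks(text) == 3);
    assert(countLineBreaks(text) == countLineBreaksDecoding(text));
}
```

Everything that is not one of the three separators is stepped over byte by byte, which is where the savings over full decoding would come from.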

Well, "-release -O" went without saying, but you're right to 
mention it; you never know.

Comparing 2.060 to 2.061, std.utf has changed a lot. I'll benchmark 
my algorithm against the old 2.060 implementation to see whether 
the change in performance is related to that.
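
A benchmark of the two decoding paths this thread compares could run the same work once through foreach's built-in decoding and once through std.utf.decode. This is a minimal sketch assuming a current compiler and Phobos (std.datetime.stopwatch postdates 2.061); `sumForeach`, `sumDecode`, `timeBoth` and `benchSink` are made-up names, and summing code points is just cheap work whose result is stored so the loops are not discarded. It should be built with -release -O (and optionally -inline).

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.utf : decode;

/// Hypothetical sink: results are stored here so the loops have an effect.
ulong benchSink;

/// Hypothetical workload: sum code points, letting foreach do the decoding.
ulong sumForeach(string s)
{
    ulong total = 0;
    foreach (dchar c; s)
        total += c;
    return total;
}

/// Same workload with explicit decoding; decode advances `i` past each sequence.
ulong sumDecode(string s)
{
    ulong total = 0;
    size_t i = 0;
    while (i < s.length)
        total += decode(s, i);
    return total;
}

/// Times `iterations` runs of each version; returns [foreach, decode].
Duration[2] timeBoth(string s, uint iterations)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. iterations)
        benchSink += sumForeach(s);
    Duration tForeach = sw.peek();

    sw.reset(); // keeps running, restarts from zero
    foreach (_; 0 .. iterations)
        benchSink += sumDecode(s);
    Duration tDecode = sw.peek();

    return [tForeach, tDecode];
}

unittest
{
    string text = "h\u00e9llo \u20ac \U0001F600";
    // Both paths must agree before their timings mean anything.
    assert(sumForeach(text) == sumDecode(text));
    auto times = timeBoth(text, 1000);
}
```

Swapping in the 2.060 std.utf (for example, copied into a local module) would then only change the import used by `sumDecode`.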

As you said, there is an "rt.util.utf" module in druntime; I had 
been looking in the dmd tree. However, it is pretty much a verbatim 
copy of an old version of std.utf...

Also, druntime has a *radically* different approach to striding 
UTF-8. I'll try to see which approach is faster.
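
"Striding" here means working out how many code units the UTF-8 sequence starting at a given lead byte occupies, which can be read off the lead byte's high bits. The sketch below contrasts two common ways of doing it, a chain of range comparisons and a 256-entry table built at compile time. It is an illustration with hypothetical names (`strideBranchy`, `strideTable`, `strideLookup`), not the actual code from druntime or std.utf, and it deliberately does not reject the invalid lead bytes 0xC0, 0xC1 and 0xF5-0xF7.

```d
/// Returns the length of the UTF-8 sequence starting with `lead`,
/// or 0 if `lead` cannot start a sequence (continuation byte or 0xF8-0xFF).
/// Overlong leads 0xC0/0xC1 and out-of-range leads 0xF5-0xF7 are not
/// rejected here; a real decoder would also have to validate those.
uint strideBranchy(char lead)
{
    if (lead < 0x80) return 1; // 0xxxxxxx: ASCII
    if (lead < 0xC0) return 0; // 10xxxxxx: continuation byte
    if (lead < 0xE0) return 2; // 110xxxxx
    if (lead < 0xF0) return 3; // 1110xxxx
    if (lead < 0xF8) return 4; // 11110xxx
    return 0;                  // 11111xxx: never valid in UTF-8
}

/// The same answers precomputed at compile time (CTFE) for all 256 bytes,
/// trading 256 bytes of memory for a single indexed load per stride.
immutable ubyte[256] strideTable = () {
    ubyte[256] t;
    foreach (b; 0 .. 256)
        t[b] = cast(ubyte) strideBranchy(cast(char) b);
    return t;
}();

uint strideLookup(char lead)
{
    return strideTable[lead];
}

unittest
{
    // Both strategies must agree on every possible byte.
    foreach (b; 0 .. 256)
        assert(strideBranchy(cast(char) b) == strideLookup(cast(char) b));
    assert(strideLookup("a"[0]) == 1);
    assert(strideLookup("\u00e9"[0]) == 2);
    assert(strideLookup("\u20ac"[0]) == 3);
    assert(strideLookup("\U0001F600"[0]) == 4);
}
```

The branchy version is cheap on mostly-ASCII text, where the first comparison almost always succeeds; the table avoids unpredictable branches on mixed text but adds some cache footprint. Which one wins likely depends on the input, so measuring both seems the only reliable way to decide.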

I'd have suggested some sort of code sharing, but now that 
"std.utf" supports ranges, the code has "forked", and I'm not sure 
it is shareable anymore... Not without duplicating code inside 
std.utf, or adding range support (or at least code for decoding 
ranges) to druntime.

Well, I'll see what I can uncover, and update dmd utf in the 
meantime...