Should this work?

Manu turkeyman at gmail.com
Thu Jan 9 22:18:20 PST 2014


On 10 January 2014 12:48, H. S. Teoh <hsteoh at quickfur.ath.cx> wrote:

> On Fri, Jan 10, 2014 at 11:33:35AM +1000, Manu wrote:
> > On 10 January 2014 06:27, H. S. Teoh <hsteoh at quickfur.ath.cx> wrote:
> >
> > > On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
> > > > On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
> [...]
> > > > >I also find the names of the generic algorithms are often
> > > > >unrelated to the name of the string operation.  My feeling is,
> > > > >everyone is always on about how cool D is at string, but other
> > > > >than 'char[]', and the builtin slice operator, I feel really
> > > > >unproductive whenever I do any heavy string manipulation in D.
> > >
> > > Really?? I find myself much more productive, because I only have to
> > > learn one set of generic algorithms, and I can use them not just for
> > > strings but for all sorts of other stuff that implement the range
> > > API.
> > >
> >
> > That sounds good in theory, but if any time you try and actually use
> > D's generic algorithms you end up with many of the kind of errors you
> > refer to in your prior paragraph, then that basically undermines the
> > whole experience.
>
> Really? I only encounter those kinds of errors once in a while. They
> *are* extremely annoying when they happen, but on the whole, they're
> relatively rare. You must be doing something wrong if you're seeing them
> all the time.
>

I think not really knowing quite what you need to do in advance elevates
the probability of doing something wrong ;)
The quality of these range error messages needs to be improved somehow if
basic string operations are supposed to use them comfortably.


> > I don't like wasting my time, and I don't like pushing my way through
> > learning something that I feel is obtuse to begin with, so I usually
> > take a side path and work around it (most things can be done easily
> > with a couple of nested foreach-es). So, perhaps embarrassingly,
> > despite my 3+ years spent hanging around here, part of the problem is
> > that I barely know/use phobos. Call me lazy, but I don't think it's an
> > unrealistic experience for any end-user. If it saves me time/headache
> > (and bloat) not using it, why would I?
> >
> > ** Yes, it's the 'standard' library, and I like that concept in
> > essence, and feel like I should make use of it on principle... but
> > it's like, you need to already know phobos intimately to think it's
> > awesome, which creates a weird barrier to entry. And the docs don't
> > help a lot.
>
> I think you're tainted by your experience with C. :-) Using Phobos
> effectively requires that you take the time to understand and use
> ranges; or, as somebody else said, stick with std.string. But if that
> doesn't do what you need, then you need to ... er, understand and use
> ranges. :-P  Expecting to use things the same way as in C is probably
> the root cause for your frustrations.
>

I don't agree that something like ranges shouldn't be more or less
intuitive. C doesn't have ranges, so I don't think I'm really transposing C
baggage when considering how to debug my mistakes in range based code in
this case.
Like most things, once you know your way around it, it's fine, but are there
opportunities (mostly in trivial things like better naming
conventions/standards and improved error messages) to make it a whole lot
more intuitive?


> > > Whereas in languages like C, sure you get familiar with
> > > string-specific functions, but then when you need a
> > > similar-operating function for an array of ints, you have to name it
> > > something else, and then basically the same algorithm reimplemented
> > > for linked lists, called by yet another name, etc.. Added together,
> > > it's many times more mental load than just learning a single set of
> > > generic algorithms that work on (almost) everything.
> > >
> > > The composability of generic algorithms also allows me to think on a
> > > more abstract level -- instead of thinking about manipulating
> > > individual chars, I can figure out OK, if I split the string by ","
> > > then I can filter for the strings I'm looking for, then join them
> > > back again with another delimiter. Since the same set of algorithms
> > > work with other ranges too, I can apply exactly the same thought
> > > process for working with arrays, linked lists, and other containers,
> > > without having to remember 5 different names of essentially the same
> > > algorithm but applied to 5 different types.
> > >
> >
> > See, I get that idea about composability. Maybe it's just baggage from
> > C, but I just don't think that way. Maybe that's a large part of why I
> > always go wrong with phobos.
>
> Yes, the baggage is slowing you down. Cast it overboard and lighten the
> boat, man. ;-)
>
>
> > I would never think of doing something fundamental like string
> > processing with a sequence of generic algorithms. I'd freak out about
> > the relatively unknown performance characteristics.
>
> I think your caution is misplaced. Things like std.algorithm.find are
> actually quite efficient -- don't be misled by the verbose layers of
> template abstractions surrounding the code; for the common cases, it
> translates to a simple loop. And recently, certain cases even translate
> straight to C's strchr / memchr, and so are on par with C.
>

Surely it can't do that if the operation requires any composition? How do
you specialise a composed sequence of operations?
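
For concreteness, a minimal sketch of the kind of composed chain in question
(illustrative only: `keepNonEmpty` is a hypothetical name, not something
from Phobos, and the input is an assumption). Each stage is a separate
generic algorithm; the chain is lazy, so nothing is evaluated until a
consumer walks the result. Whether any individual stage takes a
string-specialised fast path is not something this sketch demonstrates.

```d
import std.algorithm : equal, filter, joiner, splitter;

// Hypothetical helper: drop the empty fields of a comma-separated
// string and re-join the remaining fields with ';'.
// Returns a lazy range of characters, not a newly allocated string.
auto keepNonEmpty(string csv)
{
    return csv
        .splitter(",")               // yields slices of csv, one per field
        .filter!(f => f.length > 0)  // skips empty fields
        .joiner(";");                // flattens the fields, ';' in between
}

void main()
{
    // equal walks the lazy result once, element by element.
    assert(keepNonEmpty("a,,b,c").equal("a;b;c"));
}
```

The three calls are specialised (or not) one at a time; nothing in the
language treats the whole chain as a single compound operation.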

> > Algorithms are usually a lot simpler when performed on strings of
> > bytes than when performed on strings of objects with any
> > imaginable copying mechanisms and allocation patterns.
>
> Phobos also has lots of template specializations that take advantage of
> strings and arrays.
>

Again, I'm talking WRT composition specifically here.


> > Unless I wrote something myself, I can never have faith that the sort
> > of concessions required to make it generic also make it fast in the
> > case it happens to be performed on a byte array.
>
> Well, if you're going to insist on NIH syndrome, then you might as well
> write your own standard library instead of fighting with Phobos. :)
>
>
> > There's an argument that you can specialise for string types, which is
> > true within single functions, but if you're 'composing' a function
> > with generic parts, then you can't specialise for strings anymore...
> > There's no way to specialise a call to a.b.c() as a compound
> > operation.
>
> And how exactly does the C compiler specialize strchr(strcat(a,b),c) as
> a single compound operation?
>

That's equally a composed statement. It's the same as the concern I raise.
I was referring to cases where D requires a composed statement as opposed to
cases where other languages may have some explicit function that does a
single complex thing.

And I'm not talking about specifics, I was illustrating the nature of my
psychological baggage :) .. I have an unreasonable distrust towards
requiring composed statements to do very simple things.
It's not a specific criticism, it's a comment.


> If you want a single-pass compound operation on a string, you'd have to
> write it out manually in C... and in D, you could write it out manually
> too, just use a for loop over the string -- same effort, same
> performance. Or you could save yourself the trouble and compose two
> algorithms from std.algorithm, the result of which is *also* single-pass
> (because ranges are lazy). Sure you can object that there's overhead
> introduced by using ranges, but since .front translates to just *ptr and
> .popFront translates to just ++ptr, the only overhead is just a few
> function calls if the compiler doesn't inline them. Which, for functions
> that small, it probably does.
>

Surely it can't be *ptr and ++ptr as you say, otherwise none of it would be
unicode safe...?
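
A small sketch of what this looks like for a `string` in Phobos, as far as I
understand it: `front` decodes a whole code point (its type is `dchar`) and
`popFront` skips the full encoded sequence, so on narrow strings the two
are not a plain `*ptr` and `++ptr`. `firstCharUnits` is a hypothetical
helper written for this example only.

```d
import std.range : front, popFront;

// Hypothetical helper: number of UTF-8 code units that a single
// popFront consumes from the start of s (s must be non-empty).
size_t firstCharUnits(string s)
{
    auto rest = s;
    rest.popFront();
    return s.length - rest.length;
}

void main()
{
    string s = "\u00E9a";  // U+00E9 takes two UTF-8 code units, 'a' takes one
    assert(s.length == 3); // length counts code units, not characters

    // front yields a decoded code point rather than a single byte.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == '\u00E9');

    assert(firstCharUnits(s) == 2);     // popFront skipped both bytes
    assert(firstCharUnits("abc") == 1); // ASCII: one byte per step
}
```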


> > Like I say, it's probably psychological baggage, but I tend to
> > unconsciously dismiss/reject that sort of thing without a second
> > thought...  or maybe experience learned me my lesson (*cough* STL).
>
> OK, let's get one thing straight here. Comparing Phobos to STL is truly
> unfair. I spent almost 2 decades writing C++, and wrote code both using
> STL and without (from when STL didn't exist yet), and IME, Phobos's
> range algorithms are *orders* of magnitude better than STL in terms of
> usability. At least. In STL, you have to always manage pointer pairs,
> which become a massive pain when you need to pass multiple pairs around
> (very error-prone, transpose one argument, and you have a nice segfault
> or memory corruption bug).  Then you have stupid verbose syntax like:
>
>         // You can't even write the for-loop conditions in a single
>         // line!
>         for (std::vector<MyType<Blah> >::iterator it =
>                 myContainer.begin();
>                 it != myContainer.end();
>                 it++)
>         {
>                 // What's with this (*smartPtr)->x nonsense everywhere?
>                 doSomething((*(*it)->impl)->myDataField);
>
>                 // What, I can't even write a simple X != Y if-condition
>                 // in a single line?! Not to mention the silly
>                 // redundancy of having to write out the entire chain of
>                 // dereferences to exactly the same object twice.
>                 if (find((*(*it)->impl)->mySubContainer, key) ==
>                         (*(*it)->impl)->mySubContainer.end())
>                 {
>                         // How I long for D's .init!
>                         std::vector<MyType<Blah> >::iterator empty;
>                         return empty;
>                 }
>         }
>
> Whereas in D:
>
>         foreach (item; myContainer) {
>                 doSomething(item.impl.myDataField);
>                 if (!item.impl.mySubContainer.canFind(key))
>                         return ElementType!MyContainer.init;
>         }
>
> There's no comparison, I tell you. No comparison at all.
>

Yes, I'm aware that it's syntactically superior, but the quality of the
error messages isn't much better than STL.
I also find things easier to find and/or more logically named (probably
biased from past exposure, I know) in the STL than in phobos.


> > > > I actually feel a lot more productive in D than in C++ with
> > > > strings.  Boost's string algorithms library helps fill the gap
> > > > (and at least you only have one place to look for documentation
> > > > when you are using it) but overall I prefer my experience working
> > > > in D with pseudo-member chains.
> > >
> > > I found that what I got out of taking the time to learn
> > > std.algorithm and std.range was worth far more than the effort
> > > invested.
> > >
> >
> > Perhaps you're right. But I think there's ***HUGE*** room for
> > improvement.  The key in your sentence is, it shouldn't require
> > 'effort'; if it's not intuitive to programmers with decades of
> > experience, then there are probably some fundamental design (or
> > documentation/accessibility) deficiencies that need to be
> > prioritised. How is any junior programmer meant to take to D?
>
> No offense, but IME, junior programmers tend to pick up these things
> much faster than experienced programmers with lots of baggage from other
> languages, precisely because they don't have all that baggage to slow
> them down. Old habits die hard, as they say.
>

Maybe you're right, but I can't imagine many juniors that would be capable
of tracking down what went wrong when they inevitably make a mistake and
are met with weird errors relating to ranges and template constraints and
all that good stuff... Maybe they'd be doing it differently in the first
place though? Who knows.


> That's not to say that the D docs don't need improvement, of course. But
> given all your objections about Phobos algorithms despite having barely
> *used* Phobos, I think the source of your difficulty lies more in the
> baggage than in the documentation. :)
>

I already said that myself. But I'd like to think the experience could be
smoother, more helpful, and more intuitive. I don't think you can say it's
perfect, or even particularly 'good'. It's acceptable, it does seem to
work, but it's not an easy learning curve, and it's hard to take in small
steps, or to absorb via osmosis.
Every time I try and repeat something that 'I kinda remember seeing a few
months ago' and 'it was kinda like this...', it takes me AGES to get right.
Always finicky little details that take the most time, and I often find the
phobos source code more helpful than the docs, which isn't a good sign.

That's my general point. I think there's a lot of room for case study, and
improvement.