Should this work?

Thu Jan 9 18:48:44 PST 2014

On Fri, Jan 10, 2014 at 11:33:35AM +1000, Manu wrote:
> On 10 January 2014 06:27, H. S. Teoh <hsteoh at quickfur.ath.cx> wrote:
> 
> > On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
> > > On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
> > > >I also find the names of the generic algorithms are often
> > > >unrelated to the name of the string operation.  My feeling is,
> > > >everyone is always on about how cool D is at string, but other
> > > >than 'char[]', and the builtin slice operator, I feel really
> > > >unproductive whenever I do any heavy string manipulation in D.
> >
> > Really?? I find myself much more productive, because I only have to
> > learn one set of generic algorithms, and I can use them not just for
> > strings but for all sorts of other stuff that implement the range
> > API.
> >
> 
> That sounds good in theory, but if any time you try and actually use
> D's generic algorithms you end up with many of the kind of errors you
> refer to in your prior paragraph, then that basically undermines the
> whole experience.

Really? I only encounter those kinds of errors once in a while. They
*are* extremely annoying when they happen, but on the whole, they're
relatively rare. You must be doing something wrong if you're seeing them
all the time.

> I don't like wasting my time, and I don't like pushing my way through
> learning something that I feel is obtuse to begin with, so I usually
> take a side path and work around it (most things can be done easily
> with a couple of nested foreach-es). So, perhaps embarrassingly,
> despite my 3+ years spent hanging around here, part of the problem is
> that I barely know/use phobos. Call me lazy, but I don't think it's an
> unrealistic experience for any end-user. If it saves me time/headache
> (and bloat) not using it, why would I?
>
> ** Yes, it's the 'standard' library, and I like that concept in
> essence, and feel like I should make use of it on principle... but
> it's like, you need to already know phobos intimately to think it's
> awesome, which creates a weird barrier to entry. And the docs don't
> help a lot.

I think you're tainted by your experience with C. :-) Using Phobos
effectively requires that you take the time to understand and use
ranges; or, as somebody else said, stick with std.string. But if that
doesn't do what you need, then you need to ... er, understand and use
ranges. :-P  Expecting to use things the same way as in C is probably
the root cause for your frustrations.

> > Whereas in languages like C, sure you get familiar with
> > string-specific functions, but then when you need a
> > similar-operating function for an array of ints, you have to name it
> > something else, and then basically the same algorithm reimplemented
> > for linked lists, called by yet another name, etc.. Added together,
> > it's many times more mental load than just learning a single set of
> > generic algorithms that work on (almost) everything.
> >
> > The composability of generic algorithms also allows me to think on a
> > more abstract level -- instead of thinking about manipulating
> > individual chars, I can figure out OK, if I split the string by ","
> > then I can filter for the strings I'm looking for, then join them
> > back again with another delimiter. Since the same set of algorithms
> > work with other ranges too, I can apply exactly the same thought
> > process for working with arrays, linked lists, and other containers,
> > without having to remember 5 different names of essentially the same
> > algorithm but applied to 5 different types.
> >
> 
> See, I get that idea about composability. Maybe it's just baggage from
> C, but I just don't think that way. Maybe that's a large part of why I
> always go wrong with phobos.

Yes, the baggage is slowing you down. Cast it overboard and lighten the
boat, man. ;-)
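
To make the split / filter / join idea quoted above concrete, here is a
rough sketch. The helper names pickFields and evens are made up for
illustration and are not part of Phobos; the point is only that the same
filter works on a string pipeline and on an int[] alike.

```d
import std.algorithm : filter, splitter, startsWith;
import std.array : array, join;

// Hypothetical helper: keep the comma-separated fields that start with
// `prefix`, then join the survivors back together with ";".
string pickFields(string csv, string prefix)
{
    return csv.splitter(",")
              .filter!(field => field.startsWith(prefix))
              .join(";");
}

// The very same filter, applied to a plain int[] instead of a string.
int[] evens(int[] xs)
{
    return xs.filter!(x => x % 2 == 0).array;
}

void main()
{
    assert(pickFields("apple,banana,avocado,cherry", "a") == "apple;avocado");
    assert(evens([1, 2, 3, 4]) == [2, 4]);
}
```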

> I would never think of doing something fundamental like string
> processing with a sequence of generic algorithms. I'd freak out about
> the relatively unknown performance characteristics.

I think your caution is misplaced. Things like std.algorithm.find are
actually quite efficient -- don't be misled by the verbose layers of
template abstractions surrounding the code; for the common cases, it
translates to a simple loop. And recently, certain cases even translate
straight to C's strchr / memchr, and so are on par with C.
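
As a small illustration (a sketch only; which internal code path Phobos
picks for each call is not something this example shows), the same find
serves strings and int arrays, and returns the remainder of the input
starting at the first match:

```d
import std.algorithm : find;

void main()
{
    // Searching a string for a single character or for a substring.
    assert("hello, world".find(',') == ", world");
    assert("hello, world".find("wor") == "world");

    // Exactly the same call on an array of ints.
    assert([3, 1, 4, 1, 5].find(4) == [4, 1, 5]);

    // No match leaves an empty result.
    assert("abc".find('z').length == 0);
}
```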

> Algorithms are usually a lot simpler when performed on strings of
> bytes than when performed on strings of objects with any
> imaginable copying mechanisms and allocation patterns.

Phobos also has lots of template specializations that take advantage of
strings and arrays.
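
The general mechanism looks roughly like the sketch below. Note that
countOf is a hypothetical function written for this example, not
something taken from Phobos; it only shows how one generic entry point
can pick an array-specific branch at compile time with static if.

```d
import std.range;

// Hypothetical helper, NOT a Phobos function: count occurrences of
// `needle`, with a compile-time fast path for byte arrays.
size_t countOf(R, E)(R haystack, E needle)
    if (isInputRange!R)
{
    size_t n;
    static if (is(R : const(ubyte)[]))
    {
        // Byte-array branch: plain indexing, no range primitives.
        foreach (i; 0 .. haystack.length)
            if (haystack[i] == needle)
                ++n;
    }
    else
    {
        // Generic branch: works for any input range.
        for (; !haystack.empty; haystack.popFront())
            if (haystack.front == needle)
                ++n;
    }
    return n;
}

void main()
{
    ubyte[] bytes = [1, 0, 2, 0];
    assert(countOf(bytes, 0) == 2);     // byte-array branch
    assert(countOf(iota(5), 3) == 1);   // generic branch
}
```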

> Unless I wrote something myself, I can never have faith that the sort
> of concessions required to make it generic also make it fast in the
> case it happens to be performed on a byte array.

Well, if you're going to insist on NIH syndrome, then you might as well
write your own standard library instead of fighting with Phobos. :)

> There's an argument that you can specialise for string types, which is
> true within single functions, but if you're 'composing' a function
> with generic parts, then you can't specialise for strings anymore...
> There's no way to specialise a call to a.b.c() as a compound
> operation.

And how exactly does the C compiler specialize strchr(strcat(a,b),c) as
a single compound operation?

If you want a single-pass compound operation on a string, you'd have to
write it out manually in C... and in D, you could write it out manually
too, just use a for loop over the string -- same effort, same
performance. Or you could save yourself the trouble and compose two
algorithms from std.algorithm, the result of which is *also* single-pass
(because ranges are lazy). Sure, you can object that there's overhead
introduced by using ranges, but since .front translates to just *ptr and
.popFront translates to just ++ptr, the only overhead is just a few
function calls if the compiler doesn't inline them. Which, for functions
that small, it probably does.
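
Here is a sketch of both routes side by side. The function names are
invented for the example, and one caveat is worth stating: when a string
is used as a range, its elements come out as decoded dchars, so the
per-element cost there is not literally a pointer dereference. The last
part uses a counter to show that nothing runs until the composed range
is actually consumed.

```d
import std.algorithm : equal, filter, map;
import std.ascii : isAlpha, toUpper;

// Hand-written single pass: keep ASCII letters and upper-case them.
string lettersUpperLoop(string s)
{
    string result;
    foreach (char c; s)
        if (isAlpha(c))
            result ~= toUpper(c);
    return result;
}

// The same thing composed from two generic algorithms. Nothing is
// computed here; the returned range does the work as it is iterated.
auto lettersUpperLazy(string s)
{
    return s.filter!(c => isAlpha(c)).map!(c => toUpper(c));
}

void main()
{
    assert(lettersUpperLoop("a1b2 c!") == "ABC");
    assert(equal(lettersUpperLazy("a1b2 c!"), "ABC"));

    // Laziness check: building the pipeline calls the lambda zero times.
    int calls;
    auto doubled = [1, 2, 3].map!((int x) { ++calls; return x * 2; });
    assert(calls == 0);
    assert(equal(doubled, [2, 4, 6]));
    assert(calls > 0);
}
```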

> Like I say, it's probably psychological baggage, but I tend to
> unconsciously dismiss/reject that sort of thing without a second
> thought...  or maybe experience learned me my lesson (*cough* STL).

OK, let's get one thing straight here. Comparing Phobos to STL is truly
unfair. I spent almost 2 decades writing C++, and wrote code both using
STL and without (from when STL didn't exist yet), and IME, Phobos's
range algorithms are *orders* of magnitude better than STL in terms of
usability. At least. In STL, you have to always manage pointer pairs,
which become a massive pain when you need to pass multiple pairs around
(very error-prone: transpose one argument and you have a nice segfault
or memory corruption bug).  Then you have stupid verbose syntax like:

	// You can't even write the for-loop conditions in a single
	// line!
	for (std::vector<MyType<Blah> >::iterator it =
		myContainer.begin();
		it != myContainer.end();
		it++)
	{
		// What's with this (*smartPtr)->x nonsense everywhere?
		doSomething((*(*it)->impl)->myDataField);

		// What, I can't even write a simple X != Y if-condition
		// in a single line?! Not to mention the silly
		// redundancy of having to write out the entire chain of
		// dereferences to exactly the same object three times.
		if (std::find((*(*it)->impl)->mySubContainer.begin(),
			(*(*it)->impl)->mySubContainer.end(), key) ==
			(*(*it)->impl)->mySubContainer.end())
		{
			// How I long for D's .init!
			std::vector<MyType<Blah> >::iterator empty;
			return empty;
		}
	}

Whereas in D:

	foreach (item; myContainer) {
		doSomething(item.impl.myDataField);
		if (!item.impl.mySubContainer.canFind(key))
			return ElementType!MyContainer.init;
	}

There's no comparison, I tell you. No comparison at all.

> > > I actually feel a lot more productive in D than in C++ with
> > > strings.  Boost's string algorithms library helps fill the gap
> > > (and at least you only have one place to look for documentation
> > > when you are using it) but overall I prefer my experience working
> > > in D with pseudo-member chains.
> >
> > I found that what I got out of taking the time to learn
> > std.algorithm and std.range was worth far more than the effort
> > invested.
> >
> 
> Perhaps you're right. But I think there's ***HUGE*** room for
> improvement.  The key in your sentence is, it shouldn't require
> 'effort'; if it's not intuitive to programmers with decades of
> experience, then there are probably some fundamental design (or
> documentation/accessibility) deficiencies that need to be
> prioritised. How is any junior programmer meant to take to D?

No offense, but IME, junior programmers tend to pick up these things
much faster than experienced programmers with lots of baggage from other
languages, precisely because they don't have all that baggage to slow
them down. Old habits die hard, as they say.

That's not to say that the D docs don't need improvement, of course. But
given all your objections about Phobos algorithms despite having barely
*used* Phobos, I think the source of your difficulty lies more in the
baggage than in the documentation. :)

T

-- 
Give me some fresh salted fish, please.