The new std.process is ready for review

Mon Feb 25 12:06:18 PST 2013

On Monday, 25 February 2013 at 15:07:47 UTC, Vladimir Panteleev 
wrote:
> On Monday, 25 February 2013 at 06:41:32 UTC, Lars T. 
> Kyllingstad wrote:
>> Sure, I can think of another example.  But I wouldn't read too 
>> much into this one; it was never meant as a demonstration of 
>> the "correct" way to open a web page.   It was just a simple 
>> example of spawnProcess() usage that uses a cross-platform 
>> application everyone's heard of.
>>
>> After all, you *could* argue this way about almost any kind of 
>> application which wasn't just invented for the sake of the 
>> example.  (In the last one, shouldn't we open the user's 
>> preferred word processor, etc?)
>
> The question is, what is the intent? Is it to just open some 
> URL, or to specifically start Firefox? The same applies to the 
> word processor case - if the document is in a file format 
> understood by several applications, is the intent to simply 
> open the document, or to open the document in that specific 
> application?
>
> Now, the documentation clearly says that the example 
> specifically launches Firefox. However, that doesn't mean that 
> someone won't reach out for that example when hastily putting 
> together an application that needs to open an URL. After all, 
> it's at the top of the file, and they may not even know about 
> the existence of the browse function which actually does what 
> they intend.
>
> How about using "lynx -dump http://dlang.org/"? Dumping a text 
> representation of a webpage is a feature specific to lynx, so 
> the intent is clearer.

That is also incredibly obscure.  I'd venture a guess that only 
~10% of D's user base have even heard of Lynx.  Everyone knows 
firefox, and will understand what the example is supposed to 
illustrate.  (I admit that the ls/grep examples will also be 
rather incomprehensible to someone not familiar with the *NIX 
command line, and I will replace them with something else.  The D 
toolchain, as you suggest below, is a very good idea.)

BTW, browse() should never have been added to std.process, in my 
opinion.  Maybe to some other utility module, but then it should 
at least be done right, and be properly documented.  What does it 
actually do?  There is no way to tell unless you read the source. 
  (And then, it turns out that it spawns a new process for the 
browser and returns immediately, but it does not return a process 
ID that you can poll or wait for.  Bah.)

>>> 2. (Nitpick) The grep example uses a POSIX quoting syntax 
>>> (single quotes). Would be better to use double quotes, or 
>>> pass as array to avoid one more needless OS-specific element.
>>
>> Actually, the quotes can just be removed altogether.
>
> OK, and now it's worse: your example uses syntax that's 
> specific to std.process2. If you type that command in the 
> shell, you'll get different behavior (the backslash will escape 
> the . as a shell escape, not a RE escape).

[I am going to let slip here that you almost have me convinced 
with many of your arguments below, but I am still going to play 
devil's advocate for a bit.]

It is not worse.  It is a lot simpler, because the programmer 
does not need to know anything about the underlying platform.  
They only need to know one rule:  If your arguments contain 
spaces, use the array functions.  I don't think the generic 
process-spawning functions in std.process should be bound by, or 
tied to, the syntax of whatever shell the programmer (or the end 
user) prefers.

[...]

>> Personally, I don't think they should be part of the public 
>> API.  They are inherently platform-specific, and we've tried 
>> to keep the module as platform-agnostic as possible.
>
> Constructing scripts is bound to be platform-specific. The 
> current module version allows constructing batch files on POSIX.
>
> Here's a practical use case example for this feature: DMD uses 
> the same syntax for response files on all platforms, and it 
> follows the Windows command-line parsing rules. Currently, rdmd 
> uses escapeWindowsArgument to build that response file on all 
> platforms.

Point taken.

>> Besides, they are not really usable with any of the other 
>> functions, and I am afraid it will be interpreted that way if 
>> we make them public.
>
> This is actually a design problem in the new module, which I 
> haven't discussed yet. Have a look at the very last example in 
> the current std.process docs. How do you accomplish that 
> correctly in the new version, without manually piping the 
> inputs yourself? You can't.

I grudgingly admit that this is true.

>> [...]
>>
>> The way it is now, the rules (if you can call it that) are 
>> exceedingly simple, and they are the same on all platforms.  
>> This has the added benefit of discouraging platform-dependent 
>> client code.
>
> OK, then picture the following situation.
>
> A user of the new module starts using the module, and invokes a 
> specific command using the spawnProcess overload that takes it 
> as a single string. Convenient, right? Then, as the program 
> evolves, the string becomes an enum, then a config variable, 
> which the user can adjust.
>
> Then, a end-user tries setting the config variable to a path 
> that contains spaces, and everything breaks. Wrapping the path 
> in quotes does not help either. Due to the way the function is 
> designed, it is IMPOSSIBLE for the end-user to configure the 
> application to launch a program located at a path containing 
> spaces. To end-users, this comes off as a classical problem in 
> badly written applications that don't handle command-line 
> escaping properly.

Exposing the specifics of whatever programming language you are 
using to the end user?  I would just call that bad application 
programming.  If anything, you should be using one of the 'shell' 
functions in this case, not spawnProcess.

> This problem is as with any case of an interface which works in 
> simple cases, but behaves unexpectedly in more complicated 
> cases: it is bad design (convenience or not), and must be 
> avoided.
>
> I suggest that either the overloads which take a single string 
> be removed, or that they spawn a shell instead, and let the 
> shell do the command-line splitting. Together with my command 
> and filename escaping functions, they should allow the user to 
> achieve any combination of executing commands with arbitrary 
> punctuation in the program path or arguments, as well as 
> redirecting the output to files (again, with correctly-escaped 
> filenames) or other programs using the existing shell syntax 
> present on both platforms.

You almost have me convinced that the single-string non-shell 
functions must go.  In the case of pipeProcess() and execute(), 
pipeShell() and shell() do the same job, with (arguably, still) 
less surprises.  Maybe it would then be a good idea to add a 
spawnShell() function to go with spawnProcess().

The escape*() functions need to be better documented.  Am I 
correct that they quote according to 'cmd.exe' rules on Windows 
and 'sh' rules on POSIX?

Lars