I hate new DUB config format

Sönke Ludwig via Digitalmars-d digitalmars-d at puremagic.com
Mon Nov 30 07:12:14 PST 2015


On 27.11.2015 at 16:23, Walter Bright wrote:
> On 11/26/2015 11:08 PM, Sönke Ludwig wrote:
>>> This looks like it's creeping towards inventing a new script programming
>>> language. Adding loops, switch statements, functions, etc., can't be far
>>> off. Before you get too far down this path, consider:
>>
>> Actually, no! Conditionals and loops are the only constructs - switch is a
>> possibility, but basically nothing else. There will also never be variables,
>> just constants. There is a definitive limit, namely when it becomes impossible
>> to reason about the code in a generic way, without "executing" it, so in
>> particular anything that would make it Turing complete is a no-go - no
>> recursion, no loop flow control statements, no goto. In fact, there are no
>> "statements" at all, these are all purely declarative "directives".
>
> I would say to that: "famous last words". As Exhibit A, I submit 'static
> if', which has been getting increasing pressure to augment with loops.

It's hard to make guarantees, true. But at least "static foreach" has
always been a relatively obvious candidate, and at the same time there
is a well-defined limit in the case of the package recipe format.

>>> 1. JSON has a superset programming language - Javascript - which has
>>> conventional syntax rather than the DEP4 proposal for odd syntax like
>>>
>>>      if dub-version="<0.9.24"
>>>
>>> which I would strongly recommend against. And, we already have a
>>> Javascript engine written in D:
>>>
>>>      https://github.com/DigitalMars/DMDScript
>>>
>>> 2. D itself can be used as a scripting language (when # is the first
>>> character in the source code). Having DUB use this might be quite
>>> interesting.
>>
>> On one hand that means that now you have to take care of security issues
>> (holes in the scripting engine/compiler or DoS attacks of various sorts)
>> when you want to use this on a server (code.dlang.org).
>
> You have to deal with that even if just plain json or sdl. After all,
> the implementation of those formats could be susceptible to buffer
> overflow or DoS as well. But this is less likely with json, because
> you'd be using a well-used json parser rather than your own sdl parser
> that is only used for Dub.

The important difference is that a JSON/SDL parser has a vastly lower 
complexity than a scripting engine and, more importantly, the source 
file is just parsed in a linear fashion, without any arbitrary runtime 
execution. So when just parsing the format, making sure that the file is 
below a certain maximum size is enough to prevent typical DoS vectors.
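
As a rough illustration of that size check (a sketch only, not DUB's or
code.dlang.org's actual code; `parse_recipe` and the 64 KiB limit are
made-up names and numbers, and Python is used purely for brevity):

```python
import json

MAX_RECIPE_BYTES = 64 * 1024  # hypothetical limit, chosen only for this example


def parse_recipe(raw: bytes, max_bytes: int = MAX_RECIPE_BYTES) -> dict:
    """Parse a JSON recipe, rejecting oversized input before any parsing happens."""
    if len(raw) > max_bytes:
        raise ValueError(f"recipe is {len(raw)} bytes, limit is {max_bytes}")
    try:
        # Malformed input raises json.JSONDecodeError, a ValueError subclass.
        recipe = json.loads(raw.decode("utf-8"))
    except RecursionError as exc:
        # Pathologically nested input is reported as an ordinary error.
        raise ValueError("recipe is nested too deeply") from exc
    if not isinstance(recipe, dict):
        raise ValueError("recipe must be a JSON object")
    return recipe
```

Because the parser only walks the input once, the work done is bounded by
the input size, which is why the single length check goes a long way here.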

For scripts, you'd at least have to be able to terminate after a certain 
time (but even with a relatively low timeout, say 5 seconds, it would be 
easy to bring the system down temporarily, by e.g. publishing a bunch of 
package versions at once). And if things like file system or network 
access are possible, the execution would realistically have to be moved 
to a sandbox (VM/chroot) environment to be safe.
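
A minimal sketch of the timeout part, with a Python child process standing
in for whatever scripting engine would be used (`run_recipe_script` and
`worst_case_seconds` are hypothetical helpers, not part of DUB):

```python
import math
import subprocess
import sys


def run_recipe_script(source: str, timeout_s: float) -> bool:
    """Run a script in a child process; return False if it hit the timeout.

    Note: this only bounds run time. It does NOT restrict file system or
    network access, which is why a real sandbox would still be needed.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", source],
            timeout=timeout_s,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return True


def worst_case_seconds(versions: int, timeout_s: float, workers: int = 1) -> float:
    """Upper bound on busy time if every published version runs into the timeout."""
    return math.ceil(versions / workers) * timeout_s
```

With a 5 second timeout and a single worker, 100 package versions published
at once could keep the server busy for `worst_case_seconds(100, 5.0)`, i.e.
500 seconds.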

> (Yes I saw later that you use it in some
> other projects, but does it see use outside of your own things?)

The current version of the sdlang-d package has been downloaded 83 times 
(DUB not counted) and there are GitHub issues opened by about 13 
different people, so it's definitely used for other projects, even if 
not yet hugely popular.

> Javascript can only interact with its environment using the DOM. If Dub
> presented its own DOM to the js engine, there isn't much the js code can
> do other than go into an infinite loop, recursive overflow, or exploit a
> buffer overflow.

This is where I'd see a similar problem to the "static foreach" one 
above. I'm pretty sure that people would start to ask for functions to 
access the file system, or to run arbitrary commands (which is fine on a 
local developer machine). It will be hard to argue against adding 
features that are so straightforward to implement.

>> Once there are big numbers of packages, this could also mean that the
>> hardware eventually needs to be upgraded when it would have done fine for a
>> long time with a tiny declarative parser.
>
> I would think these problems have all been solved with Javascript, since
> it is used so extensively. Javascript is also a lightweight scripting
> language.

If the script is just a linear setup of the same fields as the current 
JSON/SDLang recipe then yes. But it's hard to predict what people will 
do with it. They might well go crazy and generate source code or other 
things that could take quite some time. It's just speculation, but the 
risk is there that this might considerably increase the load in the long 
run.

>> On the other hand, it's not possible with a script to make general predictions
>> of how a package would behave, for example the script could select a dependency
>> based on some environment variable or a file that is only defined on the target
>> system.
>
> That goes back to restricting the DOM.

True, but the pressure to add more power to the DOM will most likely be 
high.

>> Finally, it's always possible to switch from declarative to script without
>> losing expressive power, but not necessarily the other way around.
>
> True, but consider this. JSON is a subset of Javascript. That means you
> could add a subset of Javascript to JSON, i.e. just the if statements.
> You'll have a clear design for it, and a clear path for how to do
> further enhancements.

The fundamental difference is that JSON just describes a single value, 
while a JS file describes a program. So while a subset of JS would be an 
option, it would still mean a completely different appearance for the 
package recipe files. And of course this really is inventing a new 
language ("why doesn't ... work if this is JS?").

>>> "With a standard json parser in Phobos, zip zap boom you're done. You
>>> don't have to design it, argue about it, build it, document it, debug
>>> it, test it, optimize it, explain it, deal with bug requests, deal with
>>> enhancement requests, deal with legacy compatibility, build a converter,
>>> build a gui tool for it, etc."
>>
>> Let's say this isn't really an argument anymore now that it has already been
>> done,
>
> The existence of the DEPs suggest otherwise,

The SDLang format is only affected as a side effect of two of those DEPs,
just like the JSON format is. Of course all supported formats have to be
maintained and extended over time, but those occasions are quite rare and
the vast majority of the work is agnostic to the file format.

> the number of posts in this
> thread suggest otherwise,

There are multiple reasons for the number of posts in this thread, so I'd
argue that it's questionable to draw conclusions from it. Also, you need
to contrast this with the number of posts that complained about JSON, or
those that would have appeared for a different format choice.

> the calls for a gui editor suggest otherwise,

That has nothing to do with SDLang or not (at least as far as I 
understand it).

> the customer "should I use json or sdl" makes for an ongoing support
> problem,

I can't remember that ever happening. The current situation is that
SDLang is endorsed as the recommended format and those who are used to 
the JSON one can just continue to use it if they want.

> no current means to convert between the two, etc.

That's really trivial to add, though. There is a generic internal 
representation and the only thing missing is the conversion back to SDLang.
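
To illustrate the idea (a toy sketch, not DUB's implementation: a plain
dict stands in for the internal representation, `recipe_to_sdl` is a
made-up helper, only string fields and dependencies with plain version
strings are handled, and it assumes that JSON-style string escaping is
acceptable inside SDLang string literals):

```python
import json


def quote(text: str) -> str:
    """Double-quote a string (assumption: JSON escaping is valid in SDLang)."""
    return json.dumps(text, ensure_ascii=False)


def recipe_to_sdl(recipe: dict) -> str:
    """Serialize a tiny subset of a recipe to SDLang-style text."""
    lines = []
    for key, value in recipe.items():
        if key == "dependencies":
            # JSON form: {"dependencies": {"<name>": "<version spec>"}}
            for dep, version in value.items():
                lines.append(f"dependency {quote(dep)} version={quote(version)}")
        elif isinstance(value, str):
            lines.append(f"{key} {quote(value)}")
        else:
            raise NotImplementedError(f"field {key!r} is not covered by this sketch")
    return "\n".join(lines) + "\n"


def recipe_from_json(text: str) -> dict:
    """Load the JSON form into the same in-memory representation."""
    return json.loads(text)
```

Converting a JSON recipe is then just
`recipe_to_sdl(recipe_from_json(text))`; the reverse direction would need
an SDLang parser on the input side.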

>> but it wouldn't have been a strong argument anyway, because the SDLang
>> parser is actually in use for other projects as well, so it has to be
>> maintained anyway. There really is very little investment necessary
>> development-wise, I think it took me maybe three to four hours total to
>> implement it, including the support on code.dlang.org. Creating the sdlang-d
>> library itself (by Nick Sabalausky) was of course a bigger task, as were the
>> discussions and the design process.
>
> The time for JSON was zero. You're a key developer here, and your time
> is very valuable. I can't tell you what to work on, but I can't be quiet
> about spending time on things with such marginal utility (and yes, I
> waste time, too). By using sdl, though, you're also spending other
> peoples' time, even if it's just "which format should I use for my
> project?" and then the D forum members have to advise them.

Again, I haven't seen that question so far, if I remember right. But this
also leaves out the reason why SDLang support was added in the first
place: to improve the experience of working with package recipes. A lot
more people are going to do that a lot more frequently, so even a small
reduction in friction is likely to save time overall.

And sometimes small things can have a great impact. (The original) D is
a good example: a lot of its appeal came from seemingly trivial syntax
changes, but those often make a big difference in readability and
developer focus.

Of course that computation may not hold if we just compare the time that 
it saves/costs the D contributors alone. But I wonder how many new 
features in general will actually save overall time if you just look at 
the core contributors.

>> But apart from that, finding a format that a) allows (real) comments and b) has
>> less syntax noise was necessary in any case. Sure, JSON *works*, but it becomes
>> really unpleasant with more complicated files, and the whole {"comment": "..."}
>> approach is nothing but an ugly and highly inconvenient hack, both when writing
>> and when reading it.
>
> I'm not accepting the "ugly and highly inconvenient hack" argument in
> the light of the DEP4 proposal for conditional syntax that I already
> commented on. And, as mentioned before, I use $(COMMENT ...) in Ddoc and
> it works out quite nicely, even though Ddoc has no syntax for comments.

True, DEP4 definitely pushes the boundary of what is naturally
representable with SDLang. But JSON files generally already have such a
convoluted appearance that, starting from a certain size, it simply
becomes painfully involved and error-prone to maintain them. Since
comments are mainly useful for larger documents, adding them in the form
of "comment" fields would make those documents even harder to read and
maintain.

If we are talking about how DEP4 looks for SDLang, just imagine how it 
would look for JSON...

>
> And if comments were the only reason to use sdl, and a solid case was
> made for them vs my suggestion, I'd vastly prefer adding /**/ to the
> json support rather than switching to an apparently dead format.

It just has to be clear that what we'd be using then is not JSON anymore
(interoperability). Of course comments are not the only reason, but I
think it's safe to say that they are one of the two most important ones.
The other one is that the XML-like structure of SDLang lends itself much
better to the task (unfortunately XML is even more involved to
read/write than JSON).
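
The interoperability point can be checked with any strict JSON parser. A
small sketch using Python's standard `json` module (the recipe snippets
and the `is_strict_json` helper are invented for the example):

```python
import json

# The "comment" field hack: valid JSON, but the comment becomes part of the data.
WITH_COMMENT_FIELD = '{"comment": "pinned until 0.8 is out", "name": "demo"}'

# A real /* */ comment: easier to read, but the file is no longer JSON.
WITH_BLOCK_COMMENT = '{ /* pinned until 0.8 is out */ "name": "demo" }'


def is_strict_json(text: str) -> bool:
    """Return True if a standard JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

The first snippet parses, with the comment carried along as an ordinary
key; the second is rejected, so every tool that reads the recipe would
need the same extended parser.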

>> And the fact is that no matter which other format we would
>> have chosen (JSON with comments is also another language) we'd have these
>> bikeshedding discussions.
>
> Sticking with json would enable you to simply ignore it. But you've been
> pretty much forced to engage in this one.

Maybe it would have, or maybe certain things would still have required a
reaction. I don't have absolute numbers, but the complaints against JSON
so far can probably easily rival those against SDL.


The way I see it:
  - It's clear that no solution will make everybody happy
  - The number of opponents of each format has been shown to be of the
same order of magnitude
  - The number of proponents is always hard to judge, because most of 
them usually stay quiet
  - Talking about purely declarative formats, popularity is hardly a 
strong argument anyway, because most people will still have to learn a 
new format (outside of JSON or XML)
  - SDLang is so simple and intuitive to C-family developers that there 
is almost nothing to learn

Based on this I'd rather concentrate on how well a format is suited for
the particular task. Ideally that will result in a good format gaining
some popularity (as seems to be the case with TOML and Rust).

