dub: JSON, SDL, YAML, TOML, what color should we paint it ?

Witold witold.baryluk+dlang at gmail.com
Wed Mar 8 00:11:30 UTC 2023


On Tuesday, 28 February 2023 at 14:29:28 UTC, Mathias LANG wrote:
> Obviously such a change would not happen overnight, and would 
> need broad support from the community. Opinions ?

Pure JSON. And only it.

Reason? It is rather simple, and super compatible. Easy to parse 
from anywhere: JavaScript, Python, Ruby, PHP, C++, C, syntax 
highlighters on websites, web frameworks, editors, formatters, 
command line tools (jq), etc. Do not assume dub files will only 
be consumed by dub.
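
To illustrate consuming a dub file from outside dub, here is a minimal Python sketch using only the standard library. The recipe contents (package name, dependency versions) are made up for the example, and `list_dependencies` is a hypothetical helper, not part of any dub tooling.

```python
import json

# Made-up dub.json content, inlined so the example is self-contained.
RECIPE = """
{
    "name": "myapp",
    "dependencies": {
        "vibe-d": "~>0.9.5",
        "mir-algorithm": "~>3.20.0"
    }
}
"""

def list_dependencies(recipe_text):
    """Return sorted (package, version spec) pairs from a dub.json-style recipe."""
    recipe = json.loads(recipe_text)
    return sorted(recipe.get("dependencies", {}).items())

for name, spec in list_dependencies(RECIPE):
    print(name, spec)
```

The same few lines would look much alike in Ruby, PHP or JavaScript, which is the point.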

YAML is horrible in my opinion. I use it a lot, and from a 
distance it looks nicer than JSON (comments, less verbosity, less 
quoting, etc), but it is not good in the long run. 1) YAML parsers 
are super complex. 2) backreferences are complicated. 3) multiple 
ways of doing the same thing (strings, arrays, dicts), so you cannot 
easily read and write it back programmatically without likely 
messing up diffs, 4) too many damn ways to write strings, 5) no 
quoting on values, which causes issues when a value accidentally is 
integer/float-like or boolean-like (including the word `no`). I hate 
this, and I just quote absolutely everything because of this, 
which defeats a big part of YAML. 6) multi-document feature is an 
anti-feature. 7) slow. 8) slow. 9) There are a lot of 
implementations, and in reality they all differ a bit in minor 
details. 10) slow. 11) Some extra features are just broken (like 
date parsing).

I use YAML a lot, in Ansible, Prometheus, Kubernetes, Github, 
Docker, and a few other frequently used projects. I never use it for 
personally built projects, because I do not like its complexity.

The only things that would be nice to have in JSON: comments and 
trailing commas. JSON5 you say. I say no. Why? Compatibility. One 
can live without trailing commas. Comments can be emulated using 
object keys, e.g. ones starting with an underscore, which are 
ignored during processing. Multi-document can be easily done by 
just having JSON after JSON in one file (most parsers will just 
parse one at a time, and allow you to parse the next object). Not 
that dub needs this feature anyway.
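
Both tricks can be sketched in a few lines of Python with the standard `json` module. The function names and the sample data are hypothetical, picked only for this example; nothing here is something dub does today.

```python
import json

def strip_comment_keys(value):
    """Recursively drop object keys that start with an underscore."""
    if isinstance(value, dict):
        return {k: strip_comment_keys(v) for k, v in value.items()
                if not k.startswith("_")}
    if isinstance(value, list):
        return [strip_comment_keys(v) for v in value]
    return value

def parse_concatenated(text):
    """Parse several JSON documents written one after another."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return docs

# Made-up input: two documents, each with an underscore "comment" key.
TEXT = """
{"_comment": "main package", "name": "app", "targetType": "executable"}
{"name": "lib", "_why": "split out for reuse"}
"""

docs = [strip_comment_keys(d) for d in parse_concatenated(TEXT)]
print(docs)
```

Tools that do not know about the underscore convention still parse the file fine; they just see a few extra keys.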

I do not like JSON either, but I like YAML, JSON5 and TOML 
even less. SDL is too XML-like (with attributes), and does not map 
nicely to processing in most programming languages (i.e. it is 
not just a dict / aa), and often requires awkward XML-like / 
DOM-like parsing, which is also more complex than it needs to be.

I would say YAML is okish, if the files are not too big, and you 
edit them literally every day. But the ones in dub, you edit only 
a very few times. So its human friendliness isn't really a good 
selling point to me.

But the YAML spec is so big, and has (or had) so many bugs and 
issues, that I consider it a horrible language.

Every few weeks I have some YAML issues, be it in Python, Go, or 
Ansible. We even had a few production outages caused by YAML's 
idiotic parsing rules.

Simple, fast, and universal, is better than complex, slow and 
niche.

> But JSON is a terrible format to write configurations in, given 
> how verbose it is, and it lacking support for comments.

I do not agree with this statement.

You want comments. Just add them as underscore-prefixed keys. Or 
use `//`, which is rather easy to strip before passing to other 
tools.
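
A minimal sketch of such a `//` stripper in Python, assuming comments only ever use the `//` form. `strip_line_comments` is a hypothetical helper, not an existing tool. The only subtle part is leaving `//` alone when it appears inside a string, e.g. in a URL.

```python
import json

def strip_line_comments(text):
    """Remove // comments that appear outside of JSON strings."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character as-is (covers \" and \\).
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

# Made-up commented config, for illustration only.
COMMENTED = """
{
    // name shown on the package registry
    "name": "app",
    "homepage": "https://example.org"  // the slashes in the URL stay
}
"""

config = json.loads(strip_line_comments(COMMENTED))
print(config)
```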

If you want to comment some part of the config temporarily, then 
just remove it. Most people use version control. It will be in 
their history.

I would not consider JSON really a configuration language. It is 
more of a storage and data transfer format. Configuration 
languages are different things; there are a few decent ones out 
there, like jsonnet, HashiCorp's HCL, Dhall, and a few more. 
Interoperability is not an issue for them, as: 1) there are 
actually few implementations, 2) they are not used directly by 
any system, rather they are passed through processing, and a simple 
(flat and dumb) format is used as output (usually JSON) to be 
consumed by programs. I like proper configuration languages like 
jsonnet, because otherwise you end up in some horrible templating 
like jinja in Ansible, or the craziness of Helm Charts and K8s 
Kustomize, which are all just horrible hacks with poor usability. 
But for dub, using a proper configuration language would be 
overkill in my opinion.
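
The "process first, emit flat JSON" workflow can be sketched even without a dedicated configuration language. Below, plain Python stands in for something like jsonnet; `build_config` and the generated field values are assumptions made for the example, not a proposal for dub.

```python
import json

def build_config(name, platforms):
    """Expand a compact description into a flat, explicit structure."""
    return {
        "name": name,
        "configurations": [
            {"name": f"{name}-{p}", "platforms": [p]} for p in platforms
        ],
    }

# The consuming program only ever sees this flat, dumb JSON output.
generated = json.dumps(build_config("app", ["linux", "windows"]),
                       indent=2, sort_keys=True)
print(generated)
```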

Personally I would use text-encoded protocol buffers. Schema 
based validation and typing out of the box. I use protocol 
buffers and text-encoded ones for configs in most of my personal 
projects (Go, D, C++, Python), but it does come with some other 
tradeoffs (protocol buffer definition files, an extra compilation 
step, which is easy to automate, but a barrier for some).

But adding anything new is just not a good idea long term. You 
will need to support all the formats for years.

I think better documentation and tooling are way more important 
than the format of the config. Node.js and TypeScript / JavaScript 
people use huge JSON files as config for their build systems, and 
it all just works fine.


Cheers.


More information about the Digitalmars-d mailing list