std.jgrandson

Sun Aug 3 02:38:05 PDT 2014

A few thoughts based on my experience with vibe.data.json:

1. No decoding of strings appears to mean that "Value" also always 
contains encoded strings. This seems to be a leaky and also error-prone 
abstraction. For the token stream, performance should be top 
priority, so it's okay to not decode there, but "Value" is a high level 
abstraction of a JSON value, so it should really hide all implementation 
details of the storage format.
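
For comparison, the existing std.json already hides the encoding at the value level: escape sequences in the source text are decoded by the time a string is read back. The sketch below uses only std.json (parseJSON and JSONValue.str), not the API under review, and the helper name decodedText is made up for illustration.

```d
import std.json : parseJSON;

// Hypothetical helper: parse a JSON string literal and return its text
// the way std.json exposes it to the caller.
string decodedText(string jsonSource)
{
    return parseJSON(jsonSource).str;
}

unittest
{
    // The backtick literal keeps \n and \u00E9 as raw JSON escapes.
    // What comes back is decoded: a real newline and a real 'é'.
    assert(decodedText(`"line1\nline2 \u00E9"`) == "line1\nline2 \u00E9");
}
```

A token stream could still hand out the raw slice for speed; only the high-level value would pay for decoding.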

2. Algebraic is a good choice for its generic handling of operations on 
the contained types (which isn't exposed here, though). However, a 
tagged union type in my experience has quite a few advantages for 
usability. Since adding a type tag possibly affects the interface in a 
non-backwards compatible way, this should be evaluated early on.

2.b) I'm currently working on a generic tagged union type that also 
enables operations between values in a natural generic way. This has the 
big advantage of not having to manually define operators like in 
"Value", which is error prone and often limited (I've had to make many 
fixes and additions in this part of the code over time).
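
To make the usability argument concrete, here is a minimal hand-written tagged union. It is only a sketch: JsonValue, Kind and describe are hypothetical names, it covers just a few scalar kinds, and it is neither the generic type mentioned in 2.b) nor the "Value" type under review. The point it illustrates is that an explicit tag allows an exhaustive final switch.

```d
// Hypothetical minimal tagged union; all names are illustrative only.
struct JsonValue
{
    enum Kind { null_, boolean, number, text }

    private Kind kind_ = Kind.null_;

    // Storage shared by all kinds; kind_ says which member is valid.
    union
    {
        bool boolean_;
        double number_;
        string text_;
    }

    this(bool b)   { kind_ = Kind.boolean; boolean_ = b; }
    this(double d) { kind_ = Kind.number;  number_ = d; }
    this(string s) { kind_ = Kind.text;    text_ = s; }

    @property Kind kind() const { return kind_; }
}

// final switch must list every Kind, so adding a new kind later turns
// every unhandled switch into a compile error.
string describe(JsonValue v)
{
    final switch (v.kind)
    {
        case JsonValue.Kind.null_:   return "null";
        case JsonValue.Kind.boolean: return "boolean";
        case JsonValue.Kind.number:  return "number";
        case JsonValue.Kind.text:    return "string";
    }
}

unittest
{
    assert(describe(JsonValue(true)) == "boolean");
    assert(describe(JsonValue(1.5)) == "number");
    assert(describe(JsonValue("x")) == "string");
    assert(describe(JsonValue.init) == "null");
}
```

Because the tag becomes part of the public interface, retrofitting it later would change how callers inspect a value, which is why the decision is best made early.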

3. Use of "opDispatch" for an open set of members has been criticized 
for vibe.data.json before and I agree with that criticism. The only 
advantage is saving a few keystrokes (json.key instead of json["key"]), 
but I came to the conclusion that the right approach to work with JSON 
values in D is to always directly deserialize when/if possible anyway, 
which mostly makes this a moot point.

This approach has a lot of advantages, e.g. reduction of allocations, 
performance of field access and avoiding typos when accessing fields. 
Especially the last point is interesting, because opDispatch based field 
access gives the false impression that a static field is accessed.

The decision to minimize the number of static fields within "Value" 
reduces the chance of accidentally accessing a static field instead of 
hitting opDispatch, but there are still *some* static fields/methods and 
any later addition of a symbol must now be considered a breaking change.
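
The typo problem can be shown with a small stand-in. DynamicObject and User below are hypothetical types, not part of vibe.data.json or the module under review, and the typed struct is simply constructed by hand where real code would deserialize into it.

```d
// Hypothetical wrapper that turns any unknown member name into a key
// lookup, in the opDispatch style discussed above.
struct DynamicObject
{
    string[string] fields;

    string opDispatch(string name)()
    {
        if (auto p = name in fields)
            return *p;
        return null; // unknown key: no compile-time diagnostic possible
    }
}

// Typed counterpart, standing in for the result of deserialization.
struct User
{
    string name;
}

unittest
{
    auto dyn = DynamicObject(["name": "Ada"]);
    assert(dyn.name == "Ada");
    // The misspelled key compiles and only misbehaves at run time.
    assert(dyn.nmae is null);

    auto user = User("Ada");
    assert(user.name == "Ada");
    // The same typo on a real field is rejected by the compiler.
    static assert(!__traits(compiles, user.nmae));
}
```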

3.b) Bad interaction of UFCS and opDispatch: Functions like "remove" and 
"assume" certainly look like they could be used with UFCS, but 
opDispatch destroys that possibility.
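
A sketch of the interaction, using a hypothetical CatchAll type and a free function that happens to be named remove (neither is taken from the module under review): member lookup through opDispatch succeeds first, so the UFCS rewrite is never attempted.

```d
// Hypothetical type with a catch-all opDispatch.
struct CatchAll
{
    // Accepts any member name with any arguments.
    string opDispatch(string name, Args...)(Args args)
    {
        return "opDispatch!\"" ~ name ~ "\"";
    }
}

// Free function meant to be callable as value.remove("key") via UFCS.
string remove(ref CatchAll value, string key)
{
    return "free function remove";
}

unittest
{
    CatchAll value;

    // Method-call syntax is captured by opDispatch, not by UFCS.
    assert(value.remove("key") == "opDispatch!\"remove\"");

    // Only the plain function-call syntax reaches the free function.
    assert(remove(value, "key") == "free function remove");
}
```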

4. I know the stance on this is often "The D module system has enough 
facilities to disambiguate" (which is not really a valid argument, but 
rather just the lack of a counter argument, IMO), but I highly dislike 
the choice to leave off any mention of "JSON" or "Json" in the global 
symbol names. Using the module either requires a renamed import or a 
manual alias every time, or the resulting source code will always leave 
the reader wondering what kind of data is actually handled. Handling 
multiple "value" types in a single piece of code, which is not uncommon 
(e.g. JSON + BSON/ini value/...) would always require explicit 
disambiguation. I'd certainly include the "JSON" or "Json" part in the 
names.
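
For reference, this is what the two workarounds look like in practice. The sketch uses std.json purely as a stand-in, since its symbols already carry "JSON" in their names; the alias name JsonValue and the function readAnswer are assumptions for illustration.

```d
// Renamed import: every symbol of the module must be qualified.
import json = std.json;

// Manual alias: a local, self-describing name for the value type.
alias JsonValue = json.JSONValue;

// Hypothetical helper showing both forms in use.
long readAnswer(string source)
{
    JsonValue v = json.parseJSON(source);
    return v["answer"].integer;
}

unittest
{
    assert(readAnswer(`{"answer": 42}`) == 42);
}
```

With a module that exports a bare "Value", one of these two forms would be needed in every file that also handles another value type.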

5. Whatever happens, *please* let's aim for a module name of 
std.data.json (similar to std.digest.*), so that any data formats added 
later are nicely organized. The existing data format support (XML + CSV) 
doesn't follow contemporary Phobos style, so it will need to be 
deprecated at some point anyway, clearing the way for a clean and 
non-breaking transition to a more organized module hierarchy.

6. (Possibly compile-time optional) support for keeping track of 
line/column numbers is often important for better error messages, so 
that would be good to have included as part of the parser and in the 
"Token" type.

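One possible shape for point 6, as a sketch only: Location, Token and advance are hypothetical names, and the real "Token" type may look quite different. A template flag decides at compile time whether the location field exists at all, so callers that don't need it pay nothing.

```d
// Hypothetical source position, 1-based.
struct Location
{
    size_t line = 1;
    size_t column = 1;
}

// Hypothetical token type; the location member only exists when the
// trackLocation flag is set.
struct Token(bool trackLocation)
{
    string text;

    static if (trackLocation)
        Location location;
}

// Returns the position reached after consuming the given input.
// Columns are counted in code points here (foreach decodes to dchar).
Location advance(Location loc, string consumed)
{
    foreach (dchar c; consumed)
    {
        if (c == '\n')
        {
            loc.line++;
            loc.column = 1;
        }
        else
        {
            loc.column++;
        }
    }
    return loc;
}

unittest
{
    // '{' and a newline, then two spaces and the three characters "a".
    auto loc = advance(Location.init, "{\n  \"a\"");
    assert(loc.line == 2 && loc.column == 6);

    static assert(__traits(hasMember, Token!true, "location"));
    static assert(!__traits(hasMember, Token!false, "location"));
}
```
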
Sönke