std.jgrandson

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Sun Aug 3 08:14:45 PDT 2014


On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
> A few thoughts based on my experience with vibe.data.json:
>
> 1. No decoding of strings appears to mean that "Value" also always
> contains encoded strings. This seems the be a leaky and also error prone
> leaky abstraction. For the token stream, performance should be top
> priority, so it's okay to not decode there, but "Value" is a high level
> abstraction of a JSON value, so it should really hide all implementation
> details of the storage format.

Nonono. I think there's a confusion. The input strings are not UTF 
decoded for the simple need there's no need (all tokenization decisions 
are taken on the basis of ASCII characters/code units). The 
backslash-prefixed characters are indeed decoded.

An optimization I didn't implement yet is to use slices of the input 
wherever possible (when the input is string, immutable(byte)[], or 
immutable(ubyte)[]). That will reduce allocations considerably.

> 2. Algebraic is a good choice for its generic handling of operations on
> the contained types (which isn't exposed here, though). However, a
> tagged union type in my experience has quite some advantages for
> usability. Since adding a type tag possibly affects the interface in a
> non-backwards compatible way, this should be evaluated early on.

There's a public opCast(Payload) that gives the end user access to the 
Payload inside a Value. I forgot to add documentation to it.

What advantages are to a tagged union? (FWIW: to me Algebraic and 
Variant are also tagged unions, just that the tags are not 0, 1, ..., n. 
That can be easily fixed for Algebraic by defining operations to access 
the index of the currently-stored type.)

> 2.b) I'm currently working on a generic tagged union type that also
> enables operations between values in a natural generic way. This has the
> big advantage of not having to manually define operators like in
> "Value", which is error prone and often limited (I've had to make many
> fixes and additions in this part of the code over time).

I did notice that vibe.json has quite a repetitive implementation, so 
reducing it would be great.

The way I see it, good work on tagged unions must be either integrated 
within std.variant (either by modifying Variant/Algebraic or by adding 
new types to it). I am very strongly opposed to adding a tagged union 
type only for JSON purposes, which I'd consider essentially a usability 
bug in std.variant, the opposite of dogfooding, etc.

> 3. Use of "opDispatch" for an open set of members has been criticized
> for vibe.data.json before and I agree with that criticism. The only
> advantage is saving a few keystrokes (json.key instead of json["key"]),
> but I came to the conclusion that the right approach to work with JSON
> values in D is to always directly deserialize when/if possible anyway,
> which mostly makes this is a moot point.

Interesting. Well if experience with opDispatch is negative then it 
should probably not be used here, or only offered on an opt-in basis.

> This approach has a lot of advantages, e.g. reduction of allocations,
> performance of field access and avoiding typos when accessing fields.
> Especially the last point is interesting, because opDispatch based field
> access gives the false impression that a static field is accessed.

Good point.

> The decision to minimize the number of static fields within "Value"
> reduces the chance of accidentally accessing a static field instead of
> hitting opDispatch, but there are still *some* static fields/methods and
> any later addition of a symbol must now be considered a breaking change.

Right now the idea is that the only named member is __payload. Well then 
there's opXxxx as well. The idea is/was to add all other functionality 
as free functions.

> 3.b) Bad interaction of UFCS and opDispatch: Functions like "remove" and
> "assume" certainly look like they could be used with UFCS, but
> opDispatch destroys that possibility.

Yah, agreed.

The bummer is people coming from Python won't be able to continue using 
the same style without opDispatch.

> 4. I know the stance on this is often "The D module system has enough
> facilities to disambiguate" (which is not really a valid argument, but
> rather just the lack of a counter argument, IMO), but I highly dislike
> the choice to leave off any mention of "JSON" or "Json" in the global
> symbol names. Using the module either requires to always use a renamed
> import or a manual alias, or the resulting source code will always leave
> the reader wondering what kind of data is actually handled. Handling
> multiple "value" types in a single piece of code, which is not uncommon
> (e.g. JSON + BSON/ini value/...) would always require explicit
> disambiguation. I'd certainly include the "JSON" or "Json" part in the
> names.

Good point, I agree.

> 5. Whatever happens, *please* let's aim for a module name of
> std.data.json (similar to std.digest.*), so that any data formats added
> later are nicely organized. All existing data format support (XML + CSV)
> doesn't follow contemporary Phobos style, so they will need to be
> deprecated at some point anyway, freeing the way for a clean an
> non-breaking transition to a more organized module hierarchy.

I agree.

> 6. (Possibly compile time optional) support for keeping track of
> line/column numbers is often important for better error messages, so
> that would be good to have included as part of the parser and in the
> "Token" type.

Yah, saw that in vibe.d but forgot about it.


Thanks,

Andrei




More information about the Digitalmars-d mailing list