std.jgrandson
Sönke Ludwig via Digitalmars-d
digitalmars-d at puremagic.com
Sun Aug 3 11:03:52 PDT 2014
Am 03.08.2014 17:14, schrieb Andrei Alexandrescu:
> On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
>> A few thoughts based on my experience with vibe.data.json:
>>
>> 1. No decoding of strings appears to mean that "Value" also always
>> contains encoded strings. This seems the be a leaky and also error prone
>> leaky abstraction. For the token stream, performance should be top
>> priority, so it's okay to not decode there, but "Value" is a high level
>> abstraction of a JSON value, so it should really hide all implementation
>> details of the storage format.
>
> Nonono. I think there's a confusion. The input strings are not UTF
> decoded for the simple need there's no need (all tokenization decisions
> are taken on the basis of ASCII characters/code units). The
> backslash-prefixed characters are indeed decoded.
>
> An optimization I didn't implement yet is to use slices of the input
> wherever possible (when the input is string, immutable(byte)[], or
> immutable(ubyte)[]). That will reduce allocations considerably.
Ah okay, *phew* ;) But in that case I'd actually think about leaving off
the backslash decoding in the low level parser, so that slices could be
used for immutable inputs in all cases - maybe with a name of
"rawString" for the stored data and an additional "string" property that
decodes on the fly. This may come in handy when the first comparative
benchmarks together with rapidjson and the like are done.
>> 2. Algebraic is a good choice for its generic handling of operations on
>> the contained types (which isn't exposed here, though). However, a
>> tagged union type in my experience has quite some advantages for
>> usability. Since adding a type tag possibly affects the interface in a
>> non-backwards compatible way, this should be evaluated early on.
>
> There's a public opCast(Payload) that gives the end user access to the
> Payload inside a Value. I forgot to add documentation to it.
I see. Suppose that opDispatch would be dropped, would anything speak
against "alias this"ing _payload to avoid the need for the manually
defined operators?
> What advantages are to a tagged union? (FWIW: to me Algebraic and
> Variant are also tagged unions, just that the tags are not 0, 1, ..., n.
> That can be easily fixed for Algebraic by defining operations to access
> the index of the currently-stored type.)
The two major points are probably that it's possible to use "final
switch" on the type tag if it's an enum, and the type id can be easily
stored in both integer and string form (which is not as conveniently
possible with a TypeInfo).
> (...)
>
> The way I see it, good work on tagged unions must be either integrated
> within std.variant (either by modifying Variant/Algebraic or by adding
> new types to it). I am very strongly opposed to adding a tagged union
> type only for JSON purposes, which I'd consider essentially a usability
> bug in std.variant, the opposite of dogfooding, etc.
Definitely agree there.
An enum based tagged union design also currently has the unfortunate
property that the order of enum values and that of the accepted types
must be defined consistently, or bad things will happen. Supporting UDAs
on enum values would be a possible direction to fix this:
enum JsonType {
@variantType!string string,
@variantType!(JsonValue[]) array,
@variantType!(JsonValue[string]) object
}
alias JsonValue = TaggedUnion!JsonType;
But then there are obviously still issues with cyclic type references.
So, anyway, this is something that still requires some thought. It could
also be designed in a way that is backwards compatible with a pure
"Algebraic", so it shouldn't be a blocker for the current design.
More information about the Digitalmars-d
mailing list