stdx.data.json needs a layer on top

Tue Jun 23 12:22:54 PDT 2015

On Tuesday, 23 June 2015 at 14:06:38 UTC, Sönke Ludwig wrote:
>> As I understand it, there is a gap between what you can
>> currently do with std.json (and indeed vibed json) and what you
>> can do with stdx.data.json.  And the capability falls short of
>> what can be done in other standard libraries such as the ones
>> for python.
>>
>> So since we are going for a nuclear-power station included
>> approach, does that not mean that we need to specify what this
>> layer should do, and somebody should start to work on it?
>
> One thing, which I consider the most important missing building 
> block, is Jacob's anticipated std.serialization module [1]*. 
> Skipping the data representation layer and going straight for a 
> statically typed access to the data is the way to go in a 
> language such as D, at least in most situations.

Thanks, Sönke.  I appreciate your taking the time to reply, and I 
hope I represented my understanding of things correctly.  I think 
often things get stuck in limbo because people don't know what's 
most useful, so I do think a central list of "things that need to 
be done" in the D ecosystem might be nice, if it doesn't become 
excessively structured and bureaucratic.  (I ain't volunteering 
to maintain it, as I can't commit to it).

The thing is, there are different use cases.  For example, I pull 
data from Quandl - the metadata is standard and won't change in 
format often, but the data for a particular series will.  If I 
pull volatility data, that will have different fields from price 
or economic data, and I don't know beforehand the total set of 
possibilities.  This must be quite a common use case, and indeed 
I just hit another one recently with a poorly-documented internal 
corporate database for securities.

Maybe it's fine to generate the static typing in response to 
reading the data, but then it ought to be easy to do so 
(ultimately).  Otherwise you hack something up in Python because 
it's just easier, and that hack job becomes the basis for 
something larger than you ever intended or wanted, and it's never 
worth rewriting given the other stuff you need.
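
To make the "just easier" part concrete, here is a rough sketch of 
the kind of quick Python I mean.  The payload and its field names 
("column_names", "data") are made up for illustration and are not 
a claim about Quandl's actual format; rows_as_dicts is likewise 
just a hypothetical helper.

```python
import json

# Made-up payload: the layout and field names are illustrative
# only, not a description of any real service's format.
SAMPLE = """
{
  "name": "example series",
  "column_names": ["Date", "Open", "Close", "ImpliedVol"],
  "data": [
    ["2015-06-22", 10.0, 10.5, 0.21],
    ["2015-06-23", 10.5, 10.2, null]
  ]
}
"""

def rows_as_dicts(text):
    """Turn a columns-plus-rows payload into a list of dicts,
    without knowing the column names ahead of time."""
    doc = json.loads(text)
    columns = doc["column_names"]
    return [dict(zip(columns, row)) for row in doc["data"]]

rows = rows_as_dicts(SAMPLE)

# The set of fields is only discovered at run time, and a JSON
# null simply shows up as None.
print(sorted(rows[0].keys()))
print(rows[1]["ImpliedVol"])
```

Nothing here declares the shape of the series up front, which is 
the convenience I'd like a higher-level layer in D to offer too.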

But even if you prefer static typing generated on the fly (which 
maybe becomes useful via introspection à la Alexandrescu's talk), 
sometimes one will prefer dynamic typing, and since it's easy to 
do in a way that doesn't destroy the elegance and coherence of 
the whole project, why not give people the option?  It seems to 
me that Guido painted a target on Python by saying "it's fast 
enough, and you are usually I/O etc. bound", because the numerical 
computing people have different needs.  So BLAS and the like may 
be part of that, but also having something like pandas - and the 
ability to get data in and out of it - would be an important part 
in making it easy and fun to use D for this purpose, and it's not 
so hard to do so, just a fair bit of work.  Not that it makes 
sense to undergo a death march to duplicate Python functionality, 
but there are some things that are relatively easy that have a 
high payoff - like John Colvin's pydmagic.

(The link here, which may not be so obvious, is that in a way 
pandas is a kind of replacement for a spreadsheet, and being able 
to just pull stuff in without minding your 'p's and 'q's to get a 
quick result lends itself to the kind of iterative exploration 
that makes spreadsheets still overused even today.  And that's 
the link to JSON and (de)serialization).

> Another part is a high level layer on top of the stream parser 
> that has existed for a while (albeit with room for improvement), but 
> that I forgot to update the documentation for. I've now caught 
> up on that and it can be found under [2] - see the read[...] 
> and skip[...] functions.

Thank you for the link.

> Do you, or anyone else, have further ideas for higher level 
> functionality, or any concrete examples in other standard 
> libraries?

Will think it through and try to come up with some simple 
examples.  Paging John Colvin and Russell Winder, too.

> * Or any other suitable replacement, if that doesn't work out 
> for some reason. The vibe.data.serialization module to me is 
> not a suitable candidate as it stands, because it lacks some 
> features of Jacob's solution, such as proper handling of 
> (duplicate/interior) references. But it's a perfect fit for my 
> own class of problems, so I currently can't justify putting work 
> into this either.

Would it be worth you or someone else trying to articulate clearly 
what it does well that is missing from stdx.data.json?