stdx.data.json needs a layer on top
Laeeth Isharc via Digitalmars-d
digitalmars-d at puremagic.com
Tue Jun 23 12:22:54 PDT 2015
On Tuesday, 23 June 2015 at 14:06:38 UTC, Sönke Ludwig wrote:
>> As I understand it, there is a gap between what you can currently do
>> with std.json (and indeed vibed json) and what you can do with
>> stdx.data.json. And the capability falls short of what can be done in
>> other standard libraries such as the ones for python.
>>
>> So since we are going for a nuclear-power station included approach,
>> does that not mean that we need to specify what this layer should do,
>> and somebody should start to work on it?
>
> One thing, which I consider the most important missing building
> block, is Jacob's anticipated std.serialization module [1]*.
> Skipping the data representation layer and going straight for a
> statically typed access to the data is the way to go in a
> language such as D, at least in most situations.
Thanks, Sönke. I appreciate your taking the time to reply, and I
hope I represented my understanding of things correctly. I think
often things get stuck in limbo because people don't know what's
most useful, so I do think a central list of "things that need to
be done" in the D ecosystem might be nice, if it doesn't become
excessively structured and bureaucratic. (I ain't volunteering
to maintain it, as I can't commit to it).
Thing is, there are different use cases. For example, I pull data
from Quandl: the metadata is standard and won't change in format
often, but the data for a particular series will. For example, if
I pull volatility data, it will have different fields from price
or economic data. And I don't know beforehand the total set of
possibilities. This must be quite a common use case, and indeed
I just hit another one recently with a poorly-documented internal
corporate database for securities.
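
As a rough illustration of that use case, here is a minimal sketch in Python (the comparison point mentioned above) using only the standard `json` module. The payload shape, the field names and the `load_series` helper are all made up for the example; this is not Quandl's actual response format. The stable metadata goes into a fixed type, while the rows stay dynamic because their columns are only known once the data arrives.

```python
import json
from dataclasses import dataclass

# Hypothetical payload: the shape and field names are invented for
# illustration and are not Quandl's actual response format.
RAW = """
{
  "meta": {"code": "VOL/EXAMPLE", "frequency": "daily"},
  "column_names": ["date", "implied_vol", "realised_vol"],
  "data": [["2015-06-22", 14.2, 11.9],
           ["2015-06-23", 13.8, 12.1]]
}
"""

@dataclass
class Meta:
    # The metadata layout is stable, so a fixed type is fine here.
    code: str
    frequency: str

def load_series(text):
    """Split a document into typed metadata and dynamically keyed rows."""
    doc = json.loads(text)
    meta = Meta(**doc["meta"])
    # The columns are only known at run time, so each row becomes a
    # plain dict keyed by whatever column names were sent.
    rows = [dict(zip(doc["column_names"], row)) for row in doc["data"]]
    return meta, rows

meta, rows = load_series(RAW)
print(meta.frequency, sorted(rows[0]))
```

A price or economic series would simply arrive with different `column_names`, and nothing in the loader would need to change.
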
Maybe it's fine to generate the static typing in response to
reading the data, but then it ought to be easy to do so
(ultimately). Because otherwise you hack something up in Python
because it's just easier, and that hack job becomes the basis for
something larger than you ever intended or wanted and it's never
worth rewriting given the other stuff you need.
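
One way to picture "generating the static typing in response to reading the data" is a small helper that looks at a sample record and emits a D struct declaration for it. The sketch below is in Python and is only an assumption about how such a tool might look: the `d_struct_from_sample` name, the type mapping and the sample record are all hypothetical, and nested objects, arrays and nulls are deliberately ignored.

```python
import json

# Assumed mapping from JSON scalar types (as Python sees them) to D types.
D_TYPES = {bool: "bool", int: "long", float: "double", str: "string"}

def d_struct_from_sample(name, record):
    """Return D source for a struct whose fields mirror one sample record.

    Assumes every key is already a valid D identifier and every value is
    a JSON scalar; nested objects, arrays and nulls are not handled.
    """
    lines = [f"struct {name}", "{"]
    for key, value in record.items():
        # type(value) is looked up exactly, so True maps to bool, not long.
        lines.append(f"    {D_TYPES[type(value)]} {key};")
    lines.append("}")
    return "\n".join(lines)

sample = json.loads('{"date": "2015-06-23", "close": 101.5, "volume": 12000}')
print(d_struct_from_sample("Row", sample))
```

The generated text could then be pasted into, or mixed into, a D program, so the exploratory step stays cheap while the later code is statically typed.
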
But even if you prefer static typing generated on the fly (which
maybe becomes useful via introspection à la Alexandrescu's talk),
sometimes one will prefer dynamic typing, and since it's easy to
do in a way that doesn't destroy the elegance and coherence of
the whole project, why not give people the option? It seems to
me that Guido painted a target on Python by saying "it's fast
enough, and you are usually I/O etc bound", because the numerical
computing people have different needs. So BLAS and the like may
be part of that, but also having something like pandas - and the
ability to get data in and out of it - would be an important part
in making it easy and fun to use D for this purpose, and it's not
so hard to do so, just a fair bit of work. Not that it makes
sense to undergo a death march to duplicate python functionality,
but there are some things that are relatively easy that have a
high payoff - like John Colvin's pydmagic.
(The link here, which may not be so obvious, is that in a way
pandas is a kind of replacement for a spreadsheet, and being able
to just pull stuff in without minding your 'p's and 'q's to get a
quick result lends itself to the kind of iterative exploration
that makes spreadsheets still overused even today. And that's
the link to JSON and (de)serialization).
> Another part is a high level layer on top of the stream parser
> that has existed for a while (albeit with room for improvement), but
> that I forgot to update the documentation for. I've now caught
> up on that and it can be found under [2] - see the read[...]
> and skip[...] functions.
Thank you for the link.
>
> Do you, or anyone else, have further ideas for higher level
> functionality, or any concrete examples in other standard
> libraries?
Will think it through and try to come up with some simple
examples. Paging John Colvin and Russell Winder, too.
> * Or any other suitable replacement, if that doesn't work out
> for some reason. The vibe.data.serialization module to me is
> not a suitable candidate as it stands, because it lacks some
> features of Jacob's solution, such as proper handling of
> (duplicate/interior) references. But it's a perfect fit for my
> own class of problems, so I currently can't justify putting work
> into this either.
Is it worth you or someone else trying to articulate what it
does well that is missing from stdx.data.json?