dataframe implementations
Laeeth Isharc via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Nov 18 09:06:46 PST 2015
On Tuesday, 17 November 2015 at 13:56:14 UTC, Jay Norwood wrote:
> I looked through the dataframe code and a couple of comments...
>
> I had thought perhaps an app could read in the header info and
> type info from hdf5, and generate D struct definitions with
> column headers as symbol names. That would enable faster
> processing than with the associative arrays, as well as support
> the auto-completion that would be helpful in writing
> expressions.
Yes - I think one will want a choice between this kind of
approach and using associative arrays. For some purposes it's
not convenient to have to compile code every time you open an
unfamiliar file; on the other hand, the performance hit from an
AA will sometimes matter.
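To make the trade-off concrete, here is a minimal sketch of the two
styles side by side. It is only an illustration, not code from the
dataframe repository: Column, makeRowStruct and Bar are hypothetical
names, and the column list is hard-coded where a real tool would
read it from the file header.

```d
import std.stdio : writeln;

// Hypothetical description of one column: D type name and header name.
struct Column
{
    string type;
    string name;
}

// Hypothetical helper, evaluated at compile time: builds the source
// text of a struct with one field per column.
string makeRowStruct(string structName, Column[] cols)
{
    string code = "struct " ~ structName ~ " {\n";
    foreach (c; cols)
        code ~= "    " ~ c.type ~ " " ~ c.name ~ ";\n";
    return code ~ "}\n";
}

// Assumption: the column list is known when the program is compiled.
mixin(makeRowStruct("Bar", [Column("double", "open"),
                            Column("double", "close")]));

void main()
{
    // Static approach: field names and types are checked by the compiler.
    auto typed = Bar(100.0, 101.5);
    writeln(typed.close - typed.open);

    // Dynamic approach: works for any file without recompiling, but
    // every access is a hash lookup on the column name.
    double[string] row = ["open": 100.0, "close": 101.5];
    writeln(row["close"] - row["open"]);
}
```

The struct version needs a recompile whenever the columns change,
which is exactly the inconvenience with unfamiliar files; the AA
version avoids that at the cost of the lookup.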
The situation at the moment for me is that I have very little
time to work on a correct general solution for this problem
myself (yet it's important for D that we do get to one). I also
lack the experience with D to do it very well very quickly. I do
have a couple of seasoned people from the community helping me
with things, but dataframes won't be the first thing they look
at, and it could be a while before we get to that. If we
implement one for our own needs, then I will open source it, as that
is commercially sensible as well as the right thing to do. But that
could be a year away.
Vlad Levenfeld was also looking at this a bit.
> The csv type info for columns could be inferred, or else stated
> in the reader call, as done as an option in julia.
>
> In both cases the column names would have to be valid symbol
> names for this to work. I believe Julia also expects this, or
> else does some conversion on your column names to make them
> valid symbols. I think the D csv processing would also need to
> check if the
>
> The jupyter interactive environment supports python pandas and
> Julia dataframe column names in the autocompletion, and so I
> think the D debugging environment would need to provide similar
> capability if it is to be considered as a fast-recompile
> substitute for interactive dataframe exploration.
Well, we don't need to get there in a single bound - just being
able to do this at all is already a big improvement, and I am
already using D with Jupyter to do things.
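On the column-name point above, converting a header into a valid
symbol could be sketched roughly as below. toIdentifier is a
hypothetical helper, not something in the dataframe code. It only
handles ASCII and does not check for D keywords or for two headers
mapping to the same name, so treat it as a starting point.

```d
import std.ascii : isAlpha, isAlphaNum;
import std.stdio : writeln;

// Hypothetical helper: maps a column header to a string usable as a
// D identifier. Any character that is not an ASCII letter, digit or
// underscore becomes '_'; a leading digit (or an empty header) gets
// a '_' prefix.
string toIdentifier(string header)
{
    string result;
    foreach (char c; header)
        result ~= (isAlphaNum(c) || c == '_') ? c : '_';
    if (result.length == 0 || !(isAlpha(result[0]) || result[0] == '_'))
        result = "_" ~ result;
    return result;
}

void main()
{
    writeln(toIdentifier("Adj Close"));
    writeln(toIdentifier("52wk high"));
}
```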
> It seems to me that your particular examples of stock data
> would eventually need to handle missing data, as supported in
> Julia dataframes and python pandas. They both provide ways to
> drop or fill missing values. Did you want to support that?
Yes - we should do so eventually, and there's much more that
could be done. But maybe a sensible basic implementation is a
start and we can refine after that.
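As a rough idea of what a basic version might look like, here is a
sketch of drop and fill on a single numeric column. It assumes
missing values are stored as double.nan, which is my assumption for
the example and not how the current dataframe code works;
dropMissing and fillMissing are hypothetical names.

```d
import std.algorithm : filter, map;
import std.array : array;
import std.math : isNaN;
import std.stdio : writeln;

// Hypothetical helper: returns a new column without the NaN entries.
double[] dropMissing(double[] col)
{
    return col.filter!(x => !isNaN(x)).array;
}

// Hypothetical helper: returns a new column with NaN entries
// replaced by the given value.
double[] fillMissing(double[] col, double value)
{
    return col.map!(x => isNaN(x) ? value : x).array;
}

void main()
{
    // Assumption: double.nan marks a missing observation.
    double[] close = [101.5, double.nan, 102.25, double.nan];
    writeln(dropMissing(close));
    writeln(fillMissing(close, 0.0));
}
```

NaN only works as a marker for floating-point columns; integer or
string columns would need something else, such as a separate mask
or std.typecons.Nullable.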
I wrote the dataframe in a couple of evenings, so I am sure it
can be improved, and even rearchitected. Pull requests welcome,
and maybe we should set up a Trello to organise ideas? Let me
know if you are in.