[GSoC] Dataframes for D

Prateek Nayak lelouch.cpp at gmail.com
Thu May 30 03:38:50 UTC 2019


On Wednesday, 29 May 2019 at 18:41:28 UTC, jmh530 wrote:
> On Wednesday, 29 May 2019 at 18:00:02 UTC, Prateek Nayak wrote:
>> [snip]
>
> Glad to see progress being made on this!
>
> Somewhat tangentially on the interoperability, I have also made 
> quite a bit of use of R's data.frames. One difference between 
> that and what I have seen of the implementation is that R's 
> data.frames allow for different columns to be different types. 
> This makes certain kinds of analysis of groups very easy. For 
> instance, right now I'm working with a dataset whose columns 
> are doubles, dates, integers, bools, and strings. I can do the 
> equivalent of groupby on the strings as "factors" in R and it's 
> pretty straightforward to get everything working nicely.

The DataFrame currently uses Mir's ndslice at its core, so all
the data stored in it has to be homogeneous.
Right now, we treat the operable data as homogeneous, which
keeps the API simpler.
I'm not sure how something like Variant will play out in this
scenario. It may make the data more flexible, but parsing will
probably require an assertion library.
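
For illustration only, here is a minimal sketch of what a
heterogeneous row could look like with std.variant from Phobos.
This is not how the DataFrame is implemented; sumDoubles and the
row layout are hypothetical names made up for this example. It
shows the run-time type check that every operation would need
once the cells stop being homogeneous.

```d
import std.stdio : writeln;
import std.variant : Variant;

// Hypothetical helper (not part of the DataFrame API): add up the
// cells of a row that hold a double and skip everything else.
// peek!T returns null unless the Variant holds exactly a T.
double sumDoubles(Variant[] row)
{
    double total = 0; // explicit init, a default double is NaN in D
    foreach (ref cell; row)
    {
        if (auto p = cell.peek!double)
            total += *p;
    }
    return total;
}

void main()
{
    // One heterogeneous row: double, long, bool, string, double.
    Variant[] row = [Variant(1.5), Variant(42L), Variant(true),
                     Variant("abc"), Variant(2.25)];

    // The stored type is only known at run time.
    assert(row[3].type == typeid(string));
    assert(row[1].get!long == 42);

    writeln(sumDoubles(row));
}
```

The cost is visible in sumDoubles: each access goes through a
type check, whereas a homogeneous ndslice knows the element type
at compile time.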

