[GSoC] Dataframes for D

Prateeek Nayak lelouch.cpp at gmail.com
Thu Jul 18 15:34:32 UTC 2019


On Thursday, 18 July 2019 at 10:55:55 UTC, jmh530 wrote:
> On Thursday, 18 July 2019 at 05:03:38 UTC, Prateeek Nayak wrote:
>> [snip]
>
> Thanks for the update. I'm glad you're still making good 
> progress.
>
> I'm just looking over the readme.md. I noticed the "at" 
> function has a signature like at!(row, column)(). Because it 
> uses a template, doesn't that imply that the row and column 
> parameters must be known at compile-time? What if we want 
> run-time access using a function style instead of like df[0, 
> 0]? mir's ndslice also has a set of select functions that are 
> also useful for access.
>
> There's also a typo in the GroupBy text:
> "Group DataFrame based on na arbitrary number of columns."
>
> I noticed that you make a lot of use of static foreach's over 
> RowType in dataframe.d. Does this mean that 
> there isn't any extra cost if you use a homogeneous dataframe 
> with RowType.length == 1? If you can advertise that it doesn't 
> have any additional overhead for working with homogeneous, then 
> that's probably a win. You might also add a trait for 
> isHomogeneous that checks if RowType.length == 1.

* "at" was meant for fast access to an element. To be honest, only 
one of the two arguments (the column index) needs to be known at 
compile time, but then df[i1, i2] has to be written as 
at!(i2)(i1), which reverses the two positions. Hence I thought 
at!(i1, i2) could reduce the mishaps that position reversal can 
cause.
I agree that a method to access an element at runtime is needed. 
I will overload at for that.
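
To make the argument-order point concrete, here is a rough sketch of the two compile-time forms. It is a toy under stated assumptions, not the actual DataFrame code: ToyFrame, ColumnOf and the one-array-per-column layout are hypothetical names and choices made only for illustration.

```d
import std.meta : staticMap;

// Hypothetical helper: maps a column type T to its storage type T[].
alias ColumnOf(T) = T[];

// Hypothetical toy type, not the project's DataFrame.
struct ToyFrame(RowType...)
{
    // One array per column. A tuple of fields can only be indexed
    // with a compile-time constant.
    staticMap!(ColumnOf, RowType) data;

    // Column known at compile time, row supplied at run time.
    auto at(size_t col)(size_t row)
    {
        return data[col][row];
    }

    // Both indices known at compile time, in df[row, col] order.
    auto at(size_t row, size_t col)()
    {
        return data[col][row];
    }
}

void main()
{
    ToyFrame!(int, string) df;
    df.data[0] = [10, 20];
    df.data[1] = ["a", "b"];

    // Column 1, row 0: the indices appear reversed relative to df[0, 1].
    assert(df.at!(1)(0) == "a");
    // Row 0, column 1: same order as df[0, 1].
    assert(df.at!(0, 1)() == "a");
    assert(df.at!(0)(1) == 20);
}
```

In this sketch the two-parameter form costs nothing extra at run time; it only keeps the written order consistent with index notation.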

* Sorry about the typo, will fix it soon (^_^)

* The data in DataFrame is stored as a TypeTuple, which requires 
the column index to be known statically. When doing a runtime 
operation on the data, I was forced to traverse the tuple 
statically to find the particular index. A homogeneous DataFrame 
defined as DataFrame!(int, 5) will give RowType as 
(int, int, int, int, int), so RowType.length is 5 rather than 1.
For now that overhead still exists, but I think an isHomogeneous 
template can open some new doors for optimization. I will 
definitely look into this over the next week. Thanks for bringing 
it to my notice.
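
The static traversal described above, together with one possible shape for an isHomogeneous trait, could look roughly like the following. This is a minimal sketch under assumptions: ToyFrame, ColumnOf and cellText are hypothetical names, and the trait compares the column types instead of checking RowType.length == 1, since a homogeneous frame still has one RowType entry per column.

```d
import std.conv : to;
import std.meta : NoDuplicates, staticMap;

// Hypothetical helper: maps a column type T to its storage type T[].
alias ColumnOf(T) = T[];

// Hypothetical trait: true when every column has the same type.
enum isHomogeneous(RowType...) = NoDuplicates!(RowType).length == 1;

// Hypothetical toy type, not the project's DataFrame.
struct ToyFrame(RowType...)
{
    staticMap!(ColumnOf, RowType) data;

    // Run-time column index: the tuple is walked statically and each
    // position is compared against col, so the cost grows with the
    // number of columns.
    string cellText(size_t row, size_t col)
    {
        static foreach (i; 0 .. RowType.length)
        {
            if (i == col)
                return data[i][row].to!string;
        }
        assert(0, "column index out of range");
    }
}

void main()
{
    ToyFrame!(int, int, string) df;
    df.data[0] = [1, 2];
    df.data[1] = [3, 4];
    df.data[2] = ["x", "y"];

    size_t col = 1; // stands in for a value only known at run time
    assert(df.cellText(0, col) == "3");
    assert(df.cellText(1, 2) == "y");

    static assert(isHomogeneous!(int, int, int));
    static assert(!isHomogeneous!(int, string));
}
```

The sketch returns text because the columns may have different types. When isHomogeneous holds, a single return type is available, which is where a cheaper run-time path might become possible.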


