[GSoC] Dataframes for D

Prateek Nayak lelouch.cpp at gmail.com
Wed Jul 3 05:04:20 UTC 2019

On Saturday, 29 June 2019 at 04:39:39 UTC, Prateek Nayak wrote:
> [snip]

Weekly Update

I caught the flu this week, which was really unfortunate. However, 
I'm getting better and the work is going forward :)

What happened last week

I mostly dealt with the internal structure of `Group` - the 
structure that is returned by a groupBy operation.
At first I thought an array of DataFrames might be a good idea, but 
I soon dropped it: some parts, like the column index, stay the 
same for every group yet would have to be copied into every 
DataFrame in the array, which is just a waste of space.
The implementation now looks somewhat similar to the DataFrame 
structure itself - there is an `Index` and `data`. Indexes are 
sorted based on the groups formed as a result of groupBy.
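
To picture that layout, here is a rough sketch. It is not the actual 
code from the project: `GroupSketch`, `groupByKey`, `rowsOf` and 
`groupStart` are made-up names, and the data is simplified to rows of 
`double`. It only shows the idea of keeping the column index once and 
sorting the rows so that every group is contiguous.

```d
import std.algorithm : sort, SwapStrategy;
import std.array : array;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical layout: column titles stored once, rows ordered by group.
struct GroupSketch
{
    string[] columns;    // column index, shared by every group
    string[] rowKeys;    // row index, sorted so each group is contiguous
    double[][] data;     // data[i] is the row that belongs to rowKeys[i]
    size_t[] groupStart; // groupStart[g] is the first row of group g

    // Rows of group g as a slice of the shared storage (no copy).
    double[][] rowsOf(size_t g)
    {
        immutable end = g + 1 < groupStart.length ? groupStart[g + 1] : data.length;
        return data[groupStart[g] .. end];
    }
}

// Builds the sketch by sorting row positions on their key (an assumption
// about how grouping could be done, not the project's implementation).
GroupSketch groupByKey(string[] columns, string[] keys, double[][] rows)
{
    auto order = iota(keys.length).array;
    // Stable sort keeps the original order of rows inside each group.
    order.sort!((a, b) => keys[a] < keys[b], SwapStrategy.stable)();

    GroupSketch g;
    g.columns = columns;
    foreach (i, pos; order)
    {
        // A new group starts whenever the key changes.
        if (i == 0 || keys[pos] != keys[order[i - 1]])
            g.groupStart ~= i;
        g.rowKeys ~= keys[pos];
        g.data ~= rows[pos];
    }
    return g;
}

void main()
{
    auto g = groupByKey(["x", "y"], ["b", "a", "b"],
                        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]);
    writeln(g.rowKeys);
    writeln(g.rowsOf(1));
}
```

Compared with an array of DataFrames, `columns` exists only once here, 
and a single group is just a slice into `data`.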

There are a few places where optimizations can be made [mostly wrt 
space used] and I'll work on them this week.

Some of the functionality added to `Group` so far:
* display - the user can choose to display a single group or multiple groups
* combine - returns a DataFrame combining the groups the user selects

At this point there was a need for a function in DataFrame which 
could convert a level of indexing to a column of operable data if 
required. This is because combine on groupBy doesn't remember the 
position from where the data was extracted. Hence if a column of 
data is used for groupBy, it is automatically converted to a level 
of index in the result of combine. `indexToData` was added so the 
user can revert this if desired.
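
A small sketch of what such a conversion does. Again, this is only an 
illustration and not the real signature of `indexToData`: 
`FrameSketch` and `indexToDataSketch` are made-up names, all values 
are kept as strings for simplicity, and appending the restored column 
at the end is my assumption for the example.

```d
import std.stdio : writeln;

// Hypothetical, string-only stand-in for a frame with a multi-level index.
struct FrameSketch
{
    string[] indexTitles;   // one title per index level
    string[][] indexLevels; // indexLevels[l][r]: label of row r at level l
    string[] columnTitles;  // one title per data column
    string[][] columns;     // columns[c][r]: value of row r in column c
}

// Moves one index level into the data columns (appended as the last
// column, since the original position is not remembered).
FrameSketch indexToDataSketch(FrameSketch f, size_t level)
{
    assert(level < f.indexLevels.length, "no such index level");

    FrameSketch r;
    // Keep every index level except the one being converted.
    r.indexTitles = f.indexTitles[0 .. level] ~ f.indexTitles[level + 1 .. $];
    r.indexLevels = f.indexLevels[0 .. level] ~ f.indexLevels[level + 1 .. $];
    // The removed level becomes an ordinary data column.
    r.columnTitles = f.columnTitles ~ f.indexTitles[level];
    r.columns = f.columns ~ f.indexLevels[level];
    return r;
}

void main()
{
    // Shape of a combine result after grouping on "city":
    // "city" has become an index level next to "id".
    auto combined = FrameSketch(
        ["city", "id"],
        [["Oslo", "Oslo", "Rome"], ["1", "2", "3"]],
        ["sales"],
        [["10", "20", "30"]]);

    auto restored = indexToDataSketch(combined, 0);
    writeln(restored.indexTitles);
    writeln(restored.columnTitles);
}
```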

There were a few minor updates here and there. Nothing major. 
They include a new argument for `extends` in `Index`, which can 
now insert the index at the position of the user's choice. The 
other strips the trailing whitespace that appeared in display.

What will happen this week
This week I will work on optimizing `Group` and add binary 
operations to `Group` which may be helpful. I will document the 
changes once stability is reached and start work on aggregate/join.

I can't spot any major roadblocks up ahead. Work should go 
smoothly this week :)

-> Thank you jmh530 for sharing your work. This should help in 
improving the functionality of DataFrames further.
