[GSoC] Dataframes for D

Prateek Nayak lelouch.cpp at gmail.com
Wed Jul 3 05:04:20 UTC 2019


On Saturday, 29 June 2019 at 04:39:39 UTC, Prateek Nayak wrote:
> [snip]

-------------
Weekly Update
-------------

I caught the flu this week, which was really unfortunate. However,
I'm getting better and the work is moving forward :)

-----------------------
What happened last week
-----------------------

I mostly dealt with the internal structure of `Group`, the
structure returned by a groupBy operation.
At first I thought an array of DataFrames might be a good idea, but
I soon dropped it: some parts, like the column index, stay the same
for every group yet would have to be copied into every DataFrame
in the array, which is just a waste of space.
The implementation now looks somewhat similar to the DataFrame
structure itself: there is an `Index` and `data`. The indexes are
sorted based on the groups formed by groupBy.
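
To illustrate the layout described above, here is a minimal sketch
in D. It is not the actual `Group` code: `GroupSketch`,
`groupBySketch`, the `starts` array and the use of plain arrays are
all assumptions made for this example. The point it shows is that
the column index is stored once and each group is just a slice of
the sorted rows.

```d
import std.algorithm : sort, SwapStrategy;
import std.array : array;
import std.range : iota;

// Hypothetical stand-in for a grouped frame (not the real `Group`).
struct GroupSketch
{
    string[] columns;   // shared column index, stored once
    string[] keys;      // one group key per row, in sorted order
    int[][] data;       // rows, in the same order as `keys`
    size_t[] starts;    // offset of the first row of each group

    // Rows of the i-th group, returned as a slice (no copy).
    int[][] group(size_t i)
    {
        size_t end = (i + 1 < starts.length) ? starts[i + 1] : data.length;
        return data[starts[i] .. end];
    }
}

GroupSketch groupBySketch(string[] columns, string[] keys, int[][] rows)
{
    // Sort row positions by key; a stable sort keeps the original
    // relative order of rows inside each group.
    auto order = iota(keys.length).array;
    order.sort!((a, b) => keys[a] < keys[b], SwapStrategy.stable);

    GroupSketch g;
    g.columns = columns;
    foreach (pos, src; order)
    {
        // A new group starts whenever the key changes.
        if (pos == 0 || keys[src] != g.keys[$ - 1])
            g.starts ~= pos;
        g.keys ~= keys[src];
        g.data ~= rows[src];
    }
    return g;
}

void main()
{
    auto g = groupBySketch(["x", "y"],
        ["b", "a", "b", "a"],
        [[1, 2], [3, 4], [5, 6], [7, 8]]);

    assert(g.keys == ["a", "a", "b", "b"]);
    assert(g.starts == [0, 2]);
    assert(g.group(1) == [[1, 2], [5, 6]]);
}
```

Compared to an array of frames, only `starts` grows with the number
of groups; `columns` is never duplicated.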

There are a few places where optimizations can be made [mostly wrt
space used] and I'll work on them this week.

Some of the functionality added to `Group` so far:
* display - the user can choose to display a single group or
multiple groups
* combine - returns a DataFrame combining the groups the user
selects

At this point there was a need for a function in DataFrame which
could convert a level of indexing to a column of operable data if
required. This is because combine on groupBy doesn't remember the
position from which the data was extracted. So if a data column is
used for groupBy, it is automatically converted to a level of
index in the result of combine. `indexToData` was added so the
user can revert this if desired.
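
As a rough picture of what such a conversion does, here is a small
sketch in D. It is not the real `indexToData` and does not reflect
its signature: `MiniFrame`, `indexToDataSketch`, the field names and
the sample values are all hypothetical. It only shows the idea of
moving one index level back into the data as a column.

```d
// Hypothetical miniature frame: a multi-level row index plus
// string-valued data columns (not the real DataFrame).
struct MiniFrame
{
    string[] indexTitles;    // one title per index level
    string[][] indexLevels;  // indexLevels[level][row]
    string[] columnTitles;   // one title per data column
    string[][] columns;      // columns[col][row]
}

// Moves index level `level` into the data as the first column and
// removes it from the index. The input frame is left unchanged.
MiniFrame indexToDataSketch(MiniFrame f, size_t level)
{
    MiniFrame r;
    r.columnTitles = [f.indexTitles[level]] ~ f.columnTitles;
    r.columns = [f.indexLevels[level]] ~ f.columns;
    r.indexTitles = f.indexTitles[0 .. level] ~ f.indexTitles[level + 1 .. $];
    r.indexLevels = f.indexLevels[0 .. level] ~ f.indexLevels[level + 1 .. $];
    return r;
}

void main()
{
    MiniFrame f;
    f.indexTitles = ["city"];
    f.indexLevels = [["Pune", "Pune", "Delhi"]];
    f.columnTitles = ["sales"];
    f.columns = [["10", "20", "30"]];

    auto r = indexToDataSketch(f, 0);
    assert(r.columnTitles == ["city", "sales"]);
    assert(r.columns[0] == ["Pune", "Pune", "Delhi"]);
    assert(r.indexLevels.length == 0); // the only index level was moved out
}
```

In this sketch, moving out the only index level leaves the frame
with no index at all; what a real frame should fall back to in that
case is left open here.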

There were a few minor updates here and there, nothing major.
They include a new argument for `extends` in `Index`, which can
now insert the index at a position of the user's choice. The other
was stripping the trailing whitespace that appeared in display.


--------------------------
What will happen this week
--------------------------
This week I will work on optimizing `Group` and add binary
operations to it which may be helpful. I'll document the changes
once stability is reached, and start work on aggregate/join.

----------
Roadblocks
----------
I can't spot any major roadblocks up ahead. Work should go 
smoothly this week :)



-> Thank you jmh530 for sharing your work. This should help in 
improving the functionality of DataFrames further.

