[GSoC] Dataframes for D
Prateek Nayak
lelouch.cpp at gmail.com
Wed Jul 3 05:04:20 UTC 2019
On Saturday, 29 June 2019 at 04:39:39 UTC, Prateek Nayak wrote:
> [snip]
-------------
Weekly Update
-------------
I caught the flu this week, which was really unfortunate. However,
I'm getting better and the work is moving forward :)
-----------------------
What happened last week
-----------------------
I mostly dealt with the internal structure of `Group` - the
structure that is returned by a groupBy operation.
At first I thought an array of DataFrames might be a good idea,
but I soon dropped it: some parts, like the column index, stay
the same for every group yet would have to be copied into every
DataFrame in the array, which is just a waste of space.
The implementation now looks somewhat similar to the DataFrame
structure itself - there is an `Index` and `data`. Indexes are
sorted based on the groups formed as a result of groupBy.
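As a rough illustration of that layout (not the actual `Group`
code), here is a minimal sketch in which the column names are
stored once and the rows are kept sorted by group key. The names
`GroupSketch`, `Span` and `groupRows` are hypothetical, and plain
string keys and double values are assumed for brevity.

```d
import std.algorithm : sort, SwapStrategy;
import std.array : array;
import std.range : iota;

// Hypothetical half-open row range [start, end) of one group.
struct Span
{
    size_t start;
    size_t end;
}

// Hypothetical, simplified grouped layout: the column names are
// stored once and the rows are kept sorted by group key.
struct GroupSketch
{
    string[] columns;   // shared column index, not copied per group
    string[] keys;      // group key of every row, in sorted order
    double[][] rows;    // row data, reordered to match `keys`
    Span[string] spans; // group key -> rows belonging to that group
}

GroupSketch groupRows(string[] columns, string[] keys, double[][] rows)
{
    assert(keys.length == rows.length, "one key per row expected");

    // Sort row positions by key; a stable sort keeps the original
    // order of the rows inside each group.
    auto order = iota(keys.length).array;
    sort!((a, b) => keys[a] < keys[b], SwapStrategy.stable)(order);

    GroupSketch g;
    g.columns = columns;
    foreach (pos, src; order)
    {
        string key = keys[src];
        g.keys ~= key;
        g.rows ~= rows[src];
        if (auto span = key in g.spans)
            span.end = pos + 1;                // extend the current group
        else
            g.spans[key] = Span(pos, pos + 1); // first row of a new group
    }
    return g;
}

unittest
{
    auto g = groupRows(["sales"], ["b", "a", "b"], [[1.0], [2.0], [3.0]]);
    assert(g.keys == ["a", "b", "b"]);
    assert(g.spans["b"] == Span(1, 3));
}
```

Compared with an array of frames, only `spans` grows with the
number of groups; `columns` exists exactly once.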
There are a few places where optimizations can be made [mostly wrt
space used] and I'll work on them this week.
Some of the functionality added to `Group` so far:
* display - the user can choose to display a single group or
multiple groups
* combine - returns a DataFrame combining the groups the user
selects
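To make the idea behind combine concrete, here is a minimal
sketch, assuming each row carries its group key as its index.
`KeyedRow` and `combineGroups` are hypothetical names and do not
reflect the real signature of combine.

```d
import std.algorithm : canFind, filter;
import std.array : array;

// Hypothetical row of a grouped frame: the group key acts as the index.
struct KeyedRow
{
    string key;
    double[] values;
}

// Returns the rows of the requested groups, keeping their existing order.
KeyedRow[] combineGroups(KeyedRow[] rows, string[] wanted)
{
    return rows.filter!(r => wanted.canFind(r.key)).array;
}

unittest
{
    auto rows = [KeyedRow("a", [2.0]), KeyedRow("b", [1.0]),
                 KeyedRow("b", [3.0]), KeyedRow("c", [4.0])];
    auto picked = combineGroups(rows, ["a", "c"]);
    assert(picked.length == 2);
    assert(picked[0].key == "a" && picked[1].key == "c");
}
```

Note that the group key ends up as the index of the result, not
as an ordinary column of data.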
At this point there was a need for a function in DataFrame that
could convert a level of indexing into a column of operable data
if required. This is because combine on groupBy doesn't remember
the position from which the data was extracted. Hence, if a
column of data is used for groupBy, it is automatically converted
to a level of index in the result of combine. `indexToData` was
added so the user can revert this if desired.
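The conversion can be pictured with a minimal sketch like the one
below. It assumes a simplified frame that stores everything as
strings; `Frame` and `levelToColumn` are hypothetical names used
only for illustration, not the actual `indexToData` API.

```d
// Hypothetical, simplified frame: index levels and data columns
// are both stored as arrays of strings, one entry per row.
struct Frame
{
    string[] indexTitles; // one title per index level
    string[][] index;     // index[level][row]
    string[] columns;     // one name per data column
    string[][] data;      // data[column][row]
}

// Moves index level `level` to the front of the data columns.
Frame levelToColumn(Frame f, size_t level)
{
    assert(level < f.index.length, "no such index level");

    Frame r;
    // Drop the chosen level from the index...
    r.indexTitles = f.indexTitles[0 .. level] ~ f.indexTitles[level + 1 .. $];
    r.index = f.index[0 .. level] ~ f.index[level + 1 .. $];
    // ...and prepend it to the data, reusing its title as column name.
    r.columns = [f.indexTitles[level]] ~ f.columns;
    r.data = [f.index[level]] ~ f.data;
    return r;
}

unittest
{
    auto f = Frame(["city"], [["Pune", "Pune", "Delhi"]],
                   ["sales"], [["10", "20", "30"]]);
    auto r = levelToColumn(f, 0);
    assert(r.index.length == 0);
    assert(r.columns == ["city", "sales"]);
}
```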
There were a few minor updates here and there, nothing major.
They include a new argument for `extends` in `Index`, which can
now insert the index at a position of the user's choice. The
other was stripping the trailing whitespace that appeared in
display.
--------------------------
What will happen this week
--------------------------
This week I will work on optimizing `Group` and add binary
operations to it that may be helpful. I will document the changes
once stability is reached, and start work on aggregate/join.
----------
Roadblocks
----------
I can't spot any major roadblocks up ahead. Work should go
smoothly this week :)
-> Thank you jmh530 for sharing your work. This should help in
improving the functionality of DataFrames further.