[GSoC] Dataframes for D

Prateek Nayak lelouch.cpp at gmail.com
Wed May 29 18:00:02 UTC 2019


Hello everyone,

I have begun work on my Google Summer of Code 2019 project, 
DataFrame for D.

-----------------
About the Project
-----------------

DataFrames have become a standard tool for handling and 
manipulating data. They offer a neat representation of the data, 
easy access to it, and the power to reshape it the way the user 
wants.
This project aims to bring a native DataFrame to D, one which 
brings with it:

* A user-friendly API
* Multi-indexing
* Writing to CSV and parsing from CSV
* Column binary operations of the form: df["Index1"] = 
df["Index2"] + df["Index3"];
* groupBy on an arbitrary number of columns
* Data Aggregation
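
To illustrate the column binary operation listed above, here is a 
minimal sketch of how such syntax can be supported in D through 
operator overloading. The names Column and Frame are hypothetical 
and used only for illustration; they are not the types used in 
magpie, and the real design may differ considerably.

```d
import std.stdio : writeln;

// Hypothetical column type (an assumption, not magpie's API):
// a thin wrapper around an array of doubles.
struct Column
{
    double[] data;

    // Element-wise binary operation between two columns of equal length.
    Column opBinary(string op)(const Column rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        assert(data.length == rhs.data.length, "column length mismatch");
        auto result = new double[data.length];
        foreach (i; 0 .. data.length)
            result[i] = mixin("data[i] " ~ op ~ " rhs.data[i]");
        return Column(result);
    }
}

// Hypothetical frame type: columns stored by name.
struct Frame
{
    Column[string] columns;

    // Supports reading a column: df["name"]
    Column opIndex(string name)
    {
        return columns[name];
    }

    // Supports assigning a column: df["name"] = column
    void opIndexAssign(Column value, string name)
    {
        columns[name] = value;
    }
}

void main()
{
    Frame df;
    df["Index2"] = Column([1.0, 2.0, 3.0]);
    df["Index3"] = Column([10.0, 20.0, 30.0]);

    // The form shown in the feature list above.
    df["Index1"] = df["Index2"] + df["Index3"];

    assert(df["Index1"].data == [11.0, 22.0, 33.0]);
    writeln(df["Index1"].data);
}
```

The sketch only handles columns of doubles with string labels; 
supporting heterogeneous column types and multi-indexing would 
need a richer design than this.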

Disclaimer: The overall structure is inspired by Pandas, the 
most popular DataFrame library in Python, and hence most of the 
usage will look very similar to Pandas.

The main focus of this project is the user-friendliness of the 
API, while also maintaining a fair amount of speed and power.
The preliminary road map can be viewed here -> 
https://docs.google.com/document/d/1Zrf_tFYLauAd_NM4-UMBGt_z-fORhFMrGvW633x8rZs/edit?usp=sharing

The core development can be followed here -> 
https://github.com/Kriyszig/magpie


-----------------------------
Brief idea of what is to come
-----------------------------

This month
----------
* Finish up the structure of DataFrame
* Finish terminal output (what good is data which cannot be seen?)
* Finish writing to CSV
* Parsing DataFrame from CSV (Both single and multi-indexed)
* Accessing Elements
* Accessing Rows and Columns
* Assignment of an element, or of an entire row or column
* Binary operations on rows and columns

Next Month
----------
* groupBy
* join
* Begin writing ops for aggregation
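
As a rough idea of what groupBy on an arbitrary number of columns 
involves, here is a minimal sketch that groups row indices by the 
combined values of the chosen key columns. The Row alias, the 
groupBy signature and the "\x1f" separator are all assumptions 
made for this illustration; none of them is taken from magpie.

```d
import std.array : join;
import std.stdio : writeln;

// Hypothetical row representation: column name -> value (as text).
alias Row = string[string];

// Groups row indices by the values found in the given key columns.
// The values of all key columns are joined into one string key,
// separated by "\x1f" (assumed not to occur in the data).
size_t[][string] groupBy(Row[] rows, string[] keys)
{
    size_t[][string] groups;
    foreach (i, row; rows)
    {
        string[] parts;
        foreach (k; keys)
            parts ~= row[k];
        groups[parts.join("\x1f")] ~= i;
    }
    return groups;
}

void main()
{
    Row[] rows = [
        ["city": "A", "year": "2018", "sales": "10"],
        ["city": "B", "year": "2018", "sales": "20"],
        ["city": "A", "year": "2018", "sales": "30"],
        ["city": "A", "year": "2019", "sales": "40"],
    ];

    // Group on two columns; any number of key columns works.
    auto groups = groupBy(rows, ["city", "year"]);

    assert(groups.length == 3);
    assert(groups["A\x1f2018"] == [size_t(0), size_t(2)]);
    writeln(groups);
}
```

Aggregation can then be expressed as a reduction over the row 
indices collected for each group.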


-----------
Speed Bumps
-----------

I am relatively new to D and come from a functional C background. 
Sometimes (most of the time) my code can start to look more like 
C than D.
However, I am adapting thanks to my mentors Nicholas Wilson and 
Ilya Yaroshenko. They have helped me a ton - whether it be with 
debugging errors or me falling back to my functional C past, they 
have always come to my rescue and I am grateful for their 
support.


-------------------------------------
Addressing Suggestions from Community
-------------------------------------

This suggestion comes from Laeeth Isharc.
Source: 
https://github.com/dlang/projects/issues/15#issuecomment-495831750

Though this is not on my current road map, I would love to pursue 
this idea. Adding an easy way to interoperate with other 
libraries would be very beneficial.
Although I haven't formally addressed this in the road map, I 
would love to implement msgpack-based I/O as I continue to 
develop the library. JSON I/O is also something I had in mind to 
implement after the data aggregation part. (I had prioritised 
JSON as I believed there were many more datasets available as 
JSON than in any other format.)

