[GSoC] Dataframes for D
Prateek Nayak
lelouch.cpp at gmail.com
Wed May 29 18:00:02 UTC 2019
Hello everyone,
I have begun work on my Google Summer of Code 2019 project,
DataFrame for D.
-----------------
About the Project
-----------------
DataFrames have become a standard tool for handling and
manipulating data. They offer a neat representation, easy access
and the power to reshape the data the way the user wants.
This project aims to bring a native DataFrame to D, one that
offers:
* A User Friendly API
* Multi-indexing
* Writing to CSV and parsing from CSV
* Column binary operation in the form: df["Index1"] =
df["Index2"] + df["Index3"];
* groupBy on an arbitrary number of columns
* Data Aggregation
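To make the column binary operation and groupBy items concrete,
here is a minimal sketch in plain Python. It is not the magpie
API: the dict-of-lists "frame", the helper names add_columns and
group_sum, the "Group" column and the sample values are all
assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: a "frame" is a dict mapping column name -> list.
df = {
    "Index2": [1, 2, 3, 4],
    "Index3": [10, 20, 30, 40],
    "Group": ["a", "b", "a", "b"],
}

def add_columns(frame, left, right):
    """Element-wise sum of two equally long columns."""
    return [x + y for x, y in zip(frame[left], frame[right])]

def group_sum(frame, keys, value):
    """Sum `value` for each distinct combination of the `keys` columns."""
    totals = defaultdict(int)
    for i, v in enumerate(frame[value]):
        # The group key is a tuple, so any number of key columns works.
        totals[tuple(frame[k][i] for k in keys)] += v
    return dict(totals)

# Mirrors df["Index1"] = df["Index2"] + df["Index3"];
df["Index1"] = add_columns(df, "Index2", "Index3")
print(df["Index1"])                         # [11, 22, 33, 44]
print(group_sum(df, ["Group"], "Index1"))   # {('a',): 44, ('b',): 66}
```

Keying each group by a tuple is what lets the same helper group
on an arbitrary number of columns.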
Disclaimer: The overall structure was inspired by Pandas, the
most popular DataFrame library in Python, and hence most of the
usage will look very similar to that of Pandas.
The main focus of this project is a user-friendly API that still
maintains a fair amount of speed and power.
The preliminary road map can be viewed here ->
https://docs.google.com/document/d/1Zrf_tFYLauAd_NM4-UMBGt_z-fORhFMrGvW633x8rZs/edit?usp=sharing
The core development can be followed here ->
https://github.com/Kriyszig/magpie
-----------------------------
Brief idea of what is to come
-----------------------------
This month
----------
* Finish the structure of DataFrame
* Finish Terminal Output (What good is data which cannot be seen?)
* Finish writing to CSV
* Parsing DataFrame from CSV (Both single and multi-indexed)
* Accessing Elements
* Accessing Rows and Columns
* Assignment of an element, or an entire row or column
* Binary operations on rows and columns
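As a rough picture of the CSV items above, the following plain
Python sketch writes a small table to CSV and parses it back with
a two-level index. It is not magpie code: the function names
write_csv and parse_csv, the index_depth parameter and the sample
data are assumptions for illustration only.

```python
import csv
import io

# Hypothetical data: two index levels followed by one data column.
header = ["Year", "Quarter", "Sales"]
rows = [["2019", "Q1", "100"], ["2019", "Q2", "120"]]

def write_csv(header, rows):
    """Serialise a header and rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def parse_csv(text, index_depth):
    """Parse CSV text; the first `index_depth` columns form the row index."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    frame = {}
    for row in reader:
        key = tuple(row[:index_depth])  # multi-index key, one entry per level
        frame[key] = dict(zip(header[index_depth:], row[index_depth:]))
    return frame

text = write_csv(header, rows)
frame = parse_csv(text, index_depth=2)
print(frame[("2019", "Q2")]["Sales"])  # 120
```

Setting index_depth to 1 gives the single-indexed case; values
stay as strings here because no type inference is attempted.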
Next month
----------
* groupBy
* join
* Begin writing ops for aggregation
-----------
Speed Bumps
-----------
I am relatively new to D and hail from a functional C background.
Sometimes (most of the time) my code starts to look more like C
than D.
However, I am adapting, thanks to my mentors Nicholas Wilson and
Ilya Yaroshenko. They have helped me a ton: whether it is
debugging errors or me falling back on my functional C past, they
have always come to my rescue, and I am grateful for their
support.
-------------------------------------
Addressing Suggestions from Community
-------------------------------------
This suggestion comes from Laeeth Isharc
Source:
https://github.com/dlang/projects/issues/15#issuecomment-495831750
Though this is not on my current road map, I would love to pursue
this idea. Adding an easy way to interoperate with other
libraries would be very beneficial.
Although I haven't formally addressed this in the road map, I
would love to implement msgpack-based I/O as I continue to
develop the library. JSON I/O is also something I have in mind to
implement after the data aggregation part. (I had prioritised
JSON as I believed there were many more datasets available as
JSON than in any other format.)