[GSoC] Dataframes for D

bioinfornatics bioinfornatics at fedoraproject.org
Fri Aug 9 08:08:39 UTC 2019


On Thursday, 8 August 2019 at 16:49:09 UTC, Prateeek Nayak wrote:
> -----------
> Update Time
> -----------
>
> Pardon me for the delay; my university just started and it has 
> been a busy first week. However, I have some good news:
>
> * Aggregate implementation is under review - the preliminary 
> implementation restricted the set of operations that aggregate 
> could do, but Mr. Wilson suggested there should be a way to 
> expand its usability, so we worked on a revamp that takes the 
> function you desire as input and applies it to the rows/columns 
> of the DataFrame
> * There is a new way to set the index using the index operation
> * to_csv supports setting the precision for floating-point 
> numbers - this was a problem I knew existed but hadn't 
> addressed until now. Better late than never.
> * Homogeneous DataFrames don't use TypeTuple anymore
> * at overload coming soon
>
>
> --------------------
> What is to come next
> --------------------
>
> * The first few responses from the community mostly asked for 
> binary file I/O support, since binary files are lean and fast 
> to read/write. I will explore this further.
> * Time series are gaining importance with the rise of machine 
> learning. I would like to implement something along the lines 
> of the time-series functionality Pandas has.
> * Something you would like to see - I am open to suggestions 
> (^_^)
>
> --------------
> Problems faced
> --------------
>
> One small implementation detail remains - a dispatch function. 
> Given that non-homogeneous cases still require traversal to a 
> column, a function to apply an alias statically or 
> non-statically depending on the DataFrame is under discussion.
> This will reduce code redundancy; however, my preliminary 
> attempts to tackle it have ended in failure. I will try to 
> finish it by the weekend. If I cannot solve it by then, I will 
> seek your help in the Learn section (^_^)
> Thank you

Dear D community,

Thanks, Prateeek Nayak, for your work.
I am currently working with pandas (Python DataFrames), and 
there is an extra feature set that I appreciate a lot: the IO 
tools, in particular:

* SQL
Methods: read_sql and to_sql
Description: these allow reading from and writing to a database. 
Combined with SQLAlchemy, these methods are awesome.

* Parquet
Methods: read_parquet and to_parquet
Description: Parquet is a file format often used in big-data 
environments. (A minimal usage sketch of both follows below.)
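
To illustrate what I mean, here is a minimal pandas sketch of 
these IO tools. The connection string, table names and file path 
are placeholders, and the Parquet calls additionally need 
pyarrow or fastparquet installed:

    import pandas as pd
    from sqlalchemy import create_engine

    # SQL round trip via SQLAlchemy (placeholder connection
    # string and table names)
    engine = create_engine("sqlite:///example.db")
    df = pd.read_sql("SELECT * FROM measurements", con=engine)
    df.to_sql("measurements_copy", con=engine,
              if_exists="replace", index=False)

    # Parquet round trip (requires pyarrow or fastparquet)
    df.to_parquet("measurements.parquet")
    df2 = pd.read_parquet("measurements.parquet")

Something equivalent on top of the D DataFrame would cover most 
of the data-ingestion needs I face day to day.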

These abilities make pandas and its DataFrame API a core library 
to have. Used like this, it standardizes the data structures 
used in our applications while at the same time offering a rich 
statistics API.

Indeed, this is important for code maintainability. It also fits 
the FAIR data view that an application is a set of input data + 
program features = results. Thus it is important to treat the 
data structures as the first component to think about when 
designing an application. The application becomes more robust 
and flexible when it can handle multiple input data file formats.

I hope to see such features in D.


Best regards

Sources:
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
- https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#parquet

